Challenges in Data Sharing in Biomedical Research

This blog post is the second in a four-part series called ‘Current Trends in Open Data’. In our last post, we discussed the importance of data management planning. Upcoming posts will focus on data repositories and the effects of the COVID-19 pandemic on open data.

Biomedical research is not exempt from the big data revolution. Next-generation sequencing technologies produce vast amounts of biological data and are now as commonly used as wet-lab techniques. This has helped accelerate the process of drug discovery, among other applications. More importantly, data obtained from high-throughput technologies can potentially serve as a resource for the scientific community as a whole, if properly stored and shared.

Several open data repositories, such as the Gene Expression Omnibus (GEO) and MetaboLights, host various types of omics data and can be freely accessed. Yet researchers have not readily adopted the practice of data sharing. What are the main barriers that hinder it?

Barrier #1: Researcher Attitudes

What if my ideas are stolen? What if my data is used and I don’t get credit? These are some common questions on the minds of researchers that make them wary of sharing their data in public repositories. Their concerns are fair. A lot of time and effort goes into conducting scientific experiments and credit should be given where it is due.

Possible Solutions

An easy solution would be to incentivize data sharing. Just as articles accrue citations, journals could provide citation metrics for datasets. However, this would barely scratch the surface. To ingrain data sharing in the research culture, policy-level changes would have to be implemented. Major stakeholders such as funding agencies and universities should require data management plans to be put in place and followed. Journals should mandate the submission of datasets along with the research manuscript. Outreach activities and awareness programs at labs would help spread the word about the importance of data sharing in an increasingly data-intensive research environment.

Barrier #2: Patient Confidentiality and Privacy Concerns

This barrier specifically concerns research carried out in clinical settings. Human subject data may contain sensitive health information. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) provides protections to conceal patient identity and ensure privacy. However, sharing human patient data in public repositories could still raise doubts over possible re-identification. Patient consent would also be required to share this type of data, and it may not be granted in all cases.

Possible Solutions

De-identifying or anonymizing clinical data could help prevent patient re-identification. The trust model of data sharing was devised to disseminate clinical data for research purposes and makes use of a person’s electronic health records. The intent is to protect patient privacy by eliminating information that could be traced back to the patient while maximizing the content that is useful to researchers.
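
To make the idea of de-identification concrete, here is a minimal Python sketch. It is an illustration only: it is not the trust model described above, and it is not sufficient on its own for HIPAA compliance. The field names (`patient_id`, `name`, `birth_date`, and so on) and the `deidentify` helper are hypothetical and would need to be adapted to a real schema. The sketch drops direct identifiers, replaces the patient ID with a keyed hash so that records from the same patient can still be linked, and coarsens the date of birth to a year.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; a real dataset will have its own schema.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a de-identified copy of a patient record (illustrative only)."""
    # Drop fields that identify the patient directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the patient ID with a keyed hash (pseudonym). The same ID and key
    # always give the same pseudonym, so a patient's records stay linkable.
    patient_id = str(cleaned.pop("patient_id"))
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    cleaned["pseudonym"] = digest.hexdigest()[:16]

    # Coarsen the date of birth (assumed to be "YYYY-MM-DD") to the year only.
    if "birth_date" in cleaned:
        cleaned["birth_year"] = int(str(cleaned.pop("birth_date"))[:4])

    return cleaned


record = {
    "patient_id": "P001",
    "name": "Jane Doe",
    "birth_date": "1980-05-17",
    "diagnosis": "type 2 diabetes",
}
shared = deidentify(record, secret_key=b"keep-this-key-private")
print(shared)
```

The clinical content (here, the diagnosis) is kept for researchers, while the fields that point back to an individual are removed or generalized. Even so, combinations of the remaining fields can sometimes re-identify a person, which is why de-identification alone does not fully settle the privacy concern.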

Barrier #3: Lack of Infrastructure

Next-generation sequencing technologies such as whole genome sequencing produce hundreds of gigabytes of data. Datasets of this size have to be curated and annotated with metadata in a standardized format to ensure that they can be of future use. As mentioned earlier, numerous open access repositories allow datasets to be stored and shared. However, there are currently no quality control checks or standardization protocols in place.

Possible Solutions

Data harmonization and machine-actionability should be emphasized. The FAIR (Findable, Accessible, Interoperable, Reusable) framework was drawn up with the pressing need for improved infrastructure to facilitate data reuse in mind. Data archives and repositories should promote the implementation of the FAIR principles to make datasets easier to discover.
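
As a rough illustration of what a machine-actionable quality check could look like, the Python sketch below tests whether a dataset's metadata record fills in a set of required fields before submission. The list of required fields and the `missing_metadata` helper are assumptions made for this example. They are not taken from the FAIR principles or from any particular repository's submission rules.

```python
# Hypothetical required fields; a real repository would define its own list.
REQUIRED_FIELDS = {"dataset_id", "title", "organism", "assay_type", "license"}


def missing_metadata(metadata: dict) -> list:
    """Return the required fields that are absent or blank, sorted by name."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # Treat a missing key, None, and whitespace-only text as "not provided".
        if value is None or not str(value).strip():
            missing.append(field)
    return sorted(missing)


metadata = {
    "dataset_id": "DS-0001",
    "title": "Liver RNA-seq, treated vs. control",
    "organism": "",
    "assay_type": "RNA-seq",
}
print(missing_metadata(metadata))  # ['license', 'organism']
```

A check like this is deliberately simple. Its value comes from being applied consistently, so that every shared dataset carries the minimum description needed for someone else to find and reuse it.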
