FAIR Data

NIH Data Management & Sharing Policy 2023

Deepthi Das
January 23, 2023

Life science research is becoming increasingly digitized and is evolving into a quantitative scientific discipline. A key driver of this shift is the pervasiveness of high-throughput technology platforms, which generate millions of data points on genes, gene expression, proteins, and other biological entities across cells, tissues, organs, and organisms. These data must be collated, curated, stored, and integrated before meaningful analysis is possible.

To keep pace with the evolving, data-driven life sciences landscape, NIH has issued, and is about to implement, a data management and sharing policy designed to maximize data reuse. This blog breaks down the policy for academia and discusses what it means for the scientific community as a whole.

The 2023 NIH Data Management & Sharing (DMS) Policy

Originally issued in 2020, the NIH Data Management & Sharing (DMS) Policy takes effect on January 25, 2023. Along with emphasizing good data management practices, the policy establishes expectations for maximizing the appropriate sharing of scientific data generated from NIH-funded or NIH-conducted research.

“Our goal is to lead a cultural shift that makes data sharing the norm. The degree of that shift, for some, may vary.  For example, many data sharing policies are already in place and researchers currently sharing data will likely not need to significantly alter their approach. But prospective planning for how to share data (i.e., developing plans, requesting NIH funds) may be new for some.”
~Dr. Lyric Jorgenson
Acting Associate Director for Science Policy and Acting Director of the Office of Science Policy (OSP), NIH

The policy emphasizes that sharing scientific data accelerates biomedical research by enabling the validation of research results, providing access to high-value datasets, and promoting data reuse in future studies. It places equal importance on the data management practices researchers follow while executing a project, and it requires applicants to submit a data management plan, and to budget for it, when they apply for NIH funding. The policy applies to all NIH-supported research that generates scientific data, regardless of funding level, including extramural grants, extramural contracts, intramural research projects, and other funding agreements.

How Is the 2023 Policy Different from the Previous (2003) Data-Sharing Policy?

The 2003 data-sharing policy required researchers seeking $500,000 or more in direct costs in any single year to include a data-sharing plan in their project proposal and to release and share the final research data generated in the project. In the years that followed, however, innovations in biotechnology and computation popularized a range of high-throughput technologies, and the volume of data generated grew manyfold.

NIH recognized that researchers’ ability to generate, store, share, and combine data has never been greater, and that its data-sharing policies must evolve to keep pace with scientific and technological advances. The 2003 policy mandated only that the data generated be shared. Because each project now generates far more data, NIH now emphasizes a well-defined data management process, which is critical to making that data reusable.

Furthermore, it provides for a specific data management budget, which can include reasonable, allowable costs associated with the following:

  1. Curating data and developing supporting documentation
  2. Preserving and sharing data through established repositories (data deposit fees and other charges necessary to make data available and accessible)
  3. Local data management considerations

This is a significant departure from the 2003 policy, which required data sharing without addressing the cost of doing so. By covering these costs, the 2023 policy enables researchers to pay for software, apps, or technical support to carry out end-to-end data processing.

Effect of DMS Policy on the Scientific Community

The policy encourages researchers to manage all of their data deliberately (positive and negative results, and the different data points generated) rather than simply depositing the data behind a publication. This has several consequences.

1. There Will Be More Data to Work With.

The DMS policy encourages investigators to share data through established repositories, chosen based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the anticipated volume of requests. Related policies are being introduced in tandem, such as the 2022 public access memorandum from the White House Office of Science and Technology Policy (OSTP), which directs federal agencies to make publications and their supporting data resulting from federally funded research publicly accessible without an embargo, with updated agency policies in place by December 31, 2025. This means far more scientific data will be freely available for researchers to work with.

2. Data Management Is No Longer an Afterthought.

Data management costs were previously not part of the direct costs in a project budget. This policy changes that for NIH grants. For NIH-funded projects, data management plans must be worked out before the grant is awarded, not as an afterthought once the data is generated. Researchers therefore have to anticipate their data storage needs, the data formats required by the repositories they plan to use, and so on. This saves time, brings more clarity to the whole research process, and helps ensure that the data generated is not lost in silos.

3. More Data Silos or Non-Reusable Data?

The policy encourages researchers to make more data publicly available but offers no clear mandate or guidebook on the tools, techniques, and formats to use. NIH's guidance does touch on several aspects of data management, such as:

- FAIR Principles
- Length of Time to Maintain Data
- Metadata and Other Associated Documentation
- Naming Conventions
- Common Data Elements
- Data Storage Format
- Data Security

Most of these points are only loosely defined and could therefore be interpreted and implemented differently by different teams.


For example, the guidance on naming conventions states: “Within a project team, agreement on naming conventions for multiple objects or files—or multiple versions of files—could be useful before embarking on a project that generates large amounts of data that need names or unique identifiers.”
Without a standard ontology or naming convention, teams could produce data that is not reusable, which works against implementing the FAIR data principles.
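To make “agreement on naming conventions” more concrete, here is a minimal Python sketch of how a team might automatically check file names against a convention it has agreed on. The pattern (project, sample, assay, version, extension) and every identifier in it are hypothetical, chosen purely for illustration; NIH does not prescribe any particular scheme, and a real team would substitute its own fields or a community standard.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical team convention (illustrative only, NOT from the NIH policy):
#   <PROJECT>_<SAMPLE>_<ASSAY>_v<VERSION>.<EXT>
#   e.g. PRJ01_S001_RNAseq_v2.fastq.gz
NAME_PATTERN = re.compile(
    r"^(?P<project>PRJ\d{2})_"                       # project code, e.g. PRJ01
    r"(?P<sample>S\d{3})_"                           # sample ID, e.g. S001
    r"(?P<assay>RNAseq|scRNAseq|proteomics)_"        # agreed assay vocabulary
    r"v(?P<version>\d+)"                             # file version number
    r"\.(?P<ext>[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*)$"   # extension(s), e.g. fastq.gz
)


def parse_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the named fields if filename follows the convention, else None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def check_names(filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Split filenames into (conforming, non_conforming) lists, preserving order."""
    conforming: List[str] = []
    non_conforming: List[str] = []
    for name in filenames:
        if parse_name(name) is not None:
            conforming.append(name)
        else:
            non_conforming.append(name)
    return conforming, non_conforming


if __name__ == "__main__":
    files = [
        "PRJ01_S001_RNAseq_v2.fastq.gz",
        "sample1_final_FINAL.csv",
        "PRJ01_S002_proteomics_v1.mzML",
    ]
    ok, bad = check_names(files)
    print("conforming:", ok)
    print("non-conforming:", bad)
```

Here the first and third files conform, while `sample1_final_FINAL.csv` is flagged. Because the fields are parsed out by name, the same check can also feed metadata (sample, assay, version) into a repository submission, which is where an agreed convention starts to pay off for reuse.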

Though this policy is a welcome step in the right direction, it appears to oversimplify the nuances of biological data management.

It does, however, recognize data management as part of a project's direct costs, which means funds are earmarked for the data curation process. This helps ensure that research data is curated to some extent and released into the public domain. Ideally, though, researchers would be required to publish all of their data (including negative results) on a public platform following stringent data standards, which would further improve data reuse. The policy should be revisited in a few years to assess how well it performs.

Afterthoughts and Value Addition

To comply with the DMS policy, researchers must formulate a data management plan, allocate a budget for it, and prepare and share their data publicly. As experts in biological big data management, we at Elucidata can add value in two ways:

  • We have worked closely with top-tier universities that want their data to be widely reused. The DMS policy requirements will further incentivize researchers to share their data publicly, and since data curation and management is now a vast subject area in its own right, we can advise scientists on maintaining the right data standards.
  • Our data-centric MLOps platform, Polly, hosts biological data, and our team has strong expertise in building consistent data pipelines, data curation, and data warehousing. Researchers can contact us directly to learn how we could support their team's compliance with the 2023 DMS policy.

Beyond our expertise in manual and automated curation, Polly also hosts the world’s largest collection of highly curated, ML-ready single-cell and bulk RNA-seq data, sourced primarily from GEO. Do talk to us if you want to explore collaboration possibilities.
