The human body is a microcosm of diverse microbial communities that together form the microbiome, and no two individuals share the same microbiome. The role of the microbiome in regulating cellular function and disease has long been recognized and extensively studied.
With the rapid development of accurate and sensitive high-throughput sequencing technologies and assays, it has become easier to obtain whole-genome sequencing (WGS) data from microorganisms, making it possible to dissect the intricate structure, function, types, and location of microbial clusters in the human body. The data obtained are essential to accelerate fundamental discoveries that lead to transformative drug discovery and development solutions. Furthermore, the resulting taxonomic and functional analyses reveal a far richer connection between the human body and its microbial communities than previously realized.
Microbiome research has been conducted using a wide range of methodologies, from community surveys to analyses of microbiome genes, RNA, proteins, and metabolites. These analytical techniques are referred to as metagenomics, metatranscriptomics, metaproteomics, and metabolomics, respectively, and have been instrumental in elucidating the importance of the human microbiome in drug discovery and development.
Microbiome Data for Drug Discovery and Development Process
Recently, increased research effort on the human microbiome has shed light on its importance in the drug discovery process. The following are a few microbiome research areas that aid drug discovery and development:
- A deeper knowledge of the interactions between a drug and the gut microbiome can shape how new drugs are developed and prescribed. Once a metabolite is detected, drug development scientists can investigate the mechanism by which it is formed and whether its abundance makes it a cause for concern.
- The gut microbiome is becoming increasingly relevant for extended-release formulations and less soluble drugs that reach the large intestine.
- Studies of marketed drugs and their interactions with the human gut microbiota show that many medicines have antibiotic effects, even though they are not sold as antibiotics.
- Researchers are exploring multi-omics approaches that include a combination of untargeted community metagenomics, transcriptomics, metabolomics, proteomics, and immune signatures (human microbiomes) to mine data for biomarkers that may help drive new therapeutics for new indications.
- Access to rich, high-quality datasets can reduce costs in the drug discovery process by using genetic data to demonstrate the relevance of drug targets.
- Deep molecular profiling (including metabolome and proteome data), coupled with physiological measurements, may in some cases make it possible to skip animal testing and move straight to human trials.
Challenges Related to Microbiome Data
Findability and reusability of good-quality data are integral to any drug discovery R&D effort. Starting drug discovery with in-depth information is beneficial for finding microbiome-related therapies; researchers therefore need access to high-quality, relevant data to advance drug discovery and development.
One area in which such data are being collected is paired dietary and microbiome data from individuals. These data are used to derive models of how diet affects the composition of the microbiome, which are then validated with controlled dietary interventions. The microbiome consists of more than 5 million bacterial genes, representing a prolific reservoir of modifiable targets with potential therapeutic effects.
Unfortunately, much of the data being generated is either unusable in its current format or scattered across disparate sources, which makes it hard to discover and reuse effectively. Hence, there is a need to make microbiome data machine-actionable and FAIR (Findable, Accessible, Interoperable, and Reusable).
One project that aims to provide researchers with FAIR microbiome data is the Human Microbiome Project (HMP). The HMP collects metagenomic sequencing data from a large sample of human subjects to improve our understanding of the relationship between the microbiome and human health, and it has produced large amounts of microbial data.
In addition to the HMP, various societies and initiatives with stakeholders from different research fields are forming consortia that work towards making microbiome data FAIR. One of these is the GO FAIR initiative, which has set an industry standard for applying the FAIR principles to microbiome data.
The rest of this article covers the steps taken by the GO FAIR initiative towards making microbiome data FAIR, Elucidata’s FAIRification process, and our efforts towards hosting FAIR microbiome data on Polly.
FAIRification Process of Microbiome Data by The GO FAIR Initiative
GO FAIR is a stakeholder-driven and self-governed initiative that aims to implement the FAIR data principles across various kinds of data, including microbiome data. The organization offers an open and inclusive ecosystem in which individuals, institutions, and organizations work together through multiple groups, each aimed at FAIRifying a particular type of data.
The “FAIRification” process applies to metadata, data, and supporting infrastructures, such as search engines. Most of the requirements for findability and accessibility can be achieved at the metadata level. However, interoperability and reuse require more effort at the data level. GO FAIR FAIRifies microbiome data through the process depicted in Figure 1.
The FAIRification process adopted by GO FAIR starts with the retrieval of non-FAIR microbiome data from various public and proprietary sources. The retrieved data is then analyzed, since different data representations require different methods for identification and analysis. For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types (field names), cardinality, etc. Next, a semantic model is defined for the microbiome dataset. It describes the meaning of the entities and relations in the dataset accurately, unambiguously, and in a computer-actionable way, and applying it transforms the dataset into linkable data. Appropriate licenses are then assigned to the microbiome datasets, and the metadata is defined. Finally, the FAIR microbiome data resources are deployed so that they can be discovered and used.
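The "define a semantic model, then make the data linkable" step can be illustrated with a minimal Python sketch. It is not GO FAIR's actual tooling: the `example.org` namespace, the column names, and the predicate names are all hypothetical, and a real project would map columns to terms from established ontologies. The sketch only shows the idea of mapping relational columns to predicates and emitting subject-predicate-object statements.

```python
# Hypothetical namespace and column-to-predicate mapping (illustration only;
# this is not GO FAIR's semantic model).
BASE = "https://example.org/microbiome/"

SEMANTIC_MODEL = {
    "body_site": BASE + "vocab/bodySite",
    "taxon": BASE + "vocab/hasTaxon",
    "rel_abundance": BASE + "vocab/relativeAbundance",
}


def row_to_triples(row, id_column="sample_id"):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = BASE + "sample/" + str(row[id_column])
    triples = []
    for column, predicate in SEMANTIC_MODEL.items():
        value = row.get(column)
        if value is not None:  # skip missing values instead of emitting empty facts
            triples.append((subject, predicate, str(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


# Made-up rows standing in for a relational table of samples.
rows = [
    {"sample_id": "S1", "body_site": "gut", "taxon": "Bacteroides", "rel_abundance": 0.42},
    {"sample_id": "S2", "body_site": "oral", "taxon": "Streptococcus", "rel_abundance": None},
]
all_triples = [t for row in rows for t in row_to_triples(row)]
print(to_ntriples(all_triples))
```

Because every statement uses a resolvable identifier for its subject and predicate, records from different sources that share the same identifiers can be linked without custom join logic.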
FAIRification Process of Microbiome Data on Polly by Elucidata
At Elucidata, we host FAIR microbiome data from the Human Microbiome Project on our Polly platform and have developed proprietary NLP-based curation models to automate the FAIRification process end to end. HMP data on Polly come from two primary cohort types, “Healthy Cohort” and “Disease Cohorts”, and comprise three primary data types: 16S metagenomic sequences, whole metagenomic shotgun sequences, and reference microbial genomes. Information from HMP equips industry and research organizations with data covering Antibody Profiles, Microbial Pathways, Epigenomic Profiles, Cytokine Profiles, Metabolomics, and Metatranscriptomics, and can be used to accelerate advances in biomedical sciences and drug discovery.
We have adopted the FAIR principles to create a database of ML-ready microbiome data from HMP, which can be accessed on our platform Polly or integrated into software outside of Polly. Our FAIRification process adheres to the FAIR standards adopted by leading consortia, such as the GO FAIR initiative. Figure 2 depicts an overview of our FAIRification workflow:
Our FAIRification process begins with non-FAIR data being retrieved from various public and proprietary sources, cloud storage, publications, and omics processing software. The retrieved data is then audited, and a standard schema is defined for the datasets. Based on this schema, the metadata for these datasets is harmonized using controlled vocabularies. The datasets are also processed using state-of-the-art processing pipelines and made available for downstream ML applications. Finally, the process is completed by making the data available on Polly or integrating it into software outside of Polly.
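As a rough illustration of the audit and harmonization steps, the Python sketch below checks a metadata record against a schema and maps free-text values onto a controlled vocabulary. It is not Polly's curation model, which the article describes as proprietary and NLP-based: the required fields, the synonym tables, and the `harmonize` function are all hypothetical stand-ins chosen to keep the example small.

```python
# Hypothetical standard schema: fields every record is expected to carry.
REQUIRED_FIELDS = ("sample_id", "body_site", "cohort")

# Hypothetical controlled vocabularies: free-text synonym -> canonical term.
BODY_SITE_VOCAB = {
    "gut": "gastrointestinal tract",
    "stool": "gastrointestinal tract",
    "gi tract": "gastrointestinal tract",
    "mouth": "oral cavity",
    "oral": "oral cavity",
    "skin": "skin",
}
COHORT_VOCAB = {
    "healthy": "Healthy Cohort",
    "control": "Healthy Cohort",
    "disease": "Disease Cohort",
}


def harmonize(record):
    """Audit a record against the schema and map values to canonical terms.

    Returns (harmonized_record, issues); unmapped values are kept as-is and
    reported so that a curator can review them.
    """
    issues = [f"missing field: {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    harmonized = dict(record)
    for field, vocab in (("body_site", BODY_SITE_VOCAB), ("cohort", COHORT_VOCAB)):
        raw = record.get(field)
        if not raw:
            continue
        key = raw.strip().lower()  # normalize case and surrounding whitespace
        if key in vocab:
            harmonized[field] = vocab[key]
        else:
            issues.append(f"unmapped {field}: {raw!r}")
    return harmonized, issues


clean, problems = harmonize({"sample_id": "S1", "body_site": " Stool ", "cohort": "Control"})
print(clean, problems)
```

Returning the issues alongside the record, rather than raising an error, keeps the pipeline running over large batches while still flagging records that need manual curation.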
For more information on how you can access curated microbiome data, contact us at: firstname.lastname@example.org