With the advances in next-generation sequencing technologies over the past decade, genomics has gradually caught up with the big data giants (YouTube, Amazon, and Twitter, to name a few) in its data storage and computational requirements. By 2025, the storage requirements for human DNA sequences alone are projected to be 2-40 exabytes (1 exabyte is 10^18 bytes, or 10^9 gigabytes).
In this era of big data, several open access data initiatives such as TCGA, GEO, and ENCODE have made a large amount of omics data publicly available to the scientific community. Such datasets are extremely valuable for drug discovery because they allow data from many different sources to be integrated, giving a more robust and comprehensive understanding of diseases. While ample data is available, the real challenge for a scientist now lies in extracting meaningful insights from it. With the advent of 'omics', the bottleneck in science has rapidly shifted from data generation to data interpretation, and this is where Elucidata comes into the picture.
Elucidata enables bio-pharma companies to tap into these powerful datasets to answer their research questions. We provide them with the necessary bioinformatics expertise and data analysis support. For instance, if a company is interested in studying the mutational landscape of a particular set of genes across different cancer types, we can query those genes against the TCGA dataset. The Cancer Genome Atlas (TCGA) is an effort to characterize over 10,000 tumor samples across 33 different cancers using a range of profiling technologies. We analyze the TCGA dataset to identify previously unknown genomic alterations or driver genes responsible for the disease, which helps identify novel targets for therapy and drug discovery.
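To make the idea of querying a gene set across cancer types a little more concrete, here is a minimal sketch in Python. It is not our production pipeline, and it does not call any TCGA API. The records, sample IDs, and the `mutation_frequency` helper are all invented for illustration, and the sketch assumes the mutation calls have already been downloaded and flattened into (sample, cancer type, gene) rows.

```python
from collections import defaultdict

# Toy mutation calls as (sample_id, cancer_type, mutated_gene).
# These rows are invented for illustration; they are NOT real TCGA data.
MUTATIONS = [
    ("S1", "BRCA", "TP53"),
    ("S1", "BRCA", "PIK3CA"),
    ("S2", "BRCA", "PIK3CA"),
    ("S3", "LUAD", "TP53"),
    ("S3", "LUAD", "KRAS"),
    ("S4", "LUAD", "KRAS"),
    ("S4", "LUAD", "KRAS"),  # a second hit in the same sample is counted once
]

# All profiled samples per cancer type (also hypothetical), including
# samples that carry no mutation in the genes of interest.
SAMPLES = {"BRCA": {"S1", "S2", "S5"}, "LUAD": {"S3", "S4"}}


def mutation_frequency(mutations, samples_by_type, genes):
    """Return {cancer_type: {gene: fraction of samples with >= 1 mutation}}."""
    wanted = set(genes)
    mutated = defaultdict(set)  # (cancer_type, gene) -> set of mutated samples
    for sample_id, cancer_type, gene in mutations:
        if gene in wanted:
            mutated[(cancer_type, gene)].add(sample_id)
    return {
        cancer_type: {
            gene: len(mutated[(cancer_type, gene)]) / len(samples)
            for gene in genes
        }
        for cancer_type, samples in samples_by_type.items()
        if samples  # skip cancer types with no profiled samples
    }


freq = mutation_frequency(MUTATIONS, SAMPLES, ["TP53", "KRAS"])
for cancer_type, per_gene in freq.items():
    for gene, fraction in per_gene.items():
        print(f"{cancer_type}\t{gene}\t{fraction:.2f}")
```

Counting mutated samples rather than raw mutation calls keeps a heavily mutated sample from inflating a gene's frequency. A real analysis would also need to account for variant filtering and differences between sequencing platforms.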
The field of genomics has also seen an increase in the development of sophisticated bioinformatics tools for the analysis and interpretation of data. The majority of these tools are open source, which allows greater reproducibility in research and leads to better-maintained products. We, at Elucidata, embrace open source, which enables us to build innovative software products, contribute to the community, and work with cutting-edge technologies like CRISPR. Going ahead, we envision that such open source tools and pipelines could be hosted on Polly as independent applications, allowing scientists to customize their workflows. The ability to perform integrative multi-omics analyses under a single roof makes Polly a one-stop platform for drug discovery.