Biological multi-omics data hold tremendous potential for reuse and discovery. An enormous amount of data is being generated and made public by academic labs and organizations worldwide. However, this data is scattered across multiple sources and lacks standardization. It is "un-FAIR": the availability of data does not equate to its usability. Elucidata’s data warehouse, OmixAtlas, is a repository of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is a collection of millions of datasets from public, proprietary, and licensed sources that have been curated, harmonized, and made ready for downstream machine learning and analytical applications. It is one central location for accessing data spanning over 26 data types from over 30 public repositories and licensed sources.
All datasets on Polly go through a 2-step process:
Data schema: the data available within OmixAtlas is curated within defined indexes on the basis of the information it contains. These indexes are:
OmixAtlas provides access to thousands of tissue-derived or disease-specific multi-omics datasets from multiple sources in one place. The data can be accessed and analyzed on the same computational infrastructure.
The datasets on Polly can be accessed through the GUI or programmatically with Polly Python.
The Polly Python library provides convenient access to the functionalities mentioned below through Python functions.
The library allows access to data in OmixAtlas from any computational platform, such as SageMaker or Polly.
The details of datasets can also be easily visualized in the UI.
• When handling large volumes of data across different omics datasets, do you need to group samples from multiple OmixAtlases so that data from different datasets and repositories becomes easier to analyze?
Look no further! We’ve got you covered with Cohorting, a feature that lets you group datasets or samples on Polly based on metadata of interest. It enables you to study the difference between two cohorts, for example Diseased vs Normal or Cancerous vs Non-Cancerous cells.
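The idea of grouping samples by a metadata field can be pictured with a minimal sketch. It does not use the Polly Python API; the function `build_cohorts`, the field names `sample_id`, `dataset_id`, and `condition`, and the sample records are all hypothetical and only illustrate the concept.

```python
from collections import defaultdict

def build_cohorts(samples, field):
    """Group sample ids by the value of one metadata field."""
    cohorts = defaultdict(list)
    for sample in samples:
        # Samples that lack the field are kept visible under "unknown".
        label = sample.get(field, "unknown")
        cohorts[label].append(sample["sample_id"])
    return dict(cohorts)

# Hypothetical samples drawn from two different datasets.
samples = [
    {"sample_id": "S1", "dataset_id": "DS_A", "condition": "Diseased"},
    {"sample_id": "S2", "dataset_id": "DS_A", "condition": "Normal"},
    {"sample_id": "S3", "dataset_id": "DS_B", "condition": "Diseased"},
    {"sample_id": "S4", "dataset_id": "DS_B"},
]
cohorts = build_cohorts(samples, "condition")
print(cohorts)
```

Here samples from both datasets end up in the same "Diseased" cohort, which is the kind of cross-dataset grouping the feature is meant to make easy.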
• Missing out on datasets while querying just because your search term does not match the ontological term?
For instance, when querying datasets for the disease IBD, the ideal result set must include datasets annotated with the diseases ‘inflammatory bowel diseases’, ‘inflammatory bowel diseases, Crohn's disease’, and ‘inflammatory bowel diseases 8’. However, when the keyword is not expanded under the hood, the query returns fewer valid hits.
To overcome this, Polly has the ‘Ontology Recommendations’ functionality integrated into Polly-Python. This functionality aims to provide more valid hits with less user effort. The keyword is expanded implicitly, reducing manual intervention.
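As a rough illustration of keyword expansion, the sketch below expands a search term into related ontology terms before matching. It is not the Polly-Python implementation: the lookup table `ONTOLOGY_EXPANSIONS`, the functions `expand_keyword` and `search`, and the dataset records are hypothetical, and only the disease names come from the IBD example above.

```python
# Hypothetical synonym table; the disease names come from the IBD example above.
ONTOLOGY_EXPANSIONS = {
    "ibd": [
        "inflammatory bowel diseases",
        "inflammatory bowel diseases, Crohn's disease",
        "inflammatory bowel diseases 8",
    ],
}

def expand_keyword(keyword):
    """Return the keyword plus any related ontology terms, without duplicates."""
    terms = [keyword]
    for related in ONTOLOGY_EXPANSIONS.get(keyword.lower(), []):
        if related not in terms:
            terms.append(related)
    return terms

def search(datasets, keyword):
    """Return ids of datasets whose disease annotation matches any expanded term."""
    wanted = {term.lower() for term in expand_keyword(keyword)}
    return [ds["dataset_id"] for ds in datasets if ds["disease"].lower() in wanted]

# Hypothetical dataset annotations.
datasets = [
    {"dataset_id": "DS1", "disease": "inflammatory bowel diseases"},
    {"dataset_id": "DS2", "disease": "inflammatory bowel diseases 8"},
    {"dataset_id": "DS3", "disease": "obesity"},
]
hits = search(datasets, "IBD")
print(hits)
```

An exact match on "IBD" alone would find none of these datasets; with the expansion, the two IBD-annotated datasets are returned.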
For example, if the user tries to query the dataset for the disease ‘obesity’, the result set of ontological recommendations would also include the searches for the terms -
• With tons of data generated & published in public repositories every year, do you find it challenging to find the right resources to curate & harmonize it to your needs?
Our Curation app is the solution to all your curation woes. It helps you curate, standardize & harmonize all the clinical data that you’ve generated in a double-blinded manner and convert it into analysis-ready formats!
Along with standard metadata curation, we also offer custom metadata curation, in which users can curate a field of their choice, for instance cancer stage or BMI. The user can define the custom column header and the ontology to be used, if any.
• Visualization apps
Public OmixAtlas is a repository of more than 1.5 million datasets and 4.1 million samples aggregated from 32 publicly available sources. In addition, in-house data can be managed at scale with our Enterprise OmixAtlas, where proprietary data is standardized and curated. This significantly reduces the time spent on processing datasets.
Benefits of Public OmixAtlas:
Benefits of Enterprise OmixAtlas:
Contact us if you want to learn more about using our 1.5 million curated datasets to train your models, or about using our data-centric platform, Polly, to find and analyze relevant datasets.
Polly provides access to a curated repository of RNA-seq datasets that are consistently processed and enriched with metadata. This harmonization allows researchers to efficiently search for datasets with similar transcriptional profiles, facilitating transcriptome profiling and biomarker identification.
Polly utilizes signature reversal and multivariate gene expression signatures to predict potential drug combinations. By analyzing publicly available transcriptomics data and drug signatures, Polly can identify drugs or compounds that may have therapeutic effects by reversing disease signatures.
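The intuition behind signature reversal can be shown with a deliberately simplified sketch: a drug scores well when it moves genes in the opposite direction to the disease. This is a sign-based toy score, not Polly's actual scoring method, and it does not cover drug combinations. The function `reversal_score`, the gene names, the drug names, and all values are hypothetical.

```python
def reversal_score(disease, drug):
    """Score in [-1, 1]; +1 means the drug opposes every shared disease gene."""
    # Only genes present and non-zero in both signatures are compared.
    shared = [g for g in disease if g in drug and disease[g] != 0 and drug[g] != 0]
    if not shared:
        return 0.0
    opposed = sum(1 for g in shared if (disease[g] > 0) != (drug[g] > 0))
    return (2 * opposed - len(shared)) / len(shared)

# Hypothetical log fold-change signatures keyed by placeholder gene names.
disease_signature = {"GENE1": 2.1, "GENE2": -1.4, "GENE3": 0.9}
drug_signatures = {
    "drug_X": {"GENE1": -1.8, "GENE2": 1.1, "GENE3": -0.4},  # opposes all three genes
    "drug_Y": {"GENE1": 1.5, "GENE2": -0.9, "GENE3": 0.2},   # mimics the disease
    "drug_Z": {"GENE1": -0.5, "GENE2": -0.7},                # opposes one of two genes
}

scores = {name: reversal_score(disease_signature, sig)
          for name, sig in drug_signatures.items()}
# Highest score first: the strongest candidate for reversing the disease signature.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A real analysis would typically weight genes by effect size and assess significance; the sign-based score is used here only to keep the idea visible.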
Polly ranks similar datasets using cosine similarity scores, which measure how closely a dataset's transcriptional profile matches the query signature. This helps researchers quickly find relevant datasets for further analysis and validation.
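Ranking by cosine similarity can be sketched in a few lines. The sketch below assumes each dataset is summarized as a vector of values over the same ordered set of genes as the query signature. The function names, dataset ids, and numbers are hypothetical, and this is not Polly's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as having no similarity
    return dot / (norm_a * norm_b)

def rank_datasets(query, profiles):
    """Return (dataset_id, score) pairs sorted from most to least similar."""
    scores = [(ds_id, cosine_similarity(query, vec)) for ds_id, vec in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical log fold-change values over the same four genes.
query_signature = [2.0, -1.0, 0.5, 0.0]
dataset_profiles = {
    "dataset_A": [4.0, -2.0, 1.0, 0.0],   # same direction as the query
    "dataset_B": [-2.0, 1.0, -0.5, 0.0],  # opposite direction
    "dataset_C": [0.0, 0.0, 0.0, 3.0],    # signal only in a gene the query ignores
}
ranking = rank_datasets(query_signature, dataset_profiles)
print(ranking)
```

Because cosine similarity depends on direction rather than magnitude, dataset_A ranks first even though its values are twice as large as the query's, and the oppositely regulated dataset_B ranks last.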
Researchers define the biological process of interest, select a dataset, preprocess the data, identify differentially expressed genes, and validate the signature. Polly’s platform streamlines this process with expert support and ML-ready datasets.
Polly's RNA-Seq Atlas addresses challenges in extracting associated signatures from public databases by providing a curated resource of RNA-seq datasets collected from the Gene Expression Omnibus (GEO). This richly curated resource helps researchers to find datasets with similar transcriptional profiles to their gene sets of interest.
Gene signature comparison analyzes gene expression patterns to identify disease-related signatures. It helps researchers find drugs that can reverse disease signatures, aiding in therapeutic discoveries.