RNA sequencing (RNA-seq) is a widely used method for investigating the transcriptome. Since its introduction, it has progressed significantly and become a leading approach to transcriptome profiling. RNA-seq data is now used across many areas of research and in the development of disease treatments. However, the findability, usability, quality, and reliability of this data have long been problems for researchers and data scientists.
Though this is a niche space, several platforms are being developed to make RNA-seq data more readily available, and they do so with varying degrees of efficiency. Here we lay out a comparison between two such platforms operating in a similar space.
In this blog, we compare and discuss the differences between Elucidata’s MLOps platform, Polly, and the online resource Recount3 as sources of uniformly processed and annotated RNA-seq data.
Polly is a data-centric MLOps platform that hosts FAIR (Findable, Accessible, Interoperable, and Reusable) multi-omics data from public and proprietary sources. Dedicated ETL pipelines called Connectors handle data ingestion and harmonization. Polly’s curation infrastructure is built on a specialized BERT model, PollyBERT, that assists with metadata annotation.
Recount3 is an online resource of uniformly processed RNA-seq data. It provides RNA-seq gene, exon, and exon-exon junction counts, as well as coverage bigWig files, for 8,679 human and 10,088 mouse studies. It is the third generation of the ReCount project and part of recount.bio. The raw sequencing data is processed with the Monorail system, which generates the coverage bigWig files and the recount-unified text files. Furthermore, snapcount enables query-based access to the recount3 and recount2 data.
Let us dive deeper into understanding how these platforms work with the help of a few examples.
1. Querying Efficiency
Querying at the GUI level for transcriptomics datasets on neurodegenerative diseases in humans.
Querying programmatically for Alzheimer's disease datasets with normal and patient samples.
2. How Easy Is It to Find Relevant Data?
3. How Easy Is It to Access the Data?
4. How Easy Is It to Integrate the Data with Other Data and Interoperate with Applications or Workflows for Analysis, Storage, and Processing?
Ratings out of 5
While both platforms are great sources of processed RNA-seq data, it helps to take a closer look at how each would serve particular users. Researchers and scientists need to keep up with emerging data without spending a lot of time finding the relevant datasets. It is thus preferable to have metadata backed by standard ontologies, which enables better search and findability. We hope this blog helps users make an informed choice between these platforms.
If you are spending time scouring datasets just to find the ones relevant to your downstream analysis, now is the time to reach out. Connect with us to learn more about how to accelerate your research.
Polly provides access to a curated repository of RNA-seq datasets that are consistently processed and enriched with metadata. This harmonization allows researchers to efficiently search for datasets with similar transcriptional profiles, facilitating transcriptome profiling and biomarker identification.
Polly utilizes signature reversal and multivariate gene expression signatures to predict potential drug combinations. By analyzing publicly available transcriptomics data and drug signatures, Polly can identify drugs or compounds that may have therapeutic effects by reversing disease signatures.
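To make the idea of signature reversal concrete, here is a minimal sketch in Python. It is not Polly's implementation: the gene names, compound names, fold-change values, and the `reversal_fraction` helper are all hypothetical, and the scoring rule (the fraction of shared genes whose direction of change a compound flips) is a deliberately simple stand-in for whatever multivariate scoring a production system would use. It also scores single compounds only, not combinations.

```python
def reversal_fraction(disease, drug):
    """Fraction of shared genes whose direction of change the drug flips.

    Both arguments map gene name -> signed change (e.g. log fold change).
    Genes with a zero value in either signature are ignored.
    """
    shared = [g for g in disease
              if g in drug and disease[g] != 0 and drug[g] != 0]
    if not shared:
        return 0.0
    # Opposite signs mean the drug pushes the gene against the disease direction.
    flipped = sum(1 for g in shared if disease[g] * drug[g] < 0)
    return flipped / len(shared)


# Hypothetical toy data: positive = up-regulated, negative = down-regulated.
disease_signature = {"GENE1": 2.1, "GENE2": -1.4, "GENE3": 0.9, "GENE4": -0.7}
drug_signatures = {
    "compound_X": {"GENE1": -1.8, "GENE2": 1.1, "GENE3": -0.5, "GENE4": 0.9},
    "compound_Y": {"GENE1": 1.5, "GENE2": -0.9, "GENE3": 0.4, "GENE4": -0.2},
    "compound_Z": {"GENE1": -0.6, "GENE2": -1.0, "GENE3": -0.3},
}

# Rank compounds from strongest to weakest reversal of the disease signature.
candidates = sorted(
    ((name, reversal_fraction(disease_signature, sig))
     for name, sig in drug_signatures.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

In this toy example, the compound that flips every gene in the disease signature ranks first, and the compound that mimics the disease signature ranks last.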
Polly ranks similar datasets using cosine similarity scores, which measure how closely a dataset's transcriptional profile matches the query signature. This helps researchers quickly find relevant datasets for further analysis and validation.
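As an illustration of ranking by cosine similarity, the sketch below scores a few made-up dataset profiles against a query signature and sorts them. The dataset names, values, and the `rank_datasets` helper are assumptions for demonstration only; how Polly builds and stores its profiles is not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_datasets(query, profiles):
    """Return (dataset_id, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, vec))
              for name, vec in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fold-change values measured over the same four genes.
query_signature = [2.0, -1.0, 0.5, 1.5]
dataset_profiles = {
    "dataset_A": [4.0, -2.0, 1.0, 3.0],    # same direction as the query
    "dataset_B": [-2.0, 1.0, -0.5, -1.5],  # opposite direction
    "dataset_C": [1.0, 1.0, 0.0, 0.0],     # only partly aligned
}

ranking = rank_datasets(query_signature, dataset_profiles)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

Because cosine similarity depends only on direction, a dataset whose profile is a scaled copy of the query scores close to 1, while one with the opposite pattern scores close to -1.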
Researchers define the biological process of interest, select a dataset, preprocess the data, identify differentially expressed genes, and validate the signature. Polly’s platform streamlines this process with expert support and ML-ready datasets.
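The step of identifying differentially expressed genes can be sketched roughly as follows. This is a simplified illustration, not the workflow Polly runs: it uses a plain log2 fold-change threshold on made-up counts, with a pseudocount to avoid dividing by zero, and skips the normalization and statistical testing that a real analysis would need. The function names, the threshold of 1.0, and the sample values are all assumptions.

```python
import math
from statistics import mean


def log2_fold_changes(case, control, pseudocount=1.0):
    """log2 ratio of mean case expression to mean control expression per gene."""
    return {
        gene: math.log2((mean(case[gene]) + pseudocount)
                        / (mean(control[gene]) + pseudocount))
        for gene in case
        if gene in control
    }


def derive_signature(case, control, threshold=1.0):
    """Split genes into up- and down-regulated sets by |log2FC| >= threshold."""
    lfc = log2_fold_changes(case, control)
    up = sorted(g for g, v in lfc.items() if v >= threshold)
    down = sorted(g for g, v in lfc.items() if v <= -threshold)
    return {"up": up, "down": down}


# Hypothetical expression values: three case and three control samples per gene.
case_samples = {"GENE1": [30, 34, 29], "GENE2": [3, 2, 4], "GENE3": [10, 11, 9]}
control_samples = {"GENE1": [7, 6, 8], "GENE2": [15, 14, 16], "GENE3": [10, 9, 11]}

signature = derive_signature(case_samples, control_samples)
print(signature)
```

A signature derived this way would still need validation on independent data, as the text above notes.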
Polly's RNA-Seq Atlas addresses challenges in extracting associated signatures from public databases by providing a curated resource of RNA-seq datasets collected from the Gene Expression Omnibus (GEO). This richly curated resource helps researchers to find datasets with similar transcriptional profiles to their gene sets of interest.
Gene signature comparison analyzes gene expression patterns to identify disease-related signatures. It helps researchers find drugs that can reverse disease signatures, aiding in therapeutic discoveries.