Quality control (QC) is a critical preliminary stage in single-cell RNA-seq (scRNA-Seq) data analysis, serving two primary objectives:
In this solution brief, we discuss the prevalent metrics and techniques employed for QC and filtering of cell barcodes in single-cell RNA-seq data on our biomedical data curation platform, Polly. These methods determine which cell barcodes are included in downstream analysis and can therefore affect clustering outcomes and visualization.
In scRNA-seq, various techniques exhibit differences in transcript length and sequence coverage. Some methods, such as Smart-seq and Quartz-seq, capture complete transcript sequences, while others, like Drop-seq (3’-end only), STRT-seq (5’-end only), and Chromium (3’-end only), focus on partial sequences. These techniques collectively form a pipeline that transforms limited-scale input into high-dimensional output, shedding light on cellular mechanisms and trajectory dynamics.
This analysis follows a structured workflow, divided into two main sections: pre-processing and downstream analysis. Common quality control filters are the gatekeepers of data integrity, ensuring that the information derived from complex datasets remains accurate and reliable.
Before embarking on the data filtering process for single-cell RNA-seq data, two essential steps should be undertaken:
In the analysis of single-cell data, the adoption of common metrics and filtering methods is pivotal. Below, we explore these practices in brief, providing insights into their rationale and potential caveats where applicable.
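To make the idea of per-barcode metrics concrete, here is a minimal sketch of three metrics widely used in scRNA-seq QC: total counts per barcode, number of detected genes, and the percentage of counts from mitochondrial genes. It is not Polly's implementation. The function names, the "MT-" gene-name prefix (a human gene symbol convention), and the default thresholds are assumptions chosen for illustration, and real cutoffs should be tuned to the dataset.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Compute per-barcode QC metrics from a (cells x genes) count matrix.

    mito_prefix is an assumption: human mitochondrial gene symbols
    conventionally start with "MT-".
    """
    counts = np.asarray(counts)
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])

    total_counts = counts.sum(axis=1)            # library size per barcode
    n_genes = (counts > 0).sum(axis=1)           # genes with at least one count
    mito_counts = counts[:, is_mito].sum(axis=1)

    # Percentage of mitochondrial counts; empty barcodes get 0 instead of NaN.
    pct_mito = np.divide(
        100.0 * mito_counts,
        total_counts,
        out=np.zeros(len(total_counts)),
        where=total_counts > 0,
    )
    return total_counts, n_genes, pct_mito

def filter_barcodes(counts, gene_names,
                    min_counts=500, min_genes=200, max_pct_mito=20.0):
    """Return a boolean mask of barcodes passing all three filters.

    The default thresholds are illustrative placeholders, not recommendations.
    """
    total, n_genes, pct_mito = qc_metrics(counts, gene_names)
    return (total >= min_counts) & (n_genes >= min_genes) & (pct_mito <= max_pct_mito)
```

The returned mask can be used to subset the matrix, for example `counts[filter_barcodes(counts, gene_names)]`. Inspecting the distribution of each metric before fixing thresholds is usually safer than applying fixed cutoffs blindly.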
Quality control is critical in scRNA-seq data analysis, ensuring that only high-quality cells and genes are used for downstream analysis. By implementing common QC filters and considering the unique characteristics of your dataset, you can enhance the reliability and biological relevance of your scRNA-seq results, leading to more accurate insights into cellular heterogeneity and gene expression patterns.
Polly is a transformative asset in elevating the quality of data. It excels in curating multi-omics and assay data, rendering them ML-ready and analysis-ready. This process is driven by a Polly-verified curation engine, overseen by skilled experts who harmonize a wide spectrum of data types, enrich metadata, and ensure consistent data processing while maintaining affordability. The ML-ready data is securely stored on cloud-based Atlas data stores, optimized for efficient analysis and data management.
Polly's state-of-the-art technology caters to approximately 26 diverse R&D data types, meeting the requirements of teams involved in pre-clinical drug discovery and diagnostics R&D. It's the trusted choice for over 25 research organizations, including four of the ten largest pharmaceutical companies, who leverage Polly and its associated solutions to expedite their discovery programs. Numerous other data-driven healthcare enterprises rely on Polly-verified processes to harmonize and securely store public and proprietary biomedical data. In a nutshell, Polly, with its user-friendly interface and advanced capabilities, ensures high-quality scRNA-seq data.
Polly provides access to a curated repository of RNA-seq datasets that are consistently processed and enriched with metadata. This harmonization allows researchers to efficiently search for datasets with similar transcriptional profiles, facilitating transcriptome profiling and biomarker identification.
Polly utilizes signature reversal and multivariate gene expression signatures to predict potential drug combinations. By analyzing publicly available transcriptomics data and drug signatures, Polly can identify drugs or compounds that may have therapeutic effects by reversing disease signatures.
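The idea behind signature reversal can be sketched as follows: a compound is a candidate if its expression signature is anti-correlated with the disease signature over the same genes. This is a simplified illustration, not Polly's method. The function names are hypothetical, signatures are assumed to be log fold-change vectors over a shared gene order, and Pearson correlation stands in for whatever scoring a production pipeline uses.

```python
import numpy as np

def reversal_scores(disease_sig, drug_sigs):
    """Pearson correlation between a disease signature and each drug signature.

    disease_sig: sequence of log fold-changes (assumed input format).
    drug_sigs: dict mapping drug name -> signature over the same genes,
               in the same order.
    """
    d = np.asarray(disease_sig, dtype=float)
    scores = {}
    for name, sig in drug_sigs.items():
        s = np.asarray(sig, dtype=float)
        scores[name] = float(np.corrcoef(d, s)[0, 1])
    return scores

def rank_reversers(disease_sig, drug_sigs):
    """Sort drugs from most negative correlation (strongest reversal) upward."""
    scores = reversal_scores(disease_sig, drug_sigs)
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Drugs at the top of the ranking have the most negative scores and are the strongest reversal candidates under this simplified scoring. A strongly positive score suggests the compound mimics the disease signature instead.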
Polly ranks similar datasets using cosine similarity scores, which measure how closely a dataset's transcriptional profile matches the query signature. This helps researchers quickly find relevant datasets for further analysis and validation.
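As a rough illustration of ranking by cosine similarity, the sketch below scores each dataset profile against a query signature and sorts the results. The function names and the dictionary input format are assumptions for this example, not Polly's API. All vectors are assumed to cover the same genes in the same order.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_datasets(query, profiles):
    """Rank datasets by cosine similarity to the query, highest first.

    profiles: dict mapping dataset name -> expression profile vector
              (hypothetical input format).
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Cosine similarity depends only on the direction of the vectors, not their magnitude, so a dataset with the same pattern of up- and down-regulation scores highly even when its effect sizes differ in scale from the query.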
Researchers define the biological process of interest, select a dataset, preprocess the data, identify differentially expressed genes, and validate the signature. Polly’s platform streamlines this process with expert support and ML-ready datasets.
Polly's RNA-Seq Atlas addresses challenges in extracting associated signatures from public databases by providing a curated resource of RNA-seq datasets collected from the Gene Expression Omnibus (GEO). This richly curated resource helps researchers to find datasets with similar transcriptional profiles to their gene sets of interest.
Gene signature comparison analyzes gene expression patterns to identify disease-related signatures. It helps researchers find drugs that can reverse disease signatures, aiding in therapeutic discoveries.