How to Perform Patient Stratification on Polly

Anurag Srivastava, Shruti Malavade
November 6, 2023

Patient stratification involves categorizing a patient population into subgroups based on characteristics such as disease status or genetic, molecular, and clinical features. This approach plays a crucial role in understanding the underlying pathology of a disease, enabling physicians to tailor therapeutic interventions to individual patients. Patient stratification is key to precision medicine and to the discovery of novel therapeutic targets. While AI models and multi-omics approaches have simplified patient stratification, significant challenges persist.

This blog discusses the major challenges in performing patient stratification and how users can address them with the Polly harmonization engine.

Challenges Faced by the Industry in Performing Patient Stratification

The ideal scenario for personalized medicine is one in which clinicians know in advance both a patient's risk classification and which drug to administer. In reality, patient stratification remains difficult even with the multi-omics datasets at our disposal. Hurdles include poor data quality, small sample sizes, and limited data availability.

More than 50% of datasets in public repositories such as the Gene Expression Omnibus (GEO) lack annotations, and just 2% are harmonized.

Nearly 80% of the available data are unstructured and unFAIR (not Findable, Accessible, Interoperable, and Reusable), which limits their usefulness. Poor data quality, unFAIR data, missing metadata, and small sample sizes can produce faulty predictive models and, in turn, suboptimal results.

One common strategy for patient stratification relies on cell type differentiation, which has proven effective in classifying autoimmune and cancer patients. However, implementing it at scale is difficult because of the data-related issues above. Another significant challenge is the lack of reliable biomarkers, as in pancreatic cancer. Disease heterogeneity adds a further layer of complexity, since primary tumor sites vary among patients. The key to overcoming these challenges lies in data quality and harmonization.

The solution to this is Polly by Elucidata. Polly's harmonization engine provides the means to enhance data quality, harmonize multi-modal datasets, and train patient classifiers.

Polly's capabilities include data harmonization, metadata annotation (providing essential information like tumor site), and seamless integration of various data types. Polly further aids in tackling quality-related issues by harmonizing multi-modal data into an ML-ready resource. This ensures that all data is clean, consistently processed, linked to critical metadata, and statistically robust.

Our Approach:

1. Curate a Disease-Specific Atlas

The first step in patient stratification using the cell type differentiation method at scale is to aggregate a large multi-omic data corpus that gives a comprehensive view of the disease. This corpus provides a holistic perspective, enhancing both model robustness and clinical relevance. To create this data warehouse, we use the Polly harmonization engine, which can build disease-specific atlases of ML-ready datasets. Researchers can merge multi-modal datasets from diverse sources and harmonize them to common standards. Integrating multiple omics types and samples in this way makes the downstream models more robust.
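The post does not describe how the harmonization engine works internally. As a rough illustration of what "harmonize to common standards" can mean, the minimal Python sketch below maps free-text cell type labels to a small controlled vocabulary and merges sample records on the genes shared by every dataset. The synonym table, the record layout, and the dataset names (GSE_A, GSE_B) are hypothetical, and expression values are assumed to be already normalized to a comparable scale.

```python
from typing import Dict, List

# Hypothetical controlled vocabulary; a production pipeline would map to a curated
# ontology (for example the Cell Ontology) instead of this hand-written table.
CANONICAL = {"hematopoietic stem cell", "monocyte"}
SYNONYMS: Dict[str, str] = {
    "hsc": "hematopoietic stem cell",
    "cd34+ hspc": "hematopoietic stem cell",
    "monocytes": "monocyte",
    "cd14+ monocyte": "monocyte",
}


def harmonize_label(raw: str) -> str:
    """Map a free-text label to a canonical term; flag unknown labels instead of guessing."""
    key = raw.strip().lower()
    if key in CANONICAL:
        return key
    return SYNONYMS.get(key, "unmapped:" + key)


def merge_datasets(datasets: List[dict]) -> dict:
    """Keep genes measured in every sample and attach harmonized labels plus provenance."""
    gene_sets = [set(s["expr"]) for ds in datasets for s in ds["samples"]]
    shared = sorted(set.intersection(*gene_sets))
    records = []
    for ds in datasets:
        for s in ds["samples"]:
            records.append({
                "sample_id": f'{ds["name"]}:{s["id"]}',  # prefix avoids ID collisions
                "cell_type": harmonize_label(s["cell_type"]),
                "expr": [s["expr"][g] for g in shared],
            })
    return {"genes": shared, "samples": records}


# Usage with two toy datasets that label the same cell types differently.
ds_a = {"name": "GSE_A", "samples": [
    {"id": "s1", "cell_type": "HSC", "expr": {"CD34": 9.1, "GATA1": 2.0, "MPO": 5.5}}]}
ds_b = {"name": "GSE_B", "samples": [
    {"id": "s1", "cell_type": "CD14+ Monocyte", "expr": {"CD34": 1.2, "MPO": 7.8, "CD14": 8.4}}]}
merged = merge_datasets([ds_a, ds_b])
print(merged["genes"])  # ['CD34', 'MPO']
```

Intersecting gene sets is the simplest choice; it discards any gene missing from even one dataset, which can be too aggressive when pooling targeted panels with whole-transcriptome data.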

2. Define Genetic Signatures

The next step is to define genetic signatures for each stage of cell differentiation using the harmonized data from the disease-specific atlas. We use the cell type annotations and gene rankings from each dataset to build a classifier model that assigns cell types by comparing the relative expression of gene pairs. Because pairwise comparisons alone cannot determine the cell differentiation stage, we apply additional modeling techniques. After defining the genetic markers for each cell type at each differentiation step, we acquire patient samples from public sources such as TCGA.
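The post does not name the model Polly uses. One well-known classifier built on within-sample gene rankings and pairwise comparisons is the k top-scoring pairs (k-TSP) method, sketched below in Python for two classes (for example, two differentiation stages). Each gene pair is scored by how much more often gene i is expressed below gene j in one class than in the other, and the top k pairs vote on a new sample. The toy data and class labels are hypothetical; a multi-class version would need, for example, one-vs-rest pair sets.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[float, int, int]  # (score, gene i, gene j)


def pair_score(X: Sequence[Sequence[float]], y: Sequence[int], i: int, j: int) -> float:
    """P(x_i < x_j | class 1) - P(x_i < x_j | class 0)."""
    def frac(label: int) -> float:
        rows = [x for x, lab in zip(X, y) if lab == label]
        return sum(r[i] < r[j] for r in rows) / len(rows)
    return frac(1) - frac(0)


def fit_ktsp(X, y, k: int = 3) -> List[Pair]:
    """Score every gene pair and keep the k pairs that best separate the classes."""
    n_genes = len(X[0])
    scored = [(pair_score(X, y, i, j), i, j) for i, j in combinations(range(n_genes), 2)]
    scored.sort(key=lambda t: abs(t[0]), reverse=True)
    return scored[:k]


def predict_ktsp(pairs: List[Pair], x: Sequence[float]) -> int:
    """Majority vote of the selected pairs; use an odd k to avoid ties (ties go to class 0)."""
    votes = 0
    for score, i, j in pairs:
        # A positive score means class 1 tends to have x_i < x_j.
        votes += 1 if (x[i] < x[j]) == (score > 0) else -1
    return 1 if votes > 0 else 0


# Toy data: 3 genes; class 0 = earlier stage, class 1 = later stage (hypothetical).
X = [[9.0, 2.0, 5.0], [8.5, 3.0, 5.2], [7.9, 2.5, 4.8],
     [2.0, 8.0, 5.1], [3.1, 7.5, 4.9], [2.5, 9.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
pairs = fit_ktsp(X, y, k=3)
print(predict_ktsp(pairs, [2.2, 8.8, 5.0]))  # 1
```

Because each rule only compares two genes within the same sample, the decisions are unchanged by any monotone per-sample rescaling, which is one reason rank-based rules are attractive for data pooled from many studies.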

3. Train Classifier Model

The classifier model is trained on harmonized datasets to categorize patients by their cell differentiation stage and to classify them into low- and high-risk groups. Differential expression analysis between the two patient cohorts then yields a list of differentially expressed genes, which serves as the foundation of a genetic signature for these patient populations. Users can then apply transcription factor enrichment analysis to refine these genetic signatures and identify potential drug targets.
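The post does not say which differential expression method is used; established tools include limma, DESeq2, and edgeR. To make the step concrete, the self-contained sketch below compares a high-risk and a low-risk cohort gene by gene with an exact two-sided permutation test on the difference in means, then applies Benjamini-Hochberg correction across genes. It assumes log2-scale expression (so the difference in means is a log2 fold change) and small cohorts, since it enumerates every relabeling; gene and sample names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List


def exact_permutation_p(a: List[float], b: List[float]) -> float:
    """Two-sided p-value: fraction of all relabelings with |mean diff| >= observed."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(ga) - mean(gb)) >= observed - 1e-12:  # tolerance for float error
            hits += 1
    return hits / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH-adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def differential_expression(expr: Dict[str, Dict[str, float]],
                            high: List[str], low: List[str]) -> Dict[str, dict]:
    """expr maps gene -> {sample_id: log2 expression}."""
    genes = list(expr)
    stats = []
    for g in genes:
        a = [expr[g][s] for s in high]
        b = [expr[g][s] for s in low]
        stats.append((mean(a) - mean(b), exact_permutation_p(a, b)))
    qvals = benjamini_hochberg([p for _, p in stats])
    return {g: {"log2fc": lfc, "p": p, "q": q}
            for g, (lfc, p), q in zip(genes, stats, qvals)}


high = ["H1", "H2", "H3", "H4"]
low = ["L1", "L2", "L3", "L4"]
expr = {
    "G1": dict(zip(high + low, [8.0, 9.0, 10.0, 11.0, 1.0, 2.0, 3.0, 4.0])),
    "G2": dict(zip(high + low, [5.0, 5.1, 4.9, 5.0, 5.0, 4.9, 5.1, 5.0])),
}
res = differential_expression(expr, high, low)
print(res["G1"]["log2fc"])  # 7.0
```

With 4 versus 4 samples there are only 70 relabelings, so the smallest attainable p-value is 2/70. The enumeration grows combinatorially with cohort size, so for realistic cohorts a random subset of permutations or a parametric model such as limma would be used instead.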

4. Target Prioritization

To turn patient stratification results into precise targets, the candidate genes must be prioritized further. Our experts collaborate with your team to prioritize drug targets based on druggability scores and supporting literature evidence.
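The exact scoring scheme is not described in the post, and in practice it is worked out with domain experts. Purely as an illustration, the sketch below ranks hypothetical candidate genes by a weighted sum of a druggability score (assumed to be pre-scaled to 0..1) and a log-scaled count of supporting publications. The weights, field names, and gene names are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    gene: str
    druggability: float   # assumed already scaled to 0..1 by an upstream tool
    n_publications: int   # hypothetical count of supporting papers


def prioritize(cands: List[Candidate], w_drug: float = 0.6,
               w_lit: float = 0.4) -> List[Tuple[float, str]]:
    """Return (score, gene) pairs sorted from highest to lowest priority."""
    max_pubs = max(c.n_publications for c in cands)

    def lit_score(c: Candidate) -> float:
        # Log scaling keeps a few heavily studied genes from dominating the ranking.
        if max_pubs == 0:
            return 0.0
        return math.log1p(c.n_publications) / math.log1p(max_pubs)

    scored = [(w_drug * c.druggability + w_lit * lit_score(c), c.gene) for c in cands]
    return sorted(scored, reverse=True)


ranked = prioritize([
    Candidate("GENE_A", druggability=0.9, n_publications=5),
    Candidate("GENE_B", druggability=0.4, n_publications=120),
    Candidate("GENE_C", druggability=0.2, n_publications=0),
])
print([gene for _, gene in ranked])  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Here a highly druggable but less-studied gene outranks a well-studied but less druggable one; shifting the weights toward literature would reverse that, which is exactly the kind of trade-off an expert review would settle.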

Case Study: Identifying Potential Drug Targets for AML

We used Polly OmixAtlas and patient stratification to identify potential drug targets for acute myeloid leukemia (AML). To do this, 10k+ multi-omics datasets related to AML and normal hematopoiesis were consolidated from public and proprietary sources. Polly cleaned the datasets and linked them to harmonized metadata, including stage of differentiation, cell line, cell type, and more. This curation enabled the integration of multiple datasets to create high-quality multi-omics signatures.

[Figure: The Polly Platform]
[Figure: Shortlisting genes that act as markers of differentiation]
[Figure: Training patient classifiers with harmonized AML datasets]

Results

  • 2+ data-centric patient stratification-based targets in AML were identified using an integrative multi-omics approach
  • 6 months to identify and validate targets with Polly, significantly faster than the average 2-year timeline


Polly For Enhanced Quality of Patient Stratification

By utilizing Polly's capabilities, researchers can streamline the multi-omics analysis process, from data retrieval to downstream analysis and interpretation. Polly can save time, provide expert guidance, and simplify the complex tasks involved in multi-omics analysis, ultimately improving the efficiency and accuracy of research.

Polly aims to empower researchers by augmenting their capabilities, accelerating the pace of discovery, and facilitating breakthroughs in various scientific fields.

FAQs

What are the key benefits of using Polly for gene target prioritization in patient stratification?

  • Data-Driven Target Selection: Polly integrates multi-omics data to identify key genes relevant to patient subgroups.
  • Accelerated Drug Discovery: The platform prioritizes targets based on disease associations and biomarker relevance, expediting the discovery and validation process.
  • Improved Reproducibility: Harmonized datasets ensure reliable and reproducible findings for target validation.

How does Polly help in training classifier models for patient stratification?

Polly provides pre-processed, harmonized datasets that enable AI/ML model training for patient classification. It supports feature selection, dimensionality reduction, and validation workflows to build robust predictive models for precision medicine applications.
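As a generic illustration of a validation workflow (not Polly's API), the Python sketch below performs stratified k-fold cross-validation: each fold keeps the class balance of the full cohort, and accuracy is measured on held-out samples. The nearest-centroid classifier and the toy data are placeholders; any fit/predict pair could be plugged in.

```python
import random
from collections import defaultdict
from typing import Callable, List, Sequence


def stratified_kfold(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)  # seeded shuffle so results are reproducible
        for n, idx in enumerate(members):
            folds[n % k].append(idx)
    return folds


def cross_val_accuracy(X, y, k: int, fit: Callable, predict: Callable) -> List[float]:
    """Train on k-1 folds, score accuracy on the held-out fold, repeat for every fold."""
    scores = []
    for test_idx in stratified_kfold(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_centroid(X, y):
    """Placeholder classifier: the mean feature vector of each class."""
    model = {}
    for lab in set(y):
        rows = [x for x, label in zip(X, y) if label == lab]
        model[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_centroid(model, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


X = [[1.0, 1.2], [0.8, 1.1], [1.1, 0.9], [0.9, 1.0], [1.2, 0.8],
     [5.0, 5.1], [4.9, 5.2], [5.2, 4.8], [5.1, 5.0], [4.8, 4.9]]
y = [0] * 5 + [1] * 5
print(cross_val_accuracy(X, y, 5, fit_centroid, predict_centroid))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Any preprocessing that learns from the data (feature selection, scaling, batch correction) should be fit inside each training fold; fitting it on the full cohort first leaks information into the held-out fold and inflates accuracy.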

How does Polly assist in defining genetic signatures for different stages of cell differentiation?

Polly analyzes both single-cell and bulk multi-omics data to identify stage-specific genetic markers. By applying machine learning algorithms to detect patterns in gene expression, Polly helps researchers map lineage differentiation and gain insights into disease progression.

What is the process of creating a disease-specific atlas using Polly’s harmonization engine?

Polly builds disease-specific atlases by:

  1. Aggregating multi-omics datasets from curated sources.
  2. Harmonizing data using standardized ontologies.
  3. Annotating datasets with clinical metadata.
  4. Structuring the information into disease-specific cohorts for targeted biomarker and therapeutic research.

How does Polly integrate multiple data types for more reliable patient stratification?

Polly integrates genomics, transcriptomics, proteomics, and clinical data into a unified, multi-dimensional view of patient populations. This helps researchers uncover complex biological relationships and enhances predictive modeling for patient subgroups.
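The post does not specify the integration strategy. One simple, widely used option is early integration: scale each feature within its own modality, keep only patients measured in every modality, and concatenate the features into one vector per patient. The Python sketch below assumes that layout; modality names, feature names, and patient IDs are hypothetical, and categorical clinical variables would need encoding before this step.

```python
from statistics import mean, pstdev
from typing import Dict

Table = Dict[str, Dict[str, float]]  # patient -> feature -> value


def zscore_columns(table: Table) -> Table:
    """Z-score every feature across patients so modalities share a common scale."""
    patients = sorted(table)
    features = sorted(table[patients[0]])
    out: Table = {p: {} for p in patients}
    for f in features:
        vals = [table[p][f] for p in patients]
        mu, sd = mean(vals), pstdev(vals)
        for p in patients:
            out[p][f] = (table[p][f] - mu) / sd if sd > 0 else 0.0
    return out


def integrate(modalities: Dict[str, Table]) -> Table:
    """Keep patients present in every modality; prefix features with their modality."""
    shared = sorted(set.intersection(*(set(t) for t in modalities.values())))
    merged: Table = {p: {} for p in shared}
    for name, table in modalities.items():
        scaled = zscore_columns({p: table[p] for p in shared})
        for p in shared:
            for f, v in scaled[p].items():
                merged[p][f"{name}:{f}"] = v
    return merged


rna = {"P1": {"GATA1": 2.0}, "P2": {"GATA1": 4.0}, "P3": {"GATA1": 6.0}}
prot = {"P1": {"MPO": 10.0}, "P2": {"MPO": 30.0}, "P4": {"MPO": 50.0}}
print(integrate({"rna": rna, "prot": prot}))
```

Per-modality scaling stops a modality with large raw values (here the protein abundances) from dominating distance-based models. Alternatives include late integration, where one model is trained per modality and their outputs are combined, and latent factor models that learn shared structure across modalities.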

Can Polly handle data quality issues and unstructured data from public repositories?

Yes, Polly automatically processes raw, unstructured data from public sources, addressing missing values, batch effects, and inconsistencies. Its machine learning–driven pipelines filter out noise and standardize data, ensuring higher-quality datasets for seamless analysis.

How does Polly harmonize multi-omic datasets to improve the quality of patient stratification?

Polly's harmonization engine normalizes, processes, and integrates diverse datasets using standard ontologies and metadata frameworks. This ensures consistency, removes batch effects, and enhances the reliability of downstream analyses for precise patient classification.
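The post does not describe the batch-correction method; ComBat (from the R sva package) is a common empirical Bayes choice. The much cruder sketch below only illustrates the idea of location adjustment: for each gene, subtract each batch's mean and add back the overall mean. It assumes biological groups are balanced across batches; if a batch contains only one patient group, this also removes real biological signal.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def center_by_batch(values: Sequence[float], batches: Sequence[str]) -> List[float]:
    """Remove per-batch mean shifts for one gene while keeping the overall mean."""
    grand = mean(values)
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_means = {b: mean(vs) for b, vs in groups.items()}
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


def center_matrix(expr: Dict[str, List[float]],
                  batches: Sequence[str]) -> Dict[str, List[float]]:
    """Apply the correction gene by gene; expr maps gene -> values in sample order."""
    return {gene: center_by_batch(vals, batches) for gene, vals in expr.items()}


batches = ["A", "A", "A", "B", "B", "B"]
expr = {"G1": [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]}  # batch B shifted up by 4
print(center_matrix(expr, batches))  # {'G1': [4.0, 5.0, 6.0, 4.0, 5.0, 6.0]}
```

Unlike ComBat, this adjusts only batch means, not batch-specific variances, and it does not borrow strength across genes, so it is best read as a conceptual illustration.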

How does Elucidata's Polly help in overcoming the challenges of patient stratification?

Polly streamlines patient stratification by:

  • Harmonizing and Integrating Multi-omics Data: Polly standardizes data across different sources, making it analysis-ready.
  • Curating High-quality Datasets: The platform ensures datasets are clean, structured, and well-annotated, thereby improving the reliability of downstream analyses.
  • Enabling AI-driven Insights: Polly applies machine learning models to uncover patterns and classify patients effectively.
  • Ensuring Reproducibility and Scalability: Automated pipelines and version-controlled workflows allow for efficient scaling to large datasets while maintaining detailed records of each analysis step, making it easier to reproduce or modify results.

What challenges do researchers face when performing patient stratification using multi-omics data?

Researchers encounter several challenges, including:

  • Data Heterogeneity: Multi-omics data come from different platforms, making integration complex.
  • Data Quality Issues: Public datasets often contain missing values, noise, or inconsistencies.
  • Computational Complexity: Large-scale multi-omics data require significant computational power and expertise to process.
  • Interpretability: Even with powerful analytical methods, extracting clear and meaningful biological insights from high-dimensional data remains a significant challenge.

What is patient stratification, and why is it important for precision medicine?

Patient stratification is the process of categorizing patients into subgroups based on genetic, molecular, or clinical characteristics. This approach is crucial for precision medicine because it identifies which patient populations are most likely to respond to specific treatments, thereby improving therapeutic outcomes and reducing the risk of adverse effects.

What are the key advantages of using Polly for transcriptome profiling and biomarker identification?

Polly provides access to a curated repository of RNA-seq datasets that are consistently processed and enriched with metadata. This harmonization allows researchers to efficiently search for datasets with similar transcriptional profiles, facilitating transcriptome profiling and biomarker identification.

What methodologies does Polly use to identify synergistic drug combinations?

Polly utilizes signature reversal and multivariate gene expression signatures to predict potential drug combinations. By analyzing publicly available transcriptomics data and drug signatures, Polly can identify drugs or compounds that may have therapeutic effects by reversing disease signatures.
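The post does not give Polly's scoring formula. A minimal version of signature reversal, shown below in Python, scores a drug by how much it raises the disease's down-regulated genes and lowers its up-regulated genes, using the drug's log fold changes. Published connectivity methods (for example, the Connectivity Map) use rank-based enrichment statistics instead. To hint at combinations, the sketch also sums two drug signatures under a purely additive assumption; that is a naive baseline, not a synergy measurement, which would require comparing measured combination data against a reference model such as Bliss independence. All gene and drug names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Signature = Dict[str, float]  # gene -> log2 fold change after treatment


def reversal_score(up: List[str], down: List[str], drug: Signature) -> float:
    """Positive when the drug lowers disease-up genes and raises disease-down genes."""
    up_hits = [drug[g] for g in up if g in drug]
    down_hits = [drug[g] for g in down if g in drug]
    if not up_hits or not down_hits:
        return 0.0  # not enough overlap to score
    return mean(down_hits) - mean(up_hits)


def combine_additive(sig_a: Signature, sig_b: Signature) -> Signature:
    """Naive combination: assume the two drugs' effects simply add."""
    genes = set(sig_a) | set(sig_b)
    return {g: sig_a.get(g, 0.0) + sig_b.get(g, 0.0) for g in genes}


disease_up, disease_down = ["G1", "G2"], ["G3", "G4"]
drug_x = {"G1": -1.0, "G2": -0.5, "G3": 0.5, "G4": 0.0}
drug_y = {"G1": 0.0, "G2": -1.0, "G3": 0.0, "G4": 1.0}
for name, sig in [("drug_x", drug_x), ("drug_y", drug_y),
                  ("drug_x + drug_y", combine_additive(drug_x, drug_y))]:
    print(name, reversal_score(disease_up, disease_down, sig))
```

In this example the additive combination scores exactly the sum of the single-drug scores, because the score is linear in the signature when the gene overlap is the same; any genuine synergy would have to show up as a departure from that baseline in measured data.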

How does Polly rank datasets similar to a gene signature query?

Polly ranks similar datasets using cosine similarity scores, which measure how closely a dataset's transcriptional profile matches the query signature. This helps researchers quickly find relevant datasets for further analysis and validation.
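Cosine similarity between a query vector q and a dataset profile d is (q · d) / (|q| |d|), ranging from -1 (opposite direction) to 1 (same direction). The Python sketch below computes it over the genes the query and each profile share and sorts datasets by score. Restricting the comparison to shared genes, and representing profiles as gene-to-score dictionaries, are assumptions of this sketch; Polly's exact implementation is not described here. Dataset names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # gene -> expression change (e.g. log2 fold change)


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity computed over the genes present in both profiles."""
    shared = sorted(set(a) & set(b))  # sorted for a deterministic summation order
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_datasets(query: Profile, datasets: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """Return (dataset name, similarity) pairs, most similar first."""
    scored = [(name, cosine(query, prof)) for name, prof in datasets.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


query = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
datasets = {
    "DS_similar": {"G1": 1.8, "G2": -0.9, "G3": 0.4, "G9": 3.0},
    "DS_opposite": {"G1": -2.0, "G2": 1.0, "G3": -0.5},
    "DS_unrelated": {"G1": 0.0, "G2": 0.1, "G3": 2.0},
}
for name, score in rank_datasets(query, datasets):
    print(f"{name}\t{score:.3f}")
```

Because cosine similarity ignores vector length, a dataset with a weaker but proportional response ranks as highly as one with a strong response, and a strongly negative score flags a profile that reverses the query.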

What steps are involved in creating a query gene signature on Polly?

Researchers define the biological process of interest, select a dataset, preprocess the data, identify differentially expressed genes, and validate the signature. Polly’s platform streamlines this process with expert support and ML-ready datasets.
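The "identify differentially expressed genes" step usually ends with a thresholding rule that turns a differential expression table into up- and down-regulated gene lists. The sketch below shows one such rule in Python; the cutoffs (q ≤ 0.05, |log2FC| ≥ 1, top 50 per direction) are common illustrative defaults rather than Polly settings, and the rows are hypothetical. Validation would then check that the signature separates the groups in independent data.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (gene, log2 fold change, BH-adjusted q-value)


def build_query_signature(rows: List[Row], q_max: float = 0.05,
                          min_abs_lfc: float = 1.0,
                          top_n: int = 50) -> Tuple[List[str], List[str]]:
    """Return (up_genes, down_genes), strongest changes first."""
    passing = [r for r in rows if r[2] <= q_max and abs(r[1]) >= min_abs_lfc]
    up = sorted((r for r in passing if r[1] > 0), key=lambda r: r[1], reverse=True)
    down = sorted((r for r in passing if r[1] < 0), key=lambda r: r[1])
    return [g for g, _, _ in up[:top_n]], [g for g, _, _ in down[:top_n]]


rows = [("G1", 3.2, 0.001), ("G2", 1.5, 0.01), ("G3", -2.4, 0.003),
        ("G4", 0.4, 0.0001),   # significant but small effect: excluded
        ("G5", -1.8, 0.2)]     # large effect but not significant: excluded
print(build_query_signature(rows))  # (['G1', 'G2'], ['G3'])
```

Requiring both a significance cutoff and an effect-size cutoff keeps the signature from filling up with tiny but statistically detectable changes, which matter little when the signature is later compared against other datasets.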

How does Polly's RNA-Seq Atlas simplify gene signature analysis?

Polly's RNA-Seq Atlas addresses the challenges of extracting associated signatures from public databases by providing a curated resource of RNA-seq datasets collected from the Gene Expression Omnibus (GEO). This richly curated resource helps researchers find datasets whose transcriptional profiles resemble their gene sets of interest.

What is gene signature comparison, and why is it important in drug discovery?

Gene signature comparison analyzes gene expression patterns to identify disease-related signatures. It helps researchers find drugs that can reverse disease signatures, aiding in therapeutic discoveries.