Cancer Research in the Age of AI: The Next Frontier

In 2021, the U.S. Food and Drug Administration authorized the marketing of Paige Prostate, the first AI-powered pathology solution to aid in the detection and diagnosis of prostate cancer. Dr. Tim Stenzel, Director of the Office of In Vitro Diagnostics and Radiological Health at the FDA, remarked that such advancements could "increase the number of identified prostate biopsy samples with cancerous tissue, ultimately saving lives."

Over the last decade, artificial intelligence (AI) has made huge progress in transforming the healthcare landscape. Its impact is especially profound in cancer detection, where early diagnosis can mean the difference between life and death. The discourse around AI in healthcare frequently revolves around binary questions: Will AI replace doctors? Will it revolutionize medicine? These debates, while engaging, often miss the point. The critical question is how we can harness AI’s capabilities effectively to achieve meaningful impact. This shift in perspective invites a more sophisticated understanding of AI’s role, one that recognizes its potential as a powerful tool to augment human expertise rather than replace it. Today’s algorithms can analyze large amounts of data to uncover new relationships, develop and test ideas, and reveal biological pathways and processes. Given enough training data, AI engines can also propose novel molecular structures, for use alone or in combination, as candidate treatments for disease.

https://journals.sagepub.com/doi/full/10.1177/17455065211018111

In this blog post, we explore how AI is reshaping cancer research and the vital role of high-quality data in unlocking its full potential, drawing on the perspective of Elucidata, a leader in biomedical data management.

Current and Emerging Applications of AI in Cancer Research

With its ability to analyze large, complex datasets, AI enables researchers to extract meaningful patterns from high-dimensional data such as genomics, proteomics, and imaging. This is particularly critical in cancer R&D, where understanding intricate molecular interactions and patient heterogeneity is key to developing effective therapies. Let’s explore some of the key applications of AI in oncology, spanning cancer biology, screening, drug discovery, precision treatment, surveillance, and accessibility.

1. Improving the Accuracy and Efficiency of Cancer Screening, Detection, and Diagnosis

AI is transforming cancer detection worldwide, enhancing the speed, accuracy, and reliability of screening methods. The US FDA has approved AI tools to assist pathologists in identifying cancerous areas in prostate biopsy images. At the Mayo Clinic, Rochester, US, researchers have developed AI algorithms to improve breast cancer detection through mammogram analysis and predict long-term invasive cancer risks. NCI researchers have advanced deep learning tools to automate the detection of precancerous cervical lesions, facilitating earlier diagnoses. These innovations are just a few examples of the global advancements in AI-driven cancer diagnostics.

Ongoing and Potential Integrations of AI in Cancer Screening, Detection, Diagnosis and Treatment (Source)

2. Accelerating Cancer Drug Discovery

AI is transforming the drug discovery landscape through innovative approaches to drug design, repurposing, and predicting treatment responses. For example, in 2022, the group of Grégoire Altan-Bonnet, Ph.D., at the Laboratory of Integrative Cancer Immunology, US, in collaboration with Paul François’ group at McGill University, Canada, developed a model to analyze patterns in T-cell activation data, with the goal of improving immunotherapy outcomes. Predictive AI models are also mapping drug response pathways, providing insights into biological mechanisms. These advancements can reduce the time and cost of drug development, bringing treatments to patients faster.

3. Improving Cancer Surveillance

AI is improving cancer surveillance by streamlining data collection and analysis. Projects like MOSSAIC (Modeling Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer) are accelerating the submission of cancer data to the Surveillance, Epidemiology, and End Results (SEER) program. Automated algorithms extract tumor features from clinical texts, saving significant manual effort. AI is also being applied to risk prediction: researchers at Harvard Medical School and the University of Copenhagen, in partnership with VA Boston Healthcare System, Dana-Farber Cancer Institute, and the Harvard T.H. Chan School of Public Health, have developed an AI tool that identified individuals at the highest risk for pancreatic cancer up to three years before diagnosis using only the patients’ medical records.

4. Facilitating Precision Cancer Treatment

Precision oncology, which tailors treatments based on tumor characteristics, greatly benefits from AI’s ability to analyze large datasets. DeepGlioma technology analyzes tumor samples during surgery to detect genetic mutations with over 90% accuracy. Another innovation, FastGlioma, can determine within 10 seconds whether any removable portion of a cancerous brain tumor remains. According to the research team led by the University of Michigan and the University of California, San Francisco, FastGlioma significantly outperformed conventional methods for identifying residual tumor tissue during surgery. Additionally, AI models can predict survival outcomes for breast cancer patients using digital pathology images. By integrating histopathology and molecular data, AI is enhancing clinical decision-making and improving patient outcomes.

5. Expanding Access to Cancer Care

AI has the potential to bridge health disparities by delivering high-quality care to underserved populations. Scientists at Google Research and Google DeepMind (2024) developed the Articulate Medical Intelligence Explorer (AMIE), a research AI system aimed at enhancing diagnostic reasoning and conversational capabilities in clinical settings. AMIE was trained in a simulated dialogue environment with automated feedback and equipped with a chain-of-reasoning strategy to boost diagnostic accuracy. In tests involving complex medical cases, AMIE outperformed both unassisted clinicians and other assistive tools. Though still experimental, AMIE shows how AI could support clinical decision-making and help provide expert-level care in under-resourced settings.

6. Advancing Fundamental Knowledge of Cancer Biology

AI is being leveraged to deepen our understanding of cancer’s underlying mechanisms. For instance, large language models are enabling scientists to extract valuable insights from the vast scientific literature. A collaboration between the National Cancer Institute (NCI) and the Department of Energy is using AI to simulate the atomic behavior of the RAS protein, which is encoded by one of the most frequently mutated gene families in cancer. This work aims to uncover new strategies for targeting mutations in the RAS gene, which could lead to novel treatments.

The applications of AI in cancer research and treatment are expanding rapidly. This growth mirrors broader market trends, with the market for AI in cancer diagnosis projected to grow from USD 175.3 million in 2023 to an estimated USD 1,943.6 million by 2033 (Market.US, 2024). These numbers highlight the transformative potential of AI in reshaping oncology and improving patient outcomes.
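To put the market projection above in perspective, the two endpoint figures imply a compound annual growth rate (CAGR) of roughly 27% per year. The short Python sketch below shows the arithmetic. It assumes a 10-year span (2023 to 2033) and smooth compounding, which is a simplification; the function name is ours, not the report's.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in USD millions.
market_2023 = 175.3
market_2033 = 1943.6

# Assumption: growth compounds evenly over the 10 years from 2023 to 2033.
implied_rate = cagr(market_2023, market_2033, years=10)
print(f"Implied CAGR: {implied_rate:.1%}")
```

In other words, the projection corresponds to the market growing about elevenfold over the decade.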

Data: The Foundation of AI in Cancer R&D

The effectiveness of AI hinges on the availability and quality of data. As Dr. Fei-Fei Li, a pioneer in AI and co-director of the Stanford Institute for Human-Centered Artificial Intelligence, once stated, "The foundation of AI is data. Without data, there is no AI."

The Importance of Data Quality

AI models rely on vast datasets to identify patterns, predict outcomes, and generate insights. However, the old adage “garbage in, garbage out” holds especially true in this domain. Poor-quality data can lead to biased models, inaccurate predictions, and ultimately, flawed scientific conclusions.

Consider a scenario in drug discovery where an AI model trained on incomplete or inconsistent genomic datasets predicts the wrong target molecule. Such errors could derail months of research and lead to significant financial losses. To mitigate these risks, it’s essential to adopt stringent quality control measures, such as:

  • Data Curation: Cleaning and standardizing datasets to remove errors and inconsistencies.
  • Metadata Enrichment: Adding context to raw data to ensure interpretability.
  • Validation Pipelines: Implementing rigorous testing protocols to verify data accuracy.
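As a rough illustration of how these three measures fit together, here is a minimal Python sketch. It is not Polly's implementation or any specific pipeline: the field names (sample_id, tissue, expression), the synonym table, and the TPM unit are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical controlled vocabulary: free-text tissue labels -> one standard term.
TISSUE_SYNONYMS = {
    "breast": "breast",
    "mammary gland": "breast",
    "prostate": "prostate",
    "prostate gland": "prostate",
}
REQUIRED_FIELDS = ("sample_id", "tissue", "expression")


@dataclass
class QCReport:
    clean: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def curate(record: dict) -> dict:
    """Data curation: normalize the tissue label to a standard term."""
    curated = dict(record)
    raw = str(curated.get("tissue", "")).strip().lower()
    curated["tissue"] = TISSUE_SYNONYMS.get(raw)  # None if the label is unknown
    return curated


def enrich(record: dict, source: str) -> dict:
    """Metadata enrichment: attach provenance and units (TPM is an assumption)."""
    return {**record, "source": source, "expression_unit": "TPM"}


def validate(record: dict) -> Optional[str]:
    """Validation: return a rejection reason, or None if the record passes."""
    for name in REQUIRED_FIELDS:
        if record.get(name) is None:
            return f"missing or unrecognized field: {name}"
    value = record["expression"]
    if not isinstance(value, (int, float)) or value < 0:
        return "expression must be a non-negative number"
    return None


def run_qc(records: list, source: str) -> QCReport:
    """Apply curation, enrichment, and validation, and reject duplicate IDs."""
    report = QCReport()
    seen_ids = set()
    for raw in records:
        record = enrich(curate(raw), source)
        reason = validate(record)
        if reason is None and record["sample_id"] in seen_ids:
            reason = "duplicate sample_id"
        if reason is None:
            seen_ids.add(record["sample_id"])
            report.clean.append(record)
        else:
            report.rejected.append((record, reason))
    return report


samples = [
    {"sample_id": "S1", "tissue": " Mammary Gland ", "expression": 12.5},
    {"sample_id": "S2", "tissue": "prostate", "expression": -3.0},
    {"sample_id": "S3", "tissue": "unknown organ", "expression": 4.1},
    {"sample_id": "S1", "tissue": "breast", "expression": 9.9},
]
report = run_qc(samples, source="hypothetical_study_A")
for record, reason in report.rejected:
    print(record["sample_id"], "->", reason)
```

Keeping the rejection reason next to each rejected record, rather than silently dropping it, makes it possible for a human expert to review what the automated checks flagged.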

"A robust dataset is the cornerstone of any meaningful AI-driven discovery." - Dr. Daphne Koller, co-founder of Insitro

Elucidata’s platform, Polly, excels in addressing data quality challenges. By leveraging automated data curation and annotation pipelines with thorough QC by human experts, Elucidata ensures that datasets are both clean and standardized. Polly’s ability to enrich metadata with domain-specific context further enhances interpretability, while built-in validation tools maintain accuracy, enabling researchers to trust their data at every step.

Tailoring Data for Unique Research Needs

Cancer R&D encompasses a wide range of objectives, from identifying biomarkers to designing personalized therapies. Each objective has unique data requirements.

AI models must be trained on datasets specifically tailored to the research goal. This highlights the importance of data annotation and labeling, as well as the integration of domain expertise to ensure the relevance of data to specific research questions. Elucidata’s focus on contextualizing data for specific research objectives ensures that datasets are fit-for-purpose. Polly’s ability to integrate domain knowledge with advanced annotation capabilities allows researchers to tailor datasets precisely to their needs. Whether for biomarker discovery or drug repurposing, Elucidata ensures that the right data is always at the researcher’s fingertips.

Integration: Breaking Down Data Silos

The fragmented nature of biomedical data presents another challenge. Data is often siloed across institutions, platforms, and formats, limiting its utility for AI applications. Effective integration of diverse data types—such as clinical records, imaging datasets, and genomic sequences—is crucial for generating holistic insights. As Dr. Atul Butte, Director of the UCSF Institute for Computational Health Sciences, puts it, "The real magic happens when you integrate data from different sources. That’s when AI starts to provide actionable insights."

Modern data integration platforms, such as the one developed by Elucidata, are designed to address this fragmentation. Elucidata’s proprietary data model encompasses popular frameworks like OMOP, integrating their core features while offering 2x more attributes for EHR data. This enriched structure enables seamless exploration and deeper patient-centric insights within a unified schema.
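To make the idea of a unified schema concrete, the following Python sketch merges records from three sources into one patient-centric view. It is a generic illustration only and does not represent Elucidata’s data model or OMOP; every field name and mapping in it is hypothetical.

```python
from collections import defaultdict

# Hypothetical mappings from source-specific field names to one shared schema.
FIELD_MAPS = {
    "clinical": {"pid": "patient_id", "dx": "diagnosis", "age_yrs": "age"},
    "genomic": {"PatientID": "patient_id", "gene": "mutated_gene"},
    "imaging": {"subject": "patient_id", "modality": "imaging_modality"},
}


def to_unified(source: str, record: dict) -> dict:
    """Rename known fields to the shared schema and drop unmapped ones."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def integrate(sources: dict) -> dict:
    """Group records from all sources under a normalized patient ID."""
    patients = defaultdict(lambda: {"sources": []})
    for source, records in sources.items():
        for record in records:
            unified = to_unified(source, record)
            raw_id = unified.pop("patient_id", None)
            if raw_id is None:
                continue  # cannot link a record that has no patient ID
            entry = patients[str(raw_id).strip().upper()]
            if source not in entry["sources"]:
                entry["sources"].append(source)
            for key, value in unified.items():
                # Attributes are stored as lists so repeated values are kept.
                entry.setdefault(key, []).append(value)
    return dict(patients)


silos = {
    "clinical": [{"pid": "p001", "dx": "glioma", "age_yrs": 54}],
    "genomic": [
        {"PatientID": "P001", "gene": "IDH1"},
        {"PatientID": "P001", "gene": "TP53"},
    ],
    "imaging": [{"subject": " P001", "modality": "MRI", "scanner": "vendor-x"}],
}
patient_view = integrate(silos)
print(patient_view["P001"])
```

Real integrations also have to reconcile conflicting values, units, and identifiers that do not match after simple normalization, which this sketch deliberately leaves out.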

Regulatory Considerations

In cancer R&D, clinical data often involves sensitive patient information, making regulatory compliance a critical concern. Researchers must navigate frameworks such as HIPAA, GDPR, and the FDA’s guidance on AI/ML in medical devices. Compliance not only ensures ethical data usage but also builds trust with stakeholders.

Key strategies for regulatory adherence include:

  • Data Anonymization: Removing personally identifiable information while retaining data utility.
  • Audit Trails: Maintaining detailed records of data usage and access.
  • Continuous Monitoring: Regularly updating practices to align with evolving regulations.
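The first two strategies can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a compliant de-identification procedure: the identifier list and field names are hypothetical, and replacing an ID with a keyed hash is pseudonymization, which on its own may not satisfy HIPAA or GDPR requirements for anonymized data.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical set of direct identifiers to strip from each record.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with a keyed hash."""
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    kept = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    return {"patient_token": digest[:16], **kept}


class AuditTrail:
    """Append-only, in-memory log of who accessed which record and when."""

    def __init__(self) -> None:
        self.entries = []

    def log(self, user: str, action: str, patient_token: str) -> None:
        self.entries.append(
            {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "action": action,
                "patient_token": patient_token,
            }
        )


# Demo key only; a real key would live in a secrets manager, not in code.
demo_key = b"demo-key-for-illustration-only"
raw_record = {
    "patient_id": "MRN-1042",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "diagnosis": "pancreatic cancer",
    "age": 61,
}
safe_record = pseudonymize(raw_record, demo_key)

trail = AuditTrail()
trail.log(user="analyst_1", action="read", patient_token=safe_record["patient_token"])
print(safe_record)
```

Because the same key always yields the same token, records for one patient can still be linked across datasets without exposing the original ID. Remaining fields such as age can still act as quasi-identifiers and may need to be generalized depending on the applicable regulation.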

Elucidata places a strong emphasis on safeguarding sensitive data and intellectual property by employing rigorous security measures and adhering to top-tier industry standards. Its security-focused engineering services help users address the challenges of health data security with confidence, including access control, advanced encryption, and continuous monitoring.

A Data-Centric Approach to Model Accuracy

Traditional AI development has often been model-centric, focusing on optimizing algorithms. However, a paradigm shift toward data-centric AI is gaining traction. In this approach, the emphasis is on improving the quality and diversity of data to enhance model performance. As Dr. Andrew Ng, co-founder of Coursera and founder of DeepLearning.AI, argues, "Improving data quality can often yield better results than tweaking models." This philosophy is exemplified by Elucidata, a leader in data-centric innovation and the winner of the NCI CRDC AI Data Readiness Challenge (2024).

Elucidata's advanced data harmonization engine ensures that terabytes of data are curated to the highest standards. Datasets are harmonized with ontology-backed metadata, processed through scientifically validated pipelines, and subjected to nearly 50 rigorous QC checks to achieve gold-standard quality.

As AI-driven approaches revolutionize R&D, from drug discovery to clinical trials, platforms like Polly provide the foundation for success by ensuring data is clean, curated, and contextualized. Whether it’s helping companies accelerate tumor biomarker discovery, identify cancer-specific differentiation targets with a high probability of success, or build multi-modal models of cellular interactions and tumor microenvironments, Elucidata equips researchers with the tools they need to unlock insights and accelerate breakthroughs.

By addressing the complexities of biomedical data management, Elucidata helps ensure that the transformative potential of AI is fully realized, driving advancements in cancer care and reshaping the future of oncology.

Who knows? With innovations like these, the vision of an AI-powered assistant akin to Baymax from Big Hero 6 may not be far off.

Connect with us today to fast-track your data-driven AI breakthroughs in cancer R&D!
