Big Data has taken the pharmaceutical industry by storm, owing to its potential to deliver lucrative therapeutics in shorter time frames. However, the rapid pace at which semi-structured biological data is generated from disparate sources continues to overwhelm pharmaceutical organizations. Drug discovery approaches that harness analysis-ready data, which is accurate, complete & easy to use, have therefore become imperative for R&D effectiveness. Such data also lets scientists redirect their focus towards generating actionable insights.
Keeping in mind these data-centric sentiments & needs of the industry, we’ve introduced a range of new features, updates & optimizations on Polly that ensure ML-ready data remains central to the platform.
OmixAtlas: The world’s largest repository of ML-ready Multi-omics data
Polly’s latest data offering, the OmixAtlas, provides access to a large volume of multi-omics data curated from multiple public sources in an analysis-ready format. Data from the atlas is enriched with a plethora of metadata annotations. These enable easy discovery of relevant datasets through a filter-based search function on Polly’s user interface & advanced code-based querying through Python libraries. Further, users can access this data in a computational environment of their choice.
Our first tissue-specific OmixAtlas, the ‘Liver OmixAtlas’, hosts the largest multi-omics collection of curated liver tissue-derived data from human, rat, and mouse. It provides access to 9 different types of data from 10 public sources in a curated format ready for downstream ML applications. In our webinar ‘Liver OmixAtlas – the world’s most comprehensive’, hosted by 20/15 Visioneers, we elaborate on how the Liver OmixAtlas helps accelerate therapeutic asset discovery for liver-associated diseases. Watch a recording of the session to learn more.
2X increase in ML-ready data on Polly
The number of ML-ready biomedical molecular datasets on Polly has doubled from 160,000 to 320,000 in just 3 months! This data now offers an even greater depth of information on 5000+ diseases and 33 million samples from 21 different public sources. Additionally, users can access the latest datasets from public sources like GEO with minimal delay. This is made possible through data connectors, which continually curate and pre-process data from public sources or your own proprietary data at high throughput.
Explore a diverse collection of biomedical molecular data on a single platform
Polly hosts over 10 different types of biomedical data, enabling users to perform integrative data analysis on one platform. The latest data types added include ELISA, ELISpot, antibody titers & flow cytometry, alongside data from notable repositories such as:
- PharmacoDB: An integrative database for mining in vitro anticancer drug screening studies
- ImmPort: An immunology database and analysis portal that archives clinical study & trial data generated by NIAID/DAIT-funded investigators
- Human Protein Atlas: A Sweden-based program that aims to map all human proteins in cells, tissues & organs through an integration of omics technologies like mass spectrometry-based proteomics, transcriptomics, and systems biology
- cBioPortal: An open resource for interactive exploration of cancer genomics data sets
Leverage data enriched with an evolving breadth & depth of metadata annotations
Our proprietary ML-powered curation models generate harmonized metadata annotations with scientific context at an accuracy matching that of human experts. The resulting curated data sets, therefore, are mapped according to their identifiers, normalized, and made analysis-ready for ML-based workflows in drug discovery.
Further, the enrichment of data on Polly with these metadata annotations is a continuous process. Over the last 3 months, 6.2 million metadata annotations were generated through our curation models, bringing the total number of auto-curated metadata labels to 15.5 million+ for the 320k datasets on Polly. These labels have been generated at the dataset, sample, and feature level, enabling a more streamlined approach to finding relevant data.
Take data on Polly to where you work
Users can now explore data on Polly or on their own infrastructure with unprecedented granularity through Polly Python Libraries. Polly libraries provide powerful SQL-based querying across dataset, sample, and feature level metadata, allowing in-depth data exploration. Further, they allow you to access data on your preferred computational infrastructure. You can also integrate this data through Polly libraries with analytical platforms like Amazon SageMaker and Databricks.
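To give a feel for what SQL querying across dataset and sample level metadata can look like, here is a minimal sketch. It does not use the Polly libraries: it relies on Python's built-in sqlite3 module as a stand-in, and the table names, column names, and rows (datasets, samples, DS1 and so on) are hypothetical and chosen purely for illustration.

```python
import sqlite3

# Hypothetical schema and rows for illustration only; these are NOT
# Polly's actual metadata tables or identifiers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (dataset_id TEXT PRIMARY KEY, disease TEXT,
                       organism TEXT, source TEXT);
CREATE TABLE samples  (sample_id TEXT PRIMARY KEY, dataset_id TEXT,
                       tissue TEXT);
INSERT INTO datasets VALUES
  ('DS1', 'Fatty Liver', 'Homo sapiens', 'GEO'),
  ('DS2', 'Hepatocellular Carcinoma', 'Mus musculus', 'GEO'),
  ('DS3', 'Breast Cancer', 'Homo sapiens', 'cBioPortal');
INSERT INTO samples VALUES
  ('S1', 'DS1', 'liver'), ('S2', 'DS1', 'liver'),
  ('S3', 'DS2', 'liver'), ('S4', 'DS3', 'breast');
""")


def sample_counts(connection, tissue, organism):
    """Count samples per dataset, filtering on sample-level (tissue)
    and dataset-level (organism) metadata in a single query."""
    query = """
        SELECT d.dataset_id, d.disease, COUNT(s.sample_id) AS n_samples
        FROM datasets AS d
        JOIN samples AS s ON s.dataset_id = d.dataset_id
        WHERE s.tissue = ? AND d.organism = ?
        GROUP BY d.dataset_id, d.disease
        ORDER BY d.dataset_id
    """
    return connection.execute(query, (tissue, organism)).fetchall()


rows = sample_counts(conn, "liver", "Homo sapiens")
for dataset_id, disease, n_samples in rows:
    print(dataset_id, disease, n_samples)
```

The join is the point of the sketch: one query can combine a sample-level filter with a dataset-level filter, which is the kind of granularity described above.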
Google Slides Integration on Polly
Analysis reports containing plots & insights generated from an experiment promote reproducibility of data and facilitate knowledge transfer across departments. To ensure users don’t spend a lot of time manually curating reports after each experiment, we have integrated Polly notebooks with Google Slides. These auto-generated decks can be stored in your workspace on Polly, viewed by select stakeholders based on your preferences and shared across your organization.
Your data is safer on Polly than ever before
Data security is one of the key tenets of Polly. We employ advanced infrastructure, security & authentication measures to protect our partners’ proprietary data present on the platform.
- Security enhancements
We have addressed security threats & vulnerabilities on Polly across various levels. These include common vulnerabilities and exposures to cybersecurity threats, network connection security through cipher suites, and protection against malicious attacks (clickjacking, code injection & cross-site scripting) through HTTP security headers. Consequently, in an external security audit conducted by SecurityScorecard, a leading cybersecurity risk rating platform, Polly was graded “A” with a score of 97.
Secure thinking is central to our infrastructure, with AWS as the cloud provider of choice for Polly. AWS provides a host of serverless services that ensure industry-leading data security, performance & 24/7 on-site monitoring. We’ve also introduced SSO and SSO group solutions to configure and grant AWS account holders within Elucidata the relevant permissions, enabling controlled access to the cloud infrastructure.
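For readers unfamiliar with HTTP security headers, the sketch below shows a typical set of standard response headers used against clickjacking, script injection, and related attacks. The header names are standard, but the values are common defaults chosen for illustration, and the helper function is hypothetical. This is not Polly's actual configuration.

```python
# Illustrative header set. The values are common defaults, assumed for
# this example, and do not describe Polly's real configuration.
SECURITY_HEADERS = {
    # Refuse to be rendered inside frames (clickjacking protection).
    "X-Frame-Options": "DENY",
    # Restrict where scripts and other resources may load from (limits
    # cross-site scripting and code injection); also forbids framing.
    "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
    # Stop browsers from guessing content types.
    "X-Content-Type-Options": "nosniff",
    # Tell browsers to use HTTPS only for the next year.
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}


def with_security_headers(response_headers):
    """Return a copy of the response headers with the security headers
    added, keeping any value the application has already set."""
    merged = dict(response_headers)
    for name, value in SECURITY_HEADERS.items():
        merged.setdefault(name, value)
    return merged


headers = with_security_headers({"Content-Type": "text/html"})
for name, value in headers.items():
    print(f"{name}: {value}")
```

Using setdefault means an endpoint that needs a looser or stricter policy can set its own header value without it being overwritten.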