Data Science & Machine Learning

A Code-first Company Aiding Pharma R&D. Worth Your While?

Deepthi Das
December 27, 2022

iOS recently began overtaking Android in US sales after adding many refinements and features. Still, Android's journey has been a great one, and it is interesting to pause and reflect on it. For the average user, iOS and Android are much the same. For an advanced user, however, Android offers far more flexibility, functionality, and freedom of choice. Many factors fuel the growth of one platform over another, but I can't help thinking about how important these three qualities are in developing a product that aids research!

Flexibility, Functionality, and Freedom of Choice.

This is exactly what a code-first approach offers. The ability to manipulate and analyze data in a versatile manner is a powerful capability that can tip the balance from data to insight. Pharma research is inherently data-rich and needs a variety of tools to turn that data into actionable insights.


Being code-first gives users flexibility in ingesting a variety of data types from various sources, improves functionality through highly complex querying, and offers the freedom to choose among programming languages like Python, R, or Bash to automate pipelines and integrate various analysis and visualization tools. We have seen it firsthand with our data-centric MLOps platform, Polly. Our data engineering is based on the IDEATE (Ingest, Discover, Enrich, Analyze, communicaTE) framework.

The Ability to Perform Complex Queries

You can run highly complex queries across structured data (1.5 million datasets) on the platform using Polly Python. You can apply filters at multiple levels of the data schema: the metadata level (cell line, tissue, organism, etc.), the dataset level (perturbation, sample size, etc.), and the feature level (gene expression). This is not the ‘X AND Y NOT Z’ kind of search that is carried out on unstructured data; it is an in-depth query that zeroes in on the most relevant datasets for your research. You can not only search for a specific mutation in a particular gene for a disease across repositories, but also go as far as setting the gene expression range within which you want to search.

An example of a complex query that retrieves mutation data for given diseases and genes from public repositories containing the mutation data type.
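The query itself is not reproduced in this text. As a rough stand-in, here is a minimal sketch of what a structured, multi-level filter of this kind can look like. It uses Python's built-in `sqlite3` module with an invented `datasets` table; the table, the column names, the sample rows, and the `find_mutation_datasets` helper are all hypothetical and are not the Polly Python API or the Polly schema.

```python
import sqlite3

# Hypothetical in-memory catalogue. Table, columns, and rows are invented
# for illustration only and do not reflect the real Polly schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE datasets (
           dataset_id TEXT, source TEXT, data_type TEXT,
           disease TEXT, gene TEXT, organism TEXT, sample_size INTEGER)"""
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("DS1", "repo_a", "Mutation", "Lung Neoplasms", "KRAS", "Homo sapiens", 120),
        ("DS2", "repo_a", "Transcriptomics", "Lung Neoplasms", "KRAS", "Homo sapiens", 45),
        ("DS3", "repo_b", "Mutation", "Breast Neoplasms", "TP53", "Homo sapiens", 300),
        ("DS4", "repo_b", "Mutation", "Lung Neoplasms", "EGFR", "Mus musculus", 18),
        ("DS5", "repo_c", "Mutation", "Colorectal Neoplasms", "KRAS", "Homo sapiens", 60),
    ],
)


def find_mutation_datasets(conn, diseases, genes, min_samples=0):
    """Return IDs of mutation datasets matching any given disease and gene."""
    # One "?" placeholder per value keeps the query parameterized.
    disease_marks = ",".join("?" * len(diseases))
    gene_marks = ",".join("?" * len(genes))
    sql = f"""
        SELECT dataset_id FROM datasets
        WHERE data_type = 'Mutation'
          AND disease IN ({disease_marks})
          AND gene IN ({gene_marks})
          AND sample_size >= ?
        ORDER BY dataset_id
    """
    rows = conn.execute(sql, [*diseases, *genes, min_samples]).fetchall()
    return [row[0] for row in rows]


hits = find_mutation_datasets(
    conn,
    diseases=["Lung Neoplasms", "Breast Neoplasms"],
    genes=["KRAS", "TP53", "EGFR"],
    min_samples=50,
)
print(hits)  # ['DS1', 'DS3']
```

The point of the sketch is that the data type, disease, gene, and sample-size conditions are applied to named fields in a single query, rather than as keywords matched against free text.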

The data thus obtained is in a machine-readable, structured format that lets the user readily analyze it further. You can build sub-queries to retrieve expression values from the data matrix. Queries and user journeys at this level of depth do not exist on a GUI platform.
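To make the idea of a sub-query on an expression range concrete, here is a small sketch in plain Python. The `expression` matrix, its values, and the `samples_in_range` helper are hypothetical and stand in for whatever structure the platform actually returns.

```python
# Hypothetical expression matrix: gene -> {sample_id: expression value}.
# The values are made up for illustration.
expression = {
    "KRAS": {"S1": 2.5, "S2": 7.1, "S3": 5.0},
    "TP53": {"S1": 9.3, "S2": 4.4, "S3": 6.8},
}


def samples_in_range(matrix, gene, low, high):
    """Return sorted sample IDs whose expression of `gene` lies in [low, high]."""
    values = matrix.get(gene, {})  # unknown genes yield no samples
    return sorted(s for s, v in values.items() if low <= v <= high)


print(samples_in_range(expression, "KRAS", 4.0, 8.0))  # ['S2', 'S3']
```

Because the result is an ordinary Python list, it can be passed straight into the next analysis step without any manual export.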

Ease of Automation/Integration

A code-first platform allows you to fetch relevant datasets and load them directly into an automation pipeline or Docker container, then analyze the data using tools integrated into the platform. Polly notebooks provide a language-independent architecture: the decoupling between the client and the kernel makes it possible to code in multiple programming languages. This supports efficient, reproducible interactive computing experiments. Notebooks are very easy to host on the server side, which is useful for security purposes, and they are highly customizable and easily shareable.


Languages like Python and R have taken over the data science world. A code-first platform supports analytical complexity by letting you use the latest algorithms to manipulate data in ways that were not possible earlier. It also enables the development of robust algorithms that can analyze large and complex datasets to identify novel drug targets and/or biomarkers. It facilitates collaboration within and across research teams, and significantly reduces the time to results and insights. To find out more, please check our GitHub page.

Happy coding!

This blog was originally published as part of our LinkedIn newsletter Polly Bits.
