Analyzing raw metabolomics data is currently a complicated and time-consuming process. Rarely does one find a tool that fits every requirement. This often pushes people towards a complex, interwoven network of paid and open-source software in the hope of assembling a custom end-to-end process. However, the disadvantages of hacking together a solution (time spent finding and optimizing tools, lack of support and documentation, the effort required to troubleshoot, and poor reproducibility and reliability) greatly outweigh the benefits, if any.
To address this, we at Elucidata have created Polly, a vendor-agnostic, one-stop cloud platform for target discovery using multi-omics data. Polly enables you to store data and analyses and access them from anywhere, create custom plots and processing pipelines using a Jupyter Notebook, restore past analyses, generate customizable reports at the click of a button, and reach out to our experienced team of Application Scientists, Data Scientists, and Product Managers.
Among the applications on Polly, MetScape forms the backbone of the processing pipeline, allowing you to process both targeted and untargeted metabolomics data in minutes. The process starts with loading data (.mzXML or .mzML files) into El-MAVEN, a vendor-agnostic, open-source metabolomics peak annotation software. The annotated peaks are then pushed to MetScape, which provides different normalization, transformation, scaling, and differential expression algorithms. This is followed by a visualization dashboard with various quality check plots and pathway-level visualization.
The ability to process hundreds of GB of data in minutes leaves you free to spend more of your time on analysis, which rightfully deserves it.
How it works
The process of going from raw data to visualizations that provide basic insights into your data broadly consists of four steps, all of which can be performed in less than 15 minutes using El-MAVEN and MetScape, as shown below:
1. Annotate your peaks in El-MAVEN
Metabolites can be annotated in El-MAVEN in an automated manner, without the need for manual curation. El-MAVEN superimposes the EICs of a metabolite across all samples to give you a better understanding of its behavior across different cohorts. This information can be exported as a .csv file and uploaded to MetScape as is.
2. Jupyter Notebook to support customization
The Jupyter Notebook allows you to use built-in scripts for normalization and visualization or to write your own custom scripts. The box plot of the raw data makes it clear that normalization is required.
The plot shown above is formed after the raw data has been log-transformed using a pre-loaded script.
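To give a feel for what a log transformation does to intensity data, here is a minimal sketch in plain Python. It is not the pre-loaded script from Polly; the function name `log2_transform`, the pseudocount of 1, and the intensity values are all assumptions made for illustration.

```python
import math


def log2_transform(intensities, pseudocount=1.0):
    """Return log2(x + pseudocount) for each raw intensity.

    The pseudocount (an assumption of this sketch) keeps zero
    intensities from producing log2(0), which is undefined.
    """
    return [math.log2(x + pseudocount) for x in intensities]


# Hypothetical raw peak intensities for one sample (not real data).
raw = [0.0, 1023.0, 65535.0, 1048575.0]
transformed = log2_transform(raw)

print(transformed)  # [0.0, 10.0, 16.0, 20.0]
```

Values that originally spanned six orders of magnitude now sit between 0 and 20, which is why box plots of log-transformed data are much easier to compare across samples.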
3. Interactive plots enable you to remove outliers on the fly
The interactive PCA plot allows you to reject outliers if present. Samples can be removed from only the PCA plot using the option “Recalculate PCA” or rejected from all downstream analysis by clicking on “Remove from Dataset”.
The volcano plot gives a visual representation of the differentially expressed metabolites. Fold change, p-value and the current cohort comparison can be changed using the option box on the right-hand side.
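The logic behind a volcano plot can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not MetScape's implementation: the function `classify`, the default cutoffs (a 2-fold change and a p-value of 0.05), and the metabolite values are hypothetical, and the p-values are assumed to have been computed already by a statistical test.

```python
import math


def classify(mean_a, mean_b, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a metabolite as 'up', 'down', or 'not significant'.

    mean_a and mean_b are the mean intensities in the two cohorts
    being compared (both must be positive). The cutoffs are
    hypothetical defaults chosen for this sketch.
    """
    log2_fc = math.log2(mean_b / mean_a)
    passes_fc = abs(log2_fc) >= math.log2(fc_cutoff)
    passes_p = p_value <= p_cutoff
    if not (passes_fc and passes_p):
        return "not significant"
    return "up" if log2_fc > 0 else "down"


# Hypothetical (mean in cohort A, mean in cohort B, p-value) triples.
metabolites = {
    "metabolite_1": (100.0, 400.0, 0.01),
    "metabolite_2": (400.0, 100.0, 0.01),
    "metabolite_3": (100.0, 400.0, 0.20),
    "metabolite_4": (100.0, 150.0, 0.01),
}

for name, (mean_a, mean_b, p_value) in metabolites.items():
    print(name, classify(mean_a, mean_b, p_value))
```

Changing `fc_cutoff` or `p_cutoff` here plays the same role as adjusting the fold change and p-value settings in the option box: metabolites move in and out of the significant regions of the plot.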
4. The pathway dashboard
Once the differential expression is calculated, the differentially expressed metabolites are displayed on a pathway map. This makes it easier to interpret the data as compared to just a list of differentially expressed metabolites. The size of the nodes depends on the p-value and the color on the fold change as shown in the legend.
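One way such an encoding could work is sketched below in Python. The exact mapping used by the pathway dashboard is not described here, so everything in this example is an assumption: the function `node_style`, the use of -log10(p-value) for size, the size constants, and the color names are hypothetical.

```python
import math


def node_style(p_value, log2_fc, base_size=5.0, scale=5.0):
    """Map a metabolite's statistics to a hypothetical node style.

    Size grows with -log10(p_value), so more significant metabolites
    get larger nodes. Color reflects the direction of the fold change.
    """
    # Guard against p_value == 0, where log10 is undefined.
    safe_p = max(p_value, 1e-300)
    size = base_size + scale * -math.log10(safe_p)
    if log2_fc > 0:
        color = "red"    # higher in the comparison cohort
    elif log2_fc < 0:
        color = "blue"   # lower in the comparison cohort
    else:
        color = "grey"   # unchanged
    return {"size": size, "color": color}


# Hypothetical inputs, for illustration only.
print(node_style(p_value=0.001, log2_fc=2.0))
print(node_style(p_value=0.5, log2_fc=-0.3))
```

With a mapping of this kind, the most significant and most strongly changed metabolites stand out on the map at a glance.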
The Path to Success
MetScape is an end-to-end workflow that processes and analyzes unlabeled LC-MS data in conjunction with El-MAVEN. Its simple and intuitive user interface, capability to handle big data, ease of customization, and integration with Elucidata’s powerful pathway visualization tool make it an integral and indispensable tool for any group dealing with metabolomics data.