Visualizing Bulk RNA-seq Data Using Phantasus

Jayashree
February 2, 2023

Bulk RNA-seq is a sequencing approach that measures gene expression averaged across a population of cells. It shows which RNAs are present in a sample, and in what quantity, at the time of measurement. It is the preferred technique for transcriptomic investigation of tissue slices, biopsies, or pooled cell populations.

Once bulk RNA-seq data has been processed, the essential step remains: exploring, visualizing, and interpreting the biology. Without a visualization and analysis tool, this step can be time-consuming and laborious.

In this blog post, we look at Phantasus, a web application for the visualization and analysis of gene expression datasets, and at how its integration with Polly simplifies bulk RNA-seq data analysis.

What Is Phantasus?

Phantasus is a user-friendly web application for interactive gene expression analysis. It covers the whole workflow in one place, from loading, normalizing, and filtering the data to performing differential gene expression and downstream analysis. To achieve this, Phantasus combines an intuitive heatmap interface with gene expression analysis tools. The tool supports R-based methods such as k-means clustering, principal component analysis, and differential expression analysis with the limma package.

What Is a Heatmap?

A heatmap is a graphical representation of data that uses color coding to represent different values. For example, assume there are 20,000 genes listed in rows, with the conditions listed in columns. Each gene has a certain value under every condition. If we simply list the numerical value of each gene for every condition, it is difficult to see how the genes differ. So instead of numbers, we use colors. Each color represents a range of numbers, so when the heatmap is plotted, we get an idea of how genes behave under different conditions.

In the case of Phantasus, the heatmap represents gene expression data under the different conditions and other parameters listed in the dataset.
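The color-coding idea described above can be sketched in a few lines of Python. This is only an illustration, not how Phantasus renders its heatmaps: the gene names, conditions, values, and the three-color palette are all invented, and the helper names `value_to_color` and `build_heatmap` are hypothetical.

```python
# Toy expression matrix: rows are genes, columns are conditions.
# All names and values are invented for illustration.
genes = ["GENE_A", "GENE_B", "GENE_C"]
conditions = ["control", "treated_6h", "treated_24h"]
values = [
    [2.0, 5.0, 9.5],
    [7.5, 7.0, 1.0],
    [4.0, 4.5, 5.0],
]

# Ordered palette from low to high; each entry covers one range of values.
PALETTE = ["blue", "white", "red"]


def value_to_color(value, vmin, vmax, palette=PALETTE):
    """Return the palette entry whose value range contains `value`."""
    if vmax == vmin:
        return palette[len(palette) // 2]
    fraction = (value - vmin) / (vmax - vmin)
    # The maximum value would index one past the end, so clamp it.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


def build_heatmap(matrix):
    """Replace every number in the matrix with its color name."""
    flat = [v for row in matrix for v in row]
    vmin, vmax = min(flat), max(flat)
    return [[value_to_color(v, vmin, vmax) for v in row] for row in matrix]


heatmap = build_heatmap(values)
print(" " * 8, " ".join(f"{c:11s}" for c in conditions))
for gene, row in zip(genes, heatmap):
    print(f"{gene:8s}", " ".join(f"{c:11s}" for c in row))
```

Even in this tiny grid, the pattern is easier to read as colors than as numbers: one gene rises across conditions, one falls, and one stays flat.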

How Does Phantasus Help in Data Analysis?

In simple terms, Phantasus is an application that takes gene expression data in GCT (Gene Cluster Text) file format as input and generates a heatmap of these genes with respect to the conditions and parameters listed in the metadata, such as cell type or cell line. Using this tool, we can easily analyze the expression data, see how genes differ from one another, and find the group of genes that matches our study interest. Various statistical and differential expression techniques are used to quantify the differences between genes.
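To make the GCT input more concrete, here is a minimal sketch of reading such a file in Python. It assumes the plain GCT 1.2 layout (a version line, a line with the row and column counts, a header line starting with Name and Description, then one tab-separated line per gene). Files that embed extra row and column annotations would need more handling. The function name `parse_gct` and the sample content are hypothetical, and this is not the loader Phantasus itself uses.

```python
import io

# Invented example content in GCT 1.2 layout (tab-separated).
SAMPLE_GCT = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tsample_1\tsample_2\tsample_3\n"
    "GENE_A\tinvented gene A\t10\t12\t300\n"
    "GENE_B\tinvented gene B\t0\t5\t7\n"
)


def parse_gct(handle):
    """Parse a GCT 1.2 stream into (sample_names, {gene_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")

    # Second line: number of data rows, then number of sample columns.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])

    # Third line: Name, Description, then one column per sample.
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the declared size")

    expression = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate trailing blank lines
        fields = line.rstrip("\n").split("\t")
        # fields[1] is the free-text description; the values follow it.
        expression[fields[0]] = [float(v) for v in fields[2:]]

    if len(expression) != n_rows:
        raise ValueError("row count does not match the declared size")
    return samples, expression


samples, expression = parse_gct(io.StringIO(SAMPLE_GCT))
print(samples)
print(expression["GENE_A"])
```

The resulting matrix of genes by samples is the kind of table a heatmap is drawn from.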

Key features of Phantasus include:

  • Loading public datasets from Gene Expression Omnibus (GEO), with both microarray and RNA-seq datasets supported.
  • Differential gene expression using limma or DESeq2.
  • Publication-ready plots with export to SVG: PCA plot, row profiles, box plots.
  • Clustering: k-means and hierarchical.
  • Gene set enrichment analysis.
  • Pathway enrichment analysis.

Phantasus on Polly

Polly, a data-centric MLOps platform, hosts OmixAtlas, a data warehouse with millions of datasets from public, proprietary, and licensed sources. Phantasus can be used directly from Polly OmixAtlas. Because the datasets on Polly are highly curated, the Phantasus app integrates seamlessly, and data can be analyzed readily without the need for preprocessing. Any dataset can be opened in the app on Polly, and a corresponding heatmap will appear.

The app loads the data, normalizes it, and filters outliers so that differential expression and other downstream analyses, such as PCA plots or pathway analysis, can be performed.

[Figure: Opening Phantasus on Polly]

[Figure: Visualizing a dataset on Polly-Phantasus]

The figure above shows the visualization of a dataset in the form of a heatmap on Phantasus.

  • On the heatmap, the rows correspond to genes (or microarray probes) and are annotated with gene symbol and gene ID. Columns correspond to samples.
  • Phantasus on Polly reads the data_type field from the dataset metadata in the atlas and checks whether it has the value ‘RAW COUNTS TRANSCRIPTOMICS’. When it does, the app applies the VST normalization method before loading the visualizations. The aim of normalization methods for large-scale expression data, including microarray and RNA-seq, is to eliminate systematic experimental bias and technical variation while preserving biological variation.
  • Variance stabilizing transformation (VST) aims to produce a matrix of values whose variance is approximately constant across the range of mean values, which matters especially for genes with low mean counts.
  • The Phantasus application uses the normalized data to draw all visualizations. It is integrated into Polly and can be used directly on OmixAtlases such as the GEO Raw Counts OmixAtlas and the Bulk RNASeq OmixAtlas.
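To illustrate what "variance is constant across the range of mean values" means, the sketch below uses a deliberately simple stand-in: the classical Anscombe transform, 2·sqrt(x + 3/8), applied to Poisson-distributed counts. This is an assumption made for illustration only and is not the VST that Phantasus on Polly applies. VST implementations for RNA-seq, such as the one in DESeq2, model overdispersed counts and are more involved. The helper names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing count k under a Poisson distribution."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def variance_under_poisson(transform, mean):
    """Exact variance of transform(count) when count ~ Poisson(mean)."""
    # Summing up to 20 standard deviations above the mean covers
    # essentially all of the probability mass.
    upper = int(mean + 20 * math.sqrt(mean) + 20)
    first_moment = second_moment = 0.0
    for k in range(upper + 1):
        p = poisson_pmf(k, mean)
        t = transform(k)
        first_moment += p * t
        second_moment += p * t * t
    return second_moment - first_moment ** 2


def identity(count):
    """Leave the raw count unchanged."""
    return float(count)


def anscombe(count):
    """Classical variance-stabilizing transform for Poisson counts."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


means = [10, 100, 1000]
raw_variances = [variance_under_poisson(identity, m) for m in means]
stabilized_variances = [variance_under_poisson(anscombe, m) for m in means]

for m, raw, stab in zip(means, raw_variances, stabilized_variances):
    print(f"mean={m:5d}  raw variance={raw:9.2f}  transformed variance={stab:.3f}")
```

For raw counts the variance grows with the mean, so highly expressed genes would dominate any variance-based visualization. After the transform, the variance stays close to 1 regardless of the mean, which is the property a VST is designed to provide.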

Polly hosts the world’s largest collection of highly curated, ML-ready bulk and single-cell RNA-seq data. Our curation pipelines, high-quality and accurately annotated data, standard workflows, and scientific expertise are used by industry and academia across the globe to accelerate drug discovery. Reach out to us to learn more about how to accelerate your research!
