Gemma: A Treasure Trove for Genomic Data Gee(q)s

Since the turn of the century, new methods and platforms, combined with improved computational infrastructure and reduced costs, have made it easier for labs to adopt high-throughput technologies. Consequently, massive amounts of genomics data are being generated. These data are not limited to answering a single research question; they hold immense potential for reuse. Several public repositories promote data reuse, each dedicated to a specific kind of omics data or a specific disease.

However, data abundance is a measure of quantity, not quality. Public repositories often lack specific guidelines that must be followed when depositing data. As a result, a great deal of time has to be spent cleaning and pre-processing the data to bring them into a usable form. One way to accelerate the discovery of novel insights from legacy data is to curate the data.

A Curated Database of Genomics Datasets

Gemma was established in 2012 by the Pavlidis lab at the University of British Columbia. It is a database containing approximately 10,000 genomics datasets that have been curated to enable meta-analysis. The data are sourced from multiple public repositories, primarily the Gene Expression Omnibus (GEO). Gemma hosts microarray as well as RNA-sequencing data and supports numerous platforms, such as Affymetrix, Illumina and nucleotide arrays. An important feature of Gemma is its data annotation, which is done by both manual and automated means and makes the data easier to use. Apart from serving as a database, Gemma also hosts web-based tools for data exploration and discovery.

Geeq-ing Out Over Data Quality

A key difference between public repositories and Gemma is the emphasis on the quality of the data being hosted. Once expression data have been imported from the public source, curators add the corresponding annotations and sample metadata. Array and experimental designs are also subjected to quality checks. Array designs undergo sequence analysis and corresponding gene assignment. Experimental designs are closely examined to ensure that datasets meet certain criteria, such as a minimum number of samples and minimal outliers or missing data. Gemma datasets are then put through a quality assessment known as Geeq, which takes into consideration the quality and suitability of the data.
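To make the criteria above more concrete, here is a minimal Python sketch of the kind of screen a curator might run on an expression matrix: it checks the sample count, the fraction of missing values, and flags samples that correlate poorly with the rest. This is not Gemma's Geeq algorithm. The function names, the thresholds (min_samples, max_missing, corr_margin) and the toy data are all assumptions made for illustration.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def screen_dataset(expr, min_samples=4, max_missing=0.1, corr_margin=0.3):
    """Hypothetical quality screen; all thresholds are illustrative assumptions.

    expr maps a sample name to its list of expression values (None = missing).
    """
    samples = list(expr)
    values = [v for s in samples for v in expr[s]]
    missing = sum(v is None for v in values) / len(values) if values else 1.0

    # For each sample, take its median correlation with every other sample.
    typical = {}
    for s in samples:
        others = [pearson(expr[s], expr[t]) for t in samples if t != s]
        typical[s] = median(others) if others else 0.0

    # Flag samples that fall well below the dataset-wide median correlation.
    cutoff = median(typical.values()) - corr_margin if typical else 0.0
    outliers = [s for s in samples if typical[s] < cutoff]

    return {
        "enough_samples": len(samples) >= min_samples,
        "missing_fraction": missing,
        "missing_ok": missing <= max_missing,
        "outliers": outliers,
    }


# Toy data: four similar samples, one missing value, and one inverted sample.
expr = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.1, 2.9, 4.2, 5.1],
    "s3": [0.9, 1.8, 3.2, 3.9, None],
    "s4": [1.2, 2.2, 3.1, 4.1, 4.9],
    "s5": [5.0, 4.0, 3.0, 2.0, 1.0],
}
report = screen_dataset(expr)
print(report)
```

In this sketch a flagged sample is only reported, not removed, which mirrors the idea that a curator reviews the result before deciding whether a dataset is suitable.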

Enabling Data Reuse

Gemma aims to streamline and ease the process of genomics data reuse by going the extra mile to ensure that all the data added to the platform are curated and subjected to rigorous quality checks. By eliminating the time spent on making the data usable, Gemma helps scientists focus on the more important task of using data to answer research questions and push the boundaries of science as we know it.
