In our experience, researchers who want to use TCGA often end up on a long search for the right links. The TCGA website is a great resource, but some of those links are hard to find, so we put together this quick cheat sheet. If you are looking to download, cite, or just use TCGA as is, this post should be a helpful quick guide.
What is The Cancer Genome Atlas (TCGA)?
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that has molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. It is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) that brought together researchers from diverse disciplines and multiple institutions. TCGA has helped establish the importance of cancer genomics, transformed our understanding of cancer, and even begun to change how the disease is treated in the clinic. Its impact reaches further still, into health and science technologies, computational biology, and other research fields. TCGA has produced a rich data set of immeasurable value. This data remains available to the public in the Genomic Data Commons (GDC) web portal as a trusted reference that will be mined for many years. The GDC is a data-sharing platform that promotes precision medicine in oncology.
Data on TCGA
Over a period of 12 years starting from 2006, with contributions from over 11,000 patients, and incredible effort from thousands of researchers, TCGA has generated over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. The data types available in TCGA include:
· Clinical data: Clinical information, Biospecimen data, Pathology reports
· Copy number: SNP microarray, Copy number microarray, Low-Pass DNA Sequencing
· DNA: Whole-exome, Whole-genome, SNP microarray, Sequence traces
· Imaging: Diagnostic image, Tissue image, Radiological image
· Methylation: Bisulfite sequencing, Bead array
· Microsatellite Instability
· miRNA: miRNA Sequencing, Array-based data
· mRNA Expression: mRNA sequencing, Total RNA Sequencing, Microarray
· Protein Expression: Reverse-Phase Protein Array
TCGA’s selection criteria of cancers for study:
- Poor prognosis
- Overall public health impact
- Availability of samples meeting standards for patient consent
- Availability of samples meeting standards for quality and quantity that include:
  - Primary, untreated tumor with a source of matched normal tissue or blood sample
  - Frozen, sufficiently sized, resection samples
  - Samples composed of at least 80% tumor nuclei (threshold later lowered to 60% with improved sequencing technology and computational methods)
- With support from patients, patient advocacy groups, and doctors, many rare cancers were also included
Using Data on TCGA
How to Look for Data on TCGA?
Data collected and processed by the TCGA program can be accessed using the following links:
For clinical, molecular, and imaging data: Genomic Data Commons
For guidance in navigating various data types: Resources for TCGA users
How to Download Data from TCGA?
The data from the TCGA project can be accessed and downloaded using the following links on the GDC website.
For querying and downloading GDC data files: GDC Data Portal
For downloading large volumes of files: GDC Data Transfer Tool
For performing programmatic queries and downloads: GDC Application Programming Interface (API)
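As a rough illustration of what a programmatic query can look like, here is a minimal Python sketch that searches the GDC API for files belonging to one TCGA project. It relies on assumptions that are not part of this post: the `https://api.gdc.cancer.gov/files` endpoint, the field names `cases.project.project_id` and `data_type`, and the example values `TCGA-BRCA` and `Gene Expression Quantification` are taken from our reading of the GDC API documentation and may change, so check the current docs before relying on them. The function names `build_files_query` and `search_files` are hypothetical helpers written for this example.

```python
import json
import urllib.request

# Assumed endpoint of the GDC API file search; verify against the GDC docs.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_type, size=10):
    """Build the JSON payload for a file search (hypothetical helper)."""
    # Both conditions must hold: the file belongs to the given project
    # AND has the given data type.
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "data_type", "value": [data_type]},
            },
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,data_type",  # columns to return
        "format": "JSON",
        "size": size,  # maximum number of records to return
    }


def search_files(payload, timeout=30):
    """POST the payload to the files endpoint and return the list of hits."""
    request = urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the response nests matching records under data -> hits.
    return body["data"]["hits"]
```

Calling `search_files(build_files_query("TCGA-BRCA", "Gene Expression Quantification", size=5))` needs network access and should return a list of dictionaries, one per matching file. Only open-access files can be downloaded without authentication; controlled-access data requires an authorization token from the GDC.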
Authors who use TCGA data should acknowledge the TCGA Research Network in the acknowledgments section of their work, using the following format: “The results <published or shown> here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.”