TY - JOUR
T1 - Before and After
T2 - Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
AU - The Genomic Data Analysis Network
AU - Gao, Galen F.
AU - Parker, Joel S.
AU - Reynolds, Sheila M.
AU - Silva, Tiago C.
AU - Wang, Liang Bo
AU - Zhou, Wanding
AU - Akbani, R.
AU - Bailey, Matthew
AU - Balu, Saianand
AU - Berman, Benjamin P.
AU - Brooks, Denise
AU - Chen, Hu
AU - Cherniack, Andrew D.
AU - Demchok, John A.
AU - Ding, Li
AU - Felau, Ina
AU - Gaheen, Sharon
AU - Gerhard, Daniela S.
AU - Heiman, David I.
AU - Hernandez, Kyle M.
AU - Hoadley, Katherine A.
AU - Jayasinghe, R.
AU - Kemal, Anab
AU - Knijnenburg, Theo A.
AU - Laird, Peter W.
AU - Mensah, Michael K.A.
AU - Mungall, Andrew J.
AU - Robertson, A. Gordon
AU - Shen, Hui
AU - Tarnuzzer, Roy
AU - Wang, Zhining
AU - Wyczalkowski, Matthew
AU - Yang, Liming
AU - Zenklusen, Jean C.
AU - Zhang, Zhenyu
AU - Liang, Han
AU - Noble, Michael S.
N1 - Publisher Copyright: © 2019 The Authors
PY - 2019
Y1 - 2019/7/24
N2 - We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA)—mRNA and miRNA expression, single nucleotide variants, DNA methylation and copy number alterations—comprehensive sample, gene, and probe-level studies were performed, towards quantifying the degree of similarity between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve. Gao et al. performed a systematic analysis of the effects of synchronizing the large-scale, widely used, multi-omic dataset of The Cancer Genome Atlas to the current human reference genome. For each of the five molecular data platforms assessed, they demonstrated a very high concordance between the ‘legacy’ GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as ‘harmonized’ by the Genomic Data Commons.
KW - DNA methylation
KW - The Cancer Genome Atlas
KW - human reference genome
KW - mRNA expression
KW - microRNA expression
KW - quality control
KW - somatic copy number alteration
KW - somatic mutation
UR - http://www.scopus.com/inward/record.url?scp=85068937179&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2019.06.006
DO - 10.1016/j.cels.2019.06.006
M3 - Article
C2 - 31344359
AN - SCOPUS:85068937179
SN - 2405-4712
VL - 9
SP - 24-34.e10
JO - Cell Systems
JF - Cell Systems
IS - 1
ER -