TY - JOUR
T1 - CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data
AU - Zhang, Qunyuan
AU - Ding, Li
AU - Larson, David E.
AU - Koboldt, Daniel C.
AU - McLellan, Michael D.
AU - Chen, Ken
AU - Shi, Xiaoqi
AU - Kraja, Aldi
AU - Mardis, Elaine R.
AU - Wilson, Richard K.
AU - Borecki, Ingrid B.
AU - Province, Michael A.
PY - 2009/12/23
Y1 - 2009/12/23
AB - Motivation: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in the same chromosomal region across multiple cancer samples and has greater implications for tumorigenesis. Commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may impose a heavy computational burden and may lose overall statistical power because each sample's data are segmented and discretized individually. We propose a population-based approach for RCNA detection that requires no single-sample analysis, is statistically powerful and computationally efficient, and is particularly suitable for high-resolution and large-population studies. Results: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. By using the raw intensity ratio data from all samples directly and adopting a diagonal transformation strategy, CMDS substantially reduces the computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of two-step approaches based on single-sample CNA calling. We applied CMDS to two real datasets, one of lung cancer from the Affymetrix array platform and one of brain cancer from the Illumina array platform, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes. Availability: The R and C programs implementing our method are available at https://dsgweb.wustl.edu/qunyuan/software/cmds. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=77949501694&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btp708
DO - 10.1093/bioinformatics/btp708
M3 - Article
C2 - 20031968
AN - SCOPUS:77949501694
SN - 1367-4803
VL - 26
SP - 464
EP - 469
JO - Bioinformatics
JF - Bioinformatics
IS - 4
ER -