TY - JOUR
T1 - A global approach to identify differentially expressed genes in cDNA (two-color) microarray experiments
AU - Zhou, Yiyong
AU - Cras-Méneur, Corentin
AU - Ohsugi, Mitsuru
AU - Stormo, Gary D.
AU - Permutt, M. Alan
N1 - Funding Information:
This work is supported in part by National Institutes of Health Grants DK16746, DK56954, DK99007 (to M.A.P); GM28755 (to G.D.S.); NSF FIBR under grant number 0425749 and the Battelle Pacific under DOE project (to Prof. Bijoy Ghosh, Department of Electrical and Systems Engineering, Washington University in Saint Louis) and the Washington University Diabetes Research and Training Center. We gratefully acknowledge the D. Melton lab (Harvard University) and K. Kaestner and C. Stoeckert Labs (University of Pennsylvania), as well as Ellen Ostlund, Jessica Murray, Sandy Clifton, Hiroshi Inoue, Chris Sawyer, Mike Heinz, Wesley Warren, Elaine Mardis, and other members of the Genome Sequencing Center for their work with the Endocrine Pancreas Consortium cDNA libraries and microarrays. We would also like to thank Robin Matlib for helpful discussions. We would like to thank the reviewers for their comments and suggestions, which improved our manuscript.
PY - 2007/8/15
Y1 - 2007/8/15
AB - Motivation: Currently, most methods for identifying differentially expressed genes fall into the category of so-called single-gene analysis, performing hypothesis testing on a gene-by-gene basis. In a single-gene-analysis approach, the variability of each gene must be estimated to determine whether that gene is differentially expressed. Poor accuracy of variability estimation makes it difficult to identify genes with small fold-changes unless a very large number of replicate experiments are performed. Results: We propose a method that avoids the difficult task of estimating variability for each gene, while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, a new characterization of differentially expressed genes is established based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This rank-based characterization of differentially expressed genes is an example of all-gene analysis rather than single-gene analysis. We apply the method to a cDNA microarray dataset, and many genes with low fold-changes (as low as 1.3-fold) are reliably identified without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways, reflecting the variability from all the genes without the complications related to multiple hypothesis testing. We also provide some comparisons between our approach and single-gene-analysis based methods.
UR - http://www.scopus.com/inward/record.url?scp=34548570211&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btm292
DO - 10.1093/bioinformatics/btm292
M3 - Article
C2 - 17550914
AN - SCOPUS:34548570211
SN - 1367-4803
VL - 23
SP - 2073
EP - 2079
JO - Bioinformatics
JF - Bioinformatics
IS - 16
ER -