TY - JOUR
T1 - A Two-Sample Test for Equality of Means in High Dimension
AU - Gregory, Karl Bruce
AU - Carroll, Raymond J.
AU - Baladandayuthapani, Veerabhadran
AU - Lahiri, Soumendra N.
N1 - Publisher Copyright: © 2015 American Statistical Association.
PY - 2015/4/3
Y1 - 2015/4/3
AB - We develop a test statistic for testing the equality of two population mean vectors in the “large-p-small-n” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T² test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.
KW - Copy number variation
KW - Heteroscedasticity
KW - Large p
UR - https://www.scopus.com/pages/publications/84934920563
U2 - 10.1080/01621459.2014.934826
DO - 10.1080/01621459.2014.934826
M3 - Article
AN - SCOPUS:84934920563
SN - 0162-1459
VL - 110
SP - 837
EP - 849
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 510
ER -