TY - JOUR
T1 - Fast surrogates of U-statistics
AU - Lin, N.
AU - Xi, R.
PY - 2010/1/1
Y1 - 2010/1/1
N2 - U-statistics have long been known as a class of nonparametric estimators with good theoretical properties such as unbiasedness and asymptotic normality. However, their applications in modern statistical analysis are limited due to the high computational complexity, especially when massive data sets are becoming more and more common nowadays. In this paper, using the "divide-and-conquer" technique, we developed two surrogates of the U-statistics, aggregated U-statistics and average aggregated U-statistics, both of which are shown asymptotically equivalent to U-statistics and computationally much more efficient. When dividing the raw data set into K subsets, the two proposed estimators reduce the computational complexity from O (Nm) to O (K (N / K)m), which results in significant time reduction as long as K = o (N) and m ≥ 2. The merit of the two proposed statistics is demonstrated by both simulation studies and real data examples.
AB - U-statistics have long been known as a class of nonparametric estimators with good theoretical properties such as unbiasedness and asymptotic normality. However, their applications in modern statistical analysis are limited due to the high computational complexity, especially when massive data sets are becoming more and more common nowadays. In this paper, using the "divide-and-conquer" technique, we developed two surrogates of the U-statistics, aggregated U-statistics and average aggregated U-statistics, both of which are shown asymptotically equivalent to U-statistics and computationally much more efficient. When dividing the raw data set into K subsets, the two proposed estimators reduce the computational complexity from O (Nm) to O (K (N / K)m), which results in significant time reduction as long as K = o (N) and m ≥ 2. The merit of the two proposed statistics is demonstrated by both simulation studies and real data examples.
UR - https://www.scopus.com/pages/publications/70349298314
U2 - 10.1016/j.csda.2009.08.009
DO - 10.1016/j.csda.2009.08.009
M3 - Article
AN - SCOPUS:70349298314
SN - 0167-9473
VL - 54
SP - 16
EP - 24
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 1
ER -