TY - JOUR
T1 - Necessary and sufficient conditions for variable selection consistency of the lasso in high dimensions
AU - Lahiri, Soumendra N.
N1 - Publisher Copyright: © Institute of Mathematical Statistics, 2021
PY - 2021
Y1 - 2021
N2 - This paper investigates conditions for variable selection consistency of the LASSO in high-dimensional regression models and gives necessary and sufficient conditions for it, potentially allowing the model dimension p to grow arbitrarily fast as a function of the sample size n. These conditions require both upper and lower bounds on the growth rate of the penalty parameter. It turns out that the lower-bound considerations determine a variant of the irrepresentable condition (IRC) of Zhao and Yu (J. Mach. Learn. Res. 7 (2006) 2541-2563), herein called the lower irrepresentable condition (LIRC), while the upper-bound considerations lead to a new condition, called in this paper the upper irrepresentable condition (UIRC). It is shown that the LIRC and the UIRC together are necessary and sufficient for the variable selection consistency of the LASSO, thereby settling a conjecture of Zhao and Yu (2006). Further, it is shown that under some mild regularity conditions, the penalty parameter must necessarily tend to infinity at a certain minimal rate to ensure variable selection consistency of the LASSO, and that the corresponding LASSO estimators of the nonzero regression parameters cannot be √n-consistent (even for individual parameters). Thus, under fairly general conditions, the LASSO with a single choice of the penalty parameter cannot achieve variable selection consistency and √n-consistency simultaneously.
KW - Asymptotic normality
KW - Irrepresentable condition
KW - Oracle property
KW - Regularization
UR - https://www.scopus.com/pages/publications/85104803291
U2 - 10.1214/20-AOS1979
DO - 10.1214/20-AOS1979
M3 - Article
AN - SCOPUS:85104803291
SN - 0090-5364
VL - 49
SP - 820
EP - 844
JO - Annals of Statistics
JF - Annals of Statistics
IS - 2
ER -