TY - GEN
T1 - GRID for variable selection in high dimensional regression
AU - Giordano, Francesco
AU - Lahiri, Soumendra Nath
AU - Parrella, Maria Lucia
N1 - Publisher Copyright: © 2014 Proceedings of COMPSTAT 2014 - 21st International Conference on Computational Statistics. All rights reserved.
PY - 2014
Y1 - 2014
N2 - Given a nonparametric regression model, we assume that the number of covariates may grow without bound but that only some of these covariates are relevant for the model. Our goal is to identify the relevant covariates and to obtain some information about the structure of the model. We propose a new nonparametric procedure, called GRID, with the following features: (a) it automatically identifies the relevant covariates of the regression model, also distinguishing the nonlinear from the linear ones (a covariate is called linear or nonlinear according to the marginal relation between the response variable and that covariate); (b) it automatically identifies the interactions between the covariates (mixed effect terms), without resorting to any kind of stepwise selection method. In particular, our procedure can identify mixed terms of any order (two-way, three-way, ...) without increasing the computational complexity of the algorithm; (c) it is completely data-driven and therefore easy to implement for the analysis of real datasets. In particular, it does not depend on the choice of critical regularization parameters, nor does it require the estimation of the nuisance parameter σ² (self-scaling). The acronym GRID stands for Gradient Relevant Identification Derivatives, reflecting the fact that the procedure is based on testing the significance of a partial derivative estimator.
AB - Given a nonparametric regression model, we assume that the number of covariates may grow without bound but that only some of these covariates are relevant for the model. Our goal is to identify the relevant covariates and to obtain some information about the structure of the model. We propose a new nonparametric procedure, called GRID, with the following features: (a) it automatically identifies the relevant covariates of the regression model, also distinguishing the nonlinear from the linear ones (a covariate is called linear or nonlinear according to the marginal relation between the response variable and that covariate); (b) it automatically identifies the interactions between the covariates (mixed effect terms), without resorting to any kind of stepwise selection method. In particular, our procedure can identify mixed terms of any order (two-way, three-way, ...) without increasing the computational complexity of the algorithm; (c) it is completely data-driven and therefore easy to implement for the analysis of real datasets. In particular, it does not depend on the choice of critical regularization parameters, nor does it require the estimation of the nuisance parameter σ² (self-scaling). The acronym GRID stands for Gradient Relevant Identification Derivatives, reflecting the fact that the procedure is based on testing the significance of a partial derivative estimator.
KW - high dimension
KW - model selection
KW - nonparametric regression
KW - Variable selection
UR - https://www.scopus.com/pages/publications/85183580374
M3 - Conference contribution
AN - SCOPUS:85183580374
T3 - Proceedings of COMPSTAT 2014 - 21st International Conference on Computational Statistics
SP - 515
EP - 522
BT - Proceedings of COMPSTAT 2014 - 21st International Conference on Computational Statistics
A2 - Gilli, Manfred
A2 - Gonzalez-Rodriguez, Gil
A2 - Nieto-Reyes, Alicia
PB - The International Statistical Institute/International Association for Statistical Computing
T2 - 21st International Conference on Computational Statistics, COMPSTAT 2014
Y2 - 19 August 2014 through 22 August 2014
ER -