TY - JOUR
T1 - Residual projection for quantile regression in vertically partitioned big data
AU - Fan, Ye
AU - Li, Jr-Shin
AU - Lin, Nan
N1 - Funding Information:
Nan Lin’s work is supported by the NVIDIA GPU grant program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partly supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207]. Jr-Shin Li’s work is supported by the Air Force Office of Scientific Research under the award FA9550-21-1-0335.
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2023/3
Y1 - 2023/3
AB - Standard regression techniques model only the mean of the response variable. Quantile regression (QR) is more powerful in that it depicts a comprehensive relationship between the response variable and independent covariates at different quantiles. It is particularly useful for non-normally distributed data with skewness or heterogeneity, which appear routinely in many scientific fields, such as economics, finance, public health and biology. Although its theory has been well developed in the literature, its computation in big data still faces multiple challenges, especially for vertically stored big data in modern distributed environments, where communication efficiency and security are usually the primary considerations. While the popular alternating direction method of multipliers (ADMM) provides a general computational solution, its slow convergence becomes a bottleneck when communication cost dominates local computational cost, as in Internet of Things (IoT) networks. Motivated by the residual projection technique, in this paper we propose an innovative iterative parallel framework, PIQR, that converges faster and has a more secure data transmission plan, and we establish its convergence property. This framework is further extended to composite quantile regression (CQR), a modified QR technique that improves estimation efficiency at extreme quantiles. Simulation studies show that both the ADMM-based method and PIQR achieve favorable estimation accuracy in distributed environments. While PIQR is inferior to the ADMM-based method in local computation, it requires far fewer iterations to achieve convergence, and hence significantly improves the overall computational efficiency when communication cost is the dominating factor. Moreover, PIQR transmits only data involving the residual information between different machines, and can better prevent the leakage of important data than the ADMM-based method.
KW - ADMM
KW - Parallel framework
KW - Privacy preservation
KW - Quantile regression
KW - Residual projection
KW - Vertically distributed big data
UR - https://www.scopus.com/pages/publications/85146544160
U2 - 10.1007/s10618-022-00914-4
DO - 10.1007/s10618-022-00914-4
M3 - Article
AN - SCOPUS:85146544160
SN - 1384-5810
VL - 37
SP - 710
EP - 735
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 2
ER -