TY - JOUR
T1 - Distributed quantile regression for longitudinal big data
AU - Fan, Ye
AU - Lin, Nan
AU - Yu, Liqun
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.
PY - 2024/4
Y1 - 2024/4
AB - Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high dimensional, hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data due to within-subject dependence. In this paper, we focus on developing a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which currently remains an open and challenging issue. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. Further, we also analyze an air quality data set to illustrate the practical performance of this method.
KW - ADMM
KW - Big data
KW - Distributed algorithm
KW - Longitudinal analysis
KW - Weighted quantile regression
UR - https://www.scopus.com/pages/publications/85146248708
U2 - 10.1007/s00180-022-01318-0
DO - 10.1007/s00180-022-01318-0
M3 - Article
AN - SCOPUS:85146248708
SN - 0943-4062
VL - 39
SP - 751
EP - 779
JO - Computational Statistics
JF - Computational Statistics
IS - 2
ER -