Distributed quantile regression for longitudinal big data

Ye Fan, Nan Lin, Liqun Yu

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high-dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data because of within-subject dependence. In this paper, we focus on developing a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which remains an open and challenging issue. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. We also analyze an air quality data set to illustrate the practical performance of this method.
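To make the WQR objective in the abstract concrete, the following is a minimal sketch of the weighted check loss in the simplest possible setting: an intercept-only model, where the minimizer is a weighted quantile. It also includes the closed-form proximal operator of the check loss, a building block commonly used in ADMM-type QR solvers. Everything here is an illustrative assumption rather than the paper's algorithm: the function names are hypothetical, the weights are supplied by hand instead of being estimated by Newton–Raphson, and no multi-block ADMM or distributed computation is performed.

```python
def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def wqr_objective(q, y, w, tau):
    """Weighted check loss of an intercept-only model with fitted value q."""
    return sum(wi * check_loss(yi - q, tau) for yi, wi in zip(y, w))


def weighted_quantile(y, w, tau):
    """Minimize the weighted check loss over q.

    The objective is convex and piecewise linear with kinks at the
    observations, so a minimizer is always attained at an observed value.
    Ties are broken in favor of the smallest such value.
    """
    return min(sorted(y), key=lambda q: wqr_objective(q, y, w, tau))


def prox_check(v, tau, lam):
    """Proximal operator: argmin_r lam * rho_tau(r) + 0.5 * (r - v) ** 2."""
    if v > lam * tau:
        return v - lam * tau
    if v < -lam * (1.0 - tau):
        return v + lam * (1.0 - tau)
    return 0.0


# Hypothetical toy data: five observations, one of them an outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
equal_weights = [1.0] * 5
skewed_weights = [1.0, 1.0, 1.0, 1.0, 10.0]  # hand-picked, not estimated

print(weighted_quantile(y, equal_weights, 0.5))   # 3.0 (the ordinary median)
print(weighted_quantile(y, skewed_weights, 0.5))  # 100.0
```

With equal weights the estimate is the ordinary median, while placing a large weight on one observation pulls the fitted quantile toward it. This is the mechanism by which WQR weights can reflect how much each observation should contribute.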

Original language: English
Pages (from-to): 751-779
Number of pages: 29
Journal: Computational Statistics
Volume: 39
Issue number: 2
State: Published - Apr 2024

Keywords

  • ADMM
  • Big data
  • Distributed algorithm
  • Longitudinal analysis
  • Weighted quantile regression
