TY - JOUR
T1 - BAYESIAN QUANTILE REGRESSION WITH SUBSET SELECTION
T2 - A DECISION ANALYSIS PERSPECTIVE
AU - Feldman, Joseph
AU - Kowal, Daniel R.
N1 - Publisher Copyright:
© Institute of Mathematical Statistics, 2025.
PY - 2025/9
Y1 - 2025/9
N2 - Quantile regression is a powerful tool in epidemiological studies where interest lies in inferring how different exposures affect specific percentiles of the distribution of a health or life outcome. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi-or nonparametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantilefocused squared error loss, which enables efficient, closed-form computing and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, variable selection, and inference over frequentist and Bayesian competitors. We use these tools to identify and quantify the heterogeneous impacts of multiple social stressors and environmental exposures on educational outcomes across the full spectrum of low-, medium-, and high-achieving students in North Carolina.
AB - Quantile regression is a powerful tool in epidemiological studies where interest lies in inferring how different exposures affect specific percentiles of the distribution of a health or life outcome. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi-or nonparametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantilefocused squared error loss, which enables efficient, closed-form computing and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, variable selection, and inference over frequentist and Bayesian competitors. We use these tools to identify and quantify the heterogeneous impacts of multiple social stressors and environmental exposures on educational outcomes across the full spectrum of low-, medium-, and high-achieving students in North Carolina.
KW - Bayesian inference
KW - Variable selection
KW - decision theory
KW - interpretable machine learning
UR - https://www.scopus.com/pages/publications/105016181483
U2 - 10.1214/25-AOAS2053
DO - 10.1214/25-AOAS2053
M3 - Article
AN - SCOPUS:105016181483
SN - 1932-6157
VL - 19
SP - 2294
EP - 2319
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
IS - 3
ER -