TY - JOUR
T1 - Leveraging mixed and incomplete outcomes via reduced-rank modeling
AU - Luo, Chongliang
AU - Liang, Jian
AU - Li, Gen
AU - Wang, Fei
AU - Zhang, Changshui
AU - Dey, Dipak K.
AU - Chen, Kun
N1 - Publisher Copyright: © 2018 Elsevier Inc.
PY - 2018/9
Y1 - 2018/9
AB - Multivariate outcomes with multivariate features of possibly high dimension are routinely produced in various fields. In many real-world problems, the collected outcomes are of mixed types, including continuous measurements, binary indicators and counts, and a substantial proportion of values may also be missing. Regardless of their types, these mixed outcomes are often interrelated, representing diverse reflections or views of the same underlying data generation mechanism. As such, an integrative multivariate model can be beneficial. We develop a mixed-outcome reduced-rank regression, which effectively enables information sharing among different prediction tasks. Our approach integrates mixed and partially observed outcomes belonging to the exponential dispersion family, by assuming that all the outcomes are associated through a shared low-dimensional subspace spanned by the features. A general singular value regularized criterion is proposed, and we establish a non-asymptotic performance bound for the proposed estimators in the context of supervised learning with mixed outcomes from an exponential family and under a general sampling scheme of missing data. An iterative singular value thresholding algorithm is developed for optimization with convergence guarantee. The effectiveness of our approach is demonstrated by simulation studies and an application on predicting health-related outcomes in longitudinal studies of aging.
KW - Generalized linear model
KW - Integrative learning
KW - Missing data
KW - Multivariate regression
UR - http://www.scopus.com/inward/record.url?scp=85049349617&partnerID=8YFLogxK
U2 - 10.1016/j.jmva.2018.04.011
DO - 10.1016/j.jmva.2018.04.011
M3 - Article
AN - SCOPUS:85049349617
SN - 0047-259X
VL - 167
SP - 378
EP - 394
JO - Journal of Multivariate Analysis
JF - Journal of Multivariate Analysis
ER -