TY - JOUR
T1 - DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models
AU - Luo, Chongliang
AU - Islam, Md Nazmul
AU - Sheils, Natalie E.
AU - Buresh, John
AU - Reps, Jenna
AU - Schuemie, Martijn J.
AU - Ryan, Patrick B.
AU - Edmondson, Mackenzie
AU - Duan, Rui
AU - Tong, Jiayi
AU - Marks-Anglin, Arielle
AU - Bian, Jiang
AU - Chen, Zhaoyi
AU - Duarte-Salles, Talita
AU - Fernández-Bertolín, Sergio
AU - Falconer, Thomas
AU - Kim, Chungsoo
AU - Park, Rae Woong
AU - Pfohl, Stephen R.
AU - Shah, Nigam H.
AU - Williams, Andrew E.
AU - Xu, Hua
AU - Zhou, Yujia
AU - Lautenbach, Ebbing
AU - Doshi, Jalpa A.
AU - Werner, Rachel M.
AU - Asch, David A.
AU - Chen, Yong
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations for protecting patients’ privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error estimates), hence demonstrating the lossless property. The algorithm requires each site to contribute minimal aggregated data in only one round of communication. We demonstrate the lossless property of the proposed DLMM algorithm by investigating the associations between demographic and clinical characteristics and length of hospital stay in COVID-19 patients using administrative claims from the UnitedHealth Group Clinical Discovery Database. We extend this association study by incorporating 120,609 COVID-19 patients from 11 collaborative data sources worldwide.
UR - http://www.scopus.com/inward/record.url?scp=85127245114&partnerID=8YFLogxK
U2 - 10.1038/s41467-022-29160-4
DO - 10.1038/s41467-022-29160-4
M3 - Article
C2 - 35354802
AN - SCOPUS:85127245114
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 1678
ER -