TY - JOUR
T1 - Learning from local to global
T2 - An efficient distributed algorithm for modeling time-to-event data
AU - Duan, Rui
AU - Luo, Chongliang
AU - Schuemie, Martijn J.
AU - Tong, Jiayi
AU - Liang, C. Jason
AU - Chang, Howard H.
AU - Boland, Mary Regina
AU - Bian, Jiang
AU - Xu, Hua
AU - Holmes, John H.
AU - Forrest, Christopher B.
AU - Morton, Sally C.
AU - Berlin, Jesse A.
AU - Moore, Jason H.
AU - Mahoney, Kevin B.
AU - Chen, Yong
N1 - Publisher Copyright: © 2020 The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2020/7/1
Y1 - 2020/7/1
N2 - Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites. Materials and Methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network. Results: On the one hand, our simulation study showed that ODAC provided estimates nearly the same as the estimator obtained by analyzing, in a single dataset, the combined patient-level data from all sites (ie, the pooled estimator). The relative bias was <0.1% across all scenarios. The accuracy of ODAC remained high across different sample sizes and event rates. On the other hand, the meta-analysis estimator, which was obtained by the inverse variance weighted average of the site-specific estimates, had substantial bias when the event rate is <5%, with the relative bias reaching 20% when the event rate is 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates have a relative bias <5% for 15 out of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias than ODAC. Conclusions: ODAC is a privacy-preserving and noniterative method for implementing time-to-event analyses across multiple sites. It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it extremely suitable for studying rare events and diseases in a distributed manner.
KW - Cox proportional hazards model
KW - data integration
KW - distributed algorithm
KW - electronic health record
KW - meta-analysis
UR - http://www.scopus.com/inward/record.url?scp=85088492383&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocaa044
DO - 10.1093/jamia/ocaa044
M3 - Article
C2 - 32626900
AN - SCOPUS:85088492383
SN - 1067-5027
VL - 27
SP - 1028
EP - 1036
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 7
ER -