TY - JOUR
T1 - DISTRIBUTED PROPORTIONAL LIKELIHOOD RATIO MODEL WITH APPLICATION TO DATA INTEGRATION ACROSS CLINICAL SITES
AU - Luo, Chongliang
AU - Duan, Rui
AU - Edmondson, Mackenzie
AU - Shi, Jiasheng
AU - Maltenfort, Mitchell
AU - Morris, Jeffrey S.
AU - Forrest, Christopher B.
AU - Hubbard, Rebecca
AU - Chen, Yong
N1 - Publisher Copyright:
© Institute of Mathematical Statistics, 2024.
PY - 2024/3
Y1 - 2024/3
AB - Real-world evidence synthesis through integration of data from distributed research networks has gained increasing attention in recent years. Due to privacy concerns and restrictions on sharing patient-level data, distributed algorithms that do not require sharing patient-level information are greatly needed to facilitate multisite collaborations. On the other hand, data collected at multiple sites often come from diverse populations, and there exists a substantial amount of heterogeneity across sites in patient characteristics. Most of the existing distributed algorithms have ignored such between-site heterogeneity. In this paper we aim to fill this methodological gap by proposing a general distributed algorithm. We develop our distributed algorithm based on a general semiparametric model, namely, the proportional likelihood ratio model (Biometrika 99 (2012) 211–222), which is a semiparametric extension of the generalized linear model. We devise the proportional likelihood ratio model with site-specific baseline functions, to account for between-site heterogeneity, and shared regression parameters, to borrow information across sites. Under this flexible formulation, our distributed algorithm is designed to be privacy-preserving and communication-efficient (i.e., only one round of communication across sites is needed). We validate our method via simulation studies and demonstrate the utility of our method via a multisite study of pediatric avoidable hospitalization based on electronic health record data from a total of 354,672 patients across 26 different clinical sites within the Children’s Hospital of Philadelphia health system.
KW - distributed research network
KW - heterogeneity-aware distributed algorithms
KW - noniterative distributed algorithm
KW - privacy-preserving
KW - real-world evidence
UR - http://www.scopus.com/inward/record.url?scp=85185338995&partnerID=8YFLogxK
U2 - 10.1214/23-AOAS1779
DO - 10.1214/23-AOAS1779
M3 - Article
AN - SCOPUS:85185338995
SN - 1932-6157
VL - 18
SP - 63
EP - 79
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
IS - 1
ER -