TY - JOUR
T1 - Utility of inverse probability weighting in molecular pathological epidemiology
AU - Liu, Li
AU - Nevo, Daniel
AU - Nishihara, Reiko
AU - Cao, Yin
AU - Song, Mingyang
AU - Twombly, Tyler S.
AU - Chan, Andrew T.
AU - Giovannucci, Edward L.
AU - VanderWeele, Tyler J.
AU - Wang, Molin
AU - Ogino, Shuji
N1 - Funding Information:
Funding: This work was supported by U.S. National Institutes of Health (NIH) grants [P01 CA87969 to M.J. Stampfer; UM1 CA186107 to M.J. Stampfer; R01 CA137178 to A.T.C.; K24 DK098311 to A.T.C.; R01 CA151993 to S.O.; R35 CA197735 to S.O.; K07 CA190673 to R.N.] and by a Nodal Award (to S.O.) from the Dana-Farber Harvard Cancer Center. L.L. is supported by a grant from the National Natural Science Foundation of China (No. 81302491), a scholarship grant from the China Scholarship Council, and a fellowship grant from Huazhong University of Science and Technology. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2017, Springer Science+Business Media B.V., part of Springer Nature.
PY - 2018/4/1
Y1 - 2018/4/1
N2 - As a causal inference method, inverse probability weighting (IPW) has been used to address confounding and to account for missing data when subjects with missing data cannot be included in a primary analysis. The transdisciplinary field of molecular pathological epidemiology (MPE) integrates molecular pathological and epidemiological methods, and takes advantage of an improved understanding of pathogenesis to generate stronger biological evidence of causality and to optimize strategies for precision medicine and prevention. Disease subtyping based on biomarker analysis of biospecimens is essential in MPE research. However, some cases nearly always lack subtype information because biospecimens are unavailable or insufficient. To address this missing subtype data issue, we incorporated inverse probability weights into Cox proportional cause-specific hazards regression. The weight was the inverse of the probability of biomarker data availability, estimated from a model for biomarker data availability status. The strategy was illustrated in two example studies, which assessed alcohol intake and family history of colorectal cancer, respectively, in relation to the risk of developing colorectal carcinoma subtypes classified by tumor microsatellite instability (MSI) status, using data from a prospective cohort study, the Nurses’ Health Study. Logistic regression with clinical features and family history of colorectal cancer as covariates was used to estimate the probability of MSI data availability for each cancer case. This application of IPW can reduce selection bias caused by nonrandom variation in biospecimen data availability. The integration of causal inference methods into the MPE approach will likely have substantial potential to advance the field of epidemiology.
KW - Etiologic heterogeneity
KW - Marginal structural model
KW - Missing at random
KW - Neoplasm
KW - Selection bias
KW - Unique disease principle
UR - http://www.scopus.com/inward/record.url?scp=85038612809&partnerID=8YFLogxK
U2 - 10.1007/s10654-017-0346-8
DO - 10.1007/s10654-017-0346-8
M3 - Article
C2 - 29264788
AN - SCOPUS:85038612809
SN - 0393-2990
VL - 33
SP - 381
EP - 392
JO - European Journal of Epidemiology
JF - European Journal of Epidemiology
IS - 4
ER -