Purpose: Spontaneous preterm birth is a leading cause of perinatal mortality in the United States and occurs disproportionately among non-Hispanic black women compared with women of other race-ethnicities. Clinicians lack tools to identify first-time mothers at risk for spontaneous preterm birth. This study assessed prediction of early (<32 weeks) spontaneous preterm birth among non-Hispanic black and white women by applying state-of-the-art machine learning to multilevel data from a large birth cohort.

Methods: Data from birth certificate and hospital discharge records for 336,214 singleton births to nulliparous women in California from 2007 to 2011 were used in cross-validated regressions, with multiple imputation for missing covariate data. Residential census tract information was overlaid for 281,733 births. Prediction was assessed with areas under the receiver operating characteristic curves (AUCs).

Results: Cross-validated AUCs were low (0.62 [min = 0.60, max = 0.63] for non-Hispanic black women and 0.63 [min = 0.61, max = 0.65] for non-Hispanic white women). Combining racial-ethnic groups improved prediction (cross-validated AUC = 0.67 [min = 0.65, max = 0.68]), approaching what others have achieved using biomarkers. Census tract-level information did not improve prediction.

Conclusions: Despite the use of advanced statistical methods, the resolution of administrative data was inadequate to precisely predict individual risk for early spontaneous preterm birth.
- Machine learning
- Spontaneous preterm birth
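
For readers unfamiliar with the metric, the following is a minimal sketch of how a cross-validated AUC with a minimum and maximum might be summarized. It is not the study's code: the function names `auc` and `summarize_folds` are hypothetical, the toy labels and scores are made up for illustration, and it assumes the reported figure is the mean of the per-fold AUCs, which the abstract does not state.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """Summarize held-out performance across folds.

    `folds` is a list of (labels, scores) pairs, one per held-out fold.
    Returns (mean, min, max) of the per-fold AUCs. Taking the mean is an
    assumption made for this sketch.
    """
    fold_aucs = [auc(labels, scores) for labels, scores in folds]
    return mean(fold_aucs), min(fold_aucs), max(fold_aucs)


# Toy held-out predictions for two folds (illustrative values only).
toy_folds = [
    ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),
]
cv_mean, cv_min, cv_max = summarize_folds(toy_folds)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why values near 0.62 to 0.67 are described as low.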