TY - JOUR
T1 - Machine learning classifier models can identify acute respiratory distress syndrome phenotypes using readily available clinical data
AU - Sinha, Pratik
AU - Churpek, Matthew M.
AU - Calfee, Carolyn S.
N1 - Funding Information:
Supported by NIH grants HL140026 (C.S.C.), T32-GM008440 (P.S.), and R01-GM123193 (M.M.C.).
Publisher Copyright:
© 2020 by the American Thoracic Society
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Rationale: Two distinct phenotypes of acute respiratory distress syndrome (ARDS) with differential clinical outcomes and responses to randomly assigned treatment have consistently been identified in randomized controlled trial cohorts using latent class analysis. Plasma biomarkers, key components in phenotype identification, currently lack point-of-care assays and represent a barrier to the clinical implementation of phenotypes. Objectives: The objective of this study was to develop models to classify ARDS phenotypes using readily available clinical data only. Methods: Three randomized controlled trial cohorts served as the training data set (ARMA [High vs. Low VT], ALVEOLI [Assessment of Low VT and Elevated End-Expiratory Pressure to Obviate Lung Injury], and FACTT [Fluids and Catheter Treatment Trial]; n = 2,022), and a fourth served as the validation data set (SAILS [Statins for Acutely Injured Lungs from Sepsis]; n = 745). A gradient-boosted machine algorithm was used to develop classifier models using 24 variables (demographics, vital signs, laboratory, and respiratory variables) at enrollment. In two secondary analyses, the ALVEOLI and FACTT cohorts each, individually, served as the validation data set, and the remaining combined cohorts formed the training data set for each analysis. Model performance was evaluated against the latent class analysis-derived phenotype. Measurements and Main Results: For the primary analysis, the model accurately classified the phenotypes in the validation cohort (area under the receiver operating characteristic curve [AUC], 0.95; 95% confidence interval [CI], 0.94-0.96). Using a probability cutoff of 0.5 to assign class, inflammatory biomarkers (IL-6, IL-8, and sTNFR-1; P < 0.0001) and 90-day mortality (38% vs. 24%; P = 0.0002) were significantly higher in the hyperinflammatory phenotype as classified by the model. Model accuracy was similar when ALVEOLI (AUC, 0.94; 95% CI, 0.92-0.96) and FACTT (AUC, 0.94; 95% CI, 0.92-0.95) were used as the validation cohorts. Significant treatment interactions were observed with the clinical classifier model-assigned phenotypes in both ALVEOLI (P = 0.0113) and FACTT (P = 0.0072) cohorts. Conclusions: ARDS phenotypes can be accurately identified using machine learning models based on readily available clinical data and may enable rapid phenotype identification at the bedside.
KW - ARDS phenotypes
KW - Classifier models
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85092248474&partnerID=8YFLogxK
U2 - 10.1164/rccm.202002-0347OC
DO - 10.1164/rccm.202002-0347OC
M3 - Article
C2 - 32551817
AN - SCOPUS:85092248474
SN - 1073-449X
VL - 202
SP - 996
EP - 1004
JO - American Journal of Respiratory and Critical Care Medicine
JF - American Journal of Respiratory and Critical Care Medicine
IS - 7
ER -