Machine learning classifier models can identify acute respiratory distress syndrome phenotypes using readily available clinical data

Pratik Sinha, Matthew M. Churpek, Carolyn S. Calfee

Research output: Contribution to journal › Article › peer-review

63 Scopus citations


Rationale: Two distinct phenotypes of acute respiratory distress syndrome (ARDS) with differential clinical outcomes and responses to randomly assigned treatment have consistently been identified in randomized controlled trial cohorts using latent class analysis. Plasma biomarkers, key components in phenotype identification, currently lack point-of-care assays and represent a barrier to the clinical implementation of phenotypes.

Objectives: The objective of this study was to develop models to classify ARDS phenotypes using readily available clinical data only.

Methods: Three randomized controlled trial cohorts served as the training data set (ARMA [High vs. Low VT], ALVEOLI [Assessment of Low VT and Elevated End-Expiratory Pressure to Obviate Lung Injury], and FACTT [Fluids and Catheter Treatment Trial]; n = 2,022), and a fourth served as the validation data set (SAILS [Statins for Acutely Injured Lungs from Sepsis]; n = 745). A gradient-boosted machine algorithm was used to develop classifier models using 24 variables (demographics, vital signs, laboratory, and respiratory variables) at enrollment. In two secondary analyses, the ALVEOLI and FACTT cohorts each, individually, served as the validation data set, and the remaining combined cohorts formed the training data set for each analysis. Model performance was evaluated against the latent class analysis-derived phenotype.

Measurements and Main Results: For the primary analysis, the model accurately classified the phenotypes in the validation cohort (area under the receiver operating characteristic curve [AUC], 0.95; 95% confidence interval [CI], 0.94-0.96). Using a probability cutoff of 0.5 to assign class, inflammatory biomarkers (IL-6, IL-8, and sTNFR-1; P < 0.0001) and 90-day mortality (38% vs. 24%; P = 0.0002) were significantly higher in the hyperinflammatory phenotype as classified by the model. Model accuracy was similar when ALVEOLI (AUC, 0.94; 95% CI, 0.92-0.96) and FACTT (AUC, 0.94; 95% CI, 0.92-0.95) were used as the validation cohorts. Significant treatment interactions were observed with the clinical classifier model-assigned phenotypes in both ALVEOLI (P = 0.0113) and FACTT (P = 0.0072) cohorts.

Conclusions: ARDS phenotypes can be accurately identified using machine learning models based on readily available clinical data and may enable rapid phenotype identification at the bedside.
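
The evaluation step described in the abstract (scoring model probabilities against the latent class analysis-derived phenotype by AUC, then assigning class at a probability cutoff of 0.5) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy labels, and the toy probabilities are hypothetical and are not study data, and treating a probability of exactly 0.5 as hyperinflammatory is an assumption, since the abstract does not say how ties at the cutoff are handled.

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability that a randomly chosen positive
    case is scored above a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def assign_class(scores, cutoff=0.5):
    """Label a case 1 (hyperinflammatory) when its probability meets the cutoff.
    Using >= rather than > is an assumption made for this sketch."""
    return [1 if s >= cutoff else 0 for s in scores]


def accuracy(labels, predicted):
    """Fraction of cases where the assigned class matches the reference label."""
    matches = sum(1 for y, p in zip(labels, predicted) if y == p)
    return matches / len(labels)


# Hypothetical toy data, NOT from the study:
# 1 = hyperinflammatory, 0 = hypoinflammatory (reference phenotype).
lca_labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_probs = [0.91, 0.72, 0.48, 0.35, 0.10, 0.55, 0.20, 0.66]

auc = roc_auc(lca_labels, model_probs)
assigned = assign_class(model_probs, cutoff=0.5)
acc = accuracy(lca_labels, assigned)

print("AUC:", auc)
print("Assigned classes:", assigned)
print("Accuracy at cutoff 0.5:", acc)
```

The AUC summarizes ranking quality across all possible cutoffs, whereas the 0.5 cutoff fixes a single operating point, so the two numbers can differ noticeably on the same predictions.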

Original language: English
Pages (from-to): 996-1004
Number of pages: 9
Journal: American Journal of Respiratory and Critical Care Medicine
Issue number: 7
State: Published - Oct 1 2020


  • ARDS phenotypes
  • Classifier models
  • Machine learning

