Predicting hospitalization of COVID-19 positive patients using clinician-guided machine learning methods

Wenyu Song, Linying Zhang, Luwei Liu, Michael Sainlaire, Mehran Karvar, Min Jeoung Kang, Avery Pullman, Stuart Lipsitz, Anthony Massaro, Namrata Patil, Ravi Jasuja, Patricia C. Dykes

Research output: Contribution to journal, Article, peer-review

5 Scopus citations


Objectives: Coronavirus disease 2019 (COVID-19) is a resource-intensive global pandemic. It is important for healthcare systems to identify high-risk COVID-19-positive patients who need timely health care. This study was conducted to predict the hospitalization of older adults who have tested positive for COVID-19. Methods: We screened all patients with COVID test records from 11 Mass General Brigham hospitals to identify the study population. A total of 1495 patients aged 65 and above from the outpatient setting were included in the final cohort, of whom 459 were hospitalized. We conducted a clinician-guided, 3-stage feature selection and phenotyping process using iterative combinations of literature review, clinician expert opinion, and electronic healthcare record data exploration. A list of 44 features, including temporal features, was generated from this process and used for model training. Four machine learning prediction models were developed: regularized logistic regression, support vector machine, random forest, and neural network. Results: All 4 models achieved an area under the receiver operating characteristic curve (AUC) greater than 0.80. Random forest achieved the best predictive performance (AUC = 0.83). Albumin, an index for nutritional status, was found to have the strongest association with hospitalization among COVID-positive older adults. Conclusions: In this study, we developed 4 machine learning models for predicting general hospitalization among COVID-positive older adults. We identified important clinical factors associated with hospitalization and observed temporal patterns in our study cohort. Our modeling pipeline and algorithm could potentially be used to facilitate more accurate and efficient decision support for triaging COVID-positive patients.
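The abstract compares models by AUC. As a minimal sketch of what that metric measures, the Python below computes AUC as the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen non-hospitalized patient. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not the study's code or data, and the abstract does not say how the authors computed AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    A tie between a positive score and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumption): 1 = hospitalized, 0 = not hospitalized.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.35, 0.4, 0.1, 0.8, 0.3, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

An AUC of 0.5 corresponds to scores that carry no ranking information and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size, which is fine for illustration; a rank-based implementation would be the usual choice for larger cohorts.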

Original language: English
Pages (from-to): 1661-1667
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Issue number: 10
State: Published - 2022


  • COVID-19
  • electronic health record
  • hospitalization
  • machine learning
  • temporal patterns

