Unraveling COVID-19: A Large-Scale Characterization of 4.5 Million COVID-19 Cases Using CHARYBDIS

Kristin Kostka, Talita Duarte-Salles, Albert Prats-Uribe, Anthony G. Sena, Andrea Pistillo, Sara Khalid, Lana Y.H. Lai, Asieh Golozar, Thamir M. Alshammari, Dalia M. Dawoud, Fredrik Nyberg, Adam B. Wilcox, Alan Andryc, Andrew Williams, Anna Ostropolets, Carlos Areia, Chi Young Jung, Christopher A. Harle, Christian G. Reich, Clair Blacketer, Daniel R. Morales, David A. Dorr, Edward Burn, Elena Roel, Eng Hooi Tan, Evan Minty, Frank De Falco, Gabriel De Maeztu, Gigi Lipori, Hiba Alghoul, Hong Zhu, Jason A. Thomas, Jiang Bian, Jimyung Park, Jordi Martínez Roldán, Jose D. Posada, Juan M. Banda, Juan P. Horcajada, Julianna Kohler, Karishma Shah, Karthik Natarajan, Kristine E. Lynch, Li Liu, Lisa M. Schilling, Martina Recalde, Matthew Spotnitz, Mengchun Gong, Michael E. Matheny, Neus Valveny, Nicole G. Weiskopf, Nigam Shah, Osaid Alser, Paula Casajust, Rae Woong Park, Robert Schuff, Sarah Seager, Scott L. Du Vall, Seng Chan You, Seokyoung Song, Sergio Fernández-Bertolín, Stephen Fortin, Tanja Magoc, Thomas Falconer, Vignesh Subbian, Vojtech Huser, Waheed Ul Rahman Ahmed, William Carter, Yin Guan, Yankuic Galvan, Xing He, Peter R. Rijnbeek, George Hripcsak, Patrick B. Ryan, Marc A. Suchard, Daniel Prieto-Alhambra

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-CoV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD.
Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts: 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services.
Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women were diagnosed than men but more men were hospitalized than women, and most diagnosed cases were between 25 and 60 years of age whereas most hospitalized cases were between 60 and 80 years of age. South Korea differed, with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data were more commonly available in hospitalized cohorts than in diagnosed cohorts.
Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. By characterising baseline variability in patients and geography, our work provides critical context for differences that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

Original language: English
Pages (from-to): 369-384
Number of pages: 16
Journal: Clinical Epidemiology
State: Published - 2022


  • Descriptive epidemiology
  • Open science
  • Real world data
  • Real world evidence

