Harvard Electroencephalography Database: A comprehensive clinical electroencephalographic resource from four Boston hospitals

Chenxi Sun, Jin Jing, Niels Turley, Callison Alcott, Wan Yee Kang, Andrew J. Cole, Daniel M. Goldenholz, Alice Lam, Edilberto Amorim, Catherine Chu, Sydney Cash, Valdery Moura Junior, Aditya Gupta, Manohar Ghanta, Bruce Nearing, Fábio A. Nascimento, Aaron Struck, Jennifer Kim, Shadi Sartipi, Alexandra Maria Tauton, Marta Fernandes, Haoqi Sun, Grace Bayas, Kaileigh Gallagher, Joost B. Wagenaar, Nishant Sinha, Christopher Lee-Messer, Christine Tsien Silvers, Bharath Gunapati, Jonathan Rosand, Jurriaan Peters, Tobias Loddenkemper, Jong Woo Lee, Sahar Zafar, M. Brandon Westover

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: This article presents the Harvard Electroencephalography Database (HEEDB), a large-scale, deidentified, and standardized electroencephalographic (EEG) resource supporting artificial intelligence-driven and reproducible research in epilepsy and broader clinical neuroscience.

Methods: HEEDB aggregates more than 280 000 EEG recordings from more than 108 000 patients across four Harvard-affiliated hospitals. Data are harmonized using the Brain Imaging Data Structure and hosted on the Brain Data Science Platform. EEG data are linked with clinical notes, International Classification of Diseases, 10th Revision codes, medications, and EEG reports. Deidentification follows Health Insurance Portability and Accountability Act Safe Harbor standards.

Results: The database includes routine, epilepsy monitoring unit, and intensive care unit EEGs across all age groups, with 73% linked to deidentified clinical reports and 96% of those matched to recordings. Findings are extracted using expert curation, regular expressions, and medical natural language processing models. Auxiliary data include diagnoses, medications, and hospital course, supporting multimodal analysis.

Significance: HEEDB fills a critical gap in EEG data availability for epilepsy research. By enabling large-scale, privacy-compliant, and clinically relevant analysis, it accelerates the development of diagnostic tools, improves training datasets for machine learning, and promotes data-sharing in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) and National Institutes of Health data policies.
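
The abstract mentions regular expressions as one of the methods used to extract findings from EEG reports. As a rough illustration of that general technique only, the Python sketch below flags a few findings in free-text report sentences. The pattern names, the keyword lists, the negation rule, and the function `extract_findings` are all hypothetical and are not taken from HEEDB; the actual rules are not described in the abstract.

```python
import re

# Hypothetical patterns for illustration only; not the rules used to build HEEDB.
FINDING_PATTERNS = {
    "seizure": re.compile(r"\bseizures?\b", re.IGNORECASE),
    "spike": re.compile(r"\b(?:spikes?|sharp waves?)\b", re.IGNORECASE),
    "slowing": re.compile(r"\bslowing\b", re.IGNORECASE),
}

# Crude negation cue: a negating word anywhere in the sentence before the match.
NEGATION = re.compile(r"\b(?:no|without|absence of)\b[^.]*$", re.IGNORECASE)


def extract_findings(report: str) -> dict[str, bool]:
    """Return a flag per finding, set when a non-negated mention is found."""
    findings = {name: False for name in FINDING_PATTERNS}
    # Split the report into sentences at terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for name, pattern in FINDING_PATTERNS.items():
            # Only the first mention in each sentence is checked.
            match = pattern.search(sentence)
            if match and not NEGATION.search(sentence[: match.start()]):
                findings[name] = True
    return findings


report = (
    "Diffuse slowing was present. No seizures were recorded. "
    "Occasional left temporal sharp waves."
)
print(extract_findings(report))
```

A keyword-and-negation approach like this is brittle on real clinical text, which may be one reason the abstract lists expert curation and medical natural language processing models alongside regular expressions.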

Original language: English
Pages (from-to): 3411-3425
Number of pages: 15
Journal: Epilepsia
Volume: 66
Issue number: 9
State: Published - Sep 2025

Keywords

  • AI for neurology
  • Data-driven EEG analysis
  • Deidentified clinical data
  • EEG data platform
  • EEG large-scale database
