TY - JOUR
T1 - Harvard Electroencephalography Database
T2 - A comprehensive clinical electroencephalographic resource from four Boston hospitals
AU - Sun, Chenxi
AU - Jing, Jin
AU - Turley, Niels
AU - Alcott, Callison
AU - Kang, Wan Yee
AU - Cole, Andrew J.
AU - Goldenholz, Daniel M.
AU - Lam, Alice
AU - Amorim, Edilberto
AU - Chu, Catherine
AU - Cash, Sydney
AU - Junior, Valdery Moura
AU - Gupta, Aditya
AU - Ghanta, Manohar
AU - Nearing, Bruce
AU - Nascimento, Fábio A.
AU - Struck, Aaron
AU - Kim, Jennifer
AU - Sartipi, Shadi
AU - Tauton, Alexandra Maria
AU - Fernandes, Marta
AU - Sun, Haoqi
AU - Bayas, Grace
AU - Gallagher, Kaileigh
AU - Wagenaar, Joost B.
AU - Sinha, Nishant
AU - Lee-Messer, Christopher
AU - Silvers, Christine Tsien
AU - Gunapati, Bharath
AU - Rosand, Jonathan
AU - Peters, Jurriaan
AU - Loddenkemper, Tobias
AU - Lee, Jong Woo
AU - Zafar, Sahar
AU - Westover, M. Brandon
N1 - Publisher Copyright: © 2025 The Author(s). Epilepsia published by Wiley Periodicals LLC on behalf of International League Against Epilepsy.
PY - 2025/9
Y1 - 2025/9
AB - Objective: This article presents the Harvard Electroencephalography Database (HEEDB), a large-scale, deidentified, and standardized electroencephalographic (EEG) resource supporting artificial intelligence-driven and reproducible research in epilepsy and broader clinical neuroscience. Methods: HEEDB aggregates more than 280 000 EEG recordings from more than 108 000 patients across four Harvard-affiliated hospitals. Data are harmonized using the Brain Imaging Data Structure and hosted on the Brain Data Science Platform. EEG data are linked with clinical notes, International Classification of Diseases, 10th Revision codes, medications, and EEG reports. Deidentification follows Health Insurance Portability and Accountability Act Safe Harbor standards. Results: The database includes routine, epilepsy monitoring unit, and intensive care unit EEGs across all age groups, with 73% linked to deidentified clinical reports and 96% of those matched to recordings. Findings are extracted using expert curation, regular expressions, and medical natural language processing models. Auxiliary data include diagnoses, medications, and hospital course, supporting multimodal analysis. Significance: HEEDB fills a critical gap in EEG data availability for epilepsy research. By enabling large-scale, privacy-compliant, and clinically relevant analysis, it accelerates the development of diagnostic tools, improves training datasets for machine learning, and promotes data-sharing in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) and National Institutes of Health data policies.
KW - AI for neurology
KW - Data-driven EEG analysis
KW - Deidentified clinical data
KW - EEG data platform
KW - EEG large-scale database
UR - https://www.scopus.com/pages/publications/105007543834
U2 - 10.1111/epi.18487
DO - 10.1111/epi.18487
M3 - Article
C2 - 40464151
AN - SCOPUS:105007543834
SN - 0013-9580
VL - 66
SP - 3411
EP - 3425
JO - Epilepsia
JF - Epilepsia
IS - 9
ER -