TY - JOUR
T1 - Automated Interpretation of Clinical Electroencephalograms Using Artificial Intelligence
AU - Tveit, Jesper
AU - Aurlien, Harald
AU - Plis, Sergey
AU - Calhoun, Vince D.
AU - Tatum, William O.
AU - Schomer, Donald L.
AU - Arntsen, Vibeke
AU - Cox, Fieke
AU - Fahoum, Firas
AU - Gallentine, William B.
AU - Gardella, Elena
AU - Hahn, Cecil D.
AU - Husain, Aatif M.
AU - Kessler, Sudha
AU - Kural, Mustafa Aykut
AU - Nascimento, Fábio A.
AU - Tankisi, Hatice
AU - Ulvin, Line B.
AU - Wennberg, Richard
AU - Beniczky, Sándor
N1 - Publisher Copyright:
© 2023 American Medical Association. All rights reserved.
PY - 2023/8/14
Y1 - 2023/8/14
N2 - Importance: Electroencephalograms (EEGs) are a fundamental evaluation in neurology but require special expertise unavailable in many regions of the world. Artificial intelligence (AI) has the potential to address these unmet needs. Previous AI models address only limited aspects of EEG interpretation, such as distinguishing abnormal from normal or identifying epileptiform activity. A comprehensive, fully automated interpretation of routine EEG based on AI suitable for clinical practice is needed. Objective: To develop and validate an AI model (Standardized Computer-based Organized Reporting of EEG-Artificial Intelligence [SCORE-AI]) with the ability to distinguish abnormal from normal EEG recordings and to classify abnormal EEG recordings into categories relevant for clinical decision-making: epileptiform-focal, epileptiform-generalized, nonepileptiform-focal, and nonepileptiform-diffuse. Design, Setting, and Participants: In this multicenter diagnostic accuracy study, a convolutional neural network model, SCORE-AI, was developed and validated using EEGs recorded between 2014 and 2020. Data were analyzed from January 17, 2022, until November 14, 2022. A total of 30 493 recordings of patients referred for EEG were included in the development data set, annotated by 17 experts. Patients older than 3 months who were not critically ill were eligible. SCORE-AI was validated using 3 independent test data sets: a multicenter data set of 100 representative EEGs evaluated by 11 experts, a single-center data set of 9785 EEGs evaluated by 14 experts, and, for benchmarking with previously published AI models, a data set of 60 EEGs with an external reference standard. No patients who met eligibility criteria were excluded. Main Outcomes and Measures: Diagnostic accuracy, sensitivity, and specificity compared with the experts and with the external reference standard of patients' habitual clinical episodes obtained during video-EEG recording. Results: The characteristics of the EEG data sets were as follows: development data set (N = 30 493; 14 980 men; median age, 25.3 years [95% CI, 1.3-76.2 years]), multicenter test data set (N = 100; 61 men; median age, 25.8 years [95% CI, 4.1-85.5 years]), single-center test data set (N = 9785; 5168 men; median age, 35.4 years [95% CI, 0.6-87.4 years]), and test data set with an external reference standard (N = 60; 27 men; median age, 36 years [95% CI, 3-75 years]). SCORE-AI achieved high accuracy, with an area under the receiver operating characteristic curve between 0.89 and 0.96 for the different categories of EEG abnormalities, and performance similar to that of human experts. Benchmarking against 3 previously published AI models was limited to comparing detection of epileptiform abnormalities. The accuracy of SCORE-AI (88.3%; 95% CI, 79.2%-94.9%) was significantly higher than that of the 3 previously published models (P < .001) and similar to that of human experts. Conclusions and Relevance: In this study, SCORE-AI achieved human expert-level performance in fully automated interpretation of routine EEGs. Application of SCORE-AI may improve diagnosis and patient care in underserved areas and improve efficiency and consistency in specialized epilepsy centers.
AB - Importance: Electroencephalograms (EEGs) are a fundamental evaluation in neurology but require special expertise unavailable in many regions of the world. Artificial intelligence (AI) has the potential to address these unmet needs. Previous AI models address only limited aspects of EEG interpretation, such as distinguishing abnormal from normal or identifying epileptiform activity. A comprehensive, fully automated interpretation of routine EEG based on AI suitable for clinical practice is needed. Objective: To develop and validate an AI model (Standardized Computer-based Organized Reporting of EEG-Artificial Intelligence [SCORE-AI]) with the ability to distinguish abnormal from normal EEG recordings and to classify abnormal EEG recordings into categories relevant for clinical decision-making: epileptiform-focal, epileptiform-generalized, nonepileptiform-focal, and nonepileptiform-diffuse. Design, Setting, and Participants: In this multicenter diagnostic accuracy study, a convolutional neural network model, SCORE-AI, was developed and validated using EEGs recorded between 2014 and 2020. Data were analyzed from January 17, 2022, until November 14, 2022. A total of 30 493 recordings of patients referred for EEG were included in the development data set, annotated by 17 experts. Patients older than 3 months who were not critically ill were eligible. SCORE-AI was validated using 3 independent test data sets: a multicenter data set of 100 representative EEGs evaluated by 11 experts, a single-center data set of 9785 EEGs evaluated by 14 experts, and, for benchmarking with previously published AI models, a data set of 60 EEGs with an external reference standard. No patients who met eligibility criteria were excluded. Main Outcomes and Measures: Diagnostic accuracy, sensitivity, and specificity compared with the experts and with the external reference standard of patients' habitual clinical episodes obtained during video-EEG recording. Results: The characteristics of the EEG data sets were as follows: development data set (N = 30 493; 14 980 men; median age, 25.3 years [95% CI, 1.3-76.2 years]), multicenter test data set (N = 100; 61 men; median age, 25.8 years [95% CI, 4.1-85.5 years]), single-center test data set (N = 9785; 5168 men; median age, 35.4 years [95% CI, 0.6-87.4 years]), and test data set with an external reference standard (N = 60; 27 men; median age, 36 years [95% CI, 3-75 years]). SCORE-AI achieved high accuracy, with an area under the receiver operating characteristic curve between 0.89 and 0.96 for the different categories of EEG abnormalities, and performance similar to that of human experts. Benchmarking against 3 previously published AI models was limited to comparing detection of epileptiform abnormalities. The accuracy of SCORE-AI (88.3%; 95% CI, 79.2%-94.9%) was significantly higher than that of the 3 previously published models (P < .001) and similar to that of human experts. Conclusions and Relevance: In this study, SCORE-AI achieved human expert-level performance in fully automated interpretation of routine EEGs. Application of SCORE-AI may improve diagnosis and patient care in underserved areas and improve efficiency and consistency in specialized epilepsy centers.
UR - http://www.scopus.com/inward/record.url?scp=85166638432&partnerID=8YFLogxK
U2 - 10.1001/jamaneurol.2023.1645
DO - 10.1001/jamaneurol.2023.1645
M3 - Article
C2 - 37338864
AN - SCOPUS:85166638432
SN - 2168-6149
VL - 80
SP - 805
EP - 812
JO - JAMA Neurology
JF - JAMA Neurology
IS - 8
ER -