Deep-learning models displayed expert-level accuracy in detecting epileptiform discharges on electroencephalogram (EEG) recordings

1. Li and colleagues developed a deep-learning model to analyze EEG recordings and detect event-level EEG spikes.

2. The model achieved high accuracy and a low false-positive rate, with only 44% and 32% of human experts outperforming the model on event-level and EEG-level classification, respectively.

Evidence Rating Level: 2 (Good)

Study Rundown: Epileptiform discharges appear as spikes on EEGs, and physicians with specialized training can visually interpret these findings to detect epilepsy. However, misinterpretation is common, and there is a shortage of physicians trained in EEG interpretation in many parts of the world. Li and colleagues developed SpikeNet2, a deep-learning spike detection model, and assessed its performance using real-life patient EEGs. Performance metrics were the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the area under a modified receiver operating characteristic curve (mAUC), a metric designed to measure sensitivity versus the number of false positives per hour. For event-level spike classification, SpikeNet2 achieved an AUROC of 0.973, an AUPRC of 0.995, and a mAUC of 0.997. It also outperformed most human experts, with only 44% of experts’ operating points located above SpikeNet2’s ROC curve. SpikeNet2 achieved similar performance on EEG-level classification. This study demonstrated that SpikeNet2 can accurately detect epileptiform discharges at both the event and EEG levels.

Click here to read the study in NEJM AI

Relevant Reading: Artificial intelligence in electroencephalography analysis for epilepsy diagnosis and management

In-Depth [retrospective cohort]: SpikeNet2 was validated using three independent datasets that were not included in model development. The datasets consisted of 17,812 EEGs from 13,523 patients located in the United States, Europe, and Africa. The EEG recordings included both event-level and EEG-level labels provided by clinical experts. For event-level performance, the metrics were AUROC, AUPRC, and mAUC. Only AUROC and AUPRC were used to assess EEG-level classification. Additionally, SpikeNet2’s performance was compared with that of human experts using ROC curves. For event-level spike classification, SpikeNet2 achieved an AUROC of 0.973 (95% confidence interval [CI], 0.961-0.982), an AUPRC of 0.995 (95% CI, 0.993-0.997), and a mAUC of 0.997 (95% CI, 0.994-0.998). For EEG-level classification, SpikeNet2 achieved an AUROC of 0.958 (95% CI, 0.946-0.968) and an AUPRC of 0.959 (95% CI, 0.947-0.970). For both event-level and EEG-level classifications, SpikeNet2 outperformed most human experts, with only 44% (95% CI, 25-75) and 32% of human experts’ operating points located above SpikeNet2’s ROC curve for event-level and EEG-level classification, respectively. The validity of this model was strengthened by its large-scale, multi-institutional training dataset, which represented diverse patient demographics. However, the model was limited by its inability to analyze background EEG rhythms and its lack of clinical context. Overall, this study provided promising evidence that deep-learning models can be used for the detection of epilepsy in settings where experienced neurologists are scarce.
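
For readers unfamiliar with how a model's ROC curve is compared with individual experts, the following minimal Python sketch may help. It is not the study's code: the labels, scores, and expert operating points are invented toy values, and the function names (`roc_points`, `auroc`, `expert_above_curve`) are hypothetical. It assumes an expert counts as "above the curve" when their sensitivity exceeds the model's sensitivity at the same false-positive rate.

```python
def roc_points(labels, scores):
    """Return ROC points (false-positive rate, sensitivity), sweeping the
    decision threshold from the highest score to the lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def model_tpr_at(points, fpr):
    """Model sensitivity at a given false-positive rate (linear interpolation;
    on a vertical segment the higher sensitivity is used)."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = max(y0, y1)
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def expert_above_curve(points, expert_fpr, expert_tpr):
    """True if the expert's operating point lies above the model's ROC curve."""
    return expert_tpr > model_tpr_at(points, expert_fpr)


# Toy data (hypothetical, not from the study): 1 = spike, 0 = no spike.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
# Hypothetical expert operating points as (false-positive rate, sensitivity).
experts = [(0.25, 0.9), (0.5, 0.6), (0.1, 0.4)]

curve = roc_points(labels, scores)
above = sum(expert_above_curve(curve, f, t) for f, t in experts)
print("AUROC:", auroc(curve))
print("Share of experts above the curve:", above / len(experts))
```

A lower share of experts above the curve means the model matches or exceeds more experts at their own operating points, which is how figures such as 44% and 32% are read in this report.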

©2026 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc.