
Automated Clustering of Continuous EEG Data Accelerates Physician Review

Key findings

  • Researchers at Massachusetts General Hospital have validated an automated method to aggregate continuous EEG (cEEG) data into a small number of clusters
  • Neurophysiologists need to label only one data point in each cluster
  • The average length of the cEEG recordings in this study was 30 hours and the average number of clusters to be annotated was 27
  • The median time required to annotate each recording was about four minutes, about 45 times faster than the two to four hours that might be expected for unaided manual review

Continuous electroencephalography (cEEG) can detect ischemia, hemorrhage and elevated intracranial pressure as well as seizures, so it has become standard care for high-risk patients in the ICU. However, even experienced neurophysiologists find it burdensome to annotate the volume of cEEG data recorded during a typical ICU stay, that is, to divide it into classes such as seizures, periodic discharges and sleep.

Jin Jing, PhD, research fellow in the Department of Neurology at Massachusetts General Hospital, M. Brandon Westover, MD, PhD, neurologist and director of the Clinical Data Animation Center, and colleagues have created an automated method of clustering cEEG data that allows physicians to review it about 45 times faster than unaided manual review. They describe their process in the Journal of Neuroscience Methods.

Feature Extraction

The researchers selected archival cEEG data from 97 ICU patients, divided the recordings into two-second segments and computed 37 spectral and temporal features in each of four regions of the brain.

To include contextual information from the surrounding EEG, they repeated the process for segments of 6, 10 and 14 seconds, centered on each two-second segment. This resulted in 592 features that collectively described each two-second segment of cEEG.
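The feature count follows from simple multiplication. The short Python sketch below reproduces that arithmetic, assuming (as the text implies but does not state outright) that the same 37 features are computed for each of the four regions at each of the four window lengths. The function and variable names are illustrative only and are not taken from the study.

```python
# Feature-count arithmetic for one two-second segment.
# Assumption: the same 37 features are computed for every brain region
# at every window length (2 s plus the 6, 10 and 14 s context windows).
FEATURES_PER_REGION = 37
BRAIN_REGIONS = 4
WINDOW_LENGTHS_S = [2, 6, 10, 14]


def total_features(per_region, regions, windows):
    """Number of features that describe one two-second segment."""
    return per_region * regions * len(windows)


def context_window(segment_start_s, window_s, segment_s=2):
    """Start and end (in seconds) of a window centered on a segment."""
    center = segment_start_s + segment_s / 2
    return (center - window_s / 2, center + window_s / 2)


n_features = total_features(FEATURES_PER_REGION, BRAIN_REGIONS, WINDOW_LENGTHS_S)
print(n_features)             # 592
print(context_window(20, 6))  # (18.0, 24.0)
```

For a segment that starts at 20 seconds, the six-second context window in this sketch runs from 18 to 24 seconds, so the two-second segment sits in the middle of it.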


For each patient, the researchers then applied:

  • Changepoint detection, a general method to find abrupt changes in sequential data—it was used here to divide cEEG data into discrete segments that could be annotated
  • A "bag-of-words" model, commonly applied in document classification to record the frequency of each word—the segments between the changepoints were considered "sentences" and the consecutive two-second segments were "words"
  • Affinity propagation, a clustering method, which was used here to cluster the "words" and "sentences" together and assign each two-second segment to one of several data clusters
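The first two steps in the list above can be illustrated with a toy Python sketch. It is not the researchers' implementation: the jump-threshold changepoint rule is a deliberately simple stand-in for a real changepoint detector, the discrete "word" IDs are assumed to exist already (the article does not say how segments were mapped to words), and all names and data are hypothetical. The resulting per-sentence histograms are the kind of input a clustering step such as affinity propagation could take; that step is not shown.

```python
from collections import Counter


def toy_changepoints(signal, threshold):
    """Indices where the value jumps by more than `threshold`.

    A simple stand-in for a real changepoint detection method.
    """
    return [i for i in range(1, len(signal))
            if abs(signal[i] - signal[i - 1]) > threshold]


def split_into_sentences(words, changepoints):
    """Cut the word sequence at each changepoint index."""
    bounds = [0] + list(changepoints) + [len(words)]
    return [words[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def bag_of_words(sentence, vocabulary):
    """Relative frequency of each vocabulary word within one sentence."""
    counts = Counter(sentence)
    return [counts[w] / len(sentence) for w in vocabulary]


# Hypothetical data: one summary value and one discrete "word" ID
# per two-second segment.
signal = [1.0, 1.1, 0.9, 5.0, 5.2, 5.1, 5.0, 1.0, 1.2]
words = ["a", "a", "b", "c", "c", "c", "b", "a", "a"]

changepoints = toy_changepoints(signal, threshold=2.0)
sentences = split_into_sentences(words, changepoints)
histograms = [bag_of_words(s, ["a", "b", "c"]) for s in sentences]

print(changepoints)   # [3, 7]
print(histograms[1])  # [0.0, 0.25, 0.75]
```

Each histogram ignores the order of the words inside a sentence and keeps only their frequencies, which is the defining property of a bag-of-words representation.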


Three fellowship-trained epileptologists independently performed manual annotation of the cEEG data, but they labeled only the medoid in each data cluster—the one data point considered most representative of the others.
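To make the idea of a medoid concrete, the sketch below picks the member of a cluster with the smallest total distance to all other members. This minimum-total-distance definition and the use of Euclidean distance are common choices but are assumptions here; the article does not specify how the study's clustering selected its representative points, and the sample points are made up.

```python
import math


def medoid_index(points):
    """Index of the point with the smallest total distance to all others."""
    def total_distance(i):
        return sum(math.dist(points[i], q) for q in points)
    return min(range(len(points)), key=total_distance)


# Hypothetical two-dimensional cluster: four corner points and one
# point near the middle.
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.4)]

print(medoid_index(cluster))  # 4
```

Unlike a centroid, a medoid is always an actual member of the cluster, so an expert can be shown a real EEG segment rather than an average.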

The researchers then trained a computer algorithm to learn from the experts' labeled medoids as well as pseudo-labeled data. The mean pairwise agreement between the final model and the experts was 62.6%, close to the 63.8% agreement among the experts themselves, which supports the validity of the clustering method.
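One way to compute a mean pairwise agreement is sketched below in Python. It uses simple percent agreement (the share of segments given the same label), which is an assumption: the article does not state which agreement statistic the study used. The labels and rater names are hypothetical and do not reproduce the reported figures.

```python
from itertools import combinations


def percent_agreement(labels_a, labels_b):
    """Percentage of positions where two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def mean_pairwise_agreement(raters):
    """Average percent agreement over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


def mean_agreement_with(model, raters):
    """Average percent agreement between a model and each rater."""
    return sum(percent_agreement(model, r) for r in raters) / len(raters)


# Hypothetical labels for four segments ("pd" = periodic discharges).
expert_1 = ["seizure", "sleep", "pd", "sleep"]
expert_2 = ["seizure", "sleep", "sleep", "sleep"]
expert_3 = ["pd", "sleep", "pd", "sleep"]
model = ["seizure", "sleep", "pd", "pd"]

experts = [expert_1, expert_2, expert_3]
among_experts = mean_pairwise_agreement(experts)
model_vs_experts = mean_agreement_with(model, experts)

print(round(among_experts, 1))     # 66.7
print(round(model_vs_experts, 1))  # 58.3
```

Comparing the two numbers mirrors the comparison in the text: a model whose agreement with the experts approaches the experts' agreement with each other is performing near the ceiling set by human variability.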

At Mass General, manual annotation takes about three hours per 24 hours of EEG. The average length of the cEEG recordings in this study was 30 hours and the average number of clusters to be annotated was 27. Because the three experts manually annotated only the cluster medoids, the median time required per recording was about four minutes, approximately 45 times faster than usual.
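The speedup figure can be checked with a few lines of Python. The sketch assumes the comparison is between roughly two to four hours of unaided manual review and about four minutes of cluster-based review, as the article states; the function name is illustrative.

```python
def speedup(manual_hours, assisted_minutes):
    """How many times faster the assisted review is than manual review."""
    return manual_hours * 60 / assisted_minutes


ASSISTED_MINUTES = 4  # median time per recording reported in the article

# Two, three and four hours of manual review, per the article's range.
speedups = {hours: speedup(hours, ASSISTED_MINUTES) for hours in (2, 3, 4)}

for hours, factor in speedups.items():
    print(hours, factor)
# 2 30.0
# 3 45.0
# 4 60.0
```

Under these assumptions, the "about 45 times" figure corresponds to the three-hour case, with the two- and four-hour cases giving a range of 30 to 60 times.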

Toward Better Decisions

This clustering method can be expected to significantly improve workflows for clinical neurophysiologists. It should also improve accuracy by eliminating tiresomely long labeling sessions, and it should enable physicians to evaluate exemplary EEG segments in more detail. Thus, automated aggregation of cEEG data will support better clinical decision-making for critically ill patients.


