Preprocessing pipelines for applying DIVER (Deep learning for Interictal EEG Variability Evaluation and Recognition) to downstream EEG analysis tasks.
This repository contains preprocessing scripts and documentation for three downstream tasks, organized under two projects:
- CBraMod: Clinical brain monitoring (sleep staging, seizure detection)
- LEAD: Neurodegenerative disease diagnosis (AD/FTD classification)

DIVER:
- Repository: https://github.com/yourusername/DIVER
- Paper: [Link to DIVER paper]
- Description: Self-supervised learning framework for EEG representation learning
- CBraMod/ - Clinical monitoring tasks
  - ISRUC-Sleep - Sleep stage classification (5-class)
  - CHB-MIT - Seizure detection (binary)
```
DIVER1.0_bh/
├── CBraMod/
│   ├── ISRUC_Sleep/       # Sleep staging: 100 subjects, 6 channels
│   └── CHBMIT_Seizure/    # Seizure detection: 21 patients, 16 channels
└── LEAD/
    └── ADFTD/             # AD/FTD diagnosis: 88 subjects, 19 channels
```
Each task directory contains:
- `README.md` - Quick start guide and complete documentation
- `*_DATASET_INFO.md` - Comprehensive dataset documentation
- `scripts/` - Preprocessing and validation scripts
- `logs/` - Processing logs
All tasks follow a unified preprocessing pipeline designed for DIVER:
```
Raw EEG Data (EDF/REC format)
    ↓
[1. Load & Extract Channels]
├── ISRUC: 6 channels (F3, C3, O1, F4, C4, O2)
├── CHBMIT: 16 bipolar channels (Double Banana montage)
└── ADFTD: 19 channels (standard 10-20 system)
    ↓
[2. Preprocessing & Artifact Removal]
├── ISRUC: 0.3-35 Hz bandpass, 50 Hz notch, average reference
├── CHBMIT: Channel extraction from raw EDF
└── ADFTD: Clipping detection (amplitude, gradient, flatline)
    ↓
[3. Segmentation]
├── ISRUC: 30-second epochs (aligned with annotations)
├── CHBMIT: 10-second segments (with seizure oversampling)
└── ADFTD: 10-second segments (non-overlapping)
    ↓
[4. Label Assignment]
├── ISRUC: 5-class (W, N1, N2, N3, REM)
├── CHBMIT: Binary (non-seizure, seizure)
└── ADFTD: 3-class (CN, AD, FTD)
    ↓
[5. Resampling to 500 Hz] → DIVER standard sampling rate
    ↓
[6. Reshape into 1-second patches of 500 samples]
├── ISRUC: (6, 6000) @ 200 Hz → (6, 15000) @ 500 Hz → (6, 30, 500)
├── CHBMIT: (16, 2560) @ 256 Hz → (16, 5000) @ 500 Hz → (16, 10, 500)
└── ADFTD: (19, 2560) @ 256 Hz → (19, 5000) @ 500 Hz → (19, 10, 500)
    ↓
[7. Add Metadata] → Dataset info, electrode positions, subject info
    ↓
[8. Store in LMDB] → Efficient batch loading for training
```
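
The ISRUC branch of step 2 (0.3-35 Hz band-pass, 50 Hz notch, average reference) can be sketched with SciPy as below. This is a minimal illustration under stated assumptions, not the repository's preprocessing script: the 4th-order Butterworth design, zero-phase filtering, and notch quality factor (Q = 30) are assumed values, and `preprocess_isruc_epoch` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt


def preprocess_isruc_epoch(x: np.ndarray, fs: float = 200.0) -> np.ndarray:
    """Filter and re-reference one ISRUC epoch (hypothetical helper).

    x: (channels, samples) array, e.g. (6, 6000) for a 30-s epoch at 200 Hz.
    """
    # Zero-phase 0.3-35 Hz band-pass; 4th-order Butterworth in SOS form (order is an assumption)
    sos = butter(4, [0.3, 35.0], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x, axis=-1)

    # Zero-phase 50 Hz notch; Q=30 is an assumed quality factor
    b, a = iirnotch(50.0, 30.0, fs=fs)
    y = filtfilt(b, a, y, axis=-1)

    # Average reference: subtract the across-channel mean at every time point
    return y - y.mean(axis=0, keepdims=True)


epoch = np.random.default_rng(0).standard_normal((6, 6000))
clean = preprocess_isruc_epoch(epoch)
print(clean.shape)  # (6, 6000)
```

Because the 35 Hz upper edge of the band-pass already attenuates 50 Hz, the notch mainly adds extra suppression of strong line noise. The SOS form is used for the band-pass because a 0.3 Hz edge at 200 Hz is numerically delicate in transfer-function form.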
| Task | Channels | Segment | Original SR | Target SR | Output Shape |
|---|---|---|---|---|---|
| ISRUC-Sleep | 6 EEG | 30s epochs | 200 Hz | 500 Hz | (6, 30, 500) |
| CHB-MIT | 16 bipolar | 10s segments | 256 Hz | 500 Hz | (16, 10, 500) |
| ADFTD | 19 EEG | 10s segments | 256 Hz | 500 Hz | (19, 10, 500) |
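
Each row of the table reduces to a rational resampling factor followed by a reshape into 1-second patches: 200 → 500 Hz is an up/down ratio of 5/2, and 256 → 500 Hz is 125/64. The sketch below uses SciPy's polyphase `resample_poly`; whether the repository's scripts use this resampler or another one (for example MNE's) is an assumption, and `resample_factors` / `resample_and_reshape` are hypothetical names.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 500  # DIVER standard sampling rate (Hz)


def resample_factors(orig_sr: int, target_sr: int = TARGET_SR) -> tuple[int, int]:
    """Smallest integer (up, down) pair with up / down == target_sr / orig_sr."""
    g = gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g


def resample_and_reshape(x: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample (channels, samples) to target_sr, then fold into (channels, seconds, target_sr)."""
    up, down = resample_factors(orig_sr, target_sr)
    y = resample_poly(x, up, down, axis=-1)  # polyphase FIR resampling with anti-aliasing
    n_sec = y.shape[-1] // target_sr
    # Drop any trailing partial second, then split time into 1-second patches
    return y[:, : n_sec * target_sr].reshape(x.shape[0], n_sec, target_sr)


print(resample_factors(200), resample_factors(256))  # (5, 2) (125, 64)
print(resample_and_reshape(np.zeros((6, 6000)), 200).shape)  # (6, 30, 500)
```

Both segment lengths in the table map to whole numbers of target samples (6000 × 5/2 = 15000 and 2560 × 125/64 = 5000), so no samples are dropped by the final reshape for these datasets.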
All datasets now use the unified format:
```python
{
    "sample": np.array,            # Signal data (channels, time_segments, samples)
    "label": int,                  # Task-specific label
    "data_info": {                 # Unified metadata
        # Common fields (ISRUC-compatible)
        "Dataset": str,            # "ISRUC-Sleep", "CHBMIT-Seizure", "ADFTD"
        "modality": "EEG",
        "release": str,            # Dataset version
        "subject_id": str,
        "task": str,               # "sleep-staging", "seizure-detection", etc.
        "resampling_rate": 500,
        "original_sampling_rate": int,
        "segment_index": int,
        "start_time": float,
        "channel_names": list,
        # Task-specific fields
        ...
    }
}
```

CHBMIT and ADFTD were updated from v1 to v2 for consistency:
| Field | v1 (Legacy) | v2 (Current) | Status |
|---|---|---|---|
| Signal data | `signal` | `sample` | Standardized |
| Metadata | `metadata` | `data_info` | Standardized |
| Electrode info | `elc_info` (separate) | Merged into `data_info` | Unified |
| Common fields | Missing | Added (`Dataset`, `modality`, `task`) | Complete |
ISRUC has used the v2 format from the start, so it needs no migration.
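
A record in the unified v2 format can be assembled and serialized roughly as sketched below. The field names follow the schema above; storing records as pickled Python dicts, the float32 dtype, and the `<subject>_<segment>` key format are assumptions for illustration, and `make_record` / `record_key` are hypothetical helpers rather than the repository's API.

```python
import pickle

import numpy as np


def make_record(sample, label, *, dataset, task, release, subject_id,
                original_sampling_rate, segment_index, start_time,
                channel_names, **task_specific):
    """Build one v2 record mirroring the unified schema (hypothetical helper)."""
    return {
        "sample": np.asarray(sample, dtype=np.float32),  # float32 is an assumption
        "label": int(label),
        "data_info": {
            "Dataset": dataset,
            "modality": "EEG",
            "release": release,
            "subject_id": subject_id,
            "task": task,
            "resampling_rate": 500,
            "original_sampling_rate": original_sampling_rate,
            "segment_index": segment_index,
            "start_time": float(start_time),
            "channel_names": list(channel_names),
            **task_specific,  # any task-specific fields are merged in here
        },
    }


def record_key(subject_id: str, segment_index: int) -> bytes:
    """Hypothetical LMDB key format: '<subject>_<zero-padded segment index>'."""
    return f"{subject_id}_{segment_index:06d}".encode()


rec = make_record(
    np.zeros((6, 30, 500)), 0,
    dataset="ISRUC-Sleep", task="sleep-staging", release="unknown",  # placeholder release
    subject_id="s001", original_sampling_rate=200, segment_index=0, start_time=0.0,
    channel_names=["F3", "C3", "O1", "F4", "C4", "O2"],
)
value = pickle.dumps(rec)
# With the `lmdb` package, the record could then be written as:
#   env = lmdb.open("isruc.lmdb", map_size=2**34)
#   with env.begin(write=True) as txn:
#       txn.put(record_key("s001", 0), value)
```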
Migration: Use migrate_chbmit_v1_to_v2.py to convert existing v1 LMDB databases.
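
The v1 → v2 field mapping amounts to a per-record dictionary rewrite. The sketch below is not `migrate_chbmit_v1_to_v2.py` itself: the v1 layout (`signal`, `label`, `metadata`, and `elc_info` at the top level) is inferred from the field table, and storing electrode info under `data_info["elc_info"]` is an assumption, since the exact key it is merged into is not stated here. `migrate_record_v1_to_v2` is a hypothetical name.

```python
import copy


def migrate_record_v1_to_v2(v1: dict, dataset: str, task: str) -> dict:
    """Rewrite one v1 record into the v2 layout (hypothetical helper)."""
    # Start from a copy of the v1 metadata so the input record is not mutated
    data_info = copy.deepcopy(v1.get("metadata", {}))
    # Merge the separate electrode info into data_info (key name is an assumption)
    if "elc_info" in v1:
        data_info["elc_info"] = v1["elc_info"]
    # Add the common fields v1 records lacked, keeping any values already present
    data_info.setdefault("Dataset", dataset)
    data_info.setdefault("modality", "EEG")
    data_info.setdefault("task", task)
    # "signal" is renamed to "sample"; "label" is carried over unchanged
    return {"sample": v1["signal"], "label": v1["label"], "data_info": data_info}


v1 = {"signal": [[0.0, 1.0]], "label": 1,
      "metadata": {"subject_id": "chb01"}, "elc_info": {"FP1-F7": None}}
v2 = migrate_record_v1_to_v2(v1, dataset="CHBMIT-Seizure", task="seizure-detection")
print(sorted(v2))  # ['data_info', 'label', 'sample']
```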
ISRUC-Sleep:
- Subjects: 100 adult subjects with sleep disorders (ISRUC subgroup I)
- Epochs: ~89,283 (30-second epochs)
- Split: 80/10/10 (subjects 1-80 train, 81-90 val, 91-100 test)
- Storage: ~2.4 GB LMDB
- Distribution: N2 dominant (~40-50%), N1 rare (~5-10%)
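
Because the ISRUC split is defined by subject number, every epoch from a given subject falls into exactly one partition, so no subject appears in both training and evaluation. A minimal sketch of that mapping (`isruc_split` is a hypothetical name):

```python
from collections import Counter


def isruc_split(subject_id: int) -> str:
    """Map an ISRUC subject number to its split: 1-80 train, 81-90 val, 91-100 test."""
    if 1 <= subject_id <= 80:
        return "train"
    if 81 <= subject_id <= 90:
        return "val"
    if 91 <= subject_id <= 100:
        return "test"
    raise ValueError(f"ISRUC subject id out of range: {subject_id}")


print(Counter(isruc_split(s) for s in range(1, 101)))
# Counter({'train': 80, 'val': 10, 'test': 10})
```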
CHB-MIT:
- Patients: 21 pediatric epilepsy patients
- Segments: 327,834 (10-second segments with oversampling)
- Split: chb01-20 train, chb21-22 val, chb23-24 test
- Storage: ~99 GB LMDB
- Distribution: Severe imbalance (30:1 non-seizure:seizure after oversampling)
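
The exact CHB-MIT oversampling scheme is not spelled out here. One common approach, sketched below under stated assumptions, is to cover each recording with non-overlapping 10-second windows and then add overlapping windows inside each annotated seizure interval. The 1-second seizure stride, the rule that a window is labeled seizure if it overlaps any seizure interval, and the name `chbmit_windows` are all hypothetical.

```python
def chbmit_windows(duration_s: int, seizures: list[tuple[int, int]],
                   win_s: int = 10, seizure_stride_s: int = 1) -> list[tuple[int, int]]:
    """Return sorted (start_s, label) pairs for one recording; label 1 = seizure.

    seizures: (onset_s, offset_s) intervals from the annotation file.
    """
    def overlaps_seizure(start: int) -> bool:
        end = start + win_s
        return any(start < off and on < end for on, off in seizures)

    # Baseline: non-overlapping windows over the whole recording
    starts = set(range(0, duration_s - win_s + 1, win_s))
    # Oversampling: extra overlapping windows that fit entirely inside each seizure
    for on, off in seizures:
        last = min(off, duration_s) - win_s
        if last >= on:
            starts.update(range(on, last + 1, seizure_stride_s))

    return sorted((s, int(overlaps_seizure(s))) for s in starts)


wins = chbmit_windows(100, [(40, 60)])
print(len(wins), sum(label for _, label in wins))  # 19 11
```

With a 20-second seizure, the baseline contributes 2 seizure windows and oversampling adds 9 more, which is how a shorter stride inside seizures raises the minority-class count without duplicating identical samples.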
ADFTD:
- Subjects: 88 subjects (30 CN, 35 AD, 23 FTD)
- Segments: ~5,700 (10-second segments after artifact removal)
- Split: 70/15/15 (stratified by diagnosis)
- Storage: ~600 MB LMDB (all splits)
- Distribution: Slight imbalance by subject count (CN 34%, AD 40%, FTD 26%)
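
ADFTD's artifact step (clipping detection on amplitude, gradient, and flatline) decides which 10-second segments are kept. A minimal segment-level check is sketched below; all thresholds (±500 µV amplitude, 100 µV sample-to-sample jump, 256 consecutive flat samples) and the assumption that signals are in µV are illustrative, not values taken from the LEAD pipeline, and `is_clipped` is a hypothetical name.

```python
import numpy as np


def is_clipped(seg: np.ndarray, amp_uv: float = 500.0, grad_uv: float = 100.0,
               flat_len: int = 256, flat_eps: float = 1e-6) -> bool:
    """Flag a (channels, samples) segment in µV as clipped (hypothetical thresholds)."""
    # 1) Amplitude: any sample beyond +/- amp_uv
    if np.any(np.abs(seg) > amp_uv):
        return True
    diffs = np.abs(np.diff(seg, axis=-1))
    # 2) Gradient: any sample-to-sample jump larger than grad_uv
    if np.any(diffs > grad_uv):
        return True
    # 3) Flatline: flat_len consecutive near-zero differences on any channel
    flat = (diffs < flat_eps).astype(int)
    if flat.shape[-1] >= flat_len:
        kernel = np.ones(flat_len, dtype=int)
        for ch in flat:
            # A moving sum equal to flat_len means an unbroken flat run of that length
            if np.convolve(ch, kernel, mode="valid").max() == flat_len:
                return True
    return False


rng = np.random.default_rng(0)
segments = [rng.normal(0.0, 10.0, size=(19, 2560)) for _ in range(3)]
segments[1][0, 100] = 1000.0     # amplitude spike
segments[2][3, 500:1000] = 0.0   # flat run on one channel
kept = [s for s in segments if not is_clipped(s)]
print(len(kept))  # 1
```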
- ISRUC-Sleep: Public (https://sleeptight.isr.uc.pt/)
- CHB-MIT: Public (https://physionet.org/content/chbmit/)
- ADFTD: Not public (contact LEAD team)
Note: This repository contains preprocessing scripts only. Raw data and LMDB files are not included.
```bash
# Clone repository
git clone https://github.com/bohee-connectome/DIVER1.0_bh.git
cd DIVER1.0_bh

# Navigate to the desired task
cd CBraMod/ISRUC_Sleep  # or CBraMod/CHBMIT_Seizure, or LEAD/ADFTD

# Install dependencies
pip install numpy scipy mne lmdb

# Run the task's preprocessing script (see the task README for the exact name and arguments)
cd scripts
python preprocessing_*.py
```

If you use this code, please cite:
```bibtex
@article{diver2024,
  title={DIVER: Deep learning for Interictal EEG Variability Evaluation and Recognition},
  author={Your Name},
  journal={Journal Name},
  year={2024}
}

@inproceedings{cbramod2025,
  title={CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding},
  author={Wang, J. and Zhao, S. and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

@article{khalighi2016isruc,
  title={ISRUC-Sleep: A comprehensive public dataset for sleep researchers},
  author={Khalighi, S. and Sousa, T. and Santos, J. M. and Nunes, U.},
  journal={Computer Methods and Programs in Biomedicine},
  volume={124},
  pages={180--192},
  year={2016},
  publisher={Elsevier}
}

@inproceedings{shoeb2009chbmit,
  title={Application of Machine Learning to Epileptic Seizure Detection},
  author={Shoeb, A. and Guttag, J.},
  booktitle={Proceedings of the 27th International Conference on Machine Learning},
  year={2010}
}

@article{lead2024,
  title={LEAD: Learning EEG Analysis for neurodegenerative Diseases},
  author={Park, J. E. and Lee, B. and others},
  journal={Journal of Neural Engineering},
  year={2024}
}
```

- Data Format: v2 (ISRUC-compatible, unified format)
- Last Updated: 2025-11-27
- Migration Available: v1 → v2 (for CHBMIT and ADFTD)
MIT License - See original project repositories for dataset-specific licenses.
For questions or issues, please open an issue on GitHub.