Developed for high-throughput electrophysiology data, eeg-feat-ext transforms large-scale human brain recordings into clean, structured features ready for downstream analysis, predictive modeling, and real-time monitoring pipelines.
eeg-feat-ext was purpose-built for high-throughput analysis of 21.8 GiB of raw iEEG data across 36 human subjects, culminating in a Nature (2024) publication:
“Control of working memory by phase–amplitude coupling of human hippocampal neurons”
Read Article
DANDI Dataset
Task: Subjects memorized short lists of 1 item (low load) or 3 items (high load), then judged whether a probe item had appeared in the original list.
Hypothesis: Are differences in oscillatory synchrony (PAC) between memory loads driven by subtle waveform shape artifacts, or do they reflect true cognitive state dynamics?
“To determine the influence of waveform shape on phase–amplitude coupling (PAC)... we used the bycycle (eeg-feat-ext) toolbox... then tested peak-to-trough and rise-to-decay asymmetries across task conditions.”
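The two asymmetry measures named in the quote can be illustrated with a short sketch. It assumes a cycle has already been segmented into sample indices (trough, rising zero-crossing, peak, decaying zero-crossing, next trough, next rising zero-crossing) and follows our reading of the Bycycle definitions (Cole & Voytek, 2019): rise-decay symmetry is the fraction of the trough-to-trough period spent rising, and peak-trough symmetry is the fraction of the cycle spent above zero. A value of 0.5 means symmetric. The function names are hypothetical and are not the eeg-feat-ext or Bycycle API.

```python
def rise_decay_symmetry(trough_prev: int, peak: int, trough_next: int) -> float:
    """Fraction of a trough-to-trough cycle spent rising (0.5 = symmetric).

    Hypothetical helper; all arguments are sample indices in the signal.
    """
    period = trough_next - trough_prev
    if period <= 0:
        raise ValueError("trough_next must come after trough_prev")
    return (peak - trough_prev) / period


def peak_trough_symmetry(zerox_rise: int, zerox_decay: int, zerox_rise_next: int) -> float:
    """Fraction of the cycle spent in the peak half (0.5 = symmetric).

    Peak duration: rising to decaying zero-crossing around the peak.
    Trough duration: that decaying crossing to the next rising crossing.
    Hypothetical helper, not the Bycycle API.
    """
    time_peak = zerox_decay - zerox_rise
    time_trough = zerox_rise_next - zerox_decay
    return time_peak / (time_peak + time_trough)


# A toy cycle that rises quickly (30 samples) and decays slowly (70 samples).
print(rise_decay_symmetry(trough_prev=0, peak=30, trough_next=100))  # 0.3
print(peak_trough_symmetry(zerox_rise=15, zerox_decay=65, zerox_rise_next=115))  # 0.5
```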
- Extracts condition-specific neural features from noisy iEEG signals
- Controls for waveform artifacts to ensure signal fidelity
- Outputs large-scale, structured CSV files ready for ML models and dashboards
- Enables detection of latent cognitive states in brain data
- Statistically separates memory load conditions via waveform metrics
- Optimized for production-grade throughput and reproducibility
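This README does not specify which statistical test separates the load conditions. As one hedged illustration, a two-sided permutation test on the difference in a per-trial waveform metric (for example rise-decay symmetry) between low- and high-load trials could look like the sketch below. The function name, the toy values, and the choice of a permutation test are assumptions, not the analysis used in the publication.

```python
import random
from statistics import mean


def permutation_test(low, high, n_perm=5000, seed=0):
    """Two-sided permutation test on mean(high) - mean(low).

    Hypothetical helper. Returns (observed difference, p-value); the
    (hits + 1) / (n_perm + 1) estimator keeps the p-value above zero.
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = mean(high) - mean(low)
    pooled = list(low) + list(high)
    n_low = len(low)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign condition labels
        diff = mean(pooled[n_low:]) - mean(pooled[:n_low])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy per-trial metric values, made up purely for illustration.
low_load = [0.49, 0.51, 0.50, 0.48, 0.52, 0.50]
high_load = [0.50, 0.49, 0.51, 0.50, 0.52, 0.48]
diff, p = permutation_test(low_load, high_load)
print(f"difference = {diff:.3f}, p = {p:.3f}")
```

A permutation test is used here only because it makes no normality assumption about per-cycle metrics; a mixed-effects model across subjects would be a common alternative.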
“We did not find evidence for any of those factors.” (referring to waveform asymmetries as confounds)
“These findings suggest that PAC is related to ongoing WM processes during the maintenance period in the hippocampus.”
Human Cognitive Biomarker Identified:
Coordinated brain oscillations in the hippocampus track working memory load.
- Ruled out spurious signal artifacts using waveform-shape controls
- Confirmed oscillatory synchrony (PAC) as a reliable index of memory maintenance
- Findings visualized in Extended Figure 3
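This README does not state how PAC itself was quantified. One widely used measure is the modulation index of Tort et al. (2010): bin the low-frequency phase, average the high-frequency amplitude in each bin, and measure how far that distribution departs from uniform using normalized entropy. The sketch below assumes phases (radians, in [-π, π)) and matching amplitudes have already been extracted from band-passed signals; it illustrates the measure and is not the analysis code of eeg-feat-ext or the publication.

```python
import math


def modulation_index(phases, amplitudes, n_bins=18):
    """Tort-style modulation index in [0, 1] (0 = no phase-amplitude coupling).

    Illustrative sketch: `phases` are low-frequency phases in radians,
    `amplitudes` the matching high-frequency envelope values.
    """
    if len(phases) != len(amplitudes) or not phases:
        raise ValueError("phases and amplitudes must be non-empty and equal length")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ph, amp in zip(phases, amplitudes):
        # Map a phase in [-pi, pi) onto one of n_bins equal-width bins.
        idx = int((ph + math.pi) / (2 * math.pi) * n_bins) % n_bins
        sums[idx] += amp
        counts[idx] += 1
    mean_amp = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(mean_amp)
    if total == 0:
        raise ValueError("amplitudes must not all be zero")
    p = [m / total for m in mean_amp]  # amplitude distribution over phase bins
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)


# Evenly spaced phases: flat amplitude vs. amplitude confined to one phase bin.
phases = [-math.pi + 2 * math.pi * (k + 0.5) / 360 for k in range(360)]
flat = modulation_index(phases, [1.0] * 360)  # ~0.0
locked = modulation_index(phases, [1.0 if k < 20 else 0.0 for k in range(360)])  # 1.0
```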
Neuroscience Application | Industry Parallel |
---|---|
Memory classification via waveform shape | Behavioral segmentation, attention modeling |
Phase-amplitude coupling (PAC) as latent signal | User state dynamics, anomaly detection |
High-density iEEG signal processing | IoT / biosensor analytics |
Multi-subject LFP preprocessing on CPU | Big data ETL in low-resource environments |
Feature | Description |
---|---|
Waveform decomposition | Cycle-by-cycle segmentation into peaks, troughs, and rise/decay zero-crossings (Bycycle) |
Feature generation | Amplitude, symmetry, sharpness, slope, duration per cycle |
Interoperability | MATLAB → Python via shared metadata files (metaDataExt.mat) |
Big-data scalability | Batch processing of large multi-subject datasets on CPU (validated on 21.8 GiB) |
Output format | Structured CSVs per brain region, with optional merged tables |
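As described by Cole & Voytek (2019), Bycycle delimits cycles using zero-crossings of a narrowband-filtered copy of the signal and then locates peaks and troughs as extrema between consecutive crossings. The simplified sketch below does both steps on a single, already-filtered signal and omits filtering and burst detection; the function names are hypothetical and this is not Bycycle's implementation.

```python
import math
from bisect import bisect_right


def zero_crossings(sig):
    """Return sample indices of rising and decaying zero-crossings."""
    rises, decays = [], []
    for i in range(1, len(sig)):
        if sig[i - 1] < 0 <= sig[i]:
            rises.append(i)
        elif sig[i - 1] >= 0 > sig[i]:
            decays.append(i)
    return rises, decays


def find_extrema(sig):
    """Hypothetical helper: one peak per rise->decay span and one trough
    per decay->rise span (simplified; single signal, no filtering)."""
    rises, decays = zero_crossings(sig)
    peaks, troughs = [], []
    for r in rises:
        j = bisect_right(decays, r)  # first decaying crossing after r
        if j == len(decays):
            break
        peaks.append(max(range(r, decays[j]), key=sig.__getitem__))
    for d in decays:
        j = bisect_right(rises, d)  # first rising crossing after d
        if j == len(rises):
            break
        troughs.append(min(range(d, rises[j]), key=sig.__getitem__))
    return peaks, troughs


# Example: 1 s of a 10 Hz sine sampled at 1 kHz (phase offset avoids exact zeros).
fs, freq = 1000, 10
sig = [math.sin(2 * math.pi * freq * n / fs + 0.1) for n in range(fs)]
peaks, troughs = find_extrema(sig)
periods = [b - a for a, b in zip(troughs, troughs[1:])]
print(len(peaks), len(troughs), set(periods))  # 9 10 {100}
```

Trough-to-trough spacing of 100 samples at 1 kHz recovers the 10 Hz period, which is the quantity the per-cycle features above are built on.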
- Hybrid orchestration in MATLAB + Python
- Reproducible, cycle-level waveform feature extraction
- Validated on real-world data (21.8 GiB, 36 subjects × 500+ channels)
- Runs entirely on CPU; no GPU required
- Preprocessing (MATLAB) → metaDataExt.mat → RunBycycle.py (Python) → cycle-level CSV features → merged subject-level CSV
- MATLAB filters and exports trial-wise LFP
- Metadata is extended and passed to Python
- Python extracts per-cycle waveform metrics
- Per-region CSVs are saved and then auto-merged
data/cycle_features/
└── SubjectID1/
    ├── BrainRegion2/
    │   └── SubjectID1_BrainRegion2_bycycle_features_YYYYMMDD_#####_sample.csv
    └── SubjectID1_merged_bycycle_features_YYYYMMDD_#####_sample.csv
- Per-region CSVs are generated after validation
- Subject-level CSV created after all region-level files are verified
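A minimal sketch of the verify-then-merge step for one subject folder, assuming the data/cycle_features layout shown above, that every region folder holds at least one `*_bycycle_features_*.csv` file, and that all region files share the same columns. The function name `merge_region_csvs` and the added `brain_region` column are hypothetical; the actual merge in RunBycycle.py may differ.

```python
import csv
from pathlib import Path


def merge_region_csvs(subject_dir, out_path):
    """Verify per-region feature CSVs, then merge them into one table.

    Hypothetical helper: raises if any region folder lacks a feature CSV
    or if headers disagree; otherwise writes a merged CSV with a leading
    'brain_region' column and returns the number of data rows written.
    """
    subject_dir = Path(subject_dir)
    region_dirs = sorted(p for p in subject_dir.iterdir() if p.is_dir())
    if not region_dirs:
        raise FileNotFoundError(f"No region folders in {subject_dir}")

    # Verification pass: every region folder must contain a feature CSV.
    region_files = {}
    for region_dir in region_dirs:
        files = sorted(region_dir.glob("*_bycycle_features_*.csv"))
        if not files:
            raise FileNotFoundError(f"No feature CSV in {region_dir}")
        region_files[region_dir.name] = files

    # Merge pass: concatenate rows, tagging each with its region name.
    header, rows = None, []
    for region, files in region_files.items():
        for path in files:
            with path.open(newline="") as fh:
                reader = csv.DictReader(fh)
                if reader.fieldnames is None:
                    raise ValueError(f"Empty CSV: {path}")
                if header is None:
                    header = list(reader.fieldnames)
                elif list(reader.fieldnames) != header:
                    raise ValueError(f"Column mismatch in {path}")
                rows.extend({"brain_region": region, **row} for row in reader)

    with Path(out_path).open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["brain_region", *header])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Verifying every region before writing anything means a missing region aborts the merge instead of silently producing a partial subject-level table.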
Log File | Description |
---|---|
MATLAB console log | Full preprocessing, orchestration, and LFP filtering steps |
Python log | Extraction trace: per-cycle metrics, CSV generation, merging |
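The repository layout names the Python log `PythonLog_YYYYMMDD_HHMMSS.log`. A minimal sketch of producing such a file with the standard `logging` module follows; the function name, logger name, and message format are assumptions, not taken from RunBycycle.py.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_python_log(log_dir, now=None):
    """Create a timestamped log file and return (logger, log_path).

    Hypothetical helper mirroring the PythonLog_YYYYMMDD_HHMMSS.log naming.
    """
    now = now or datetime.now()
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"PythonLog_{now:%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger("eeg_feat_ext")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger, log_path
```

Each call adds another file handler to the same named logger, so a long-running session that starts several runs should close and remove old handlers first.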
eeg-feat-ext is built on existing Python tools for neural time-series analysis, listed below:
- Bycycle: cycle-by-cycle feature extraction by Cole & Voytek, Journal of Neurophysiology (2019). Docs
- NeuroDSP: neural time-series signal processing by Cole et al., Journal of Open Source Software (2019). Docs
- Clone the source code:
git clone https://github.com/khan-u/eeg-feat-ext.git
cd eeg-feat-ext
- Install the MATLAB Engine API for Python (requires a licensed MATLAB installation; adjust the path to match your MATLAB version and install location):
cd "C:\Program Files\MATLAB\R2024b\extern\engines\python"
python setup.py install
- Run the main pipeline entry point from MATLAB for automated downstream hand-off to Python:
MAIN.m
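Before running MAIN.m, it can help to confirm that the Python interpreter sees the required packages. The sketch below only checks importability with `importlib.util.find_spec`; the module list is an assumption based on the dependencies named in this README (Bycycle, NeuroDSP, NumPy, Pandas, and the MATLAB Engine's `matlab.engine`), not the literal contents of requirements.txt.

```python
import importlib.util

# Assumed import names for the dependencies named in this README.
REQUIRED_MODULES = ["numpy", "pandas", "bycycle", "neurodsp", "matlab.engine"]


def check_modules(names):
    """Map each module name to True if it can be found, else False."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent of a dotted name, so a missing
            # parent package (e.g. 'matlab') raises instead of returning None.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:<15} {'OK' if ok else 'MISSING'}")
```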
eeg-feat-ext/                                  # Root of the feature-extraction pipeline
├── data/                                      # Subject data organized for preprocessing and export
│   ├── raw/                                   # Original iEEG/LFP input files (untouched)
│   │   └── .gitkeep                           # Keeps directory version-controlled before data exists
│   ├── pre-processed/                         # MATLAB-filtered LFP trials per region
│   │   └── .gitkeep                           # Placeholder until MATLAB generates data
│   └── cycle_features/                        # Bycycle CSV outputs per subject/region
│       ├── SubjectID1/                        # Example subject container
│       │   ├── BrainRegion1/                  # e.g., Hippocampus or Amygdala
│       │   │   └── .gitkeep                   # Placeholder before region CSVs exist
│       │   └── BrainRegion2/
│       │       └── SubjectID1_BrainRegion2_bycycle_features_YYYYMMDD_#####_sample.csv
│       │                                      # Per-trial, per-channel waveform metrics output
│       └── SubjectID2/                        # Identical structure for each subject
│           ├── BrainRegion1/
│           │   └── .gitkeep
│           └── BrainRegion2/
│               └── .gitkeep
├── figures/                                   # (Optional) output plots and analysis figures
│   └── .gitkeep                               # Ensures directory is version-controlled even if empty
├── logs/                                      # MATLAB-side logs from MAIN.m and helpers
│   └── MATLAB-console-log_YYYYMMDD_HHMMSS.log  # Full transcript of preprocessing and orchestration steps
├── recycle/                                   # Archived metaData.mat snapshots for reproducibility
│   └── .gitkeep                               # Retains directory before first archival entry
├── RunBycycle/                                # Logs from the Python-side Bycycle extraction
│   └── PythonLog_YYYYMMDD_HHMMSS.log          # Bycycle runtime debug/info trace for subject/session runs
├── src/                                       # Core logic: hybrid MATLAB + Python pipeline code
│   ├── call_python_bycycle.m                  # Calls RunBycycle.py with metaDataExt as input
│   ├── checkMetaDataFile.m                    # Verifies presence of expected keys in metaData.mat
│   ├── compareVersions.m                      # Reports version drift warnings (MATLAB modules)
│   ├── createDirectories.m                    # Sets up output folders and structure
│   ├── definePaths.m                          # Constructs absolute/relative project paths
│   ├── extractLFP.m                           # Loads LFP signals into memory for each trial
│   ├── extractMatFilesToText.m                # (Optional) Writes a human-readable view of .mat contents
│   ├── filterSubjectsLFP.m                    # Removes subjects missing required brain-region LFP
│   ├── loadMetaData.m                         # Reads base metaData.mat into MATLAB workspace
│   ├── logMessage.m                           # Simple multi-level logger for structured console output
│   ├── processSubjects.m                      # Dynamically builds subject data structs
│   ├── recycleMetaData.m                      # Moves current metaData.mat to `/recycle` with timestamp
│   ├── RunBycycle.py                          # Python script to compute per-cycle waveform features via Bycycle
│   ├── saveExtendedMetadata.m                 # Merges metadata and saves metaDataExt.mat
│   ├── saveFolderTree.m                       # (Optional) Prints the full directory structure to .txt
│   ├── selectSubjectsAndRegions.m             # Lets user choose specific sessions/regions to run
│   ├── setupLogging.m                         # Creates timestamped MATLAB log file
│   ├── setupProject.m                         # Initializes project paths and folder checks
│   ├── verifyAndMoveFiles.m                   # Moves data into expected structure if misplaced
│   ├── verifyProcessedDataFiles.m             # Confirms presence of expected .mat LFP trials
│   └── verifyRawDataFiles.m                   # Checks raw files are present before MATLAB starts
├── .gitattributes                             # Ensures consistent line endings across platforms
├── LICENSE                                    # Open-source MIT license for reuse and modification
├── MAIN.m                                     # Primary entry script that orchestrates full pipeline from MATLAB
├── README.md                                  # This documentation file
└── requirements.txt                           # Python dependencies (Bycycle, NumPy, Pandas, etc.)
MIT License: free for academic and commercial use.