High-throughput, cycle-level neural signal analytics validated on clinical-scale iEEG data for biomarker discovery in human cognition.

khan-u/eeg-feat-ext

eeg-feat-ext 🧠

License: MIT
Python 3.9–3.12

Developed for high-throughput electrophysiology data, eeg-feat-ext transforms large-scale human brain recordings into clean, structured features, ready for downstream analysis, predictive modeling, and real-time monitoring pipelines.

Scalable Feature Engineering for Predictive Time-Series Analytics

eeg-feat-ext was purpose-built for high-throughput analysis of 21.8 GiB of raw iEEG data across 36 human subjects, culminating in a Nature (2024) publication:

“Control of working memory by phase–amplitude coupling of human hippocampal neurons”
📄 Read Article
DANDI Dataset


❓ The Big Question

Task: Subjects memorized short lists of 1 item (low load) or 3 items (high load), then judged whether a probe item had appeared in the original list.

Question: Are differences in oscillatory synchrony (PAC) between memory loads driven by subtle waveform shape artifacts, or do they reflect true cognitive state dynamics?

“To determine the influence of waveform shape on phase–amplitude coupling (PAC)... we used the bycycle (eeg-feat-ext) toolbox... then tested peak-to-trough and rise-to-decay asymmetries across task conditions.”


🔧 What This Pipeline Does

  • Extracts condition-specific neural features from noisy iEEG signals
  • Controls for waveform artifacts to ensure signal fidelity
  • Outputs large-scale, structured CSV files ready for ML models and dashboards

🎯 Why It Matters

  • Enables detection of latent cognitive states in brain data
  • Statistically separates memory load conditions via waveform metrics
  • Optimized for production-grade throughput and reproducibility

✅ Key Findings

“We did not find evidence for any of those factors.”
(Referring to waveform asymmetries as confounds)

“These findings suggest that PAC is related to ongoing WM processes during the maintenance period in the hippocampus.”


🚀 Real-World Impact

Human Cognitive Biomarker Identified:
Coordinated brain oscillations in the hippocampus track working memory load.

  • ✅ Ruled out spurious signal artifacts using waveform shape controls
  • ✅ Confirmed oscillatory synchrony (PAC) as a reliable index of memory maintenance
  • 📊 Findings visualized in Extended Figure 3

♻️ Transferable Analytics + ML

| Neuroscience Application | Industry Parallel |
| --- | --- |
| Memory classification via waveform shape | Behavioral segmentation, attention modeling |
| Phase-amplitude coupling (PAC) as latent signal | User state dynamics, anomaly detection |
| High-density iEEG signal processing | IoT / biosensor analytics |
| Multi-subject LFP preprocessing on CPU | Big data ETL in low-resource environments |

📦 Pipeline Highlights

| Feature | Description |
| --- | --- |
| Waveform decomposition | Sub-ms segmentation using Hilbert transform |
| Feature generation | Amplitude, symmetry, sharpness, slope, duration per cycle |
| Interoperability | MATLAB ↔ Python via shared metadata files |
| Big-data scalability | Supports real-time analysis on large datasets |
| Output format | Structured CSVs per brain region, with optional merged tables |

  • Hybrid orchestration in MATLAB + Python
  • Reproducible, cycle-level waveform feature extraction
  • Validated on real-world data (21.8 GiB, 36 subjects × 500+ channels)
  • Runs entirely on CPU; no GPU required
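
To make the per-cycle features listed above (amplitude, symmetry, duration) more concrete, here is a minimal Python sketch for a single cycle bounded by two troughs. It is not the pipeline's code or Bycycle's implementation: the `CycleFeatures` class, the `cycle_features` function, and the toy signal are hypothetical, and rise-decay symmetry is taken here as the fraction of the period spent in the rise phase.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CycleFeatures:
    period_s: float      # trough-to-trough duration in seconds
    volt_amp: float      # mean of the rise and decay voltage excursions
    time_rise_s: float   # trough -> peak duration in seconds
    time_decay_s: float  # peak -> next trough duration in seconds
    rdsym: float         # fraction of the period spent rising (0.5 = symmetric)


def cycle_features(sig: Sequence[float], fs: float,
                   trough_start: int, peak: int, trough_end: int) -> CycleFeatures:
    """Compute simple shape features for one trough-peak-trough cycle."""
    if not (0 <= trough_start < peak < trough_end < len(sig)):
        raise ValueError("expected trough_start < peak < trough_end within the signal")
    rise_n = peak - trough_start
    decay_n = trough_end - peak
    period_n = rise_n + decay_n
    volt_rise = sig[peak] - sig[trough_start]
    volt_decay = sig[peak] - sig[trough_end]
    return CycleFeatures(
        period_s=period_n / fs,
        volt_amp=(volt_rise + volt_decay) / 2,
        time_rise_s=rise_n / fs,
        time_decay_s=decay_n / fs,
        rdsym=rise_n / period_n,
    )


# Hypothetical sawtooth-like cycle: fast 2-sample rise, slow 8-sample decay.
sig = [0.0, 0.5, 1.0, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0]
feats = cycle_features(sig, fs=1000.0, trough_start=0, peak=2, trough_end=10)
print(feats)
```

An `rdsym` below 0.5 indicates a cycle that rises faster than it decays; the toy cycle spends 2 of its 10 samples rising, so it gives 0.2.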

⚙️ Pipeline Mechanics

  • Preprocessing (MATLAB) → metaDataExt.mat → RunBycycle.py (Python) → Cycle-level CSV features → Merged subject-level CSV

  • MATLAB filters and exports trial-wise LFP
  • Metadata is extended and passed to Python
  • Python extracts per-cycle waveform metrics
  • Per-region CSVs are saved and then auto-merged
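
The step "Python extracts per-cycle waveform metrics" depends on first locating each cycle's peaks and troughs. The sketch below shows one common way to do that: find zero crossings, then search for the extremum between consecutive crossings. It is a simplified stand-in, not what `RunBycycle.py` or Bycycle actually runs. It assumes the input is already band-limited to the rhythm of interest, and the function names and the synthetic sine wave are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def find_zero_crossings(sig: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (rising, falling) indices i where the sign changes between i-1 and i."""
    rising, falling = [], []
    for i in range(1, len(sig)):
        if sig[i - 1] < 0 <= sig[i]:
            rising.append(i)
        elif sig[i - 1] >= 0 > sig[i]:
            falling.append(i)
    return rising, falling


def find_peaks_and_troughs(sig: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Locate one peak per positive half-cycle and one trough per negative half-cycle."""
    rising, falling = find_zero_crossings(sig)
    crossings = sorted([(i, "rise") for i in rising] + [(i, "fall") for i in falling])
    peaks, troughs = [], []
    # Each pair of consecutive crossings bounds one half-cycle.
    for (start, kind), (stop, _) in zip(crossings, crossings[1:]):
        segment = range(start, stop)
        if kind == "rise":
            peaks.append(max(segment, key=lambda i: sig[i]))
        else:
            troughs.append(min(segment, key=lambda i: sig[i]))
    return peaks, troughs


# Synthetic 10 Hz sine sampled at 1 kHz (100 samples per cycle), with a
# quarter-sample phase offset so that no sample lands exactly on zero.
sig = [math.sin(2 * math.pi * (n + 0.25) / 100) for n in range(300)]
peaks, troughs = find_peaks_and_troughs(sig)
print("peaks:", peaks)
print("troughs:", troughs)
```

Half-cycles that are cut off by the start or end of the signal are skipped, since they are not bounded by two crossings.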

📊 Pipeline-Extracted Features (.csv)

./data/cycle_features

└── SubjectID1/
    ├── BrainRegion2/
    │   └── 📄 SubjectID1_BrainRegion2_bycycle_features_YYYYMMDD_#####_sample.csv
    └── 📄 SubjectID1_merged_bycycle_features_YYYYMMDD_#####_sample.csv


  • Per-region CSVs are generated after validation
  • Subject-level CSV created after all region-level files are verified
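
The verify-then-merge behaviour described above could look roughly like the following sketch, which uses only the Python standard library. The `merge_region_csvs` function, the added `region` column, the output file name, and the toy column names are assumptions for illustration; the pipeline's real checks and CSV schema may differ.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def merge_region_csvs(subject_dir: Path, out_name: str = "merged.csv") -> Path:
    """Check every <region>/*.csv under subject_dir, then merge them into one CSV."""
    region_files: List[Path] = sorted(subject_dir.glob("*/*.csv"))
    if not region_files:
        raise FileNotFoundError(f"no region CSVs under {subject_dir}")

    merged_rows, header = [], None
    for path in region_files:
        with path.open(newline="") as fh:
            reader = csv.DictReader(fh)
            rows = list(reader)
            fields = reader.fieldnames
        # Verification: each region file must have data and share one header.
        if not rows:
            raise ValueError(f"{path} has no data rows")
        if header is None:
            header = fields
        elif fields != header:
            raise ValueError(f"{path} header differs from the first region file")
        for row in rows:
            row["region"] = path.parent.name  # tag rows with their source folder
            merged_rows.append(row)

    out_path = subject_dir / out_name
    with out_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["region", *header])
        writer.writeheader()
        writer.writerows(merged_rows)
    return out_path


# Toy demonstration in a temporary directory (file and column names are made up).
with tempfile.TemporaryDirectory() as tmp:
    subject = Path(tmp) / "SubjectID1"
    for region, amp in [("BrainRegion1", "0.8"), ("BrainRegion2", "1.2")]:
        (subject / region).mkdir(parents=True)
        (subject / region / f"{region}_features.csv").write_text(
            "trial,volt_amp\n1," + amp + "\n")
    merged = merge_region_csvs(subject)
    merged_text = merged.read_text()
print(merged_text)
```

Because the merged file is written at the subject level rather than inside a region folder, rerunning the merge does not pick up its own output.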

Pipeline-Generated Logs

| Log File | Description |
| --- | --- |
| MATLAB console log | Full preprocessing, orchestration, and LFP filtering steps |
| Python log | Extraction trace: per-cycle metrics, CSV generation, merging |
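
For readers who want a similar timestamped Python log in their own scripts, the sketch below uses the standard `logging` module. The file name follows the `PythonLog_YYYYMMDD_HHMMSS.log` pattern shown in the repo tree, but `setup_run_logger`, the log format, and the example message are hypothetical and not taken from `RunBycycle.py`.

```python
import logging
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Tuple


def setup_run_logger(log_dir: Path, name: str = "RunBycycle") -> Tuple[logging.Logger, Path]:
    """Create log_dir if needed and attach a timestamped file handler to a logger."""
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = log_dir / f"PythonLog_{stamp}.log"
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    logger.addHandler(handler)
    return logger, log_path


# Toy demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    logger, log_path = setup_run_logger(Path(tmp) / "RunBycycle")
    logger.info("SubjectID1 / BrainRegion2: wrote per-cycle CSV")
    # Close the handler so the temporary directory can be removed cleanly.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    log_text = log_path.read_text(encoding="utf-8")
    log_name = log_path.name
print(log_name)
print(log_text)
```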

Dependencies

eeg-feat-ext is built on existing Python tools for neural time-series analytics. These dependencies are listed below:

  • Bycycle: cycle-by-cycle feature extraction (Cole & Voytek, Journal of Neurophysiology, 2019) 📘 Docs
  • NeuroDSP: neural time-series signal processing (Cole et al., Journal of Open Source Software, 2019) 📘 Docs

⚙️ Quick Start

  1. Clone the source code:
     git clone https://github.com/khan-u/eeg-feat-ext.git
     cd eeg-feat-ext
  2. Install the MATLAB Engine for Python (requires a MATLAB license; the path below is for a Windows install of R2024b):
     cd "C:\Program Files\MATLAB\R2024b\extern\engines\python"
     python setup.py install
  3. Run the main pipeline entry point from MATLAB for automated downstream hand-off to Python:
     MAIN.m

🗂️ Repo Tree

eeg-feat-ext/                           # Root of the feature-extraction pipeline
├── data/                               # Subject data organized for preprocessing and export
│   ├── raw/                            # Original iEEG/LFP input files (untouched)
│   │   └── .gitkeep                    # Keeps directory version-controlled before data exists
│   ├── pre-processed/                  # MATLAB-filtered LFP trials per region
│   │   └── .gitkeep                    # Placeholder until MATLAB generates data
│   └── cycle_features/                 # Bycycle CSV outputs per subject/region
│       ├── SubjectID1/                 # Example subject container
│       │   ├── BrainRegion1/           # e.g., Hippocampus or Amygdala
│       │   │   └── .gitkeep            # Placeholder before region CSVs exist
│       │   └── BrainRegion2/
│       │       └── SubjectID1_BrainRegion2_bycycle_features_YYYYMMDD_#####_sample.csv
│       │                               # Per-trial, per-channel waveform metrics output
│       └── SubjectID2/                 # Identical structure for each subject
│           ├── BrainRegion1/
│           │   └── .gitkeep
│           └── BrainRegion2/
│               └── .gitkeep
├── figures/                            # (Optional) output plots and analysis figures
│   └── .gitkeep                        # Ensures directory is version-controlled even if empty
├── logs/                               # MATLAB-side logs from MAIN.m and helpers
│   └── MATLAB-console-log_YYYYMMDD_HHMMSS.log   # Full transcript of preprocessing and orchestration steps
├── recycle/                            # Archived metaData.mat snapshots for reproducibility
│   └── .gitkeep                        # Retains directory before first archival entry
├── RunBycycle/                         # Logs from the Python-side Bycycle extraction
│   └── PythonLog_YYYYMMDD_HHMMSS.log   # Bycycle runtime debug/info trace for subject/session runs
├── src/                                # Core logic: hybrid MATLAB + Python pipeline code
│   ├── call_python_bycycle.m           # Calls RunBycycle.py with metaDataExt as input
│   ├── checkMetaDataFile.m             # Verifies presence of expected keys in metaData.mat
│   ├── compareVersions.m               # Reports version drift warnings (MATLAB modules)
│   ├── createDirectories.m             # Sets up output folders and structure
│   ├── definePaths.m                   # Constructs absolute/relative project paths
│   ├── extractLFP.m                    # Loads LFP signals into memory for each trial
│   ├── extractMatFilesToText.m         # (Optional) Writes a human-readable view of .mat contents
│   ├── filterSubjectsLFP.m             # Removes subjects missing required brain-region LFP
│   ├── loadMetaData.m                  # Reads base metaData.mat into MATLAB workspace
│   ├── logMessage.m                    # Simple multi-level logger for structured console output
│   ├── processSubjects.m               # Dynamically builds subject data structs
│   ├── recycleMetaData.m               # Moves current metaData.mat to `/recycle` with timestamp
│   ├── RunBycycle.py                   # 🚀 Python script to compute per-cycle waveform features via Bycycle
│   ├── saveExtendedMetadata.m          # Merges metadata and saves metaDataExt.mat
│   ├── saveFolderTree.m                # (Optional) Prints the full directory structure to .txt
│   ├── selectSubjectsAndRegions.m      # Lets user choose specific sessions/regions to run
│   ├── setupLogging.m                  # Creates timestamped MATLAB log file
│   ├── setupProject.m                  # Initializes project paths and folder checks
│   ├── verifyAndMoveFiles.m            # Moves data into expected structure if misplaced
│   ├── verifyProcessedDataFiles.m      # Confirms presence of expected .mat LFP trials
│   └── verifyRawDataFiles.m            # Checks raw files are present before MATLAB starts
├── .gitattributes                      # Ensures consistent line endings across platforms
├── LICENSE                             # Open-source MIT license for reuse and modification
├── MAIN.m                              # 🚀 Primary entry script that orchestrates full pipeline from MATLAB
├── README.md                           # This documentation file
└── requirements.txt                    # Python dependencies (Bycycle, NumPy, Pandas, etc.)

License

MIT License: free for academic and commercial use.