Foundation models are reshaping EEG analysis, yet EEG tokenization remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. TFM-Tokenizer uses a dual-path architecture with time-frequency masking to learn robust motif representations, and it is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits:
- **Accuracy:** Experiments on four diverse EEG benchmarks show consistent performance gains across both single- and multi-dataset pretraining settings, with up to a 17% improvement in Cohen’s Kappa over strong baselines.
- **Generalization:** As a plug-and-play component, TFM-Tokenizer consistently boosts the performance of diverse foundation models, including BIOT and LaBraM.
- **Scalability:** By operating at the single-channel level rather than relying on the strict 10–20 EEG system, our method has the potential to be device-agnostic. On ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, our tokenizer outperforms baselines by 14%.

A comprehensive token analysis reveals strong class-discriminative, frequency-aware, and consistent token structure, improving both representation quality and interpretability.
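For intuition only, here is a minimal toy sketch of the core idea, not the repository's implementation: a single-channel segment is mapped to a time-frequency representation, and each time frame is vector-quantized against a codebook to produce discrete tokens. The codebook below is random purely for illustration; in TFM-Tokenizer the vocabulary is learned end-to-end with time-frequency masking.

```python
# Toy sketch (NOT the paper's implementation): discretize single-channel EEG
# by (1) computing a time-frequency representation and (2) snapping each time
# frame to its nearest codebook entry. The codebook here is random.
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
fs = 256                                   # assumed sampling rate (Hz)
eeg = rng.standard_normal(10 * fs)         # 10 s of synthetic single-channel EEG

# 1) Time-frequency representation: magnitude spectrogram.
_, _, Z = stft(eeg, fs=fs, nperseg=fs)     # Z: (freq_bins, time_frames)
spec = np.abs(Z).T                         # (time_frames, freq_bins)

# 2) Vector-quantize each time frame to a discrete token ID.
codebook = rng.standard_normal((64, spec.shape[1]))    # 64-entry vocabulary
dists = ((spec[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(1)                   # one token per time frame

print(tokens[:10])
```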
```bash
conda create --name tfm_tokenizer python=3.10
conda activate tfm_tokenizer
pip install -r requirements.txt
```
The datasets used for this study can be accessed at:
- TUEV and TUAB: https://isip.piconepress.com/projects/nedc/html/tuh_eeg/
- CHB-MIT: https://physionet.org/content/chbmit/1.0.0/
- EarEEG (EESM23): https://openneuro.org/datasets/ds005178/versions/1.0.0

The ./dataset_processing folder contains scripts for processing the above datasets. Run the script below to execute the processing, after updating the paths inside it so the raw data is read from, and the processed data saved to, the correct locations:
```bash
./datasets_processing/data_set_processing.sh
```
Update the `data_dir` field in ./configs/dataset_configs.yaml to point to your processed data directory.
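If you prefer to edit the config programmatically, here is a minimal sketch, assuming `data_dir` is a top-level key in the YAML (adjust the key path if your config nests it per dataset):

```python
# Sketch: point the dataset config at the processed data.
# Assumes "data_dir" is a top-level key; adjust if it is nested per dataset.
import yaml

path = "configs/dataset_configs.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["data_dir"] = "/path/to/processed/data"

with open(path, "w") as f:
    yaml.safe_dump(cfg, f)
```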
Then run the appropriate script to pretrain the TFM-Tokenizer, followed by TFM-Encoder pretraining and fine-tuning.

For the single-dataset pretraining setting:

```bash
./tfm_tokenizer_training_script_single_dataset.sh
```

For the multi-dataset pretraining setting:

```bash
./tfm_tokenizer_training_script_multiple_dataset.sh
```
The ./pretrained_weights directory provides our pretrained weights for the TFM-Tokenizer and the downstream transformer in both the single- and multi-dataset settings. Edit and run the following script to obtain evaluation results on the test set (uncomment the lines in the .sh file that correspond to your experiment setting):
```bash
./tfm_tokenizer_inference.sh
```
We also provide the ./token_visualization_samples.ipynb notebook with code to visualize the tokens produced by our tokenizer.
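Outside the notebook, a quick way to eyeball a token sequence is a step plot of token IDs over time. The array below is synthetic; in practice you would substitute the IDs emitted by the tokenizer:

```python
# Sketch: plot a (synthetic) discrete token sequence over time.
import numpy as np
import matplotlib.pyplot as plt

tokens = np.random.default_rng(0).integers(0, 64, size=120)  # fake token IDs

plt.figure(figsize=(8, 2))
plt.step(np.arange(len(tokens)), tokens, where="post")
plt.xlabel("time frame")
plt.ylabel("token ID")
plt.title("Discrete token sequence for one EEG segment")
plt.tight_layout()
plt.show()
```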
If you find our work or this repository useful, please consider giving it a star ⭐ and citing our paper:
```bibtex
@article{pradeepkumar2025single,
  title={Single-channel {EEG} tokenization through time-frequency modeling},
  author={Pradeepkumar, Jathurshan and Piao, Xihao and Chen, Zheng and Sun, Jimeng},
  journal={arXiv preprint arXiv:2502.16060},
  year={2025}
}
```
We appreciate your interest in our work! 😃