Tokenizing Single-Channel EEG with Time-Frequency Motif Learning


Abstract

Foundation models are reshaping EEG analysis, yet EEG tokenization remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time-frequency masking to capture robust motif representations. The framework is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits:

- Accuracy: Experiments on four diverse EEG benchmarks show consistent performance gains across both single- and multi-dataset pretraining settings, with up to a 17% improvement in Cohen's Kappa over strong baselines.
- Generalization: As a plug-and-play component, the tokenizer consistently boosts the performance of diverse foundation models, including BIOT and LaBraM.
- Scalability: By operating at the single-channel level rather than relying on the strict 10–20 EEG system, our method has the potential to be device-agnostic. On ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, our tokenizer outperforms baselines by 14%.

A comprehensive token analysis reveals strong class-discriminative, frequency-aware, and consistent token structure, enabling improved representation quality and interpretability.

Getting Started

conda create --name tfm_tokenizer python=3.10
conda activate tfm_tokenizer
pip install -r requirements.txt

Dataset Generation

The datasets used in this study can be generated by running the processing script:

./datasets_processing/data_set_processing.sh

TFM-Token Training

Update the "data_dir" field in ./configs/dataset_configs.yaml to point to your dataset directory. Then run the following scripts to pretrain the TFM-Tokenizer, followed by pretraining of the TFM-Encoder and fine-tuning.
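For reference, the edit to ./configs/dataset_configs.yaml looks like the sketch below; only the data_dir key is named in this README, and the example path is illustrative, not the repository's exact schema.

```yaml
# ./configs/dataset_configs.yaml (illustrative excerpt)
# Replace the path with the directory produced by the dataset processing script.
data_dir: /path/to/processed_datasets
```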

For single dataset pretraining setting:

./tfm_tokenizer_training_script_single_dataset.sh

For multiple dataset pretraining setting:

./tfm_tokenizer_training_script_multiple_dataset.sh

TFM-Token Inference

The ./pretrained_weights directory provides our pretrained weights for the TFM-Tokenizer and the downstream transformer, for both the single- and multiple-dataset settings. Edit and run the following script to obtain evaluation results on the test set (uncomment the lines in the .sh file that match your experiment setting):

./tfm_tokenizer_inference.sh

Token Visualization

We also provide the ./token_visualization_samples.ipynb notebook with code to visualize the tokens produced by our tokenizer.
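As a rough illustration of what token visualization involves, the sketch below plots a hypothetical discrete token sequence for one EEG channel over time. The token array and vocabulary size are placeholders generated at random, not output of the repository's actual tokenizer API; see the notebook for the real workflow.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical: a 1D sequence of discrete token IDs for a single EEG channel.
rng = np.random.default_rng(0)
vocab_size = 64                  # placeholder vocabulary size
tokens = rng.integers(0, vocab_size, size=128)

# Plot token IDs as a step function over token positions (time order).
fig, ax = plt.subplots(figsize=(8, 2))
ax.step(np.arange(tokens.size), tokens, where="post")
ax.set_xlabel("Token position (time)")
ax.set_ylabel("Token ID")
ax.set_title("Discrete EEG token sequence (illustrative)")
fig.tight_layout()
fig.savefig("token_sequence.png")
```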

Citation

If you find our work or this repository useful, please consider giving it a star ⭐ and citing our paper.

@article{pradeepkumar2025single,
  title={Single-channel {EEG} tokenization through time-frequency modeling},
  author={Pradeepkumar, Jathurshan and Piao, Xihao and Chen, Zheng and Sun, Jimeng},
  journal={arXiv preprint arXiv:2502.16060},
  year={2025}
}

We appreciate your interest in our work! 😃

About

Official Code Repository of "Tokenizing Single-Channel EEG with Time-Frequency Motif Learning". arXiv: https://arxiv.org/abs/2502.16060
