This repository contains Jupyter notebooks for semantic search in time series using IBM's TSPulse model, with a focus on the PTB-XL ECG dataset.
Install all dependencies from the project root:
pip install -r requirements.txt
Or with conda:
conda install --file requirements.txt
| Package | Purpose |
|---|---|
| granite-tsfm | TSPulse model for time series |
| transformers | Hugging Face models |
| torch | PyTorch |
| numpy, pandas | Data handling |
| scikit-learn | Metrics and utilities |
| matplotlib, seaborn | Plotting |
| wfdb | PhysioNet/WFDB (PTB-XL) |
PTB-XL ECG waveforms → time series
- Load PTB-XL data (WFDB)
- Convert each lead into a time series
- Organize by patient
- Export to NPZ for downstream notebooks
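The conversion steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `<patient_id>_<lead>` key format, the helper names `split_leads` and `build_npz_payload`, and the synthetic records are all assumptions. In the notebook the signals would come from WFDB records rather than random data.

```python
import io
import numpy as np

LEADS = ["I", "II", "V1"]  # hypothetical subset of the ECG leads

def split_leads(signal, lead_names):
    """Turn a (n_samples, n_leads) array into one 1-D series per lead."""
    return {name: signal[:, i].astype(np.float32)
            for i, name in enumerate(lead_names)}

def build_npz_payload(records, lead_names):
    """Flatten {patient_id: signal} into {"<patient_id>_<lead>": series}."""
    payload = {}
    for patient_id, signal in records.items():
        for lead, series in split_leads(signal, lead_names).items():
            payload[f"{patient_id}_{lead}"] = series
    return payload

# Synthetic stand-in for signals read from PTB-XL (1000 samples, 3 leads each).
rng = np.random.default_rng(0)
records = {pid: rng.standard_normal((1000, len(LEADS))) for pid in (101, 102)}

payload = build_npz_payload(records, LEADS)

# Written to an in-memory buffer here; pass a file path to write to disk.
buffer = io.BytesIO()
np.savez_compressed(buffer, **payload)
buffer.seek(0)
loaded = np.load(buffer)
print(sorted(loaded.files))
```

Keeping the patient ID in each key is one simple way to preserve the per-patient grouping inside a flat NPZ archive.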
Semantic search over full time series
- Load time series from NPZ
- Extract semantic embeddings with TSPulse
- Find similar series via cosine similarity
- Visualize results
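The cosine-similarity lookup can be illustrated with plain NumPy. The sketch below assumes the embeddings are already available as rows of a 2-D array with non-zero norms; the function name `cosine_top_k` and the toy 2-D vectors are illustrative stand-ins for real TSPulse embeddings.

```python
import numpy as np

def cosine_top_k(query, embeddings, k=3):
    """Return indices and scores of the k rows most similar to the query."""
    # After L2 normalization the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

# Toy 2-D "embeddings"; real ones would come from the model.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])

indices, scores = cosine_top_k(query, embeddings, k=2)
print(indices, scores)
```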
Semantic search over sliding windows
- Split series into ~2.5 s windows
- Embed each window
- Search for similar windows
- Show where matches lie in the original series
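A minimal sketch of the windowing step, including how a matched window maps back to sample positions in the original series. The 500 Hz sampling rate, the 50% overlap, and the helper name `sliding_windows` are assumptions for illustration; the notebook may use different values. The series is assumed to be at least one window long.

```python
import numpy as np

def sliding_windows(series, window, stride):
    """Return (windows, starts): stacked windows and their start offsets."""
    starts = np.arange(0, len(series) - window + 1, stride)
    windows = np.stack([series[s:s + window] for s in starts])
    return windows, starts

fs = 500                  # assumed sampling rate in Hz
window = int(2.5 * fs)    # 2.5 s -> 1250 samples
stride = window // 2      # assumed 50% overlap
series = np.arange(10 * fs, dtype=np.float32)  # stand-in for a 10 s lead

windows, starts = sliding_windows(series, window, stride)

# A match on window i covers samples [starts[i], starts[i] + window).
match = 3
span = (int(starts[match]), int(starts[match]) + window)
print(windows.shape, span)
```

Storing the start offsets alongside the windows is what makes it possible to highlight a match in the full recording later.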
Time series + metadata search
- Combine TSPulse embeddings with metadata (age, sex, height, weight)
- Configurable weights (temporal vs metadata)
- Query by metadata criteria
- Suited for clinical-style retrieval
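One possible way to combine the two signals is a weighted sum of an embedding similarity and a metadata similarity, sketched below. The scoring formula, the default weights, the z-score normalization of metadata, and the function names are assumptions for illustration, not necessarily what the hybrid notebook does.

```python
import numpy as np

def cosine_to_query(query, items):
    """Cosine similarity between a query vector and each row of items."""
    q = query / np.linalg.norm(query)
    m = items / np.linalg.norm(items, axis=1, keepdims=True)
    return m @ q

def hybrid_scores(q_emb, embs, q_meta, metas, w_ts=0.7, w_meta=0.3):
    """Weighted sum of embedding similarity and metadata similarity."""
    # Standardize metadata columns so age, height, and weight share a scale.
    mean, std = metas.mean(axis=0), metas.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    metas_z = (metas - mean) / std
    q_meta_z = (q_meta - mean) / std
    # Map Euclidean distance in metadata space to a similarity in (0, 1].
    meta_sim = 1.0 / (1.0 + np.linalg.norm(metas_z - q_meta_z, axis=1))
    return w_ts * cosine_to_query(q_emb, embs) + w_meta * meta_sim

# Toy data: columns of metas are [age, sex, height_cm, weight_kg].
embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
metas = np.array([[60, 1, 175, 80], [60, 1, 175, 80], [30, 0, 160, 55]], dtype=float)
q_emb = np.array([1.0, 0.0])
q_meta = np.array([60, 1, 175, 80], dtype=float)

default = hybrid_scores(q_emb, embs, q_meta, metas)
meta_only = hybrid_scores(q_emb, embs, q_meta, metas, w_ts=0.0, w_meta=1.0)
print(default, meta_only)
```

With the default weights, the third item (same waveform embedding, different metadata) outranks the second (same metadata, different embedding); with metadata-only weights the order flips.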
- Install dependencies
  - pip install -r requirements.txt
- Download PTB-XL
  - PTB-XL 1.0.3 (~3 GB)
  - Set the dataset path in the conversion notebook
- Run notebooks in order
  - ptbxl_timeseries_conversion.ipynb: build NPZ
  - tspulse_semantic_search.ipynb: full-series search
  - tspulse_window_search.ipynb: window search
  - tspulse_hybrid_search.ipynb: hybrid search
TimeSeries_SLM/
├── requirements.txt
├── README.md
├── ptbxl_timeseries_conversion.ipynb
├── tspulse_semantic_search.ipynb
├── tspulse_window_search.ipynb
├── tspulse_hybrid_search.ipynb
└── series_ptbxl_tspulse.npz # generated by conversion notebook
- TSPulse expects time series with at least 512 points.
- For large datasets, run in batches.
- Notebooks check that required packages are installed.
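The length requirement and the batching advice above can be handled with two small helpers. This is a sketch under assumptions: zero-padding on the right is only one way to reach 512 points (resampling or dropping short series are alternatives), and the names `pad_to_min_length` and `iter_batches` are hypothetical.

```python
import numpy as np

MIN_LENGTH = 512  # minimum series length stated in the notes above

def pad_to_min_length(series, min_length=MIN_LENGTH):
    """Zero-pad a 1-D series on the right if it is shorter than min_length."""
    missing = min_length - len(series)
    if missing <= 0:
        return series
    return np.pad(series, (0, missing))

def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Series of different lengths; only the first needs padding.
raw = [np.ones(300), np.ones(512), np.ones(1000)]
padded_lengths = [len(pad_to_min_length(s)) for s in raw]

# Ten items processed four at a time.
batch_sizes = [len(b) for b in iter_batches(list(range(10)), batch_size=4)]
print(padded_lengths, batch_sizes)
```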
See the repository and dataset licenses (PTB-XL, TSPulse) for terms of use.