anapaulaappel/time-series-semantic-search

Time Series Semantic Search with TSPulse

This repository contains Jupyter notebooks for semantic search in time series using IBM's TSPulse model, with a focus on the PTB-XL ECG dataset.

Requirements

Installation

Install all dependencies from the project root:

pip install -r requirements.txt

Or with conda:

conda install --file requirements.txt

Main Dependencies

Package               Purpose
granite-tsfm          TSPulse model for time series
transformers          Hugging Face models
torch                 PyTorch
numpy, pandas         Data handling
scikit-learn          Metrics and utilities
matplotlib, seaborn   Plotting
wfdb                  PhysioNet/WFDB (PTB-XL)

Notebooks

1. ptbxl_timeseries_conversion.ipynb

PTB-XL ECG waveforms → time series

  • Load PTB-XL data (WFDB)
  • Convert each lead into a time series
  • Organize by patient
  • Export to NPZ for downstream notebooks
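
The per-lead split and NPZ export listed above can be sketched as follows. This is only an illustration, not the notebook's code: the `record_to_series` helper, the `<patient>_<lead>` key scheme, and the synthetic signal array are all assumptions. In practice the array would come from a WFDB reader, for example `wfdb.rdsamp`, which returns a `(n_samples, n_leads)` array plus a metadata dict.

```python
import os
import tempfile

import numpy as np


def record_to_series(signals, lead_names, patient_id):
    """Split a (n_samples, n_leads) array into one 1-D series per lead.

    Hypothetical helper: keys follow an assumed "<patient>_<lead>" scheme.
    """
    signals = np.asarray(signals, dtype=np.float32)
    if signals.ndim != 2 or signals.shape[1] != len(lead_names):
        raise ValueError("signals must have shape (n_samples, n_leads)")
    return {
        f"{patient_id}_{lead}": signals[:, i].copy()
        for i, lead in enumerate(lead_names)
    }


# Synthetic stand-in for one ECG record: 1000 samples, 3 leads.
rng = np.random.default_rng(0)
fake_signals = rng.standard_normal((1000, 3))
series = record_to_series(fake_signals, ["I", "II", "V1"], patient_id="p00001")

# Write the series to a compressed NPZ file and read them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "series_example.npz")
    np.savez_compressed(out_path, **series)
    with np.load(out_path) as loaded:
        restored = {name: loaded[name] for name in loaded.files}
```

Storing one array per key keeps the file readable with plain `np.load`, at the cost of encoding patient and lead in the key name.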

2. tspulse_semantic_search.ipynb

Semantic search over full time series

  • Load time series from NPZ
  • Extract semantic embeddings with TSPulse
  • Find similar series via cosine similarity
  • Visualize results
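
The cosine-similarity step can be illustrated with a small NumPy sketch. The embeddings here are random stand-ins rather than TSPulse outputs, and `top_k_cosine` is a hypothetical helper name, not a function from the notebook or from `granite-tsfm`.

```python
import numpy as np


def top_k_cosine(query, embeddings, k=5):
    """Return indices and scores of the k rows most similar to `query`."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Normalize rows and query to unit length; clip guards against zero vectors.
    row_norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit_rows = embeddings / np.clip(row_norms, 1e-12, None)
    unit_query = query / max(np.linalg.norm(query), 1e-12)
    scores = unit_rows @ unit_query
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Random stand-in embeddings: 100 series, 64 dimensions (assumed sizes).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100, 64))
# A query that is a slightly perturbed copy of series 7.
query = embeddings[7] + 0.01 * rng.standard_normal(64)

indices, scores = top_k_cosine(query, embeddings, k=3)
```

With a query built from series 7, that series should come back as the top match, which makes this a convenient sanity check before running on real embeddings.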

3. tspulse_window_search.ipynb

Semantic search over sliding windows

  • Split series into ~2.5 s windows
  • Embed each window
  • Search for similar windows
  • Show where matches lie in the original series
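
A minimal sketch of the windowing step and of mapping a window back to its position in the original series is shown below. The 100 Hz sampling rate, the 50% overlap, and the `sliding_windows` helper are assumptions made for illustration; the notebook may use different values.

```python
import numpy as np


def sliding_windows(series, window, stride):
    """Cut a 1-D series into fixed-length windows and return their start indices."""
    series = np.asarray(series)
    if len(series) < window:
        raise ValueError("series is shorter than one window")
    starts = np.arange(0, len(series) - window + 1, stride)
    windows = np.stack([series[s:s + window] for s in starts])
    return windows, starts


fs = 100                 # assumed sampling rate in Hz
window = int(2.5 * fs)   # 2.5 s -> 250 samples at the assumed rate
stride = window // 2     # assumed 50% overlap between consecutive windows

# Synthetic 10 s signal standing in for one ECG lead.
series = np.sin(np.linspace(0, 20 * np.pi, 10 * fs))
windows, starts = sliding_windows(series, window, stride)

# Map each window index back to a (start, end) span in seconds.
spans_sec = [(s / fs, (s + window) / fs) for s in starts]
```

Keeping the `starts` array alongside the windows is what allows a matched window to be highlighted in the original series later.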

4. tspulse_hybrid_search.ipynb

Time series + metadata search

  • Combine TSPulse embeddings with metadata (age, sex, height, weight)
  • Configurable weights (temporal vs metadata)
  • Query by metadata criteria
  • Suited for clinical-style retrieval
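
One way to combine the two signals with a configurable weight is sketched below. It is an assumed scoring scheme, not necessarily the one in the notebook: metadata is z-scored, metadata distance is mapped to a similarity with `1 / (1 + distance)`, and the final score is a convex combination controlled by `w_temporal`. All names and the synthetic data are hypothetical.

```python
import numpy as np


def zscore(x):
    """Standardize each column; constant columns are left at zero."""
    x = np.asarray(x, dtype=np.float64)
    std = x.std(axis=0)
    return (x - x.mean(axis=0)) / np.where(std == 0, 1.0, std)


def cosine_to_all(query, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    row_norms = np.clip(np.linalg.norm(matrix, axis=1), 1e-12, None)
    query_norm = max(np.linalg.norm(query), 1e-12)
    return (matrix @ query) / (row_norms * query_norm)


def hybrid_scores(q_emb, embs, q_meta, metas, w_temporal=0.7):
    """Weighted mix of embedding similarity and metadata similarity."""
    if not 0.0 <= w_temporal <= 1.0:
        raise ValueError("w_temporal must be in [0, 1]")
    temporal = cosine_to_all(q_emb, embs)
    # Euclidean distance in standardized metadata space, mapped to (0, 1].
    meta_sim = 1.0 / (1.0 + np.linalg.norm(metas - q_meta, axis=1))
    return w_temporal * temporal + (1.0 - w_temporal) * meta_sim


# Synthetic data: 50 records with 64-dim embeddings and four metadata columns
# (age, sex, height, weight), matching the fields named in the list above.
rng = np.random.default_rng(0)
embs = rng.standard_normal((50, 64))
raw_meta = np.column_stack([
    rng.integers(18, 90, size=50),   # age in years
    rng.integers(0, 2, size=50),     # sex encoded as 0/1
    rng.normal(170, 10, size=50),    # height in cm
    rng.normal(75, 12, size=50),     # weight in kg
])
metas = zscore(raw_meta)

# Use record 3 as the query; it should be its own best match.
scores = hybrid_scores(embs[3], embs, metas[3], metas, w_temporal=0.7)
best = int(np.argmax(scores))
```

Setting `w_temporal=1.0` reduces this to pure embedding search, and `w_temporal=0.0` to a pure metadata lookup, which makes the effect of the weight easy to inspect.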

Quick Start

  1. Install dependencies

    pip install -r requirements.txt
  2. Download PTB-XL

    • PTB-XL 1.0.3 (~3 GB)
    • Set the dataset path in the conversion notebook
  3. Run notebooks in order

    • ptbxl_timeseries_conversion.ipynb — build NPZ
    • tspulse_semantic_search.ipynb — full-series search
    • tspulse_window_search.ipynb — window search
    • tspulse_hybrid_search.ipynb — hybrid search

Project Layout

TimeSeries_SLM/
├── requirements.txt
├── README.md
├── ptbxl_timeseries_conversion.ipynb
├── tspulse_semantic_search.ipynb
├── tspulse_window_search.ipynb
├── tspulse_hybrid_search.ipynb
└── series_ptbxl_tspulse.npz   # generated by conversion notebook

Notes

  • TSPulse expects time series with at least 512 points.
  • For large datasets, run in batches.
  • Notebooks check that required packages are installed.
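
The first two notes can be combined into a simple batching loop, sketched here under stated assumptions: series shorter than 512 points are skipped, longer ones are cropped to their last 512 points, and the batch size of 32 is arbitrary. The `iter_batches` helper is hypothetical, and the notebooks may handle length and batching differently.

```python
import numpy as np

MIN_LENGTH = 512  # minimum series length stated in the notes above


def iter_batches(series_list, batch_size=32, length=MIN_LENGTH):
    """Yield (batch, length) arrays from series that are long enough.

    Assumption: series shorter than `length` are dropped, and longer ones
    are cropped to their final `length` points.
    """
    usable = [
        np.asarray(s, dtype=np.float32) for s in series_list if len(s) >= length
    ]
    for start in range(0, len(usable), batch_size):
        chunk = usable[start:start + batch_size]
        yield np.stack([s[-length:] for s in chunk])


# Synthetic input: 70 series of 1000 points and 2 that are too short.
rng = np.random.default_rng(0)
all_series = [rng.standard_normal(1000) for _ in range(70)]
all_series += [rng.standard_normal(300) for _ in range(2)]

batch_shapes = [batch.shape for batch in iter_batches(all_series)]
```

Each yielded batch can then be passed to the embedding step one at a time, so memory use depends on the batch size rather than on the dataset size.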

License

See the licenses of this repository, the PTB-XL dataset, and the TSPulse model for terms of use.
