Transformer-Based Cuffless Blood Pressure Estimation Driven by Speckle Contrast Optical Spectroscopy (SCOS)

tianrui-qi/SCOS-BP

Please check the project presentation for a quick overview, the manuscript for a detailed description, and the web app at scos-bp.streamlit.app for interactive visualization of results (may take a few seconds to load on first visit).

Installation

Environment

This project is built using PyTorch Lightning (lightning=2.5) for deep learning model development, Hydra (hydra-core=1.3) for configuration management, and Plotly together with Streamlit for interactive visualization. Familiarity with these frameworks is recommended for further development. Python dependencies are managed with Conda. To set up the environment,

# clone the repository
git clone git@github.com:tianrui-qi/SCOS-BP.git
cd SCOS-BP
# create the conda environment
conda env create -f environment.yaml
conda activate scos-bp

Data and Pretrained Models

All demonstrations in this README are based on data and pretrained model checkpoints available on OSF. You can download them from the command line as follows:

# clone the OSF storage
osf --project yqpht clone
# merge OSF storage into the project root
rsync -av --progress yqpht/osfstorage/ ./
rm -r yqpht

If rsync is not available on your system, you can use mv yqpht/osfstorage/* ./ instead, although this is less safe. Make sure you understand what these commands do before running them.

Sanity Check

We provide a sanity-check script to verify that the environment and data are correctly set up. This script trains the model on a single fixed batch for a few steps using the reconstruction objective. To run the script,

python -m script.sanity

The printed loss should decrease over time, and the final plot should show the reconstruction starting to fit the input. The script is configured by config/pipeline/sanity.yaml and supports command-line overrides via Hydra's syntax, which makes it a good starting point for getting familiar with the project's configuration system.

Data

Three files are provided on OSF under data/raw/,

  • x.npy Optical waveforms (33635 samples × 3 channels × 1000 time points), stored as float32. The channels are Finger 808nm BFi, Finger 808nm PPG, and Wrist 808nm BFi.
  • y.npy Blood pressure (BP) waveforms (33635 samples × 1000 time points), stored as float32.
  • profile.csv Metadata for each sample.

The figure below illustrates the sample preparation process from raw measurements. The bottom-right panel shows two example samples: the first sample passes quality control for all waveforms, while the second sample passes quality control for only two optical waveforms. Samples are retained even when some channels are missing; in such cases, the missing channels are represented as NaNs in x.npy and y.npy.

fig-DataPreparation
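The NaN convention described above can be checked with a few lines of NumPy. This is a minimal sketch on synthetic arrays that mimic the documented layout; it assumes a waveform counts as missing whenever it contains any NaN, and the helper name valid_mask is hypothetical rather than part of the project.

```python
import numpy as np

def valid_mask(waveforms: np.ndarray) -> np.ndarray:
    """Return True where a waveform has no NaN along the last (time) axis."""
    return ~np.isnan(waveforms).any(axis=-1)

# Synthetic stand-ins with the documented layout: (samples, channels, time).
x = np.zeros((4, 3, 1000), dtype=np.float32)
y = np.zeros((4, 1000), dtype=np.float32)
x[1, 2] = np.nan  # sample 1: third optical channel missing
x[3] = np.nan     # sample 3: all optical channels missing
y[2] = np.nan     # sample 2: BP waveform missing

x_ok = valid_mask(x)  # shape (4, 3), one flag per optical channel
y_ok = valid_mask(y)  # shape (4,), one flag per BP waveform

# Keep samples with a valid BP waveform and at least one valid optical channel.
keep = y_ok & x_ok.any(axis=1)
print(x_ok.sum(axis=1))  # number of valid optical channels per sample
print(keep)
```

With the real files, the synthetic arrays would be replaced by np.load("data/raw/x.npy") and np.load("data/raw/y.npy").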

A brief preview of profile.csv:

import pandas as pd
profile = pd.read_csv("data/raw/profile.csv")
print(profile)
       subject         group  health  system  age  measurement  repeat    arm  pulse  pulse_norm  condition   systole  diastole
    0     S001      original    True   False   28         S001   False  False      0      0.0000          1       nan       nan
    1     S001      original    True   False   28         S001   False  False      1      0.0023          1       nan       nan
    2     S001      original    True   False   28         S001   False  False      2      0.0045          1  138.2190   92.0960
    3     S001      original    True   False   28         S001   False  False      3      0.0068          1  139.8688   91.0879
  ...      ...           ...     ...     ...  ...          ...     ...    ...    ...         ...        ...       ...       ...
33631     H006  hypertensive   False    True   52       H006_R    True   True    207      0.9857          6  126.9918  103.4106
33632     H006  hypertensive   False    True   52       H006_R    True   True    208      0.9905          6  126.5592  103.0878
33633     H006  hypertensive   False    True   52       H006_R    True   True    209      0.9952          6       nan       nan
33634     H006  hypertensive   False    True   52       H006_R    True   True    210      1.0000          6       nan       nan

Columns subject to age are subject-level metadata, measurement to arm are measurement-level metadata, and pulse to diastole are sample-level metadata. Some samples (13191/33635) lack systolic/diastolic values due to quality control filtering. Several columns are derived from others for convenience:

profile["subject"] = profile["measurement"].str.split("_").str[0]
profile["health"] = profile["group"] != "hypertensive"
profile["system"] = profile["group"] != "original"
profile["pulse"] = profile.groupby("measurement").cumcount()
profile["pulse_norm"] = (profile.groupby("measurement")["pulse"].transform(
    lambda s: 0.0 if len(s) <= 1 else (s - s.min()) / (s.max() - s.min())
).round(4))

The figure below summarizes samples with a valid blood pressure waveform and at least one valid optical waveform (n = 31,105),

fig-DataProfile

To apply this project to your own data, the data should be organized into the same three-file structure (x.npy, y.npy, and profile.csv). These three files define the overall data interface assumed by the project. Different use cases may rely on only a subset of the data or metadata fields. For example, the dimensionality of x.npy (e.g., number of channels or time points) is flexible, and y.npy, which serves as labels during supervised training, is not required for self-supervised representation learning or downstream analysis. Please refer to the documentation and implementation of specific use cases for exact requirements.
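As a rough illustration of the three-file interface, the sketch below writes a toy dataset in the same layout. The helper save_dataset, the array sizes, and the choice of profile columns are assumptions for illustration; which metadata columns a given use case actually needs should be checked against the implementation.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

def save_dataset(folder, x, y, rows):
    """Write x.npy, y.npy, and profile.csv so that entry i of each describes sample i."""
    if not (len(x) == len(y) == len(rows)):
        raise ValueError("x, y, and profile need one entry per sample")
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    np.save(folder / "x.npy", np.asarray(x, dtype=np.float32))
    np.save(folder / "y.npy", np.asarray(y, dtype=np.float32))
    with open(folder / "profile.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return folder

# Toy data: 5 samples, 2 channels, 500 time points (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2, 500))
y = rng.normal(size=(5, 500))
rows = [{"group": "original", "measurement": "S001", "condition": 1} for _ in range(5)]

out = save_dataset(Path(tempfile.mkdtemp()) / "raw", x, y, rows)
print(sorted(p.name for p in out.iterdir()))
```

Keeping the length check in one place makes it harder to produce files whose rows silently drift out of alignment.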

Model

The figure below illustrates the backbone architecture of the model. For more details, please refer to the implementation in src/model/model.py.

fig-ModelArchitecture

Two pretrained model checkpoints are provided on OSF under ckpt/,

  • pretrain-t/epoch3885.ckpt Model pretrained on an unsupervised reconstruction task using optical waveforms. (stage 1)
  • pretrain-h/last.ckpt Model further pretrained on a supervised regression task using optical waveforms and blood pressure waveforms. (stage 2)

The figure below illustrates the complete three-stage training pipeline. Note that checkpoints for stage 3 are not provided, as this stage performs measurement-specific finetuning, where a separate model is trained for each measurement. This finetuning step is computationally lightweight and typically completes within minutes. Please refer to the Finetune and Prediction section for details.

fig-ModelTraining

Pretrain

To reproduce the pretraining of provided models,

# pretrain configured by `config/pipeline/pretrain-t.yaml`
python -m script.pretrain +pipeline=pretrain-t
# pretrain configured by `config/pipeline/pretrain-h.yaml`
python -m script.pretrain +pipeline=pretrain-h

We use Hydra's syntax to define, manage, and override configuration parameters. All pretraining settings are defined declaratively in .yaml files under config/ and can be modified directly from the command line. For example, to reuse an existing configuration but change the batch size:

# pretrain configured by `config/pipeline/pretrain-t.yaml`
python -m script.pretrain +pipeline=pretrain-t data.batch_size=32
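To make the effect of an override such as data.batch_size=32 concrete, here is a simplified, pure-Python picture of how a dotted key maps onto a nested configuration. It is not Hydra's implementation; apply_override is a hypothetical helper and the base values are made up for illustration.

```python
from copy import deepcopy

def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `a.b.c=value` override applied."""
    key, _, raw = override.partition("=")
    *parents, leaf = key.split(".")
    result = deepcopy(config)
    node = result
    for name in parents:
        # Walk down the nested dict, creating levels that do not exist yet.
        node = node.setdefault(name, {})
    # Naive value parsing for illustration: integers only, otherwise keep the string.
    node[leaf] = int(raw) if raw.lstrip("-").isdigit() else raw
    return result

# Hypothetical base configuration, not the project's actual defaults.
base = {"data": {"batch_size": 64}, "trainer": {"max_epochs": 10}}
updated = apply_override(base, "data.batch_size=32")
print(updated)
print(base)  # the original configuration is left untouched
```

In Hydra itself, a plain key=value override changes an existing entry, while the + prefix used in +pipeline=pretrain-t adds an entry that is not already in the configuration.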

To define a new experiment, create a new .yaml file under config/, for example config/custom/experiment/01.yaml,

# @package _global_

defaults:
  - /schema/data@_here_
  - /schema/model@_here_
  - /schema/objective@_here_
  - /schema/trainer@_here_
  - _self_

name: experiment/01

data:
  batch_size: 32

and launch pretraining with

python -m script.pretrain +custom=experiment/01

Hydra also supports running multiple experiments with parameter combinations via hydra.mode=MULTIRUN. As an example, config/experiment/b/14.yaml defines a multi-run over data.batch_size. Please refer to Hydra's documentation for additional configuration features.
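Conceptually, a multi-run over several parameters launches one job per combination of values. The sketch below only illustrates that expansion with itertools; Hydra's sweeper does this internally, and both expand_sweep and the key model.depth are hypothetical names used for the example.

```python
from itertools import product

def expand_sweep(sweep: dict) -> list:
    """Expand {key: [values]} into one list of `key=value` overrides per run."""
    keys = list(sweep)
    return [
        [f"{k}={v}" for k, v in zip(keys, values)]
        for values in product(*(sweep[k] for k in keys))
    ]

# Two values for each of two parameters gives four runs.
runs = expand_sweep({"data.batch_size": [16, 32], "model.depth": [2, 4]})
for overrides in runs:
    print(" ".join(overrides))
```

Because every run in a sweep shares the same configuration file, it is worth checking how the experiment name is set per run so that the outputs do not collide.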

Note that training logs and model checkpoints are automatically saved under log/$name/ and ckpt/$name/ respectively, where $name is defined in the configuration file. Remember to set different names for different experiments to avoid overwriting previous results. To inspect the training logs,

tensorboard --logdir log/
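Since outputs are keyed by the experiment name, a small pre-launch check can catch accidental reuse of a name. The sketch below is a hypothetical helper, not part of the project; it only assumes the log/$name/ and ckpt/$name/ layout described above.

```python
import tempfile
from pathlib import Path

def existing_outputs(name: str, root: Path = Path(".")) -> list:
    """Return the log/ and ckpt/ folders that already exist for this experiment name."""
    candidates = [root / "log" / name, root / "ckpt" / name]
    return [p for p in candidates if p.exists()]

# Demo in a temporary directory standing in for the project root.
root = Path(tempfile.mkdtemp())
(root / "ckpt" / "experiment" / "01").mkdir(parents=True)

print(existing_outputs("experiment/01", root))  # one existing folder
print(existing_outputs("experiment/02", root))  # empty list, name is free
```

Run from the project root, existing_outputs("experiment/01") returning a non-empty list would indicate that a new run with this name may overwrite earlier results.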

The pretraining pipeline is modularized into four components: data, model, objective, and trainer. The implementation strictly follows the PyTorch and PyTorch Lightning APIs. More specifically,

src/
├── data/
│   ├── datamodule.py   # inherits: lightning.LightningDataModule
│   └── dataset.py      # inherits: torch.utils.data.Dataset
├── model/
│   └── model.py        # inherits: torch.nn.Module
├── objective/
│   └── pretrain.py     # inherits: lightning.LightningModule
└── trainer/
    └── trainer.py      # wrapper:  lightning.Trainer

Thus, the current pipeline can be easily modified and extended by following these APIs. Please check the implementation for more details.

Evaluation

After representation learning, we compute representations for all samples and project them into a low-dimensional space using UMAP and PCA for visualization. We developed a web app at scos-bp.streamlit.app using Plotly for plotting, Streamlit as the frontend framework, and Streamlit Community Cloud for deployment. You can also run the app locally with

streamlit run website/app.py

Results for the two pretrained models are provided under data/evaluation/. By default, the web app loads results from pretrain-t/profile.csv.parquet for demonstration. To explore other results, simply upload a .csv or .parquet file through the dataframe tab in the web app interface.

If you wish to run the evaluation pipeline yourself on provided data and pretrained models,

python -m script.evaluation ckpt_load_path=ckpt/pretrain-t/epoch3885.ckpt
python -m script.evaluation ckpt_load_path=ckpt/pretrain-h/last.ckpt

If data_save_fold is not specified, the script assumes ckpt_load_path follows the pattern ckpt/$name/*.ckpt and sets data_save_fold to data/evaluation/$name/ accordingly. Results are saved under data_save_fold, including

  • profile.csv (for readability) and profile.csv.parquet (for visualization), an updated profile with appended UMAP/PCA coordinates.
  • r.npy containing representations of samples.
  • x.npy and y.npy with the same filtering rules (controlled by data.filter_level) applied during evaluation so that all outputs remain aligned in length and order.
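For readers who want to recompute a projection from r.npy, the sketch below shows a plain PCA via SVD in NumPy. It assumes the representations form a 2D array (samples × features) and uses random data as a stand-in; the evaluation script may well rely on a library implementation, so treat pca_project as a hypothetical illustration rather than the project's method.

```python
import numpy as np

def pca_project(r: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of `r` onto their top principal components via SVD."""
    centered = r - r.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
r = rng.normal(size=(100, 16))  # stand-in for the array loaded from r.npy
coords = pca_project(r)
print(coords.shape)
```

Because the outputs are aligned in length and order, row i of such coordinates would correspond to row i of the saved profile.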

Additional parameters defined in config/pipeline/evaluation.yaml can be overridden from the command line through Hydra's syntax. For example, to evaluate custom data with a new model, adjust the batch size to fit your hardware, and save results to a specific directory:

python -m script.evaluation \
    data_save_fold=path/to/your/data/save/folder/ \
    ckpt_load_path=path/to/your/checkpoint.ckpt \
    data.data_load_fold=path/to/your/data/load/folder/ \
    data.batch_size=32
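The default rule for data_save_fold described in this section (ckpt/$name/*.ckpt mapping to data/evaluation/$name/) can be pictured with a few lines of path handling. This is only a sketch of the stated pattern; default_save_fold is a hypothetical function and the script's actual logic may differ.

```python
from pathlib import PurePosixPath

def default_save_fold(ckpt_load_path: str, stage: str = "evaluation") -> str:
    """Map ckpt/$name/<file>.ckpt to data/<stage>/$name/."""
    parts = PurePosixPath(ckpt_load_path).parts
    if len(parts) < 3 or parts[0] != "ckpt" or not parts[-1].endswith(".ckpt"):
        raise ValueError(f"expected ckpt/$name/*.ckpt, got {ckpt_load_path!r}")
    # Everything between the leading ckpt/ and the file name is the experiment name.
    name = "/".join(parts[1:-1])
    return f"data/{stage}/{name}/"

print(default_save_fold("ckpt/pretrain-t/epoch3885.ckpt"))
print(default_save_fold("ckpt/pretrain-h/last.ckpt", stage="downstream"))
```

A checkpoint stored outside ckpt/ does not match the pattern, which is the case where passing data_save_fold explicitly becomes necessary.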

Finetune and Prediction

To perform measurement-specific finetuning and prediction using a pretrained model,

python -m script.downstream ckpt_load_path=ckpt/pretrain-h/last.ckpt

If data_save_fold is not specified, the script assumes ckpt_load_path follows the pattern ckpt/$name/*.ckpt and sets data_save_fold to data/downstream/$name/ accordingly. Results are saved under data_save_fold, including

  • z.npy containing the predicted blood pressure waveforms.
  • profile.csv, x.npy, and y.npy with the same filtering rules (controlled by data.filter_level) applied during finetuning and prediction so that all outputs remain aligned in length and order.
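Because the saved outputs are aligned, predictions can be compared with the reference waveforms row by row. The sketch below computes a per-sample RMSE on toy values; it assumes z.npy has the same shape as y.npy (samples × time points), and waveform_rmse is a hypothetical helper rather than a metric defined by the project.

```python
import numpy as np

def waveform_rmse(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-sample RMSE over time; NaN for samples with a NaN in either waveform."""
    if z.shape != y.shape:
        raise ValueError("z and y must be aligned: same shape and order")
    return np.sqrt(np.mean((z - y) ** 2, axis=-1))

# Toy reference and prediction: sample 0 is off by 3 everywhere, sample 1 is exact.
y = np.array([[120.0, 80.0, 100.0], [110.0, 70.0, 90.0]])
z = y + np.array([[3.0, -3.0, 3.0], [0.0, 0.0, 0.0]])
print(waveform_rmse(z, y))
```

With real outputs, z and y would come from np.load on the files under data_save_fold, and np.nanmean over the result would give a single summary value that skips samples with missing waveforms.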

Additional parameters defined in config/pipeline/downstream.yaml can be overridden from the command line through Hydra's syntax.

This implementation serves as a reference for running the finetuning and prediction pipeline. Further hyperparameter tuning is required for optimal performance.

Acknowledgements

This project was developed by Tianrui Qi during his Ph.D. lab rotation in the Biomedical Optical Technologies Lab at Boston University. Thanks to Dr. Darren Roblyer for hosting the rotation, and to Dr. Ariane Garrett and Ana Perez for their support throughout the project.

References

  1. Garrett, A. et al. Speckle contrast optical spectroscopy for cuffless blood pressure estimation based on microvascular blood flow and volume oscillations. Biomedical Optics Express 16, 3004–3016 (2025). doi:10.1364/BOE.560022

  2. Yang, C., Westover, M. B. & Sun, J. BIOT: Cross-data biosignal learning in the wild (2023). arXiv:2305.10351

  3. Wang, Y., Li, T., Yan, Y., Song, W. & Zhang, X. How to evaluate your medical time series classification? (2024). arXiv:2410.03057
