ECG-FM is a foundation model for electrocardiogram (ECG) analysis. Committed to open-source practices, we developed ECG-FM alongside the fairseq_signals framework, which implements a collection of deep learning methods for ECG analysis. This repository serves as a landing page and will host project-specific scripts as this work progresses.
Clone fairseq_signals and refer to the requirements and installation section in the top-level README.
Please refer to our inference quickstart tutorial, which outlines inference and visualization pipelines.
Model checkpoints have been made publicly available for download on HuggingFace (a download-and-inspection sketch follows the list below). Specifically, there are two checkpoints:
- `mimic_iv_ecg_physionet_pretrained.pt`
  - Pretrained on MIMIC-IV-ECG v1.0 and PhysioNet 2021 v1.0.3.
- `mimic_iv_ecg_finetuned.pt`
  - Finetuned from `mimic_iv_ecg_physionet_pretrained.pt` on the MIMIC-IV-ECG v1.0 dataset.
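Below is a minimal sketch that fetches a checkpoint with `huggingface_hub` and inspects it with `torch.load`; the `repo_id` and the checkpoint's key layout are assumptions rather than anything specified by the release.

```python
# Minimal sketch: download a checkpoint and peek inside it.
# The repo_id below is an assumption; substitute the actual HuggingFace repo.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="wanglab/ecg-fm",  # assumed repo id
    filename="mimic_iv_ecg_physionet_pretrained.pt",
)

# fairseq-style checkpoints are torch pickles; newer torch versions may need
# weights_only=False to unpickle the embedded training config.
ckpt = torch.load(ckpt_path, map_location="cpu")
print(ckpt.keys())  # typically includes a 'model' state dict and a 'cfg' entry

# Rough parameter count, assuming the usual 'model' state-dict layout.
n_params = sum(p.numel() for p in ckpt["model"].values())
print(f"~{n_params / 1e6:.1f}M parameters")
```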
ECG-FM has 90.9 million parameters, adopts the wav2vec 2.0 architecture, and was pretrained using the W2V+CMSC+RLM (WCR) method. Further details are available in our paper.
We implemented a flexible, end-to-end, multi-source data preprocessing pipeline. Please refer to it here.
The command-line inference tutorial describes the result extraction and post-processing. There is also a script for performing linear probing experiments.
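As a rough illustration of what linear probing involves (this is not the provided script), the sketch below fits a logistic regression on frozen embeddings; the `.npy` file names and shapes are hypothetical placeholders.

```python
# Minimal linear-probing sketch, assuming frozen ECG-FM embeddings and binary
# labels have already been exported to .npy files (hypothetical file names).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X_train = np.load("embeddings_train.npy")  # shape: (n_train, embed_dim)
y_train = np.load("labels_train.npy")      # shape: (n_train,)
X_valid = np.load("embeddings_valid.npy")
y_valid = np.load("labels_valid.npy")

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_valid, probe.predict_proba(X_valid)[:, 1]))
```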
All training is performed through the fairseq_signals framework. To maximize reproducibility, we have provided configuration files.
Our pretraining uses the `mimic_iv_ecg_physionet_pretrained.yaml` config (modify `w2v_cmsc_rlm.yaml` as desired). After modifying the relevant configuration file, pretraining is performed using Hydra's command-line interface. The following command highlights some popular config overrides:
```sh
FAIRSEQ_SIGNALS_ROOT="<TODO>"
MANIFEST_DIR="<TODO>/cmsc"
OUTPUT_DIR="<TODO>"

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    dataset.valid_subset=valid \
    dataset.batch_size=64 \
    dataset.num_workers=10 \
    dataset.disable_validation=false \
    distributed_training.distributed_world_size=4 \
    optimization.update_freq=[2] \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=10 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/pretraining \
    --config-name w2v_cmsc_rlm
```
Notes:
- With CMSC pretraining, the batch size refers to pairs of adjacent segments. Therefore, the effective pretraining batch size is 64 pairs * 2 segments per pair * 4 GPUs * 2 gradient accumulations (update_freq) = 1024 segments.
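For reference, the same arithmetic as a small sketch using the override values above:

```python
# Effective CMSC pretraining batch size from the overrides above.
batch_size_pairs = 64   # dataset.batch_size (pairs of adjacent segments)
segments_per_pair = 2
world_size = 4          # distributed_training.distributed_world_size
update_freq = 2         # optimization.update_freq

effective_segments = batch_size_pairs * segments_per_pair * world_size * update_freq
print(effective_segments)  # 1024
```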
Our finetuning uses the `mimic_iv_ecg_finetuned.yaml` config (modify `diagnosis.yaml` as desired). The following command highlights some popular config overrides:
```sh
FAIRSEQ_SIGNALS_ROOT="<TODO>"
PRETRAINED_MODEL="<TODO>"
MANIFEST_DIR="<TODO>"
LABEL_DIR="<TODO>"
OUTPUT_DIR="<TODO>"

NUM_LABELS=$(($(wc -l < "$LABEL_DIR/label_def.csv") - 1))
POS_WEIGHT=$(cat $LABEL_DIR/pos_weight.txt)

fairseq-hydra-train \
    task.data=$MANIFEST_DIR \
    model.model_path=$PRETRAINED_MODEL \
    model.num_labels=$NUM_LABELS \
    optimization.lr=[1e-06] \
    optimization.max_epoch=140 \
    dataset.batch_size=256 \
    dataset.num_workers=5 \
    dataset.disable_validation=true \
    distributed_training.distributed_world_size=1 \
    distributed_training.find_unused_parameters=True \
    checkpoint.save_dir=$OUTPUT_DIR \
    checkpoint.save_interval=1 \
    checkpoint.keep_last_epochs=0 \
    common.log_format=csv \
    +task.label_file=$LABEL_DIR/y.npy \
    +criterion.pos_weight=$POS_WEIGHT \
    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/finetuning/ecg_transformer \
    --config-name diagnosis
```
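The `NUM_LABELS` and `POS_WEIGHT` variables above are read from label artifacts produced during preprocessing. As a hedged sketch of how such values could be derived from a multi-hot label matrix (the exact file formats expected by fairseq_signals are assumptions here):

```python
# Sketch: derive the label count and per-label positive-class weights from a
# multi-hot label matrix y (shape: n_samples x n_labels). File formats are
# assumptions, not a specification of the pipeline's outputs.
import numpy as np

y = np.load("y.npy")                      # multi-hot labels, shape (N, num_labels)
num_labels = y.shape[1]

pos = y.sum(axis=0)                       # positives per label
neg = y.shape[0] - pos
pos_weight = neg / np.clip(pos, 1, None)  # BCE-style positive-class weights

print(num_labels)
print(" ".join(f"{w:.4f}" for w in pos_weight))  # e.g. contents of pos_weight.txt
```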
Functionality for our comprehensive free-text pattern matching and knowledge graph-based label manipulation is showcased in the `labeler.ipynb` notebook.
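As a toy illustration of regex-based free-text pattern matching (the notebook itself is considerably more comprehensive), the sketch below maps a few made-up patterns over example report text; the patterns and column names are illustrative only.

```python
# Toy sketch of regex-based label extraction from free-text ECG reports.
# Patterns and column names are made up for demonstration.
import pandas as pd

reports = pd.DataFrame({"text": [
    "Atrial fibrillation with rapid ventricular response",
    "Normal sinus rhythm",
]})

patterns = {
    "AFIB": r"atrial fibrillation",
    "SR": r"(normal )?sinus rhythm",
}

for label, pattern in patterns.items():
    reports[label] = reports["text"].str.contains(pattern, case=False, regex=True)

print(reports[["AFIB", "SR"]])
```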
Inquiries may be directed to [email protected].