yastcher/physionet-ecg-image-digitization

ECG Image Digitization

Kaggle Competition Solution | PhysioNet ECG Image Digitization

Classical computer vision pipeline for extracting 12-lead ECG time-series data from scanned paper ECG printouts.

Problem Statement

Medical ECG records are often stored as paper printouts or scanned images. Converting these to digital signals enables:

  • Retrospective analysis of historical patient data
  • Integration with modern diagnostic systems
  • Machine learning applications for cardiac disease detection

Challenge: Extract accurate voltage-time signals from images with varying quality, grid patterns, rotation, and noise.

Solution Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           ECG IMAGE INPUT                               │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  PREPROCESSING                                                          │
│  ├─ Rotation correction (Hough Transform on grid lines)                │
│  ├─ Grid removal (HSV color space for pink/red grids)                  │
│  └─ Trace mask extraction (adaptive thresholding)                      │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  ROI DETECTION                                                          │
│  ├─ Horizontal projection analysis for row detection                   │
│  ├─ Peak finding with minimum distance constraints                     │
│  └─ Signal column boundaries (calibration pulse → signal end)          │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  SIGNAL EXTRACTION                                                      │
│  ├─ Column-wise trace detection (topmost pixel method)                 │
│  ├─ Gap interpolation for discontinuities                              │
│  └─ Median filtering for noise reduction                               │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  CALIBRATION & OUTPUT                                                   │
│  ├─ Calibration pulse detection (1mV reference)                        │
│  ├─ Pixel-to-mV conversion                                             │
│  ├─ Resampling to target frequency (500 Hz)                            │
│  └─ Baseline wander removal (Butterworth high-pass filter)             │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     12-LEAD ECG TIME SERIES (WFDB)                      │
└─────────────────────────────────────────────────────────────────────────┘
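The signal extraction stage in the diagram above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `extract_trace`, the default kernel size, and the choice of linear interpolation for gaps are assumptions. It takes a binary trace mask for one row and returns one y-position per pixel column.

```python
import numpy as np
from scipy.signal import medfilt


def extract_trace(mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Turn a boolean trace mask (H x W) into one y-position (in pixels) per column.

    kernel_size must be odd, as required by scipy.signal.medfilt.
    """
    has_trace = mask.any(axis=0)  # columns that contain at least one trace pixel
    if not has_trace.any():
        raise ValueError("mask contains no trace pixels")

    # argmax over a boolean column returns the index of the first True,
    # which is the topmost trace pixel in that column.
    top = mask.argmax(axis=0).astype(float)

    # Gap interpolation: fill columns without trace pixels linearly
    # from the nearest valid columns on either side.
    cols = np.arange(mask.shape[1])
    filled = np.interp(cols, cols[has_trace], top[has_trace])

    # Median filtering suppresses isolated single-column spikes.
    return medfilt(filled, kernel_size=kernel_size)
```

The result is in image coordinates, where y grows downward, so it still has to be flipped and scaled by the calibration factor before it represents a voltage.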

ECG Layout Understanding

Standard 12-lead ECG paper format (three rows of four 2.5-second lead segments, plus a 10-second rhythm strip):

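| Row | 0-2.5 s | 2.5-5 s | 5-7.5 s | 7.5-10 s |
|-----|---------|---------|---------|----------|
| 0   | I       | aVR     | V1      | V4       |
| 1   | II      | aVL     | V2      | V5       |
| 2   | III     | aVF     | V3      | V6       |
| 3   | II rhythm strip (full 10 seconds) | | | |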
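One way to encode this layout in code is a small lookup from lead name to row and time window. The sketch below is illustrative only: `LAYOUT`, `lead_windows`, and `column_range` are hypothetical names, and the actual definitions in `src/config.py` may differ.

```python
# Lead layout from the table above: three rows of four 2.5 s segments.
# The lead II rhythm strip on row 3 spans the full 10 s and is kept separate.
LAYOUT = [
    ["I", "aVR", "V1", "V4"],
    ["II", "aVL", "V2", "V5"],
    ["III", "aVF", "V3", "V6"],
]
RHYTHM_STRIP = ("II", 3, 0.0, 10.0)  # (lead, row, start second, end second)
SEGMENT_SEC = 2.5
TOTAL_SEC = 10.0


def lead_windows(layout=LAYOUT, segment_sec=SEGMENT_SEC):
    """Map each lead name to (row index, start second, end second)."""
    windows = {}
    for row, leads in enumerate(layout):
        for col, lead in enumerate(leads):
            windows[lead] = (row, col * segment_sec, (col + 1) * segment_sec)
    return windows


def column_range(lead, signal_width_px, windows=None, total_sec=TOTAL_SEC):
    """Pixel-column range of a lead inside a row whose signal area is signal_width_px wide."""
    windows = windows or lead_windows()
    _, start, end = windows[lead]
    px_per_sec = signal_width_px / total_sec
    return round(start * px_per_sec), round(end * px_per_sec)
```

For a row whose signal area is 2000 pixels wide, `column_range("aVF", 2000)` gives the columns covering seconds 2.5 to 5 of row 2.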

Physical parameters:

  • Paper speed: 25 mm/s (horizontal)
  • Amplitude: 10 mm/mV (vertical)
  • Target sampling rate: 500 Hz
  • Duration: 10 seconds → 5000 samples per lead
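These parameters fix the conversion from pixels to physical units once the scan resolution is known. The sketch below assumes a resolution `px_per_mm`, which the README does not state and which depends on the image. The helper names and the use of linear interpolation for resampling are also assumptions.

```python
import numpy as np

PAPER_SPEED_MM_PER_S = 25.0  # horizontal scale, from the list above
GAIN_MM_PER_MV = 10.0        # vertical scale, from the list above
TARGET_FS_HZ = 500
DURATION_S = 10.0


def pixel_scales(px_per_mm: float) -> tuple[float, float]:
    """Return (pixels per second, pixels per millivolt) for a given scan resolution."""
    return px_per_mm * PAPER_SPEED_MM_PER_S, px_per_mm * GAIN_MM_PER_MV


def resample_columns(trace_mv: np.ndarray, fs: int = TARGET_FS_HZ,
                     duration_s: float = DURATION_S) -> np.ndarray:
    """Linearly resample a per-column trace covering duration_s onto fs * duration_s samples."""
    n_out = int(round(fs * duration_s))
    # Each pixel column is treated as one sample at an evenly spaced time.
    src_t = np.linspace(0.0, duration_s, num=len(trace_mv), endpoint=False)
    dst_t = np.arange(n_out) / fs
    return np.interp(dst_t, src_t, trace_mv)
```

At a hypothetical 8 px/mm, one second spans 200 pixel columns and 1 mV spans 80 pixel rows, so a 10-second row is about 2000 columns wide and is upsampled to 5000 samples.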

Key Technical Decisions

Why Classical CV Instead of Deep Learning?

  1. Interpretability: Each pipeline step can be visualized and debugged
  2. No training data required: Works without retraining on any scan that follows the assumed layout
  3. Computational efficiency: Runs on CPU in seconds per image
  4. Competitive results: PhysioNet 2024 winner used Hough Transform for rotation (99.7% accuracy)

Grid Removal Strategy

HSV color space provides robust separation of pink/red grid from black trace:

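import cv2
hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)  # image_bgr: image as loaded by cv2.imread
# Trace pixels: low Value (dark)
trace_mask = hsv[:, :, 2] < 100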

Morphological fallback for grayscale images handles cases where color information is unavailable.
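The Value threshold can also be illustrated without OpenCV. For 8-bit images the HSV Value channel is the per-pixel maximum of the three colour channels, so a NumPy-only sketch gives the same mask as thresholding `hsv[:, :, 2]`. The function name and the sample colours below are illustrative assumptions, and the morphological fallback is not covered here.

```python
import numpy as np


def trace_mask_from_rgb(rgb: np.ndarray, value_threshold: int = 100) -> np.ndarray:
    """Boolean mask of dark (trace) pixels in an H x W x 3 uint8 image."""
    # Value = max(R, G, B) for 8-bit images, so no colour conversion is needed.
    # Channel order does not matter because the maximum is symmetric.
    value = rgb.max(axis=2)
    return value < value_threshold


# Tiny synthetic example: white paper, one pink grid line, one dark trace line.
image = np.full((5, 5, 3), 255, dtype=np.uint8)
image[1, :] = (255, 182, 193)  # pink grid line: bright, so Value stays at 255
image[3, :] = (30, 30, 30)     # dark trace line: Value is 30
mask = trace_mask_from_rgb(image)
```

Only row 3 of `mask` is set, because a pink grid pixel keeps a high Value even though its hue and saturation differ from white paper.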

Calibration Approach

The 1 mV calibration pulse at the left of each row provides the pixel-to-voltage conversion factor. Connected-components analysis identifies the pulse height reliably even with broken traces.
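A rough sketch of this idea, using `scipy.ndimage.label` for the connected components, is shown below. Taking the tallest component as the pulse is an assumption made for illustration, as are the function names; the logic in `src/calibration.py` may differ.

```python
import numpy as np
from scipy import ndimage


def pulse_height_px(calib_mask: np.ndarray) -> int:
    """Height in pixels of the tallest connected component in a calibration-region mask."""
    # A 3x3 structuring element gives 8-connectivity, so diagonal steps
    # along the pulse outline stay within one component.
    labeled, n_components = ndimage.label(calib_mask, structure=np.ones((3, 3), dtype=int))
    if n_components == 0:
        raise ValueError("no calibration pulse found")
    # find_objects returns one (row_slice, col_slice) bounding box per component.
    heights = [box[0].stop - box[0].start for box in ndimage.find_objects(labeled)]
    return max(heights)  # includes the line thickness of the trace


def pixels_to_mv(y_px, baseline_px: float, px_per_mv: float) -> np.ndarray:
    """Convert image-row positions to millivolts; image y grows downward, so the sign flips."""
    return (baseline_px - np.asarray(y_px, dtype=float)) / px_per_mv
```

Because the pulse represents 1 mV, its height in pixels can be used directly as `px_per_mv`, up to the thickness of the drawn line.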

Project Structure

├── src/
│   ├── config.py              # Constants, ECG layout definition
│   ├── preprocessing_v2.py    # Grid removal, ROI detection
│   ├── pipeline_v2.py         # Main digitizer (production)
│   ├── pipeline_v4.py         # Experimental improvements
│   ├── extraction.py          # Mask → 1D signal conversion
│   ├── calibration.py         # Amplitude scaling
│   └── evaluation.py          # SNR calculation
├── notebooks/
│   ├── 01_eda_and_baseline.ipynb
│   ├── 02_pipeline_demo.ipynb
│   └── 03_diagnostic.ipynb
├── diagnose.py                # Visual debugging tool
├── create_submission.py       # Kaggle submission generator
└── submission_v2.ipynb        # Kaggle notebook (self-contained)

Evaluation Metric

Signal-to-Noise Ratio (SNR):

SNR = 10 × log₁₀(Σ signal² / Σ noise²)

where signal = ground_truth and noise = predicted - ground_truth
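The formula can be written as a short function. This sketch assumes the signal term is the ground-truth waveform, which is what makes an all-zero prediction score exactly 0 dB; the function name is illustrative and `src/evaluation.py` may handle additional details.

```python
import numpy as np


def snr_db(predicted, ground_truth) -> float:
    """Signal-to-noise ratio in dB, with noise = predicted - ground_truth."""
    signal = np.asarray(ground_truth, dtype=float)
    noise = np.asarray(predicted, dtype=float) - signal
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        return float("inf")  # perfect reconstruction
    return float(10.0 * np.log10(np.sum(signal ** 2) / noise_power))
```

A prediction that is uniformly 10% too small leaves a noise term of one tenth of the signal, a power ratio of 100, and therefore an SNR of 20 dB.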

| SNR (dB) | Interpretation |
|----------|----------------|
| < 0      | Worse than predicting zeros |
| 0-5      | Basic signal recovery |
| 5-10     | Good quality (classical methods ceiling) |
| > 10     | Excellent (typically requires deep learning) |

Quick Start

# Install dependencies
pip install -e .

# Download competition data
kaggle competitions download -c physionet-ecg-image-digitization -p data/
unzip data/physionet-ecg-image-digitization.zip -d data/

# Run diagnostics on sample
python diagnose.py --sample-id 7663343 --save-plots

# Batch evaluation
python diagnose.py --batch 10

# Create submission
python create_submission.py --output submission.zip

Usage

from src.pipeline_v2 import ECGDigitizerV2

digitizer = ECGDigitizerV2()
result_df = digitizer.process(
    image_path='path/to/ecg.png',
    sampling_rate=500,
    duration_sec=10.0
)

# Result: DataFrame with columns [I, II, III, aVR, aVL, aVF, V1-V6]
# Each column contains 5000 samples (10s × 500Hz) in mV

Diagnostic Visualization

The diagnose.py script provides step-by-step visualization:

  1. Preprocessing: Original → Grid removed → Trace mask
  2. ROI Detection: Detected row boundaries overlaid on image
  3. Per-row extraction: Mask and extracted signal for each row
  4. Ground truth comparison: Predicted vs actual for all 12 leads

Tech Stack

| Component         | Technology |
|-------------------|------------|
| Image Processing  | OpenCV, scikit-image |
| Signal Processing | SciPy (Butterworth filter, interpolation) |
| Data Handling     | Pandas, NumPy |
| Visualization     | Matplotlib |
| Output Format     | WFDB (PhysioNet standard) |

Known Limitations

  • Mobile phone photos: Perspective distortion not fully handled
  • Stained/deteriorated paper: Color-based grid removal may fail
  • Non-standard layouts: Assumes 3×4 + rhythm strip format
  • Amplitude accuracy: Depends on calibration pulse detection quality

License

MIT