Kaggle Competition Solution | PhysioNet ECG Image Digitization
Classical computer vision pipeline for extracting 12-lead ECG time-series data from scanned paper ECG printouts.
Medical ECG records are often stored as paper printouts or scanned images. Converting these to digital signals enables:
- Retrospective analysis of historical patient data
- Integration with modern diagnostic systems
- Machine learning applications for cardiac disease detection
Challenge: Extract accurate voltage-time signals from images with varying quality, grid patterns, rotation, and noise.
┌─────────────────────────────────────────────────────────────────────────┐
│ ECG IMAGE INPUT │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ PREPROCESSING │
│ ├─ Rotation correction (Hough Transform on grid lines) │
│ ├─ Grid removal (HSV color space for pink/red grids) │
│ └─ Trace mask extraction (adaptive thresholding) │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ ROI DETECTION │
│ ├─ Horizontal projection analysis for row detection │
│ ├─ Peak finding with minimum distance constraints │
│ └─ Signal column boundaries (calibration pulse → signal end) │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ SIGNAL EXTRACTION │
│ ├─ Column-wise trace detection (topmost pixel method) │
│ ├─ Gap interpolation for discontinuities │
│ └─ Median filtering for noise reduction │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ CALIBRATION & OUTPUT │
│ ├─ Calibration pulse detection (1mV reference) │
│ ├─ Pixel-to-mV conversion │
│ ├─ Resampling to target frequency (500 Hz) │
│ └─ Baseline wander removal (Butterworth high-pass filter) │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ 12-LEAD ECG TIME SERIES (WFDB) │
└─────────────────────────────────────────────────────────────────────────┘
Standard 12-lead ECG paper format (three rows of four 2.5-second lead segments, plus a 10-second lead II rhythm strip):
| Row | 0-2.5s | 2.5-5s | 5-7.5s | 7.5-10s |
|---|---|---|---|---|
| 0 | I | aVR | V1 | V4 |
| 1 | II | aVL | V2 | V5 |
| 2 | III | aVF | V3 | V6 |
| 3 | II rhythm strip (full 10 seconds) | | | |
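As a rough sketch, the layout table can be turned into a lookup from lead name to its row and sample window on the 10-second axis. The 500 Hz rate matches the target sampling rate used in this project; the names `LAYOUT`, `lead_windows`, and the `II_rhythm` key are hypothetical and may not match `src/config.py`.

```python
FS = 500           # target sampling rate in Hz
SEGMENT_SEC = 2.5  # each short lead occupies one 2.5 s column

# Rows 0-2 of the printout, left to right.
LAYOUT = [
    ["I", "aVR", "V1", "V4"],
    ["II", "aVL", "V2", "V5"],
    ["III", "aVF", "V3", "V6"],
]

def lead_windows(fs: int = FS, segment_sec: float = SEGMENT_SEC) -> dict:
    """Map each lead to (row, start_sample, end_sample)."""
    n = int(round(fs * segment_sec))  # samples per 2.5 s segment
    windows = {}
    for row, leads in enumerate(LAYOUT):
        for col, lead in enumerate(leads):
            windows[lead] = (row, col * n, (col + 1) * n)
    # Row 3 is the lead II rhythm strip spanning all four columns.
    windows["II_rhythm"] = (3, 0, 4 * n)
    return windows
```

For example, `lead_windows()["V5"]` gives row 1 and the sample range 3750 to 5000.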
Physical parameters:
- Paper speed: 25 mm/s (horizontal)
- Amplitude: 10 mm/mV (vertical)
- Target sampling rate: 500 Hz
- Duration: 10 seconds → 5000 samples per lead
Advantages of the classical approach:
- Interpretability: Each pipeline step can be visualized and debugged
- No training data required: Works on any ECG format without retraining
- Computational efficiency: Runs on CPU in seconds per image
- Competitive results: PhysioNet 2024 winner used Hough Transform for rotation (99.7% accuracy)
HSV color space provides robust separation of pink/red grid from black trace:
# Trace pixels: low Value (dark)
trace_mask = hsv[:, :, 2] < 100

A morphological fallback handles grayscale images, where color information is unavailable.
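The sketch below illustrates the color-based separation using NumPy only. It relies on the fact that, for 8-bit images, the HSV Value channel is the per-pixel maximum of the color channels and Saturation is `255 * (max - min) / max`. The function name and the `s_min` threshold are assumptions for illustration, and hue is not checked, so any bright, saturated pixel is treated as grid.

```python
import numpy as np

def split_trace_and_grid(img: np.ndarray, v_dark: int = 100, s_min: int = 40):
    """Return (trace_mask, grid_mask) for an 8-bit color image of shape (H, W, 3).

    Channel order (BGR or RGB) does not matter because only the
    per-pixel max and min are used.
    """
    px = img.astype(np.int32)
    v = px.max(axis=2)      # HSV Value
    c_min = px.min(axis=2)
    # HSV Saturation scaled to 0..255; defined as 0 where v == 0.
    s = np.where(v > 0, (v - c_min) * 255 // np.maximum(v, 1), 0)

    trace_mask = v < v_dark                    # dark ink
    grid_mask = (~trace_mask) & (s >= s_min)   # bright but colored: pink/red grid
    return trace_mask, grid_mask
```

Thresholds like these usually need tuning per scanner; the value 100 comes from the snippet above, while 40 is only a placeholder.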
The 1 mV calibration pulse at the left of each row provides the pixel-to-voltage conversion factor. Connected-component analysis identifies the pulse height reliably even when the trace is broken.
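A simplified sketch of the conversion follows. Instead of connected-component analysis it measures the vertical extent of a mask that has already been cropped around the pulse, which is an assumption made to keep the example short. The function names are hypothetical and are not taken from `src/calibration.py`.

```python
import numpy as np

def pixels_per_mv(pulse_mask: np.ndarray, pulse_mv: float = 1.0) -> float:
    """Estimate the vertical scale from a mask cropped around the calibration pulse."""
    rows = np.flatnonzero(pulse_mask.any(axis=1))
    if rows.size < 2:
        raise ValueError("no calibration pulse found in mask")
    height_px = rows[-1] - rows[0]  # top edge of the pulse down to the baseline
    return height_px / pulse_mv

def rows_to_mv(y_rows, baseline_row: float, px_per_mv: float) -> np.ndarray:
    """Image rows grow downward, so voltage is (baseline - row) / scale."""
    return (baseline_row - np.asarray(y_rows, dtype=float)) / px_per_mv
```

With the standard gain of 10 mm/mV, a pulse 40 pixels tall implies 4 pixels per millimeter; assuming square pixels and 25 mm/s paper speed, one second then spans 100 pixels horizontally.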
├── src/
│ ├── config.py # Constants, ECG layout definition
│ ├── preprocessing_v2.py # Grid removal, ROI detection
│ ├── pipeline_v2.py # Main digitizer (production)
│ ├── pipeline_v4.py # Experimental improvements
│ ├── extraction.py # Mask → 1D signal conversion
│ ├── calibration.py # Amplitude scaling
│ └── evaluation.py # SNR calculation
├── notebooks/
│ ├── 01_eda_and_baseline.ipynb
│ ├── 02_pipeline_demo.ipynb
│ └── 03_diagnostic.ipynb
├── diagnose.py # Visual debugging tool
├── create_submission.py # Kaggle submission generator
└── submission_v2.ipynb # Kaggle notebook (self-contained)
Signal-to-Noise Ratio (SNR):
SNR = 10 × log₁₀(Σ signal² / Σ noise²)
where signal = ground_truth and noise = predicted - ground_truth
| SNR (dB) | Interpretation |
|---|---|
| < 0 | Worse than predicting zeros |
| 0-5 | Basic signal recovery |
| 5-10 | Good quality (classical methods ceiling) |
| > 10 | Excellent (typically requires deep learning) |
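A minimal sketch of the SNR formula, assuming the signal term is the ground truth (which is what makes an all-zero prediction score exactly 0 dB). This is not necessarily identical to `src/evaluation.py` or to the competition scorer.

```python
import math

def snr_db(predicted, ground_truth) -> float:
    """SNR in dB, with noise defined as predicted - ground_truth."""
    signal_power = sum(g * g for g in ground_truth)
    noise_power = sum((p - g) ** 2 for p, g in zip(predicted, ground_truth))
    if noise_power == 0:
        return math.inf    # perfect reconstruction
    if signal_power == 0:
        return -math.inf   # flat ground truth, any error is pure noise
    return 10.0 * math.log10(signal_power / noise_power)
```

Predicting 90% of the true amplitude everywhere leaves a 10% error, which corresponds to about 20 dB.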
# Install dependencies
pip install -e .
# Download competition data
kaggle competitions download -c physionet-ecg-image-digitization -p data/
unzip data/physionet-ecg-image-digitization.zip -d data/
# Run diagnostics on sample
python diagnose.py --sample-id 7663343 --save-plots
# Batch evaluation
python diagnose.py --batch 10
# Create submission
python create_submission.py --output submission.zip

from src.pipeline_v2 import ECGDigitizerV2
digitizer = ECGDigitizerV2()
result_df = digitizer.process(
    image_path='path/to/ecg.png',
    sampling_rate=500,
    duration_sec=10.0
)
# Result: DataFrame with columns [I, II, III, aVR, aVL, aVF, V1-V6]
# Each column contains 5000 samples (10 s × 500 Hz) in mV

The diagnose.py script provides step-by-step visualization:
- Preprocessing: Original → Grid removed → Trace mask
- ROI Detection: Detected row boundaries overlaid on image
- Per-row extraction: Mask and extracted signal for each row
- Ground truth comparison: Predicted vs actual for all 12 leads
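The row boundaries shown in the ROI detection view come from a horizontal projection of the trace mask followed by peak finding with a minimum distance. A minimal sketch of that idea, using a greedy peak picker in plain NumPy, is given below; the function name and default parameters are assumptions, not values from `src/preprocessing_v2.py`.

```python
import numpy as np

def detect_rows(mask: np.ndarray, n_rows: int = 4, min_distance: int = 50) -> list:
    """Return the image rows of the n_rows strongest projection peaks, top to bottom."""
    profile = mask.sum(axis=1)  # number of trace pixels in each image row
    peaks = []
    # Visit rows from strongest to weakest and keep a row only if it is
    # at least min_distance pixels away from every peak already chosen.
    for r in np.argsort(profile)[::-1]:
        if profile[r] == 0:
            break
        if all(abs(int(r) - p) >= min_distance for p in peaks):
            peaks.append(int(r))
        if len(peaks) == n_rows:
            break
    return sorted(peaks)
```

Each detected peak approximates the baseline of one ECG row; cropping a band around it gives the region passed to signal extraction.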
| Component | Technology |
|---|---|
| Image Processing | OpenCV, scikit-image |
| Signal Processing | SciPy (Butterworth filter, interpolation) |
| Data Handling | Pandas, NumPy |
| Visualization | Matplotlib |
| Output Format | WFDB (PhysioNet standard) |
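As an illustration of the SciPy steps, the sketch below resamples a per-pixel trace to 500 Hz by linear interpolation and then removes baseline wander with a zero-phase Butterworth high-pass filter. The function name, the 0.5 Hz cutoff, and the filter order are assumptions chosen for the example, not settings taken from this repository.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def resample_and_detrend(y_mv, duration_sec=10.0, fs=500, cutoff_hz=0.5, order=2):
    """Resample a trace to fs Hz and high-pass filter it.

    y_mv holds one value per pixel column, assumed evenly spaced over duration_sec.
    """
    y_mv = np.asarray(y_mv, dtype=float)
    n_out = int(round(duration_sec * fs))

    # Linear interpolation from the pixel time axis to the target time axis.
    t_in = np.linspace(0.0, duration_sec, num=y_mv.size, endpoint=False)
    t_out = np.arange(n_out) / fs
    resampled = np.interp(t_out, t_in, y_mv)

    # Zero-phase high-pass: filtfilt runs the filter forward and backward,
    # so the waveform is not shifted in time.
    b, a = butter(order, cutoff_hz, btype="highpass", fs=fs)
    return filtfilt(b, a, resampled)
```

A 10-second trace extracted from 2000 pixel columns comes out as 5000 samples with its constant offset removed.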
Known limitations:
- Mobile phone photos: Perspective distortion not fully handled
- Stained/deteriorated paper: Color-based grid removal may fail
- Non-standard layouts: Assumes 3×4 + rhythm strip format
- Amplitude accuracy: Depends on calibration pulse detection quality
References:
- PhysioNet Challenge 2024
- SignalSavants Winner Code (nnU-Net approach)
- ECG-Image-Kit (synthetic data generation)