CLARUS-VAD: Contrastive Learning and Anomaly Detection for Respiratory Ultrasound Screening - Video-level Anomaly Detection
CLARUS_VAD is a dual-branch framework that detects lung consolidations in pediatric lung ultrasound (LUS) at the video level by fusing:
- an unsupervised branch: VAE → UMAP anomaly detection, where frames are flagged when their Euclidean distance from a normal centroid exceeds a fixed distance threshold; and
- a self-supervised branch: contrastive learning with a Convolutional Autoencoder (CAE) on positive/negative frame pairs plus a lightweight binary classifier head.
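To make the pair-based objective of the self-supervised branch concrete, here is a minimal sketch of a margin-based contrastive loss on two frame embeddings. The exact loss form is not stated here, so the classic margin formulation, the function name `contrastive_pair_loss`, and the `margin` parameter are all assumptions for illustration.

```python
import math


def contrastive_pair_loss(z1, z2, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Assumed form (not confirmed by the source):
      positive pair -> d^2                 (pull similar frames together)
      negative pair -> max(0, margin-d)^2  (push dissimilar frames apart)
    """
    d = math.dist(z1, z2)  # Euclidean distance between the two embeddings
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings, for illustration only.
same_class = contrastive_pair_loss([0.1, 0.2], [0.1, 0.2], is_positive=True)
far_apart = contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], is_positive=False)
print(same_class, far_apart)
```

Under this formulation, negative pairs that are already farther apart than the margin contribute no loss, so training effort concentrates on pairs that are still confusable.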
Frame-wise outputs from both branches are combined via a weighted fusion score to yield a single video-level decision. The system exposes tunable thresholds and weights to emphasize sensitivity or specificity for clinical needs. Evaluation is patient-independent via Leave-One-Out Cross-Validation (LOOCV).
Key ideas (from the paper):
- VAE encoder learns latent features; UMAP reduces to 2D; frames far from the normal centroid are flagged as anomalies (useful for large consolidations).
- CAE + contrastive loss learns similarity/dissimilarity on pairs (useful for small consolidations); a classifier head produces frame-level probabilities.
- Fusion score: a weighted combination of normalized SSL probability and normalized UMAP anomaly percentage produces the video label using a single cutoff. Parameters (SSLthreshold, FixedThreshold, SSLweight, UMAPweight, Tfusion) are tuned empirically.
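The fusion step can be sketched as follows. This is one plausible reading, not the paper's exact formula: it assumes the "normalized SSL probability" is the fraction of frames whose classifier probability exceeds `ssl_threshold`, that the UMAP anomaly percentage is the fraction of frames whose centroid distance exceeds `fixed_threshold`, and that the weights are normalized to sum to 1. All function names and example values are hypothetical.

```python
def fraction_above(values, threshold):
    """Fraction of frames whose value exceeds the threshold (0.0 if no frames)."""
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


def fusion_score(ssl_probs, umap_distances, ssl_threshold, fixed_threshold,
                 ssl_weight, umap_weight):
    """Weighted combination of the two per-video branch scores, in [0, 1]."""
    ssl_part = fraction_above(ssl_probs, ssl_threshold)          # SSL branch
    umap_part = fraction_above(umap_distances, fixed_threshold)  # UMAP branch
    total = ssl_weight + umap_weight  # normalize so the score stays in [0, 1]
    return (ssl_weight * ssl_part + umap_weight * umap_part) / total


def video_label(score, t_fusion):
    """True means the video is labeled as containing a consolidation."""
    return score >= t_fusion


# Illustrative values only; none of these come from the paper.
score = fusion_score(
    ssl_probs=[0.9, 0.8, 0.2, 0.1],
    umap_distances=[5.0, 1.0, 1.0, 1.0],
    ssl_threshold=0.5, fixed_threshold=3.0,
    ssl_weight=0.6, umap_weight=0.4,
)
print(score, video_label(score, t_fusion=0.3))
```

Lowering `t_fusion` or raising the weight of the more permissive branch shifts the operating point toward sensitivity; the opposite shifts it toward specificity.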
Dataset and evaluation setup:
- PedLUS dataset (Lusaka, Zambia; mBSUS study): 200 children with pneumonia symptoms and 200 age/sex-matched healthy controls; 12 standardized sweeps per participant (six lung regions, sagittal and transverse). Dataset reference: "Curated and Annotated Dataset of Lung US Images in Zambian Children with Clinical Pneumonia".
- Expert-annotated subset: 176 sweeps from 57 children with start/end frame labels for small vs large consolidations.
- Preprocessing: removal of textual artifacts prior to modeling.
- Unsupervised branch training set: 265 videos (balanced: consolidation / no-consolidation) to fit the VAE/UMAP pipeline.
- Self-supervised branch pairs: 940 frames from 97 videos to form positive/negative pairs (pairs are drawn from different videos).
- Evaluation: LOOCV at the patient level (one patient held out for testing per fold; no validation set; fixed-epoch training).
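The patient-level LOOCV split described above can be sketched in a few lines. The helper name `leave_one_patient_out` and the patient IDs are hypothetical; the point is that all videos from the held-out patient go to the test fold, so no patient appears in both train and test.

```python
def leave_one_patient_out(video_patient_ids):
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient.

    video_patient_ids[i] is the patient that video i belongs to.
    """
    for held_out in sorted(set(video_patient_ids)):
        train_idx = [i for i, p in enumerate(video_patient_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(video_patient_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical mapping of four videos to three patients.
patients = ["p1", "p1", "p2", "p3"]
for held_out, train_idx, test_idx in leave_one_patient_out(patients):
    print(held_out, train_idx, test_idx)
```

Splitting by patient rather than by video or frame avoids leakage from near-identical frames of the same child appearing on both sides of the split.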
Results for the CLARUS_VAD fusion models with tunable weights/thresholds (no external pretraining):
- Fusion-Model-1 — F1 0.62, Accuracy 0.51, Sensitivity 0.86, Specificity 0.20, Balanced Acc 0.53.
- Fusion-Model-2 — F1 0.66, Accuracy 0.58, Sensitivity 0.91, Specificity 0.31, Balanced Acc 0.62.
- Fusion-Model-3 — F1 0.68 (best), Accuracy 0.61, Sensitivity 0.91, Specificity 0.35, Balanced Acc 0.63.
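For reference, the reported metrics follow the standard definitions over video-level confusion counts, sketched below. The counts in the example are made up for illustration and are not the paper's confusion matrix; the function name `binary_metrics` is likewise hypothetical, and the sketch assumes each class has at least one video and at least one video is predicted positive.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)  # recall on consolidation videos
    specificity = tn / (tn + fp)  # recall on normal videos
    precision = tp / (tp + fp)
    return {
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Illustrative counts only (not taken from the paper).
m = binary_metrics(tp=9, fp=7, tn=3, fn=1)
print({k: round(v, 2) for k, v in m.items()})
```

Balanced accuracy is the mean of sensitivity and specificity; for Fusion-Model-1, (0.86 + 0.20) / 2 = 0.53, matching the value listed above.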
Notes on interpretability and behavior:
- Grad-CAM visualizations (SSL branch) highlight clinically relevant regions for both small and large consolidations.
- UMAP distance plots show clear spikes at frames containing large consolidations relative to a normal centroid and fixed distance threshold.
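The distance-from-centroid check behind those UMAP plots can be sketched as follows, assuming the 2-D UMAP embeddings have already been computed. The helper names, the example points, and the threshold value are hypothetical; only the idea (Euclidean distance to a normal centroid, compared against a fixed threshold) comes from the description above.

```python
import math


def centroid(points):
    """Mean point of a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dims))


def flag_anomalous_frames(frame_embeddings, normal_centroid, fixed_threshold):
    """Return per-frame distances and flags (True = farther than the threshold)."""
    distances = [math.dist(p, normal_centroid) for p in frame_embeddings]
    flags = [d > fixed_threshold for d in distances]
    return distances, flags


# Hypothetical 2-D embeddings of normal frames and of one test video.
normal_frames = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
test_frames = [(1.0, 1.0), (4.0, 5.0)]
c = centroid(normal_frames)
distances, flags = flag_anomalous_frames(test_frames, c, fixed_threshold=3.0)
print(c, distances, flags)
```

Plotting `distances` against frame index would give the kind of per-video trace in which large consolidations show up as spikes above the threshold line.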
Overall, CLARUS_VAD prioritizes high sensitivity (up to 0.91) with a best F1 of 0.68, at the cost of low specificity (0.20 to 0.35), while avoiding reliance on large labeled datasets or external pretraining, and offers clinically tunable parameters for deployment.