A two-stage 3D deep learning solution for the RSNA Intracranial Aneurysm Detection competition on Kaggle.
End-to-end test AUROC: 78.4% (22nd of 1147, silver medal).
- Detects the presence and location of intracranial aneurysms from multimodal head scans (CTA, MRA, MRI T1 post-contrast, T2).
- Trained on 4,400 real clinical scans; outputs 14 probabilities (13 per-artery + 1 global).
- Evaluation metric: average of AUROCs across outputs with the competition’s weighting.
- Two stages to manage data scale and computational complexity while maintaining sensitivity:
- Circle of Willis localizer (Stage 1)
- Aneurysm detector (Stage 2)
- Modified ResNet backbones with affine instance normalization to cope with strong modality shifts.
- Cross-validated ensembles with test-time augmentation (horizontal flip).
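As a rough illustration of the affine-instance-norm design mentioned above (module and parameter names here are illustrative, not the repository's code), a 3D residual block in this style could look like:

```python
import torch
import torch.nn as nn


class ResBlock3d(nn.Module):
    """Residual 3D conv block with affine instance normalization (illustrative sketch)."""

    def __init__(self, channels: int, groups: int = 1):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.norm1 = nn.InstanceNorm3d(channels, affine=True)   # per-volume norm with learned scale/shift
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.norm2 = nn.InstanceNorm3d(channels, affine=True)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)
```

Instance norm statistics are computed per sample and per channel, which is what makes the backbone tolerant of the strong intensity shifts between CTA, MRA, and MRI series.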
- Inputs: 2D projections (AIP, MIP, SDP) along XY/XZ/YZ → 9-channel image from a 256³ resampled volume; per-channel clipping (1st-99th percentiles) and z-norm (sketch below).
- Outputs: a single 3D bounding box (center and size in [0, 1]); targets derived from the segmentation masks.
- Model: ResNet-like with grouped convolutions and affine instance norm; 6-value linear head.
- Augmentation: coupled random flips, scale, translation; intensity — brightness, contrast (per-channel), gamma, bias field, noise.
- Training: Huber loss; Adam; cosine LR with warmup; EMA; 4-fold CV; deterministic.
- Inference: 4-model ensemble + flip TTA; mean of 8 predictions.
- Result: mean CV IoU = 76.6% (no TTA); 76.7% with TTA.
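A minimal sketch of how the Stage-1 9-channel projection input described above could be assembled (function names are illustrative; SDP is read as a standard-deviation projection):

```python
import numpy as np


def project_volume(vol: np.ndarray) -> np.ndarray:
    """Build a 9-channel 2D image from a cubic (e.g. 256^3) volume:
    AIP / MIP / SDP along each of the three axes (illustrative sketch)."""
    channels = []
    for axis in range(3):                      # project along each axis in turn
        channels.append(vol.mean(axis=axis))   # AIP: average intensity projection
        channels.append(vol.max(axis=axis))    # MIP: maximum intensity projection
        channels.append(vol.std(axis=axis))    # SDP: standard-deviation projection
    img = np.stack(channels, axis=0)           # (9, 256, 256)

    # Per-channel 1st-99th percentile clipping followed by z-normalization.
    for c in range(img.shape[0]):
        lo, hi = np.percentile(img[c], (1, 99))
        img[c] = np.clip(img[c], lo, hi)
        img[c] = (img[c] - img[c].mean()) / (img[c].std() + 1e-6)
    return img.astype(np.float32)
```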
- Inputs: z-normed 128³ resample of the Stage-1 RoI.
- Outputs: 13 per-artery probabilities; global value computed as max of the 13; targets derived from localizers within the RoI.
- Model: 3D ResNet-like with affine instance norm; sigmoid-activated linear head.
- Augmentation: 3D random flips, scale, rotation (world-aware), translation; intensity — bias field, brightness, contrast (per-slice), gamma, noise.
- Training: Focal loss; AdamW with weight decay; warmup then constant LR; EMA; small batches; mixed precision; 8-fold CV.
- Inference: 8-model ensemble + flip TTA; mean of 16 probabilities.
- Result: mean CV AUROC = 80.0% (no TTA); 80.2% with TTA.
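A hedged sketch of the Stage-2 ensemble/flip-TTA averaging and the max-based global output (function and argument names are placeholders, not the repo's interface):

```python
import torch


@torch.no_grad()
def predict_stage2(models, roi: torch.Tensor) -> torch.Tensor:
    """Average the sigmoid-activated outputs of an ensemble over a flip TTA,
    then take the global probability as the max of the 13 per-artery ones.

    `models` is an iterable of trained fold models; `roi` is a (1, C, D, H, W)
    tensor holding the z-normed 128^3 crop (shapes are illustrative)."""
    views = [roi, torch.flip(roi, dims=[-1])]            # original + left-right flip
    probs = []
    for model in models:
        model.eval()
        for view in views:
            probs.append(model(view))                    # (1, 13), head is sigmoid-activated
    per_artery = torch.stack(probs).mean(dim=0)          # mean over n_models * 2 predictions
    global_prob = per_artery.max(dim=1, keepdim=True).values
    return torch.cat([per_artery, global_prob], dim=1)   # (1, 14): 13 per-artery + global
```

With the 8-fold ensemble this is the "mean of 16 probabilities" quoted above.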
- DICOM loading: custom Pydicom-based series reader that orients volumes to SPL and tracks voxel↔patient affine transforms.
- Disk cache: HDF5 (h5py) with int16 quantization; optional Zstd compression (helpful on Kaggle’s 20 GB limit).
- Data hygiene: excluded a small number of corrupted/transposed scans and a few broken segmentation masks; remaining masks corrected for flipping (detected via left and right artery centroid coordinate tests) and distant blobs (detected via DBSCAN).
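A rough sketch of the int16-quantized HDF5 caching idea (the per-volume scale/offset scheme and the gzip fallback are assumptions; the project itself mentions optional Zstd compression):

```python
import h5py
import numpy as np


def cache_volume(path: str, key: str, vol: np.ndarray, compress: bool = False) -> None:
    """Store a float volume as int16 plus per-volume scale/offset attributes (illustrative)."""
    lo, hi = float(vol.min()), float(vol.max())
    scale = (hi - lo) / 65535.0 or 1.0                    # guard against flat volumes
    quantized = np.round((vol - lo) / scale - 32768.0).astype(np.int16)
    kwargs = {"compression": "gzip"} if compress else {}
    with h5py.File(path, "a") as f:
        ds = f.create_dataset(key, data=quantized, **kwargs)
        ds.attrs["lo"] = lo
        ds.attrs["scale"] = scale


def load_volume(path: str, key: str) -> np.ndarray:
    """Dequantize back to float32 using the stored scale/offset."""
    with h5py.File(path, "r") as f:
        ds = f[key]
        return (ds[...].astype(np.float32) + 32768.0) * ds.attrs["scale"] + ds.attrs["lo"]
```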
- Stage-2 targets: training on the derived targets outperformed both training on the original targets and masking the disagreements out of the loss. The model likely picks up partially visible aneurysms, but the derived targets still gave the best AUROC and were kept.
- Stage-2 batch size: Smaller batches generalized better. Batch size 8 (with LR scaled accordingly) improved validation AUROC by ~5% vs. 16, and was kept.
- Stage-2 multi-crop TTA: Running inference on nine offset crops yielded marginal gain (<0.1% AUROC); excluded due to complexity vs. benefit.
- CT windowing/normalization: HU windowing, masking, multichannel windows, percentile clipping, and dataset-level z-norm did not help; per-channel/per-volume z-norm remained.
- Augmentation library: TorchIO turned out to be CPU-bound and slow, so a couple of intensity transforms were re-implemented in PyTorch for fast GPU augmentation (see the sketch below).
- Skull stripping: SynthStrip tended to over-strip the inferior Circle of Willis; not used.
- Effect of ensembling: +4% end-to-end test AUROC.
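The reimplemented transforms are not reproduced here; as a hedged example of the kind of GPU-side intensity augmentation meant, random gamma and brightness on a batched tensor (parameter ranges and normalization assumptions are made up for illustration) could look like:

```python
import torch


def random_gamma_brightness(x: torch.Tensor,
                            gamma_range=(0.8, 1.25),
                            brightness_range=(-0.1, 0.1)) -> torch.Tensor:
    """Apply per-sample random gamma and additive brightness directly on the GPU.

    `x` is a (B, C, ...) tensor assumed to be scaled to roughly [0, 1]."""
    b = x.shape[0]
    shape = (b,) + (1,) * (x.dim() - 1)                   # broadcast one value per sample
    gamma = torch.empty(shape, device=x.device).uniform_(*gamma_range)
    brightness = torch.empty(shape, device=x.device).uniform_(*brightness_range)
    x = x.clamp_min(0).pow(gamma)                         # gamma needs non-negative input
    return x + brightness
```

Because the sampling and the transform run on the same device as the batch, the augmentation no longer bottlenecks on the CPU data loader.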
- Download `rsna-intracranial-aneurysm-detection.zip` from Kaggle.
- Unzip to `data/rsna-intracranial-aneurysm-detection`.
- Create a virtualenv and install `requirements.txt`.
- Train Stage 1 (localizer): `bin/ensemble/train_cowloc.sh` → `models/cowloc`.
- Optional CV: `bin/crossval.py cowloc -n 4`.
- Generate RoIs: `bin/pred_cowloc.py -n 4` → `output`.
- Train Stage 2 (detector): `bin/ensemble/train_icadet.sh` → `models/icadet`.
- Optional CV: `bin/crossval.py icadet -n 8`.
- Predict: `bin/predict.py --nc 4 --ni 8 --ctta --itta <series_path> ...`.
- Stage-2 safety margin: the intended +10% box margin was inadvertently disabled during training; the config reflects the setup that was actually trained.
- Robust loading for the hidden test set: applying defaults for missing metadata and converting color series to monochrome improved the test score by ~2%. Only the training-time loader is released, for simplicity.
- Possible extension: learn global output with a dedicated head (reported to help by others) instead of max over 13 values.
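For reference only, the dedicated-head variant could be as simple as widening the classifier to 14 sigmoid outputs and supervising the extra one with the global label (a sketch of the idea, not code from this repo):

```python
import torch
import torch.nn as nn


class AneurysmHead(nn.Module):
    """14-way head: 13 per-artery outputs plus one dedicated global output (sketch)."""

    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 14)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(features))   # (B, 14) probabilities
```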
- Python, NumPy, PyTorch, Pandas, Matplotlib, Jupyter, Linux
- PyTorch Lightning — flexible training
- Weights & Biases — experiment tracking
- Click — CLI framework
- H5py — caching layer
- Pydicom — DICOM handling
Special thanks to Kaggle and RSNA for the competition and data.
- Project code: CC BY-NC 4.0.
- Includes a modified file from PyTorch Lightning (Apache-2.0). See NOTICE for details.
