Deep learning pipeline for lung nodule detection and classification using the LIDC-IDRI dataset. The project includes preprocessing, U-Net-based segmentation, ResNet-based malignancy classification, and complete training and visualization workflows.
- Segmentation: 2D U-Net trained on CT slices with Dice + BCE loss.
- Classification: ResNet18 fine-tuned on extracted nodule patches.
To run this project, install the required packages:
pip install -r requirements.txt
Before training or running any models, you must extract 2D image slices and segmentation masks from the original LIDC-IDRI dataset.
⚠️ Important: Follow the instructions in LIDC-IDRI-Preprocessing-master/PREPROCESSING_README.md.
This step uses the pylidc library to:
- Parse raw DICOM scans
- Segment lungs
- Extract nodule masks based on consensus
- Save 2D .npy slices and their masks
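Up to four radiologists may outline each LIDC-IDRI nodule, so a single mask is usually obtained by voting. pylidc ships its own helper for this (consensus in pylidc.utils). The NumPy sketch below only illustrates the idea and the per-slice export. The function names, the (rows, cols, slices) axis order, and the output file names are assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def consensus_mask(annotator_masks, level=0.5):
    """Voxel-wise consensus of several annotators' binary masks.

    A voxel is kept when at least `level` (a fraction) of the annotators marked it.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in annotator_masks], axis=0)
    return stacked.mean(axis=0) >= level


def nodule_slices(volume, mask):
    """Yield (z, image_slice, mask_slice) for each axial slice containing nodule voxels.

    Assumes the slice axis is the last one, i.e. arrays shaped (rows, cols, slices).
    """
    for z in range(volume.shape[-1]):
        if mask[..., z].any():
            yield z, volume[..., z], mask[..., z]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(16, 16, 5)).astype(np.float32)
    a = np.zeros((16, 16, 5), dtype=bool)
    a[4:8, 4:8, 1:3] = True
    b = np.zeros_like(a)
    b[5:9, 5:9, 2:4] = True
    # Voxels marked in two of three masks pass the 0.5 threshold; b-only voxels do not.
    mask = consensus_mask([a, b, a], level=0.5)
    out_dir = tempfile.mkdtemp()
    saved = []
    for z, img, msk in nodule_slices(volume, mask):
        # Hypothetical naming scheme; the real preprocessing script uses its own.
        np.save(os.path.join(out_dir, f"image_{z:03d}.npy"), img)
        np.save(os.path.join(out_dir, f"mask_{z:03d}.npy"), msk)
        saved.append(z)
    print(saved)  # [1, 2]
```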
Once preprocessing completes, the following directories are created under LIDC-IDRI-Preprocessing-master/data/:
- Image/: 2D CT slices as .npy files
- Mask/: corresponding binary nodule masks
- Meta/: CSV with metadata for each slice
Alternatively, replace the entire data folder in LIDC-IDRI-Preprocessing-master with the preprocessed folder, which you can download here.
After installing the requirements and running the preprocessing script, open the main Jupyter notebook to run the complete pipeline:
The notebook includes:
- U-Net segmentation
  - Trains on 2D slices from the preprocessed CT scans.
  - Uses Dice + BCE loss with a positive-pixel weighting strategy.
  - Saves the best model as best_model.pth.
- Patch extraction
  - Extracts 64×64 patches around nodules using their bounding boxes.
  - Saves the patches under patches/benign/ and patches/malignant/.
- ResNet18 classification
  - Classifies each patch as benign or malignant.
  - Saves the best model as best_classifier.pth.
- Visualization
  - Randomly picks a nodule slice and its patch.
  - Displays the original CT image, the ground-truth segmentation, the U-Net prediction, and the classification result vs. the ground truth.
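The combined segmentation objective can be sketched in PyTorch as BCE plus soft Dice. In this sketch, the pos_weight argument of binary_cross_entropy_with_logits up-weights nodule pixels, which is one way to implement a positive-pixel weighting. The equal weighting of the two terms, the pos_weight value of 10, and the smoothing constant are assumptions, not the notebook's settings. A hard (thresholded) Dice coefficient is included as an evaluation helper. How the project aggregates Dice over its validation set is not specified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiceBCELoss(nn.Module):
    """BCE (with positive-pixel weighting) + soft Dice loss for binary masks."""

    def __init__(self, pos_weight: float = 10.0, smooth: float = 1.0):
        super().__init__()
        # Buffer so the weight follows the module across .to(device) calls.
        self.register_buffer("pos_weight", torch.tensor([pos_weight]))
        self.smooth = smooth

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        targets = targets.float()
        # pos_weight > 1 makes missed nodule pixels cost more than false alarms.
        bce = F.binary_cross_entropy_with_logits(logits, targets, pos_weight=self.pos_weight)
        probs = torch.sigmoid(logits)
        dims = tuple(range(1, probs.ndim))  # sum over channel/spatial dims per sample
        intersection = (probs * targets).sum(dims)
        union = probs.sum(dims) + targets.sum(dims)
        dice = (2 * intersection + self.smooth) / (union + self.smooth)
        return bce + (1 - dice.mean())


def dice_score(logits: torch.Tensor, targets: torch.Tensor, threshold: float = 0.5,
               eps: float = 1e-7) -> float:
    """Hard Dice coefficient over the whole batch, computed on thresholded predictions."""
    preds = (torch.sigmoid(logits) > threshold).float()
    targets = targets.float()
    intersection = (preds * targets).sum()
    return ((2 * intersection + eps) / (preds.sum() + targets.sum() + eps)).item()


if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64, requires_grad=True)
    targets = (torch.rand(2, 1, 64, 64) > 0.95).float()  # sparse toy masks
    loss = DiceBCELoss()(logits, targets)
    loss.backward()
    print(loss.item(), dice_score(logits.detach(), targets))
```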
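Patch extraction around a nodule can be sketched with NumPy. The sketch finds the bounding box of the mask, crops a 64×64 window centred on it, and saves the result under patches/benign/ or patches/malignant/. The helper names, the padding with the slice minimum for nodules near the border, and the file naming are assumptions. This README does not say how each nodule gets its benign or malignant label, so the sketch leaves that choice to the caller.

```python
import os
import tempfile

import numpy as np


def extract_patch(image, mask, size=64):
    """Crop a size x size patch centred on the bounding box of the nodule in `mask`.

    The slice is padded with its minimum value first, so nodules near the
    border still yield a full-size patch instead of a truncated one.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no nodule pixels")
    # Centre of the nodule's bounding box (row, column).
    cy = (rows.min() + rows.max()) // 2
    cx = (cols.min() + cols.max()) // 2
    half = size // 2
    padded = np.pad(image, half, mode="constant", constant_values=image.min())
    # In padded coordinates the centre is (cy + half, cx + half), so the window
    # starting at (cy, cx) places the nodule centre at patch index (half, half).
    return padded[cy:cy + size, cx:cx + size]


def save_patch(patch, label, name, root="patches"):
    """Save a patch as <root>/<label>/<name>.npy; label is 'benign' or 'malignant'."""
    if label not in ("benign", "malignant"):
        raise ValueError(f"unexpected label: {label!r}")
    out_dir = os.path.join(root, label)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.npy")
    np.save(path, patch)
    return path


if __name__ == "__main__":
    image = np.random.default_rng(0).normal(size=(512, 512)).astype(np.float32)
    mask = np.zeros((512, 512), dtype=bool)
    mask[100:110, 200:212] = True  # toy nodule
    patch = extract_patch(image, mask)
    root = tempfile.mkdtemp()
    print(patch.shape, save_patch(patch, "benign", "demo_0000", root=root))
```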
- You can download the weights for the trained segmentation model here.
- You can download the weights for the trained classifier here.
- Segmentation Dice score: ~0.75 (on a validation set of 30 patients)
- Classification accuracy: ~85% (benign vs. malignant)
- Source: LIDC-IDRI Dataset
- We used a subset of 150 patients for training and validation.
- Preprocessing was handled using pylidc.
- MONAI. Medical Open Network for AI. https://monai.io/
- Jaeho Lee. LIDC-IDRI-Preprocessing. GitHub repository.
- Matthew Hancock. pylidc. GitHub repository.
- The Cancer Imaging Archive. LIDC-IDRI. National Biomedical Imaging Archive.