This project implements drivable area segmentation using the U-Net architecture on the BDD100K dataset. The model identifies three key areas in driving scenes: ego lane (direct drivable area), adjacent lanes (alternative drivable areas), and background (non-drivable areas).
U-Net Architecture - Original paper by Ronneberger et al.
BDD100K Drivable Area Segmentation (3 Classes)
The dataset is automatically downloaded from Google Drive when you run the training script for the first time.
- Dataset Source: Pre-processed BDD100K drivable area data (180×320 resolution)
- Google Drive ID: 1sX6kHxpYoEICMTfjxxhK9lTW3B7OUxql
- Size: ~100MB (compressed)
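The download itself is handled by the training script on first run; as a rough sketch of the same idea (assuming the gdown package, hypothetical local paths, and that the archive is a zip file):

```python
import os
import zipfile

import gdown  # pip install gdown

DRIVE_ID = "1sX6kHxpYoEICMTfjxxhK9lTW3B7OUxql"
ARCHIVE_PATH = "data/dataset.zip"   # hypothetical path, for illustration only
EXTRACT_DIR = "data/dataset"

if not os.path.isdir(EXTRACT_DIR):
    os.makedirs("data", exist_ok=True)
    # Download the compressed dataset (~100MB) from Google Drive by file ID
    gdown.download(id=DRIVE_ID, output=ARCHIVE_PATH, quiet=False)
    # Assumes a zip archive; the real script may use a different format
    with zipfile.ZipFile(ARCHIVE_PATH) as zf:
        zf.extractall(EXTRACT_DIR)
```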
| Class ID | Category | Color (RGB) | Description |
|---|---|---|---|
| 0 | direct | (171, 44, 236) | Current/ego lane - the lane the vehicle is driving in |
| 1 | alternative | (86, 211, 19) | Adjacent/alternative lanes - other drivable lanes |
| 2 | background | (0, 0, 0) | Non-drivable areas - sidewalks, buildings, etc. |
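For visualization, these class IDs map to the RGB palette in the table; a minimal sketch of decoding a class-ID mask into a color image could look like this (hypothetical helper, not necessarily the project's own code):

```python
import numpy as np

# Class ID -> RGB color, as listed in the table above
PALETTE = {
    0: (171, 44, 236),  # direct / ego lane
    1: (86, 211, 19),   # alternative / adjacent lanes
    2: (0, 0, 0),       # background / non-drivable
}

def colorize_mask(mask: np.ndarray) -> np.ndarray:
    """Convert an (H, W) class-ID mask into an (H, W, 3) RGB image."""
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        rgb[mask == class_id] = color
    return rgb
```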
This 3-class approach is essential for:
- Lane keeping assistance systems
- Autonomous navigation and path planning
- Drivable area detection for ADAS
- Real-time decision making in autonomous vehicles
| Metric | Value |
|---|---|
| Best Mean IoU | 75.07% |
| Best Validation Loss | 0.2200 |
| Final Training Loss | 0.0594 |
| Training Time | ~2 hours (RTX 3060) |
| Inference Speed | 30+ FPS (GPU) |
The model demonstrates excellent convergence: training and validation loss decrease steadily, mean IoU improves consistently, and no overfitting is observed (validation loss tracks training loss).
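Mean IoU here is the intersection-over-union computed per class and averaged over the three classes; the project's metric code lives in utils.py, but a minimal sketch of the idea (hypothetical function, assuming integer class-ID masks) is:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 3) -> float:
    """Average per-class IoU between two integer class-ID masks."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = (pred == c), (target == c)
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(intersection / union)
    return float(np.mean(ious))
```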
The model accurately segments:
- Magenta regions: Ego lane (safe to drive straight)
- Green regions: Adjacent lanes (safe for lane changes)
- Black regions: Non-drivable areas (obstacles, sidewalks, buildings)
The model performs real-time segmentation on a variety of driving scenarios; demo videos are included in the media/ directory.
The U-Net architecture consists of:
Encoder (Contracting Path)
- 4 downsampling blocks with max pooling
- Layer channels: [64, 128, 256, 512]
- Each block: 2× (Conv2D → BatchNorm → ReLU)
Bottleneck
- Double convolution at lowest resolution
- 1024 channels for maximum feature extraction
Decoder (Expanding Path)
- 4 upsampling blocks with skip connections
- Transposed convolutions for spatial resolution recovery
- Feature fusion via concatenation with encoder outputs
Output Layer
- 1×1 convolution for 3-class pixel-wise classification
- Total Parameters: 31,037,763
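The block structure above can be summarized in a compact PyTorch sketch. This is an illustration, not necessarily identical to the implementation in unet_segmentation.py, and its parameter count may differ slightly from the figure quoted above:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two (Conv2D -> BatchNorm -> ReLU) layers, as used in every block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=3, features=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        # Encoder: 4 downsampling blocks
        self.downs = nn.ModuleList()
        ch = in_ch
        for f in features:
            self.downs.append(DoubleConv(ch, f))
            ch = f
        # Bottleneck at the lowest resolution (1024 channels)
        self.bottleneck = DoubleConv(features[-1], features[-1] * 2)
        # Decoder: transposed convolutions + skip-connection concatenation
        self.ups = nn.ModuleList()
        self.up_convs = nn.ModuleList()
        for f in reversed(features):
            self.ups.append(nn.ConvTranspose2d(f * 2, f, kernel_size=2, stride=2))
            self.up_convs.append(DoubleConv(f * 2, f))
        # 1x1 convolution for 3-class pixel-wise classification
        self.head = nn.Conv2d(features[0], num_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, up_conv, skip in zip(self.ups, self.up_convs, reversed(skips)):
            x = up(x)
            # 180x320 inputs produce odd intermediate sizes; align before concatenation
            if x.shape[-2:] != skip.shape[-2:]:
                x = nn.functional.interpolate(x, size=skip.shape[-2:])
            x = up_conv(torch.cat([skip, x], dim=1))
        return self.head(x)
```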
Loss Function: Dice Loss (multiclass)
Optimizer: Adam
Learning Rate: 3e-4 (OneCycleLR scheduler)
Batch Size: 8
Input Resolution: 180×320×3
Output Classes: 3
Data Split: 70% train, 20% val, 10% test
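With this configuration, a minimal training-setup sketch might look like the following. UNet refers to the architecture sketch above; EPOCHS and train_loader are hypothetical placeholders, and the multiclass Dice loss from segmentation_models_pytorch (acknowledged below) is assumed rather than confirmed:

```python
import torch
import segmentation_models_pytorch as smp

# Assumptions for illustration: the real script defines its own values.
EPOCHS = 30
# train_loader: an assumed DataLoader yielding (images, masks) batches of size 8,
# images shaped (B, 3, 180, 320) and masks shaped (B, 180, 320) with class IDs 0-2.
STEPS_PER_EPOCH = len(train_loader)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet(in_ch=3, num_classes=3).to(device)
criterion = smp.losses.DiceLoss(mode="multiclass")
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH
)

for epoch in range(EPOCHS):
    for images, masks in train_loader:
        optimizer.zero_grad()
        logits = model(images.to(device))           # (B, 3, 180, 320)
        loss = criterion(logits, masks.to(device).long())
        loss.backward()
        optimizer.step()
        scheduler.step()                            # OneCycleLR steps per batch
```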
Requirements
- Python 3.8+
- CUDA-capable GPU (recommended for training)
- 8GB+ RAM
- 2GB+ disk space
Installation
- Clone the repository

```bash
git clone https://github.com/Mark-Moawad/UNet-Drivable-Area-Segmentation.git
cd UNet-Drivable-Area-Segmentation
```

- Create a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

The dataset (BDD100K drivable area subset) will be automatically downloaded on first run.
To train the model, run:

```bash
python unet_segmentation.py
```

The script automatically:
- Downloads and extracts the BDD100K dataset (3,430 images)
- Splits data into train/val/test sets
- Trains the U-Net model with threshold-based early stopping
- Saves the best model checkpoint
- Generates training curves and prediction visualizations
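The exact pipeline lives in unet_segmentation.py; as a rough sketch, the 70/20/10 split over the pre-processed arrays could be done with a shuffled index split (array shapes and the seed below are assumptions):

```python
import numpy as np

images = np.load("data/dataset/image_180_320.npy")  # assumed shape: (3430, 180, 320, 3)
labels = np.load("data/dataset/label_180_320.npy")  # assumed shape: (3430, 180, 320)

rng = np.random.default_rng(seed=42)                 # hypothetical seed
idx = rng.permutation(len(images))
n_train = int(0.7 * len(idx))
n_val = int(0.2 * len(idx))

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

x_train, y_train = images[train_idx], labels[train_idx]
x_val, y_val = images[val_idx], labels[val_idx]
x_test, y_test = images[test_idx], labels[test_idx]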
To run inference on your own driving videos:
- Enable video processing in unet_segmentation.py:

```python
process_videos_flag = True
```

- Place videos in data/dataset/testing/
- Run inference:

```bash
python unet_segmentation.py
```

Output videos with overlaid segmentation masks will be saved to data/processed/.
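The real overlay pipeline is implemented in unet_segmentation.py and utils.py; the following is only a rough sketch of the idea, assuming OpenCV, the UNet and colorize_mask sketches above, a state_dict checkpoint, a hypothetical input file name, and simple 0–1 input scaling (the real preprocessing may differ):

```python
import cv2
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet(in_ch=3, num_classes=3).to(device).eval()
model.load_state_dict(torch.load("data/models/UNet_baseline.pt", map_location=device))

cap = cv2.VideoCapture("data/dataset/testing/drive.mp4")   # hypothetical file name
fps = cap.get(cv2.CAP_PROP_FPS)
out = cv2.VideoWriter("data/processed/drive_overlay.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (320, 180))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (320, 180))                   # match training resolution
    # BGR -> RGB, HWC -> CHW, scale to [0, 1]
    x = torch.from_numpy(frame[..., ::-1].copy()).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model(x.unsqueeze(0).to(device)).argmax(1)[0].cpu().numpy()
    overlay = cv2.cvtColor(colorize_mask(pred), cv2.COLOR_RGB2BGR)
    out.write(cv2.addWeighted(frame, 0.6, overlay, 0.4, 0))

cap.release()
out.release()
```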
```
UNet-Drivable-Area-Segmentation/
├── unet_segmentation.py # Main training & inference pipeline
├── utils.py # Utility functions (metrics, visualization)
├── requirements.txt # Python dependencies
├── README.md # This file
│
├── data/
│ ├── dataset/ # BDD100K dataset (auto-downloaded)
│ │ ├── image_180_320.npy # Pre-processed images (3,430 samples)
│ │ ├── label_180_320.npy # Segmentation labels
│ │ └── testing/ # Test videos for inference
│ ├── models/ # Trained model checkpoints
│ │ ├── UNet_baseline.pt # Best model weights
│ │ └── UNet_baseline_training_stats.csv
│ ├── outputs/ # Training visualizations
│ │ ├── UNet_baseline_training_curves.png
│ │ └── UNet_baseline_predictions.png
│ ├── processed/ # Inference output videos
│
├── media/ # Photos and demo videos for README
│
└── venv/                    # Python virtual environment
```
- U-Net Paper: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al., 2015)
- BDD100K Dataset: A Diverse Driving Video Database (Yu et al., 2018)
- Berkeley DeepDrive: Official Website
This project is licensed under the MIT License - see the LICENSE file for details.
Mark Moawad
Autonomous Systems Engineer | Computer Vision Specialist
This project demonstrates practical computer vision and deep learning skills for autonomous driving applications, showcasing end-to-end development from model training to production-ready inference.
- Original U-Net architecture by Ronneberger, Fischer, and Brox
- BDD100K dataset team at UC Berkeley
- PyTorch and segmentation_models_pytorch communities
For questions, collaboration opportunities, or professional inquiries:
- GitHub: @Mark-Moawad
- Email: [email protected]
- LinkedIn: https://www.linkedin.com/in/markmoawad96/