XinyuWuu/SegFormer-BDD100K-Finetune


SegFormer Finetuning on BDD100K for Autonomous Driving: An End-to-End Deployment Pipeline with TensorRT INT8 Quantization

This project demonstrates a complete, end-to-end pipeline for developing and deploying a high-performance visual perception model for autonomous driving. A state-of-the-art SegFormer model is fine-tuned on the BDD100K dataset for semantic segmentation. Subsequently, the model is optimized using the NVIDIA TensorRT SDK, converting it from PyTorch to highly efficient FP32, FP16, and INT8 inference engines. This optimization process is benchmarked to quantify the significant gains in performance and reduction in resource usage, making the model suitable for real-time applications.

Key Features

  • Model: NVIDIA's SegFormer (nvidia/segformer-b5-finetuned-ade-640-640, ~85 million parameters).
  • Dataset: BDD100K, a large-scale, industry-standard autonomous driving dataset.
  • Training Strategies: Implements both full model fine-tuning and a more efficient head-only fine-tuning approach.
  • Frameworks: PyTorch for training, Hugging Face Transformers for model handling.
  • Optimization Stack: A production-grade workflow using ONNX as an intermediate representation and NVIDIA TensorRT for final optimization and inference.
  • Quantization: Advanced optimization using INT8 quantization with a custom calibrator to achieve maximum performance.
  • Benchmarking: Rigorous, end-to-end benchmarking of performance (latency, throughput), memory usage, and accuracy (mIoU) across all model versions.
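
For background on the INT8 path: TensorRT's INT8 mode maps tensors to 8-bit integers using a symmetric per-tensor scale, and the calibrator's job is to pick the dynamic range (amax) that the scale is derived from. Below is a minimal, dependency-free sketch of that arithmetic; the function names are illustrative, not taken from this project's code.

```python
def int8_quantize(values, amax):
    """Symmetric per-tensor INT8 quantization: map [-amax, amax] -> [-127, 127]."""
    scale = amax / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values]

def int8_dequantize(qvalues, amax):
    """Recover approximate real values from INT8 codes."""
    scale = amax / 127.0
    return [q * scale for q in qvalues]

# Toy activations; in the real pipeline the calibrator supplies amax.
activations = [0.02, -1.5, 0.9, 3.1]
q = int8_quantize(activations, amax=3.1)
approx = int8_dequantize(q, amax=3.1)
```

The quantization error is bounded by one scale step (amax / 127), which is why a well-chosen calibration range matters: an amax far above typical activation magnitudes wastes most of the 8-bit range.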

Training and Fine-Tuning

Two fine-tuning strategies were evaluated: full fine-tuning of all model parameters, and head-only fine-tuning with the pre-trained encoder frozen. Head-only fine-tuning proved significantly more efficient, using about one third of the GPU memory (6 GB vs. 18 GB) and training 2.5x faster (7.3 vs. 3 it/s) than full fine-tuning on an NVIDIA RTX 5880 Ada Generation.

  • Gray Line: Full Fine-Tuning
  • Cyan Line: Head-Only Fine-Tuning

[Training curve panels: Training Loss, Validation Loss, Validation mIoU, Validation Accuracy]
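
In PyTorch, head-only fine-tuning typically amounts to setting requires_grad = False on every encoder parameter so the optimizer updates only the decode head. A dependency-free toy sketch of that effect (parameter names and values are illustrative, not taken from the project):

```python
# Toy parameter store; names mimic a SegFormer-like layout (illustrative only).
params = {"encoder.block1.w": 1.0, "encoder.block2.w": -0.5, "decode_head.w": 0.2}
grads  = {"encoder.block1.w": 0.3, "encoder.block2.w": 0.1, "decode_head.w": -0.4}

def sgd_step(params, grads, lr=0.1, freeze_prefix="encoder."):
    """Apply one SGD update, skipping every parameter under the frozen prefix."""
    return {
        name: (value if name.startswith(freeze_prefix) else value - lr * grads[name])
        for name, value in params.items()
    }

updated = sgd_step(params, grads)
# Encoder weights are untouched; only the decode head moves.
```

Because the frozen encoder needs no gradients or optimizer state, this is where the memory and speed savings above come from.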

Performance and Accuracy Benchmark Results

The models were benchmarked on an NVIDIA GeForce RTX 3060 12 GB. The results show clear latency, throughput, and memory advantages for the TensorRT engines, though the converted engines also show a substantial drop in mIoU relative to the PyTorch baseline.
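
The exact measurement code in scripts/benchmark.py is not reproduced here; a minimal harness of the kind that yields the latency and FPS columns below (batch size 1, so FPS = 1000 / latency in ms) could look like this:

```python
import time

def benchmark(infer, warmup=10, iters=100):
    """Time a single-sample inference callable; return mean latency (ms) and FPS."""
    for _ in range(warmup):            # let caches, clocks, and lazy init settle
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000.0
    return latency_ms, 1000.0 / latency_ms

# Stand-in for a real model call (sleeps ~5 ms per "inference").
lat, fps = benchmark(lambda: time.sleep(0.005))
```

Warm-up iterations matter for GPU benchmarks in particular, since the first calls pay for lazy initialization and clock ramp-up.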

Full Fine-Tuning (Epoch 5)

This model was selected for its peak validation mIoU during the full fine-tuning run.

| Metric           | PyTorch (FP32) | TensorRT (FP32) | TensorRT (FP16) | TensorRT (INT8) |
| ---------------- | -------------- | --------------- | --------------- | --------------- |
| Latency (ms)     | 107.80         | 69.62           | 29.29           | 26.86           |
| Throughput (FPS) | 9.28           | 14.36           | 34.14           | 37.23           |
| Memory (MB)      | 1362.36        | 127.84          | 128.34          | 128.34          |
| mIoU             | 0.53           | 0.06            | 0.06            | 0.07            |
| Mean Accuracy    | 0.63           | 0.12            | 0.13            | 0.13            |

Head-Only Fine-Tuning (Epoch 10)

This model was selected as the best-performing checkpoint from the head-only fine-tuning run.

| Metric           | PyTorch (FP32) | TensorRT (FP32) | TensorRT (FP16) | TensorRT (INT8) |
| ---------------- | -------------- | --------------- | --------------- | --------------- |
| Latency (ms)     | 107.72         | 69.36           | 29.43           | 26.92           |
| Throughput (FPS) | 9.28           | 14.42           | 33.97           | 37.14           |
| Memory (MB)      | 1362.36        | 127.84          | 128.34          | 128.34          |
| mIoU             | 0.51           | 0.07            | 0.07            | 0.07            |
| Mean Accuracy    | 0.59           | 0.11            | 0.12            | 0.12            |
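
For reference, the mIoU reported above is the mean over classes of intersection-over-union between predicted and ground-truth label maps. A self-contained sketch of the metric (toy flattened label maps; the project presumably aggregates this over the BDD100K validation set):

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:                      # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy flattened label maps with 3 classes.
pred   = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
```

Skipping classes absent from both maps keeps unused classes from distorting the mean.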

Setup and Usage

1. Environment Setup

This project uses Conda for environment management. Ensure you have an NVIDIA GPU with an appropriate driver, the CUDA Toolkit, and TensorRT installed.

# Create and activate the conda environment
conda create -n tensorrt_project python=3.12 -y
conda activate tensorrt_project

# Install other dependencies
pip install -r requirements.txt

# Download dataset
./download_BDD100K.sh

2. Running the Pipeline

The project is divided into several executable scripts located in the scripts/ directory. They should be run in the following order.

Step 1: Fine-Tune the Model

This script fine-tunes the pre-trained SegFormer model on a subset of the BDD100K dataset and saves the weights.

# For full fine-tuning
python scripts/finetune.py --output_path "models/segformer_bdd100k_finetuned.pth"

# For head-only fine-tuning
python scripts/finetune.py --freeze-encoder --output_path "models/segformer_bdd100k_finetuned_headonly.pth"

Step 2: Export to ONNX

This script converts the fine-tuned PyTorch model to the ONNX format.

# For full fine-tuned model
python scripts/onnx_export.py --finetuned_checkpoint "models/segformer_bdd100k_finetuned_epoch_005.pth" --onnx_output_path "models/segformer.onnx"

# For head-only model
python scripts/onnx_export.py --finetuned_checkpoint "models/segformer_bdd100k_finetuned_headonly_epoch_009.pth" --onnx_output_path "models/segformer_headonly.onnx"

Step 3: Build TensorRT Engines

This script builds the optimized FP32, FP16, and INT8 engines from the ONNX file.

# For full fine-tuned model
python scripts/build_engine.py

# For head-only model
python scripts/build_engine.py --headonly

Step 4: Run Benchmarking and Visualization

This final script runs a performance and accuracy comparison of all model versions and generates the visual result images.

# Benchmark the full fine-tuned model
python scripts/benchmark.py

# Benchmark the head-only model (modify SUFFIX in benchmark.py to "_headonly")
python scripts/benchmark.py

# Generate visualizations
python scripts/visualizatize.py
