Uncertainty Evaluation for SOTIF Analysis of LiDAR Object Detection

This repository provides the evaluation methodology and experimental code for the following research:

Uncertainty Evaluation to Support Safety of the Intended Functionality Analysis for Identifying Performance Insufficiencies in ML-Based LiDAR Object Detection

Milin Patel and Rolf Jung, Kempten University of Applied Sciences (2026)

Uncertainty Representation in a SOTIF-Related Use Case with Dempster-Shafer Theory for LiDAR Sensor-Based Object Detection

Milin Patel and Rolf Jung, Kempten University of Applied Sciences (2025), arXiv:2503.02087

Motivation

ISO 21448 (SOTIF) requires systematic identification of performance insufficiencies and triggering conditions in automated driving perception. However, its analysis techniques assume explicitly specified system behaviour and cannot directly address ML-based systems whose decision logic is learned from data.

This work bridges the gap by using prediction uncertainty from deep ensembles (K=6 SECOND detectors) as a proxy for detection reliability, enabling three ISO 21448 activities:

SOTIF Activity	ISO 21448 Clause	Output
Performance insufficiency identification	Clause 7	Per-detection AUROC separating TP from FP
Triggering condition ranking	Clause 7	Conditions ranked by FP share and uncertainty
Acceptance criteria documentation	Clause 11	Operating points with coverage and FAR

Experimental Results

The methodology is evaluated on two datasets: real-world KITTI and synthetic CARLA (SOTIF-PCOD, 547 frames across 22 weather configurations).

Discrimination (AUROC)

Indicator	KITTI (Real-World)	CARLA (Synthetic)
Mean confidence	0.999	0.895
Confidence variance	0.889	0.738
Geometric disagreement	0.912	0.974

Calibration

Metric	KITTI	CARLA
ECE	0.202	0.257
NLL	0.235	0.557
Brier Score	0.049	0.197

Operating Points

At confidence-plus-variance gate (s >= 0.70, var <= 0.005) on KITTI: 25.8% coverage at zero FAR (120 of 465 proposals retained).

At confidence-plus-geometric gate (s >= 0.35, d_IoU <= 0.49) on CARLA: 38.3% coverage at zero FAR (737 of 1,924 proposals retained).

Full results, figures, and cross-dataset analysis are in results/.

Repository Structure

.
├── paper/                                 # Research paper sources
│   ├── SOTIF_Uncertainty_Conference_Paper.tex
│   └── SOTIF_Uncertainty_Conference_Paper.md
│
├── sotif_uncertainty/                     # Core Python package
│   ├── __init__.py                        # Public API
│   ├── uncertainty.py                     # Uncertainty indicators (Eqs. 1-3)
│   ├── ensemble.py                        # DBSCAN clustering + uncertainty decomposition
│   ├── matching.py                        # TP/FP/FN matching (BEV IoU >= 0.5)
│   ├── metrics.py                         # AUROC, AURC, ECE, NLL, Brier Score
│   ├── sotif_analysis.py                  # TC ranking, frame flags, acceptance gates
│   ├── visualization.py                   # Publication-quality figures (13 plot types)
│   ├── carla_case_study.py                # CARLA case study data generation
│   ├── demo_data.py                       # Lightweight synthetic data for demos
│   ├── dst_uncertainty.py                 # Dempster-Shafer Theory decomposition
│   ├── mc_dropout.py                      # MC Dropout alternative to ensembles
│   ├── kitti_utils.py                     # KITTI calibration, label I/O
│   ├── weather_augmentation.py            # Physics-based weather effects
│   └── dataset_adapter.py                 # Unified adapter for KITTI/CARLA formats
│
├── scripts/                               # Execution scripts
│   ├── evaluate.py                        # Evaluation (demo / carla_study / real modes)
│   ├── run_pipeline.py                    # End-to-end pipeline orchestrator
│   ├── execute_evaluation.py              # Cross-dataset evaluation with report
│   ├── run_inference.py                   # Ensemble inference + DBSCAN + evaluation
│   ├── train_ensemble.sh                  # Train K SECOND models via OpenPCDet
│   ├── prepare_kitti.py                   # Download and prepare KITTI dataset
│   └── generate_carla_data.py             # Generate CARLA simulation data
│
├── results/                               # Evaluation outputs
│   ├── evaluation_report.md               # Full cross-dataset analysis report
│   ├── comparison_summary.json            # Machine-readable cross-dataset metrics
│   ├── results_summary.json               # CARLA case study detailed metrics
│   ├── evaluation_results.json            # CARLA case study summary
│   └── figures/
│       ├── carla_case_study/              # 13 CARLA case study figures
│       ├── carla_synthetic/               # 13 CARLA cross-dataset figures
│       ├── kitti_real_world/              # 13 KITTI figures
│       └── comparison/                    # 5 cross-dataset comparison figures
│
├── docs/
│   └── extending.md                       # Guide: adding detectors, datasets, indicators
│
├── notebooks/
│   └── SOTIF_Uncertainty_Evaluation_Demo.ipynb   # Interactive Colab demo
│
├── configs/
│   └── second_sotif_ensemble.yaml         # OpenPCDet SECOND config for K=6 ensemble
│
├── tests/
│   └── test_pipeline.py                   # 58 tests across all pipeline stages
│
├── CITATION.cff
├── requirements.txt
├── pyproject.toml
├── setup.py
└── LICENSE

Methodology

The evaluation pipeline comprises five stages:

LiDAR Frame + K Ensemble Members
         │
   ┌─────▼─────┐
   │  Stage 1   │  Ensemble Inference (K independent forward passes)
   └─────┬─────┘
         │  K × D^(k) detections
   ┌─────▼─────┐
   │  Stage 2   │  DBSCAN Clustering + Uncertainty Indicators
   └─────┬─────┘   ├─ Mean confidence (s̄_j): existence uncertainty
         │         ├─ Confidence variance (σ²_s,j): epistemic uncertainty
         │         └─ Geometric disagreement (d_iou,j): localisation uncertainty
   ┌─────▼─────┐
   │  Stage 3   │  Correctness Determination (greedy matching, BEV IoU ≥ 0.5)
   └─────┬─────┘   └─ TP / FP / FN labels per proposal
         │
   ┌─────▼─────┐
   │  Stage 4   │  Metric Computation
   └─────┬─────┘   ├─ Discrimination: AUROC, AURC
         │         ├─ Calibration: ECE, NLL, Brier
         │         └─ Operating characteristics: Coverage, FAR at thresholds
   ┌─────▼─────┐
   │  Stage 5   │  SOTIF Analysis Artefacts
   └───────────┘   ├─ TC ranking (Clause 7)
                   ├─ Frame-level triage (Clause 7, Area 3 → Area 2)
                   ├─ Acceptance gates (Clause 11)
                   └─ Confidence interpretation (Clause 10)

Uncertainty Indicators

Indicator	Formula	Uncertainty Type	Safety Concern
Mean confidence	(1/K) Σ s_j^(k)	Existence	False or missed detection
Confidence variance	(1/(K−1)) Σ (s_j^(k) − s̄)²	Epistemic	Unknown operating condition
Geometric disagreement	1 − mean pairwise BEV IoU	Localisation	Incorrect distance estimate

Acceptance Gate

G(s̄, σ²_s, d_iou) = [s̄ ≥ τ_s] ∧ [σ²_s ≤ τ_v] ∧ [d_iou ≤ τ_d]

The multi-indicator gate filters detections using all three uncertainty dimensions, achieving lower FAR than confidence alone at comparable coverage.

DBSCAN Detection Association

Detections from K ensemble members are clustered using DBSCAN on a BEV IoU distance matrix:

Collect all detections from K members for each frame
Compute pairwise (1 − BEV IoU) distance matrix
Run DBSCAN with eps = 1 − iou_threshold (default 0.5)
Voting strategies control min_samples:
- Affirmative (min_samples=1): keep all detections
- Consensus (min_samples=K//2+1): majority agreement
- Unanimous (min_samples=K): all members agree
Aggregate: mean box position, yaw from highest-confidence member

Dempster-Shafer Theory Decomposition

Ensemble member scores are converted to mass functions and combined using Dempster's rule, decomposing total uncertainty into:

Component	Interpretation
Aleatoric	Irreducible sensor noise (Shannon entropy of pignistic probability)
Epistemic	Model ignorance (Plausibility − Belief interval width)
Ontological	Evidence for unknown unknowns (combined conflict mass)

Quick Start

Google Colab (No Setup)

Click the Colab badge above or open notebooks/SOTIF_Uncertainty_Evaluation_Demo.ipynb. Runs end-to-end with synthetic data — no GPU or dataset download needed.

Local Installation

git clone https://github.com/milinpatel07/SOTIF_Uncertainty_Evaluation_Lidar_Object_Detection.git
cd SOTIF_Uncertainty_Evaluation_Lidar_Object_Detection
pip install -e .

# Quick demo (synthetic data, generates figures)
python scripts/evaluate.py

# CARLA case study (547 frames, 22 weather configs, full analysis)
python scripts/evaluate.py --mode carla_study

# End-to-end pipeline with figure generation
python scripts/run_pipeline.py --mode carla_study --output_dir results

# Run tests (58 tests)
pytest tests/ -v

Cross-Dataset Evaluation (CARLA + KITTI)

# Clone the SOTIF-PCOD dataset
git clone https://github.com/milinpatel07/SOTIF-PCOD.git

# Run cross-dataset evaluation with report generation
python scripts/execute_evaluation.py \
    --carla_root SOTIF-PCOD/SOTIF_Scenario_Dataset \
    --output_dir results

# Outputs:
#   results/evaluation_report.md         — full analysis with tables
#   results/comparison_summary.json      — machine-readable metrics
#   results/figures/carla_synthetic/     — 13 CARLA figures
#   results/figures/kitti_real_world/    — 13 KITTI figures
#   results/figures/comparison/          — 5 cross-dataset comparison figures

Real Data Pipeline

For reproducing results with actual LiDAR data and trained ensemble models.

Prerequisites

Component	Version	Purpose
Python	>= 3.8	Runtime
PyTorch	>= 1.10	Training and inference
CUDA	11.x	GPU acceleration
spconv	v2.x	Sparse convolutions
OpenPCDet	0.6+	SECOND detector framework

Step 1: Prepare Dataset

KITTI:

python scripts/prepare_kitti.py --data_root data/kitti

CARLA:

python scripts/generate_carla_data.py --output_dir data/carla --frames_per_config 10

Both produce KITTI-format output:

data/{kitti,carla}/
├── training/
│   ├── velodyne/       # .bin point cloud files
│   ├── label_2/        # .txt label files
│   ├── calib/          # .txt calibration files
│   └── image_2/        # .png images (optional)
├── ImageSets/
│   ├── train.txt
│   └── val.txt
└── conditions.json     # Per-frame weather metadata (CARLA only)

Step 2: Train Ensemble (K=6)

bash scripts/train_ensemble.sh --seeds 0 1 2 3 4 5

# With custom parameters
bash scripts/train_ensemble.sh \
    --seeds 0 1 2 3 4 5 \
    --epochs 80 \
    --batch_size 4 \
    --openpcdet_root /path/to/OpenPCDet

Each member trains the SECOND architecture (MeanVFE → VoxelBackBone8x → HeightCompression → BaseBEVBackbone → AnchorHeadSingle) with identical hyperparameters, differing only in random seed.

Step 3: Ensemble Inference

python scripts/run_inference.py \
    --ckpt_dirs output/ensemble/seed_0 output/ensemble/seed_1 \
                output/ensemble/seed_2 output/ensemble/seed_3 \
                output/ensemble/seed_4 output/ensemble/seed_5 \
    --data_path data/kitti \
    --gt_path data/kitti/training/label_2 \
    --voting consensus

Step 4: Evaluate

python scripts/evaluate.py \
    --input results/ensemble_results.pkl \
    --gt_path data/kitti/training/label_2 \
    --calib_path data/kitti/training/calib

Generated Figures

The pipeline produces 13 publication-quality figures per dataset:

Figure	Description	Paper Reference
Reliability diagram	Calibration: predicted confidence vs. actual accuracy	Section 5.3
Risk-coverage curve	Selective prediction: risk reduction with coverage	Section 5.3
Confidence-variance scatter	TP/FP separation in uncertainty space	Section 5.2
ROC curves	Discrimination for all three indicators	Table 3
Frame risk scatter	Per-frame mean confidence vs. variance	Section 5.5
TC ranking bar chart	Triggering condition FP share ranking	Table 7
Operating points comparison	Coverage and FAR at multiple thresholds	Table 6
ISO 21448 scenario grid	SOTIF Area 1–4 mapping	Figure 1
Indicator distributions	Histogram of each indicator by TP/FP	—
Condition boxplots	Per-condition score and variance distributions	—
Member agreement heatmap	Score correlation across ensemble members	—
Condition breakdown	Stacked bar of TP/FP by environment	—
Operating point heatmap	2D threshold sweep (confidence × variance)	—

Testing

pytest tests/ -v          # 58 tests
python tests/test_pipeline.py   # alternative without pytest

Coverage includes: uncertainty indicators, DBSCAN clustering, TP/FP matching, all metrics (AUROC, ECE, NLL, Brier, AURC), SOTIF analysis, weather augmentation, DST decomposition, dataset adapter, and end-to-end pipeline.

Dependencies

Demo / Colab (no GPU needed):

numpy >= 1.20
matplotlib >= 3.4
scikit-learn >= 0.24

Full pipeline (real data, GPU required):

OpenPCDet >= 0.6
PyTorch >= 1.10
spconv v2.x
CUDA 11.x

Available Datasets

Dataset	Size	Weather Conditions	Access
SOTIF-PCOD	547 frames	22 CARLA configs	Free
KITTI	~29 GB	Clear	Free
CARLA	Custom	Any (simulated)	Free

Citation

@inproceedings{patel2026uncertainty,
  title={Uncertainty Evaluation to Support Safety of the Intended Functionality
         Analysis for Identifying Performance Insufficiencies in ML-Based
         LiDAR Object Detection},
  author={Patel, Milin and Jung, Rolf},
  year={2026},
  institution={Kempten University of Applied Sciences}
}

@article{patel2025dst,
  title={Uncertainty Representation in a SOTIF-Related Use Case with
         Dempster-Shafer Theory for LiDAR Sensor-Based Object Detection},
  author={Patel, Milin and Jung, Rolf},
  journal={arXiv preprint arXiv:2503.02087},
  year={2025}
}

References

ISO 21448:2022 — Safety of the Intended Functionality (SOTIF)
ISO/PAS 8800:2024 — Safety for AI-based systems in road vehicles
Lakshminarayanan et al. (2017) — Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Feng et al. (2018, 2021) — Uncertainty estimation for LiDAR 3D detection
Yan et al. (2018) — SECOND: Sparsely Embedded Convolutional Detection
Pitropov et al. (2022) — LiDAR-MIMO: Efficient Uncertainty Estimation for LiDAR 3D Object Detection
Shafer (1976) — A Mathematical Theory of Evidence

License

MIT License — see LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Uncertainty Evaluation for SOTIF Analysis of LiDAR Object Detection

Motivation

Experimental Results

Discrimination (AUROC)

Calibration

Operating Points

Repository Structure

Methodology

Uncertainty Indicators

Acceptance Gate

DBSCAN Detection Association

Dempster-Shafer Theory Decomposition

Quick Start

Google Colab (No Setup)

Local Installation

Cross-Dataset Evaluation (CARLA + KITTI)

Real Data Pipeline

Prerequisites

Step 1: Prepare Dataset

Step 2: Train Ensemble (K=6)

Step 3: Ensemble Inference

Step 4: Evaluate

Generated Figures

Testing

Dependencies

Available Datasets

Citation

References

License

About

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
configs		configs
docs		docs
figures		figures
notebooks		notebooks
paper		paper
results		results
scripts		scripts
sotif_uncertainty		sotif_uncertainty
tests		tests
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
generate_paper_figures.py		generate_paper_figures.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

License

milinpatel07/SOTIF_Uncertainty_Evaluation_Lidar_Object_Detection

Folders and files

Latest commit

History

Repository files navigation

Uncertainty Evaluation for SOTIF Analysis of LiDAR Object Detection

Motivation

Experimental Results

Discrimination (AUROC)

Calibration

Operating Points

Repository Structure

Methodology

Uncertainty Indicators

Acceptance Gate

DBSCAN Detection Association

Dempster-Shafer Theory Decomposition

Quick Start

Google Colab (No Setup)

Local Installation

Cross-Dataset Evaluation (CARLA + KITTI)

Real Data Pipeline

Prerequisites

Step 1: Prepare Dataset

Step 2: Train Ensemble (K=6)

Step 3: Ensemble Inference

Step 4: Evaluate

Generated Figures

Testing

Dependencies

Available Datasets

Citation

References

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages