deel-ai/oodeel-benchmark

Minimal, reproducible harness to benchmark OODeel out-of-distribution (OOD) detectors on any combination of in-distribution (ID) datasets × OOD datasets × models × feature-layer packs × detector hyper-parameter grids.
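The sweep is the Cartesian product of these axes. Below is a minimal sketch of that enumeration. It assumes the configs have already been parsed into plain Python containers, that each model lists its own layer packs, and that every OOD dataset of an ID dataset is scored inside the same job. `Job` and `expand_sweep` are hypothetical names, not the harness's API, and the dataset, model and pack names are illustrative.

```python
from itertools import product
from typing import NamedTuple


class Job(NamedTuple):
    """One unit of work (assumed): fit a detector once, then score every OOD set."""
    id_dataset: str
    model: str
    layer_pack: str
    detector: str
    hparams: tuple  # sorted (name, value) pairs, so a Job is hashable


def expand_sweep(id_datasets, models, detectors):
    """Enumerate ID dataset x model x layer pack x detector x hparam combos.

    id_datasets: [id_name, ...]
    models:      {model_name: [layer_pack_name, ...]}
    detectors:   {detector_name: [hparam_dict, ...]}
    """
    jobs = []
    for id_name, (model, packs), (det, grid) in product(
        id_datasets, models.items(), detectors.items()
    ):
        for pack, hp in product(packs, grid):
            jobs.append(Job(id_name, model, pack, det, tuple(sorted(hp.items()))))
    return jobs


# Illustrative names only.
jobs = expand_sweep(
    ["cifar10", "imagenet"],
    {"resnet18": ["penultimate", "last3"]},
    {"mls": [{}], "knn": [{"k": 5}, {"k": 50}]},
)
print(len(jobs))  # 2 ID sets x 2 packs x (1 + 2) detector configs = 12
```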

configs/                # YAML knobs (no code)
├─ datasets/            # 1 file per ID dataset
├─ models/              # 1 file per architecture (feature layer packs)
└─ methods/             # 1 file per detector (hyper-params)
src/                    # code
├─ dataset/             # dataset loaders (ID + OOD)
├─ openood_networks/    # model loaders (ID)
├─ utils.py             # utils (seed, etc.)
└─ run.py               # launch everything (crash-safe, resumable)
results/                # one .parquet per (ID, model, detector, …)
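Because every result lives in its own file, a run can be identified by its path alone. The sketch below shows a hypothetical naming scheme, not necessarily the one src/run.py uses. It hashes the hyper-parameters so that the same config always maps to the same short file name. `result_path` is an illustrative helper.

```python
import hashlib
import json
from pathlib import Path


def result_path(root, id_dataset, model, layer_pack, detector, hparams):
    """Map one sweep cell to a stable Parquet path (hypothetical scheme).

    Hyper-parameters are serialised as key-sorted JSON and hashed, so equal
    configs give equal paths regardless of dict ordering, and long grids
    still yield short file names.
    """
    blob = json.dumps(hparams, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(blob).hexdigest()[:10]
    name = f"{id_dataset}__{model}__{layer_pack}__{detector}__{digest}.parquet"
    return Path(root) / name


a = result_path("results", "cifar10", "resnet18", "penultimate", "knn", {"k": 5, "batch": 64})
b = result_path("results", "cifar10", "resnet18", "penultimate", "knn", {"batch": 64, "k": 5})
c = result_path("results", "cifar10", "resnet18", "penultimate", "knn", {"k": 50, "batch": 64})
print(a == b, a == c)  # True False
```

Sorting keys before hashing is what makes `{"k": 5, "batch": 64}` and `{"batch": 64, "k": 5}` land on the same file.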

Live view

(Screenshots: local console rendered with Rich, and the W&B dashboard.)

Quick start

git clone git@github.com:y-prudent/oodeel-benchmark.git
cd oodeel-benchmark
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt      # torch, oodeel[torch], rich, wandb…
CUDA_VISIBLE_DEVICES=0 python -m src.run    # single GPU

Parquet files appear in results/ and the W&B dashboard updates as the sweep progresses.

Multi-GPU / multi-machine

# 2 GPUs on the same box
CUDA_VISIBLE_DEVICES=0 python -m src.run --shard-index 0 --num-shards 2 &
CUDA_VISIBLE_DEVICES=1 python -m src.run --shard-index 1 --num-shards 2 &

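Each process takes its own slice of the sweep; the processes share nothing but the results/ folder. To spread the sweep over several machines, give every process a distinct --shard-index with the same --num-shards and put results/ on a shared filesystem. Restarting is cheap: jobs whose result files already exist are skipped.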
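The slicing and skipping described above can be sketched as a round-robin split plus an existence check. This is an illustration only. It assumes every process enumerates the sweep in the same deterministic order. `my_jobs` and `path_of` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def my_jobs(jobs, shard_index, num_shards, path_of):
    """Return the jobs owned by this shard that have no result file yet.

    Job i belongs to shard i % num_shards (round-robin), so shards are
    disjoint as long as every process builds `jobs` in the same order.
    """
    if not 0 <= shard_index < num_shards:
        raise ValueError("expected 0 <= shard_index < num_shards")
    return [
        job
        for i, job in enumerate(jobs)
        if i % num_shards == shard_index  # this shard's slice
        and not path_of(job).exists()  # already done in an earlier run
    ]


with tempfile.TemporaryDirectory() as tmp:
    def path_of(job):
        return Path(tmp) / f"{job}.parquet"

    jobs = [f"job{i}" for i in range(6)]
    path_of("job2").touch()  # pretend job2 finished before a restart
    shard0 = my_jobs(jobs, 0, 2, path_of)
    shard1 = my_jobs(jobs, 1, 2, path_of)

print(shard0)  # ['job0', 'job4']
print(shard1)  # ['job1', 'job3', 'job5']
```

Skipping on file existence is only safe if a file appears once its job is complete. One common way to guarantee that is to write to a temporary name and rename it into place with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem.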

Profiling detectors

python -m src.profile_efficiency

A Parquet table is written to profile_results/imagenet_resnet50.parquet.
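The README does not say which quantities the profiler records. The following is a generic sketch of per-call latency measurement for a detector's scoring step. `median_latency` and its defaults are hypothetical.

```python
import statistics
import time


def median_latency(fn, *args, warmup=3, repeats=10):
    """Median wall-clock seconds of fn(*args) after `warmup` untimed calls.

    Warm-up absorbs one-off costs (lazy initialisation, caches, autotuning),
    and the median is less sensitive than the mean to occasional stalls.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


print(f"{median_latency(sorted, list(range(100_000, 0, -1))) * 1e3:.2f} ms")
```

For a GPU detector, `fn` should call `torch.cuda.synchronize()` before returning. CUDA kernels run asynchronously, so without it only the launch overhead is timed.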

Customising the sweep

Add or edit YAML files under configs/ to declare new datasets, models, layer packs or detector grids; the existing files serve as examples.
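configs/methods/ holds one file per detector with its hyper-parameters. The sketch below assumes, without confirmation from the repo, that each hyper-parameter is given as a list of candidate values. Under that assumption, a parsed YAML mapping can be expanded into concrete configs as follows. `expand_grid` is a hypothetical helper.

```python
from itertools import product


def expand_grid(space):
    """Expand {name: [values...]} into one dict per combination.

    Scalars count as one-element lists, so fixed settings need no brackets;
    an empty space yields a single empty config.
    """
    keys = sorted(space)
    values = [space[k] if isinstance(space[k], list) else [space[k]] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*values)]


grid = expand_grid({"k": [5, 50], "batch_size": 128, "normalize": [True, False]})
print(len(grid))  # 4
print(grid[0])    # {'batch_size': 128, 'k': 5, 'normalize': True}
```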

Optional: cap the size of large training splits (the data the detectors are fitted on) by inserting

fit_subset:
  per_class: 50 # ≤50 imgs / class
  max_samples: 50000

in a dataset YAML.
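The README gives the two keys but not their exact semantics. The sketch below shows one plausible reading: `per_class` is applied first, then `max_samples` caps the total, with seeded sampling so reruns pick the same images. `fit_subset_indices` is a hypothetical function, not the harness's code.

```python
import random
from collections import defaultdict


def fit_subset_indices(labels, per_class=None, max_samples=None, seed=0):
    """Indices of a seeded subset: <= per_class per label, then <= max_samples total."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for y in sorted(by_class):  # fixed class order keeps sampling reproducible
        idxs = by_class[y]
        if per_class is not None and len(idxs) > per_class:
            idxs = rng.sample(idxs, per_class)
        chosen.extend(idxs)
    if max_samples is not None and len(chosen) > max_samples:
        chosen = rng.sample(chosen, max_samples)
    return sorted(chosen)


labels = [i % 10 for i in range(1000)]  # 10 classes x 100 images each
subset = fit_subset_indices(labels, per_class=50, max_samples=300)
print(len(subset))  # 300
```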

Metrics & plots

For each ID/OOD pair we save the raw scores plus AUROC and TPR at 5% FPR (auroc, tpr5fpr) in Parquet.
Live AUROC × TPR scatter plots are logged to Weights & Biases (project=oodeel-bench), where they can be filtered by method, model or layer pack.
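For reference, the sketch below computes both metrics from raw scores. It assumes higher scores mean "more OOD", treats OOD as the positive class, and predicts OOD when score > t. The harness most likely relies on OODeel's own metric code, whose tie handling or interpolation may differ. `auroc` and `tpr_at_fpr` are illustrative names.

```python
def auroc(id_scores, ood_scores):
    """AUROC with OOD as the positive class and higher score = more OOD.

    Equals P(score_ood > score_id) + 0.5 * P(tie) over all ID/OOD pairs
    (the normalised Mann-Whitney U statistic). O(n*m): fine for a sketch.
    """
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def tpr_at_fpr(id_scores, ood_scores, max_fpr=0.05):
    """Best TPR over thresholds t (predict OOD iff score > t) with FPR <= max_fpr.

    FPR only changes at ID scores and TPR can only drop as t grows, so the
    best t is the smallest ID score whose FPR is within budget.
    """
    n_id, n_ood = len(id_scores), len(ood_scores)
    for t in sorted(set(id_scores)):
        if sum(s > t for s in id_scores) / n_id <= max_fpr:
            return sum(s > t for s in ood_scores) / n_ood
    return 0.0  # not reached for max_fpr >= 0: t = max(id_scores) gives FPR 0


id_s = list(range(100))         # ID scores 0..99
ood_s = list(range(50, 150))    # OOD scores 50..149
print(auroc(id_s, ood_s))       # 0.875
print(tpr_at_fpr(id_s, ood_s))  # 0.55
```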

TL;DR

One command runs the whole grid, resumes automatically, and streams metrics to W&B; all configuration stays in plain YAML.

