Validation Framework

This repo now includes a full validation path for player, shuttle, and optional pose evaluation.

What is included

benchmark frame extraction
manual annotation workflow for player boxes and shuttle points
model inference on labeled frames
automatic metric calculation and report generation

Files

scripts/create_validation_manifest.py
scripts/annotate_validation.py
scripts/run_validation_inference.py
scripts/bootstrap_validation_labels.py
scripts/eval_validation.py
scripts/render_validation_review.py
scripts/run_benchmark_bundle.py
scripts/select_validation_subset.py
src/eval/metrics.py
src/model_defaults.py
validation/schema/frame_labels.schema.json
validation/examples/frame_label.example.json

Recommended workflow

1. Extract benchmark frames

python scripts/create_validation_manifest.py \
  --video badminton_sample.mp4 \
  --out-dir validation/sample \
  --every 30 \
  --max-frames 20

Outputs:

validation/sample/manifest.jsonl
validation/sample/frames/*.jpg

2. Annotate player boxes and shuttle

python scripts/annotate_validation.py --manifest validation/sample/manifest.jsonl

Controls:

1 annotate P1 box by drag
2 annotate P2 box by drag
s shuttle mode, left click to place shuttle
o mark shuttle as occluded / not visible
c clear current mode annotation
n next frame
p previous frame
w save
q quit

Default output:

validation/sample/labels.jsonl

3. Run model inference on the labeled frames

.venv/bin/python scripts/run_validation_inference.py \
  --manifest validation/sample/labels.jsonl \
  --with-pose

Default output:

validation/sample/predictions.jsonl

4. Score the predictions

python scripts/eval_validation.py \
  --labels validation/sample/labels.jsonl \
  --predictions validation/sample/predictions.jsonl

Outputs:

validation/sample/predictions.eval.json
validation/sample/predictions.eval.md

Optional bootstrap workflow

If you want a faster review loop, first generate model predictions, then turn them into prefilled labels for later correction:

.venv/bin/python scripts/bootstrap_validation_labels.py \
  --manifest validation/sample/manifest.jsonl \
  --predictions validation/sample/predictions.jsonl

Output:

validation/sample/labels.bootstrap.jsonl

Then review that file inside the annotator by copying it to labels.jsonl or passing it as your working labels file.

Visual review output

You can also render a contact sheet for quick inspection:

.venv/bin/python scripts/render_validation_review.py \
  --input validation/sample/predictions.jsonl

Outputs:

validation/sample/review/*.jpg

This is useful for fast regression review before doing detailed manual correction.

One-command benchmark bundle

If you want a repeatable regression bundle from a video sample, run:

.venv/bin/python scripts/run_benchmark_bundle.py \
  --video badminton_sample.mp4 \
  --out-dir validation/bundles/fullcourt12 \
  --every 45 \
  --max-frames 24 \
  --indices 7,8,9,10,11,12,13,14,20,21,22,23 \
  --with-pose

Outputs include:

sampled manifest
predictions
bootstrap labels
review contact sheet
bundle.summary.json

This is the fastest way to regenerate a benchmark package after tracker or model changes.

Curating a cleaner benchmark subset

If a sampled benchmark includes close-ups, logo cards, or other non-rally frames, you can carve out a cleaner subset:

python scripts/select_validation_subset.py \
  --input validation/sample/manifest.jsonl \
  --out validation/sample/manifest.rally.jsonl \
  --indices 0,1,4,5,6

Default model selection

The repo now prefers the local shuttle specialized weight when present:

models/weights/shuttle_best.pt

Player detection still defaults to generic yolo11n.pt unless you explicitly pass --player-model, because the local player_best.pt may underperform on some sample videos.

Metrics

Shuttle

visibility recall
mean / median pixel error
normalized error by image diagonal
accuracy at 5px / 10px / 20px

Players

mean IoU for P1 and P2
recall at IoU 0.5 and 0.75
mean center error in pixels

Pose

PCK@0.1
PCK@0.2

Pose metrics are only computed when ground truth pose keypoints are present in the labels file.

Design notes

Ground truth labels use semantic player ids: P1, P2
Predictions use tracker slots 1, 2, and the evaluator maps them to P1, P2
This lets the current pipeline be evaluated without changing its runtime output format
The framework is intended for repeated regression checks after model or tracker changes

Practical recommendation

Start by labeling shuttle + players on 100 to 300 frames across several hard scenes. That already gives a useful benchmark before investing in full pose keypoint annotation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validation Framework

What is included

Files

Recommended workflow

1. Extract benchmark frames

2. Annotate player boxes and shuttle

3. Run model inference on the labeled frames

4. Score the predictions

Optional bootstrap workflow

Visual review output

One-command benchmark bundle

Curating a cleaner benchmark subset

Default model selection

Metrics

Shuttle

Players

Pose

Design notes

Practical recommendation

FilesExpand file tree

VALIDATION.md

Latest commit

History

VALIDATION.md

File metadata and controls

Validation Framework

What is included

Files

Recommended workflow

1. Extract benchmark frames

2. Annotate player boxes and shuttle

3. Run model inference on the labeled frames

4. Score the predictions

Optional bootstrap workflow

Visual review output

One-command benchmark bundle

Curating a cleaner benchmark subset

Default model selection

Metrics

Shuttle

Players

Pose

Design notes

Practical recommendation