# Release v0.0.2
## Highlights
### `vla-eval test` — Unified Smoke Test CLI
The three separate commands (`validate`, `test-server`, `test-benchmark`) have been replaced with a single `vla-eval test` entry point.
```shell
vla-eval test              # config validation only (fast, safe default)
vla-eval test --all        # run all categories
vla-eval test --server     # model server smoke tests only
vla-eval test --benchmark  # benchmark smoke tests only
vla-eval test --list       # show inventory + readiness
```

- Parallel GPU execution: `--parallel [N]` with automatic GPU slot allocation
- Graceful Ctrl+C: prints partial results + auto-saves stderr logs to `results/smoke-logs/`
- Smoke step cap: `max_steps=50` prevents timeouts on slow benchmarks
- `--fail-fast` flag and `--list` readiness overview
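The "automatic GPU slot allocation" behind `--parallel` can be pictured as a pool of GPU indices handed out to workers as they become free. A minimal sketch of that pattern, assuming a hypothetical `run_with_gpu_slots` helper (not the project's actual code):

```python
import queue
import threading

def run_with_gpu_slots(tasks, num_gpus):
    """Run (name, fn) tasks in parallel, each pinned to a free GPU slot.

    Hypothetical sketch: each task callable receives a GPU index, and
    the slot is returned to the pool when the task finishes, even if
    it raises.
    """
    slots = queue.Queue()
    for gpu in range(num_gpus):
        slots.put(gpu)

    results = {}
    lock = threading.Lock()

    def worker(name, fn):
        gpu = slots.get()       # block until a GPU slot is free
        try:
            out = fn(gpu)
        finally:
            slots.put(gpu)      # always release the slot
        with lock:
            results[name] = out

    threads = [threading.Thread(target=worker, args=t) for t in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With fewer slots than tasks, excess workers simply block on the queue until a GPU frees up, which is the behavior you want for smoke tests that each need exclusive GPU access.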
### Docker Dev Mode
All 13 benchmark Dockerfiles now use an editable install (`uv pip install -e .`). The new `--dev` flag on `vla-eval run` bind-mounts the host `src/` into the container, enabling rapid iteration without image rebuilds.
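Conceptually, `--dev` just adds a bind mount over the editable-installed package. A hedged sketch of how a CLI might assemble the `docker run` arguments (helper name, container path, and argument layout are all assumptions, not the project's actual implementation):

```python
import os

def build_docker_args(image, config_path, dev=False, host_src="src"):
    """Assemble a docker run command; with dev=True, bind-mount host src/.

    Illustrative only: because the image installed the package with
    `pip install -e .`, mounting the host source tree over the in-container
    source makes host edits take effect without rebuilding the image.
    """
    args = ["docker", "run", "--rm"]
    if dev:
        args += ["-v", f"{os.path.abspath(host_src)}:/workspace/src"]
    args += [image, "vla-eval", "run", "--config", config_path]
    return args
```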
```shell
vla-eval run --config configs/libero_smoke_test.yaml --dev
```

### Rich CLI Output
Colored pass/fail/skip symbols, category headers, and status lines across all CLI output. Colors auto-disable when output is piped (`NO_COLOR`, `isatty`, `TERM=dumb`). `rich` is lazy-imported to keep CLI startup fast.
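The auto-disable checks and the lazy import follow standard CLI conventions; a minimal sketch of both (function names are illustrative, not the project's API):

```python
import os
import sys

def colors_enabled(stream=sys.stdout):
    """Decide whether to emit ANSI colors.

    Disabled when the NO_COLOR env var is set (any value), when
    TERM=dumb, or when the stream is not a TTY (e.g. piped output).
    """
    if os.environ.get("NO_COLOR") is not None:
        return False
    if os.environ.get("TERM", "") == "dumb":
        return False
    return bool(hasattr(stream, "isatty") and stream.isatty())

def get_console():
    """Import rich only when a console is first requested, so commands
    that never print styled output pay no import cost at startup."""
    from rich.console import Console  # deferred import
    return Console()
```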
## Bug Fixes
- Python 3.8 compatibility: `dict[str, Any]` → `Dict[str, Any]`, `TypeAlias` guarded behind `TYPE_CHECKING` — fixes import errors in 6 benchmark Docker images (LIBERO, CALVIN, RoboCerebra, RLBench)
- RLBench Dockerfile: extracted the BuildKit-only heredoc into a standalone `rlbench_entrypoint.sh`
- RLBench entrypoint: replaced `sleep` with Xvfb socket polling for reliable startup
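The Python 3.8 fix above follows the standard pattern: `Dict[str, Any]` from `typing` works on 3.8 (the builtin generic `dict[str, Any]` needs 3.9+), and `TypeAlias` (added in 3.10) is only imported under `TYPE_CHECKING` so older interpreters never evaluate it. A small self-contained illustration (type names are made up for the example):

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Evaluated only by type checkers; on Python 3.8 this import would
    # fail at runtime (TypeAlias landed in typing in 3.10).
    from typing import TypeAlias

    BenchmarkResult: "TypeAlias" = Dict[str, Any]

def count_passed(results: "Dict[str, bool]") -> int:
    # Dict[...] from typing subscripts fine on 3.8; dict[...] would raise
    # TypeError there if evaluated at runtime.
    return sum(1 for ok in results.values() if ok)
```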
## CI & Leaderboard
- Merged the redundant `sync-external.yml` into `update-data.yml`
- Added a `--source` filter input to the `update-data` workflow dispatch
- Fixed unquoted `${{ }}` YAML expressions that broke workflow parsing
- All leaderboard scripts now write structured summaries to `$GITHUB_OUTPUT` for informative PR bodies
- Synced external leaderboard scores (RoboChallenge + RoboArena)
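Writing step outputs works by appending `key=value` lines to the file GitHub Actions exposes via the `GITHUB_OUTPUT` environment variable; later steps read them as `steps.<id>.outputs.<key>`. A minimal sketch of such a helper (function name assumed, not the project's actual script):

```python
import os

def write_step_output(key, value):
    """Append a key=value pair to the GitHub Actions step-output file.

    GITHUB_OUTPUT holds a file path inside Actions runners; appending
    (not truncating) lets several calls accumulate outputs. Returns
    False (a no-op) when run outside of CI.
    """
    path = os.environ.get("GITHUB_OUTPUT")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{key}={value}\n")
    return True
```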
## Breaking Changes
`vla-eval validate`, `vla-eval test-server`, and `vla-eval test-benchmark` have been removed. Use `vla-eval test` instead.