feat(dirs): unified XDG asset cache + runtime licence confirmation #58
Merged
MilkClouds merged 14 commits into main on Apr 30, 2026
Conversation
Adds the BEHAVIOR-1K (OmniGibson + NVIDIA Isaac Sim 4.5.0) benchmark. The integration covers the standard StepBenchmark surface plus a demo-replay model server used to verify the dataloader against the released LeRobot v2.1 trajectories.

What's added:
- ``src/vla_eval/benchmarks/behavior1k/benchmark.py`` — Behavior1KBenchmark with the required StepBenchmark methods. R1Pro robot, 23-D action, RGB head + L/R wrist cameras. The async bridge is overridden so reset/step/cleanup run on a worker thread (Isaac Sim's SimulationApp ``signal.signal`` calls have to be monkey-patched while the worker thread is set up — they assume the main thread).
- ``src/vla_eval/model_servers/behavior1k_baseline.py`` — zero-action baseline (smoke-test sanity check).
- ``src/vla_eval/model_servers/behavior1k_demo_replay.py`` — plays back the recorded actions from a LeRobot v2.1 parquet episode. Used to verify the env wiring matches the dataset.
- ``docker/Dockerfile.behavior1k`` — Isaac Sim 4.5.0 from pypi.nvidia.com (26 omni-* / isaacsim-* wheels), bddl3 + OmniGibson[eval] + joylo from BEHAVIOR-1K v3.7.2. Gated behind ``ARG ACCEPT_NVIDIA_EULA=YES`` (NVIDIA Omniverse EULA, see https://docs.omniverse.nvidia.com/eula/).
- ``configs/behavior1k_eval.yaml`` — turning_on_radio task instance 1
- ``configs/model_servers/behavior1k/baseline.yaml`` — zero-action server
- ``docs/reproductions/behavior1k.md`` — repro write-up + data files

The behavior1k entry is registered in ``docker/build.sh`` (gated via ``--accept-license behavior1k``) and listed under ``NO_REDIST`` in ``docker/push.sh`` so the image is built locally only.

Verification:
- demo-replay on turning_on_radio (task instance 1) → success=True at step 1364/2000 (within the human-annotated press-skill window [1162, 1434] from the BEHAVIOR Dataset annotations).
- zero-action baseline → success=False at max_steps (expected).
Skill notes (``.claude/skills/add-benchmark`` and ``add-model-server``) gain a short reminder not to add ``tests/test_<name>_benchmark.py`` or ``tests/test_<name>_server.py`` with mocked sim/model libraries — ``tests/`` is for harness mechanics, not per-sim integration, and mocked modules drift from upstream and miss real bugs.
Companion to the baseline.yaml — points the demo-replay server at a LeRobot v2.1 parquet episode. ``demo_path`` is a placeholder; users swap in their own path before running.
CI surfaced three ty errors after the rebase:
- ``anyio.to_thread.run_sync(...)`` was unresolved through the module attribute path. Use the same import-as-name style the rest of the codebase already uses (``predict.py``, ``serve.py``, ``rtc.py``).
- ``signal.signal = lambda ...`` triggered ``invalid-assignment``. Use ``setattr`` so the rebinding is opaque to the type checker (the runtime behaviour — restoring the handler in ``finally`` — is unchanged).
- Drop the leftover ``# type: ignore`` mypy-style pragmas that were carrying the old workarounds; ty doesn't honour them anyway.

While here, refresh the docs that mention benchmark coverage:
- ``README.md``: BEHAVIOR-1K badge promoted from ``planned`` to ``integrated``; rlbench dropped from the registry-pulled image table; new "Build-locally images" note covering rlbench and behavior1k; build-script example shows ``--accept-license``.
- ``CONTRIBUTING.md``: integrated-benchmark roster updated to match the actual contents of ``src/vla_eval/benchmarks/`` (was missing LIBERO-Plus/Mem, RoboMME, MolmoSpaces, Kinetix; now also adds BEHAVIOR-1K).
Build-locally images (rlbench, behavior1k) now appear in both the top-of-readme support table and the Docker Images table, with a 🔒 marker indicating they're not pulled from ghcr.io and require an explicit licence opt-in.
- Top support table: 🔒 appended after the rlbench and behavior1k badges. Status legend gains a fourth entry explaining 🔒.
- Docker Images table: rlbench is restored (was dropped in the prior pass), behavior1k is added at its 23.6 GB position. For both, the Image column shows the name without a ghcr.io link, and the row carries 🔒.
- Replaces the earlier "Build-locally images" paragraph with a single caption under the table that explains the marker.
Reverts the 🔒 markers added next to the RLBench / BEHAVIOR-1K badges in the top support table, and the matching legend entry. Build-mechanism details belong in the Docker Images table further down — the support table just tracks integration / reproduction status.
Three issues from review:
- ``Behavior1KBenchmark.task_instance_id`` was set once at construction and never varied, so ``episodes_per_task > 1`` runs reloaded the same TRO state every episode (and aggregate scores could not match the 50-task × 10-instance challenge protocol). Accept ``int | list[int] | None`` and index by ``task["episode_idx"]`` cyclically when a list is given; the scalar form preserves the demo-replay use case.
- ``Behavior1KDemoReplayModelServer`` kept a single ``_current_episode_id`` / ``_step_idx`` for the whole process, so two concurrent benchmark sessions on one server would race and consume a mixed action stream. Key the cursor on ``(session_id, episode_id)``, initialise in ``on_episode_start`` and free in ``on_episode_end`` so the dict stays bounded.
- ``docs/reproductions/behavior1k.md`` build command did not pass ``--accept-license behavior1k``, so the new gated build skipped the image and the next step failed with "image not found". Updated the command and added the licence URL inline.
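The cyclic instance sweep in the first fix can be illustrated with a small sketch; the names are illustrative and the real logic lives inside ``Behavior1KBenchmark``:

```python
def resolve_task_instance(task_instance_id, episode_idx):
    """Pick the TRO task instance for a given episode.

    Sketch of the ``int | list[int] | None`` handling described above.
    """
    if task_instance_id is None:
        return 0  # assumed default when nothing is configured
    if isinstance(task_instance_id, int):
        return task_instance_id  # scalar form: same instance every episode
    # list form: sweep the instances cyclically across episodes
    return task_instance_id[episode_idx % len(task_instance_id)]
```

With a list of 10 instances and ``episodes_per_task=10``, every episode loads a distinct instance, which is what the 50-task × 10-instance protocol needs.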
Some benchmarks ship the simulator in the docker image but expect the
dataset to come from a separate, licence-restricted source (BEHAVIOR-1K
is the first concrete consumer — its dataset is governed by the
BEHAVIOR Dataset ToS and OmniGibson assets). Previously, users had to
manually run a docker invocation that wired ``download_*`` helpers
inside the image and then edit ``configs/behavior1k_eval.yaml`` so the
volume pointed at their host download directory. Two awkward steps
that don't appear anywhere a first-time reader would naturally look.
This change introduces a uniform mechanism the harness can use for any
benchmark with the same shape:
- ``Benchmark.data_requirements()`` (classmethod, base default returns
``None``) declares a :class:`DataRequirement`: licence id + URL,
the in-container data path, a marker file, and the argv to run
inside the image. ``Behavior1KBenchmark`` returns the BEHAVIOR
Dataset ToS opt-in plus the canonical ``download_*`` helper script.
- New ``vla-eval data fetch -c <config> --accept-license <id>`` CLI:
resolves the benchmark class, mounts a host cache directory
read-write at the container's data path, and runs the declared
download command. Idempotent (skips when the marker is already
present). Symmetric with ``docker/build.sh --accept-license``,
so opt-in surface is the same shape across build and fetch.
- Default cache path: ``${VLA_EVAL_DATA_DIR}/<benchmark>`` if the env
var is set, else ``${HOME}/.cache/vla-eval/<benchmark>``.
``--data-dir`` overrides explicitly. ``configs/behavior1k_eval.yaml``
resolves the same expression in its ``volumes:`` line via OmegaConf,
so a fresh clone + ``vla-eval data fetch`` + ``vla-eval run``
works without any per-host config edits.
- ``cli/config_loader.py`` now always runs ``OmegaConf.to_container``
with ``resolve=True`` (previously only on configs that used
``extends:``), so ``${oc.env:VAR,default}`` interpolations are
honoured uniformly. No-op for configs without interpolations.
- ``docs/reproductions/behavior1k.md`` replaces the manual docker
download incantation with the new ``vla-eval data fetch`` step
and notes the auto-resolved cache path.
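The cache-path precedence described above can be sketched as follows. This is an illustrative helper, not the actual CLI code; the ``env`` parameter exists only to make the sketch testable.

```python
import os
from pathlib import Path

def default_data_dir(benchmark: str, env=None) -> Path:
    """Resolve the host cache directory for a benchmark's dataset:
    ``$VLA_EVAL_DATA_DIR/<benchmark>`` when the env var is set,
    else ``~/.cache/vla-eval/<benchmark>``."""
    env = os.environ if env is None else env
    base = env.get("VLA_EVAL_DATA_DIR")
    if base:
        return Path(base) / benchmark
    return Path.home() / ".cache" / "vla-eval" / benchmark
```

The YAML side resolves the same expression via OmegaConf interpolation, so fetch and run agree on the path without per-host edits.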
Three concrete cleanups from the simplify pass:
- Use ``gpu_docker_flag(...)`` instead of building ``--gpus`` by hand. The previous form passed ``--gpus 0`` literally; ``docker run`` needs ``--gpus device=0`` for non-``all`` specs, so this was a real bug that only manifested when ``--gpus 0`` was passed at fetch time.
- Add ``cache_key`` to ``DataRequirement`` so the cache subdir is declared by the benchmark instead of derived from import-string parsing (the old form sliced ``parts[-2]`` from the module path, which silently couples the CLI to a package layout convention).
- De-duplicate ``_stderr_console`` — the new fetch handler had reimplemented it with a different rich Console signature. Hoisted to ``cli/_console.py``; both ``main.py`` and ``cmd_data.py`` now import it from one place.

While here:
- Drop the redundant ``image`` parameter from ``_build_docker_argv`` (it was always ``docker_cfg.image``).
- Drop a dead ``or []`` branch in ``set(args.accept_license or [])`` (argparse already defaults to ``[]``).
- Trim the ``DataRequirement`` and ``Behavior1KBenchmark.data_requirements`` docstrings to match the project's existing density.
- Refresh the ``config_loader.load_config`` docstring (it claimed no-extends configs went through ``yaml.safe_load`` only, which is no longer true since the prior commit always resolved interpolations).
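The ``--gpus`` fix amounts to the following shape; this is a sketch of the behaviour ``gpu_docker_flag`` guards, and the real helper's signature may differ:

```python
def gpu_docker_flag(gpus: str) -> list[str]:
    """Build the ``docker run`` GPU arguments.

    ``docker run`` accepts ``--gpus all`` verbatim, but a bare device index
    must be wrapped as ``device=<idx>`` (``--gpus 0`` is rejected).
    """
    if gpus == "all":
        return ["--gpus", "all"]
    return ["--gpus", f"device={gpus}"]
```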
Two issues from review:
- ``Behavior1KBenchmark.get_metadata()`` did not declare
``action_dim``, so realtime modes (``Orchestrator`` reads
``metadata.get("action_dim", 7)`` to size the action buffer)
would fall back to 7 and instantly crash on the first
``step()`` once it saw a 23-D R1Pro action. Add the entry.
- Drop ``--data-dir`` from ``vla-eval data fetch``. When users
passed ``--data-dir /custom`` for fetch, the dataset went there,
but ``vla-eval run`` mounts the path resolved by the YAML's
``${VLA_EVAL_DATA_DIR:-~/.cache/vla-eval}/<cache_key>`` —
fetch-target and run-target diverged silently. ``VLA_EVAL_DATA_DIR``
is the single source of truth for the cache base; both fetch and
run honour it. Users who need a different disk set the env var
once.
Six concrete improvements from the post-/simplify deep review:
- Hoist docker daemon/image helpers out of ``cli/main.py`` into ``cli/_docker.py`` so ``cmd_data`` can reuse them. ``cmd_data_fetch`` now ``check_docker_daemon``-s the host and ``image_exists_locally``-s the configured image; if the image isn't built yet it points the user at the right ``docker/build.sh ... --accept-license ...`` invocation instead of letting docker fail with a manifest-not-found error a few seconds later (``NO_REDIST`` images can't be pulled).
- ``cmd_data._resolve_benchmark_class`` now warns when a config has more than one benchmark entry — the fetcher only acts on the first, and the silent skip was a footgun for multi-benchmark configs.
- ``Behavior1KDemoReplayModelServer.predict`` raises if it's called without a prior ``on_episode_start`` for that ``(session, episode)`` key. The previous ``_step_idx.get(key, 0)`` silently restarted the cursor at 0, which would corrupt a replay if the harness ever called ``predict`` before sending ``EPISODE_START`` (e.g. on a mid-episode reconnect). Matches the explicit ``RuntimeError`` pattern ``predict.py`` uses elsewhere for protocol violations.
- ``docs/reproductions/behavior1k.md`` "How to Reproduce" header said ``100 steps``, but the config sets ``max_steps: 2000``; fix the header.
- ``.claude/skills/add-benchmark/SKILL.md`` gains a section on ``data_requirements()`` so a contributor adding a new benchmark with externally-licensed data discovers the fetch hook through the skill rather than by reading ``behavior1k`` source.
- ``docker/build.sh`` ``EULA_GATED`` and ``docker/push.sh`` ``NO_REDIST`` now cross-reference each other in their headers. These two lists must stay in sync (any image gated on build is also unpushable); the comments make the invariant visible to future contributors.
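The fail-loud cursor pattern can be sketched as an illustrative class (not the actual server code, which also loads and returns parquet actions):

```python
class ReplayCursor:
    """Per-(session_id, episode_id) step cursor that fails loud when
    predict arrives before EPISODE_START for that key."""

    def __init__(self):
        self._step_idx: dict[tuple[str, str], int] = {}

    def on_episode_start(self, session_id: str, episode_id: str) -> None:
        self._step_idx[(session_id, episode_id)] = 0

    def on_episode_end(self, session_id: str, episode_id: str) -> None:
        # free the entry so the dict stays bounded across long runs
        self._step_idx.pop((session_id, episode_id), None)

    def next_step(self, session_id: str, episode_id: str) -> int:
        key = (session_id, episode_id)
        if key not in self._step_idx:
            # a silent .get(key, 0) here would restart the replay at 0
            raise RuntimeError(f"predict() called before EPISODE_START for {key}")
        idx = self._step_idx[key]
        self._step_idx[key] = idx + 1
        return idx
```

Keying on the pair rather than a single process-wide cursor is what lets two concurrent sessions replay independent action streams.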
Re-orient PR #58 around a smaller infrastructure change. The ``vla-eval data fetch`` subcommand + ``DataRequirement`` declarative metadata layer were over-built for the actual lifecycle: the license-acceptance handshake doesn't need to be a separate pre-flight step; it can be runtime, prompted on first need, just like model-server git clones already do. Moving the licence confirmation to runtime collapses the asymmetry between benchmark-asset fetch and model-server clone fetch — both become lazy, both go through the same primitives. This commit removes the abstraction; the next commit adds the runtime-licence flow and the unified host-cache resolver.

Removed:
- ``src/vla_eval/cli/cmd_data.py`` (the ``vla-eval data fetch`` subcommand and its docker-side fetch dispatch).
- ``DataRequirement`` dataclass and ``Benchmark.data_requirements`` classmethod on ``src/vla_eval/benchmarks/base.py``.
- ``Behavior1KBenchmark.data_requirements`` method.
- ``cmd_data.register(sub)`` wiring in ``cli/main.py``.

Reverted to the PR #57 baseline:
- ``configs/behavior1k_eval.yaml`` — the data-fetch comment block and the OmegaConf volume interpolation; the next commit puts the interpolation back in extended XDG-aware form.
- ``docs/reproductions/behavior1k.md`` step 2.
- ``.claude/skills/add-benchmark/SKILL.md`` ``data_requirements`` section.

Kept (independent improvements that survive this rewrite):
- ``cli/_console.py``, ``cli/_docker.py`` (helper hoists).
- ``cli/config_loader.py`` always-on OmegaConf interpolation.
- ``Behavior1KBenchmark.task_instance_id`` per-episode sweep.
- Demo-replay per-(session, episode) cursor + ``on_episode_start`` fail-loud hook.
- ``Behavior1KBenchmark.get_metadata`` declaring ``action_dim=23``.
- README "Build-locally images" caption + 🔒 marker on rlbench/behavior1k rows; CONTRIBUTING benchmark roster refresh.
Single host-cache resolver, one git-clone helper, one licence-confirm helper. All three sites that previously rolled their own resolution (``cli/cmd_data``, ``model_servers/vlanext``, ``model_servers/mme_vla``) now go through them. License acceptance moves from the explicit pre-flight ``vla-eval data fetch`` step (reverted in the previous commit) to a runtime stdin prompt inside the eval container, with a ``--accept-license`` / ``$VLA_EVAL_ACCEPTED_LICENSES`` non-interactive escape hatch.

New module ``vla_eval/dirs.py`` (mirrors HuggingFace's ``HF_HOME`` / ``HF_ASSETS_CACHE`` precedence shape):
- ``home()`` — ``$VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval``.
- ``assets_cache(subdir)`` — ``$VLA_EVAL_ASSETS_CACHE > home()/assets``, with optional per-component subdir.
- ``ensure_git_clone(name, repo, rev, *, shallow=False)`` — lazy clone into ``assets_cache(name)`` at the pinned rev. Idempotent.
- ``ensure_license(license_id, *, url, description)`` — env-var bypass first (``$VLA_EVAL_ACCEPTED_LICENSES``), then interactive stdin prompt (``isatty``), else ``SystemExit`` with a hint pointing at ``--accept-license`` / the env var.

Tests in ``tests/test_dirs.py`` cover env-var precedence + every ``ensure_license`` branch.

Behavior1KBenchmark gains ``_ensure_assets(data_path)`` called from ``_init_og``. First run on a fresh host prompts for the BEHAVIOR Dataset ToS via ``ensure_license`` and runs the three OmniGibson ``download_*`` helpers. No more separate fetch step; the eval container itself populates its own cache.

Model servers ``vlanext`` and ``mme_vla`` lose their hand-rolled ``$XDG_CACHE_HOME/vla-eval/<name>`` resolution and the ``git clone or skip`` block; both now call ``assets_cache`` + ``ensure_git_clone``. ``VLANEXT_ROOT`` (vlanext-specific editable-mode override that bypasses the cache entirely) stays at the call site.
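The precedence chains above can be sketched as follows. These are illustrative versions of the ``dirs.py`` helpers; the real functions presumably read ``os.environ`` directly, and the ``env`` parameter here exists only to make the sketch testable.

```python
import os
from pathlib import Path

def home(env=None) -> Path:
    """$VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval."""
    env = os.environ if env is None else env
    if env.get("VLA_EVAL_HOME"):
        return Path(env["VLA_EVAL_HOME"])
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "vla-eval"
    return Path.home() / ".cache" / "vla-eval"

def assets_cache(subdir=None, env=None) -> Path:
    """$VLA_EVAL_ASSETS_CACHE > home()/assets, with optional subdir."""
    env = os.environ if env is None else env
    base = env.get("VLA_EVAL_ASSETS_CACHE")
    root = Path(base) if base else home(env) / "assets"
    return root / subdir if subdir else root
```

The shape matches HuggingFace's ``HF_HOME`` / ``HF_ASSETS_CACHE`` split: one coarse override for everything, one fine override for the assets namespace.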
Orchestrator (``cli/main.py``) changes:
- Adds ``--accept-license <id>`` (repeatable) to ``vla-eval run``; forwarded into the eval container as ``VLA_EVAL_ACCEPTED_LICENSES=id1,id2,…``.
- Conditionally appends ``-i``/``-it`` to ``docker run`` so the in-container licence prompt can read the host's stdin in interactive contexts.

Config / docs:
- ``configs/behavior1k_eval.yaml`` volumes use the four-level OmegaConf fallback chain that mirrors ``assets_cache``'s Python precedence, and drops ``:ro`` so the in-container download can populate the host cache.
- ``docs/reproductions/behavior1k.md`` collapses the old two-step reproduction (manual ``docker run`` download + eval) into a single ``vla-eval run --accept-license behavior-dataset-tos`` invocation.
- ``.claude/skills/add-benchmark/SKILL.md`` gains a small "Optional: external dataset acquisition" section pointing at ``vla_eval.dirs.ensure_license`` + ``assets_cache`` and the ``_init_og``-style lazy pattern.
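The env-var forwarding can be sketched as follows; ``license_env_args`` is a hypothetical helper name, only the env var name comes from the PR:

```python
def license_env_args(accepted: list[str]) -> list[str]:
    """Build the ``docker run -e`` fragment forwarding --accept-license
    values into the container as a single comma-joined env var."""
    if not accepted:
        return []
    return ["-e", "VLA_EVAL_ACCEPTED_LICENSES=" + ",".join(accepted)]
```

Inside the container, ``ensure_license`` splits the variable on commas and skips the stdin prompt for any id found there.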
Five concrete cleanups from the post-feat review:
- ``ACCEPTED_LICENSES_ENV`` constant in ``dirs.py``; the env var name no longer repeats as a string literal across modules.
- ``tty_docker_flags()`` helper in ``docker_resources.py`` next to ``gpu_docker_flag`` / ``shard_docker_flags``. Replaces the inline ``isatty`` ladder in ``_run_via_docker``.
- Two unit tests for ``ensure_git_clone`` (idempotent short-circuit when ``.git`` exists; argv shape for ``shallow=True``). The network call still isn't unit-tested but the dispatch is now covered.
- ``ensure_license`` non-interactive message gets a banner box + ``[vla-eval]`` prefix + concrete two-line "re-run with one of" block, so the licence prompt isn't drowned in OmniGibson's initialisation logs.
- Docstring / comment widths re-flowed up to the project's 119 line-length (was wrapping ~70 chars by reflex).
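The ``isatty`` ladder behind ``tty_docker_flags`` can be sketched as below; the exact flag spelling is an assumption, and the stream parameters exist only so the sketch can be exercised without a real terminal:

```python
import sys

def tty_docker_flags(stdin=None, stdout=None) -> list[str]:
    """Choose docker interactivity flags from the host's terminal state:
    a full terminal gets ``-it`` (prompt visible), a TTY stdin with piped
    stdout gets ``-i`` only, and CI/sharded runs get nothing."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if stdin.isatty() and stdout.isatty():
        return ["-it"]
    if stdin.isatty():
        return ["-i"]
    return []
```

In the non-interactive branch the licence prompt never appears, and ``ensure_license`` falls through to the ``VLA_EVAL_ACCEPTED_LICENSES`` bypass or exits with a hint.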
Sweep across the surface PR #58 touches: comments and docstrings that were wrapping at ~62-72 chars get reflowed to fit the project's 119 line-length, and a small set of pure-narration banner comments are removed.

Reflowed (no information loss, just wider lines):
- ``benchmarks/behavior1k/benchmark.py`` — module docstring, class docstring + ``task_instance_id`` Args block, ``_load_task_instance`` docstring, signal-handler / cleanup / async-bridge / start_episode comment blocks.
- ``model_servers/behavior1k_demo_replay.py`` — module docstring, parquet ``columns=`` rationale, per-(session, episode) cursor note.
- ``model_servers/behavior1k_baseline.py`` — module docstring.
- ``model_servers/vlanext.py`` — ``_ensure_vlanext`` docstring, shallow-clone rationale, class docstring.
- ``model_servers/mme_vla.py`` — module-level shallow-clone rationale, class docstring + Args block.
- ``docker_resources.py`` — ``tty_docker_flags`` summary, the OMP/MKL block.

Removed (clearly noise, not navigation aids):
- ``cli/main.py``: the `# CLI override for ...` / `# Validate shard args` / `# Resource allocation` / `# Decide whether to run via Docker` banners are kept (they aid navigation in long ``cmd_run``); the truly redundant ones — `# Extra volumes from config`, `# Extra env vars`, `# 1. CWD (running from repo root)` — go.
- ``cli/_docker.py`` module docstring's stale ``cmd_data`` reference.
Summary
Single host-cache resolver and licence-confirm primitive shared by every site in the harness that needs to put something on disk and (optionally) gate it behind a user opt-in. Replaces the per-site re-implementations of ``$XDG_CACHE_HOME/vla-eval/<name>`` and the per-server ``git clone or skip`` blocks; collapses the BEHAVIOR-1K reproduction from a two-step dance (manual docker download → eval) to one ``vla-eval run`` invocation.

What's in here
New:
- ``src/vla_eval/dirs.py`` (mirrors HuggingFace's ``HF_HOME`` / ``HF_ASSETS_CACHE`` precedence shape):
  - ``home()`` — ``$VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval``
  - ``assets_cache(subdir)`` — ``$VLA_EVAL_ASSETS_CACHE > home()/assets`` (+ optional subdir)
  - ``ensure_git_clone(name, repo, rev, *, shallow)``
  - ``ensure_license(license_id, *, url, description)`` — ``$VLA_EVAL_ACCEPTED_LICENSES`` env var → stdin prompt (``isatty``) → ``SystemExit`` with hint

Layout when no env vars are set: everything lands under ``~/.cache/vla-eval/assets/<name>``.
Future namespaces (e.g. ``datasets/`` for HF Hub datasets, ``tokens/`` for auth) sit alongside ``assets/`` under ``home()`` without adding new env vars.

Runtime licence-confirm flow
``Behavior1KBenchmark._ensure_assets(data_path)``, called from ``_init_og``, fetches BEHAVIOR-1K scene + task files on first run via the OmniGibson ``download_*`` helpers and gates the prompt through ``ensure_license``. ``vla-eval run`` adds ``--accept-license <id>`` (repeatable); the orchestrator forwards it into the eval container as ``VLA_EVAL_ACCEPTED_LICENSES=id1,id2,…`` so ``ensure_license`` skips the stdin prompt non-interactively. ``_run_via_docker`` conditionally adds ``-i``/``-it`` based on host ``isatty``, so an interactive user sees the prompt and a CI/sharded run goes straight through the env-var bypass.

Existing model-server clones migrate to the shared primitives
``vlanext.py``: ``_VLANEXT_CACHE`` → ``assets_cache("vlanext")``; the manual ``git clone`` lines → ``ensure_git_clone(name="vlanext", …)``. The ``VLANEXT_ROOT`` editable-mode override is unchanged. ``mme_vla.py``: same pattern, ``shallow=True`` for the branch clone.

Config & docs
``configs/behavior1k_eval.yaml`` volume uses the same four-level OmegaConf fallback chain that ``assets_cache`` resolves to in Python; drops ``:ro`` so the in-container fetch can populate the host cache. ``docs/reproductions/behavior1k.md`` collapses the old two-step "fetch then run" into a single ``vla-eval run --accept-license behavior-dataset-tos -c …`` invocation. ``.claude/skills/add-benchmark/SKILL.md`` gains a small "Optional: external dataset acquisition" section pointing at the new pattern.

Earlier-commit improvements still in this PR (carry-over)
- ``Behavior1KBenchmark.task_instance_id`` accepts ``int | list[int] | None`` and sweeps via ``episode_idx % len(list)``, so the 50-task × 10-instance challenge protocol is reachable.
- ``Behavior1KDemoReplayModelServer`` step cursor is keyed on ``(session_id, episode_id)`` and managed via ``on_episode_start`` / ``on_episode_end``; ``predict()`` raises if it's called before ``on_episode_start`` for that key.
- ``Behavior1KBenchmark.get_metadata()`` declares ``action_dim=23`` so realtime modes don't fall back to the orchestrator's 7-D default.
- ``README.md`` Docker Images table marks ``rlbench`` and ``behavior1k`` rows as build-locally with a 🔒 marker; ``CONTRIBUTING.md`` benchmark roster refreshed.
- ``cli/_console.py`` and ``cli/_docker.py`` host the docker / console helpers shared by ``cli/main.py``.
- ``cli/config_loader.py`` always runs ``OmegaConf.to_container(resolve=True)`` so ``${oc.env:…}`` interpolations work uniformly across configs.

Verification
End-to-end docker invocation hasn't been re-run on the cluster (would re-download ~35 GiB for no extra coverage); the existing reproduction artefact under ``docs/reproductions/data/`` still applies.

Checklist
Code changes:
- ``make check`` passes (ruff + ty)
- ``make test`` passes — 306 passed, 1 skipped

Smoke test commands run: