
feat(dirs): unified XDG asset cache + runtime licence confirmation #58

Merged
MilkClouds merged 14 commits into main from feat/data-fetch-cli on Apr 30, 2026

Conversation


@MilkClouds (Collaborator) commented Apr 29, 2026

Summary

Single host-cache resolver and licence-confirm primitive shared by every site in the harness that needs to put something on disk and (optionally) gate it behind a user opt-in. Replaces the per-site re-implementations of $XDG_CACHE_HOME/vla-eval/<name> and the per-server git clone-or-skip blocks; collapses the BEHAVIOR-1K reproduction from a two-step dance (manual docker download → eval) to a single vla-eval run invocation.

The branch was originally about a vla-eval data fetch subcommand. After two rounds of design review the abstraction collapsed: the licence-acceptance handshake doesn't need a separate pre-flight step; a runtime stdin prompt plus an env-var bypass is enough, so the explicit subcommand was reverted (commit 51bf04d). The branch now ships the smaller, sharper primitives.

What's in here

New: src/vla_eval/dirs.py (mirrors HuggingFace's HF_HOME / HF_ASSETS_CACHE precedence shape)

| Function | Precedence / behaviour |
| --- | --- |
| home() | $VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval |
| assets_cache(subdir) | $VLA_EVAL_ASSETS_CACHE > home()/assets (+ optional subdir) |
| ensure_git_clone(name, repo, rev, *, shallow) | host-side git helper used by lazy model-server clones |
| ensure_license(license_id, *, url, description) | $VLA_EVAL_ACCEPTED_LICENSES env var → stdin prompt (isatty) → SystemExit with hint |

Layout when no env vars are set:

~/.cache/vla-eval/
└── assets/
    ├── behavior1k/      # OmniGibson scene + task instance files
    ├── vlanext/         # github.com/DravenALG/VLANeXt clone
    └── mme-vla/         # github.com/RoboMME/robomme_policy_learning clone

Future namespaces (e.g. datasets/ for HF Hub datasets, tokens/ for auth) sit alongside assets/ under home() without adding new env vars.
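
A minimal sketch of that precedence (illustrative; the shipped dirs.py is the source of truth):

```python
# Sketch of the resolver precedence described above; the real dirs.py
# may differ in details such as validation or directory creation.
import os
from pathlib import Path


def home() -> Path:
    """$VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval."""
    if env := os.environ.get("VLA_EVAL_HOME"):
        return Path(env)
    if xdg := os.environ.get("XDG_CACHE_HOME"):
        return Path(xdg) / "vla-eval"
    return Path.home() / ".cache" / "vla-eval"


def assets_cache(subdir: str | None = None) -> Path:
    """$VLA_EVAL_ASSETS_CACHE > home()/assets, with an optional subdir."""
    env = os.environ.get("VLA_EVAL_ASSETS_CACHE")
    base = Path(env) if env else home() / "assets"
    return base / subdir if subdir else base
```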

Runtime licence-confirm flow

  • Behavior1KBenchmark._ensure_assets(data_path), called from _init_og, fetches BEHAVIOR-1K scene + task files on first run via the OmniGibson download_* helpers and gates the fetch behind an ensure_license prompt.
  • vla-eval run adds --accept-license <id> (repeatable). The orchestrator forwards it into the eval container as VLA_EVAL_ACCEPTED_LICENSES=id1,id2,… so ensure_license skips the stdin prompt non-interactively.
  • _run_via_docker conditionally adds -i / -it based on host isatty, so an interactive user sees the prompt and a CI/sharded run goes straight through the env-var bypass.
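
A sketch of that three-step flow, assuming the signature from the table above (prompt and hint wording is illustrative):

```python
import os
import sys


def ensure_license(license_id: str, *, url: str, description: str) -> None:
    # 1. Non-interactive bypass: vla-eval run --accept-license forwards
    #    the ids here via VLA_EVAL_ACCEPTED_LICENSES.
    accepted = os.environ.get("VLA_EVAL_ACCEPTED_LICENSES", "")
    if license_id in {p.strip() for p in accepted.split(",") if p.strip()}:
        return
    # 2. Interactive opt-in when a TTY is attached.
    if sys.stdin.isatty():
        reply = input(f"Accept {description}? ({url}) [y/N] ")
        if reply.strip().lower() in {"y", "yes"}:
            return
    # 3. Otherwise fail loud with a recovery hint.
    raise SystemExit(
        f"Licence required: {description}\n"
        f"  ID: {license_id}\n"
        f"  URL: {url}\n"
        f"  Re-run with --accept-license {license_id} or set "
        f"VLA_EVAL_ACCEPTED_LICENSES={license_id}."
    )
```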

Existing model-server clones migrate to the shared primitives

  • vlanext.py: _VLANEXT_CACHE → assets_cache("vlanext"); the manual git clone lines → ensure_git_clone(name="vlanext", …). The VLANEXT_ROOT editable-mode override is unchanged.
  • mme_vla.py: same pattern, shallow=True for the branch clone.
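
Roughly what the migrated call sites reduce to (the rev below is an illustrative placeholder; the real calls pin specific revisions):

```python
from vla_eval.dirs import assets_cache, ensure_git_clone

_VLANEXT_CACHE = assets_cache("vlanext")


def _ensure_vlanext():
    # No-op when the clone already exists; otherwise clones at the pinned rev.
    ensure_git_clone(
        name="vlanext",
        repo="https://github.com/DravenALG/VLANeXt",
        rev="main",  # illustrative placeholder, not the pinned rev
        shallow=False,
    )
    return _VLANEXT_CACHE
```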

Config & docs

  • configs/behavior1k_eval.yaml volume mirrors in OmegaConf the four-level fallback chain that assets_cache resolves in Python (sketched after this list); drops :ro so the in-container fetch can populate the host cache.
  • docs/reproductions/behavior1k.md collapses the old two-step "fetch then run" into a single vla-eval run --accept-license behavior-dataset-tos -c … invocation.
  • .claude/skills/add-benchmark/SKILL.md gains a small "Optional: external dataset acquisition" section pointing at the new pattern.
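
The shape of that volume interpolation, exercised through OmegaConf in Python (the exact expression in the YAML may differ):

```python
from omegaconf import OmegaConf

# Four levels, mirroring assets_cache: $VLA_EVAL_ASSETS_CACHE >
# $VLA_EVAL_HOME/assets > $XDG_CACHE_HOME/vla-eval/assets >
# $HOME/.cache/vla-eval/assets.
volume = (
    "${oc.env:VLA_EVAL_ASSETS_CACHE,${oc.env:VLA_EVAL_HOME,"
    "${oc.env:XDG_CACHE_HOME,${oc.env:HOME}/.cache}/vla-eval}/assets}"
    "/behavior1k:/data/behavior1k"  # no :ro, so the fetch can write through
)
cfg = OmegaConf.create({"volumes": [volume]})
print(OmegaConf.to_container(cfg, resolve=True)["volumes"][0])
```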

Earlier-commit improvements still in this PR (carry-over)

  • Behavior1KBenchmark.task_instance_id accepts int | list[int] | None and sweeps via episode_idx % len(list), so the 50-task × 10-instance challenge protocol is reachable.
  • Behavior1KDemoReplayModelServer step cursor is keyed on (session_id, episode_id) and managed via on_episode_start / on_episode_end; predict() raises if it's called before on_episode_start for that key (sketched after this list).
  • Behavior1KBenchmark.get_metadata() declares action_dim=23 so realtime modes don't fall back to the orchestrator's 7-D default.
  • README.md Docker Images table marks rlbench and behavior1k rows as build-locally with a 🔒 marker; CONTRIBUTING.md benchmark roster refreshed.
  • cli/_console.py and cli/_docker.py collect the host docker / console helpers shared by cli/main.py.
  • cli/config_loader.py always runs OmegaConf.to_container(resolve=True) so ${oc.env:…} interpolations work uniformly across configs.
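
A sketch of the demo-replay cursor bookkeeping from the second bullet (class shape and names are illustrative):

```python
class _ReplayCursor:
    """Per-(session_id, episode_id) step cursor that fails loud on misuse."""

    def __init__(self) -> None:
        self._step_idx: dict[tuple[str, str], int] = {}

    def on_episode_start(self, session_id: str, episode_id: str) -> None:
        self._step_idx[(session_id, episode_id)] = 0

    def next_step(self, session_id: str, episode_id: str) -> int:
        key = (session_id, episode_id)
        if key not in self._step_idx:
            # A silent .get(key, 0) would restart the replay mid-stream.
            raise RuntimeError(f"predict() before on_episode_start for {key}")
        idx = self._step_idx[key]
        self._step_idx[key] = idx + 1
        return idx

    def on_episode_end(self, session_id: str, episode_id: str) -> None:
        # Freeing finished keys keeps the dict bounded across long runs.
        self._step_idx.pop((session_id, episode_id), None)
```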

Verification

make check                          # ruff + ty
make test                           # pytest, 306 passed (incl. test_dirs.py x10), 1 skipped
uv run vla-eval test --validate     # 63 / 63 configs valid

# Resolver picks up each env-var arrangement
uv run python -c "from vla_eval.dirs import home, assets_cache; print(home(), assets_cache('behavior1k'))"
VLA_EVAL_HOME=/srv/vla-eval uv run python -c "from vla_eval.dirs import assets_cache; print(assets_cache('behavior1k'))"
VLA_EVAL_ASSETS_CACHE=/fast/ssd/assets uv run python -c "from vla_eval.dirs import assets_cache; print(assets_cache('behavior1k'))"
XDG_CACHE_HOME=/var/cache uv run python -c "from vla_eval.dirs import assets_cache; print(assets_cache('behavior1k'))"

# Licence path: env-var bypass succeeds
VLA_EVAL_ACCEPTED_LICENSES=behavior-dataset-tos \
  uv run python -c "from vla_eval.dirs import ensure_license; ensure_license('behavior-dataset-tos', url='https://x', description='y'); print('ok')"
# -> ok

# Licence path: non-TTY without env var exits with hint
echo | uv run python -c "from vla_eval.dirs import ensure_license; ensure_license('behavior-dataset-tos', url='https://x', description='y')"
# -> Licence required: y
#    ID: behavior-dataset-tos
#    URL: https://x
#    Non-interactive context (no TTY).  Re-run with --accept-license behavior-dataset-tos …

End-to-end docker invocation hasn't been re-run on the cluster (would re-download ~35 GiB for no extra coverage); the existing reproduction artefact under docs/reproductions/data/ still applies.

Checklist

Code changes:

  • make check passes (ruff + ty)
  • make test passes — 306 passed, 1 skipped
  • Smoke-tested affected configs

Smoke test commands run:

make check
make test
uv run vla-eval test --validate
# resolver + licence smoke (see above)

Adds the BEHAVIOR-1K (OmniGibson + NVIDIA Isaac Sim 4.5.0) benchmark.
The integration covers the standard StepBenchmark surface plus a
demo-replay model server used to verify the dataloader against the
released LeRobot v2.1 trajectories.

What's added:

- ``src/vla_eval/benchmarks/behavior1k/benchmark.py``
  Behavior1KBenchmark with the required StepBenchmark methods.
  R1Pro robot, 23-D action, RGB head + L/R wrist cameras.  The async
  bridge is overridden so reset/step/cleanup run on a worker thread
  (Isaac Sim's SimulationApp ``signal.signal`` calls have to be
  monkey-patched while the worker thread is set up — they assume the
  main thread).

- ``src/vla_eval/model_servers/behavior1k_baseline.py``
  Zero-action baseline (smoke-test sanity check).

- ``src/vla_eval/model_servers/behavior1k_demo_replay.py``
  Plays back the recorded actions from a LeRobot v2.1 parquet
  episode.  Used to verify the env wiring matches the dataset.

- ``docker/Dockerfile.behavior1k``
  Isaac Sim 4.5.0 from pypi.nvidia.com (26 omni-* / isaacsim-*
  wheels), bddl3 + OmniGibson[eval] + joylo from BEHAVIOR-1K
  v3.7.2.  Gated behind ``ARG ACCEPT_NVIDIA_EULA=YES`` (NVIDIA
  Omniverse EULA, see https://docs.omniverse.nvidia.com/eula/).

- ``configs/behavior1k_eval.yaml`` — turning_on_radio task instance 1
- ``configs/model_servers/behavior1k/baseline.yaml`` — zero-action server
- ``docs/reproductions/behavior1k.md`` — repro write-up + data files

The behavior1k entry is registered in ``docker/build.sh`` (gated
via ``--accept-license behavior1k``) and listed under ``NO_REDIST``
in ``docker/push.sh`` so the image is built locally only.

Verification:

- demo-replay on turning_on_radio (task instance 1) → success=True
  at step 1364/2000 (within the human-annotated press-skill window
  [1162, 1434] from the BEHAVIOR Dataset annotations).
- zero-action baseline → success=False at max_steps (expected).

Skill notes (``.claude/skills/add-benchmark`` and ``add-model-server``)
gain a short reminder not to add ``tests/test_<name>_benchmark.py``
or ``tests/test_<name>_server.py`` with mocked sim/model libraries —
``tests/`` is for harness mechanics, not per-sim integration, and
mocked modules drift from upstream and miss real bugs.

Companion to the baseline.yaml — points the demo-replay server at a
LeRobot v2.1 parquet episode.  ``demo_path`` is a placeholder; users
swap in their own path before running.

CI surfaced three ty errors after the rebase:

- ``anyio.to_thread.run_sync(...)`` was unresolved through the module
  attribute path.  Use the same import-as-name style the rest of the
  codebase already uses (``predict.py``, ``serve.py``, ``rtc.py``).
- ``signal.signal = lambda ...`` triggered ``invalid-assignment``.
  Use ``setattr`` so the rebinding is opaque to the type checker
  (the runtime behaviour — restoring the handler in ``finally`` —
  is unchanged).
- Drop the leftover ``# type: ignore`` mypy-style pragmas that were
  carrying the old workarounds; ty doesn't honour them anyway.
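
A hedged sketch of the first two fixes (not the literal diff; ``_run_on_worker`` and ``init_fn`` are illustrative names):

```python
import signal
from anyio import to_thread  # import-as-name style that ty resolves


async def _run_on_worker(init_fn):
    def patched():
        original = signal.signal
        # setattr keeps the rebinding opaque to the type checker.
        # SimulationApp registers signal handlers, which only works on the
        # main thread; no-op them while initialising on the worker thread.
        setattr(signal, "signal", lambda *_args, **_kwargs: None)
        try:
            return init_fn()
        finally:
            setattr(signal, "signal", original)  # always restore

    return await to_thread.run_sync(patched)
```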

While here, refresh the docs that mention benchmark coverage:

- ``README.md``: BEHAVIOR-1K badge promoted from ``planned`` to
  ``integrated``; rlbench dropped from the registry-pulled image
  table; new "Build-locally images" note covering rlbench and
  behavior1k; build-script example shows ``--accept-license``.
- ``CONTRIBUTING.md``: integrated-benchmark roster updated to match
  the actual contents of ``src/vla_eval/benchmarks/`` (was missing
  LIBERO-Plus/Mem, RoboMME, MolmoSpaces, Kinetix; now also adds
  BEHAVIOR-1K).

Build-locally images (rlbench, behavior1k) now appear in both the
top-of-readme support table and the Docker Images table, with a
🔒 marker indicating they're not pulled from ghcr.io and require an
explicit licence opt-in.

- Top support table: 🔒 appended after the rlbench and behavior1k
  badges.  Status legend gains a fourth entry explaining 🔒.
- Docker Images table: rlbench is restored (was dropped in the prior
  pass), behavior1k is added at its 23.6 GB position.  For both, the
  Image column shows the name without a ghcr.io link, and the row
  carries 🔒.
- Replaces the earlier "Build-locally images" paragraph with a single
  caption under the table that explains the marker.

Reverts the 🔒 markers added next to the RLBench / BEHAVIOR-1K
badges in the top support table, and the matching legend entry.
Build-mechanism details belong in the Docker Images table further
down — the support table just tracks integration / reproduction
status.

Three issues from review:

- ``Behavior1KBenchmark.task_instance_id`` was set once at
  construction and never varied, so ``episodes_per_task > 1`` runs
  reloaded the same TRO state every episode (and aggregate scores
  could not match the 50-task × 10-instance challenge protocol).
  Accept ``int | list[int] | None`` and index by
  ``task["episode_idx"]`` cyclically when a list is given; the
  scalar form preserves the demo-replay use case (sketched after
  this list).

- ``Behavior1KDemoReplayModelServer`` kept a single
  ``_current_episode_id`` / ``_step_idx`` for the whole process,
  so two concurrent benchmark sessions on one server would race
  and consume a mixed action stream.  Key the cursor on
  ``(session_id, episode_id)``, initialise in
  ``on_episode_start`` and free in ``on_episode_end`` so the
  dict stays bounded.

- ``docs/reproductions/behavior1k.md`` build command did not pass
  ``--accept-license behavior1k``, so the new gated build skipped
  the image and the next step failed with "image not found".
  Updated the command and added the licence URL inline.
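
The cyclic indexing from the first item, sketched (free-function form; the real logic lives on the benchmark class):

```python
def resolve_instance_id(task_instance_id, episode_idx: int):
    # Scalar (or None) keeps the old behaviour, e.g. for demo replay.
    if task_instance_id is None or isinstance(task_instance_id, int):
        return task_instance_id
    # List form sweeps instances: episode 0 -> ids[0], episode 1 -> ids[1],
    # wrapping cyclically past the end of the list.
    return task_instance_id[episode_idx % len(task_instance_id)]
```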

Some benchmarks ship the simulator in the docker image but expect the
dataset to come from a separate, licence-restricted source (BEHAVIOR-1K
is the first concrete consumer — its dataset is governed by the
BEHAVIOR Dataset ToS and OmniGibson assets).  Previously, users had to
manually run a docker invocation that wired ``download_*`` helpers
inside the image and then edit ``configs/behavior1k_eval.yaml`` so the
volume pointed at their host download directory: two awkward steps
that don't surface anywhere a first-time reader would naturally look.

This change introduces a uniform mechanism the harness can use for any
benchmark with the same shape:

- ``Benchmark.data_requirements()`` (classmethod, base default returns
  ``None``) declares a :class:`DataRequirement`: licence id + URL,
  the in-container data path, a marker file, and the argv to run
  inside the image.  ``Behavior1KBenchmark`` returns the BEHAVIOR
  Dataset ToS opt-in plus the canonical ``download_*`` helper script.

- New ``vla-eval data fetch -c <config> --accept-license <id>`` CLI:
  resolves the benchmark class, mounts a host cache directory
  read-write at the container's data path, and runs the declared
  download command.  Idempotent (skips when the marker is already
  present).  Symmetric with ``docker/build.sh --accept-license``,
  so opt-in surface is the same shape across build and fetch.

- Default cache path: ``${VLA_EVAL_DATA_DIR}/<benchmark>`` if the env
  var is set, else ``${HOME}/.cache/vla-eval/<benchmark>``.
  ``--data-dir`` overrides explicitly.  ``configs/behavior1k_eval.yaml``
  resolves the same expression in its ``volumes:`` line via OmegaConf,
  so a fresh clone + ``vla-eval data fetch`` + ``vla-eval run``
  works without any per-host config edits.

- ``cli/config_loader.py`` now always runs ``OmegaConf.to_container``
  with ``resolve=True`` (previously only on configs that used
  ``extends:``), so ``${oc.env:VAR,default}`` interpolations are
  honoured uniformly.  No-op for configs without interpolations.

- ``docs/reproductions/behavior1k.md`` replaces the manual docker
  download incantation with the new ``vla-eval data fetch`` step
  and notes the auto-resolved cache path.

Three concrete cleanups from the simplify pass:

- Use ``gpu_docker_flag(...)`` instead of building ``--gpus`` by hand.
  The previous form passed ``--gpus 0`` literally; ``docker run`` needs
  ``--gpus device=0`` for non-``all`` specs, so this was a real bug
  that only manifested when ``--gpus 0`` was passed at fetch time
  (see the sketch after this list).

- Add ``cache_key`` to ``DataRequirement`` so the cache subdir is
  declared by the benchmark instead of derived from import-string
  parsing (the old form sliced ``parts[-2]`` from the module path,
  which silently couples the CLI to a package layout convention).

- De-duplicate ``_stderr_console`` — the new fetch handler had
  reimplemented it with a different rich Console signature.  Hoisted
  to ``cli/_console.py``; both ``main.py`` and ``cmd_data.py`` now
  import it from one place.
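
The ``--gpus`` fix in miniature (a sketch; the real ``gpu_docker_flag`` may handle more spec shapes):

```python
def gpu_docker_flag(spec: str) -> list[str]:
    if spec == "all":
        return ["--gpus", "all"]
    # `docker run --gpus 0` is rejected; device indices need device=.
    return ["--gpus", f"device={spec}"]
```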

While here:

- Drop the redundant ``image`` parameter from ``_build_docker_argv``
  (it was always ``docker_cfg.image``).
- Drop a dead ``or []`` branch in ``set(args.accept_license or [])``
  (argparse already defaults to ``[]``).
- Trim the ``DataRequirement`` and ``Behavior1KBenchmark.data_requirements``
  docstrings to match the project's existing density.
- Refresh the ``config_loader.load_config`` docstring (it claimed
  no-extends configs went through ``yaml.safe_load`` only, which is
  no longer true since the prior commit always resolved
  interpolations).

Two issues from review:

- ``Behavior1KBenchmark.get_metadata()`` did not declare
  ``action_dim``, so realtime modes (``Orchestrator`` reads
  ``metadata.get("action_dim", 7)`` to size the action buffer)
  would fall back to 7 and instantly crash on the first
  ``step()`` once it saw a 23-D R1Pro action.  Add the entry
  (illustrated after this list).

- Drop ``--data-dir`` from ``vla-eval data fetch``.  When users
  passed ``--data-dir /custom`` for fetch, the dataset went there,
  but ``vla-eval run`` mounts the path resolved by the YAML's
  ``${VLA_EVAL_DATA_DIR:-~/.cache/vla-eval}/<cache_key>`` —
  fetch-target and run-target diverged silently.  ``VLA_EVAL_DATA_DIR``
  is the single source of truth for the cache base; both fetch and
  run honour it.  Users who need a different disk set the env var
  once.
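
The sizing bug in miniature (values from the description above; the helper name is illustrative):

```python
def buffer_dim(metadata: dict) -> int:
    return metadata.get("action_dim", 7)  # the orchestrator's fallback read


assert buffer_dim({}) == 7                   # old metadata: wrong for R1Pro
assert buffer_dim({"action_dim": 23}) == 23  # with the declared entry
```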

Six concrete improvements from the post-/simplify deep review:

- Hoist docker daemon/image helpers out of ``cli/main.py`` into
  ``cli/_docker.py`` so ``cmd_data`` can reuse them. ``cmd_data_fetch``
  now ``check_docker_daemon``-s the host and ``image_exists_locally``-s
  the configured image; if the image isn't built yet it points the
  user at the right ``docker/build.sh ... --accept-license ...``
  invocation instead of letting docker fail with a manifest-not-found
  error a few seconds later (``NO_REDIST`` images can't be pulled).

- ``cmd_data._resolve_benchmark_class`` now warns when a config has
  more than one benchmark entry — the fetcher only acts on the
  first, and the silent skip was a footgun for multi-benchmark
  configs.

- ``Behavior1KDemoReplayModelServer.predict`` raises if it's called
  without a prior ``on_episode_start`` for that ``(session, episode)``
  key.  The previous ``_step_idx.get(key, 0)`` silently restarted
  the cursor at 0, which would corrupt a replay if the harness ever
  called ``predict`` before sending ``EPISODE_START`` (e.g. on a mid-
  episode reconnect).  Matches the explicit ``RuntimeError`` pattern
  ``predict.py`` uses elsewhere for protocol violations.

- ``docs/reproductions/behavior1k.md`` "How to Reproduce" header
  said ``100 steps``, but the config sets ``max_steps: 2000``;
  fix the header.

- ``.claude/skills/add-benchmark/SKILL.md`` gains a section on
  ``data_requirements()`` so a contributor adding a new benchmark
  with externally-licensed data discovers the fetch hook
  through the skill rather than by reading ``behavior1k`` source.

- ``docker/build.sh`` ``EULA_GATED`` and ``docker/push.sh``
  ``NO_REDIST`` now cross-reference each other in their headers.
  These two lists must stay in sync (any image gated on build is
  also unpushable); the comments make the invariant visible to
  future contributors.

@MilkClouds force-pushed the feat/data-fetch-cli branch from 4e43a6f to 25eae2f on April 30, 2026 02:32

…ctions

Re-orient PR #58 around a smaller infrastructure change.  The
``vla-eval data fetch`` subcommand + ``DataRequirement`` declarative
metadata layer were over-built for the actual lifecycle: the
licence-acceptance handshake doesn't need to be a separate pre-flight
step; it can be runtime, prompted on first need, just like model-server
git clones already do.  Moving the licence confirmation to runtime
collapses the asymmetry between benchmark-asset fetch and model-server
clone fetch — both become lazy, both go through the same primitives.

This commit removes the abstraction.  The next commit adds the
runtime-licence flow and the unified host-cache resolver.

Removed:

- ``src/vla_eval/cli/cmd_data.py`` (the ``vla-eval data fetch``
  subcommand and its docker-side fetch dispatch).
- ``DataRequirement`` dataclass and ``Benchmark.data_requirements``
  classmethod on ``src/vla_eval/benchmarks/base.py``.
- ``Behavior1KBenchmark.data_requirements`` method.
- ``cmd_data.register(sub)`` wiring in ``cli/main.py``.

Reverted to the PR #57 baseline:

- ``configs/behavior1k_eval.yaml`` — the data-fetch comment block and
  the OmegaConf volume interpolation; the next commit puts the
  interpolation back in extended XDG-aware form.
- ``docs/reproductions/behavior1k.md`` step 2.
- ``.claude/skills/add-benchmark/SKILL.md`` ``data_requirements``
  section.

Kept (independent improvements that survive this rewrite):

- ``cli/_console.py``, ``cli/_docker.py`` (helper hoists).
- ``cli/config_loader.py`` always-on OmegaConf interpolation.
- ``Behavior1KBenchmark.task_instance_id`` per-episode sweep.
- Demo-replay per-(session, episode) cursor + ``on_episode_start``
  fail-loud hook.
- ``Behavior1KBenchmark.get_metadata`` declaring ``action_dim=23``.
- README "Build-locally images" caption + 🔒 marker on rlbench/
  behavior1k rows; CONTRIBUTING benchmark roster refresh.

Single host-cache resolver, one git-clone helper, one licence-confirm
helper.  All three sites that previously rolled their own resolution
(``cli/cmd_data``, ``model_servers/vlanext``, ``model_servers/mme_vla``)
now go through them.  License acceptance moves from the explicit
pre-flight ``vla-eval data fetch`` step (reverted in the previous
commit) to a runtime stdin prompt inside the eval container, with a
``--accept-license`` / ``$VLA_EVAL_ACCEPTED_LICENSES`` non-interactive
escape hatch.

New module ``vla_eval/dirs.py`` (mirrors HuggingFace's
``HF_HOME`` / ``HF_ASSETS_CACHE`` precedence shape):

- ``home()`` — ``$VLA_EVAL_HOME > $XDG_CACHE_HOME/vla-eval > ~/.cache/vla-eval``.
- ``assets_cache(subdir)`` — ``$VLA_EVAL_ASSETS_CACHE > home()/assets``,
  with optional per-component subdir.
- ``ensure_git_clone(name, repo, rev, *, shallow=False)`` — lazy clone
  into ``assets_cache(name)`` at the pinned rev.  Idempotent.
- ``ensure_license(license_id, *, url, description)`` — env-var
  bypass first (``$VLA_EVAL_ACCEPTED_LICENSES``), then interactive
  stdin prompt (``isatty``), else ``SystemExit`` with a hint pointing
  at ``--accept-license`` / the env var.

Tests in ``tests/test_dirs.py`` cover env-var precedence + every
``ensure_license`` branch.
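
Flavour of those precedence tests (illustrative; ``tests/test_dirs.py`` is the source of truth):

```python
from vla_eval import dirs


def test_home_prefers_vla_eval_home(monkeypatch, tmp_path):
    monkeypatch.setenv("VLA_EVAL_HOME", str(tmp_path / "custom"))
    monkeypatch.setenv("XDG_CACHE_HOME", str(tmp_path / "xdg"))
    assert dirs.home() == tmp_path / "custom"


def test_home_falls_back_to_xdg(monkeypatch, tmp_path):
    monkeypatch.delenv("VLA_EVAL_HOME", raising=False)
    monkeypatch.setenv("XDG_CACHE_HOME", str(tmp_path / "xdg"))
    assert dirs.home() == tmp_path / "xdg" / "vla-eval"
```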

Behavior1KBenchmark gains ``_ensure_assets(data_path)`` called from
``_init_og``.  First-run on a fresh host prompts for the BEHAVIOR
Dataset ToS via ``ensure_license`` and runs the three OmniGibson
``download_*`` helpers.  No more separate fetch step; the eval
container itself populates its own cache.

Model servers ``vlanext`` and ``mme_vla`` lose their hand-rolled
``$XDG_CACHE_HOME/vla-eval/<name>`` resolution and the ``git clone or
skip`` block; both now call ``assets_cache`` + ``ensure_git_clone``.
``VLANEXT_ROOT`` (vlanext-specific editable-mode override that
bypasses the cache entirely) stays at the call site.

Orchestrator (``cli/main.py``) changes:

- Adds ``--accept-license <id>`` (repeatable) to ``vla-eval run``;
  forwarded into the eval container as
  ``VLA_EVAL_ACCEPTED_LICENSES=id1,id2,…``.
- Conditionally appends ``-i``/``-it`` to ``docker run`` so the
  in-container licence prompt can read the host's stdin in
  interactive contexts.
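
The conditional interactivity logic, sketched (this is the inline ladder that a later commit hoists into ``tty_docker_flags()``; exact shape illustrative):

```python
import sys


def tty_docker_flags() -> list[str]:
    if not sys.stdin.isatty():
        return []  # CI / sharded run: the env-var bypass handles licences
    # stdin is interactive; allocate a TTY too when stdout supports it.
    return ["-it"] if sys.stdout.isatty() else ["-i"]
```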

Config / docs:

- ``configs/behavior1k_eval.yaml`` volumes use the four-level OmegaConf
  fallback chain that mirrors ``assets_cache``'s Python precedence,
  and drops ``:ro`` so the in-container download can populate the
  host cache.
- ``docs/reproductions/behavior1k.md`` collapses the old two-step
  reproduction (manual ``docker run`` download + eval) into a single
  ``vla-eval run --accept-license behavior-dataset-tos`` invocation.
- ``.claude/skills/add-benchmark/SKILL.md`` gains a small "Optional:
  external dataset acquisition" section pointing at
  ``vla_eval.dirs.ensure_license`` + ``assets_cache`` and the
  ``_init_og``-style lazy pattern.

@MilkClouds changed the title from "feat(cli): vla-eval data fetch for external benchmark datasets" to "feat(dirs): unified XDG asset cache + runtime licence confirmation" on Apr 30, 2026

Five concrete cleanups from the post-feat review:

- ``ACCEPTED_LICENSES_ENV`` constant in ``dirs.py``; the env var name
  no longer repeats as a string literal across modules.
- ``tty_docker_flags()`` helper in ``docker_resources.py`` next to
  ``gpu_docker_flag`` / ``shard_docker_flags``.  Replaces the inline
  ``isatty`` ladder in ``_run_via_docker``.
- Two unit tests for ``ensure_git_clone`` (idempotent short-circuit
  when ``.git`` exists; argv shape for ``shallow=True``).  The
  network call still isn't unit-tested but the dispatch is now
  covered.
- ``ensure_license`` non-interactive message gets a banner box +
  ``[vla-eval]`` prefix + concrete two-line "re-run with one of"
  block, so the licence prompt isn't drowned in OmniGibson's
  initialisation logs.
- Docstring / comment widths re-flowed up to the project's 119
  line-length (was wrapping ~70 chars by reflex).

Sweep across the surface PR #58 touches: comments and docstrings that
were wrapping at ~62-72 chars get reflowed to fit the project's 119
line-length, and a small set of pure-narration banner comments are
removed.

Reflowed (no information loss, just wider lines):

- ``benchmarks/behavior1k/benchmark.py`` — module docstring, class
  docstring + ``task_instance_id`` Args block, ``_load_task_instance``
  docstring, signal-handler / cleanup / async-bridge / start_episode
  comment blocks.
- ``model_servers/behavior1k_demo_replay.py`` — module docstring,
  parquet ``columns=`` rationale, per-(session, episode) cursor note.
- ``model_servers/behavior1k_baseline.py`` — module docstring.
- ``model_servers/vlanext.py`` — ``_ensure_vlanext`` docstring,
  shallow-clone rationale, class docstring.
- ``model_servers/mme_vla.py`` — module-level shallow-clone rationale,
  class docstring + Args block.
- ``docker_resources.py`` — ``tty_docker_flags`` summary, the
  OMP/MKL block.

Removed (clearly noise, not navigation aids):

- ``cli/main.py``: drop the truly redundant banners (`# Extra volumes
  from config`, `# Extra env vars`, `# 1. CWD (running from repo
  root)`).  The `# CLI override for ...` / `# Validate shard args` /
  `# Resource allocation` / `# Decide whether to run via Docker`
  banners stay; they aid navigation in the long ``cmd_run``.
- ``cli/_docker.py`` module docstring's stale ``cmd_data`` reference.

@MilkClouds changed the base branch from feat/behavior1k-integration to main on April 30, 2026 04:57

@MilkClouds merged commit 0636346 into main on Apr 30, 2026
6 checks passed
@MilkClouds deleted the feat/data-fetch-cli branch on April 30, 2026 04:58
