Commit 51bf04d

revert: drop DataRequirement / vla-eval data fetch / cache_key abstractions
Re-orient PR #58 around a smaller infrastructure change. The ``vla-eval data fetch`` subcommand + ``DataRequirement`` declarative metadata layer were over-built for the actual lifecycle: the license-acceptance handshake doesn't need to be a separate pre-flight step; it can be runtime, prompted on first need, just like model-server git clones already do. Moving the licence confirmation to runtime collapses the asymmetry between benchmark-asset fetch and model-server clone fetch — both become lazy, both go through the same primitives. This commit removes the abstraction. The next commit adds the runtime-licence flow and the unified host-cache resolver.

Removed:
- ``src/vla_eval/cli/cmd_data.py`` (the ``vla-eval data fetch`` subcommand and its docker-side fetch dispatch).
- ``DataRequirement`` dataclass and ``Benchmark.data_requirements`` classmethod in ``src/vla_eval/benchmarks/base.py``.
- ``Behavior1KBenchmark.data_requirements`` method.
- ``cmd_data.register(sub)`` wiring in ``cli/main.py``.

Reverted to the PR #57 baseline:
- ``configs/behavior1k_eval.yaml`` — the data-fetch comment block and the OmegaConf volume interpolation; the next commit puts the interpolation back in extended XDG-aware form.
- ``docs/reproductions/behavior1k.md`` step 2.
- ``.claude/skills/add-benchmark/SKILL.md`` ``data_requirements`` section.

Kept (independent improvements that survive this rewrite):
- ``cli/_console.py``, ``cli/_docker.py`` (helper hoists).
- ``cli/config_loader.py`` always-on OmegaConf interpolation.
- ``Behavior1KBenchmark.task_instance_id`` per-episode sweep.
- Demo-replay per-(session, episode) cursor + ``on_episode_start`` fail-loud hook.
- ``Behavior1KBenchmark.get_metadata`` declaring ``action_dim=23``.
- README "Build-locally images" caption + 🔒 marker on rlbench/behavior1k rows; CONTRIBUTING benchmark roster refresh.
1 parent 25eae2f commit 51bf04d

7 files changed

Lines changed: 30 additions & 309 deletions


.claude/skills/add-benchmark/SKILL.md

Lines changed: 0 additions & 29 deletions
@@ -121,35 +121,6 @@ class MyBenchmark(StepBenchmark):
 - **Image preprocessing**: Handle non-standard images (flipped, wrong resolution) in `make_obs()`.
 - **EGL headless rendering**: Add `os.environ.setdefault("PYOPENGL_PLATFORM", "egl")` at module top if the sim uses OpenGL.
 
-### Optional: external dataset declaration
-
-If the benchmark's dataset is licensed independently and shouldn't be baked into the docker image, override `data_requirements()` (classmethod) so the harness's uniform fetch path picks it up:
-
-```python
-from vla_eval.benchmarks.base import DataRequirement
-
-class MyBenchmark(StepBenchmark):
-    @classmethod
-    def data_requirements(cls) -> DataRequirement:
-        return DataRequirement(
-            license_id="my-dataset-tos",              # --accept-license <id>
-            license_url="https://example.com/license",
-            cache_key="my_bench",                     # host cache subdir name
-            container_data_path="/app/data",          # mount target inside the image
-            marker="dataset_ready_marker",            # file/dir whose presence skips refetch
-            download_command=("python", "-c", "<download script>"),
-        )
-```
-
-Users then run `vla-eval data fetch -c configs/<name>_eval.yaml --accept-license <license_id>` once. The fetcher mounts `${VLA_EVAL_DATA_DIR:-~/.cache/vla-eval}/<cache_key>` read-write at `container_data_path` and runs `download_command`. The eval config's `volumes:` entry should mount the same host path read-only via OmegaConf interpolation:
-
-```yaml
-volumes:
-  - "${oc.env:VLA_EVAL_DATA_DIR,${oc.env:HOME}/.cache/vla-eval}/<cache_key>:<container_data_path>:ro"
-```
-
-Reference: `Behavior1KBenchmark.data_requirements()` in `benchmarks/behavior1k/benchmark.py`.
-
 ## 3. Create config YAML
 
 Create `configs/<name>_eval.yaml`:
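For reference, the host-side path the removed fetcher resolved, `${VLA_EVAL_DATA_DIR:-~/.cache/vla-eval}/<cache_key>`, maps to Python roughly as follows (`resolve_cache_dir` is a hypothetical helper name, not a harness function):

```python
# Hypothetical helper mirroring the removed fetcher's path resolution:
# VLA_EVAL_DATA_DIR when set, otherwise ~/.cache/vla-eval.
import os
from pathlib import Path


def resolve_cache_dir(cache_key: str) -> Path:
    root = os.environ.get("VLA_EVAL_DATA_DIR")
    if not root:
        root = str(Path.home() / ".cache" / "vla-eval")
    return Path(root) / cache_key
```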

configs/behavior1k_eval.yaml

Lines changed: 9 additions & 12 deletions
@@ -1,11 +1,9 @@
 # BEHAVIOR-1K (OmniGibson / Isaac Sim) — 50-task household-activity suite.
 #
-# Run ``vla-eval data fetch -c configs/behavior1k_eval.yaml
-#   --accept-license behavior-dataset-tos`` once before evaluating to
-# populate the dataset cache. The default cache lives at
-# ``$VLA_EVAL_DATA_DIR/behavior1k`` (or ``~/.cache/vla-eval/behavior1k``
-# when the env var is unset); set ``VLA_EVAL_DATA_DIR`` to redirect to
-# a faster disk. An NVIDIA GPU with Vulkan + EGL is required.
+# Before running, edit the dataset volume below to point at your local
+# BEHAVIOR-1K data directory (the one populated by the three
+# ``download_*`` calls documented in docs/reproductions/behavior1k.md).
+# An NVIDIA GPU with Vulkan + EGL is required.
 server:
   url: "ws://localhost:8000"
 
@@ -22,12 +20,11 @@ docker:
   # GPU" error and a segfault deep in omni.kit.xr on first launch.
   - "VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json"
   volumes:
-    # OmniGibson reads ``gm.DATA_PATH=/app/BEHAVIOR-1K/datasets`` at
-    # import time. The host path resolves via OmegaConf:
-    # ``${VLA_EVAL_DATA_DIR}/behavior1k`` if set, else
-    # ``${HOME}/.cache/vla-eval/behavior1k``. This is the same layout
-    # ``vla-eval data fetch`` writes to.
-    - "${oc.env:VLA_EVAL_DATA_DIR,${oc.env:HOME}/.cache/vla-eval}/behavior1k:/app/BEHAVIOR-1K/datasets:ro"
+    # OmniGibson reads gm.DATA_PATH=/app/BEHAVIOR-1K/datasets at import time.
+    # Replace the host side with the directory holding
+    # ``omnigibson-robot-assets/``, ``behavior-1k-assets/``, and
+    # ``2025-challenge-task-instances/``.
+    - "/data/og_data:/app/BEHAVIOR-1K/datasets:ro"
 
 output_dir: "./results"
 
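The volume strings in this config follow docker's `host:container:mode` convention. An illustrative parser (not a harness function; it assumes no colons inside the paths themselves):

```python
# Illustrative parser for docker-style "host:container[:mode]" volume
# strings; mode defaults to "rw" as in docker itself.
def parse_volume(spec: str) -> tuple[str, str, str]:
    parts = spec.split(":")
    if len(parts) == 2:
        return parts[0], parts[1], "rw"
    host, container, mode = parts
    return host, container, mode
```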

docs/reproductions/behavior1k.md

Lines changed: 20 additions & 14 deletions
@@ -110,23 +110,32 @@ test set.
 - **Max steps:** 5000 default (or 2× human demo length when configured;
   see `learning/eval.py` for the dataset-driven path).
 
-## How to Reproduce (zero-action baseline, 1 task, 2000 step cap)
+## How to Reproduce (zero-action baseline, 1 task, 100 steps)
 
 ```bash
 # 1. Build the image (heavy: ~17 min, 23.5 GB).
 #    The behavior1k Dockerfile is gated behind a licence opt-in
 #    (NVIDIA Omniverse EULA — https://docs.omniverse.nvidia.com/eula/).
 docker/build.sh behavior1k --accept-license behavior1k
 
-# 2. Download the dataset (~35 GiB) into the harness cache. This drives
-#    the official ``download_omnigibson_robot_assets`` /
-#    ``download_behavior_1k_assets`` / ``download_2025_challenge_task_instances``
-#    helpers inside the image and accepts the BEHAVIOR Dataset ToS. The
-#    cache lives at ``$VLA_EVAL_DATA_DIR/behavior1k`` (defaults to
-#    ``~/.cache/vla-eval/behavior1k``) — set ``VLA_EVAL_DATA_DIR`` to
-#    redirect to a faster disk before running.
-uv run vla-eval data fetch -c configs/behavior1k_eval.yaml \
-  --accept-license behavior-dataset-tos
+# 2. Download the dataset (~35 GiB). Mount-target inside the image
+#    is /app/BEHAVIOR-1K/datasets — that's where gm.DATA_PATH points.
+mkdir -p /path/to/og_data
+docker run --rm --gpus all \
+  -e OMNI_KIT_ACCEPT_EULA=YES \
+  -v /path/to/og_data:/app/BEHAVIOR-1K/datasets \
+  --entrypoint conda \
+  ghcr.io/allenai/vla-evaluation-harness/behavior1k:latest \
+  run --no-capture-output -n behavior python -c "
+from omnigibson.utils.asset_utils import (
+    download_omnigibson_robot_assets,
+    download_behavior_1k_assets,
+    download_2025_challenge_task_instances,
+)
+download_omnigibson_robot_assets()
+download_behavior_1k_assets(accept_license=True)
+download_2025_challenge_task_instances()
+"
 
 # 3. Start the zero-action baseline server.
 uv run --script src/vla_eval/model_servers/behavior1k_baseline.py \
@@ -140,10 +149,7 @@ uv run vla-eval run -c configs/behavior1k_eval.yaml \
   --gpus 0 --yes
 ```
 
-The eval config picks up the cache directory automatically (the
-``volumes`` entry resolves
-``${VLA_EVAL_DATA_DIR}/behavior1k`` with a fallback to
-``${HOME}/.cache/vla-eval/behavior1k``); no per-host edits required.
+Edit `configs/behavior1k_eval.yaml` `volumes` to point at your dataset path.
 
 ## What Trained-VLA Reproduction Still Needs
 

src/vla_eval/benchmarks/base.py

Lines changed: 0 additions & 28 deletions
@@ -33,26 +33,6 @@ class StepResult:
     info: dict[str, Any]
 
 
-@dataclass(frozen=True)
-class DataRequirement:
-    """Declares a benchmark's externally-licensed dataset.
-
-    The CLI uses this to drive ``vla-eval data fetch``: it mounts
-    ``${VLA_EVAL_DATA_DIR:-~/.cache/vla-eval}/<cache_key>`` at
-    ``container_data_path`` (read-write) and runs ``download_command``.
-    ``marker`` is a host-relative path the download produces last; its
-    presence short-circuits re-fetches. ``license_id`` is the
-    user-facing kebab-case token compared against ``--accept-license``.
-    """
-
-    license_id: str
-    license_url: str
-    cache_key: str
-    container_data_path: str
-    marker: str
-    download_command: tuple[str, ...]
-
-
 # ---------------------------------------------------------------------------
 # Async Benchmark ABC (parent)
 # ---------------------------------------------------------------------------
@@ -152,14 +132,6 @@ def get_metadata(self) -> dict[str, Any]:
         """Return benchmark defaults and metadata. Optional override."""
         return {}
 
-    @classmethod
-    def data_requirements(cls) -> DataRequirement | None:
-        """Optional: declare an external dataset for ``vla-eval data fetch``.
-
-        Default ``None`` — most benchmarks bundle data in the docker image.
-        """
-        return None
-
     def cleanup(self) -> None:
         """Release benchmark resources (environments, renderers, etc.). Optional override."""

src/vla_eval/benchmarks/behavior1k/benchmark.py

Lines changed: 1 addition & 37 deletions
@@ -32,7 +32,7 @@
 import numpy as np
 from anyio.to_thread import run_sync as _run_in_thread
 
-from vla_eval.benchmarks.base import DataRequirement, StepBenchmark, StepResult
+from vla_eval.benchmarks.base import StepBenchmark, StepResult
 from vla_eval.specs import IMAGE_RGB, LANGUAGE, RAW, DimSpec
 from vla_eval.types import Action, EpisodeResult, Observation, Task
 
@@ -213,42 +213,6 @@ def __init__(
         self._current_task_name: str | None = None
         self._available_tasks: dict[str, Any] | None = None
 
-    # ------------------------------------------------------------------
-    # Data fetch
-    # ------------------------------------------------------------------
-
-    @classmethod
-    def data_requirements(cls) -> DataRequirement:
-        # The download_* helpers are idempotent (no-op when files exist);
-        # the 2025-challenge task instances are written last, so its
-        # presence implies the prior two completed.
-        download_script = (
-            "from omnigibson.utils.asset_utils import ("
-            "download_omnigibson_robot_assets, "
-            "download_behavior_1k_assets, "
-            "download_2025_challenge_task_instances); "
-            "download_omnigibson_robot_assets(); "
-            "download_behavior_1k_assets(accept_license=True); "
-            "download_2025_challenge_task_instances()"
-        )
-        return DataRequirement(
-            license_id="behavior-dataset-tos",
-            license_url="https://behavior.stanford.edu/dataset",
-            cache_key="behavior1k",
-            container_data_path="/app/BEHAVIOR-1K/datasets",
-            marker="2025-challenge-task-instances",
-            download_command=(
-                "conda",
-                "run",
-                "--no-capture-output",
-                "-n",
-                "behavior",
-                "python",
-                "-c",
-                download_script,
-            ),
-        )
-
     # ------------------------------------------------------------------
     # Lazy initialization
     # ------------------------------------------------------------------