Commit 29943b1
feat(cli): add vla-eval data fetch for external datasets
Some benchmarks ship the simulator in the docker image but expect the
dataset to come from a separate, licence-restricted source (BEHAVIOR-1K
is the first concrete consumer — its dataset is governed by the BEHAVIOR
Dataset ToS and OmniGibson assets). Previously, users had to manually
run a docker invocation that wired ``download_*`` helpers inside the
image and then edit ``configs/behavior1k_eval.yaml`` so the volume
pointed at their host download directory: two awkward steps that don't
appear anywhere a first-time reader would naturally look.

This change introduces a uniform mechanism the harness can use for any
benchmark with the same shape:

- ``Benchmark.data_requirements()`` (classmethod, base default returns
  ``None``) declares a :class:`DataRequirement`: licence id + URL, the
  in-container data path, a marker file, and the argv to run inside the
  image. ``Behavior1KBenchmark`` returns the BEHAVIOR Dataset ToS
  opt-in plus the canonical ``download_*`` helper script.
- New ``vla-eval data fetch -c <config> --accept-license <id>`` CLI:
  resolves the benchmark class, mounts a host cache directory
  read-write at the container's data path, and runs the declared
  download command. Idempotent (skips when the marker is already
  present). Symmetric with ``docker/build.sh --accept-license``, so the
  opt-in surface is the same shape across build and fetch.
- Default cache path: ``${VLA_EVAL_DATA_DIR}/<benchmark>`` if the env
  var is set, else ``${HOME}/.cache/vla-eval/<benchmark>``.
  ``--data-dir`` overrides explicitly. ``configs/behavior1k_eval.yaml``
  resolves the same expression in its ``volumes:`` line via OmegaConf,
  so a fresh clone + ``vla-eval data fetch`` + ``vla-eval run`` works
  without any per-host config edits.
- ``cli/config_loader.py`` now always runs ``OmegaConf.to_container``
  with ``resolve=True`` (previously only on configs that used
  ``extends:``), so ``${oc.env:VAR,default}`` interpolations are
  honoured uniformly. A no-op for configs without interpolations.
- ``docs/reproductions/behavior1k.md`` replaces the manual docker
  download incantation with the new ``vla-eval data fetch`` step and
  notes the auto-resolved cache path.
1 parent 407d4f8 commit 29943b1
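
The ``cli/config_loader.py`` diff is not among the files shown below,
but the behaviour it describes is small. A minimal sketch, with the
loader body assumed (only the ``load_config`` name and the
``to_container(..., resolve=True)`` call are attested by this commit):

```python
# A sketch, not the committed code: only the load_config name and the
# to_container(..., resolve=True) call are attested by this commit.
from omegaconf import OmegaConf


def load_config(path: str) -> dict:
    cfg = OmegaConf.load(path)  # parse YAML (any ``extends:`` merging omitted here)
    # resolve=True now runs unconditionally, so ${oc.env:VAR,default}
    # interpolations are expanded for every config, not just extended ones.
    return OmegaConf.to_container(cfg, resolve=True)
```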

7 files changed: 336 additions & 44 deletions

configs/behavior1k_eval.yaml

Lines changed: 12 additions & 9 deletions
@@ -1,9 +1,11 @@
 # BEHAVIOR-1K (OmniGibson / Isaac Sim) — 50-task household-activity suite.
 #
-# Before running, edit the dataset volume below to point at your local
-# BEHAVIOR-1K data directory (the one populated by the three
-# ``download_*`` calls documented in docs/reproductions/behavior1k.md).
-# An NVIDIA GPU with Vulkan + EGL is required.
+# Run ``vla-eval data fetch -c configs/behavior1k_eval.yaml
+# --accept-license behavior-dataset-tos`` once before evaluating to
+# populate the dataset cache. The default cache lives at
+# ``$VLA_EVAL_DATA_DIR/behavior1k`` (or ``~/.cache/vla-eval/behavior1k``
+# when the env var is unset); set ``VLA_EVAL_DATA_DIR`` to redirect to
+# a faster disk. An NVIDIA GPU with Vulkan + EGL is required.
 server:
   url: "ws://localhost:8000"

@@ -20,11 +22,12 @@ docker:
     # GPU" error and a segfault deep in omni.kit.xr on first launch.
     - "VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json"
   volumes:
-    # OmniGibson reads gm.DATA_PATH=/app/BEHAVIOR-1K/datasets at import time.
-    # Replace the host side with the directory holding
-    # ``omnigibson-robot-assets/``, ``behavior-1k-assets/``, and
-    # ``2025-challenge-task-instances/``.
-    - "/data/og_data:/app/BEHAVIOR-1K/datasets:ro"
+    # OmniGibson reads ``gm.DATA_PATH=/app/BEHAVIOR-1K/datasets`` at
+    # import time. The host path resolves via OmegaConf:
+    # ``${VLA_EVAL_DATA_DIR}/behavior1k`` if set, else
+    # ``${HOME}/.cache/vla-eval/behavior1k``. This is the same layout
+    # ``vla-eval data fetch`` writes to.
+    - "${oc.env:VLA_EVAL_DATA_DIR,${oc.env:HOME}/.cache/vla-eval}/behavior1k:/app/BEHAVIOR-1K/datasets:ro"

 output_dir: "./results"
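
For reference, a minimal sketch (assuming ``omegaconf>=2.1``, which
supports defaults and nested interpolations in ``oc.env``) of how that
``volumes:`` host path resolves when ``VLA_EVAL_DATA_DIR`` is unset:

```python
# Minimal sketch, not harness code: shows the nested oc.env fallback resolving.
import os

from omegaconf import OmegaConf

os.environ.pop("VLA_EVAL_DATA_DIR", None)  # simulate the override being unset
cfg = OmegaConf.create({
    "volumes": [
        "${oc.env:VLA_EVAL_DATA_DIR,${oc.env:HOME}/.cache/vla-eval}"
        "/behavior1k:/app/BEHAVIOR-1K/datasets:ro"
    ]
})
print(OmegaConf.to_container(cfg, resolve=True)["volumes"][0])
# e.g. /home/me/.cache/vla-eval/behavior1k:/app/BEHAVIOR-1K/datasets:ro
```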

docs/reproductions/behavior1k.md

Lines changed: 13 additions & 19 deletions
@@ -118,24 +118,15 @@ test set.
 # (NVIDIA Omniverse EULA — https://docs.omniverse.nvidia.com/eula/).
 docker/build.sh behavior1k --accept-license behavior1k

-# 2. Download the dataset (~35 GiB). Mount-target inside the image
-#    is /app/BEHAVIOR-1K/datasets — that's where gm.DATA_PATH points.
-mkdir -p /path/to/og_data
-docker run --rm --gpus all \
-  -e OMNI_KIT_ACCEPT_EULA=YES \
-  -v /path/to/og_data:/app/BEHAVIOR-1K/datasets \
-  --entrypoint conda \
-  ghcr.io/allenai/vla-evaluation-harness/behavior1k:latest \
-  run --no-capture-output -n behavior python -c "
-from omnigibson.utils.asset_utils import (
-    download_omnigibson_robot_assets,
-    download_behavior_1k_assets,
-    download_2025_challenge_task_instances,
-)
-download_omnigibson_robot_assets()
-download_behavior_1k_assets(accept_license=True)
-download_2025_challenge_task_instances()
-"
+# 2. Download the dataset (~35 GiB) into the harness cache. This drives
+#    the official ``download_omnigibson_robot_assets`` /
+#    ``download_behavior_1k_assets`` / ``download_2025_challenge_task_instances``
+#    helpers inside the image and accepts the BEHAVIOR Dataset ToS. The
+#    cache lives at ``$VLA_EVAL_DATA_DIR/behavior1k`` (defaults to
+#    ``~/.cache/vla-eval/behavior1k``) — set ``VLA_EVAL_DATA_DIR`` to
+#    redirect to a faster disk before running.
+uv run vla-eval data fetch -c configs/behavior1k_eval.yaml \
+  --accept-license behavior-dataset-tos

 # 3. Start the zero-action baseline server.
 uv run --script src/vla_eval/model_servers/behavior1k_baseline.py \
@@ -149,7 +140,10 @@ uv run vla-eval run -c configs/behavior1k_eval.yaml \
   --gpus 0 --yes
 ```

-Edit `configs/behavior1k_eval.yaml` `volumes` to point at your dataset path.
+The eval config picks up the cache directory automatically (the
+``volumes`` entry resolves
+``${VLA_EVAL_DATA_DIR}/behavior1k`` with a fallback to
+``${HOME}/.cache/vla-eval/behavior1k``); no per-host edits required.

 ## What Trained-VLA Reproduction Still Needs

src/vla_eval/benchmarks/base.py

Lines changed: 48 additions & 0 deletions
@@ -33,6 +33,41 @@ class StepResult:
     info: dict[str, Any]


+@dataclass(frozen=True)
+class DataRequirement:
+    """External-data requirement that can't be redistributed in the image.
+
+    Benchmarks whose dataset is licensed independently of the harness
+    (e.g. BEHAVIOR-1K's BEHAVIOR Dataset ToS) declare a
+    ``DataRequirement`` from their ``data_requirements()`` classmethod
+    so the CLI can drive a uniform fetch flow.
+
+    Fields:
+        license_id: Token a user passes to ``--accept-license`` to
+            opt in. Should be lower-kebab-case and stable
+            (e.g. ``"behavior-dataset-tos"``).
+        license_url: Where to read the licence terms.
+        container_data_path: Path inside the docker image where the
+            data must be mounted. Used as the mount target for both
+            ``vla-eval data fetch`` (read-write) and ``vla-eval run``
+            (read-only).
+        marker: Path relative to the *host* data directory that, once
+            present, signals the dataset is fetched. Used for the
+            "already-fetched" short-circuit. Pick something that the
+            download command produces last (a final asset directory,
+            for instance).
+        download_command: argv that the docker container will run, with
+            ``container_data_path`` mounted read-write, to populate
+            the dataset.
+    """
+
+    license_id: str
+    license_url: str
+    container_data_path: str
+    marker: str
+    download_command: tuple[str, ...]
+
+
 # ---------------------------------------------------------------------------
 # Async Benchmark ABC (parent)
 # ---------------------------------------------------------------------------
@@ -132,6 +167,19 @@ def get_metadata(self) -> dict[str, Any]:
         """Return benchmark defaults and metadata. Optional override."""
         return {}

+    @classmethod
+    def data_requirements(cls) -> DataRequirement | None:
+        """Declare an external-data dependency that the harness can fetch.
+
+        Most benchmarks bundle their data inside the docker image and
+        return ``None`` (the default). Benchmarks whose dataset is
+        licensed independently of the harness (e.g. BEHAVIOR-1K)
+        return a populated :class:`DataRequirement` so
+        ``vla-eval data fetch -c <config>`` can drive a uniform
+        download flow.
+        """
+        return None
+
     def cleanup(self) -> None:
         """Release benchmark resources (environments, renderers, etc.). Optional override."""
src/vla_eval/benchmarks/behavior1k/benchmark.py

Lines changed: 43 additions & 1 deletion
@@ -32,7 +32,7 @@
 import numpy as np
 from anyio.to_thread import run_sync as _run_in_thread

-from vla_eval.benchmarks.base import StepBenchmark, StepResult
+from vla_eval.benchmarks.base import DataRequirement, StepBenchmark, StepResult
 from vla_eval.specs import IMAGE_RGB, LANGUAGE, RAW, DimSpec
 from vla_eval.types import Action, EpisodeResult, Observation, Task

@@ -213,6 +213,48 @@ def __init__(
         self._current_task_name: str | None = None
         self._available_tasks: dict[str, Any] | None = None

+    # ------------------------------------------------------------------
+    # Data fetch
+    # ------------------------------------------------------------------
+
+    @classmethod
+    def data_requirements(cls) -> DataRequirement:
+        """Declare the BEHAVIOR Dataset / OmniGibson-asset download.
+
+        These are the three canonical helpers the upstream
+        ``OmniGibson`` README points users at; they're idempotent
+        (skip when files already exist) so re-running ``data fetch``
+        on a populated directory is cheap.
+        """
+        download_script = (
+            "from omnigibson.utils.asset_utils import ("
+            "download_omnigibson_robot_assets, "
+            "download_behavior_1k_assets, "
+            "download_2025_challenge_task_instances); "
+            "download_omnigibson_robot_assets(); "
+            "download_behavior_1k_assets(accept_license=True); "
+            "download_2025_challenge_task_instances()"
+        )
+        return DataRequirement(
+            license_id="behavior-dataset-tos",
+            license_url="https://behavior.stanford.edu/dataset",
+            container_data_path="/app/BEHAVIOR-1K/datasets",
+            # The 2025-challenge task instances are downloaded last,
+            # so the directory's presence implies the prior two
+            # download_* calls also completed.
+            marker="2025-challenge-task-instances",
+            download_command=(
+                "conda",
+                "run",
+                "--no-capture-output",
+                "-n",
+                "behavior",
+                "python",
+                "-c",
+                download_script,
+            ),
+        )
+
     # ------------------------------------------------------------------
     # Lazy initialization
     # ------------------------------------------------------------------
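
Given this declaration, the fetch command reduces to a plain
``docker run`` (see ``_build_docker_argv`` in ``cmd_data.py`` below). A
sketch of the resulting argv, with an illustrative host cache path and
assuming the config's ``docker.env`` carries ``OMNI_KIT_ACCEPT_EULA=YES``
as the old manual command did:

```python
# Illustrative argv only: the real list is assembled by _build_docker_argv in
# src/vla_eval/cli/cmd_data.py. Host path and env pair are examples, not
# read from this diff's config.
argv = [
    "docker", "run", "--rm",
    "--gpus", "all",                   # --gpus flag, else docker.gpus, else "all"
    "-e", "OMNI_KIT_ACCEPT_EULA=YES",  # forwarded from the config's docker.env
    "-v", "/home/me/.cache/vla-eval/behavior1k:/app/BEHAVIOR-1K/datasets",
    "ghcr.io/allenai/vla-evaluation-harness/behavior1k:latest",
    "conda", "run", "--no-capture-output", "-n", "behavior",
    "python", "-c", "<download_script above>",
]
```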

src/vla_eval/cli/cmd_data.py

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
+"""``vla-eval data`` subcommand handlers.
+
+Provides a uniform fetch flow for benchmarks whose dataset is licensed
+independently of the harness (e.g. BEHAVIOR-1K's BEHAVIOR Dataset
+ToS). See :class:`vla_eval.benchmarks.base.DataRequirement` and
+:meth:`vla_eval.benchmarks.base.Benchmark.data_requirements`.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import shutil
+import subprocess
+import sys
+from pathlib import Path
+
+from vla_eval.benchmarks.base import Benchmark, DataRequirement
+from vla_eval.cli.config_loader import load_config as _load_config
+from vla_eval.config import DockerConfig
+from vla_eval.registry import resolve_import_string
+
+
+def _stderr_console():  # pragma: no cover — same shim cmd_run uses
+    from rich.console import Console
+
+    return Console(stderr=True, soft_wrap=True)
+
+
+def _resolve_benchmark_class(config: dict) -> tuple[type[Benchmark], str]:
+    """Return ``(class, cache_subdir)`` for the first benchmark in config.
+
+    ``cache_subdir`` is the module-path's last package segment, e.g.
+    ``vla_eval.benchmarks.behavior1k.benchmark:X`` → ``behavior1k``.
+    """
+    benchmarks = config.get("benchmarks") or []
+    if not benchmarks:
+        raise ValueError("config has no 'benchmarks' entries")
+    import_string = benchmarks[0].get("benchmark")
+    if not import_string:
+        raise ValueError("first benchmark entry is missing 'benchmark' import string")
+    cls = resolve_import_string(import_string)
+    if not (isinstance(cls, type) and issubclass(cls, Benchmark)):
+        raise TypeError(f"resolved {import_string} to {cls!r}, which is not a Benchmark subclass")
+    module_path = import_string.split(":", 1)[0]
+    parts = module_path.split(".")
+    # Expect …benchmarks.<key>.benchmark — take the second-to-last part.
+    cache_subdir = parts[-2] if len(parts) >= 2 else parts[-1]
+    return cls, cache_subdir
+
+
+def _default_host_data_dir(cache_subdir: str) -> Path:
+    """Return ``${VLA_EVAL_DATA_DIR}/<cache_subdir>`` or the XDG-style default."""
+    base = os.environ.get("VLA_EVAL_DATA_DIR")
+    if base:
+        return Path(base).expanduser() / cache_subdir
+    return Path.home() / ".cache" / "vla-eval" / cache_subdir
+
+
+def _build_docker_argv(
+    image: str,
+    docker_cfg: DockerConfig,
+    host_dir: Path,
+    requirement: DataRequirement,
+    extra_gpus: str | None,
+) -> list[str]:
+    """Build the ``docker run`` argv that downloads the dataset."""
+    argv: list[str] = ["docker", "run", "--rm"]
+    gpus = extra_gpus or docker_cfg.gpus or "all"
+    argv.extend(["--gpus", gpus])
+    for env_pair in docker_cfg.env:
+        argv.extend(["-e", env_pair])
+    # Mounted read-write (docker's default) so the download can populate it.
+    argv.extend(["-v", f"{host_dir}:{requirement.container_data_path}"])
+    argv.append(image)
+    argv.extend(requirement.download_command)
+    return argv
+
+
+def cmd_data_fetch(args: argparse.Namespace) -> None:
+    """Fetch the external dataset for a benchmark, mounted at the
+    canonical host cache directory."""
+    con = _stderr_console()
+    config = _load_config(args.config)
+
+    try:
+        bench_cls, cache_subdir = _resolve_benchmark_class(config)
+    except (TypeError, ValueError) as exc:
+        con.print(f"[red]ERROR: {exc}[/red]")
+        sys.exit(1)
+
+    requirement = bench_cls.data_requirements()
+    if requirement is None:
+        con.print(f"[yellow]{bench_cls.__name__} declares no external data requirement; nothing to fetch.[/yellow]")
+        return
+
+    accepted = set(args.accept_license or [])
+    if requirement.license_id not in accepted:
+        con.print(
+            f"[red]ERROR: this dataset requires accepting licence '{requirement.license_id}'.[/red]\n"
+            f"  Read:   {requirement.license_url}\n"
+            f"  Re-run: vla-eval data fetch -c {args.config} --accept-license {requirement.license_id}"
+        )
+        sys.exit(1)
+
+    host_dir = Path(args.data_dir).expanduser().resolve() if args.data_dir else _default_host_data_dir(cache_subdir)
+    host_dir.mkdir(parents=True, exist_ok=True)
+
+    marker = host_dir / requirement.marker
+    if marker.exists() and not args.force:
+        con.print(
+            f"[green]Data already present at {host_dir} (marker: {requirement.marker}). "
+            "Use --force to refetch.[/green]"
+        )
+        return
+
+    docker_cfg = DockerConfig.from_dict(config.get("docker"))
+    if not docker_cfg.image:
+        con.print("[red]ERROR: 'docker.image' must be set in the config to fetch data[/red]")
+        sys.exit(1)
+    if shutil.which("docker") is None:
+        con.print("[red]ERROR: 'docker' not found on PATH[/red]")
+        sys.exit(1)
+
+    argv = _build_docker_argv(
+        docker_cfg.image,
+        docker_cfg,
+        host_dir,
+        requirement,
+        extra_gpus=getattr(args, "gpus", None),
+    )
+
+    con.print(f"[bold]Fetching data → {host_dir}[/bold]")
+    con.print(f"  image: {docker_cfg.image}")
+    con.print(f"  mount: {host_dir}:{requirement.container_data_path}")
+    if args.dry_run:
+        con.print("  [yellow]--dry-run[/yellow]: would run:")
+        con.print(f"    {' '.join(argv)}")
+        return
+
+    completed = subprocess.run(argv, check=False)
+    if completed.returncode != 0:
+        con.print(f"[red]ERROR: docker run exited with {completed.returncode}[/red]")
+        sys.exit(completed.returncode)
+    con.print(f"[green]Done. Dataset available at {host_dir}.[/green]")
+
+
+def register(subparsers: argparse._SubParsersAction) -> None:
+    """Wire ``data fetch`` into the top-level ``vla-eval`` parser."""
+    data_parser = subparsers.add_parser(
+        "data",
+        help="Manage external benchmark datasets",
+        description=(
+            "Fetch external datasets that aren't redistributable in the docker image. "
+            "Each benchmark's data requirements are declared in its Benchmark class via "
+            "data_requirements(); see vla_eval.benchmarks.base.DataRequirement."
+        ),
+    )
+    data_sub = data_parser.add_subparsers(dest="data_command", required=True)
+
+    fetch_parser = data_sub.add_parser(
+        "fetch",
+        help="Download a benchmark's external data into the local cache",
+        description=(
+            "Resolves the benchmark class from the config, then runs its download command "
+            "inside the benchmark's docker image with the host cache mounted "
+            "read-write at the container's data path. Idempotent: skips if the "
+            "marker file already exists."
+        ),
+    )
+    fetch_parser.add_argument("--config", "-c", required=True, help="Path to a benchmark eval config YAML.")
+    fetch_parser.add_argument(
+        "--accept-license",
+        action="append",
+        default=[],
+        metavar="ID",
+        help="License ID to opt into (e.g. 'behavior-dataset-tos'). Repeatable.",
+    )
+    fetch_parser.add_argument(
+        "--data-dir",
+        default=None,
+        help="Override host data directory. Defaults to "
+        "${VLA_EVAL_DATA_DIR}/<benchmark> or ~/.cache/vla-eval/<benchmark>.",
+    )
+    fetch_parser.add_argument(
+        "--gpus",
+        default=None,
+        help="GPU devices for the fetch container (e.g. '0,1'). Defaults to docker.gpus or 'all'.",
+    )
+    fetch_parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Re-run the download even if the marker file is already present.",
+    )
+    fetch_parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Print the docker command that would run and exit.",
+    )
+    fetch_parser.set_defaults(func=cmd_data_fetch)
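
The top-level parser wiring is not part of the diff shown here; a
minimal sketch of how ``register()`` would plug into a standard
``args.func`` dispatch loop (entry-point shape assumed):

```python
# Hypothetical entry point: the real vla-eval main() is not in this diff.
import argparse

from vla_eval.cli import cmd_data  # the module added by this commit


def main() -> None:
    parser = argparse.ArgumentParser(prog="vla-eval")
    subparsers = parser.add_subparsers(dest="command", required=True)
    cmd_data.register(subparsers)  # wires `vla-eval data fetch ...`
    # ...other subcommands (run, etc.) would register here the same way...
    args = parser.parse_args()
    args.func(args)  # each subparser sets func via set_defaults(...)


if __name__ == "__main__":
    main()
```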
