
Commit b52f83f

docs: reflow comments and docstrings to project line-length
Sweep across the surface PR #58 touches: comments and docstrings that were wrapping at ~62-72 chars get reflowed to fit the project's 119 line-length, and a small set of pure-narration banner comments are removed.

Reflowed (no information loss, just wider lines):

- ``benchmarks/behavior1k/benchmark.py`` — module docstring, class docstring + ``task_instance_id`` Args block, ``_load_task_instance`` docstring, signal-handler / cleanup / async-bridge / start_episode comment blocks.
- ``model_servers/behavior1k_demo_replay.py`` — module docstring, parquet ``columns=`` rationale, per-(session, episode) cursor note.
- ``model_servers/behavior1k_baseline.py`` — module docstring.
- ``model_servers/vlanext.py`` — ``_ensure_vlanext`` docstring, shallow-clone rationale, class docstring.
- ``model_servers/mme_vla.py`` — module-level shallow-clone rationale, class docstring + Args block.
- ``docker_resources.py`` — ``tty_docker_flags`` summary, the OMP/MKL block.

Removed (clearly noise, not navigation aids):

- ``cli/main.py`` — the `# CLI override for ...` / `# Validate shard args` / `# Resource allocation` / `# Decide whether to run via Docker` banners are kept (they aid navigation in long ``cmd_run``); the truly redundant ones — `# Extra volumes from config`, `# Extra env vars`, `# 1. CWD (running from repo root)` — go.
- ``cli/_docker.py`` — the module docstring's stale ``cmd_data`` reference.
1 parent 6f9cb37 commit b52f83f

8 files changed

Lines changed: 116 additions & 159 deletions

src/vla_eval/benchmarks/behavior1k/benchmark.py

Lines changed: 64 additions & 91 deletions
@@ -1,9 +1,8 @@
 """BEHAVIOR-1K benchmark implementation.
 
-BEHAVIOR-1K is a long-horizon household-activity benchmark built on
-OmniGibson (NVIDIA Isaac Sim). The 2025 BEHAVIOR Challenge defines a
-50-task evaluation suite (B10/B20/B30/B40/B50) using the R1Pro
-mobile-manipulation robot.
+BEHAVIOR-1K is a long-horizon household-activity benchmark built on OmniGibson (NVIDIA Isaac Sim).
+The 2025 BEHAVIOR Challenge defines a 50-task evaluation suite (B10/B20/B30/B40/B50) using the
+R1Pro mobile-manipulation robot.
 
 References:
   - https://behavior.stanford.edu
@@ -16,9 +15,8 @@
   base[0:3], torso[3:7], left_arm[7:14], left_gripper[14:15],
   right_arm[15:22], right_gripper[22:23].
 - Cameras: head 720x720, left_wrist 480x480, right_wrist 480x480.
-- Success: ``info["done"]["success"]`` (binary); the challenge
-  separately reports a partial Q-score, but we only surface the
-  binary flag here — partial scoring lives in the official
+- Success: ``info["done"]["success"]`` (binary); the challenge separately reports a partial
+  Q-score, but we only surface the binary flag here — partial scoring lives in the official
   ``score_utils.compute_final_q_score``.
 - Max steps default: 5000 (or 2× human demo length when known).
 """
@@ -126,54 +124,43 @@ class Behavior1KBenchmark(StepBenchmark):
     """BEHAVIOR-1K (OmniGibson) household-activity benchmark.
 
     Non-obvious behaviors:
-    - **Heavy lazy imports**: ``omnigibson`` and Isaac Sim are imported
-      inside ``_init_og()`` rather than at module top. Importing
-      OmniGibson boots the Isaac Sim runtime and consumes several
-      gigabytes of VRAM, so we delay it until ``get_tasks()`` /
-      ``reset()`` actually need it. This also keeps
-      ``vla-eval test --validate`` (a pure import-string check) fast.
-    - **Action format**: ``env.step()`` expects a ``torch.Tensor``,
-      not numpy. We convert in ``step()``.
+    - **Heavy lazy imports**: ``omnigibson`` and Isaac Sim are imported inside ``_init_og()``
+      rather than at module top. Importing OmniGibson boots the Isaac Sim runtime and consumes
+      several gigabytes of VRAM, so we delay until ``get_tasks()`` / ``reset()`` actually need
+      it. Also keeps ``vla-eval test --validate`` (a pure import-string check) fast.
+    - **Action format**: ``env.step()`` expects a ``torch.Tensor``, not numpy. Converted in
+      ``step()``.
     - **Observation flattening**: OmniGibson's nested observation
-      (``obs["robot_r1"]["sensors"]["zed"]["rgb"]``) is flattened with
-      a ``::`` delimiter via the official ``flatten_obs_dict`` helper.
-      We then look up cameras by their canonical sensor key.
-    - **Task description**: BehaviorTask does not expose a natural
-      language instruction; we use the snake-case task name with
-      underscores replaced by spaces, matching common VLA practice.
-    - **Single robot supported**: R1Pro only (the BEHAVIOR Challenge
-      2025 standard track). A1 is reachable through OmniGibson but
-      not exercised here.
+      (``obs["robot_r1"]["sensors"]["zed"]["rgb"]``) is flattened with a ``::`` delimiter via
+      the official ``flatten_obs_dict`` helper. We then look up cameras by their canonical
+      sensor key.
+    - **Task description**: BehaviorTask does not expose a natural language instruction; we use
+      the snake-case task name with underscores replaced by spaces, matching common VLA practice.
+    - **Single robot supported**: R1Pro only (the BEHAVIOR Challenge 2025 standard track). A1
+      is reachable through OmniGibson but not exercised here.
 
     Args:
         tasks: Subset of B50 task names to evaluate. ``None`` runs all 50.
-        partial_scene_load: Pass through to OmniGibson — load only rooms
-            relevant to the task to speed up scene construction.
-        max_steps: Per-episode step cap. ``None`` keeps OmniGibson's
-            default (5000 in ``generate_basic_environment_config``).
-        send_proprio: Include the R1Pro proprio vector
-            (``robot_r1::proprio``, 256-D) in observations.
-        camera_names: Which cameras to forward to the model server.
-            Defaults to all three (``head``, ``left_wrist``, ``right_wrist``).
-        env_wrapper_target: Hydra ``_target_`` for the env wrapper. By
-            default we use OmniGibson's ``EnvironmentWrapper`` no-op
-            wrapper; override to plug in challenge-specific behaviour.
-        task_instance_id: Per-instance TRO state(s) to load after
-            ``env.reset()``, mirroring the official
-            ``Evaluator.load_task_instance``. Without this the env
-            starts from BehaviorTask's default instance (idx 0); with
-            it set, the cached
-            ``<scene>_task_<activity>_instances/<...>-tro_state.json``
-            is applied so the initial object placement matches the
-            recorded demos. Required for demo-replay reproductions.
+        partial_scene_load: Pass through to OmniGibson — load only rooms relevant to the task to
+            speed up scene construction.
+        max_steps: Per-episode step cap. ``None`` keeps OmniGibson's default (5000 in
+            ``generate_basic_environment_config``).
+        send_proprio: Include the R1Pro proprio vector (``robot_r1::proprio``, 256-D) in observations.
+        camera_names: Which cameras to forward to the model server. Defaults to all three
+            (``head``, ``left_wrist``, ``right_wrist``).
+        env_wrapper_target: Hydra ``_target_`` for the env wrapper. By default we use OmniGibson's
+            ``EnvironmentWrapper`` no-op wrapper; override to plug in challenge-specific behaviour.
+        task_instance_id: Per-instance TRO state(s) to load after ``env.reset()``, mirroring the
+            official ``Evaluator.load_task_instance``. Without this the env starts from
+            BehaviorTask's default instance (idx 0); with it set, the cached
+            ``<scene>_task_<activity>_instances/<...>-tro_state.json`` is applied so the initial
+            object placement matches the recorded demos. Required for demo-replay reproductions.
 
             Accepts:
-            - ``None`` — use BehaviorTask's default instance every
-              episode (no TRO state load).
+            - ``None`` — use BehaviorTask's default instance every episode (no TRO state load).
             - ``int`` — fix the same instance for every episode.
-            - ``list[int]`` — sweep instances; episode ``i`` uses
-              ``ids[i % len(ids)]``. Use this to reproduce the
-              challenge protocol (50 tasks × 10 instances).
+            - ``list[int]`` — sweep instances; episode ``i`` uses ``ids[i % len(ids)]``. Use
+              this to reproduce the challenge protocol (50 tasks × 10 instances).
     """
 
     def __init__(
@@ -200,8 +187,7 @@ def __init__(
         if unknown_cams:
             raise ValueError(f"Unknown R1Pro cameras: {unknown_cams}. Valid: {list(R1PRO_CAMERAS)}")
         self._env_wrapper_target = env_wrapper_target
-        # Normalize int|list|None to list[int]|None so the reset() path
-        # can index by ``episode_idx`` uniformly.
+        # Normalize int|list|None to list[int]|None so reset() can index by ``episode_idx`` uniformly.
        if task_instance_id is None:
            self._task_instance_ids: list[int] | None = None
        elif isinstance(task_instance_id, int):
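The normalization in ``__init__`` plus the modulo sweep in ``reset()`` can be sketched as standalone helpers (hypothetical names; the project stores the result on ``self._task_instance_ids`` instead):

```python
def normalize_instance_ids(task_instance_id):
    """Normalize int | list | None into list[int] | None, as the __init__ hunk above does."""
    if task_instance_id is None:
        return None
    if isinstance(task_instance_id, int):
        return [task_instance_id]
    return list(task_instance_id)


def instance_for_episode(ids, episode_idx):
    """Episode i uses ids[i % len(ids)], so a list of ids sweeps round-robin across episodes."""
    if ids is None:
        return None  # BehaviorTask's default instance (idx 0), no TRO state load
    return ids[episode_idx % len(ids)]
```

With ``ids = [2, 5, 9]``, episodes 0..3 pick instances 2, 5, 9, 2 — the 50-task × 10-instance challenge protocol is this sweep with ten ids per task.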
@@ -272,12 +258,10 @@ def _ensure_assets(self, data_path: Path) -> None:
 
     def _make_env(self, task_name: str) -> Any:
         """Build a fresh OmniGibson env for *task_name*."""
-        # Isaac Sim's SimulationApp.__init__ calls signal.signal(SIGINT, ...)
-        # which raises ValueError when invoked from a non-main thread —
-        # but we *must* off-load env construction to a worker so the
-        # orchestrator's asyncio loop survives. The handler installed
-        # at our main-thread import of omnigibson is already in place,
-        # so it's safe to no-op the additional registration here.
+        # Isaac Sim's SimulationApp.__init__ calls signal.signal(SIGINT, ...) which raises ValueError
+        # when invoked from a non-main thread — but we *must* off-load env construction to a worker
+        # so the orchestrator's asyncio loop survives. The handler installed at our main-thread
+        # import of omnigibson is already in place, so it's safe to no-op the additional registration.
         import signal as _signal
         import threading
 
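The workaround the comment describes — making ``signal.signal`` a no-op off the main thread instead of a ``ValueError`` — can be sketched as a monkey-patch (illustrative only; the helper name is not from the project, and the real code scopes the patch to env construction):

```python
import signal
import threading


def install_thread_safe_signal():
    """Patch signal.signal so re-registration from a worker thread is silently skipped.

    Python's signal module only allows setting handlers from the main thread; a library
    that registers SIGINT during construction (as Isaac Sim's SimulationApp does) would
    otherwise raise ValueError when that construction is off-loaded to a worker.
    """
    original = signal.signal

    def _safe_signal(signum, handler):
        if threading.current_thread() is not threading.main_thread():
            return None  # handler installed at main-thread import time is already in place
        return original(signum, handler)

    signal.signal = _safe_signal
    return original  # keep a handle so the patch can be undone
```

Without the patch, the same ``signal.signal`` call from a worker thread raises ``ValueError``.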
@@ -307,8 +291,7 @@ def _make_env_inner(self, task_name: str) -> Any:
             generate_basic_environment_config,
         )
 
-        # The official eval disables a curated set of transition rules to
-        # match the data-collection setup.
+        # The official eval disables a curated set of transition rules to match the data-collection setup.
         for rule in DISABLED_TRANSITION_RULES:
             rule.ENABLED = False
 
@@ -355,13 +338,11 @@ def reset(self, task: Task) -> Any:
         self._env = self._make_env(task_name)
         self._current_task_name = task_name
         obs, _ = self._env.reset()
-        # Optional per-instance TRO state load (matches official
-        # ``Evaluator.load_task_instance``). When unset, BehaviorTask
-        # uses its default instance (idx 0) — the env still runs, but
-        # object placements may diverge from a particular demo.
-        # When a list is provided, sweep instances by ``episode_idx``
-        # so consecutive episodes hit different recorded states (the
-        # 50 task × 10 instance challenge protocol).
+        # Optional per-instance TRO state load (matches official ``Evaluator.load_task_instance``).
+        # When unset, BehaviorTask uses its default instance (idx 0) — the env still runs, but object
+        # placements may diverge from a particular demo. When a list is provided, sweep instances by
+        # ``episode_idx`` so consecutive episodes hit different recorded states (the 50-task ×
+        # 10-instance challenge protocol).
         if self._task_instance_ids is not None:
             episode_idx = int(task.get("episode_idx", 0))
             instance_id = self._task_instance_ids[episode_idx % len(self._task_instance_ids)]
@@ -371,13 +352,12 @@ def reset(self, task: Task) -> Any:
 
     def _load_task_instance(self, instance_id: int) -> Any:
         """Apply per-instance object/robot state JSON, then re-fetch obs.
 
-        Ports the v3.7.2 ``Evaluator.load_task_instance`` (public-test
-        branch). Reads
+        Ports the v3.7.2 ``Evaluator.load_task_instance`` (public-test branch). Reads
         ``<get_task_instance_path(scene)>/json/<scene>_task_<activity>_instances/<...>-tro_state.json``
         and pushes the recorded object/robot state into the running env.
 
-        Compatible only with the v3.7.2 OmniGibson API: uses
-        ``robot.model_name``, ``entity.is_system`` / ``entity.exists``.
+        Compatible only with the v3.7.2 OmniGibson API: uses ``robot.model_name``,
+        ``entity.is_system`` / ``entity.exists``.
         """
         import json
         import os
@@ -429,8 +409,8 @@ def _load_task_instance(self, instance_id: int) -> Any:
         env.scene.update_initial_file()
         env.scene.reset()
 
-        # Re-fetch the observation after the state load so the model
-        # server sees the post-load images / proprio.
+        # Re-fetch the observation after the state load so the model server sees the post-load
+        # images / proprio.
         obs, _ = env.get_obs()
         return obs
 
@@ -505,31 +485,24 @@ def cleanup(self) -> None:
         except Exception:
             logger.exception("BEHAVIOR-1K env close failed")
         self._env = None
-        # Intentionally NOT calling ``omnigibson.shutdown()`` here:
-        # Isaac Sim's shutdown path can hang for many minutes
-        # (waiting on hydra texture cleanup, render contexts, etc.)
-        # which prevents the orchestrator from writing the result JSON
-        # at the end of the run. Process exit reclaims everything;
-        # leaving Isaac Sim alone is the lesser evil.
+        # Intentionally NOT calling ``omnigibson.shutdown()`` here: Isaac Sim's shutdown path can hang
+        # for many minutes (waiting on hydra texture cleanup, render contexts, etc.) which prevents
+        # the orchestrator from writing the result JSON at the end of the run. Process exit reclaims
+        # everything; leaving Isaac Sim alone is the lesser evil.
 
-    # ------------------------------------------------------------------
-    # Async bridge override: run sync reset()/step() on a worker thread.
-    # Booting Isaac Sim from the orchestrator's main thread tears down
-    # the running asyncio event loop (SimulationApp installs its own),
-    # which makes the next `await conn.act(...)` raise NoEventLoopError.
-    # Off-loading to ``anyio.to_thread.run_sync`` keeps the orchestrator
-    # loop intact while Isaac Sim does its synchronous work.
-    # ------------------------------------------------------------------
+    # Async bridge override: run sync reset()/step() on a worker thread. Booting Isaac Sim from the
+    # orchestrator's main thread tears down the running asyncio event loop (SimulationApp installs
+    # its own), which makes the next ``await conn.act(...)`` raise NoEventLoopError. Off-loading
+    # to ``anyio.to_thread.run_sync`` keeps the orchestrator loop intact while Isaac Sim does its
+    # synchronous work.
 
     async def start_episode(self, task: Task) -> None:
         self._t0 = time.monotonic()
         self._task = task
-        # Run imports + signal-handler registration on the main thread
-        # (Python's signal module forbids setting handlers from a worker
-        # thread, and OmniGibson registers SIGINT during its top-level
-        # ``__init__.py``). Only the env construction / reset itself is
-        # offloaded to the worker thread, which is what actually trashes
-        # the asyncio event loop.
+        # Run imports + signal-handler registration on the main thread (Python's signal module forbids
+        # setting handlers from a worker thread, and OmniGibson registers SIGINT during its top-level
+        # ``__init__.py``). Only the env construction / reset itself is offloaded to the worker
+        # thread, which is what actually trashes the asyncio event loop.
         self._init_og()
         raw_obs = await _run_in_thread(self.reset, task)
         self._last_result = StepResult(obs=raw_obs, reward=0.0, done=False, info={})
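The async-bridge pattern in the comments above — run the blocking env work on a worker thread so the orchestrator's event loop survives — looks roughly like this with stdlib ``asyncio.to_thread`` (the project uses ``anyio.to_thread.run_sync``; this is a simplified stand-in, and ``blocking_reset`` is a fake):

```python
import asyncio
import time


def blocking_reset():
    """Stand-in for a synchronous, slow env.reset() (e.g. Isaac Sim booting a scene)."""
    time.sleep(0.05)
    return {"rgb": "frame-0"}


async def start_episode():
    # Off-load the blocking call to a worker thread. The event loop keeps running,
    # so concurrent awaits (heartbeats, model-server RPCs) still make progress.
    obs = await asyncio.to_thread(blocking_reset)
    return obs


obs = asyncio.run(start_episode())  # obs == {"rgb": "frame-0"}
```

The inverse approach — calling the blocking function directly inside the coroutine — would stall every other task on the loop for the duration of the reset.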

src/vla_eval/cli/_docker.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-"""Docker subprocess helpers shared by ``cmd_run`` and ``cmd_data``."""
+"""Docker subprocess helpers."""
 
 from __future__ import annotations
 
src/vla_eval/cli/main.py

Lines changed: 6 additions & 12 deletions
@@ -34,7 +34,6 @@ def _setup_logging(verbose: bool = False) -> None:
 
 
 def _inside_docker() -> bool:
-    """Check if we are already running inside a Docker container."""
     return Path("/.dockerenv").exists()
 
 
@@ -86,11 +85,10 @@ def _handle_signal(signum: int, _frame: object) -> None:
 
 def _resolve_dev_src() -> Path:
     """Find the host ``src/`` directory for ``--dev`` bind-mount."""
-    # 1. CWD (running from repo root)
     cwd_src = Path.cwd() / "src"
     if (cwd_src / "vla_eval").is_dir():
         return cwd_src.resolve()
-    # 2. Editable install: __file__ lives under src/vla_eval/
+    # Editable install: ``vla_eval.__file__`` lives under ``src/vla_eval/``.
     import vla_eval
 
     pkg_parent = Path(vla_eval.__file__).resolve().parent.parent
@@ -132,8 +130,7 @@ def _run_via_docker(
     results_dir = str(Path(config.get("output_dir", "./results")).resolve())
     Path(results_dir).mkdir(parents=True, exist_ok=True)
 
-    # Rewrite config for Docker: output_dir must point to the container-side mount,
-    # not the host absolute path which doesn't exist inside the container.
+    # output_dir must point to the container mount; the host absolute path doesn't exist inside.
     import tempfile
 
     docker_config = dict(config)
@@ -160,25 +157,22 @@
     ]
     # fmt: on
 
-    # Attach stdin (and optionally a TTY) so licence prompts inside the container can reach the host.
+    # Forward stdin/TTY for in-container licence prompts.
     cmd.extend(tty_docker_flags())
 
-    # Dev mode: mount host src/ into container (requires editable install in image)
+    # Dev mode: mount host src/ into container (requires editable install in image).
     if dev:
         src_dir = _resolve_dev_src()
         cmd.extend(["-v", f"{src_dir}:/workspace/src"])
         logger.info("Dev mode: mounting %s -> /workspace/src", src_dir)
 
-    # Extra volumes from config
+    # Extra volumes / env vars from config
     for vol in docker_cfg.volumes:
         cmd.extend(["-v", vol])
-
-    # Extra env vars
     for env_str in docker_cfg.env:
         cmd.extend(["-e", env_str])
 
-    # Forward licence acceptance into the container so benchmarks calling
-    # ``vla_eval.dirs.ensure_license`` can skip the stdin prompt.
+    # Forward licence acceptance into the container so ``ensure_license`` can skip the prompt.
     if accept_license:
         cmd.extend(["-e", f"VLA_EVAL_ACCEPTED_LICENSES={','.join(accept_license)}"])

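The flag-building loops in the hunk above follow a common pattern: fold config lists into repeated ``-v``/``-e`` flags on the ``docker run`` command line. A hedged sketch (``DockerCfg`` and ``extend_docker_cmd`` are stand-ins, not the project's real classes):

```python
from dataclasses import dataclass, field


@dataclass
class DockerCfg:  # hypothetical stand-in for the project's docker config object
    volumes: list = field(default_factory=list)  # "host:container" mount strings
    env: list = field(default_factory=list)      # "KEY=value" strings


def extend_docker_cmd(cmd, cfg, accept_license=None):
    """Append one -v per volume and one -e per env var, plus the licence env var."""
    for vol in cfg.volumes:
        cmd.extend(["-v", vol])
    for env_str in cfg.env:
        cmd.extend(["-e", env_str])
    if accept_license:
        cmd.extend(["-e", f"VLA_EVAL_ACCEPTED_LICENSES={','.join(accept_license)}"])
    return cmd
```

Docker requires one flag per mount/variable, which is why these are loops rather than a single joined argument.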
src/vla_eval/docker_resources.py

Lines changed: 8 additions & 10 deletions
@@ -79,10 +79,10 @@ def gpu_docker_flag(spec: str | None) -> list[str]:
 
 
 def tty_docker_flags() -> list[str]:
-    """``-i``/``-t`` flags so the in-container process can read the host's terminal.
+    """``-i`` / ``-t`` flags so an in-container process can read the host's terminal.
 
     Both attached when stdin and stdout are TTYs; ``-i`` only when just stdin is; nothing otherwise.
-    Lets ``ensure_license``-style stdin prompts reach the user without breaking CI/sharded runs.
+    Lets ``ensure_license``-style stdin prompts reach the user without breaking CI / sharded runs.
     """
     import sys
 
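The decision table in that docstring — ``-i -t`` when both stdin and stdout are TTYs, ``-i`` when only stdin is, nothing otherwise — reduces to a few lines. A sketch with the TTY checks injected as parameters for testability (the real function probes ``sys.stdin.isatty()`` / ``sys.stdout.isatty()`` itself):

```python
def tty_flags(stdin_tty: bool, stdout_tty: bool) -> list:
    """Mirror the docstring's table: both TTYs -> -i -t; stdin only -> -i; else nothing."""
    if stdin_tty and stdout_tty:
        return ["-i", "-t"]
    if stdin_tty:
        return ["-i"]
    return []  # CI / sharded runs: no TTY attached, so no flags and no prompt hangs
```

Skipping ``-t`` when stdout is not a terminal matters because ``docker run -t`` fails or garbles output when there is no TTY to allocate.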
@@ -128,14 +128,12 @@ def shard_docker_flags(
     shard_cpus = cpu_ids[start_idx : start_idx + per_shard]
     flags.extend(["--cpuset-cpus", _format_cpuset(shard_cpus)])
 
-    # OpenMP/MKL: force single-threaded to avoid cross-container contention.
-    # Some benchmark images (e.g. CALVIN) ship CPU-only PyTorch that runs
-    # per-step tensor ops (torchvision transforms, tensor creation). Without
-    # this cap each container spawns one OpenMP thread per visible core,
-    # causing massive context-switch overhead when multiple shards share a
-    # host (e.g. 8 shards × 48 threads = 384 threads on 48 cores → no
-    # scaling). Single-image transforms see no benefit from multi-threaded
-    # BLAS/OpenMP, so OMP_NUM_THREADS=1 is always safe here.
+    # OpenMP/MKL: force single-threaded to avoid cross-container contention. Some benchmark images
+    # (e.g. CALVIN) ship CPU-only PyTorch that runs per-step tensor ops (torchvision transforms, tensor
+    # creation). Without this cap each container spawns one OpenMP thread per visible core, causing
+    # massive context-switch overhead when multiple shards share a host (e.g. 8 shards × 48 threads =
+    # 384 threads on 48 cores → no scaling). Single-image transforms see no benefit from
+    # multi-threaded BLAS/OpenMP, so OMP_NUM_THREADS=1 is always safe here.
     flags.extend(["-e", "OMP_NUM_THREADS=1", "-e", "MKL_NUM_THREADS=1"])
 
     return flags
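The cpuset slice plus the single-thread env caps compose like this (a sketch: ``shard_flags`` is a hypothetical simplification where ``_format_cpuset`` is reduced to a comma join and the shard's start index is derived from its index):

```python
def shard_flags(cpu_ids, shard_idx, per_shard):
    """Pin a shard to its contiguous CPU slice and cap BLAS/OpenMP threads at 1."""
    start = shard_idx * per_shard
    shard_cpus = cpu_ids[start : start + per_shard]
    flags = ["--cpuset-cpus", ",".join(str(c) for c in shard_cpus)]
    # Single-threaded math libs: per-step single-image tensor ops gain nothing from
    # multi-threaded BLAS/OpenMP, and oversubscription across shards kills scaling.
    flags.extend(["-e", "OMP_NUM_THREADS=1", "-e", "MKL_NUM_THREADS=1"])
    return flags
```

With 8 cores split 2 ways, shard 1 gets cores 4-7; each shard's containers then see only their slice, and the env caps stop OpenMP from spawning one thread per *visible* core anyway.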
