Commit 295b41c

MiaoDX and claude committed

feat(eval): implement LeRobot evaluation plugin with CI gating

- Add PolicyAdapter protocol and LeRobotPolicyAdapter for checkpoint loading
- Extract shared LeRobot env utilities into lerobot_env.py
- Add evaluate_lerobot_policy() entry point with action shape validation
- Update examples/lerobot_eval_harness.py with --checkpoint-path and --repo-id CLI
- Add tests for policy adapter, env creation, eval plugin, and harness example
- Update README with LeRobot CI evaluation section
- Add torch/lerobot to mypy ignore_missing_imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ea8cc6b commit 295b41c

14 files changed

Lines changed: 1134 additions & 219 deletions
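
The commit message mentions a `PolicyAdapter` protocol and action shape validation, but the plugin source itself is not shown on this page. The sketch below is only an illustration of how such a pairing could look. The method name `select_action`, the `ConstantPolicy` stand-in, and the `validate_action` helper are all assumptions, not the actual roboharness API.

```python
from __future__ import annotations

from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class PolicyAdapter(Protocol):
    """Hypothetical protocol shape; the real roboharness protocol may differ."""

    def select_action(self, observation: Sequence[float]) -> list[float]:
        ...


class ConstantPolicy:
    """Toy stand-in for a loaded checkpoint: always returns a zero action."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def select_action(self, observation: Sequence[float]) -> list[float]:
        return [0.0] * self.action_dim


def validate_action(action: Sequence[float], expected_dim: int) -> list[float]:
    """Raise ValueError when the action length does not match the env's action dim."""
    if len(action) != expected_dim:
        raise ValueError(
            f"Policy produced an action of length {len(action)}, "
            f"but the environment expects {expected_dim}"
        )
    return list(action)


policy = ConstantPolicy(action_dim=3)
# Structural check: any object with a select_action method satisfies the protocol.
assert isinstance(policy, PolicyAdapter)
action = validate_action(policy.select_action([0.1, 0.2]), expected_dim=3)
```

Validating the shape before the action reaches the environment turns a late simulator error into an early, readable one, which is presumably the motivation for doing it in the entry point.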

README.md

Lines changed: 23 additions & 0 deletions
@@ -121,6 +121,29 @@ Uses LeRobot's official `make_env("lerobot/unitree-g1-mujoco")` factory for stan
 
 </details>
 
+<details>
+<summary><b>LeRobot Evaluation in CI</b></summary>
+
+```bash
+pip install roboharness[lerobot]
+
+# Evaluate a real LeRobot checkpoint with visual checkpoints + JSON report
+python examples/lerobot_eval_harness.py \
+  --checkpoint-path /path/to/lerobot/checkpoint \
+  --repo-id lerobot/unitree-g1-mujoco \
+  --n-episodes 5 \
+  --checkpoint-steps 10 50 100 \
+  --assert-threshold \
+  --min-success-rate 0.8
+```
+
+Produces:
+- `episode_000/step_0010/default_rgb.png` — checkpoint screenshots
+- `lerobot_eval_report.json` — structured per-episode stats
+- CI exit code 1 when thresholds are not met
+
+</details>
+
 <a id="sonic-planner"></a>
 <details>
 <summary><b>SONIC Planner</b></summary>

docs/roadmap-2026-q2.md

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,8 @@ Directions are split into "do now" and "do later." "Do now" means the next avail
 
 **Exit criteria:** A LeRobot user can run one command to get visual regression testing in CI.
 
+**Status:** Complete. `examples/lerobot_eval_harness.py --checkpoint-path <path> --repo-id <repo>` loads real LeRobot policies, captures checkpoint screenshots, produces `lerobot_eval_report.json`, and supports `--assert-threshold` for CI pass/fail gates.
+
 **Related issues:** New issue needed. Extends #83 (native LeRobot integration).
 
 ### B. Constraint Evaluator

examples/lerobot_eval_harness.py

Lines changed: 62 additions & 21 deletions
@@ -14,6 +14,12 @@
 Run (standalone — CartPole demo):
     python examples/lerobot_eval_harness.py --env CartPole-v1 --n-episodes 5
 
+Run (with real LeRobot checkpoint):
+    python examples/lerobot_eval_harness.py \
+        --checkpoint-path /path/to/lerobot/checkpoint \
+        --repo-id lerobot/unitree-g1-mujoco \
+        --n-episodes 5
+
 Run (with success threshold — CI gate):
     python examples/lerobot_eval_harness.py --env CartPole-v1 --n-episodes 10 \
         --min-success-rate 0.0 --assert-threshold
@@ -38,6 +44,7 @@
 from roboharness.evaluate.lerobot_plugin import (
     LeRobotEvalConfig,
     check_eval_threshold,
+    evaluate_lerobot_policy,
     evaluate_policy,
 )
 
@@ -83,27 +90,24 @@ def main() -> None:
         action="store_true",
         help="Exit non-zero if thresholds are not met (CI mode)",
     )
+    parser.add_argument(
+        "--checkpoint-path",
+        type=str,
+        default=None,
+        help="Path to a LeRobot policy checkpoint directory",
+    )
+    parser.add_argument(
+        "--repo-id",
+        type=str,
+        default=None,
+        help="HuggingFace repo ID for the LeRobot environment (inferred if omitted)",
+    )
     args = parser.parse_args()
 
     print("=" * 60)
     print(" Roboharness: LeRobot Evaluation Harness")
     print("=" * 60)
 
-    # 1. Create environment
-    print(f"\n[1/3] Creating environment: {args.env}")
-    try:
-        import gymnasium as gym
-
-        env = gym.make(args.env, render_mode="rgb_array")
-    except ImportError:
-        print("ERROR: gymnasium is required. Install with: pip install roboharness[demo]")
-        sys.exit(1)
-
-    print(f" Obs space: {env.observation_space}")
-    print(f" Act space: {env.action_space}")
-
-    # 2. Run evaluation
-    print(f"[2/3] Evaluating ({args.n_episodes} episodes, max {args.max_steps} steps each) ...")
     output_dir = Path(args.output_dir) / "lerobot_eval"
 
     config = LeRobotEvalConfig(
@@ -113,14 +117,43 @@ def main() -> None:
         output_dir=str(output_dir),
     )
 
-    # Use random policy as fallback
-    action_space = env.action_space
+    # 1. Create environment / load policy
+    if args.checkpoint_path:
+        print(f"\n[1/3] Loading LeRobot policy from: {args.checkpoint_path}")
+        if not Path(args.checkpoint_path).exists():
+            print(f"ERROR: Checkpoint path does not exist: {args.checkpoint_path}")
+            sys.exit(1)
+
+        # 2. Run evaluation with real LeRobot policy
+        print(f"[2/3] Evaluating ({args.n_episodes} episodes, max {args.max_steps} steps each) ...")
+        report = evaluate_lerobot_policy(
+            checkpoint_path=args.checkpoint_path,
+            repo_id=args.repo_id,
+            config=config,
+        )
+    else:
+        print(f"\n[1/3] Creating environment: {args.env}")
+        try:
+            import gymnasium as gym
+
+            env = gym.make(args.env, render_mode="rgb_array")
+        except ImportError:
+            print("ERROR: gymnasium is required. Install with: pip install roboharness[demo]")
+            sys.exit(1)
+
+        print(f" Obs space: {env.observation_space}")
+        print(f" Act space: {env.action_space}")
 
-    def policy_fn(obs: np.ndarray) -> np.ndarray:
-        return _random_policy(obs, action_space)
+        # Use random policy as fallback
+        action_space = env.action_space
 
-    report = evaluate_policy(env, policy_fn, config)
-    env.close()
+        def policy_fn(obs: np.ndarray) -> np.ndarray:
+            return _random_policy(obs, action_space)
+
+        # 2. Run evaluation
+        print(f"[2/3] Evaluating ({args.n_episodes} episodes, max {args.max_steps} steps each) ...")
+        report = evaluate_policy(env, policy_fn, config)
+        env.close()
 
     # 3. Report results
     print("[3/3] Results:")
@@ -144,6 +177,14 @@ def policy_fn(obs: np.ndarray) -> np.ndarray:
                 f" length={ep.episode_length:4d}"
             )
 
+    if not args.checkpoint_path:
+        print(
+            "\n Tip: pass --checkpoint-path to evaluate a real LeRobot policy:\n"
+            " python examples/lerobot_eval_harness.py \\\n"
+            " --checkpoint-path /path/to/lerobot/checkpoint \\\n"
+            " --repo-id lerobot/unitree-g1-mujoco"
+        )
+
     # 4. CI gate
     if args.assert_threshold:
         passed = check_eval_threshold(
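
The last hunk of this file is cut off at the `check_eval_threshold(` call, so its arguments and return handling are not visible here. As a rough illustration of what a success-rate gate of this kind might compute, here is a minimal sketch. The `EpisodeResult` record, its `success` field, and both helper functions are hypothetical names introduced for the example; only `episode_length` appears in the diff above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; `success` is an assumed field name."""

    success: bool
    episode_length: int


def success_rate(episodes: list[EpisodeResult]) -> float:
    """Fraction of successful episodes; an empty run counts as 0.0."""
    if not episodes:
        return 0.0
    return sum(ep.success for ep in episodes) / len(episodes)


def gate_exit_code(episodes: list[EpisodeResult], min_success_rate: float) -> int:
    """Return 0 when the threshold is met and 1 otherwise (CI convention)."""
    return 0 if success_rate(episodes) >= min_success_rate else 1
```

A script would typically finish with `sys.exit(gate_exit_code(episodes, args.min_success_rate))`, so the CI job fails whenever the measured rate drops below the configured minimum.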

examples/lerobot_g1_native.py

Lines changed: 2 additions & 197 deletions
@@ -38,11 +38,11 @@
 from pathlib import Path
 from typing import Any
 
-import gymnasium as gym  # noqa: TC002 — used at runtime inside create_native_env
 import numpy as np
 
 from roboharness.core.protocol import TaskPhase, TaskProtocol
-from roboharness.wrappers import RobotHarnessWrapper, VectorEnvAdapter
+from roboharness.evaluate.lerobot_env import create_native_env
+from roboharness.wrappers import RobotHarnessWrapper
 
 # ---------------------------------------------------------------------------
 # Constants
@@ -65,201 +65,6 @@
 }
 
 
-# ---------------------------------------------------------------------------
-# Environment creation via make_env()
-# ---------------------------------------------------------------------------
-
-
-def _patch_config_for_headless(env_id: str) -> None:
-    """Patch the HuggingFace-cached config.yaml for headless (CI) rendering.
-
-    The lerobot/unitree-g1-mujoco env.py loads config.yaml at import time.
-    The default config has ``ENABLE_ONSCREEN: true`` which requires GLFW/display.
-    For headless environments (MUJOCO_GL=osmesa, no DISPLAY), we disable onscreen
-    rendering so the simulator uses offscreen-only mode.
-    """
-    import os
-
-    has_display = bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
-    if has_display:
-        return  # Display available, no patching needed
-
-    try:
-        from huggingface_hub import snapshot_download
-
-        repo_dir = Path(snapshot_download(env_id, repo_type="model"))
-    except Exception:
-        return  # Can't patch, let make_env handle errors
-
-    config_path = repo_dir / "config.yaml"
-    if not config_path.exists():
-        return
-
-    import yaml
-
-    config = yaml.safe_load(config_path.read_text())
-    if config.get("ENABLE_ONSCREEN") is True:
-        config["ENABLE_ONSCREEN"] = False
-        config["ENABLE_OFFSCREEN"] = True
-        config_path.write_text(yaml.dump(config, default_flow_style=False))
-        print(" Patched config.yaml: ENABLE_ONSCREEN=false (headless mode)")
-
-
-def create_native_env(
-    env_id: str = LEROBOT_ENV_ID,
-    *,
-    n_envs: int = 1,
-) -> gym.Env:
-    """Create a LeRobot environment, preferring the official ``make_env()`` factory.
-
-    Strategy (in order):
-    1. Try LeRobot's ``make_env()`` — wraps the hub env in ``SyncVectorEnv``.
-       We unwrap the batch dimension via ``VectorEnvAdapter`` so downstream
-       wrappers see a standard single-env interface.
-    2. Fall back to importing the hub's ``env.py`` directly (works without
-       the full LeRobot install; avoids the ``SyncVectorEnv`` obs-space
-       mismatch that the upstream env has).
-    """
-    try:
-        from huggingface_hub import snapshot_download  # noqa: F401 — used below
-    except ImportError:
-        print(
-            "ERROR: huggingface_hub is required for native integration.\n"
-            "Install with: pip install roboharness[demo,unitree] lerobot"
-        )
-        sys.exit(1)
-
-    # Patch config for headless CI environments before importing env module
-    _patch_config_for_headless(env_id)
-
-    env = _try_lerobot_make_env(env_id, n_envs=n_envs)
-    if env is None:
-        env = _fallback_hub_make_env(env_id, n_envs=n_envs)
-
-    # Add MuJoCo rendering capability — the hub env has a MuJoCo model but
-    # doesn't expose render_camera(), so the wrapper can't capture screenshots.
-    _add_mujoco_rendering(env)
-
-    print(f" Env type: {type(env).__name__}")
-    print(f" Obs space (declared): {env.observation_space}")
-    print(f" Act space: {env.action_space}")
-
-    return env
-
-
-def _try_lerobot_make_env(env_id: str, *, n_envs: int = 1) -> gym.Env | None:
-    """Try creating the env via LeRobot's official ``make_env()`` factory.
-
-    Returns a ``VectorEnvAdapter``-wrapped env on success, or ``None`` if
-    LeRobot is not installed or ``make_env()`` fails.
-    """
-    try:
-        from lerobot.common.envs.factory import make_env  # type: ignore[import-untyped]
-    except ImportError:
-        print(" LeRobot not installed — falling back to hub env import")
-        return None
-
-    try:
-        vec_env = make_env(env_id, n_envs=n_envs)
-    except Exception as exc:
-        print(f" LeRobot make_env() failed ({exc}) — falling back to hub env import")
-        return None
-
-    # make_env() wraps in SyncVectorEnv; adapt to standard gym.Env.
-    env = VectorEnvAdapter(vec_env)
-    print(" Created via LeRobot make_env() + VectorEnvAdapter")
-    return env
-
-
-def _fallback_hub_make_env(env_id: str, *, n_envs: int = 1) -> gym.Env:
-    """Import the hub's ``env.py`` directly (no LeRobot dependency)."""
-    from huggingface_hub import snapshot_download
-
-    repo_dir = Path(snapshot_download(env_id, repo_type="model"))
-    sys.path.insert(0, str(repo_dir))
-    try:
-        from env import make_env as hub_make_env  # type: ignore[import-not-found]
-    except ImportError as e:
-        print(f"ERROR: Failed to import hub env module: {e}")
-        sys.exit(1)
-
-    env = hub_make_env(n_envs=n_envs)
-
-    # Obs-space shape mismatch (upstream declares (97,) but returns (100,) due to
-    # floating_base_acc being 6-D not 3-D) is handled automatically by
-    # RobotHarnessWrapper(auto_fix_obs_space=True). See issue #110.
-
-    print(" Created via direct hub env import (fallback)")
-    return env
-
-
-def _add_mujoco_rendering(
-    env: gym.Env,
-    width: int = 640,
-    height: int = 480,
-) -> None:
-    """Patch the env to support render_camera() using MuJoCo's renderer.
-
-    The hub env has a MuJoCo model/data underneath but doesn't expose camera
-    rendering. We find the model/data, create a mujoco.Renderer, and add
-    render_camera() + cameras property so RobotHarnessWrapper can capture
-    multi-view screenshots.
-    """
-    import mujoco
-
-    unwrapped = getattr(env, "unwrapped", env)
-
-    # Find the MuJoCo model and data on the env (attribute names vary by env)
-    # Search the unwrapped env and one level deeper (e.g. env.sim_env.mj_model
-    # for the lerobot/unitree-g1-mujoco hub env).
-    model = None
-    data = None
-    search_targets = [unwrapped]
-    for nested in ("sim_env", "simulator", "sim"):
-        obj = getattr(unwrapped, nested, None)
-        if obj is not None:
-            search_targets.append(obj)
-
-    for target in search_targets:
-        for attr in ("model", "_model", "mj_model"):
-            candidate = getattr(target, attr, None)
-            if candidate is not None and hasattr(candidate, "ncam"):
-                model = candidate
-                break
-        if model is not None:
-            break
-
-    for target in search_targets:
-        for attr in ("data", "_data", "mj_data"):
-            candidate = getattr(target, attr, None)
-            if candidate is not None and hasattr(candidate, "qpos"):
-                data = candidate
-                break
-        if data is not None:
-            break
-
-    if model is None or data is None:
-        print(" Warning: could not find MuJoCo model/data — no screenshots")
-        return
-
-    renderer = mujoco.Renderer(model, height, width)
-    camera_names = [model.camera(i).name for i in range(model.ncam)]
-
-    def render_camera(camera_name: str) -> np.ndarray:
-        if camera_name not in camera_names:
-            raise ValueError(f"Unknown camera: {camera_name}. Available: {camera_names}")
-        renderer.update_scene(data, camera=camera_name)
-        return renderer.render()
-
-    # Patch the unwrapped env so the wrapper detects render_camera capability
-    unwrapped.render_camera = render_camera  # type: ignore[attr-defined]
-    unwrapped.cameras = camera_names  # type: ignore[attr-defined]
-    # Store model/data for controller state access
-    unwrapped.mj_model = model  # type: ignore[attr-defined]
-    unwrapped.mj_data = data  # type: ignore[attr-defined]
-    print(f" Added MuJoCo rendering: {len(camera_names)} cameras {camera_names}")
-
-
 # ---------------------------------------------------------------------------
 # Validation
 # ---------------------------------------------------------------------------
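
The removed `create_native_env` docstring says the batch dimension of a `SyncVectorEnv` is unwrapped through `VectorEnvAdapter`, whose source is not part of this diff. The sketch below illustrates the general idea in plain Python. `FakeVectorEnv` and `SingleEnvAdapter` are hypothetical classes written for this example and are not the roboharness implementation. It assumes a batch of exactly one sub-environment and the five-value `step()` return used by Gymnasium.

```python
from __future__ import annotations

from typing import Any


class FakeVectorEnv:
    """Toy batched env with a single sub-env; stands in for a SyncVectorEnv."""

    def reset(self) -> tuple[list[list[float]], dict[str, Any]]:
        return [[0.0, 0.0]], {}

    def step(self, actions: list[list[float]]):
        # Observation is the action shifted by one, batched along axis 0.
        obs = [[a + 1.0 for a in actions[0]]]
        return obs, [1.0], [False], [False], {}


class SingleEnvAdapter:
    """Hypothetical adapter that strips a leading batch dimension of size 1."""

    def __init__(self, vec_env: Any) -> None:
        self._vec_env = vec_env

    def reset(self) -> tuple[list[float], dict[str, Any]]:
        obs, info = self._vec_env.reset()
        return obs[0], info

    def step(self, action: list[float]):
        # Re-add the batch dimension on the way in, remove it on the way out.
        obs, rewards, terminated, truncated, info = self._vec_env.step([action])
        return obs[0], rewards[0], terminated[0], truncated[0], info


env = SingleEnvAdapter(FakeVectorEnv())
first_obs, _ = env.reset()
next_obs, reward, terminated, truncated, _ = env.step([1.0, 2.0])
```

Keeping the adapter this thin means downstream wrappers can stay written against the single-environment interface regardless of which creation path produced the env.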

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,7 @@ lerobot = [
     "gymnasium>=0.29",
     "mujoco>=3.0",
     "Pillow",
+    "torch",
 ]
 dev = [
     "pytest>=7.0",
@@ -137,5 +138,7 @@ module = [
     "onnxruntime.*",
     "huggingface_hub.*",
     "mcp.*",
+    "torch.*",
+    "lerobot.*",
 ]
 ignore_missing_imports = true
