| 1 | +# ROS Discourse Post: Visual Testing Harness for AI Coding Agents |
| 2 | + |
| 3 | +_Context: Issue [miaodx/roboharness#149](https://github.com/MiaoDX/roboharness/issues/149) — post on ROS Discourse for community feedback._ |
| 4 | + |
| 5 | +**Post in:** https://discourse.ros.org/ → **General** category |
| 6 | + |
| 7 | +**Title:** `Visual testing harness for AI coding agents in robot simulation — looking for feedback` |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Full Post Content |
| 12 | + |
| 13 | +Hi ROS community, |
| 14 | + |
| 15 | +**A question for those using Claude Code, Codex, or other AI coding agents for robotics work:** how do you debug simulation behavior when the agent is writing the control code? |
| 16 | + |
| 17 | +My specific problem: when I used Claude Code to write MuJoCo control scripts, it could read error logs and joint angles, but it couldn't _see_ what the robot was actually doing. The agent would iterate on code that looked plausible but produced obviously wrong behavior — wrong grasp orientations, unstable footing, arm trajectories that clipped through geometry. These failures are trivial for a human to spot in a viewer, but invisible to a text-only agent. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +### What I built: roboharness |
| 22 | + |
| 23 | +[roboharness](https://github.com/MiaoDX/roboharness) is a visual testing harness that pauses simulation at named checkpoints and captures multi-view screenshots alongside structured JSON state. The agent reads these files directly — no separate VLM inference step needed. |
| 24 | + |
| 25 | +**MuJoCo grasp demo** (front view — Plan → Pregrasp → Approach → Close → Lift → Holding): |
| 26 | + |
| 27 | + |
| 28 | + |
| 29 | +**[→ Live Report (auto-generated from CI on every push)](https://miaodx.com/roboharness/grasp/)** |
| 30 | + |
| 31 | +The report shows the six checkpoints listed above, from Plan through Holding. CI regenerates it via GitHub Actions on every push, using `MUJOCO_GL=osmesa` for headless rendering. |
| 32 | + |
| 33 | +The core pattern is two lines to wrap any Gymnasium environment: |
| 34 | + |
| 35 | +```python |
| 36 | +from roboharness.wrappers import RobotHarnessWrapper |
| 37 | + |
| 38 | +env = RobotHarnessWrapper(env, |
| 39 | +    checkpoints=[{"name": "pre_grasp", "step": 50}, {"name": "lift", "step": 120}], |
| 40 | +    output_dir="./harness_output", |
| 41 | +) |
| 42 | +``` |
| 43 | + |
| 44 | +At each checkpoint, roboharness saves: |
| 45 | +- PNG screenshots from all configured cameras (front, side, wrist, top-down) |
| 46 | +- `state.json` — joint positions, velocities, ctrl commands |
| 47 | +- `metadata.json` — sim_time, step index, camera list |
| 48 | + |
| 49 | +The AI agent reads these files and reasons about what to change next. |
| 50 | + |
| 51 | +Or, without a Gymnasium wrapper, use the lower-level API directly: |
| 52 | + |
| 53 | +```python |
| 54 | +from roboharness import Harness |
| 55 | +from roboharness.backends.mujoco_meshcat import MuJoCoMeshcatBackend |
| 56 | + |
| 57 | +backend = MuJoCoMeshcatBackend(model_path="robot.xml", cameras=["front", "side"]) |
| 58 | +harness = Harness(backend, output_dir="./output") |
| 59 | + |
| 60 | +harness.add_checkpoint("pre_grasp") |
| 61 | +harness.add_checkpoint("lift") |
| 62 | +result = harness.run_to_next_checkpoint(actions) |
| 63 | +# result.views → multi-view screenshots, result.state → joint angles + velocities |
| 64 | +``` |
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +### Current status |
| 69 | + |
| 70 | +| Simulator | Status | |
| 71 | +|---|---| |
| 72 | +| MuJoCo (native backend) | ✅ Implemented — headless via `MUJOCO_GL=osmesa` or `egl` | |
| 73 | +| Gymnasium wrapper | ✅ Works with Isaac Lab, ManiSkill, LeRobot, etc. | |
| 74 | +| LeRobot (`make_env()` factory) | ✅ Implemented | |
| 75 | +| **Gazebo / ROS2** | 📋 Planned — not yet implemented | |
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +### Questions for this community |
| 80 | + |
| 81 | +The Gazebo integration is the obvious next step for ROS users, and I'd genuinely value input before committing to an approach: |
| 82 | + |
| 83 | +1. **Capture method:** For screenshot capture from Gazebo, which do you prefer in practice — subscribing to `/camera/image_raw` via a ROS2 node, or using Gazebo's native snapshot API (where available)? Or is there a third approach that's more CI-friendly? |
| 84 | + |
| 85 | +2. **State source:** Is TF2 the right source for robot state (end-effector poses, joint frames) in a ROS2 context? Would you also expect `/joint_states` to be exposed, or is TF sufficient? |
| 86 | + |
| 87 | +3. **CI renderer:** If you run headless Gazebo in CI today, which renderer are you using: `--headless-rendering` (EGL), OSMesa, or something else? OSMesa works reliably for MuJoCo; I'm curious whether the same holds for Gazebo. |
| 88 | + |
| 89 | +4. **Use case fit:** Does AI-agent-driven robot simulation development cause debugging pain in your workflow? Or is this solving a problem you don't actually have? Honest answers appreciated; I'd rather learn that the use case is wrong than optimize for the wrong thing. |
| 90 | + |
| 91 | +--- |
| 92 | + |
| 93 | +GitHub: https://github.com/MiaoDX/roboharness |
| 94 | +MIT License, Python 3.10+, NumPy-only core (MuJoCo and Meshcat are optional dependencies) |
| 95 | + |
| 96 | +Happy to answer technical questions. Looking for feedback, not promoting — I'm genuinely trying to understand whether this is useful to ROS developers and what the right Gazebo integration path looks like. |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## Posting Notes |
| 101 | + |
| 102 | +- Post in **General** category on https://discourse.ros.org/ |
| 103 | +- ROS Discourse account required (login with GitHub) |
| 104 | +- Add tags: `simulation`, `testing`, `ai-agents` (if tag system allows) |
| 105 | +- The GIF is embedded via raw GitHub URL — it will render inline on Discourse |
| 106 | +- Tone check: the four questions at the end must feel like real questions, not softening before a pitch. Edit them if they read as rhetorical |
| 107 | +- After posting, note the Discourse thread URL in a comment on issue #149 so the response thread is tracked |
| 108 | + |
| 109 | +--- |
| 110 | + |
| 111 | +## Related issues |
| 112 | + |
| 113 | +- **#152** — Gazebo/ROS2 showcase (the implementation that this post is seeking feedback for) |
| 114 | +- **#150** — ros2_mcp collaboration discussion (complementary tool, different angle) |
| 115 | +- **#146** — awesome-ros2 listing (deferred — requires forking external repo) |
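
The post says the agent "reads these files" at each checkpoint, and a short sketch of that read step may make the workflow easier to picture. The sketch below is an assumption-heavy illustration, not part of the roboharness API: it supposes each checkpoint writes its PNGs, `state.json`, and `metadata.json` into one directory, and the `load_checkpoint` helper and that directory layout are hypothetical.

```python
import json
from pathlib import Path


def load_checkpoint(checkpoint_dir):
    """Gather one checkpoint's artifacts: PNG file names plus parsed JSON.

    Hypothetical helper; assumes the PNGs, state.json, and metadata.json
    for a checkpoint all sit in the same directory.
    """
    root = Path(checkpoint_dir)
    # Sort so the camera order is stable between runs.
    bundle = {"images": sorted(p.name for p in root.glob("*.png"))}
    for name in ("state", "metadata"):
        path = root / f"{name}.json"
        # A missing file becomes None so the caller can flag an incomplete capture.
        bundle[name] = json.loads(path.read_text()) if path.exists() else None
    return bundle
```

A call such as `load_checkpoint("./harness_output/pre_grasp")` (path assumed for illustration) would return a dict that a script could hand to the agent as the image list plus the parsed state. No particular keys inside `state.json` are assumed.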