Guidance for AI agents working in this repository. This is the reachy-voice exhibition voice-companion app for a Reachy Mini Lite running on a Jetson. It is an on-robot Python app built on the Reachy Mini Python SDK ≥ 1.8.0 and the ovs_agent (OpenVoiceStream) framework.
This is not a JavaScript / Web / WebRTC app and not a fresh scaffold. The upstream SDK `AGENTS.md` describes a JS "golden path" for new web apps; that path does not apply here. Only the SDK facts in SDK reference below carry over. Don't introduce JS apps, WebRTC, `mountHost()`, HF Spaces, or the `reachy-mini-app-assistant` scaffolding into this repo.
- `src/reachy_voice/`: the app (the only thing that ships):
  - `main.py`: entry point (`ReachyVoiceApp`, a `ReachyMiniApp` subclass) + the dashboard HTTP/WS server + `/debug/*` endpoints.
  - `conversation.py`: `ConversationEngine` wrapping ovs_agent `CompanionRobotApp` (SLV ASR/TTS + edge-LLM + tools).
  - `motion.py`: 25 Hz single-writer compositor: gaze tracking, speech wobble, idle presence, and official recorded-move expressions.
  - `attention.py` + `gaze.py`: engagement-gated face tracking; `FaceGaze` projects pixels→yaw/pitch via SDK camera intrinsics.
  - `vision.py`: ZMQ consumer of the vision-trt face/emotion stream.
  - `audio.py`: duplex `sounddevice` I/O (subclasses ovs_agent `AudioIO`).
  - `dashboard.py`, `config.py`, `tier_a.py` (dashboard API endpoints).
- `legacy/reachy_claw/`: the retired old app (a full plugin platform on the old SDK with SDK media monkey-patching). Not packaged, not imported. Kept only as a reference to port features from (diary, skills, Home Assistant, voice-cloning, conversation modes, multi-LLM backends). Do not build on it.
- `deploy/jetson/`: per-component deploy. `voice/` is this app's `Dockerfile` + `docker-compose.yml` + `entrypoint.sh`; siblings: `vision-trt/`, `reachy/`, `edge-llm/`, …
- `tests/voice/`: the active test suite. The `reachy_claw` tests under the `tests/` root target legacy code and are not run (CI is scoped to `tests/voice`).
The app orchestrates; it never touches hardware directly — everything goes through the daemon and the service containers:
| Container | Port(s) | Role |
|---|---|---|
| reachy-daemon | :38001 | Official reachy_mini daemon: drives motors, exposes the SDK websocket (`/ws/sdk`, `/api/state/full`). Not our code; don't modify. |
| vision-trt | :8630 HTTP, :8631 ZMQ | Camera → TensorRT face/emotion → publishes faces over ZMQ (msgpack, topic `vision`). The app consumes :8631. |
| deploy-speech (SLV) | :8621 | ASR + TTS over one WebSocket (V2V). |
| edge-llm-chat-service | :11435 | Local LLM, OpenAI-compatible. |
| reachy-voice | :8042 | This app (dashboard + orchestration). |
Data flow:
- listen→think→speak: mic → SLV(ASR) → edge-LLM → SLV(TTS) → speaker
- see→track: camera → vision-trt → ZMQ → `attention`/`gaze` → SDK → motors
- emote: LLM reply `[emotion]` tags → `motion` plays the official move libraries
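
The emote path above depends on pulling `[emotion]` tags out of the LLM reply before the text reaches TTS. A minimal sketch of that split follows. The tag grammar (a single alphabetic word in square brackets) and the helper name `split_emotion_tags` are assumptions for illustration, not the app's actual parser.

```python
import re

# Assumption: a tag is one alphabetic word (underscores allowed) in square brackets.
_TAG = re.compile(r"\[([A-Za-z_]+)\]")


def split_emotion_tags(reply: str) -> tuple[str, list[str]]:
    """Return (text to speak, emotion tags in order of appearance)."""
    tags = [name.lower() for name in _TAG.findall(reply)]
    spoken = _TAG.sub("", reply)
    # Collapse the whitespace left behind where tags were removed.
    spoken = " ".join(spoken.split())
    return spoken, tags
```

For example, `split_emotion_tags("[happy] Hello there!")` yields the spoken text `"Hello there!"` and the tag list `["happy"]`; the tag list is what a motion layer would map onto recorded moves.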

    uv sync --extra dev
    uv run pytest tests/voice -v   # active suite; some tests skip off-Jetson
    uv run ruff check .

- Python ≥ 3.11 (CI matrix 3.11/3.12).
- The app imports GStreamer (`gi`) transitively via the SDK, so `import reachy_voice.main` does not work on a plain dev box (no `gi`/PortAudio); real verification happens on the Jetson. Tests guard SDK/GStreamer/PortAudio imports and skip when absent.
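
The import guarding mentioned above can be sketched with the standard library alone. The helper name `can_import` and the module-level flags are hypothetical; the repo's tests may guard imports differently.

```python
import importlib


def can_import(name: str) -> bool:
    """Try to import `name`; report False instead of raising when it is unusable."""
    try:
        importlib.import_module(name)
    except (ImportError, OSError):
        # ImportError: the package is not installed.
        # OSError: the package is installed but a native library failed to load.
        return False
    return True


# Evaluated once at import time so tests can branch on them.
HAVE_GI = can_import("gi")
HAVE_SOUNDDEVICE = can_import("sounddevice")
```

A test module could then skip hardware-dependent cases with `pytest.mark.skipif(not HAVE_GI, reason="GStreamer bindings not available")`, so the same suite runs on a dev box and on the Jetson.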
- The app is baked into a Docker image built `FROM reachy-claw:slv-v7`, run via `deploy/jetson/voice/docker-compose.yml` (host net, privileged, `restart: unless-stopped`). Build context = repo root; the Dockerfile COPYs `src/reachy_voice`, `hf-hub/`, and the entrypoint.
- `hf-hub/` (~107 MB official emotion/dance move cache) must be pre-staged in the build context. It is gitignored; never commit it.
- Hot-patch one file: `docker cp` into the container + `docker restart reachy-voice` (Python needs restart; static files under `reachy_voice/static/` just need a hard browser reload). Hot-patches are lost on image rebuild; the durable source of truth is the repo.
- Do not `docker compose up` in a way that recreates from a stale image and wipes hot-patches. On the live robot, prefer a targeted restart.
- Code delivery: feature branch + PR to `suharvest/reachy-claw` (origin, over ssh). Never local-merge to master. Push over ssh: the `gh` https token lacks the `workflow` scope, so workflow-file changes are rejected over https.
- Connect: `ReachyMini(..., request_media_backend="no_media")`. The app runs no_media: the camera is owned by vision-trt and audio by our duplex `sounddevice` stream. Do not use the SDK media/camera/audio path.
- Daemon is on `:38001` here (not the SDK default `:8000`).
- Continuous motion (25 Hz compositor): `set_target_head_pose(create_head_pose(...))`, `set_target_body_yaw(rad)`, `set_target_antenna_joint_positions([right, left] in rad)`; call `enable_motors()` once on start (1.8.0 requires torque on for `set_target_*`).
- Expressions: `play_move(move, sound=False)` from the official recorded-move libraries (`pollen-robotics/reachy-mini-emotions-library`, `reachy-mini-dances-library`, baked into the image's HF cache).
- Rule of thumb: `goto_target()` for one-shot smooth gestures (≥ 0.5 s); `set_target_*` for real-time loops (10 Hz+).
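
The real-time loop side of that rule of thumb can be sketched as a drift-free fixed-rate scheduler. This is an illustration only: `run_fixed_rate` and its `tick` callback are hypothetical names, and the SDK is deliberately left out so the sketch runs anywhere. In the app, the `tick` body is where a single writer would issue the `set_target_*` calls.

```python
import time
from typing import Callable, Optional


def run_fixed_rate(
    tick: Callable[[float], None],
    hz: float = 25.0,
    ticks: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call tick(elapsed_seconds) at `hz`; run forever when `ticks` is None."""
    period = 1.0 / hz
    start = clock()
    n = 0
    while ticks is None or n < ticks:
        tick(n * period)
        n += 1
        # Sleep until the next deadline measured from `start`,
        # so per-tick jitter does not accumulate into drift.
        delay = start + n * period - clock()
        if delay > 0:
            sleep(delay)
    return n
```

Injecting `clock` and `sleep` keeps the loop testable off-robot: a fake clock lets a test run several ticks instantly and check the timestamps handed to `tick`.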
- Live exhibition robot: keep it running. Verify changes via the app's `/debug/{motion,vision,gaze,emotion,say,face,state}` endpoints and the dashboard; don't make the user repeatedly interact to test.
- The robot must track while listening, not freeze; the hardware handles mic noise.
- Motion kill-switch: env `REACHY_MOTION=0` or file `/tmp/reachy_motion_off` disables all motion at startup (read once, survives `docker restart`). If the robot is silent, check for it.
- Never access the audio/reSpeaker USB from outside the app while the app is running.
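
The kill-switch check described above amounts to one environment lookup plus one file-existence test. A minimal sketch, assuming a hypothetical helper `motion_disabled` (the app's actual function name and call site are not shown here):

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def motion_disabled(
    env: Optional[Mapping[str, str]] = None,
    flag_file: str = "/tmp/reachy_motion_off",
) -> bool:
    """True when REACHY_MOTION=0 is set or the flag file exists."""
    env = os.environ if env is None else env
    return env.get("REACHY_MOTION") == "0" or Path(flag_file).exists()
```

Because the switch is read once at startup, a caller would evaluate this a single time and keep the result, rather than polling it inside the motion loop.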
- Official SDK + app docs: https://github.com/pollen-robotics/reachy_mini. Its `AGENTS.md` targets new JS/Web apps; not applicable here beyond these facts.
- Hardware: head = 6-DOF Stewart platform, body = 1-DOF rotation, 2 antennas (usable as buttons). SDK-enforced safety ranges (auto-clamped): head pitch/roll ±40°, head yaw ±180°, body yaw ±160°, max head–body yaw delta 65°. (This app clamps more conservatively: head ±20°, body ±40°.)
- Daemon REST/interactive docs: `http://localhost:38001/docs`.
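
The app-side limits in the hardware note above can be illustrated with a symmetric clamp. This sketch assumes the ±20° head limit applies to both yaw and pitch; the note does not say which axes it covers, and the names `clamp` and `clamp_targets` are hypothetical.

```python
# App-side limits quoted in the hardware note, in degrees.
HEAD_LIMIT_DEG = 20.0
BODY_LIMIT_DEG = 40.0


def clamp(value: float, limit: float) -> float:
    """Clamp `value` into the symmetric range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def clamp_targets(
    head_yaw_deg: float, head_pitch_deg: float, body_yaw_deg: float
) -> tuple[float, float, float]:
    """Apply the conservative app limits before values reach the SDK."""
    return (
        clamp(head_yaw_deg, HEAD_LIMIT_DEG),
        clamp(head_pitch_deg, HEAD_LIMIT_DEG),
        clamp(body_yaw_deg, BODY_LIMIT_DEG),
    )
```

Under that assumption the worst-case head-body yaw difference is 20° + 40° = 60°, which stays inside the SDK's 65° delta limit, so no separate delta check is needed in this sketch.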