Reachy Mini Jetson Assistant

x

A low-latency, fully on-device voice and vision assistant for Reachy Mini Lite powered by NVIDIA Jetson. Everything runs locally with GPU acceleration — no cloud, no API keys, no internet required at runtime.

Current target: Jetson Orin Nano 8GB (JetPack 6.x, Python 3.10)

AGX Orin and Thor support is planned — see Roadmap.

What It Does

Speak to Reachy Mini and it responds using a vision-language model that sees through its camera. The robot moves its head and antennas while it talks, and you can watch everything live through a browser-based UI.

[Mic] → [Silero VAD] → [faster-whisper STT] ──┐
[USB Camera] → [Frame Ring Buffer] ────────────┼→ [VLM stream] → [TTS stream] → [Speaker + Robot]
                                               └→ [Web UI via WebSocket]

Demo

Supported Modes

Mode	Entry Point	Description
Vision Chat	`python3 run_vision_chat.py`	Camera + VLM + voice (terminal only)
Web Vision Chat	`python3 run_web_vision_chat.py`	Same as above + browser UI at `:8090`
Voice Chat	`python3 run_voice_chat.py`	Text LLM + optional RAG (no camera)
Text Chat	`python3 main.py chat -t`	Interactive text chat (no mic/speaker)
CLI	`python3 main.py ask "..."`	Single question, one-shot answer

Stack

Component	Library	Acceleration	Notes
VLM	llama.cpp (Docker)	GPU	Cosmos-Reason2-2B GGUF, OpenAI-compatible API
LLM	llama.cpp (Docker)	GPU	Gemma 3 1B for text-only mode
STT	faster-whisper	GPU (CUDA)	CTranslate2 with CUDA, small.en default
TTS	Kokoro ONNX	GPU (CUDA)	Natural voices, subprocess-isolated (see License Notes)
VAD	Silero VAD	CPU	Neural VAD, far better than energy-only
Camera	OpenCV V4L2	CPU	3 fps ring buffer, configurable resolution
Robot	Reachy Mini SDK	USB	Head pose, antennas, wake/sleep
RAG	ChromaDB + llama.cpp	GPU	bge-small-en-v1.5 embeddings (voice chat only)
Web UI	FastAPI + WebSocket	CPU	Live video, conversation stream, system stats

Prerequisites

NVIDIA Jetson Orin Nano (8GB) with JetPack 6.x, Python 3.10, Docker + NVIDIA runtime
Reachy Mini Lite connected via USB
NVMe SSD recommended for swap and model storage

Setup

See SETUP.md for the full installation guide — hardware setup, dependencies, Python packages, model downloads, and troubleshooting.

Usage

Quick Start (Vision Chat with Web UI)

This is the recommended mode — VLM + camera + voice + browser dashboard:

Terminal 1 — Start the VLM server:

NP=1 ./run_llama_cpp.sh Kbenkhaled/Cosmos-Reason2-2B-GGUF:Q4_K_M

Wait until you see llama server listening at http://0.0.0.0:8080.

Terminal 2 — Start the assistant:

source venv/bin/activate
python3 run_web_vision_chat.py

Open http://<jetson-ip>:8090 in a browser to see the live UI with camera feed, conversation log, and system stats. The robot listens through its microphone and responds via VLM + TTS.

Press Ctrl+C once to exit cleanly (robot will go to sleep position).

Vision Chat (Terminal Only)

Same pipeline without the web UI:

NP=1 ./run_llama_cpp.sh Kbenkhaled/Cosmos-Reason2-2B-GGUF:Q4_K_M
# In another terminal:
source venv/bin/activate
python3 run_vision_chat.py

Voice Chat (Text LLM, No Camera)

For text-only conversations with optional RAG:

./run_llama_cpp.sh ggml-org/gemma-3-1b-it-GGUF:Q8_0
# For RAG, also start the embedding server:
./run_llama_embedding.sh ggml-org/bge-small-en-v1.5-Q8_0-GGUF:Q8_0

# In another terminal:
source venv/bin/activate
python3 run_voice_chat.py           # with RAG
python3 run_voice_chat.py --no-rag  # without RAG

CLI Commands

python3 main.py chat -t                        # interactive text chat
python3 main.py ask "What is the Jetson Orin?"  # single question
python3 main.py info                            # system info
python3 main.py rag-status                      # RAG index status
python3 main.py rag-search "GPU specs"          # search the knowledge base

Test Robot Movement

python3 scripts/test_reachy_movement.py

Stopping

# Stop the LLM/VLM Docker container:
docker stop assistant-llm

# Stop the embedding server (if running):
docker stop assistant-embed

Web UI

The web UI (run_web_vision_chat.py) provides a real-time dashboard accessible from any browser on the same network:

Live camera feed at 10 fps (independent of the 3 fps VLM ring buffer)
Conversation log with streaming VLM responses
Push-to-talk button (starts muted, click to unmute)
System stats — CPU, GPU, RAM usage
Config panel — displays active settings
Platform detection — shows the specific Jetson model

Access at http://<jetson-ip>:8090. The web UI adds minimal overhead (~5 MB RAM).

Configuration

All settings live in config/settings.yaml. Edit this file to tune behavior:

Section	What It Controls
`llm`	LLM server URL, model, temperature, max tokens, system prompts
`stt`	Whisper model size, CUDA device, beam size
`tts`	Voice, speed, language, chunking
`audio`	Sample rate, input device
`vad`	Silero threshold, silence duration, utterance filters
`vision`	Camera resolution, capture FPS, frames per query, VLM system prompt, few-shot examples
`reachy`	Robot connection, daemon behavior, wake/sleep on start/exit
`web`	UI FPS, host, port
`rag`	Embedding backend, knowledge directory, retrieval settings

For developers adding new config fields, see app/config.py — typed dataclasses that define the schema and fallback defaults. The YAML always wins at runtime; the dataclass default is used if a key is missing from YAML.

Project Structure

reachy-mini-jetson-assistant/
├── app/
│   ├── pipeline.py          # Audio I/O, VAD, TTS streaming, mic recording
│   ├── config.py            # Configuration dataclasses + YAML loader
│   ├── llm.py               # LLM/VLM client (OpenAI-compatible, multimodal)
│   ├── stt.py               # faster-whisper speech-to-text
│   ├── tts.py               # TTS client (spawns subprocess worker)
│   ├── tts_worker.py        # TTS subprocess (Kokoro + GPL deps, isolated)
│   ├── camera.py            # USB webcam ring buffer (OpenCV, V4L2)
│   ├── reachy.py            # Reachy Mini connection, daemon management
│   ├── web.py               # FastAPI + WebSocket server for browser UI
│   ├── monitor.py           # System resource monitoring (CPU/GPU/RAM)
│   ├── rag.py               # ChromaDB + embeddings retrieval
│   ├── audio.py             # PulseAudio / ALSA device helpers
│   └── cli.py               # Typer CLI (chat, ask, rag-*)
├── config/
│   └── settings.yaml        # All runtime configuration
├── static/
│   └── index.html           # Web UI (single-file HTML/CSS/JS)
├── scripts/
│   ├── bench_ttft.py        # VLM TTFT benchmark
│   ├── test_reachy_movement.py   # Robot movement test
│   └── test_vlm_prompts.py  # VLM prompt experiments
├── knowledge_base/          # Markdown docs for RAG
├── models/                  # Local GGUF models (gitignored)
├── voices/                  # TTS voice files (gitignored)
├── run_web_vision_chat.py   # Vision chat + web UI (recommended)
├── run_vision_chat.py       # Vision chat (terminal only)
├── run_voice_chat.py        # Voice chat with optional RAG
├── run_llama_cpp.sh         # Docker LLM/VLM server launcher
├── run_llama_embedding.sh   # Docker embedding server launcher
├── main.py                  # CLI entry point
└── requirements.txt         # Python dependencies

Performance Notes (Orin Nano 8GB)

Metric	Value
STT latency	~0.7s (small.en, beam=1)
VLM TTFT (warm cache)	~6–8s (Cosmos-Reason2-2B Q4_K_M)
VLM TTFT (cold)	~8–10s
TTS latency (first chunk)	~0.3s (Kokoro GPU)
End-to-end (speak → robot responds)	~8–12s
Peak RAM	~7.5 GB (STT + VLM + TTS + camera + web UI)

The VLM vision encoder prefill is the primary bottleneck on Orin Nano. Flash attention (-fa on) and KV cache prefix reuse (--cache-reuse 256) are enabled in run_llama_cpp.sh to minimize repeated work across queries.

Roadmap

Contributions for AGX Orin and Thor testing are welcome.

Troubleshooting

See SETUP.md for common issues and fixes.

Reachy Mini Resources

Resource	Link
Getting Started	huggingface.co/docs/reachy_mini
Reachy Mini Lite Setup	Lite Guide
Python SDK Docs	SDK Reference
Quickstart	First Behavior
AI Integrations	LLMs, Apps, HF Spaces
Core Concepts	Architecture & Coordinates
Code Examples	github.com/pollen-robotics/reachy_mini/examples
Community Apps	Hugging Face Spaces
Discord	Join the Community
Troubleshooting	FAQ Guide

License Notes

This project uses Kokoro ONNX for text-to-speech. Kokoro ONNX itself is MIT-licensed, but it depends on:

phonemizer-fork — GPL-3.0 (text-to-phoneme conversion)
espeak-ng — GPL-3.0 (speech synthesis library loaded by espeakng-loader)

To avoid loading GPL-licensed code into the same process as NVIDIA CUDA libraries, TTS runs in a separate subprocess (app/tts_worker.py). The main application process never imports kokoro-onnx, phonemizer-fork, or espeak-ng — it communicates with the TTS worker via JSON over stdin/stdout. This is the same process-boundary isolation pattern used by the llama.cpp VLM backend (which runs in a separate Docker container).

All other dependencies use permissive licenses (MIT, BSD-3, Apache-2.0). See THIRD-PARTY-NOTICES.md for the full list.

Contributing

We welcome community contributions. Please see CONTRIBUTING.md for guidelines, including the Developer Certificate of Origin (DCO) sign-off requirement.

License

Apache 2.0 — see LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reachy Mini Jetson Assistant

What It Does

Demo

Supported Modes

Stack

Prerequisites

Setup

Usage

Quick Start (Vision Chat with Web UI)

Vision Chat (Terminal Only)

Voice Chat (Text LLM, No Camera)

CLI Commands

Test Robot Movement

Stopping

Web UI

Configuration

Project Structure

Performance Notes (Orin Nano 8GB)

Roadmap

Troubleshooting

Reachy Mini Resources

License Notes

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
app		app
config		config
docs/images		docs/images
knowledge_base		knowledge_base
scripts		scripts
static		static
voices		voices
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SETUP.md		SETUP.md
THIRD-PARTY-NOTICES.md		THIRD-PARTY-NOTICES.md
main.py		main.py
requirements.txt		requirements.txt
run_llama_cpp.sh		run_llama_cpp.sh
run_llama_embedding.sh		run_llama_embedding.sh
run_vision_chat.py		run_vision_chat.py
run_voice_chat.py		run_voice_chat.py
run_web_vision_chat.py		run_web_vision_chat.py

Folders and files

Latest commit

History

Repository files navigation

Reachy Mini Jetson Assistant

What It Does

Demo

Supported Modes

Stack

Prerequisites

Setup

Usage

Quick Start (Vision Chat with Web UI)

Vision Chat (Terminal Only)

Voice Chat (Text LLM, No Camera)

CLI Commands

Test Robot Movement

Stopping

Web UI

Configuration

Project Structure

Performance Notes (Orin Nano 8GB)

Roadmap

Troubleshooting

Reachy Mini Resources

License Notes

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages