ROS Agentic Operating System: Control robots with LLMs through MCP with Reachy Mini as the interface.
It uses the Reachy Mini Lite for easy media streaming.
The client supports a local OpenAI-compatible LLM (e.g. vLLM), the OpenAI API, the Groq API, the Anthropic API, or OpenAI Codex subscription auth. Choose one via CLI or environment variables.
For local inference, run an OpenAI-compatible server (e.g. vLLM) and point the client at it:
# On the machine with the GPU (or with port forwarding):
vllm serve openai/gpt-oss-120b --tool-call-parser openai --enable-auto-tool-choice --port 6000
To verify the endpoint: curl http://localhost:6000/v1/models (or use https if your server uses TLS).
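If you prefer to check the endpoint from Python rather than curl, the following is a minimal sketch using only the standard library. The helper names `models_url` and `list_model_ids` are illustrative and not part of rosaOS; the response shape assumed is the usual OpenAI-compatible `{"data": [{"id": ...}]}` listing.

```python
import json
import urllib.request


def models_url(port: int = 6000, host: str = "localhost", tls: bool = False) -> str:
    """Build the /v1/models URL for an OpenAI-compatible server."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}/v1/models"


def list_model_ids(url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the server at `url`.

    Raises urllib.error.URLError (an OSError) if the server is unreachable.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed OpenAI-compatible shape: {"object": "list", "data": [{"id": ...}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_model_ids(models_url())` against a running server should return the served model names; a connection error usually means the server is not up yet or the port forwarding is missing.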
Groq provides inference with a free tier (with limits). You must set an API key to use Groq:
- macOS/Linux:
export GROQ_API_KEY=your_key
- Windows (PowerShell):
$env:GROQ_API_KEY="your_key"
Get a key at console.groq.com/keys.
Supported Groq tool-use models include: llama-3.1-8b-instant, llama-3.3-70b-versatile, openai/gpt-oss-120b, openai/gpt-oss-20b, moonshotai/kimi-k2-instruct-0905, qwen/qwen3-32b, and meta-llama/llama-4-scout-17b-16e-instruct. Default is openai/gpt-oss-120b.
Required for image analysis and better TTS experience.
OpenAI is the default hosted agent model provider:
- macOS/Linux:
export OPENAI_API_KEY=your_key
- Windows (PowerShell):
$env:OPENAI_API_KEY="your_key"
OpenAI API usage is billed through the API platform, separately from ChatGPT Free/Plus/Pro subscriptions. A ChatGPT subscription does not provide API quota for rosaOS.
rosaOS can also use the sibling ../openai-subscription-wrapper package to talk to the ChatGPT Codex backend with ChatGPT Plus/Pro subscription OAuth credentials:
scripts/reachy_mini_env/bin/openai-codex-client login
# Or, if you already logged in with Pi:
scripts/reachy_mini_env/bin/openai-codex-client import-pi
Then start the client with --codex or --provider openai-codex.
Developed with Python 3.12.
Clone this repo with the --recursive flag so that all submodules (ros-mcp-server) are downloaded. Further instructions for setting up ros-mcp-server are in the rosaOS setup file in the submodule directory.
git clone https://github.com/lilyjge/reachy-mcp.git --recursive
cd reachy-mcp
uv venv --python 3.12 scripts/reachy_mini_env
# Install dependencies
uv pip install -p scripts/reachy_mini_env/bin/python -r requirements.txt
For fresh macOS + Reachy Mini Lite setup details, including camera permissions and voice/STT keys, see docs/macos-reachy-mini-setup.md.
Start all services at once:
./scripts/start_all.sh
This will start:
- Reachy Mini daemon (port 8000)
- MCP server (port 5001)
- Process manager MCP server (port 7001)
- RAG agent (port 8765)
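To confirm that the four services actually came up, a quick TCP probe of the default ports can help. This is a minimal sketch, not part of the repo's scripts; `SERVICES`, `is_listening`, and `service_status` are illustrative names, and the ports are the defaults listed above.

```python
import socket

# Default ports as listed above; adjust if you override them via environment variables.
SERVICES = {
    "Reachy Mini daemon": 8000,
    "MCP server": 5001,
    "Process manager MCP server": 7001,
    "RAG agent": 8765,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in services.items()}
```

A port that accepts connections only shows that something is listening there; the log files remain the place to look for startup errors.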
Logs are saved to the scripts/logs/ directory. To stop all services:
./scripts/stop_all.sh
Alternatively, start each service manually:
Start Reachy Mini's robot daemon server on the default port 8000:
scripts/reachy_mini_env/bin/reachy-mini-daemon
Start the Reachy Mini's MCP server on port 5001. For TTS, we support ElevenLabs API, Groq API, or the local pyttsx3 package.
scripts/reachy_mini_env/bin/python -m server
scripts/reachy_mini_env/bin/python -m server --tts-elevenlabs --tts-voice M4zkunnpRihDKTNF0D7f # Use ElevenLabs
Start the operating system's client (default port 8765).
To use your own OpenAI-compatible endpoint for the agents, start the client with --local and optionally --endpoint (port, default 6000).
To use OpenAI, Groq, Anthropic, or OpenAI Codex subscription auth, start the client with --provider or a shortcut flag and optionally specify a model with --model.
scripts/reachy_mini_env/bin/python -m client # OpenAI (requires OPENAI_API_KEY)
scripts/reachy_mini_env/bin/python -m client --local # Local LLM at port 6000
scripts/reachy_mini_env/bin/python -m client --provider groq --model moonshotai/kimi-k2-instruct-0905 # Groq
scripts/reachy_mini_env/bin/python -m client --anthropic --model claude-sonnet-4-6 # Anthropic API
scripts/reachy_mini_env/bin/python -m client --openai --model gpt-5.2 # OpenAI API
scripts/reachy_mini_env/bin/python -m client --codex --model gpt-5.5 # ChatGPT Codex subscription auth
Now you can talk to the Reachy Mini directly.
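As a rough illustration of how the provider choices above could combine with the environment variables documented in this README, here is a hedged sketch. It is not the client's actual code: `LLMSettings` and `resolve_llm` are hypothetical names, the `/v1` suffix on the default local URL is an assumption based on the endpoint examples in this README, and passing a `--model` value through to a local server is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Per-provider model variables and defaults, copied from this README's environment table.
MODEL_DEFAULTS = {
    "openai": ("OPENAI_MODEL", "gpt-5.2"),
    "groq": ("GROQ_MODEL", "openai/gpt-oss-120b"),
    "anthropic": ("ANTHROPIC_MODEL", "claude-sonnet-4-6"),
    "openai-codex": ("OPENAI_CODEX_MODEL", "gpt-5.5"),
}


@dataclass
class LLMSettings:
    provider: str             # "local" or one of the MODEL_DEFAULTS keys
    model: Optional[str]      # None means "let the local server decide"
    base_url: Optional[str]   # only set for the local endpoint


def resolve_llm(env: Optional[Mapping[str, str]] = None,
                cli_model: Optional[str] = None) -> LLMSettings:
    """Pick provider, model, and base URL; a CLI model overrides the environment."""
    env = os.environ if env is None else env
    if env.get("LOCAL_LLM", "").strip().lower() in {"1", "true"}:
        # LOCAL_LLM_ENDPOINT (full base URL) takes precedence over LOCAL_LLM_PORT.
        port = env.get("LOCAL_LLM_PORT", "6000")
        base_url = env.get("LOCAL_LLM_ENDPOINT") or f"http://localhost:{port}/v1"
        return LLMSettings("local", cli_model, base_url)
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in MODEL_DEFAULTS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    var, default = MODEL_DEFAULTS[provider]
    return LLMSettings(provider, cli_model or env.get(var, default), None)
```

For example, `resolve_llm({"LLM_PROVIDER": "groq"})` yields the Groq default model, while setting `LOCAL_LLM=1` switches to the local endpoint regardless of `LLM_PROVIDER`.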
To chat via CLI instead of the robot:
scripts/reachy_mini_env/bin/python -m client.chat.client_cli
# Optional: --base-url http://localhost:8765 (or set RAG_AGENT_PORT)
Or, when the agent is running, visit http://localhost:8765/ in your browser (or the port you set with --port / RAG_AGENT_PORT).
All ports and the LLM source can be overridden by environment variables so scripts and deployed setups don't rely on CLI flags.
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | — | Required by default, or when using OpenAI (--openai, --provider openai, or LLM_PROVIDER=openai). OpenAI API key from https://platform.openai.com/api-keys; ChatGPT subscriptions do not count as API billing. |
| OPENAI_MODEL | gpt-5.2 | OpenAI model name when LLM_PROVIDER=openai (overridden by --model when using --openai or --provider openai). |
| GROQ_API_KEY | — | Required when using Groq. Groq API key from console.groq.com/keys. |
| LOCAL_LLM | — | Set to 1 or true to use local OpenAI-compatible endpoint. |
| LOCAL_LLM_PORT | 6000 | Port of local LLM when LOCAL_LLM is set. |
| LOCAL_LLM_ENDPOINT | — | Full base URL (e.g. https://localhost:6000/v1) overrides port. |
| GROQ_MODEL | openai/gpt-oss-120b | Groq model when not using local LLM. |
| LLM_PROVIDER | openai | Remote LLM provider when not using local LLM. One of openai, groq, anthropic, or openai-codex. Usually set via CLI (python -m client flags). |
| ANTHROPIC_API_KEY | — | Required when using Anthropic (--anthropic or LLM_PROVIDER=anthropic). Anthropic API key from https://console.anthropic.com. |
| ANTHROPIC_MODEL | claude-sonnet-4-6 | Anthropic model name when LLM_PROVIDER=anthropic (overridden by --model when using --anthropic). |
| OPENAI_CODEX_MODEL | gpt-5.5 | OpenAI Codex model name when LLM_PROVIDER=openai-codex (overridden by --model when using --codex or --provider openai-codex). |
| OPENAI_CODEX_AUTH_FILE | ~/.openai-codex-client/auth.json | Optional auth file for the sibling OpenAI Codex client adapter. If absent, the adapter can fall back to Pi's ~/.pi/agent/auth.json. |
| OPENAI_CODEX_ORIGINATOR | rosaos | Originator header for OpenAI Codex subscription requests. |
| RAG_AGENT_PORT | 8765 | Client app (kernel + chat) port. |
| RAG_AGENT_URL | — | Full base URL for chat CLI (e.g. http://localhost:8765). |
| PROCESS_SERVER_PORT | 7001 | Process manager MCP server port. |
| PROCESS_SERVER_URL | — | Full process server URL (e.g. http://localhost:7001/mcp). |
| REACHY_MCP_PORT | 5001 | Reachy Mini MCP server port (when starting python -m server). |
| STT_CALLBACK_URL | from RAG_AGENT_PORT | Where the server POSTs transcribed speech (default http://localhost:{RAG_AGENT_PORT}/stt). |
| STT_WAKE_WORD | hello | Wake word used when eye contact is absent. After the wake word, Reachy turns toward the detected audio direction and then listens for the command. |
| STT_WAKE_WORD_ALIASES | hello,helo,hallo,hullo | Comma-separated wake-word transcription variants accepted while listening for activation. |
| STT_SILENCE_THRESHOLD_SEC | 0.55 | Silent audio duration before an utterance is considered complete. Increase if speech gets cut off; decrease for snappier turn-taking. |
| STT_VAD_CHUNK_DURATION | 0.12 | Audio chunk size used by voice activity detection. Smaller values respond sooner with slightly more CPU overhead. |
| STT_MIN_SPEECH_DURATION_SEC | 0.35 | Minimum accepted command speech duration, used to ignore noise. |
| STT_MIN_WAKE_SPEECH_DURATION_SEC | 0.25 | Minimum accepted wake-check speech duration, kept lower so short wake words can activate. |
| STT_MIN_SPEECH_CHUNKS | 3 | Minimum number of speech-positive chunks before command audio can be transcribed. |
| STT_MIN_WAKE_SPEECH_CHUNKS | 2 | Minimum number of speech-positive chunks before wake-check audio can be transcribed. |
| STT_PRE_SPEECH_BUFFER_SEC | 0.6 | Audio kept before speech detection starts, to preserve the first syllables of an utterance. |
| STT_SIMPLE_RMS_THRESHOLD | 0.035 | Energy threshold used as a fallback/safety net when neural VAD misses short speech. Raise this if background noise is detected as speech. |
| STT_MIN_TRANSCRIBE_RMS | 0.01 | Minimum full-utterance RMS before a transcript is allowed to be posted. |
| EYE_CONTACT_POLL_INTERVAL | 0.16 | Seconds between eye-contact camera checks while waiting for activation. |
| AGENT_RETRIES | 3 | Pydantic-AI retry count for kernel and worker agents. Higher values can hide transient failures but feel slower when a provider is unhealthy. |
| TTS_ENGINE | groq | TTS backend: groq or elevenlabs. |
| TTS_VOICE | autumn | Preferred TTS voice name / ID (used for Groq Orpheus and ElevenLabs). |
| ELEVENLABS_API_KEY | — | ElevenLabs API key when using TTS_ENGINE=elevenlabs or --tts-elevenlabs. |
| ELEVENLABS_VOICE_ID | from TTS_VOICE | Optional explicit ElevenLabs voice ID. |
| ELEVENLABS_MODEL | eleven_flash_v2_5 | ElevenLabs TTS model ID. |
| ROSAOS_CONFIG_DIR | config | Directory for drivers.json, kernel.txt, process.txt, and prompts/. |
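The STT timing variables in the table interact, and it can help to see one plausible way they fit together when tuning them. The sketch below is an assumption about the logic, not the server's actual implementation: it takes a sequence of per-chunk voice-activity decisions and reports when an utterance would be considered complete and whether it would pass the minimum-speech checks. `Endpointing` and `end_of_utterance` are hypothetical names; the defaults mirror the table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Endpointing:
    chunk_sec: float = 0.12        # STT_VAD_CHUNK_DURATION
    silence_sec: float = 0.55      # STT_SILENCE_THRESHOLD_SEC
    min_speech_sec: float = 0.35   # STT_MIN_SPEECH_DURATION_SEC
    min_speech_chunks: int = 3     # STT_MIN_SPEECH_CHUNKS


def end_of_utterance(flags: Iterable[bool],
                     cfg: Endpointing = Endpointing()) -> Optional[tuple[int, bool]]:
    """Scan per-chunk VAD decisions (True = speech detected in that chunk).

    Returns (index of the chunk that completed the utterance, accepted),
    or None if no utterance has completed yet.
    """
    speech_chunks = 0
    silent_run = 0
    for i, is_speech in enumerate(flags):
        if is_speech:
            speech_chunks += 1
            silent_run = 0
        elif speech_chunks:  # only count silence once speech has started
            silent_run += 1
            if silent_run * cfg.chunk_sec >= cfg.silence_sec:
                accepted = (
                    speech_chunks >= cfg.min_speech_chunks
                    and speech_chunks * cfg.chunk_sec >= cfg.min_speech_sec
                )
                return i, accepted
    return None
```

With the defaults, five consecutive silent chunks (0.6 s) are needed to close an utterance, since four (0.48 s) fall short of the 0.55 s threshold. This is why a larger STT_VAD_CHUNK_DURATION makes the effective silence window coarser.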
Agent system prompts and robot config live under the config directory (or ROSAOS_CONFIG_DIR):
- config/kernel.txt: System prompt for the kernel agent (one placeholder: {robot_list}).
- config/process.txt: System prompt template for process agents (placeholders: {robot_instructions}, {kernel_instructions}).
- config/drivers.json: MCP server names, URLs, and descriptions. If you change REACHY_MCP_PORT, update the reachy-mini URL in this file to match (e.g. http://localhost:5001/mcp).
- config/prompts/<server_name>.txt: Per-robot instructions for the LLM (e.g. reachy-mini.txt).
Edit these files to customize behavior without changing code.
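To make the placeholder mechanics concrete, here is a small sketch of how the kernel prompt could be filled in from the driver list. The JSON layout of drivers.json is not specified in this README, so the mapping used here (`{"<server_name>": {"url": ..., "description": ...}}`) is a hypothetical schema, and `render_kernel_prompt` / `load_kernel_prompt` are illustrative names rather than rosaOS functions.

```python
import json
import os
from pathlib import Path


def render_kernel_prompt(template: str, drivers: dict) -> str:
    """Fill the {robot_list} placeholder from a driver mapping.

    Hypothetical schema: {"<server_name>": {"url": "...", "description": "..."}}
    """
    robot_list = "\n".join(
        f"- {name}: {info['description']} ({info['url']})"
        for name, info in drivers.items()
    )
    # str.format treats every {...} as a placeholder, so literal braces in the
    # template would need to be doubled ({{ and }}).
    return template.format(robot_list=robot_list)


def load_kernel_prompt() -> str:
    """Read kernel.txt and drivers.json from ROSAOS_CONFIG_DIR (default: config)."""
    config_dir = Path(os.environ.get("ROSAOS_CONFIG_DIR", "config"))
    template = (config_dir / "kernel.txt").read_text(encoding="utf-8")
    drivers = json.loads((config_dir / "drivers.json").read_text(encoding="utf-8"))
    return render_kernel_prompt(template, drivers)
```

Check the real drivers.json in the repo before relying on this layout; only the file names and the {robot_list} placeholder are taken from the description above.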
Debug MCP servers using the MCP Inspector Tool (requires Node installation):
npx @modelcontextprotocol/inspector
rosaOS is structured like a minimal operating system: a kernel schedules and supervises processes (LLM workers) that perform tasks, while a device layer (MCP server) exposes hardware (Reachy Mini) as callable tools. The LLM is the “CPU” that executes kernel and process logic.
| Layer | Component | OS analogy | Role |
|---|---|---|---|
| User / shell | Reachy Mini, or the browser UI or CLI for chatting directly | Shell / terminal | Sends prompts and receives responses; polls for event-driven updates. |
| Kernel | Client event worker + Pydantic-AI “kernel” agent | OS kernel / scheduler | Single thread consumes an event queue (speech, worker callbacks, chat messages). Decides when to launch processes (workers) via the process server; does not drive the robot directly. |
| Process manager | Internal MCP server for kernel | Syscall interface / fork | Exposes process management tools to kernel. Spawns worker subprocesses (python -m client.worker) so each agent has its own event loop and does not block the kernel. |
| Processes | Agent worker subprocesses | User processes | Each runs a Pydantic-AI agent with MCP robot tools. Executes one task from a system prompt generated by kernel, then POSTs a completion callback to the client /event. |
| Device layer | Reachy MCP server; additional robot MCP servers can optionally be connected | Drivers / HAL | FastMCP server with lifespan owning the ReachyMini connection. Registers tools: goto_target, take_picture, speak, play_emotion, describe_image, etc. Runs a background STT loop: mic → VAD → transcribe → POST to client /stt, like a system process for the UI. |
| Hardware | Reachy Mini + other robots | Physical devices | Robot daemon and hardware; MCP server talks to Reachy via reachy_mini SDK and other robots through ROS. |
- User input → Speech via Reachy mic is transcribed by the server’s STT loop and POSTed to client /stt; or text is sent via CLI or the UI.
- Kernel receives an event ([User said] ... or [Worker callback] ...). It runs the kernel agent (LLM) with tools from the process server, typically calling launch_process(system_prompt) to start a worker.
- Process manager starts a worker subprocess with WORKER_ID, WORKER_SYSTEM_PROMPT, and CALLBACK_URL (client /event).
- Worker runs the process agent (LLM) with tools from the Reachy MCP server: move, see, speak, etc. When done, it POSTs { worker_id, message, done } to /event.
- Kernel gets a [Worker callback] event and can respond to the user (e.g. via another launched process that uses speak) or launch further work. Primary communication to the user is through Reachy speaking; outgoing messages are also pushed to /updates for the UI/CLI to poll.
So: kernel = one agent that only launches processes; processes = short-lived agents that use the robot and report back via callbacks.
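The event flow can be sketched as a toy, in-process simulation. Nothing here talks to an LLM, a subprocess, or HTTP: the `Kernel` class, its fixed decision rule, and the fake worker are all stand-ins invented for illustration, and only the event prefixes and the `{ worker_id, message, done }` callback shape come from the description above.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Kernel:
    events: queue.Queue = field(default_factory=queue.Queue)  # the kernel's event queue
    updates: list[str] = field(default_factory=list)          # stands in for /updates
    next_worker_id: int = 1

    def launch_process(self, system_prompt: str) -> int:
        """Stand-in for the process server tool: a fake worker that finishes at once."""
        worker_id = self.next_worker_id
        self.next_worker_id += 1
        # Shape of the completion callback a real worker would POST to /event.
        callback = {"worker_id": worker_id, "message": f"finished: {system_prompt}", "done": True}
        self.events.put(f"[Worker callback] {callback['message']}")
        return worker_id

    def handle(self, event: str) -> None:
        # A real kernel would ask the LLM what to do; this stub uses a fixed rule.
        if event.startswith("[User said]"):
            self.launch_process(event.removeprefix("[User said]").strip())
        elif event.startswith("[Worker callback]"):
            self.updates.append(event.removeprefix("[Worker callback]").strip())

    def run_until_idle(self) -> None:
        """Consume events one at a time, as a single-threaded kernel loop would."""
        while not self.events.empty():
            self.handle(self.events.get())
```

Putting `"[User said] wave hello"` on the queue and calling `run_until_idle()` launches one fake worker, whose callback is then consumed as a second event. That round trip is the same one the real kernel and workers make over /stt and /event.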
See docs/architecture.md for a diagram (Mermaid) of the same layout.