rosaOS

ROS Agentic Operating System: Control robots with LLMs through MCP with Reachy Mini as the interface.

Requirements

This project uses Reachy Mini Lite for easy media streaming.

The client supports a local OpenAI-compatible LLM (e.g. vLLM), the OpenAI API, the Groq API, the Anthropic API, or OpenAI Codex subscription auth. Choose one via CLI or environment variables.

Local LLM (OpenAI-compatible endpoint)

For local inference, run an OpenAI-compatible server (e.g. vLLM) and point the client at it:

# On the machine with the GPU (or with port forwarding):
vllm serve openai/gpt-oss-120b --tool-call-parser openai --enable-auto-tool-choice --port 6000

To verify the endpoint: curl http://localhost:6000/v1/models (or use https if your server uses TLS).
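The same check can be scripted. The following is a minimal sketch, assuming the server exposes the usual OpenAI-compatible `/v1/models` route and answers with a JSON object containing a `data` list. The helper names `models_url` and `list_models` are illustrative and are not part of rosaOS.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 6000, tls: bool = False) -> str:
    """Build the /v1/models URL for an OpenAI-compatible server."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}/v1/models"


def list_models(url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the server (raises on connection errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: the response follows the OpenAI list shape {"data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]
```

With the vLLM server from above running, `list_models(models_url())` should include the served model name.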

Groq API

Groq provides inference with a free tier (with limits). You must set an API key to use Groq:

macOS/Linux: export GROQ_API_KEY=your_key
Windows (PowerShell): $env:GROQ_API_KEY="your_key"

Get a key at console.groq.com/keys.

Supported Groq tool-use models include: llama-3.1-8b-instant, llama-3.3-70b-versatile, openai/gpt-oss-120b, openai/gpt-oss-20b, moonshotai/kimi-k2-instruct-0905, qwen/qwen3-32b, and meta-llama/llama-4-scout-17b-16e-instruct. Default is openai/gpt-oss-120b.

A Groq API key is required for image analysis and for the better TTS experience.

OpenAI API

OpenAI is the default hosted agent model provider:

macOS/Linux: export OPENAI_API_KEY=your_key
Windows (PowerShell): $env:OPENAI_API_KEY="your_key"

OpenAI API usage is billed through the API platform, separately from ChatGPT Free/Plus/Pro subscriptions. A ChatGPT subscription does not provide API quota for rosaOS.

OpenAI Codex subscription auth

rosaOS can also use the sibling ../openai-subscription-wrapper package to talk to the ChatGPT Codex backend with ChatGPT Plus/Pro subscription OAuth credentials:

scripts/reachy_mini_env/bin/openai-codex-client login
# Or, if you already logged in with Pi:
scripts/reachy_mini_env/bin/openai-codex-client import-pi

Then start the client with --codex or --provider openai-codex.

Installation

Developed with Python 3.12.

Clone this repo with the --recursive flag so that all submodules (ros-mcp-server) are downloaded. Further instructions for setting up ros-mcp-server are in the rosaOS setup file in the submodule directory.

git clone https://github.com/lilyjge/reachy-mcp.git --recursive
cd reachy-mcp
uv venv --python 3.12 scripts/reachy_mini_env

# Install dependencies
uv pip install -p scripts/reachy_mini_env/bin/python -r requirements.txt

For fresh macOS + Reachy Mini Lite setup details, including camera permissions and voice/STT keys, see docs/macos-reachy-mini-setup.md.

Usage

Quick Start (All Services)

Start all services at once:

./scripts/start_all.sh

This will start:

Reachy Mini daemon (port 8000)
MCP server (port 5001)
Process manager MCP server (port 7001)
RAG agent (port 8765)

Logs are saved to the scripts/logs/ directory. To stop all services:

./scripts/stop_all.sh
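To see which services are actually listening after a start or stop, a plain TCP probe is enough. The sketch below assumes the default ports listed above and that all services run on the local machine; `port_open` and `service_status` are illustrative helpers, not part of the scripts directory.

```python
import socket

# Default ports from the service list above.
SERVICES = {
    "Reachy Mini daemon": 8000,
    "MCP server": 5001,
    "Process manager MCP server": 7001,
    "RAG agent": 8765,
}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def service_status(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its default port is reachable."""
    return {name: port_open(port, host) for name, port in SERVICES.items()}
```

A service that was started on a non-default port (for example through RAG_AGENT_PORT) will show up as unreachable here, so adjust `SERVICES` to match your setup.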

Manual Start (Individual Services)

Alternatively, start each service manually:

Start Reachy Mini's robot daemon server on the default port 8000:

scripts/reachy_mini_env/bin/reachy-mini-daemon

Start the Reachy Mini MCP server on port 5001. For TTS, the server supports the ElevenLabs API, the Groq API, or the local pyttsx3 package.

scripts/reachy_mini_env/bin/python -m server
scripts/reachy_mini_env/bin/python -m server --tts-elevenlabs --tts-voice M4zkunnpRihDKTNF0D7f # Use ElevenLabs

Start the operating system's client (default port 8765). To use your own OpenAI-compatible endpoint for the agents, start the client with --local and optionally --endpoint (port, default 6000). To use OpenAI, Groq, Anthropic, or OpenAI Codex subscription auth, start the client with --provider or a shortcut flag, and optionally specify a model with --model.

scripts/reachy_mini_env/bin/python -m client                    # OpenAI (requires OPENAI_API_KEY)
scripts/reachy_mini_env/bin/python -m client --local             # Local LLM at port 6000
scripts/reachy_mini_env/bin/python -m client --provider groq --model moonshotai/kimi-k2-instruct-0905 # Groq
scripts/reachy_mini_env/bin/python -m client --anthropic --model claude-sonnet-4-6 # Anthropic API
scripts/reachy_mini_env/bin/python -m client --openai --model gpt-5.2 # OpenAI API
scripts/reachy_mini_env/bin/python -m client --codex --model gpt-5.5 # ChatGPT Codex subscription auth

Now you can talk to the Reachy Mini directly.

To chat via CLI instead of the robot:

scripts/reachy_mini_env/bin/python -m client.chat.client_cli
# Optional: --base-url http://localhost:8765  (or set RAG_AGENT_PORT)

Or, when the agent is running, visit http://localhost:8765/ in your browser (or the port you set with --port / RAG_AGENT_PORT).
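The way the client flags and environment variables combine can be sketched as follows. This is not the actual client code: it assumes CLI flags take priority over environment variables, that a `LOCAL_LLM_ENDPOINT` base URL wins over any port, and that the local base URL ends in `/v1`. The flag names and default models come from this README; `build_parser` and `resolve` are hypothetical.

```python
import argparse
from collections.abc import Mapping

# Per-provider model variable and default, as documented in this README.
MODEL_DEFAULTS = {
    "openai": ("OPENAI_MODEL", "gpt-5.2"),
    "groq": ("GROQ_MODEL", "openai/gpt-oss-120b"),
    "anthropic": ("ANTHROPIC_MODEL", "claude-sonnet-4-6"),
    "openai-codex": ("OPENAI_CODEX_MODEL", "gpt-5.5"),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="client")
    parser.add_argument("--local", action="store_true")
    parser.add_argument("--endpoint", type=int, default=None)  # local LLM port
    parser.add_argument("--provider", choices=sorted(MODEL_DEFAULTS))
    # Shortcut flags write into the same destination as --provider.
    parser.add_argument("--openai", dest="provider", action="store_const", const="openai")
    parser.add_argument("--anthropic", dest="provider", action="store_const", const="anthropic")
    parser.add_argument("--codex", dest="provider", action="store_const", const="openai-codex")
    parser.add_argument("--model")
    return parser


def resolve(argv: list[str], env: Mapping[str, str]) -> dict[str, str]:
    """Resolve the LLM source from CLI arguments first, then the environment."""
    args = build_parser().parse_args(argv)
    local = args.local or env.get("LOCAL_LLM", "").lower() in {"1", "true"}
    if local:
        port = args.endpoint or int(env.get("LOCAL_LLM_PORT", "6000"))
        # Assumption: a full base URL takes priority over any port setting.
        base_url = env.get("LOCAL_LLM_ENDPOINT") or f"http://localhost:{port}/v1"
        return {"provider": "local", "base_url": base_url}
    provider = args.provider or env.get("LLM_PROVIDER", "openai")
    env_var, default_model = MODEL_DEFAULTS[provider]
    return {"provider": provider, "model": args.model or env.get(env_var, default_model)}
```

For example, `resolve(["--codex"], {})` yields the `openai-codex` provider with its documented default model, while `resolve(["--local"], {})` points at port 6000 on localhost.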

Environment variables

All ports and the LLM source can be overridden by environment variables so scripts and deployed setups don't rely on CLI flags.

Variable	Default	Description
`OPENAI_API_KEY`	—	Required by default, or when using OpenAI (`--openai`, `--provider openai`, or `LLM_PROVIDER=openai`). OpenAI API key from `https://platform.openai.com/api-keys`; ChatGPT subscriptions do not count as API billing.
`OPENAI_MODEL`	`gpt-5.2`	OpenAI model name when `LLM_PROVIDER=openai` (overridden by `--model` when using `--openai` or `--provider openai`).
`GROQ_API_KEY`	—	Required when using Groq. Groq API key from console.groq.com/keys.
`LOCAL_LLM`	—	Set to `1` or `true` to use local OpenAI-compatible endpoint.
`LOCAL_LLM_PORT`	`6000`	Port of local LLM when `LOCAL_LLM` is set.
`LOCAL_LLM_ENDPOINT`	—	Full base URL (e.g. `https://localhost:6000/v1`) overrides port.
`GROQ_MODEL`	`openai/gpt-oss-120b`	Groq model when not using local LLM.
`LLM_PROVIDER`	`openai`	Remote LLM provider when not using local LLM. One of `openai`, `groq`, `anthropic`, or `openai-codex`. Usually set via CLI (`python -m client` flags).
`ANTHROPIC_API_KEY`	—	Required when using Anthropic (`--anthropic` or `LLM_PROVIDER=anthropic`). Anthropic API key from `https://console.anthropic.com`.
`ANTHROPIC_MODEL`	`claude-sonnet-4-6`	Anthropic model name when `LLM_PROVIDER=anthropic` (overridden by `--model` when using `--anthropic`).
`OPENAI_CODEX_MODEL`	`gpt-5.5`	OpenAI Codex model name when `LLM_PROVIDER=openai-codex` (overridden by `--model` when using `--codex` or `--provider openai-codex`).
`OPENAI_CODEX_AUTH_FILE`	`~/.openai-codex-client/auth.json`	Optional auth file for the sibling OpenAI Codex client adapter. If absent, the adapter can fall back to Pi's `~/.pi/agent/auth.json`.
`OPENAI_CODEX_ORIGINATOR`	`rosaos`	Originator header for OpenAI Codex subscription requests.
`RAG_AGENT_PORT`	`8765`	Client app (kernel + chat) port.
`RAG_AGENT_URL`	—	Full base URL for chat CLI (e.g. `http://localhost:8765`).
`PROCESS_SERVER_PORT`	`7001`	Process manager MCP server port.
`PROCESS_SERVER_URL`	—	Full process server URL (e.g. `http://localhost:7001/mcp`).
`REACHY_MCP_PORT`	`5001`	Reachy Mini MCP server port (when starting `python -m server`).
`STT_CALLBACK_URL`	from `RAG_AGENT_PORT`	Where the server POSTs transcribed speech (default `http://localhost:{RAG_AGENT_PORT}/stt`).
`STT_WAKE_WORD`	`hello`	Wake word used when eye contact is absent. After the wake word, Reachy turns toward the detected audio direction and then listens for the command.
`STT_WAKE_WORD_ALIASES`	`hello,helo,hallo,hullo`	Comma-separated wake-word transcription variants accepted while listening for activation.
`STT_SILENCE_THRESHOLD_SEC`	`0.55`	Silent audio duration before an utterance is considered complete. Increase if speech gets cut off; decrease for snappier turn-taking.
`STT_VAD_CHUNK_DURATION`	`0.12`	Audio chunk size used by voice activity detection. Smaller values respond sooner with slightly more CPU overhead.
`STT_MIN_SPEECH_DURATION_SEC`	`0.35`	Minimum accepted command speech duration, used to ignore noise.
`STT_MIN_WAKE_SPEECH_DURATION_SEC`	`0.25`	Minimum accepted wake-check speech duration, kept lower so short wake words can activate.
`STT_MIN_SPEECH_CHUNKS`	`3`	Minimum number of speech-positive chunks before command audio can be transcribed.
`STT_MIN_WAKE_SPEECH_CHUNKS`	`2`	Minimum number of speech-positive chunks before wake-check audio can be transcribed.
`STT_PRE_SPEECH_BUFFER_SEC`	`0.6`	Audio kept before speech detection starts, to preserve the first syllables of an utterance.
`STT_SIMPLE_RMS_THRESHOLD`	`0.035`	Energy threshold used as a fallback/safety net when neural VAD misses short speech. Raise this if background noise is detected as speech.
`STT_MIN_TRANSCRIBE_RMS`	`0.01`	Minimum full-utterance RMS before a transcript is allowed to be posted.
`EYE_CONTACT_POLL_INTERVAL`	`0.16`	Seconds between eye-contact camera checks while waiting for activation.
`AGENT_RETRIES`	`3`	Pydantic-AI retry count for kernel and worker agents. Higher values can hide transient failures but feel slower when a provider is unhealthy.
`TTS_ENGINE`	`groq`	TTS backend: `groq` or `elevenlabs`.
`TTS_VOICE`	`autumn`	Preferred TTS voice name / ID (used for Groq Orpheus and ElevenLabs).
`ELEVENLABS_API_KEY`	—	ElevenLabs API key when using `TTS_ENGINE=elevenlabs` or `--tts-elevenlabs`.
`ELEVENLABS_VOICE_ID`	from `TTS_VOICE`	Optional explicit ElevenLabs voice ID.
`ELEVENLABS_MODEL`	`eleven_flash_v2_5`	ElevenLabs TTS model ID.
`ROSAOS_CONFIG_DIR`	`config`	Directory for `drivers.json`, `kernel.txt`, `process.txt`, and `prompts/`.
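The STT timing values interact through the VAD chunk size. As a rough worked example, assuming silence is counted in whole chunks (the real STT loop may measure it differently), the defaults of 0.55 s silence and 0.12 s chunks mean an utterance ends after 5 consecutive silent chunks. The RMS helpers below show the kind of energy check that `STT_SIMPLE_RMS_THRESHOLD` and `STT_MIN_TRANSCRIBE_RMS` describe; all function names are illustrative.

```python
import math


def silent_chunks_needed(silence_threshold_sec: float = 0.55,
                         chunk_duration: float = 0.12) -> int:
    """Whole chunks of silence required before an utterance counts as finished."""
    return math.ceil(silence_threshold_sec / chunk_duration)


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of audio samples normalized to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_enough(samples: list[float], threshold: float = 0.01) -> bool:
    """Compare utterance energy against a minimum RMS such as STT_MIN_TRANSCRIBE_RMS."""
    return rms(samples) >= threshold
```

Under the same assumption, lowering `STT_VAD_CHUNK_DURATION` makes the end-of-utterance decision finer grained, which matches the "responds sooner" note in the table.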

Configuration

Agent system prompts and robot config live under the config directory (or ROSAOS_CONFIG_DIR):

config/kernel.txt — System prompt for the kernel agent (one placeholder: {robot_list}).
config/process.txt — System prompt template for process agents (placeholders: {robot_instructions}, {kernel_instructions}).
config/drivers.json — MCP server names, URLs, and descriptions. If you change REACHY_MCP_PORT, update the reachy-mini URL in this file to match (e.g. http://localhost:5001/mcp).
config/prompts/<server_name>.txt — Per-robot instructions for the LLM (e.g. reachy-mini.txt).

Edit these files to customize behavior without changing code.

Debugging

Debug MCP servers using the MCP Inspector Tool (requires Node installation):

npx @modelcontextprotocol/inspector

Technical Details

rosaOS is structured like a minimal operating system: a kernel schedules and supervises processes (LLM workers) that perform tasks, while a device layer (MCP server) exposes hardware (Reachy Mini) as callable tools. The LLM is the “CPU” that executes kernel and process logic.

High-level architecture

Layer	Component	OS analogy	Role
User / shell	Reachy Mini, or the browser UI or CLI for direct chat	Shell / terminal	Sends prompts and receives responses; polls for event-driven updates.
Kernel	Client event worker + Pydantic-AI “kernel” agent	OS kernel / scheduler	Single thread consumes an event queue (speech, worker callbacks, chat messages). Decides when to launch processes (workers) via the process server; does not drive the robot directly.
Process manager	Internal MCP server for the kernel	Syscall interface / `fork`	Exposes process management tools to the kernel. Spawns worker subprocesses (`python -m client.worker`) so each agent has its own event loop and does not block the kernel.
Processes	Agent worker subprocesses	User processes	Each runs a Pydantic-AI agent with MCP robot tools. Executes one task from a system prompt generated by the kernel, then POSTs a completion callback to the client `/event`.
Device layer	Reachy MCP server; additional robot MCP servers can optionally be connected	Drivers / HAL	FastMCP server with lifespan owning the ReachyMini connection. Registers tools: `goto_target`, `take_picture`, `speak`, `play_emotion`, `describe_image`, etc. Runs a background STT loop: mic → VAD → transcribe → POST to client `/stt`, like a system process for the UI.
Hardware	Reachy Mini + other robots	Physical devices	Robot daemon and hardware; MCP server talks to Reachy via `reachy_mini` SDK and other robots through ROS.
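The kernel row of the table describes a single consumer draining an event queue and delegating work. A toy version of that pattern, with no LLM involved, might look like the sketch below. The event prefixes match the ones used in this README; `kernel_loop` and the `launch_process` callable are stand-ins for the real kernel agent and the process server tool, and the `None` sentinel is an assumption made to let the example terminate.

```python
import queue
from collections.abc import Callable


def kernel_loop(events: "queue.Queue[str | None]",
                launch_process: Callable[[str], None],
                handled: list[str]) -> None:
    """Consume events one at a time until a None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        handled.append(event)
        if event.startswith("[User said]"):
            # The kernel never drives the robot itself; it delegates to a worker.
            launch_process(f"Handle this request: {event}")


def demo() -> list[str]:
    """Run one user request through the loop with a fake worker."""
    events: "queue.Queue[str | None]" = queue.Queue()
    handled: list[str] = []

    def fake_launch(system_prompt: str) -> None:
        # A real worker would run in a subprocess and POST its result to /event.
        events.put("[Worker callback] task finished")
        events.put(None)  # stop the demo after the callback is handled

    events.put("[User said] wave hello")
    kernel_loop(events, fake_launch, handled)
    return handled
```

Because only one consumer touches the queue, events are handled strictly in arrival order, which is the property the table attributes to the kernel thread.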

Data flow

User input → Speech via Reachy mic is transcribed by the server’s STT loop and POSTed to client /stt; or text is sent via CLI or the UI.
Kernel receives an event ([User said] ... or [Worker callback] ...). It runs the kernel agent (LLM) with tools from the process server, typically calling launch_process(system_prompt) to start a worker.
Process manager starts a worker subprocess with WORKER_ID, WORKER_SYSTEM_PROMPT, and CALLBACK_URL (client /event).
Worker runs the process agent (LLM) with tools from the Reachy MCP server: move, see, speak, etc. When done, it POSTs { worker_id, message, done } to /event.
Kernel gets a [Worker callback] event and can respond to the user (e.g. via another launched process that uses speak) or launch further work. Primary communication to the user is through Reachy speaking; outgoing messages are also pushed to /updates for the UI/CLI to poll.

So: kernel = one agent that only launches processes; processes = short-lived agents that use the robot and report back via callbacks.
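The completion callback in the flow above can be illustrated with the standard library. This sketch assumes the payload is sent as a JSON body and that `worker_id` is a string; only the three field names and the `/event` path come from this README, and `build_callback` is a hypothetical helper.

```python
import json
import urllib.request


def build_callback(callback_url: str, worker_id: str, message: str,
                   done: bool = True) -> urllib.request.Request:
    """Prepare the POST a worker sends to the client when its task ends."""
    body = json.dumps(
        {"worker_id": worker_id, "message": message, "done": done}
    ).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )
```

With the client running on its default port, the request could be sent with `urllib.request.urlopen(build_callback("http://localhost:8765/event", "worker-1", "Waved at the user"))`.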

Architecture diagram

See docs/architecture.md for a diagram (Mermaid) of the same layout.
