castorini/rosaos

rosaOS

ROS Agentic Operating System: Control robots with LLMs through MCP with Reachy Mini as the interface.

Requirements

rosaOS uses the Reachy Mini Lite, which makes media streaming easy.

The client supports a local OpenAI-compatible LLM (e.g. vLLM), the OpenAI API, the Groq API, the Anthropic API, or OpenAI Codex subscription auth. Choose one via CLI or environment variables.

Local LLM (OpenAI-compatible endpoint)

For local inference, run an OpenAI-compatible server (e.g. vLLM) and point the client at it:

# On the machine with the GPU (or with port forwarding):
vllm serve openai/gpt-oss-120b --tool-call-parser openai --enable-auto-tool-choice --port 6000

To verify the endpoint: curl http://localhost:6000/v1/models (or use https if your server uses TLS).
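
The same check can be scripted. This is a minimal sketch, assuming the server exposes the OpenAI-compatible /v1/models route used in the curl command above and returns the usual {"data": [{"id": ...}]} shape. The helper names models_url and list_models are illustrative and not part of rosaOS.

```python
import json
import urllib.request


def models_url(port: int = 6000, host: str = "localhost", tls: bool = False) -> str:
    """Build the /v1/models URL for an OpenAI-compatible server."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}/v1/models"


def list_models(url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs the server reports (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"object": "list", "data": [{"id": "..."}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]
```

With the vLLM server running, list_models(models_url()) should include the model name passed to vllm serve.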

Groq API

Groq provides inference with a free tier (with limits). You must set an API key to use Groq:

  • macOS/Linux: export GROQ_API_KEY=your_key
  • Windows (PowerShell): $env:GROQ_API_KEY="your_key"

Get a key at console.groq.com/keys.

Supported Groq tool-use models include: llama-3.1-8b-instant, llama-3.3-70b-versatile, openai/gpt-oss-120b, openai/gpt-oss-20b, moonshotai/kimi-k2-instruct-0905, qwen/qwen3-32b, and meta-llama/llama-4-scout-17b-16e-instruct. Default is openai/gpt-oss-120b.

A Groq API key is required for image analysis and for a better TTS experience.

OpenAI API

OpenAI is the default hosted agent model provider:

  • macOS/Linux: export OPENAI_API_KEY=your_key
  • Windows (PowerShell): $env:OPENAI_API_KEY="your_key"

OpenAI API usage is billed through the API platform, separately from ChatGPT Free/Plus/Pro subscriptions. A ChatGPT subscription does not provide API quota for rosaOS.

OpenAI Codex subscription auth

rosaOS can also use the sibling ../openai-subscription-wrapper package to talk to the ChatGPT Codex backend with ChatGPT Plus/Pro subscription OAuth credentials:

scripts/reachy_mini_env/bin/openai-codex-client login
# Or, if you already logged in with Pi:
scripts/reachy_mini_env/bin/openai-codex-client import-pi

Then start the client with --codex or --provider openai-codex.

Installation

Developed with Python 3.12.

Clone this repo with the --recursive flag so that all submodules (ros-mcp-server) are downloaded. Further instructions for setting up ros-mcp-server are in the rosaOS setup file found in the submodule directory.

git clone https://github.com/lilyjge/reachy-mcp.git --recursive
cd reachy-mcp
uv venv --python 3.12 scripts/reachy_mini_env

# Install dependencies
uv pip install -p scripts/reachy_mini_env/bin/python -r requirements.txt

For fresh macOS + Reachy Mini Lite setup details, including camera permissions and voice/STT keys, see docs/macos-reachy-mini-setup.md.

Usage

Quick Start (All Services)

Start all services at once:

./scripts/start_all.sh

This will start:

  • Reachy Mini daemon (port 8000)
  • MCP server (port 5001)
  • Process manager MCP server (port 7001)
  • RAG agent (port 8765)

Logs are saved to the scripts/logs/ directory. To stop all services:

./scripts/stop_all.sh

Manual Start (Individual Services)

Alternatively, start each service manually:

Start Reachy Mini's robot daemon server on the default port 8000:

scripts/reachy_mini_env/bin/reachy-mini-daemon

Start the Reachy Mini MCP server on port 5001. For TTS, the server supports the ElevenLabs API, the Groq API, or the local pyttsx3 package.

scripts/reachy_mini_env/bin/python -m server
scripts/reachy_mini_env/bin/python -m server --tts-elevenlabs --tts-voice M4zkunnpRihDKTNF0D7f # Use ElevenLabs

Start the operating system's client (default port 8765). To use your own OpenAI-compatible endpoint for the agents, start the client with --local and optionally --endpoint (port, default 6000). To use OpenAI, Groq, Anthropic, or OpenAI Codex subscription auth, start the client with --provider or a shortcut flag, and optionally specify a model with --model.

scripts/reachy_mini_env/bin/python -m client                    # OpenAI (requires OPENAI_API_KEY)
scripts/reachy_mini_env/bin/python -m client --local             # Local LLM at port 6000
scripts/reachy_mini_env/bin/python -m client --provider groq --model moonshotai/kimi-k2-instruct-0905 # Groq
scripts/reachy_mini_env/bin/python -m client --anthropic --model claude-sonnet-4-6 # Anthropic API
scripts/reachy_mini_env/bin/python -m client --openai --model gpt-5.2 # OpenAI API
scripts/reachy_mini_env/bin/python -m client --codex --model gpt-5.5 # ChatGPT Codex subscription auth

Now you can talk to the Reachy Mini directly.

To chat via CLI instead of the robot:

scripts/reachy_mini_env/bin/python -m client.chat.client_cli
# Optional: --base-url http://localhost:8765  (or set RAG_AGENT_PORT)

Or, when the agent is running, visit http://localhost:8765/ in your browser (or the port you set with --port / RAG_AGENT_PORT).

Environment variables

All ports and the LLM source can be overridden by environment variables so scripts and deployed setups don't rely on CLI flags.

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | | Required by default, or when using OpenAI (--openai, --provider openai, or LLM_PROVIDER=openai). OpenAI API key from https://platform.openai.com/api-keys; ChatGPT subscriptions do not count as API billing. |
| OPENAI_MODEL | gpt-5.2 | OpenAI model name when LLM_PROVIDER=openai (overridden by --model when using --openai or --provider openai). |
| GROQ_API_KEY | | Required when using Groq. Groq API key from console.groq.com/keys. |
| LOCAL_LLM | | Set to 1 or true to use local OpenAI-compatible endpoint. |
| LOCAL_LLM_PORT | 6000 | Port of local LLM when LOCAL_LLM is set. |
| LOCAL_LLM_ENDPOINT | | Full base URL (e.g. https://localhost:6000/v1) overrides port. |
| GROQ_MODEL | openai/gpt-oss-120b | Groq model when not using local LLM. |
| LLM_PROVIDER | openai | Remote LLM provider when not using local LLM. One of openai, groq, anthropic, or openai-codex. Usually set via CLI (python -m client flags). |
| ANTHROPIC_API_KEY | | Required when using Anthropic (--anthropic or LLM_PROVIDER=anthropic). Anthropic API key from https://console.anthropic.com. |
| ANTHROPIC_MODEL | claude-sonnet-4-6 | Anthropic model name when LLM_PROVIDER=anthropic (overridden by --model when using --anthropic). |
| OPENAI_CODEX_MODEL | gpt-5.5 | OpenAI Codex model name when LLM_PROVIDER=openai-codex (overridden by --model when using --codex or --provider openai-codex). |
| OPENAI_CODEX_AUTH_FILE | ~/.openai-codex-client/auth.json | Optional auth file for the sibling OpenAI Codex client adapter. If absent, the adapter can fall back to Pi's ~/.pi/agent/auth.json. |
| OPENAI_CODEX_ORIGINATOR | rosaos | Originator header for OpenAI Codex subscription requests. |
| RAG_AGENT_PORT | 8765 | Client app (kernel + chat) port. |
| RAG_AGENT_URL | | Full base URL for chat CLI (e.g. http://localhost:8765). |
| PROCESS_SERVER_PORT | 7001 | Process manager MCP server port. |
| PROCESS_SERVER_URL | | Full process server URL (e.g. http://localhost:7001/mcp). |
| REACHY_MCP_PORT | 5001 | Reachy Mini MCP server port (when starting python -m server). |
| STT_CALLBACK_URL | from RAG_AGENT_PORT | Where the server POSTs transcribed speech (default http://localhost:{RAG_AGENT_PORT}/stt). |
| STT_WAKE_WORD | hello | Wake word used when eye contact is absent. After the wake word, Reachy turns toward the detected audio direction and then listens for the command. |
| STT_WAKE_WORD_ALIASES | hello,helo,hallo,hullo | Comma-separated wake-word transcription variants accepted while listening for activation. |
| STT_SILENCE_THRESHOLD_SEC | 0.55 | Silent audio duration before an utterance is considered complete. Increase if speech gets cut off; decrease for snappier turn-taking. |
| STT_VAD_CHUNK_DURATION | 0.12 | Audio chunk size used by voice activity detection. Smaller values respond sooner with slightly more CPU overhead. |
| STT_MIN_SPEECH_DURATION_SEC | 0.35 | Minimum accepted command speech duration, used to ignore noise. |
| STT_MIN_WAKE_SPEECH_DURATION_SEC | 0.25 | Minimum accepted wake-check speech duration, kept lower so short wake words can activate. |
| STT_MIN_SPEECH_CHUNKS | 3 | Minimum number of speech-positive chunks before command audio can be transcribed. |
| STT_MIN_WAKE_SPEECH_CHUNKS | 2 | Minimum number of speech-positive chunks before wake-check audio can be transcribed. |
| STT_PRE_SPEECH_BUFFER_SEC | 0.6 | Audio kept before speech detection starts, to preserve the first syllables of an utterance. |
| STT_SIMPLE_RMS_THRESHOLD | 0.035 | Energy threshold used as a fallback/safety net when neural VAD misses short speech. Raise this if background noise is detected as speech. |
| STT_MIN_TRANSCRIBE_RMS | 0.01 | Minimum full-utterance RMS before a transcript is allowed to be posted. |
| EYE_CONTACT_POLL_INTERVAL | 0.16 | Seconds between eye-contact camera checks while waiting for activation. |
| AGENT_RETRIES | 3 | Pydantic-AI retry count for kernel and worker agents. Higher values can hide transient failures but feel slower when a provider is unhealthy. |
| TTS_ENGINE | groq | TTS backend: groq or elevenlabs. |
| TTS_VOICE | autumn | Preferred TTS voice name / ID (used for Groq Orpheus and ElevenLabs). |
| ELEVENLABS_API_KEY | | ElevenLabs API key when using TTS_ENGINE=elevenlabs or --tts-elevenlabs. |
| ELEVENLABS_VOICE_ID | from TTS_VOICE | Optional explicit ElevenLabs voice ID. |
| ELEVENLABS_MODEL | eleven_flash_v2_5 | ElevenLabs TTS model ID. |
| ROSAOS_CONFIG_DIR | config | Directory for drivers.json, kernel.txt, process.txt, and prompts/. |
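
To illustrate how the LLM-related variables interact, here is a hedged sketch of one possible resolution order. It is inferred from the descriptions above (LOCAL_LLM takes priority, LOCAL_LLM_ENDPOINT overrides LOCAL_LLM_PORT, otherwise LLM_PROVIDER picks the model variable) and is not taken from the rosaOS source; resolve_llm is a hypothetical helper, the http://localhost:{port}/v1 URL form is an assumption, and CLI flags are ignored here.

```python
import os
from typing import Mapping

# Per-provider model variable and default, copied from the table above.
DEFAULT_MODELS = {
    "openai": ("OPENAI_MODEL", "gpt-5.2"),
    "groq": ("GROQ_MODEL", "openai/gpt-oss-120b"),
    "anthropic": ("ANTHROPIC_MODEL", "claude-sonnet-4-6"),
    "openai-codex": ("OPENAI_CODEX_MODEL", "gpt-5.5"),
}


def resolve_llm(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick the LLM source from environment variables (assumed precedence)."""
    if env.get("LOCAL_LLM", "").lower() in {"1", "true"}:
        # A full base URL overrides the port, as the table states.
        endpoint = env.get("LOCAL_LLM_ENDPOINT")
        if not endpoint:
            port = env.get("LOCAL_LLM_PORT", "6000")
            endpoint = f"http://localhost:{port}/v1"  # assumed URL form
        return {"source": "local", "endpoint": endpoint}

    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    model_var, default_model = DEFAULT_MODELS[provider]
    return {"source": provider, "model": env.get(model_var, default_model)}
```

Passing a plain dict instead of os.environ makes it easy to see what a given combination of variables would select before starting the client.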

Configuration

Agent system prompts and robot config live under the config directory (or ROSAOS_CONFIG_DIR):

  • config/kernel.txt — System prompt for the kernel agent (one placeholder: {robot_list}).
  • config/process.txt — System prompt template for process agents (placeholders: {robot_instructions}, {kernel_instructions}).
  • config/drivers.json — MCP server names, URLs, and descriptions. If you change REACHY_MCP_PORT, update the reachy-mini URL in this file to match (e.g. http://localhost:5001/mcp).
  • config/prompts/<server_name>.txt — Per-robot instructions for the LLM (e.g. reachy-mini.txt).

Edit these files to customize behavior without changing code.
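
As an illustration of the placeholders listed above, the sketch below fills them with Python's str.format. Whether rosaOS uses str.format internally is an assumption, as is the bullet-list formatting of the robot list; the function names are hypothetical.

```python
import os
from pathlib import Path


def config_dir() -> Path:
    """Resolve the config directory, honoring ROSAOS_CONFIG_DIR (default: config)."""
    return Path(os.environ.get("ROSAOS_CONFIG_DIR", "config"))


def render_kernel_prompt(template: str, robots: list[str]) -> str:
    """Fill the single {robot_list} placeholder of kernel.txt."""
    # One robot per line as a dash list: an assumed format, for illustration only.
    robot_list = "\n".join(f"- {name}" for name in robots)
    return template.format(robot_list=robot_list)


def render_process_prompt(
    template: str, robot_instructions: str, kernel_instructions: str
) -> str:
    """Fill the two placeholders of process.txt."""
    return template.format(
        robot_instructions=robot_instructions,
        kernel_instructions=kernel_instructions,
    )
```

If templates are filled this way, any literal braces in a prompt file (for example a JSON snippet) would need to be doubled as {{ and }}.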

Debugging

Debug MCP servers using the MCP Inspector tool (requires Node.js):

npx @modelcontextprotocol/inspector

Technical Details

rosaOS is structured like a minimal operating system: a kernel schedules and supervises processes (LLM workers) that perform tasks, while a device layer (MCP server) exposes hardware (Reachy Mini) as callable tools. The LLM is the “CPU” that executes kernel and process logic.

High-level architecture

| Layer | Component | OS analogy | Role |
| --- | --- | --- | --- |
| User / shell | Reachy Mini; or, to chat directly, the browser UI or CLI | Shell / terminal | Sends prompts and receives responses; polls for event-driven updates. |
| Kernel | Client event worker + Pydantic-AI “kernel” agent | OS kernel / scheduler | Single thread consumes an event queue (speech, worker callbacks, chat messages). Decides when to launch processes (workers) via the process server; does not drive the robot directly. |
| Process manager | Internal MCP server for the kernel | Syscall interface / fork | Exposes process management tools to the kernel. Spawns worker subprocesses (python -m client.worker) so each agent has its own event loop and does not block the kernel. |
| Processes | Agent worker subprocesses | User processes | Each runs a Pydantic-AI agent with MCP robot tools. Executes one task from a system prompt generated by the kernel, then POSTs a completion callback to the client /event. |
| Device layer | Reachy MCP server; additional robot MCP servers can optionally be connected | Drivers / HAL | FastMCP server with a lifespan that owns the ReachyMini connection. Registers tools: goto_target, take_picture, speak, play_emotion, describe_image, etc. Runs a background STT loop: mic → VAD → transcribe → POST to client /stt, like a system process for the UI. |
| Hardware | Reachy Mini + other robots | Physical devices | Robot daemon and hardware; the MCP server talks to Reachy via the reachy_mini SDK and to other robots through ROS. |

Data flow

  1. User input → Speech via Reachy mic is transcribed by the server’s STT loop and POSTed to client /stt; or text is sent via CLI or the UI.
  2. Kernel receives an event ([User said] ... or [Worker callback] ...). It runs the kernel agent (LLM) with tools from the process server, typically calling launch_process(system_prompt) to start a worker.
  3. Process manager starts a worker subprocess with WORKER_ID, WORKER_SYSTEM_PROMPT, and CALLBACK_URL (client /event).
  4. Worker runs the process agent (LLM) with tools from the Reachy MCP server: move, see, speak, etc. When done, it POSTs { worker_id, message, done } to /event.
  5. Kernel gets a [Worker callback] event and can respond to the user (e.g. via another launched process that uses speak) or launch further work. Primary communication to the user is through Reachy speaking; outgoing messages are also pushed to /updates for the UI/CLI to poll.

So: kernel = one agent that only launches processes; processes = short-lived agents that use the robot and report back via callbacks.
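
The callback payload and the event labels from the steps above can be modeled as follows. This is a sketch only: the field types (string worker_id, boolean done) and the helper names are assumptions, and the real client may build these strings differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WorkerCallback:
    """Payload a worker POSTs to /event when it finishes (field types assumed)."""

    worker_id: str
    message: str
    done: bool

    def to_json(self) -> str:
        """Serialize to the { worker_id, message, done } JSON body."""
        return json.dumps(asdict(self))


def to_kernel_event(kind: str, text: str) -> str:
    """Label an incoming message the way the kernel events above are written."""
    labels = {"speech": "[User said]", "callback": "[Worker callback]"}
    return f"{labels[kind]} {text}"
```

For example, a worker that has finished waving could send WorkerCallback("w1", "Waved at the user", True).to_json(), and the kernel would then see an event such as to_kernel_event("callback", "Waved at the user").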

Architecture diagram

See docs/architecture.md for a diagram (Mermaid) of the same layout.
