Commit d9145a6

AmanSwar and claude committed
docs: update README — add benchmarks, remove architecture, refresh action count
- Add MetalRT benchmark charts (decode speed + RTF comparison) with blog links
- Add rcli metalrt and rcli llamacpp to Quick Start and CLI Reference
- Update action count from 43 to 38
- Remove benchmark commands (rcli bench, B key shortcut)
- Remove architecture section
- Highlight M3+ requirement with IMPORTANT callout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 19b2e54 commit d9145a6

1 file changed

Lines changed: 32 additions & 49 deletions

README.md

```diff
@@ -9,7 +9,7 @@
 <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT"></a>
 </p>
 
-**RCLI** is an on-device voice AI for macOS. A complete STT + LLM + TTS pipeline running natively on Apple Silicon — 43 macOS actions via voice, local RAG over your documents, sub-200ms end-to-end latency. No cloud, no API keys.
+**RCLI** is an on-device voice AI for macOS. A complete STT + LLM + TTS pipeline running natively on Apple Silicon — 38 macOS actions via voice, local RAG over your documents, sub-200ms end-to-end latency. No cloud, no API keys.
 
 Powered by [MetalRT](#metalrt-gpu-engine), a proprietary GPU inference engine built by [RunAnywhere, Inc.](https://runanywhere.ai) specifically for Apple Silicon.
 
```

```diff
@@ -29,7 +29,7 @@ Powered by [MetalRT](#metalrt-gpu-engine), a proprietary GPU inference engine bu
 </td>
 <td width="50%" align="center">
 <strong>App Control</strong><br>
-<em>Control Spotify, adjust volume — 43 macOS actions by voice.</em><br><br>
+<em>Control Spotify, adjust volume — 38 macOS actions by voice.</em><br><br>
 <a href="https://youtu.be/eTYwkgNoaKg">
 <img src="assets/demos/demo2-spotify-volume.gif" alt="App Control Demo" width="100%">
 </a>
```

```diff
@@ -38,8 +38,8 @@ Powered by [MetalRT](#metalrt-gpu-engine), a proprietary GPU inference engine bu
 </tr>
 <tr>
 <td width="50%" align="center">
-<strong>Models & Benchmarks</strong><br>
-<em>Browse models, hot-swap LLMs, run benchmarks — all from the TUI.</em><br><br>
+<strong>Models</strong><br>
+<em>Browse models, hot-swap LLMs — all from the TUI.</em><br><br>
 <a href="https://youtu.be/HD1aS37zIGE">
 <img src="assets/demos/demo3-benchmarks.gif" alt="Models & Benchmarks Demo" width="100%">
 </a>
```

```diff
@@ -58,7 +58,8 @@ Powered by [MetalRT](#metalrt-gpu-engine), a proprietary GPU inference engine bu
 
 ## Install
 
-> **Requires macOS 13+ on Apple Silicon (M1 or later).**
+> [IMPORTANT]
+> **Requires macOS 13+ on Apple Silicon. MetalRT engine requires M3 or later.** M1/M2 Macs fall back to llama.cpp automatically.
 
 **One command:**
 
```

````diff
@@ -73,16 +74,37 @@ brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
 brew install rcli
 rcli setup
 ```
-
 ## Quick Start
 
 ```bash
 rcli # interactive TUI (push-to-talk + text)
 rcli listen # continuous voice mode
 rcli ask "open Safari" # one-shot command
 rcli ask "play some jazz on Spotify"
+rcli metalrt # MetalRT GPU engine management
+rcli llamacpp # llama.cpp engine management
 ```
 
+
+## Benchmarks
+
+<p align="center">
+<img src="assets/decode-vs-llamacpp.webp" alt="MetalRT vs llama.cpp decode speed" width="700" />
+<br>
+<em>MetalRT decode throughput vs llama.cpp and Apple MLX on Apple M3 Max</em>
+</p>
+
+<p align="center">
+<img src="assets/rtf_comparison.webp" alt="STT and TTS real-time factor comparison" width="700" />
+<br>
+<em>STT and TTS real-time factor — lower is better. MetalRT STT is 714x faster than real-time.</em>
+</p>
+
+For More info :
+- https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-engine-apple-silicon
+- https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-tts-apple-silicon
+- https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai-pipeline-apple-silicon
+
 ## Features
 
 ### Voice Pipeline
````

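The RTF chart added in this hunk notes that lower is better. Assuming the conventional definition of real-time factor (processing time divided by the duration of the audio processed; the diff itself does not define it), the "714x faster than real-time" caption works out as:

$$
\mathrm{RTF} = \frac{T_{\text{process}}}{T_{\text{audio}}}, \qquad
\mathrm{RTF}_{\text{STT}} = \frac{1}{714} \approx 0.0014
$$

So, under that assumption, transcribing 10 s of audio would take roughly 10 / 714 ≈ 0.014 s (about 14 ms) of compute. Any RTF below 1 means faster than real time.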
```diff
@@ -96,7 +118,7 @@ A full STT + LLM + TTS pipeline running on Metal GPU with three concurrent threa
 - **Tool Calling** — LLM-native tool call formats (Qwen3, LFM2, etc.)
 - **Multi-turn Memory** — Sliding window conversation history with token-budget trimming
 
-### 43 macOS Actions
+### 38 macOS Actions
 
 Control your Mac by voice or text. The LLM routes intent to actions executed locally via AppleScript and shell commands.
 
```

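The unchanged **Multi-turn Memory** line in this hunk describes a sliding-window conversation history with token-budget trimming. The C++ sketch below illustrates that idea under stated assumptions: `Turn` and `trim_to_budget` are hypothetical names (RCLI's actual types are not shown in this diff), each turn's token count is assumed to be precomputed by the tokenizer, and the oldest turns are dropped first while the newest turn is always kept. It is an illustration, not RCLI's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical conversation turn; RCLI's real message type is not shown here.
struct Turn {
    std::string role;    // e.g. "user" or "assistant"
    std::string text;
    std::size_t tokens;  // assumed to be precomputed by the tokenizer
};

// Sliding-window trim: drop the oldest turns until the history fits in
// max_tokens. The newest turn is always kept, even if it alone exceeds
// the budget. Returns the token total of the retained history.
std::size_t trim_to_budget(std::deque<Turn>& history, std::size_t max_tokens) {
    assert(max_tokens > 0);
    std::size_t total = 0;
    for (const Turn& t : history) total += t.tokens;
    while (history.size() > 1 && total > max_tokens) {
        total -= history.front().tokens;  // evict the oldest turn
        history.pop_front();
    }
    return total;
}
```

Evicting whole turns from the front keeps the remaining history contiguous, which is what a sliding window needs. A fuller version would also reserve part of the budget for the system prompt and the model's reply; this sketch leaves that out.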
```diff
@@ -108,7 +130,7 @@ Control your Mac by voice or text. The LLM routes intent to actions executed loc
 | **System** | `open_app`, `quit_app`, `set_volume`, `toggle_dark_mode`, `screenshot`, `lock_screen` |
 | **Web** | `search_web`, `search_youtube`, `open_url`, `open_maps` |
 
-Run `rcli actions` to see all 43, or toggle them on/off in the TUI Actions panel.
+Run `rcli actions` to see all 38, or toggle them on/off in the TUI Actions panel.
 
 > **Tip:** If tool calling feels unreliable, press **X** in the TUI to clear the conversation and reset context. With small LLMs, accumulated context can degrade tool-calling accuracy — a fresh context often fixes it.
 
```

```diff
@@ -130,7 +152,6 @@ A terminal dashboard with push-to-talk, live hardware monitoring, model manageme
 | **SPACE** | Push-to-talk |
 | **M** | Models — browse, download, hot-swap LLM/STT/TTS |
 | **A** | Actions — browse, enable/disable macOS actions |
-| **B** | Benchmarks — run STT, LLM, TTS, E2E benchmarks |
 | **R** | RAG — ingest documents |
 | **X** | Clear conversation and reset context |
 | **T** | Toggle tool call trace |
```

````diff
@@ -172,45 +193,6 @@ rcli voices # browse and switch TTS voices
 rcli cleanup # remove unused models
 ```
 
-## Architecture
-
-```
-Mic → VAD → STT → [RAG] → LLM → TTS → Speaker
-|
-Tool Calling → 43 macOS Actions
-```
-
-Three dedicated threads in live mode, synchronized via condition variables:
-
-| Thread | Role |
-|--------|------|
-| STT | Captures audio, runs VAD, detects speech endpoints |
-| LLM | Generates tokens, dispatches tool calls |
-| TTS | Double-buffered sentence-level synthesis and playback |
-
-**Key design decisions:**
-
-- 64 MB pre-allocated memory pool — zero runtime malloc during inference
-- Lock-free ring buffers for zero-copy audio transfer
-- System prompt KV caching across queries
-- Hardware profiling at startup for optimal config
-- Token-budget conversation trimming
-- Live model hot-swap without restarting
-
-```
-src/
-engines/ STT, LLM, TTS, VAD, embedding engine wrappers
-pipeline/ Orchestrator, sentence detector, text sanitizer
-rag/ Vector index, BM25, hybrid retriever
-core/ Types, ring buffer, memory pool, hardware profiler
-audio/ CoreAudio mic/speaker I/O
-tools/ Tool calling engine with JSON schema definitions
-actions/ 43 macOS action implementations
-api/ C API (rcli_api.h)
-cli/ TUI dashboard (FTXUI), CLI commands
-models/ Model registries with on-demand download
-```
-
 ## Build from Source
 
 CPU-only build using llama.cpp + sherpa-onnx (no MetalRT):
````

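The Architecture section removed in this hunk listed lock-free ring buffers for moving audio between threads. As a minimal sketch of that data structure (not RCLI's code; its `core/` ring buffer is not shown here), a single-producer/single-consumer ring buffer in C++17 can use two atomic indices with acquire/release ordering. `SpscRing` is a hypothetical name, and unlike the zero-copy transfer the removed text describes, this version copies each element in and out for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal single-producer/single-consumer ring buffer, e.g. mic thread -> STT thread.
// One slot is always left empty so that "full" and "empty" can be told apart,
// so a buffer of `capacity` slots holds at most capacity - 1 items.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity) {
        assert(capacity >= 2);
    }

    // Producer thread only. Returns false if the buffer is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        // Acquire pairs with the consumer's release: the slot has been read before we reuse it.
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the buffer is empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release: the slot's contents are visible.
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write; only the producer stores it
    std::atomic<std::size_t> tail_{0};  // next slot to read; only the consumer stores it
};
```

The relaxed load of each thread's own index is only safe because exactly one thread writes it. With more than one producer or consumer, this design no longer holds and a different queue is needed.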
```diff
@@ -239,7 +221,8 @@ rcli rag ingest <dir> Index documents for RAG
 rcli rag query <text> Query indexed documents
 rcli models [llm|stt|tts] Manage AI models
 rcli voices Manage TTS voices
-rcli bench [--suite ...] Run benchmarks
+rcli metalrt MetalRT GPU engine management
+rcli llamacpp llama.cpp engine management
 rcli setup Download default models
 rcli info Show engine and model info
 
```