Complete guide to text-to-speech and voice features in Ralph CLI
- Quick Start
- Auto-Speak Hook
- TTS Engines
- Configuration
- Advanced Features
- Troubleshooting
- Technical Details
# 1. Install Ralph to your project
ralph install
# 2. Follow auto-speak setup prompts
# This will:
# - Check dependencies (Ollama, jq, TTS provider)
# - Create voice config
# - Provide hook configuration instructions
# 3. Enable auto-speak
ralph speak --auto-on
# 4. Use Claude Code normally
# Every response will be spoken automatically!
Ralph provides three distinct voice/TTS implementations:
- Auto-Speak Hook (Recommended) - Automatic TTS for Claude Code responses
- Terminal Voice Command (ralph voice) - Full hands-free CLI with STT
- Standalone Speak Command (ralph speak) - Manual TTS invocation
Note: Browser STT was removed in January 2026 due to redundancy.
Automatically speaks Claude Code responses after every interaction using:
- Local Qwen 2.5 1.5B model (qwen2.5:1.5b, served by Ollama) for intelligent 1-2 sentence summaries
- Context-aware summarization - considers your original question
- Non-blocking execution - doesn't slow down Claude Code
Claude Code completes response
→ Stop hook triggers (.agents/ralph/auto-speak-hook.sh)
→ Extract transcript (user question + assistant response)
→ OutputFilter (remove code blocks, markdown)
→ Qwen LLM summarization (context-aware)
→ ralph speak (non-blocking TTS)
→ Audio output
Example:
- You ask: "How many tests passed?"
- Claude's response: [500 lines of test output]
- You hear: "All 47 tests passed"
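The flow and example above can be outlined as a chain of stdin-to-stdout stages. The sketch below is a hypothetical bash outline, not the contents of auto-speak-hook.sh: the stage names filter_output, summarize and speak are illustrative, the summarizer simply keeps the first sentence instead of calling Qwen, and the last stage prints instead of playing audio. What it does show is the non-blocking design: the chain runs in a background subshell, so the hook returns immediately.

```bash
#!/usr/bin/env bash
# Hypothetical outline of the auto-speak flow; every stage is a stand-in.

# Stand-in for OutputFilter: drop fenced code blocks (lines starting with three backticks).
filter_output() {
  awk -v fence="$(printf '\140\140\140')" '
    index($0, fence) == 1 { in_code = !in_code; next }
    !in_code
  '
}

# Stand-in for the Qwen summarizer: join lines and keep only the first sentence.
summarize() {
  tr '\n' ' ' | sed -E 's/^[[:space:]]+//; s/([.!?]).*/\1/'
}

# Stand-in for `ralph speak`: print what would be spoken.
speak() {
  printf 'SPEAK: %s\n' "$(cat)"
}

# Run the whole chain in a background subshell so the caller is never blocked.
run_auto_speak() {
  ( printf '%s\n' "$1" | filter_output | summarize | speak ) &
}
```

In a real hook the final stage would call ralph speak, and the response text would come from the Claude Code transcript rather than an argument.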
# Enable auto-speak mode
ralph speak --auto-on
# Disable auto-speak mode
ralph speak --auto-off
# Check status
ralph speak --auto-status
Auto-speak supports multiple summarization modes:
| Mode | Chars | Tokens | Words | Use Case |
|---|---|---|---|---|
| short | 150 | 150 | ~30 | Simple answers, confirmations (default) |
| medium | 800 | 400 | ~100 | Explanations, multi-step changes |
| full | 1500 | 600 | ~200 | PRDs, plans, complex summaries |
| adaptive | varies | varies | varies | Auto-detect based on response complexity |
Set mode:
# Adaptive mode (recommended for varied responses)
ralph speak --auto-mode=adaptive
# Full mode (always use long summaries)
ralph speak --auto-mode=full
# Short mode (default)
ralph speak --auto-mode=short
# Show current mode
ralph speak --auto-mode
Adaptive mode detection:
- User Stories: 3+ US-XXX patterns → full mode
- Multi-week/phase plans: 2+ week/phase references → full mode
- Response length:
  - Under 500 chars → short
  - 500-2000 chars → medium
  - Over 2000 chars → full
- List density: 5+ bullet points → upgrade to medium/full
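The detection rules above can be expressed as a small classifier. This is a bash sketch under stated assumptions, not Ralph's implementation: detect_mode is a hypothetical name, "week/phase references" is approximated as the word week or phase followed by a number, bullets are lines starting with -, * or a numbered marker, and "upgrade to medium/full" is read as moving up one level.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the adaptive-mode rules; thresholds come from this guide.
# Assumptions: week/phase references look like "week 2" or "Phase 3";
# list density bumps the length-based mode up one level.
detect_mode() {
  local text=$1 stories phases bullets len mode

  # User-story IDs such as US-001.
  stories=$(printf '%s\n' "$text" | grep -oE 'US-[0-9]+' | wc -l)
  # "week N" / "phase N" references, case-insensitive.
  phases=$(printf '%s\n' "$text" | grep -oiE '(week|phase) [0-9]+' | wc -l)
  # Lines that start with a bullet or numbered-list marker.
  bullets=$(printf '%s\n' "$text" | grep -cE '^[[:space:]]*([-*]|[0-9]+\.)[[:space:]]' || true)

  # Structural signals decide outright.
  if (( stories >= 3 || phases >= 2 )); then
    echo full
    return
  fi

  # Otherwise classify by length in characters.
  len=${#text}
  if (( len < 500 )); then
    mode=short
  elif (( len <= 2000 )); then
    mode=medium
  else
    mode=full
  fi

  # Dense lists move the result up one level.
  if (( bullets >= 5 )); then
    case $mode in
      short) mode=medium ;;
      medium) mode=full ;;
    esac
  fi
  echo "$mode"
}
```

Checking the structural signals first mirrors the list order: a response with several user stories is treated as a plan regardless of its length.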
Auto-speak is intentionally short (~20-30 words). For more detail:
# Medium summary (~100 words) - default
ralph recap
# Detailed summary (~200 words)
ralph recap --full
# Short summary (~30 words, same as auto-speak)
ralph recap --short
# Preview without speaking
ralph recap --preview
When to use recap:
- After complex responses with multiple steps
- When you missed details in the auto-speak summary
- When you want key decisions, caveats, or next steps
When running Ralph in headless mode (ralph build), auto-speak behavior is optimized:
What works:
- Initial acknowledgment (Claude's first response)
- Progress updates (periodic "Still working..." phrases)
- Final summarization (completion summary)
Configuration:
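{
"skipSessionStart": {
"headlessAlwaysSpeak": true
},
"progress": {
"initialDelaySeconds": 5,
"intervalSeconds": 15
}
}
headlessAlwaysSpeak bypasses session-start detection; initialDelaySeconds is the delay before the first progress phrase, and intervalSeconds is the interval between phrases (both in seconds). JSON does not allow comments, so keep these notes out of voice-config.json.
Force headless mode: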
export RALPH_HEADLESS=true
ralph build 5
Ralph supports multiple TTS providers with automatic fallback.
macOS: the default provider on macOS, using the built-in say command.
# Set as default
ralph speak --set-tts-engine macos
# Test voices
say -v '?' # List available voices
# Speak with specific voice
ralph speak "Hello" --voice Samantha
VieNeu-TTS: high-quality Vietnamese text-to-speech with voice cloning capability.
Installation:
# Run setup script (installs to ~/.agents/ralph/vieneu/)
.agents/ralph/setup/vieneu-setup.sh
# Configure
ralph speak --set-tts-engine vieneu
ralph speak --set-vieneu-voice Vinh
Available preset voices:
| Voice | Description |
|---|---|
| Binh | Male voice |
| Tuyen | Female voice |
| Vinh | Male voice |
| Doan | Male voice |
| Ly | Female voice |
| Ngoc | Female voice |
Usage:
# Speak Vietnamese text
ralph speak "Xin chào thế giới"
# One-time use without changing default
ralph speak --engine vieneu "Xin chào"
# Switch back to macOS TTS
ralph speak --set-tts-engine macos
Voice cloning (advanced):
# Clone custom voice from audio sample
source ~/.agents/ralph/vieneu/venv/bin/activate
python ~/.agents/ralph/vieneu/clone-voice.py your_audio.wav my_voice
# Use cloned voice
ralph speak --set-vieneu-voice my_voice
Requirements for voice cloning:
- 3-5 second audio sample (WAV format)
- 16kHz or 22kHz sample rate recommended
- Clean speech, minimal background noise
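Before cloning, a sample can be checked against these requirements. The bash sketch below is a hypothetical helper (check_clone_sample is not part of Ralph). It assumes a canonical 44-byte PCM WAV header with no extra chunks, and a little-endian machine, since od reads integers in host byte order; files with LIST or other chunks before the data chunk will report wrong values.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a voice-cloning sample.
# Assumes a canonical 44-byte PCM WAV header on a little-endian host.

# read_u32 FILE OFFSET: print the unsigned 32-bit integer stored at OFFSET.
read_u32() {
  od -An -t u4 -j "$2" -N 4 "$1" | tr -d ' \n'
}

check_clone_sample() {
  local file=$1 riff wave rate byte_rate data_size ms
  riff=$(head -c 4 "$file")
  wave=$(dd if="$file" bs=1 skip=8 count=4 2>/dev/null)
  if [[ $riff != RIFF || $wave != WAVE ]]; then
    echo "not a WAV file: $file" >&2
    return 1
  fi

  rate=$(read_u32 "$file" 24)        # sample rate in Hz
  byte_rate=$(read_u32 "$file" 28)   # bytes of audio per second
  data_size=$(read_u32 "$file" 40)   # size of the PCM data in bytes
  (( byte_rate > 0 )) || { echo "invalid byte rate" >&2; return 1; }
  ms=$(( data_size * 1000 / byte_rate ))

  echo "sample rate: $rate Hz, duration: $ms ms"
  case $rate in
    16000|22000|22050) ;;
    *) echo "warning: 16 kHz or 22 kHz is recommended" >&2 ;;
  esac
  if (( ms < 3000 || ms > 5000 )); then
    echo "warning: a 3-5 second clip is recommended" >&2
    return 1
  fi
}
```

For example, check_clone_sample your_audio.wav prints the detected rate and length, and returns non-zero if the clip falls outside 3-5 seconds. Background noise cannot be checked this way.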
Ralph can automatically detect Vietnamese text and route it to VieNeu-TTS.
How it works:
- Text is analyzed with franc-min language detector
- If Vietnamese detected (requires 20+ characters) and VieNeu installed → routes to VieNeu-TTS
- Otherwise → uses configured default TTS engine
Enable/disable:
# Check status
ralph speak --multilingual-status
# Enable auto-detection (default)
ralph speak --multilingual-on
# Disable auto-detection
ralph speak --multilingual-off
Usage examples:
# English text → uses default engine (macOS/Piper)
ralph speak "Hello world, this is a test"
# Vietnamese text → auto-detects and routes to VieNeu
ralph speak "Xin chào thế giới, đây là một bài kiểm tra"
# Force specific engine (bypasses auto-detection)
ralph speak --engine vieneu "Hello"
ralph speak --engine macos "Xin chào"
Detection requirements:
- Minimum text length: 20 characters for reliable detection
- Short text defaults to English (prevents false positives)
- VieNeu must be installed for Vietnamese routing
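The routing rules above reduce to a small decision function. In this bash sketch, choose_engine is a hypothetical name and language detection itself is left out: franc-min is a JavaScript library, so the detected language code is passed in as an argument. franc reports ISO 639-3 codes, so Vietnamese appears as vie. Whether Ralph's 20-character threshold counts characters or bytes is not stated here; ${#text} counts characters in a UTF-8 locale.

```bash
#!/usr/bin/env bash
# Hypothetical routing decision for multilingual TTS.
# Args: text, detected language (ISO 639-3, e.g. "vie" or "eng"),
#       whether VieNeu is installed ("yes"/"no"), configured default engine.
choose_engine() {
  local text=$1 detected_lang=$2 vieneu_installed=$3 default_engine=$4

  # Under 20 characters, detection is unreliable: keep the default engine.
  if (( ${#text} < 20 )); then
    echo "$default_engine"
    return
  fi

  # Vietnamese goes to VieNeu only when VieNeu is actually installed.
  if [[ $detected_lang == vie && $vieneu_installed == yes ]]; then
    echo vieneu
  else
    echo "$default_engine"
  fi
}
```

Checking the length first means short Vietnamese phrases such as "Xin chào" stay on the default engine, matching the false-positive rule above.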
Piper: high-quality local neural TTS for Linux. Installation instructions are available in the project setup.
All voice settings are stored in .ralph/voice-config.json.
{
"ttsEngine": "macos",
"autoSpeak": {
"enabled": true,
"mode": "adaptive"
},
"acknowledgment": {
"enabled": true,
"immediate": false,
"immediatePhrase": "Got it"
},
"progress": {
"enabled": true,
"intervalSeconds": 15,
"initialDelaySeconds": 5
},
"skipSessionStart": {
"enabled": true,
"minUserMessages": 1,
"headlessAlwaysSpeak": true
},
"multilingual": {
"enabled": true,
"autoDetect": true
},
"vieneuVoice": "Vinh"
}
autoSpeak:
- enabled: Whether auto-speak is active (true/false)
- mode: Summarization mode ("short", "medium", "full", "adaptive")
acknowledgment:
- enabled: Whether the initial acknowledgment voice is enabled
- immediate: Speak a quick acknowledgment on prompt submit (default: false)
- immediatePhrase: The phrase to speak immediately (default: "Got it")
progress:
- enabled: Whether periodic progress phrases are enabled
- intervalSeconds: Interval between progress phrases (default: 15)
- initialDelaySeconds: Delay before the first progress phrase (default: 5)
skipSessionStart:
- enabled: Skip voice on the first prompt of a new session
- minUserMessages: Minimum user messages before voice is enabled (default: 1)
- headlessAlwaysSpeak: In headless/automation mode, always speak (default: true)
multilingual:
- enabled: Master switch for multilingual features
- autoDetect: Whether to auto-detect language and route accordingly
usageVoices: (Optional - Advanced)
- Configure different voices and TTS engines for different usage types and languages
- Structure: .usageVoices.{lang}.{usageType}.{voice|engine}
- Usage types: summary, acknowledgment, progress
- Languages: en, vi, zh
- See Usage-Specific Voice Configuration for a detailed guide
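The two progress timings combine as follows: the first phrase plays after initialDelaySeconds, then another every intervalSeconds while the task runs. The bash sketch below (progress_schedule is a hypothetical helper) prints those offsets for a task of a given length; that Ralph stops the timer exactly when the task completes is an assumption.

```bash
#!/usr/bin/env bash
# Print the offsets (seconds from task start) at which progress phrases
# would play, given initialDelaySeconds, intervalSeconds and the task length.
progress_schedule() {
  local initial=${1:-5} interval=${2:-15} task_seconds=$3 t
  (( interval > 0 )) || return 1   # a zero interval would loop forever
  for (( t = initial; t <= task_seconds; t += interval )); do
    echo "$t"
  done
}
```

With the defaults, progress_schedule 5 15 50 prints 5, 20, 35 and 50, one per line; a task shorter than the initial delay gets no progress phrase at all.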
Ralph supports configuring different voices for different contexts. For example, you can use Piper's Ryan voice for summaries and macOS voices (Alex, Victoria) for acknowledgments and progress updates.
Example:
{
"ttsEngine": "piper",
"voice": "ryan",
"usageVoices": {
"en": {
"summary": { "voice": "ryan", "engine": "piper" },
"acknowledgment": { "voice": "alex", "engine": "macos" },
"progress": { "voice": "victoria", "engine": "macos" }
}
}
}
See full documentation: usage-voices-config.md
Auto-speak requires hook configuration in ~/.claude/settings.local.json:
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "/full/path/to/.agents/ralph/auto-speak-hook.sh"
}
]
}
]
}
}
Automatic setup:
# Automated hook installation (uses jq)
.agents/ralph/setup/post-install.sh
# Manual setup: Copy hook snippet from setup guidance
ralph install # Shows hook configuration instructions
Voice commands for window management:
"snap window left" → Tiles active window to left half
"snap window right" → Tiles active window to right half
"tile left/right/top/bottom" → Tiles active window to the named half of the screen
"center window" → Centers the active window
"move to next display" → Moves to next monitor
Browser control:
"open google.com" → Opens URL in default browser
"new tab" → Opens new browser tab
"close tab" → Closes current tab
"refresh page" → Reloads current page
"go back/forward" → Browser navigation
Clipboard:
"copy that" → Copies selected text (Cmd+C)
"paste" → Pastes from clipboard (Cmd+V)
"select all" → Selects all text (Cmd+A)
"what's on the clipboard" → Reads clipboard contents aloud
Media playback:
"play music" → Starts playback (defaults to Spotify)
"pause" → Pauses playback
"next track" → Skips to next song
"previous song" → Returns to previous track
Specify a different app:
"play music in apple music" → Uses Apple Music instead
Finder:
"open documents" → Opens Documents folder
"open desktop" → Opens Desktop
"open downloads" → Opens Downloads
"new finder window" → Creates new Finder window
Code editor:
"command palette" → Opens command palette (Cmd+Shift+P)
"go to line 42" → Jumps to specific line
"open file" → Opens file picker
Terminal:
"clear terminal" → Clears the terminal (Cmd+K)
"delete this line" → Deletes current line (Ctrl+U)
"delete word" → Deletes last word (Opt+Delete)
Note: Advanced features require Terminal STT (ralph voice) or the Electron app. See the Voice Features Guide for setup.
Check 1: Is auto-speak enabled?
ralph speak --auto-status
Check 2: Is Ollama running?
curl http://localhost:11434/api/tags
ollama list | grep qwen2.5:1.5b
Check 3: Test TTS manually
echo "test" | ralph speak
Check 4: Check logs
tail -f .ralph/auto-speak-hook.log
Check 1: Verify hook configuration
cat ~/.claude/settings.local.json | grep -A5 "hooks"
Check 2: Ensure script is executable
chmod +x .agents/ralph/auto-speak-hook.sh
Check 3: Check hook logs
tail -20 .ralph/auto-speak-hook.log
Check 1: Ollama service
ollama list
Check 2: Pull Qwen model
ollama pull qwen2.5:1.5b
Check 3: Test Ollama directly
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:1.5b",
"prompt": "Summarize: Hello world",
"stream": false
}'
Fallback: If Qwen fails, the system uses regex-based cleanup (no LLM summarization).
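The fallback can be sketched as "try Ollama, otherwise clean up with regular expressions". This bash sketch is hypothetical: summarize_for_speech and regex_cleanup are illustrative names, not Ralph's code. It calls Ollama's /api/generate endpoint with the same model and "stream": false body as the test command above, using jq to build the request and read the response field. The prompt wording, the 10-second timeout and the 150-character cap (borrowed from the short mode's character budget) are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical "LLM first, regex fallback" summarizer for spoken output.
# The LLM path needs curl and jq; any failure drops to the regex path.
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# Fallback: strip markdown symbols, collapse whitespace, cap at 150 characters.
regex_cleanup() {
  tr '\n' ' ' | sed -E 's/[*#`>]+//g; s/[[:space:]]+/ /g; s/^ //; s/ $//' | cut -c1-150
}

summarize_for_speech() {
  local text=$1 body resp summary
  if body=$(jq -n --arg p "Summarize in one or two sentences: $text" \
              '{model: "qwen2.5:1.5b", prompt: $p, stream: false}' 2>/dev/null) &&
     resp=$(curl -sf --max-time 10 "$OLLAMA_URL/api/generate" -d "$body" 2>/dev/null) &&
     summary=$(printf '%s' "$resp" | jq -r '.response // empty' 2>/dev/null) &&
     [[ -n $summary ]]; then
    printf '%s\n' "$summary"
  else
    printf '%s' "$text" | regex_cleanup
  fi
}
```

Building the request with jq --arg keeps quotes and newlines in the response text from breaking the JSON body.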
Check 1: Verify headless mode detection
tail -20 .ralph/session-detect.log | grep -i headless
Should show: Headless mode detected, always speak enabled - allowing voice
Check 2: Verify headlessAlwaysSpeak setting
jq '.skipSessionStart.headlessAlwaysSpeak' .ralph/voice-config.json
Should return true
Check 3: Force headless mode
export RALPH_HEADLESS=true
ralph build 5
Check 4: Check progress timer logs
tail -30 .ralph/progress-timer.log
Check 5: Verify TTS manager
tail -30 .ralph/tts-manager.log
The system uses two-stage filtering:
- OutputFilter - Removes code blocks, tool calls, markdown, URLs
- TTSSummarizer (Qwen) - Generates natural 1-2 sentence summary
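The first stage can be approximated with standard text tools. The bash sketch below is not the real output-filter.mjs: filter_for_speech is a hypothetical name, tool calls are assumed to appear as <function_calls> ... </function_calls> blocks in the text, and the regular expressions are deliberately simple (any <tag> is stripped, for example). It keeps link text, drops URLs, emphasis and heading markers, and removes empty lines.

```bash
#!/usr/bin/env bash
# Rough OutputFilter-style pass: removes fenced code blocks, <function_calls>
# blocks, HTML/XML tags, URLs, and common markdown markup.
filter_for_speech() {
  awk -v fence="$(printf '\140\140\140')" '
    index($0, fence) == 1 { in_code = !in_code; next }   # fence line: toggle code mode
    in_code { next }                                      # inside a code block
    /<function_calls>/ { in_tool = 1 }                    # tool-call block starts
    in_tool { if ($0 ~ /<\/function_calls>/) in_tool = 0; next }
    { print }
  ' |
  sed -E \
    -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's#https?://[^[:space:])]+##g' \
    -e 's#</?[A-Za-z][^>]*>##g' \
    -e 's/`//g' \
    -e 's/^[[:space:]]*#+[[:space:]]*//' \
    -e 's/\*{1,3}([^*]+)\*{1,3}/\1/g' \
    -e 's/[[:space:]]+$//' |
  grep -v '^[[:space:]]*$'
}
```

Links are rewritten before URLs are removed, so [report](https://...) becomes "report" rather than a dangling bracket.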
If you hear code being spoken:
- Check .ralph/auto-speak-hook.log for the summary preview
- Verify the Qwen model is working (test with the curl command above)
- Adjust maxTokens in .agents/ralph/summarize-for-tts.mjs
If ralph recap says "No transcript found":
- Ensure you're in a directory where Claude Code has been used
- Check that Claude projects exist: ls ~/.claude/projects/
- Transcripts are stored per-project with encoded paths
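To see which folder a recap would read, the lookup can be sketched in bash. This is a hypothetical helper built on two assumptions to verify against ls ~/.claude/projects/: that the folder name is the absolute project path with every character other than a letter or digit replaced by -, and that transcripts are .jsonl files inside it.

```bash
#!/usr/bin/env bash
# ASSUMPTIONS: folder name = absolute project path with non-alphanumerics -> "-";
# transcripts are *.jsonl files. Verify both with: ls ~/.claude/projects/
encode_project_path() {
  printf '%s' "$1" | sed 's/[^A-Za-z0-9]/-/g'
}

# Print the most recently modified transcript for a project directory.
latest_transcript() {
  local dir
  dir="$HOME/.claude/projects/$(encode_project_path "${1:-$PWD}")"
  if [[ ! -d $dir ]]; then
    echo "No transcript folder: $dir" >&2
    return 1
  fi
  ls -t "$dir"/*.jsonl 2>/dev/null | head -n 1
}
```

Running latest_transcript from the project root either prints a path or names the folder it expected, which helps tell "wrong directory" apart from "no transcripts yet".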
The TTS summarization uses carefully engineered prompts to eliminate:
- Symbols - File paths, technical syntax (~, /, ., etc.)
- Repetition - Duplicate points with different wording
- Technical jargon - API, CLI, TTS abbreviations
- File references - voice-config.json, .agents/ralph/
Prompt structure:
Your task: Create a clear spoken summary answering what the user asked.
FORMAT ([style], [words]):
- Use natural conversational speech
- For lists: "First, [action]. Second, [action]. Third, [action]."
- State ONLY the main point once - do not repeat or rephrase
STRICT RULES - NEVER include:
- File names or paths
- File extensions (.sh, .js, .py, .md)
- Technical references ("the file", "the script")
- Symbols: ~ / \ | @ # $ % ^ & * ` < > { } [ ] = + _
- Abbreviations (TTS, API, CLI) - say full words
WHAT TO SAY:
- Actions completed: "Added feature X", "Fixed the login bug"
- Key outcomes: "Users can now...", "The system will..."
- Next steps: "You should...", "Consider..."
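The STRICT RULES above can also be checked after generation. The bash sketch below is a hypothetical guard: violates_speech_rules is not part of Ralph, and this guide does not say Ralph validates summaries this way. It succeeds when a summary breaks a rule: a listed file extension (this sketch adds json and mjs to the prompt's examples), any of the banned symbols, or one of the listed abbreviations. A caller could then retry or fall back to regex cleanup.

```bash
#!/usr/bin/env bash
# Returns 0 (true) if the summary breaks one of the prompt's STRICT RULES.
violates_speech_rules() {
  local s=$1
  local ext_re='\.(sh|js|mjs|json|py|md)([^A-Za-z]|$)'     # file extensions
  local sym_re='[][~/\\|@#$%^&*`<>{}=+_]'                   # banned symbols
  local abbr_re='(^|[^A-Za-z])(TTS|API|CLI)([^A-Za-z]|$)'   # abbreviations
  [[ $s =~ $ext_re || $s =~ $sym_re || $s =~ $abbr_re ]]
}
```

Keeping each pattern in a variable and using it unquoted on the right of =~ is what makes bash treat it as a regular expression rather than a literal string.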
Stage 1: OutputFilter (.agents/ralph/output-filter.mjs)
- Remove code blocks (...)
- Remove tool calls (<function_calls>...