Add ElevenLabs Low Latency Voice Assistant Integration#261
Merged
Conversation
- Interactive voice assistant with enter-to-start/stop recording - ElevenLabs speech-to-text using Scribe model - Claude Haiku 4.5 for intelligent responses - WebSocket streaming TTS for minimal latency - Comprehensive notebook demonstrating optimization techniques - API key validation and placeholder setup 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Security improvements: - Replace hardcoded API keys with environment variables - Add .env.example template with setup instructions - Add python-dotenv dependency for environment management Code quality improvements: - Add missing docstring to on_close function - Extract magic numbers to named constants in AudioQueue class - Make voice ID dynamically fetched from available voices - Make TTS model and output format configurable constants 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Use dynamically selected VOICE_ID variable instead of hardcoded voice ID in the "Generate Input Audio" section for consistency. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Remove redundant resource links and streamline documentation references to focus on main website and API overview. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Change sounddevice requirement from >=0.5.2 to >=0.5.1 to fix installation issues (version 0.5.2 doesn't exist) - Update sentence-by-sentence streaming cell to use mp3_44100_128 format instead of pcm_44100 (free tier compatible) - Add pip upgrade cell to notebook for better package management - Clean up notebook cell execution outputs Co-Authored-By: ashprabaker <ashprabaker@anthropic.com>
Added a detailed "How to Use This Cookbook" section that guides users through: - Step 1: Environment setup with API keys and dependencies - Step 2: Working through the notebook to learn concepts - Step 3: Running the production script for hands-on experience Also expanded the "More About ElevenLabs" section with additional resources including Voice Library, API Playground, and SDK links. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
The script was using pcm_44100 format which requires ElevenLabs Pro tier, causing WebSocket connections to close with error 1008. Fixed by: - Changed TTS_OUTPUT_FORMAT from pcm_44100 to mp3_44100_128 (free tier) - Added pydub dependency for MP3 decoding - Updated AudioQueue.add() to decode MP3 chunks before playback - Enhanced WebSocket close handler to log error details - Updated docstring to reflect MP3 format usage The script now works with free tier ElevenLabs accounts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Errors per inputErrors in temp_md/low_latency_stt_claude_tts.md
|
Model Check Results ✅I've reviewed the Claude model usage in the changed files for this PR. Files Reviewed
Model References FoundAll files use: AnalysisStatus: ✅ All model references are valid and follow best practices The code uses
ReferencesValidated against the current model list at: https://docs.claude.com/en/docs/about-claude/models/overview.md No changes needed! 🎉 |
Collaborator
Author
|
@Adriaan-ANT thanks for an awesome cookbook! |
Adriaan-ANT
approved these changes
Nov 1, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Description
This PR adds a comprehensive cookbook demonstrating how to build a low-latency voice assistant by integrating ElevenLabs' speech processing capabilities with Claude's conversational AI. The cookbook progressively optimizes for real-time performance, teaching developers how to minimize latency through various streaming techniques.
What this cookbook demonstrates:
Type of Change
Cookbook Checklist (if applicable)
Testing
Additional Context
This cookbook includes two main components:
Interactive Notebook (
low_latency_stt_claude_tts.ipynb) - A tutorial-style notebook that walks through building a voice assistant step-by-step, demonstrating various optimization techniques with performance metrics at each stage.Production WebSocket Script (
stream_voice_assistant_websocket.py) - A fully functional voice assistant using WebSocket streaming for minimal latency, featuring continuous microphone input and gapless audio playback.The cookbook is particularly valuable for developers building real-time voice applications who need to understand the tradeoffs between different streaming approaches and how to optimize for latency.
Key features: