Merged

third_party/ElevenLabs/README.md (145 changes: 139 additions, 6 deletions)

@@ -16,18 +16,45 @@ We recommend following this sequence to get the most out of this cookbook:

### Step 1: Set Up Your Environment

1. **Create a virtual environment:**
   ```bash
   # Navigate to the ElevenLabs directory
   cd /path/to/claude-cookbooks/third_party/ElevenLabs

   # Create virtual environment
   python -m venv venv

   # Activate it
   source venv/bin/activate  # On macOS/Linux
   # OR
   venv\Scripts\activate  # On Windows
   ```

2. **Get your API keys:**
   - **ElevenLabs API key:** [elevenlabs.io/app/developers/api-keys](https://elevenlabs.io/app/developers/api-keys)

     When creating your API key, ensure it has the following minimum permissions:
     - Text to speech
     - Speech to text
     - Read access on voices
     - Read access on models

   - **Anthropic API key:** [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)

3. **Configure your environment:**
   ```bash
   cp .env.example .env
   ```

   Edit `.env` and add your API keys:
   ```
   ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
   ANTHROPIC_API_KEY=sk-ant-api03-...
   ```

4. **Install dependencies:**
   ```bash
   # With venv activated
   pip install -r requirements.txt
   ```

@@ -65,6 +92,112 @@ The script demonstrates production-ready implementations of:
- WebSocket-based streaming for minimal latency
- Custom audio queue for seamless playback

## Troubleshooting

### Audio Popping or Crackling

**Symptom:** You may occasionally hear brief pops, clicks, or audio dropouts during playback.

**Explanation:**

This occurs because the script streams audio in MP3 format, which it uses so the cookbook works on the ElevenLabs free tier. When MP3 data arrives in real-time chunks, the decoder (pydub, which calls FFmpeg) occasionally receives a chunk containing incomplete frames that cannot be decoded. This typically happens:
- At the start of streaming (first chunk may be too small)
- During brief network delays
- At the end of audio generation (final chunk may be partial)

The script automatically handles these failed chunks by skipping them (using a try-except pattern in the audio decoding logic), which prevents errors from appearing in the console but may result in brief audio gaps that manifest as pops or clicks.

**Impact:**
- Audio playback continues normally
- Any pops or clicks are brief and often barely noticeable
- The WebSocket connection remains stable
- No functionality is lost

**Solution:**

This is expected behavior when streaming MP3 on the free tier. To eliminate the decoding-related popping:
1. Upgrade to a paid ElevenLabs tier that supports `pcm_44100` output
2. Modify the script to request `pcm_44100` instead of MP3, and update the audio queue to play the raw PCM bytes directly instead of decoding them as MP3

PCM needs no decoding, so chunks never have to be discarded as undecodable.
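Switching to `pcm_44100` removes the MP3 decoding step. However, a raw PCM stream can still be split mid-sample across chunks. The following minimal sketch shows one way to turn arbitrary PCM chunks into normalized float samples by carrying leftover bytes into the next chunk.

It assumes the stream is signed 16-bit little-endian mono PCM. `PcmChunkAssembler` is a hypothetical helper, not part of the cookbook script.

```python
import sys
from array import array


class PcmChunkAssembler:
    """Hypothetical helper: converts streamed 16-bit PCM chunks to floats.

    Assumes signed 16-bit little-endian mono samples (an assumption about
    the pcm_44100 stream, not taken from the cookbook script).
    """

    BYTES_PER_SAMPLE = 2

    def __init__(self):
        # Bytes left over from a chunk that ended mid-sample
        self._remainder = b""

    def feed(self, chunk: bytes) -> list[float]:
        data = self._remainder + chunk
        # Only convert whole samples; keep any trailing odd byte for later
        usable = len(data) - (len(data) % self.BYTES_PER_SAMPLE)
        self._remainder = data[usable:]

        samples = array("h")
        samples.frombytes(data[:usable])
        if sys.byteorder == "big":
            # array uses native byte order; the stream is assumed little-endian
            samples.byteswap()

        # Scale the int16 range [-32768, 32767] to floats in [-1.0, 1.0)
        return [s / 32768.0 for s in samples]
```

For example, feeding `b"\x00\x40\xff"` returns `[0.5]` and holds back the trailing byte. That byte is then combined with the first byte of the next chunk, so no sample is dropped or misaligned.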

### API Key Issues

**Symptom:** `AssertionError: ELEVENLABS_API_KEY is not set` or `AssertionError: ANTHROPIC_API_KEY is not set`

**Solution:**
1. Verify you've copied `.env.example` to `.env`: `cp .env.example .env`
2. Edit `.env` and ensure both API keys are set correctly
3. Check for typos or extra spaces in your API keys
4. Confirm your ElevenLabs key has the required permissions (see Step 1)
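Items 1 to 3 above can be checked in one pass. The sketch below inspects `.env`-style text and reports required keys that are missing, empty, or padded with spaces.

It uses a deliberately simplified parser: one `KEY=value` per line, `#` comment lines skipped, and no quoting. `check_api_keys` is a hypothetical helper, not part of the cookbook script.

```python
from pathlib import Path

REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ANTHROPIC_API_KEY")


def check_api_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return human-readable problems found in .env-style text.

    Simplified parser (an assumption): one KEY=value per line, lines starting
    with '#' are comments, no quoting or multi-line values.
    """
    values = {}
    for line in env_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value

    problems = []
    for key in required:
        if key not in values:
            problems.append(f"{key} is missing")
        elif not values[key].strip():
            problems.append(f"{key} is empty")
        elif values[key] != values[key].strip():
            # Some .env loaders strip these, so treat this as a warning
            problems.append(f"{key} has leading/trailing spaces")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print(".env not found; run: cp .env.example .env")
    else:
        problems = check_api_keys(env_file.read_text())
        print("\n".join(problems) if problems else "Both API keys are set.")
```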

### Dependency Issues

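**Symptom:** Errors like `OSError: PortAudio library not found`, audio playback failures, or no audio at all. Because the script silently skips MP3 chunks it cannot decode, a missing FFmpeg install can show up as silence rather than as an error.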

**Solution:**

**macOS:**
```bash
brew install portaudio ffmpeg
```

**Ubuntu/Debian:**
```bash
sudo apt-get install portaudio19-dev ffmpeg
```

**Windows:**
- Install FFmpeg from [ffmpeg.org](https://ffmpeg.org/download.html)
- Add FFmpeg to your system PATH
- PortAudio typically installs automatically with sounddevice on Windows

Then reinstall Python dependencies:
```bash
pip install -r requirements.txt
```
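Before reinstalling, it can help to confirm which system dependency is actually missing. This sketch uses only the standard library:

- `shutil.which` looks for `ffmpeg` on your PATH.
- `ctypes.util.find_library` asks the OS loader for a PortAudio shared library.

Treat the PortAudio result as a heuristic. The sounddevice packages for Windows and macOS ship their own PortAudio, so a `False` there does not necessarily mean playback will fail.

```python
import ctypes.util
import shutil


def check_system_dependencies() -> dict[str, bool]:
    """Report whether FFmpeg and PortAudio can be found on this machine.

    Heuristic sketch: FFmpeg is looked up on PATH, and PortAudio via the
    platform's shared-library search, which can miss a copy bundled inside
    a Python package such as sounddevice.
    """
    return {
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "PortAudio library": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_dependencies().items():
        print(f"{name}: {'found' if found else 'NOT FOUND'}")
```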

### Microphone Permissions

**Symptom:** `OSError: [Errno -9999] Unanticipated host error` or microphone not accessible

**Solution:**
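- **macOS:** Go to System Preferences → Security & Privacy → Privacy → Microphone (on macOS 13 Ventura and later: System Settings → Privacy & Security → Microphone), and enable Terminal (or your Python IDE)
- **Windows:** Go to Settings → Privacy (Privacy & security on Windows 11) → Microphone, and enable microphone access, including for desktop apps such as Python or your terminal
- **Linux:** Make sure your user is in the `audio` group (check with `groups`; add it with `sudo usermod -a -G audio $USER`, then log out and back in)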

List the audio devices Python can see to confirm your microphone is detected:
```bash
python -c "import sounddevice as sd; print(sd.query_devices())"
```
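If `query_devices()` prints a long list, the sketch below narrows it to devices that report at least one input channel. `input_devices` is a hypothetical helper. It only assumes the `name` and `max_input_channels` fields that appear in each device entry returned by `sounddevice.query_devices()`.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio.

    `devices` is assumed to look like sounddevice.query_devices(): a sequence
    of dicts that include 'name' and 'max_input_channels' keys. The position
    in the sequence is used as the device index.
    """
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device.get("max_input_channels", 0) > 0
    ]
```

With sounddevice installed, call it as `input_devices(sd.query_devices())`. An empty list suggests no input device is visible to Python at all. If devices are listed but recording still fails, the permission settings above are the more likely cause.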

### WebSocket Connection Failures

**Symptom:** Connection errors, timeouts, or stream interruptions

**Solution:**
1. Check that your internet connection is stable
2. Verify that your firewall isn't blocking outbound WebSocket connections (port 443)
3. Try disabling VPN or proxy temporarily
4. Ensure you're not exceeding API rate limits (see ElevenLabs dashboard for usage)

If you continue to experience issues, check [ElevenLabs Status](https://status.elevenlabs.io/) for service updates.
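If drops are intermittent rather than constant, a reconnect loop with exponential backoff can ride them out. It also eases off when the underlying cause is a rate limit.

The sketch below is generic and synchronous:

- `connect` is a hypothetical zero-argument callable that opens the connection.
- The delay values are arbitrary.
- Nothing here reflects how the cookbook script currently handles reconnection.

In asyncio code, the same loop would use `await asyncio.sleep(...)`.

```python
import random
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries.

    `connect` is a hypothetical zero-argument callable that opens the
    connection. Network-level failures (OSError, which includes
    ConnectionError and TimeoutError) are retried; any other exception
    propagates immediately. The last failure is re-raised once
    max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise
            # Delays of 0.5 s, 1 s, 2 s, ... capped at max_delay,
            # with up to 10% random jitter so clients don't retry in lockstep
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay * (1 + 0.1 * random.random()))
```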

## Project Ideas

Once you're comfortable with the voice assistant, here are some inspiring projects you can build:

- **Meeting Note-Taker** - Record and transcribe meetings in real time, then use Claude to generate summaries, action items, and key takeaways from the conversation.

- **Language Learning Tutor** - Practice conversations in any language with real-time feedback. Claude can correct pronunciation, suggest better phrasing, and adapt difficulty to your skill level.

- **Interactive Storyteller** - Create choose-your-own-adventure games where Claude narrates the story and responds to your spoken choices, with different voice characters for each role.

- **Hands-Free Coding Assistant** - Describe code changes, bugs, or features verbally while keeping your hands on the keyboard. Perfect for rubber duck debugging or pair programming solo.

- **Voice-Activated Smart Home** - Build natural conversation interfaces for controlling home devices. Ask complex questions like "Is it cold enough to turn on the heater?" instead of simple on/off commands.

- **Personal Voice Journal** - Keep a daily journal by speaking your thoughts. Claude can organize entries by theme, track your mood over time, and surface relevant past entries when you need them.

## More About ElevenLabs

Here are some helpful resources to deepen your understanding:

third_party/ElevenLabs/stream_voice_assistant_websocket.py (46 changes: 27 additions, 19 deletions)

@@ -109,29 +109,37 @@ def add(self, audio_data):

        Args:
            audio_data: Raw MP3 audio bytes
        """
        try:
            # Decode MP3 to PCM
            audio_segment = AudioSegment.from_mp3(io.BytesIO(audio_data))

            # Convert to numpy array
            samples = np.array(audio_segment.get_array_of_samples(), dtype=np.int16)
            samples = samples.astype(np.float32) / 32768.0

            if not self.playing:
                self.sample_rate = audio_segment.frame_rate
                self.channels = audio_segment.channels

            # Reshape based on number of channels
            if self.channels > 1:
                samples = samples.reshape((-1, self.channels))
            else:
                samples = samples.reshape((-1, 1))

            with self.buffer_lock:
                self.buffer.extend(samples.tobytes())

            # Start playback after pre-buffering
            if not self.playing and len(self.buffer) >= self.PRE_BUFFER_SIZE:
                self.start_playback()
        except Exception:
            # Silently skip MP3 chunks that fail to decode. This is common when
            # streaming MP3 in real time, because a chunk may contain incomplete
            # frames. Skipping them prevents console errors but may cause brief
            # audio pops (and also hides decode failures from a missing FFmpeg).
            # To eliminate popping, upgrade to a paid ElevenLabs tier and use
            # the pcm_44100 format instead of MP3.
            pass

    def start_playback(self):
        """Start the audio output stream."""