Self-hosted TTS API using Chatterbox (MIT-licensed; preferred over ElevenLabs by 63.75% of listeners in blind tests).
- Open `chatterbox-colab.ipynb` in Google Colab
- Runtime → Change runtime type → T4 GPU
- Run Cell 1 (install deps)
- Run Cell 2 (start server + ngrok tunnel)
- Copy the ngrok URL and paste it in Discord for Zee
- `server.py` — FastAPI TTS server (OpenAI-compatible endpoints)
- `client.py` — VPS-side client to call the server
- `chatterbox-colab.ipynb` — one-click Colab notebook
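For orientation, here is a stripped-down sketch of what an OpenAI-compatible FastAPI TTS endpoint looks like. This is illustrative only, not the actual `server.py`; the request fields assume OpenAI's `/v1/audio/speech` schema.

```python
# Illustrative sketch only -- the real server.py may differ.
import io

import torch
import torchaudio
from chatterbox.tts import ChatterboxTTS
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()
# Load the model once at startup; fall back to CPU if no GPU is present.
model = ChatterboxTTS.from_pretrained(
    device="cuda" if torch.cuda.is_available() else "cpu"
)

class SpeechRequest(BaseModel):
    # Field names assumed to mirror OpenAI's /v1/audio/speech schema.
    model: str = "chatterbox"
    input: str
    voice: str = "default"

@app.post("/v1/audio/speech")
def create_speech(req: SpeechRequest) -> Response:
    wav = model.generate(req.input)  # tensor of shape (1, num_samples)
    buf = io.BytesIO()
    torchaudio.save(buf, wav, model.sr, format="wav")
    return Response(content=buf.getvalue(), media_type="audio/wav")
```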
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check + GPU info |
| `/voices` | GET | List available voices |
| `/v1/audio/speech` | POST | OpenAI-compatible TTS |
| `/speak` | POST | Extended control (`exaggeration`, `cfg_weight`) |
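If you'd rather not use `client.py`, the OpenAI-compatible endpoint can be called directly. A minimal sketch using `requests`, assuming the JSON fields mirror OpenAI's `/v1/audio/speech` schema (check `server.py` for the exact contract):

```python
import os

import requests

url = os.environ["CHATTERBOX_URL"]  # e.g. https://xxxx.ngrok.io
resp = requests.post(
    f"{url}/v1/audio/speech",
    json={"model": "chatterbox", "input": "Hey Darko, it's Zee.", "voice": "default"},
    timeout=120,
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # response body is raw audio bytes
```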
```bash
# Set the server URL (from Colab ngrok output)
export CHATTERBOX_URL="https://xxxx.ngrok.io"

# Generate speech
python /root/chatterbox-tts/client.py "Hey Darko, it's Zee."

# Check health
python /root/chatterbox-tts/client.py --health
```

- Python 3.10+
- CUDA GPU with 4GB+ VRAM (T4 works for turbo variant)
- Packages: `chatterbox-tts`, `fastapi`, `uvicorn`, `torch`, `torchaudio`
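The `/speak` endpoint exposes Chatterbox's generation knobs over HTTP. A hedged sketch, assuming it accepts JSON fields named `text`, `exaggeration`, and `cfg_weight` (names inferred from the endpoint table; verify against `server.py`):

```python
import os

import requests

url = os.environ["CHATTERBOX_URL"]
resp = requests.post(
    f"{url}/speak",
    json={
        "text": "Hey Darko, it's Zee.",
        "exaggeration": 0.7,  # higher = more expressive delivery
        "cfg_weight": 0.4,    # lower values tend to slow the pacing
    },
    timeout=120,
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)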
- Chatterbox-Turbo: 350M params, 4-8GB VRAM, ~6x realtime on T4
- Chatterbox-Full: 500M params, 8-16GB VRAM, best emotion control
- License: MIT
- Source: https://github.com/resembleai/chatterbox
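Both variants can also be driven directly in Python, bypassing the server. A minimal sketch, assuming the interface documented in the upstream `chatterbox-tts` README:

```python
# Direct library use (no server) -- assumes the upstream chatterbox-tts API.
import torchaudio
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")
wav = model.generate(
    "Hey Darko, it's Zee.",
    exaggeration=0.5,  # emotion intensity
    cfg_weight=0.5,    # guidance weight; lower values slow pacing
)
torchaudio.save("speech.wav", wav, model.sr)
```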