Integrate Chatterbox TTS from Resemble AI with Linux Speech Dispatcher using Podman Quadlets.
Chatterbox is a state-of-the-art open-source TTS model that outperforms ElevenLabs in side-by-side evaluations. It supports voice cloning from just 10 seconds of audio.
- High-quality neural TTS powered by Chatterbox (350M-500M parameters)
- Models baked into container image
- Voice cloning support (provide a 10-second reference clip)
- Container-based - no pip install required on host
- Native systemd integration via Podman Quadlets
- GPU acceleration with NVIDIA CUDA (optional)
- CPU-only mode for systems without GPU
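The systemd integration works through Podman Quadlet `.container` units. A minimal sketch of what such a unit can look like (illustrative only, not the shipped file):

```ini
# Sketch of a Quadlet container unit; the real units ship with the repo.
[Unit]
Description=Chatterbox TTS daemon (CPU)

[Container]
Image=ghcr.io/rsturla/chatterbox-spd:latest
# illustrative: pass the idle timeout through to the daemon
Environment=CHATTERBOX_IDLE_TIMEOUT=1200
```

Quadlet translates this into a regular systemd service at daemon-reload time, which is what lets the container participate in socket activation.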
```bash
# Clone the repository
git clone https://github.com/rsturla/chatterbox-spd.git
cd chatterbox-spd

# Install (pulls container from GHCR)
./install.sh --user

# Enable socket activation (container starts on first use)
systemctl --user enable --now chatterbox-tts.socket

# For GPU support, also enable the CUDA socket
systemctl --user enable --now chatterbox-tts-cuda.socket

# Test it (client auto-detects GPU and picks the right backend)
spd-say -o chatterbox "Hello from Chatterbox!"
```

To build the image yourself instead of pulling it:

```bash
# Requires HuggingFace token for model download during build
# Get one at: https://huggingface.co/settings/tokens
export HF_TOKEN="your_token_here"

# Build and install
./install.sh --user --build

# For GPU support
./install.sh --user --build --gpu
```

| Requirement | Version | Notes |
|---|---|---|
| Podman | 4.4+ | For Quadlet support |
| speech-dispatcher | any | For TTS integration |
| Audio player | any | aplay, paplay, pw-play, or mpv |
| NVIDIA GPU | optional | For GPU acceleration |
| nvidia-container-toolkit | optional | Provides CUDA to container |
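The Podman version floor can be checked mechanically before installing. `version_ge` below is a hypothetical helper (not part of `install.sh`), built on GNU `sort -V`:

```bash
# Hypothetical helper (not shipped with install.sh): true when dotted
# version $1 is at least $2, comparing via GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Confirm Podman meets the 4.4 Quadlet minimum
if command -v podman >/dev/null 2>&1; then
  installed=$(podman --version | awk '{print $NF}')
  if version_ge "$installed" "4.4"; then
    echo "podman $installed: Quadlet-ready"
  else
    echo "podman $installed: too old for Quadlets (need 4.4+)" >&2
  fi
fi
```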
The container image does NOT bundle NVIDIA CUDA libraries. This avoids CUDA redistribution licensing issues. CUDA is injected at runtime from your host by nvidia-container-toolkit.
```bash
# Install nvidia-container-toolkit
sudo dnf install nvidia-container-toolkit   # Fedora/RHEL
sudo apt install nvidia-container-toolkit   # Debian/Ubuntu

# Generate CDI spec (required on newer systems)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```bash
# Via speech-dispatcher
spd-say -o chatterbox "Hello world"

# Via client directly
echo "Hello world" | chatterbox-tts-client
chatterbox-tts-client --text "Hello world"

# Save to file instead of playing
chatterbox-tts-client --text "Hello world" --output hello.wav
```

- Add a 10-second WAV reference clip:
  ```bash
  mkdir -p ~/.cache/chatterbox-spd/voices
  cp my_voice.wav ~/.cache/chatterbox-spd/voices/alice.wav
  ```

- Update `/etc/speech-dispatcher/modules/chatterbox.conf`:

  ```
  AddVoice "en" "FEMALE1" "alice"
  ```

- Use the voice:

  ```bash
  spd-say -o chatterbox -y FEMALE1 "Hello, I sound like Alice now"
  ```

The Turbo model supports non-speech sounds:

```bash
spd-say -o chatterbox "[laugh] That's hilarious!"
spd-say -o chatterbox "I'm not sure... [sigh] let me think."
```

Pre-built images are available from GitHub Container Registry:
```bash
# CPU version (default)
podman pull ghcr.io/rsturla/chatterbox-spd:latest

# CUDA/GPU version (for NVIDIA GPUs)
podman pull ghcr.io/rsturla/chatterbox-spd:cuda
```

| Tag | Description |
|---|---|
| `latest` | Latest commit (CPU) |
| `cpu` | Latest CPU version |
| `cuda` | Latest CUDA/GPU version |
| `<sha>` | Specific commit (CPU) |
| `<sha>-cpu` | Specific commit (CPU) |
| `<sha>-cuda` | Specific commit (CUDA) |
The service uses socket activation. The container starts automatically when a client connects and exits after 20 minutes of inactivity. The client auto-detects GPU and chooses the appropriate backend.
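Socket activation here is plain systemd: a `.socket` unit owns the listening socket and starts the container unit on the first connection. A sketch of such a unit (the socket path is an assumption based on the troubleshooting section; the shipped unit in `container/` is authoritative):

```ini
[Unit]
Description=Chatterbox TTS socket (CPU)

[Socket]
# %t expands to the user runtime directory ($XDG_RUNTIME_DIR)
ListenStream=%t/chatterbox-tts/socket

[Install]
WantedBy=sockets.target
```

Because systemd holds the socket open, clients can connect even while the container is stopped; systemd queues the connection and starts the service.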
```bash
# Check socket status
systemctl --user status chatterbox-tts.socket        # CPU
systemctl --user status chatterbox-tts-cuda.socket   # GPU

# Check container status (only running if active)
systemctl --user status chatterbox-tts        # CPU
systemctl --user status chatterbox-tts-cuda   # GPU

# View logs
journalctl --user -u chatterbox-tts -f        # CPU
journalctl --user -u chatterbox-tts-cuda -f   # GPU

# Stop the container (socket stays active)
systemctl --user stop chatterbox-tts
systemctl --user stop chatterbox-tts-cuda

# Disable socket activation entirely
systemctl --user disable --now chatterbox-tts.socket
systemctl --user disable --now chatterbox-tts-cuda.socket
```

Client (runs on host):
| Variable | Default | Description |
|---|---|---|
| `CHATTERBOX_SOCKET` | auto-detect | Socket path (overrides auto-detection) |
| `CHATTERBOX_PREFER_GPU` | auto | Set to `1` to prefer GPU, `0` to prefer CPU |
| `CHATTERBOX_VOICE` | `default` | Voice name |
| `CHATTERBOX_EXAGGERATION` | `0.5` | Emotion exaggeration (0.0-1.0) |
| `CHATTERBOX_CFG_WEIGHT` | `0.5` | CFG weight (0.0-1.0) |
| `CHATTERBOX_PLAYER` | auto | Audio player |
The client auto-detects GPU availability and uses the CUDA backend when an NVIDIA GPU is present and the CUDA socket is enabled. Use the `--prefer-cpu` or `--prefer-gpu` flags to override.
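That selection logic can be sketched in shell. This is a simplified illustration, not the shipped client; the CUDA socket path (`chatterbox-tts-cuda/socket`) is an assumption extrapolated from the CPU path shown in the troubleshooting section:

```bash
# Sketch only: prefer the CUDA socket when a GPU is visible and that
# socket exists; otherwise fall back to the CPU socket.
pick_socket() {
  local run="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
  if [ "${CHATTERBOX_PREFER_GPU:-auto}" != "0" ] \
     && command -v nvidia-smi >/dev/null 2>&1 \
     && [ -S "$run/chatterbox-tts-cuda/socket" ]; then
    echo "$run/chatterbox-tts-cuda/socket"
  else
    echo "$run/chatterbox-tts/socket"
  fi
}

pick_socket
```

On a machine with no GPU (or with `CHATTERBOX_PREFER_GPU=0`), this resolves to the CPU socket.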
Daemon (runs in container, set in the `.container` file):
| Variable | Default | Description |
|---|---|---|
| `CHATTERBOX_IDLE_TIMEOUT` | `1200` | Seconds of inactivity before daemon exits (`0` to disable) |
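The idle-exit behavior can be sketched in shell; the actual daemon runs inside the container, and `handle` below is a hypothetical stand-in for request processing:

```bash
# Sketch of the idle-exit loop, not the shipped daemon.
handle() { echo "request: $1"; }

idle_wait() {
  local t="${CHATTERBOX_IDLE_TIMEOUT:-1200}"
  if [ "$t" -eq 0 ]; then
    # timeout disabled: block on input indefinitely
    while read -r line; do handle "$line"; done
  else
    # read -t fails after $t idle seconds, ending the loop (and the daemon)
    while read -r -t "$t" line; do handle "$line"; done
  fi
}
```

For example, `printf 'hi\n' | CHATTERBOX_IDLE_TIMEOUT=2 idle_wait` handles one request and returns once input goes idle or closes; socket activation then restarts the service on the next connection.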
The module configuration is at `/etc/speech-dispatcher/modules/chatterbox.conf`. Key settings:
```
# Add custom voices
AddVoice "en" "MALE1" "default"
AddVoice "en" "FEMALE1" "alice"   # Uses ~/.cache/chatterbox-spd/voices/alice.wav

# Adjust text chunk size
GenericMaxChunkLength 10000
```
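For context, speech-dispatcher's generic (`sd_generic`) modules invoke their synthesizer through a `GenericExecuteSynth` command template, with `$DATA` and `$VOICE` substituted at runtime. The line below is illustrative only; the shipped `config/chatterbox.conf` defines the actual command:

```
# Illustrative sketch, not the shipped configuration
GenericExecuteSynth "printf %s \'$DATA\' | CHATTERBOX_VOICE=\'$VOICE\' chatterbox-tts-client"
```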
```
chatterbox-spd/
├── .github/
│   ├── workflows/
│   │   ├── build-container.yml        # Build and push to GHCR
│   │   └── ci.yml                     # Linting and validation
│   └── dependabot.yml                 # Dependency updates
├── bin/
│   ├── chatterbox-tts-daemon          # TTS daemon (runs in container)
│   └── chatterbox-tts-client          # Client (runs on host)
├── config/
│   └── chatterbox.conf                # Speech-dispatcher module config
├── container/
│   ├── Containerfile                  # Container build file
│   ├── chatterbox-tts.socket          # Socket unit (CPU)
│   ├── chatterbox-tts.container       # Quadlet (CPU)
│   ├── chatterbox-tts-cuda.socket     # Socket unit (GPU)
│   └── chatterbox-tts-cuda.container  # Quadlet (GPU)
├── install.sh                         # Installation script
├── Makefile                           # Development tasks
├── LICENSE                            # MIT License
├── CONTRIBUTING.md                    # Contribution guide
└── README.md
```
```
┌─────────────────────┐     ┌──────────────────────┐
│  speech-dispatcher  │────▶│  sd_generic module   │
└─────────────────────┘     └──────────────────────┘
                                       │
                                       ▼
                            ┌──────────────────────┐
                            │ chatterbox-tts-client│  (host)
                            └──────────────────────┘
                                       │
                                       ▼  Unix Socket
┌─────────────────────────────────────────────────────────────┐
│                Podman Container (Quadlet)                   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  chatterbox-tts-daemon                               │   │
│  │  - Chatterbox TTS model (baked in)                   │   │
│  │  - PyTorch + torchaudio                              │   │
│  │  - CUDA support (via nvidia-container-toolkit)       │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
                            ┌──────────────────────┐
                            │     Audio Player     │  (host)
                            └──────────────────────┘
```
```bash
# Check logs
journalctl --user -u chatterbox-tts -f

# Verify Quadlet generation
/usr/libexec/podman/quadlet --dryrun --user

# Check if container image exists
podman images | grep chatterbox
```

```bash
# Verify NVIDIA GPU access on host
nvidia-smi

# Check nvidia-container-toolkit
rpm -q nvidia-container-toolkit    # Fedora/RHEL
dpkg -l nvidia-container-toolkit   # Debian/Ubuntu

# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Test container GPU access
podman run --rm --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  docker.io/nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi
```

```bash
# Check available audio players
which aplay paplay pw-play mpv

# Test client directly
echo "test" | chatterbox-tts-client

# Check if socket exists
ls -la $XDG_RUNTIME_DIR/chatterbox-tts/
```

```bash
# Check module is listed
spd-say -O

# Verify config file permissions (should be 644)
ls -la /etc/speech-dispatcher/modules/chatterbox.conf
ls -la /etc/speech-dispatcher/modules.d/chatterbox.conf

# Check speechd.conf includes drop-in directory
grep 'modules.d' /etc/speech-dispatcher/speechd.conf
```

```bash
# Uninstall everything
./install.sh --uninstall

# Also remove container images
podman rmi ghcr.io/rsturla/chatterbox-spd:latest
podman rmi ghcr.io/rsturla/chatterbox-spd:cuda

# Remove cached voices
rm -rf ~/.cache/chatterbox-spd
```

See CONTRIBUTING.md for development setup and guidelines.
MIT License - see LICENSE for details.
This project integrates with Chatterbox TTS from Resemble AI, which is also MIT licensed.
- Resemble AI for creating Chatterbox TTS
- The Speech Dispatcher project
- The Podman team for Quadlet support