
Chatterbox TTS for Speech Dispatcher


Integrate Chatterbox TTS from Resemble AI with Linux Speech Dispatcher using Podman Quadlets.

Chatterbox is a state-of-the-art open-source TTS model that, in Resemble AI's side-by-side evaluations, outperforms ElevenLabs. It supports voice cloning from as little as 10 seconds of reference audio.

Features

  • High-quality neural TTS powered by Chatterbox (350M-500M parameters)
  • Models baked into container image
  • Voice cloning support (provide a 10-second reference clip)
  • Container-based - no pip install required on host
  • Native systemd integration via Podman Quadlets
  • GPU acceleration with NVIDIA CUDA (optional)
  • CPU-only mode for systems without GPU

Quick Start

Using Pre-built Container Images

# Clone the repository
git clone https://github.com/rsturla/chatterbox-spd.git
cd chatterbox-spd

# Install (pulls container from GHCR)
./install.sh --user

# Enable socket activation (container starts on first use)
systemctl --user enable --now chatterbox-tts.socket

# For GPU support, also enable the CUDA socket
systemctl --user enable --now chatterbox-tts-cuda.socket

# Test it (client auto-detects GPU and picks the right backend)
spd-say -o chatterbox "Hello from Chatterbox!"

Building Locally

# Requires HuggingFace token for model download during build
# Get one at: https://huggingface.co/settings/tokens
export HF_TOKEN="your_token_here"

# Build and install
./install.sh --user --build

# For GPU support
./install.sh --user --build --gpu

Requirements

Requirement               Version    Notes
Podman                    4.4+       For Quadlet support
speech-dispatcher         any        For TTS integration
Audio player              any        aplay, paplay, pw-play, or mpv
NVIDIA GPU                optional   For GPU acceleration
nvidia-container-toolkit  optional   Provides CUDA to container
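
The client only needs one of the listed audio players to be present on the host. A minimal sketch of how such auto-selection can work is shown below (`pick_player` is a hypothetical name; the actual client may probe differently):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the first available audio player,
# in preference order. The real chatterbox-tts-client may differ.
pick_player() {
  local p
  for p in pw-play paplay aplay mpv; do
    if command -v "$p" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1  # no supported player found
}
```

PipeWire and PulseAudio players are tried before ALSA here so that desktop audio routing is respected when available.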

GPU Setup (Optional)

The container image does NOT bundle NVIDIA CUDA libraries. This avoids CUDA redistribution licensing issues. CUDA is injected at runtime from your host by nvidia-container-toolkit.

# Install nvidia-container-toolkit
sudo dnf install nvidia-container-toolkit  # Fedora/RHEL
sudo apt install nvidia-container-toolkit  # Debian/Ubuntu

# Generate CDI spec (required on newer systems)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
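
After generating the spec, you can check that the GPU devices are actually registered with CDI (assuming your `nvidia-ctk` version provides the `cdi list` subcommand):

```shell
# List CDI device names registered on the host;
# expect entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all
nvidia-ctk cdi list
```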

Usage

Basic Speech

# Via speech-dispatcher
spd-say -o chatterbox "Hello world"

# Via client directly
echo "Hello world" | chatterbox-tts-client
chatterbox-tts-client --text "Hello world"

# Save to file instead of playing
chatterbox-tts-client --text "Hello world" --output hello.wav

Voice Cloning

  1. Add a 10-second WAV reference clip:

     mkdir -p ~/.cache/chatterbox-spd/voices
     cp my_voice.wav ~/.cache/chatterbox-spd/voices/alice.wav

  2. Update /etc/speech-dispatcher/modules/chatterbox.conf:

     AddVoice "en" "FEMALE1" "alice"

  3. Use the voice:

     spd-say -o chatterbox -y FEMALE1 "Hello, I sound like Alice now"
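
Any clean ~10-second speech recording works as a reference. If your sample is in another format or longer, an ffmpeg invocation like this can trim and convert it (the mono downmix and 24 kHz sample rate are assumptions; check what the daemon actually expects):

```shell
# Trim to 10 s, downmix to mono, resample to 24 kHz WAV
ffmpeg -i my_voice.mp3 -t 10 -ac 1 -ar 24000 \
    ~/.cache/chatterbox-spd/voices/alice.wav
```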

Paralinguistic Tags (Turbo model)

The Turbo model supports non-speech sounds:

spd-say -o chatterbox "[laugh] That's hilarious!"
spd-say -o chatterbox "I'm not sure... [sigh] let me think."

Container Images

Pre-built images are available from GitHub Container Registry:

# CPU version (default)
podman pull ghcr.io/rsturla/chatterbox-spd:latest

# CUDA/GPU version (for NVIDIA GPUs)
podman pull ghcr.io/rsturla/chatterbox-spd:cuda

Available Tags

Tag         Description
latest      Latest commit (CPU)
cpu         Latest CPU version
cuda        Latest CUDA/GPU version
<sha>       Specific commit (CPU)
<sha>-cpu   Specific commit (CPU)
<sha>-cuda  Specific commit (CUDA)

Managing the Service

The service uses socket activation. The container starts automatically when a client connects and exits after 20 minutes of inactivity. The client auto-detects GPU and chooses the appropriate backend.

# Check socket status
systemctl --user status chatterbox-tts.socket       # CPU
systemctl --user status chatterbox-tts-cuda.socket  # GPU

# Check container status (only running if active)
systemctl --user status chatterbox-tts       # CPU
systemctl --user status chatterbox-tts-cuda  # GPU

# View logs
journalctl --user -u chatterbox-tts -f       # CPU
journalctl --user -u chatterbox-tts-cuda -f  # GPU

# Stop the container (socket stays active)
systemctl --user stop chatterbox-tts
systemctl --user stop chatterbox-tts-cuda

# Disable socket activation entirely
systemctl --user disable --now chatterbox-tts.socket
systemctl --user disable --now chatterbox-tts-cuda.socket

Configuration

Environment Variables

Client (runs on host):

Variable                 Default      Description
CHATTERBOX_SOCKET        auto-detect  Socket path (overrides auto-detection)
CHATTERBOX_PREFER_GPU    auto         Set to 1 to prefer GPU, 0 to prefer CPU
CHATTERBOX_VOICE         default      Voice name
CHATTERBOX_EXAGGERATION  0.5          Emotion exaggeration (0.0-1.0)
CHATTERBOX_CFG_WEIGHT    0.5          CFG weight (0.0-1.0)
CHATTERBOX_PLAYER        auto         Audio player

The client auto-detects GPU availability and uses the CUDA backend if an NVIDIA GPU is present and the CUDA socket is enabled. Use --prefer-cpu or --prefer-gpu flags to override.
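
For one-off runs, the variables above can simply prefix the invocation, e.g. `CHATTERBOX_EXAGGERATION=0.8 chatterbox-tts-client --text "Wow!"`. The precedence between an explicit CHATTERBOX_PREFER_GPU setting and auto-detection might look like the following sketch (`pick_backend` is a hypothetical name; the real client's logic may differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of backend selection: an explicit
# CHATTERBOX_PREFER_GPU=1/0 wins; otherwise fall back to
# probing for an NVIDIA GPU. Prints "gpu" or "cpu".
pick_backend() {
  case "${CHATTERBOX_PREFER_GPU:-auto}" in
    1) echo gpu ;;
    0) echo cpu ;;
    *) if command -v nvidia-smi >/dev/null 2>&1; then
         echo gpu
       else
         echo cpu
       fi ;;
  esac
}
```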

Daemon (runs in container, set in .container file):

Variable                 Default  Description
CHATTERBOX_IDLE_TIMEOUT  1200     Seconds of inactivity before daemon exits (0 to disable)
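
To change the timeout without editing the installed unit, a systemd drop-in should work on Podman versions with Quadlet drop-in support (the path below assumes a user-level install; adjust for system-wide):

```ini
# ~/.config/containers/systemd/chatterbox-tts.container.d/timeout.conf
# Keep the daemon alive for an hour of inactivity instead of 20 minutes
[Container]
Environment=CHATTERBOX_IDLE_TIMEOUT=3600
```

Run `systemctl --user daemon-reload` afterwards so the Quadlet generator picks up the change.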

Speech Dispatcher Module Config

The module configuration is at /etc/speech-dispatcher/modules/chatterbox.conf. Key settings:

# Add custom voices
AddVoice "en" "MALE1" "default"
AddVoice "en" "FEMALE1" "alice"    # Uses ~/.cache/chatterbox-spd/voices/alice.wav

# Adjust text chunk size
GenericMaxChunkLength 10000

Project Structure

chatterbox-spd/
├── .github/
│   ├── workflows/
│   │   ├── build-container.yml  # Build and push to GHCR
│   │   └── ci.yml               # Linting and validation
│   └── dependabot.yml           # Dependency updates
├── bin/
│   ├── chatterbox-tts-daemon    # TTS daemon (runs in container)
│   └── chatterbox-tts-client    # Client (runs on host)
├── config/
│   └── chatterbox.conf          # Speech-dispatcher module config
├── container/
│   ├── Containerfile                  # Container build file
│   ├── chatterbox-tts.socket          # Socket unit (CPU)
│   ├── chatterbox-tts.container       # Quadlet (CPU)
│   ├── chatterbox-tts-cuda.socket     # Socket unit (GPU)
│   └── chatterbox-tts-cuda.container  # Quadlet (GPU)
├── install.sh                   # Installation script
├── Makefile                     # Development tasks
├── LICENSE                      # MIT License
├── CONTRIBUTING.md              # Contribution guide
└── README.md

Architecture

┌─────────────────────┐     ┌──────────────────────┐
│  speech-dispatcher  │────▶│ sd_generic module    │
└─────────────────────┘     └──────────────────────┘
                                      │
                                      ▼
                            ┌──────────────────────┐
                            │ chatterbox-tts-client│ (host)
                            └──────────────────────┘
                                      │
                                      ▼ Unix Socket
┌─────────────────────────────────────────────────────────┐
│               Podman Container (Quadlet)                │
│  ┌───────────────────────────────────────────────────┐  │
│  │ chatterbox-tts-daemon                             │  │
│  │   - Chatterbox TTS model (baked in)               │  │
│  │   - PyTorch + torchaudio                          │  │
│  │   - CUDA support (via nvidia-container-toolkit)   │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                                      │
                                      ▼
                            ┌──────────────────────┐
                            │    Audio Player      │ (host)
                            └──────────────────────┘

Troubleshooting

Service won't start

# Check logs
journalctl --user -u chatterbox-tts -f

# Verify Quadlet generation
/usr/libexec/podman/quadlet --dryrun --user

# Check if container image exists
podman images | grep chatterbox

GPU not working

# Verify NVIDIA GPU access on host
nvidia-smi

# Check nvidia-container-toolkit
rpm -q nvidia-container-toolkit  # Fedora/RHEL
dpkg -l nvidia-container-toolkit  # Debian/Ubuntu

# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Test container GPU access
podman run --rm --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    docker.io/nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi

No audio output

# Check available audio players
which aplay paplay pw-play mpv

# Test client directly
echo "test" | chatterbox-tts-client

# Check if socket exists
ls -la $XDG_RUNTIME_DIR/chatterbox-tts/

Module not loading in speech-dispatcher

# Check module is listed
spd-say -O

# Verify config file permissions (should be 644)
ls -la /etc/speech-dispatcher/modules/chatterbox.conf
ls -la /etc/speech-dispatcher/modules.d/chatterbox.conf

# Check speechd.conf includes drop-in directory
grep 'modules.d' /etc/speech-dispatcher/speechd.conf
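
If the module never appears in `spd-say -O`, verify that speechd.conf (or a drop-in) actually loads it. With speech-dispatcher's generic module mechanism this is typically a line such as the following; the exact form depends on your speech-dispatcher version, so treat this as an assumption to check against what install.sh wrote:

```
AddModule "chatterbox" "sd_generic" "chatterbox.conf"
```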

Uninstalling

# Uninstall everything
./install.sh --uninstall

# Also remove container images
podman rmi ghcr.io/rsturla/chatterbox-spd:latest
podman rmi ghcr.io/rsturla/chatterbox-spd:cuda

# Remove cached voices
rm -rf ~/.cache/chatterbox-spd

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE for details.

This project integrates with Chatterbox TTS from Resemble AI, which is also MIT licensed.
