
Chatterbox TTS for Speech Dispatcher


Integrate Chatterbox TTS from Resemble AI with Linux Speech Dispatcher using Podman Quadlets.

Chatterbox is a state-of-the-art open-source TTS model that, in Resemble AI's side-by-side evaluations, outperforms ElevenLabs. It supports voice cloning from as little as 10 seconds of reference audio.

Features

  • High-quality neural TTS powered by Chatterbox (350M-500M parameters)
  • Models baked into container image
  • Voice cloning support (provide a 10-second reference clip)
  • Container-based - no pip install required on host
  • Native systemd integration via Podman Quadlets
  • GPU acceleration with NVIDIA CUDA (optional)
  • CPU-only mode for systems without GPU

Quick Start

Using Pre-built Container Images

# Clone the repository
git clone https://github.com/rsturla/chatterbox-spd.git
cd chatterbox-spd

# Install (pulls container from GHCR)
./install.sh --user

# Enable socket activation (container starts on first use)
systemctl --user enable --now chatterbox-tts.socket

# For GPU support, also enable the CUDA socket
systemctl --user enable --now chatterbox-tts-cuda.socket

# Test it (client auto-detects GPU and picks the right backend)
spd-say -o chatterbox "Hello from Chatterbox!"

Building Locally

# Requires HuggingFace token for model download during build
# Get one at: https://huggingface.co/settings/tokens
export HF_TOKEN="your_token_here"

# Build and install
./install.sh --user --build

# For GPU support
./install.sh --user --build --gpu

Requirements

Requirement               Version    Notes
Podman                    4.4+       For Quadlet support
speech-dispatcher         any        For TTS integration
Audio player              any        aplay, paplay, pw-play, or mpv
NVIDIA GPU                optional   For GPU acceleration
nvidia-container-toolkit  optional   Provides CUDA to container
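
The client only needs one of the listed audio players to be present on the host. A minimal sketch of how such auto-selection can work is shown below (`pick_player` is a hypothetical name; the actual client may probe differently):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the first available audio player,
# in preference order. The real chatterbox-tts-client may differ.
pick_player() {
  local p
  for p in pw-play paplay aplay mpv; do
    if command -v "$p" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1  # no supported player found
}
```

PipeWire and PulseAudio players are tried before ALSA here so that desktop audio routing is respected when available.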

GPU Setup (Optional)

The container image does NOT bundle NVIDIA CUDA libraries. This avoids CUDA redistribution licensing issues. CUDA is injected at runtime from your host by nvidia-container-toolkit.

# Install nvidia-container-toolkit
sudo dnf install nvidia-container-toolkit  # Fedora/RHEL
sudo apt install nvidia-container-toolkit  # Debian/Ubuntu

# Generate CDI spec (required on newer systems)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
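
After generating the spec, you can check that the GPU devices are actually registered with CDI (assuming your `nvidia-ctk` version provides the `cdi list` subcommand):

```shell
# List CDI device names registered on the host;
# expect entries such as nvidia.com/gpu=0 and nvidia.com/gpu=all
nvidia-ctk cdi list
```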

Usage

Basic Speech

# Via speech-dispatcher
spd-say -o chatterbox "Hello world"

# Via client directly
echo "Hello world" | chatterbox-tts-client
chatterbox-tts-client --text "Hello world"

# Save to file instead of playing
chatterbox-tts-client --text "Hello world" --output hello.wav

Voice Cloning

  1. Add a 10-second WAV reference clip:

     mkdir -p ~/.cache/chatterbox-spd/voices
     cp my_voice.wav ~/.cache/chatterbox-spd/voices/alice.wav

  2. Update /etc/speech-dispatcher/modules/chatterbox.conf:

     AddVoice "en" "FEMALE1" "alice"

  3. Use the voice:

     spd-say -o chatterbox -y FEMALE1 "Hello, I sound like Alice now"
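
Any clean ~10-second speech recording works as a reference. If your sample is in another format or longer, an ffmpeg invocation like this can trim and convert it (the mono downmix and 24 kHz sample rate are assumptions; check what the daemon actually expects):

```shell
# Trim to 10 s, downmix to mono, resample to 24 kHz WAV
ffmpeg -i my_voice.mp3 -t 10 -ac 1 -ar 24000 \
    ~/.cache/chatterbox-spd/voices/alice.wav
```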

Paralinguistic Tags (Turbo model)

The Turbo model supports non-speech sounds:

spd-say -o chatterbox "[laugh] That's hilarious!"
spd-say -o chatterbox "I'm not sure... [sigh] let me think."

Container Images

Pre-built images are available from GitHub Container Registry:

# CPU version (default)
podman pull ghcr.io/rsturla/chatterbox-spd:latest

# CUDA/GPU version (for NVIDIA GPUs)
podman pull ghcr.io/rsturla/chatterbox-spd:cuda

Available Tags

Tag         Description
latest      Latest commit (CPU)
cpu         Latest CPU version
cuda        Latest CUDA/GPU version
<sha>       Specific commit (CPU)
<sha>-cpu   Specific commit (CPU)
<sha>-cuda  Specific commit (CUDA)

Managing the Service

The service uses socket activation. The container starts automatically when a client connects and exits after 20 minutes of inactivity. The client auto-detects GPU and chooses the appropriate backend.

# Check socket status
systemctl --user status chatterbox-tts.socket       # CPU
systemctl --user status chatterbox-tts-cuda.socket  # GPU

# Check container status (only running if active)
systemctl --user status chatterbox-tts       # CPU
systemctl --user status chatterbox-tts-cuda  # GPU

# View logs
journalctl --user -u chatterbox-tts -f       # CPU
journalctl --user -u chatterbox-tts-cuda -f  # GPU

# Stop the container (socket stays active)
systemctl --user stop chatterbox-tts
systemctl --user stop chatterbox-tts-cuda

# Disable socket activation entirely
systemctl --user disable --now chatterbox-tts.socket
systemctl --user disable --now chatterbox-tts-cuda.socket

Configuration

Environment Variables

Client (runs on host):

Variable                 Default      Description
CHATTERBOX_SOCKET        auto-detect  Socket path (overrides auto-detection)
CHATTERBOX_PREFER_GPU    auto         Set to 1 to prefer GPU, 0 to prefer CPU
CHATTERBOX_VOICE         default      Voice name
CHATTERBOX_EXAGGERATION  0.5          Emotion exaggeration (0.0-1.0)
CHATTERBOX_CFG_WEIGHT    0.5          CFG weight (0.0-1.0)
CHATTERBOX_PLAYER        auto         Audio player

The client auto-detects GPU availability and uses the CUDA backend if an NVIDIA GPU is present and the CUDA socket is enabled. Use --prefer-cpu or --prefer-gpu flags to override.
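
For one-off runs, the variables above can simply prefix the invocation, e.g. `CHATTERBOX_EXAGGERATION=0.8 chatterbox-tts-client --text "Wow!"`. The precedence between an explicit CHATTERBOX_PREFER_GPU setting and auto-detection might look like the following sketch (`pick_backend` is a hypothetical name; the real client's logic may differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of backend selection: an explicit
# CHATTERBOX_PREFER_GPU=1/0 wins; otherwise fall back to
# probing for an NVIDIA GPU. Prints "gpu" or "cpu".
pick_backend() {
  case "${CHATTERBOX_PREFER_GPU:-auto}" in
    1) echo gpu ;;
    0) echo cpu ;;
    *) if command -v nvidia-smi >/dev/null 2>&1; then
         echo gpu
       else
         echo cpu
       fi ;;
  esac
}
```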

Daemon (runs in container, set in .container file):

Variable                 Default  Description
CHATTERBOX_IDLE_TIMEOUT  1200     Seconds of inactivity before daemon exits (0 to disable)
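
To change the timeout without editing the installed unit, a systemd drop-in should work on Podman versions with Quadlet drop-in support (the path below assumes a user-level install; adjust for system-wide):

```ini
# ~/.config/containers/systemd/chatterbox-tts.container.d/timeout.conf
# Keep the daemon alive for an hour of inactivity instead of 20 minutes
[Container]
Environment=CHATTERBOX_IDLE_TIMEOUT=3600
```

Run `systemctl --user daemon-reload` afterwards so the Quadlet generator picks up the change.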

Speech Dispatcher Module Config

The module configuration is at /etc/speech-dispatcher/modules/chatterbox.conf. Key settings:

# Add custom voices
AddVoice "en" "MALE1" "default"
AddVoice "en" "FEMALE1" "alice"    # Uses ~/.cache/chatterbox-spd/voices/alice.wav

# Adjust text chunk size
GenericMaxChunkLength 10000

Project Structure

chatterbox-spd/
├── .github/
│   ├── workflows/
│   │   ├── build-container.yml  # Build and push to GHCR
│   │   └── ci.yml               # Linting and validation
│   └── dependabot.yml           # Dependency updates
├── bin/
│   ├── chatterbox-tts-daemon    # TTS daemon (runs in container)
│   └── chatterbox-tts-client    # Client (runs on host)
├── config/
│   └── chatterbox.conf          # Speech-dispatcher module config
├── container/
│   ├── Containerfile                  # Container build file
│   ├── chatterbox-tts.socket          # Socket unit (CPU)
│   ├── chatterbox-tts.container       # Quadlet (CPU)
│   ├── chatterbox-tts-cuda.socket     # Socket unit (GPU)
│   └── chatterbox-tts-cuda.container  # Quadlet (GPU)
├── install.sh                   # Installation script
├── Makefile                     # Development tasks
├── LICENSE                      # MIT License
├── CONTRIBUTING.md              # Contribution guide
└── README.md

Architecture

┌─────────────────────┐     ┌──────────────────────┐
│  speech-dispatcher  │────▶│ sd_generic module    │
└─────────────────────┘     └──────────────────────┘
                                      │
                                      ▼
                            ┌──────────────────────┐
                            │ chatterbox-tts-client│ (host)
                            └──────────────────────┘
                                      │
                                      ▼ Unix Socket
┌─────────────────────────────────────────────────────────┐
│               Podman Container (Quadlet)                │
│  ┌───────────────────────────────────────────────────┐  │
│  │ chatterbox-tts-daemon                             │  │
│  │   - Chatterbox TTS model (baked in)               │  │
│  │   - PyTorch + torchaudio                          │  │
│  │   - CUDA support (via nvidia-container-toolkit)   │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                                      │
                                      ▼
                            ┌──────────────────────┐
                            │    Audio Player      │ (host)
                            └──────────────────────┘

Troubleshooting

Service won't start

# Check logs
journalctl --user -u chatterbox-tts -f

# Verify Quadlet generation
/usr/libexec/podman/quadlet --dryrun --user

# Check if container image exists
podman images | grep chatterbox

GPU not working

# Verify NVIDIA GPU access on host
nvidia-smi

# Check nvidia-container-toolkit
rpm -q nvidia-container-toolkit  # Fedora/RHEL
dpkg -l nvidia-container-toolkit  # Debian/Ubuntu

# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Test container GPU access
podman run --rm --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    docker.io/nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi

No audio output

# Check available audio players
which aplay paplay pw-play mpv

# Test client directly
echo "test" | chatterbox-tts-client

# Check if socket exists
ls -la $XDG_RUNTIME_DIR/chatterbox-tts/

Module not loading in speech-dispatcher

# Check module is listed
spd-say -O

# Verify config file permissions (should be 644)
ls -la /etc/speech-dispatcher/modules/chatterbox.conf
ls -la /etc/speech-dispatcher/modules.d/chatterbox.conf

# Check speechd.conf includes drop-in directory
grep 'modules.d' /etc/speech-dispatcher/speechd.conf
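
If the module never appears in `spd-say -O`, verify that speechd.conf (or a drop-in) actually loads it. With speech-dispatcher's generic module mechanism this is typically a line such as the following; the exact form depends on your speech-dispatcher version, so treat this as an assumption to check against what install.sh wrote:

```
AddModule "chatterbox" "sd_generic" "chatterbox.conf"
```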

Uninstalling

# Uninstall everything
./install.sh --uninstall

# Also remove container images
podman rmi ghcr.io/rsturla/chatterbox-spd:latest
podman rmi ghcr.io/rsturla/chatterbox-spd:cuda

# Remove cached voices
rm -rf ~/.cache/chatterbox-spd

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE for details.

This project integrates with Chatterbox TTS from Resemble AI, which is also MIT licensed.
