This package contains everything you need to run Chatterbox Voice Cloning locally with a simple UI. Chatterbox TTS was selected over several other open-source voice cloning solutions because of its MIT license, state-of-the-art performance, simple Python API, and built-in web UI capabilities.
On Windows:
- Double-click `start_voice_cloning.bat`
- Wait for the UI to launch in your browser
- Start cloning voices!
On Linux/Mac:
- Open a terminal in this folder
- Run `chmod +x start_voice_cloning.sh` (first time only)
- Run `./start_voice_cloning.sh`
- Wait for the UI to launch in your browser
Requirements:
- Python 3.11 ONLY (the application will not run on other versions)
- A CUDA-compatible GPU is recommended for faster processing, but CPU works too
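Both requirements can be checked from Python before you launch anything. The following is a minimal sketch and is not shipped with this package: the helper names `is_supported_python` and `pick_device` are hypothetical, and the GPU check assumes PyTorch has already been installed (it falls back to CPU if the import fails).

```python
import sys


def is_supported_python(version=None):
    """Return True only for Python 3.11, the version this package requires."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 11)


def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # only importable once the dependencies are installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Python 3.11:", is_supported_python())
    print("Device:", pick_device())
```

If the first line prints `False`, install Python 3.11 before trying the launchers.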
If the launchers don't work immediately, you may need to:
- Install Python 3.11 from https://www.python.org/downloads/
- Open a terminal/command prompt in this folder
- Run: `python -m venv venv`
- Activate the environment:
  - Windows: `venv\Scripts\activate`
  - Linux/Mac: `source venv/bin/activate`
- Install dependencies: `pip install -r requirements.txt`
  - Note: If you encounter dependency resolution issues, try using `pip install --no-deps -r requirements.txt` followed by `pip install torch torchaudio`
- Then try the launcher again
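The manual steps above can also be scripted. This is a rough sketch under a few assumptions: `venv_python`, `setup_commands`, and `run_setup` are hypothetical helpers (not part of this package), the script is run from this folder, and it calls the virtual environment's interpreter directly with `-m pip`, which avoids the separate activation step.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir="venv", platform=sys.platform):
    """Path to the interpreter inside the virtual environment."""
    venv_dir = Path(venv_dir)
    if platform.startswith("win"):
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def setup_commands(venv_dir="venv", platform=sys.platform):
    """Build the command lists for the manual steps without running them."""
    py = str(venv_python(venv_dir, platform))
    return [
        # Equivalent of: python -m venv venv
        [sys.executable, "-m", "venv", str(venv_dir)],
        # Equivalent of: pip install -r requirements.txt (inside the venv)
        [py, "-m", "pip", "install", "-r", "requirements.txt"],
    ]


def run_setup(venv_dir="venv"):
    """Run each step in order and stop at the first failure."""
    for cmd in setup_commands(venv_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_setup()` creates the environment and installs the pinned dependencies; after that, try the launcher again.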
If installation fails:
- If you encounter dependency resolution issues during installation, use the pinned versions in requirements.txt, which should work together
- For Windows users with Python 3.13+:
  - The updated requirements.txt uses numpy<1.26.0 to avoid build failures on Python 3.13
  - If you still encounter numpy build errors, try installing with `pip install --only-binary=numpy -r requirements.txt`
  - Alternatively, consider using Python 3.10-3.12 for better compatibility with all dependencies
- If you see UnicodeDecodeError during installation on Windows, try setting your console to UTF-8 by running `chcp 65001` before running pip
Once the UI is open:
- Enter the text you want to convert to speech
- Upload a voice reference audio file (5-10 seconds of clear speech works best)
- Click "Generate Speech" to create your cloned voice audio
- Adjust the advanced settings as needed:
  - Exaggeration: Controls emotional intensity (higher = more expressive)
  - Temperature: Controls randomness (higher = more variation)
  - CFG Weight: Controls adherence to text (higher = more precise)
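The same settings can be passed from Python instead of the UI. The sketch below assumes the installed `chatterbox` package exposes `ChatterboxTTS.from_pretrained` and a `generate` method that accepts `audio_prompt_path`, `exaggeration`, `temperature`, and `cfg_weight`, as the upstream project documents; check this against the version pinned in requirements.txt. `VoiceSettings`, `clone_voice`, and the default values shown are illustrative and not part of this package.

```python
from dataclasses import asdict, dataclass


@dataclass
class VoiceSettings:
    """The three advanced settings from the UI (defaults are assumptions)."""
    exaggeration: float = 0.5  # emotional intensity
    temperature: float = 0.8   # randomness
    cfg_weight: float = 0.5    # adherence to the text

    def as_kwargs(self):
        """Return the settings as keyword arguments for generate()."""
        return asdict(self)


def clone_voice(text, reference_wav, out_path="output.wav",
                settings=None, device="cpu"):
    """Generate speech in the reference voice and save it as a WAV file."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    settings = settings or VoiceSettings()
    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, audio_prompt_path=reference_wav,
                         **settings.as_kwargs())
    torchaudio.save(out_path, wav, model.sr)
    return out_path
```

For example, `clone_voice("Hello!", "reference.wav", settings=VoiceSettings(exaggeration=0.7))` would request more expressive speech, where `reference.wav` is a placeholder for your own 5-10 second recording.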
For streamer chat TTS specifically:
- Use high-quality reference audio with clear speech
- Keep messages relatively short for best results
- For consistent voice quality, use the same reference audio for all generations
- Try lower CFG weight values (~0.3) and higher exaggeration (~0.7) for more expressive speech
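One way to keep chat messages short is to split them before they reach the TTS step. This is a small sketch, not part of the package: `split_message` and `STREAMER_PRESET` are hypothetical names, and the 200-character limit is an arbitrary assumption you should tune by ear.

```python
import re

# Starting values suggested above for more expressive chat speech.
STREAMER_PRESET = {"cfg_weight": 0.3, "exaggeration": 0.7}


def split_message(message, max_chars=200):
    """Split a message into chunks of at most max_chars characters,
    breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is too long on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be generated separately with the same reference audio and the preset values, which keeps the voice consistent across a long message.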
If you encounter any issues:
- Make sure Python is installed and in your PATH
- Check that you have enough disk space
- For GPU acceleration, ensure you have CUDA installed
- Try running the commands manually, as described in the first-time setup steps above
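The first three checks can be gathered into a quick diagnostic. This is a hedged sketch: the helper names are hypothetical, the 10 GB threshold is an assumption (this document does not state how much disk space is required), and the CUDA check only works once PyTorch is installed.

```python
import shutil


def python_on_path():
    """Return the first `python` or `python3` found on PATH, or None."""
    return shutil.which("python") or shutil.which("python3")


def free_disk_gb(path="."):
    """Free space, in gigabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def report(min_free_gb=10.0):
    """Collect one human-readable line per check."""
    python = python_on_path()
    free = free_disk_gb()
    return [
        f"Python on PATH: {python or 'NOT FOUND'}",
        f"Free disk space: {free:.1f} GB ({'ok' if free >= min_free_gb else 'low'})",
        f"CUDA available: {cuda_available()}",
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

A `False` for CUDA is not an error by itself, since generation also works on CPU, only more slowly.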