If you enjoy using Ultimate TTS Studio and would like to support its ongoing development, your generosity is deeply appreciated.
Any amount, big or small, makes a difference!
Support this project securely via PayPal.
Bitcoin Address: 1N942jHr6vVuR2KAe2JEf3nN59eR21tpKv
This update adds VibeVoice and IndexTTS2 as supported TTS engines, expanding the variety and flexibility of available voice options.
This recent update adds Higgs-Audio TTS as a supported engine.
This recent update brings a few UI improvements focused on clarity and usability:
- The TTS engine selector is now organized into a tabbed interface, making it easier to navigate and less overwhelming.
- The audiobook feature has been moved into its own tab to reduce visual clutter and improve the user experience.
We've pushed another exciting update packed with new functionality and improvements!
- F5-TTS has now been added as a fifth supported engine, and it works seamlessly across all modes.
- Index-TTS has been added as a supported speech engine.
- All TTS engines now work across all modes, including narration, conversation, and ambient.
- Kokoro now fully supports conversation mode, offering a more dynamic and interactive experience.
For the smoothest installation and full feature compatibility:
- Use a Conda environment, or
- Install via Pinokio for the easiest experience.
We're excited to announce a major update to the app!
Bring your favorite eBooks to life with our brand-new custom voice audiobook feature. Instantly convert any eBook into a personalized listening experience, perfect for learning, multitasking, or relaxing on the go.
This update brings key improvements to performance, model management, and the user interface. Here's what's new:
- Models are no longer auto-loaded into GPU memory at app launch.
- You can now manually load and unload models, giving you more precise control over memory usage.
- A refreshed interface is now live.
- The app is now optimized for dark mode. It still works in light mode, but some visuals may not display as intended.
- Fixed a bug where Fish Speech did not chunk text correctly, which could cause processing issues.
- Chatterbox and Kokoro models will automatically download the first time you click "Load."
- Fish Speech models must still be downloaded manually and are not included in the auto-download process.
- Kokoro now supports custom .pt voice models! Use the Custom Voice Upload section in the Kokoro interface to upload your own compatible voice files.
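The Fish Speech fix in the list above concerns how long input text is split into pieces before synthesis. This is only a rough illustration of sentence-aware chunking, not the app's actual chunker. The function names and the 200-character default limit are assumptions chosen for the example.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration; max_chars=200 is an assumed default.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Only a sentence longer than the limit falls back to word-level splitting.
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # keep packing sentences into this chunk
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Greedily pack words into pieces of at most max_chars, hard-cutting overlong words."""
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # A single word longer than the limit is cut into fixed-size slices.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```

The sketch splits on sentence boundaries first so that each chunk still reads naturally when spoken. Word-level splitting is used only as a fallback for sentences that exceed the limit on their own.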
Ultimate TTS Studio is a powerful all-in-one text-to-speech studio that brings together ChatterboxTTS, Kokoro TTS, and Fish Speech under one interactive Gradio interface.
Reference Audio Cloning • Pre-trained Multi-Language Voices • Natural TTS with Audio Effects • Real-time Voice Synthesis & Export
- ChatterboxTTS: Custom voice cloning using short reference clips.
- Kokoro TTS: High-quality, multilingual pre-trained voices.
- Fish Speech: Advanced TTS engine.
- Professional Audio Effects: Reverb, Echo, EQ, Pitch shift, Gain.
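The effects listed above are applied after synthesis. This is a minimal, dependency-free sketch of two of them, gain in decibels and a single-tap echo, operating on mono float samples in the range -1.0 to 1.0. It is not the app's actual effects chain. The function names are hypothetical, and real implementations typically work on NumPy arrays or use a DSP library.

```python
def apply_gain_db(samples: list[float], gain_db: float) -> list[float]:
    """Scale samples by a gain given in decibels, clipping the result to [-1.0, 1.0]."""
    factor = 10 ** (gain_db / 20)  # convert dB to an amplitude ratio
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def apply_echo(samples: list[float], sample_rate: int, delay_s: float, decay: float) -> list[float]:
    """Mix in one delayed, attenuated copy of the signal (a single-tap echo).

    The output is extended by the delay length so the echo tail is not cut off.
    No clipping is applied here, so follow with apply_gain_db if peaks may exceed 1.0.
    """
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    delay = int(round(delay_s * sample_rate))  # delay expressed in samples
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += decay * s
    return out
```

For example, apply_echo(samples, 24000, 0.25, 0.4) adds a quarter-second echo at 40% amplitude to 24 kHz audio. The sample rate here is an illustrative value, not necessarily what any engine outputs.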
Tested Hardware: This project has only been tested on a Windows 11 machine with an RTX 4090 GPU. Performance and compatibility on other systems are not guaranteed.

Audio Caution: The Fish Speech feature may occasionally produce extremely loud or muffled audio. Please lower your volume and avoid using headphones during initial tests.
Windows Users: Important Note on pynini
Pynini and WeTextProcessing are needed for Index-TTS to work at its best, and espeak-ng is needed for Kokoro to work at its best.
If you encounter the following error when installing pynini:
ERROR: Failed building wheel for pynini
You can fix it by installing pynini via conda:
# After activating your conda environment (e.g., conda activate ultimate-tts)
conda install -c conda-forge pynini==2.1.6
pip install WeTextProcessing --no-deps
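To confirm these optional dependencies are in place after installation, you can run a small check such as the sketch below. It assumes that WeTextProcessing installs an importable package named tn and that espeak-ng is on your PATH as an executable named espeak-ng. On Windows the espeak-ng installer may not add itself to PATH, so adjust these names if your setup differs.

```python
import importlib.util
import shutil


def check_optional_dependencies() -> dict[str, bool]:
    """Report whether optional text-processing dependencies appear to be installed.

    Assumptions: WeTextProcessing provides an importable "tn" package, and
    espeak-ng is discoverable on PATH as an executable named "espeak-ng".
    """
    return {
        # find_spec returns None when a top-level package cannot be found.
        "pynini": importlib.util.find_spec("pynini") is not None,
        "WeTextProcessing (tn)": importlib.util.find_spec("tn") is not None,
        # shutil.which also resolves "espeak-ng.exe" on Windows via PATHEXT.
        "espeak-ng": shutil.which("espeak-ng") is not None,
    }


if __name__ == "__main__":
    for name, found in check_optional_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```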
Install via Pinokio
You can use the Pinokio script for one-click setup: Pinokio App Installer
Option 1a: Install via Dione
You can also use Dione for an easy one-click installation experience.
This is the fastest way to get started. It uses a built-in installer script for automatic setup and app launching.
Before You Begin: Make sure you have Miniconda or Anaconda installed on your system. You can download Miniconda here: https://docs.conda.io/en/latest/miniconda.html
git clone https://github.com/SUP3RMASS1VE/Ultimate-TTS-Studio-SUP3R-Edition.git
cd Ultimate-TTS-Studio-SUP3R-Edition
Double-click RUN_INSTALLER in the project folder.
This will automatically set up everything for you (dependencies, environment, etc.).
Double-click RUN_APP to open the app.
Double-click RUN_UPDATE to update the app to the latest version.
Follow these steps to set up your environment for Ultimate TTS Studio SUP3R Edition using Conda and UV for fast dependency management.
git clone https://github.com/SUP3RMASS1VE/Ultimate-TTS-Studio-SUP3R-Edition.git
cd Ultimate-TTS-Studio-SUP3R-Edition
conda create -n ultimate-tts python=3.10 -y
conda activate ultimate-tts
pip install uv
Tip: uv dramatically speeds up installation. If you prefer, you can use regular pip install instead.
uv pip install -r requirements.txt
uv pip install voxcpm openai-whisper --no-deps
uv pip install https://huggingface.co/lldacing/flash-attention-windows-wheel/resolve/main/flash_attn-2.7.4.post1%2Bcu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
uv pip install WeTextProcessing --no-deps
uv pip install triton-windows==3.3.1.post19
uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
Your environment is now ready to run Ultimate TTS Studio SUP3R Edition with CUDA 12.8 support. Launch the app and start generating high-quality speech!
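Tip: If you encounter CUDA or package conflicts, make sure your GPU drivers are up to date and that the Conda environment's Python version (3.10) matches the wheels you install. For example, the flash-attention wheel above is built for cp310, CUDA 12.8, PyTorch 2.7.0, and 64-bit Windows.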
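After installation, a quick sanity check can confirm that the environment matches what the wheels above expect: Python 3.10 and PyTorch 2.7.0 built for CUDA 12.8. This sketch only reads version information and assumes nothing about the app itself. If PyTorch is not installed, it reports "not installed" instead of failing.

```python
import sys


def environment_report() -> dict[str, object]:
    """Collect version info relevant to the cp310 / cu128 wheels used in the setup steps."""
    report: dict[str, object] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] == (3, 10),  # wheels above are cp310
    }
    try:
        import torch  # installed from the cu128 index in the setup steps
    except ImportError:
        report["torch"] = "not installed"
        report["cuda_build"] = None
        report["cuda_available"] = False
        return report
    report["torch"] = torch.__version__         # e.g. "2.7.0+cu128" for the CUDA 12.8 build
    report["cuda_build"] = torch.version.cuda   # CUDA version PyTorch was built with; None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()  # False if drivers or GPU are not usable
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If cuda_build shows 12.8 but cuda_available is False, the GPU driver is the usual suspect rather than the Python packages.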
To use Fish Speech, you must download the model checkpoint from Hugging Face. This requires a Hugging Face account and access token.
- Create an account (if needed): https://huggingface.co/join
- Get your access token: Visit https://huggingface.co/settings/tokens and create a read token.
- Log in via CLI:
huggingface-cli login
Paste your token when prompted.
- (Optional) Accept the model license: Visit https://huggingface.co/fishaudio/openaudio-s1-mini and click "Access repository" if prompted.
- Download the model:
huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
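If you prefer to script the download, the huggingface_hub Python library (which also provides huggingface-cli) offers snapshot_download. The sketch below makes a few assumptions:

- You have already logged in, or you have set the HF_TOKEN environment variable.
- It uses the same repository and target folder as the command above.
- The checkpoint_ready and download_checkpoint helpers are hypothetical conveniences, not part of the app.

```python
from pathlib import Path

REPO_ID = "fishaudio/openaudio-s1-mini"
LOCAL_DIR = Path("checkpoints/openaudio-s1-mini")


def checkpoint_ready(local_dir: Path = LOCAL_DIR) -> bool:
    """Hypothetical check: treat the checkpoint as present if the folder exists and is non-empty."""
    return local_dir.is_dir() and any(local_dir.iterdir())


def download_checkpoint(local_dir: Path = LOCAL_DIR) -> Path:
    """Download the Fish Speech checkpoint unless it already appears to be present."""
    if checkpoint_ready(local_dir):
        return local_dir
    # Imported lazily so checkpoint_ready() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Picks up the token saved by "huggingface-cli login" (or HF_TOKEN) for gated repos.
    path = snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
    return Path(path)
```

Running download_checkpoint() from the project root should place the files in the same folder the CLI command uses. The non-empty-folder check is deliberately crude: it does not detect a partially completed download.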
To start the app, run:
python launch.py
This will launch a local Gradio interface at http://127.0.0.1:7860
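You may want to confirm from a script that the interface is up, for example before opening a browser automatically. A plain HTTP request to the address above is enough. This sketch uses only the standard library. The is_app_running name and the 2-second timeout are arbitrary choices for illustration.

```python
import urllib.request

APP_URL = "http://127.0.0.1:7860"


def is_app_running(url: str = APP_URL, timeout: float = 2.0) -> bool:
    """Return True if the local server answers with an HTTP 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False
```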
- All engines are optional: the app will gracefully disable any missing engines.
- ChatterboxTTS and Fish Speech support reference audio input.
- Audio effects are applied post-synthesis for professional-quality output.
- Custom Kokoro voices can be added to custom_voices/ as .pt files.
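Two behaviors described in this README fit together naturally: missing engines are disabled gracefully, and models are loaded and unloaded manually rather than at launch. One way to combine them is sketched below. Everything here is hypothetical, including the EngineRegistry class, the module names passed to it, and the loader callables. None of it is the app's real code.

```python
import importlib.util
from typing import Any, Callable


class EngineRegistry:
    """Hypothetical registry: engines whose package is missing are disabled,
    and available engines are loaded only on request and can be unloaded."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], Any]] = {}
        self._available: dict[str, bool] = {}
        self._loaded: dict[str, Any] = {}

    def register(self, name: str, module: str, loader: Callable[[], Any]) -> None:
        # An engine counts as available only if its top-level package can be found.
        self._available[name] = importlib.util.find_spec(module) is not None
        self._loaders[name] = loader

    def available(self) -> list[str]:
        return [name for name, ok in self._available.items() if ok]

    def load(self, name: str) -> Any:
        if not self._available.get(name, False):
            raise RuntimeError(f"engine {name!r} is not installed")
        if name not in self._loaded:  # nothing is loaded until the user asks
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def unload(self, name: str) -> None:
        # Dropping the reference lets memory be reclaimed; a GPU-backed model
        # would typically also call torch.cuda.empty_cache() afterwards.
        self._loaded.pop(name, None)

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded
```

Keeping each loader as a callable means an engine's heavy imports and weight loading only run when the user clicks Load. A missing package simply leaves that engine out of available().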
MIT License © SUP3RMASS1VE
This project proudly integrates and builds upon the amazing work of:
- Fish Speech by fishaudio: Natural and expressive TTS engine. License: MIT License
- Kokoro TTS by hexgrad: High-quality multilingual voice synthesis. License: Apache 2.0 License
- ChatterboxTTS by Resemble AI: Custom voice cloning from short reference clips. License: Apache 2.0 License
- F5-TTS by SWivid: Efficient and lightweight TTS model focused on real-time synthesis. License: MIT License
- Index TTS: Modular and scalable text-to-speech system with advanced voice capabilities. License: Apache 2.0 License
We deeply thank the authors and contributors to these projects for making this work possible.