A Streamlit web app for cloning voices using Coqui TTS.
Upload or record reference voices, fine-tune style & settings, and generate natural-sounding speech in 16+ languages — all through a clean, dark-themed interface.
- 📁 Voice gallery management
  - Upload, record, save, activate, and delete reference voices
  - Batch delete multiple voices at once
  - Preview and switch between saved voices
- 🌍 Multi-language support
  - 16+ languages including English, Spanish, French, German, Chinese, Japanese, and more
- 🎭 Voice style controls
  - Choose between neutral, fast, and expressive styles
  - Optional speed and emotion overrides (happy, sad, angry, surprised, fearful)
- ⚙️ Advanced settings
  - Control voice stability and similarity for better cloning accuracy
  - Toggle loudness normalization for consistent output
- 📊 Generation history
  - Automatic metadata saving (text, style, reference voice, timestamp)
  - Recent generations gallery with preview, download, and re-synthesize option
- 💾 Multiple export formats
  - Download results as WAV or MP3 (requires FFmpeg)
- 🎨 Modern dark UI
  - Custom-styled Streamlit interface with smooth UX and mobile-friendly layout
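
The generation history feature stores the text, style, reference voice, and timestamp of each output. As a minimal sketch of how such a record could be kept, the example below writes a JSON sidecar file next to each generated WAV. The function names `save_metadata` and `load_history`, the sidecar layout, and the field names are assumptions for illustration and may differ from what `app.py` actually does.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_metadata(wav_path, text, style, reference_voice):
    """Write a JSON sidecar (same stem as the WAV) describing one generation."""
    wav_path = Path(wav_path)
    record = {
        "file": wav_path.name,
        "text": text,
        "style": style,
        "reference_voice": reference_voice,
        # Fixed-width UTC timestamp so records sort correctly as strings.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
    }
    meta_path = wav_path.with_suffix(".json")
    meta_path.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return meta_path


def load_history(outputs_dir, limit=10):
    """Return up to `limit` metadata records, newest first."""
    records = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in Path(outputs_dir).glob("xtts_*.json")
    ]
    records.sort(key=lambda r: r["timestamp"], reverse=True)
    return records[:limit]
```

A history gallery could then call `load_history("outputs")` and render one entry per record.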
- Streamlit: Web interface for interaction and real-time updates
- streamlit-option-menu: Sidebar navigation with a clean UI
- Custom CSS styling for dark mode & better UX
- HTML5 MediaRecorder for in-browser recording
- Coqui TTS (XTTS v2): Voice cloning & text-to-speech model
- pydub: Audio file conversion (WebM/MP3 ↔ WAV)
- librosa & soundfile: Audio signal processing
- NumPy: Array and numerical operations
- ffmpeg (system dependency): Required for audio encoding/decoding
- streamlit.components.v1: Custom recorder integration
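
For orientation, here is a minimal sketch of voice cloning with the Coqui TTS Python API and the XTTS v2 model listed above. The wrapper name `clone_voice`, its input checks, and the default output path are assumptions for illustration; the app's own synthesis code may be organized differently.

```python
from pathlib import Path

# Model identifier for XTTS v2 in the Coqui TTS model catalogue.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(text, speaker_wav, language="en", out_path="outputs/xtts_demo.wav"):
    """Synthesize `text` in the voice of `speaker_wav` and write a WAV file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    speaker_wav = Path(speaker_wav)
    if not speaker_wav.is_file():
        raise FileNotFoundError(f"reference voice not found: {speaker_wav}")

    # Imported lazily so the input checks above run without loading the
    # heavy TTS dependency.
    from TTS.api import TTS

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # The model weights are downloaded the first time the model is loaded.
    tts = TTS(XTTS_MODEL)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(speaker_wav),
        language=language,
        file_path=str(out_path),
    )
    return out_path
```

In a Streamlit app the loaded model would normally be cached (for example with `st.cache_resource`) rather than reloaded on every call.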
voice_clone_app/
│
├── outputs/ # Generated outputs (auto-created)
│ └── xtts_*.wav # Generated files
│
├── voices/ # Reference voices (auto-created)
│ └── ref_*.wav # Uploaded/recorded samples
│
├── app.py # Main Streamlit application
├── run_app.py # Launcher (cross-platform, auto-opens browser)
├── requirements.txt # Python dependencies
├── LICENSE # Open-source license
├── .gitignore # Ignored files/folders
└── README.md # Project documentation
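
The `voices/` and `outputs/` folders in the layout above are marked as auto-created. A minimal sketch of how that could work with `pathlib` follows; the helper names `ensure_dirs` and `list_reference_voices` are hypothetical and are not taken from `app.py`.

```python
from pathlib import Path


def ensure_dirs(base="."):
    """Create voices/ and outputs/ under `base` if they do not exist yet."""
    base = Path(base)
    voices = base / "voices"
    outputs = base / "outputs"
    voices.mkdir(parents=True, exist_ok=True)
    outputs.mkdir(parents=True, exist_ok=True)
    return voices, outputs


def list_reference_voices(voices_dir):
    """Return the file names of saved reference samples (ref_*.wav), sorted."""
    return sorted(p.name for p in Path(voices_dir).glob("ref_*.wav"))
```

Calling `ensure_dirs()` once at startup is enough, because `exist_ok=True` makes repeated calls harmless.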
git clone https://github.com/EbrahimAR/AI-Voice-Cloner-XTTS-v2.git
cd AI-Voice-Cloner-XTTS-v2
python -m venv .venv
source .venv/bin/activate # On Linux/Mac
.venv\Scripts\activate # On Windows
pip install -r requirements.txt

- ffmpeg is required for MP3 conversion:
  - Windows: choco install ffmpeg
  - macOS: brew install ffmpeg
  - Linux: sudo apt-get install ffmpeg
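
Because MP3 export depends on ffmpeg being installed, it can help to check for it before offering that format. The sketch below shows one way to do this with the standard library, plus a WAV to MP3 conversion using pydub. The function names `ffmpeg_available`, `export_formats`, and `wav_to_mp3` are hypothetical, and the `192k` bitrate is an arbitrary choice rather than a setting taken from the app.

```python
import shutil
from pathlib import Path


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def export_formats():
    """Offer MP3 only when ffmpeg can be found; WAV is always available."""
    return ["WAV", "MP3"] if ffmpeg_available() else ["WAV"]


def wav_to_mp3(wav_path, bitrate="192k"):
    """Convert a WAV file to MP3 next to it and return the new path."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; MP3 export is unavailable")

    # Imported lazily so the checks above work without pydub installed.
    from pydub import AudioSegment

    wav_path = Path(wav_path)
    mp3_path = wav_path.with_suffix(".mp3")
    AudioSegment.from_wav(str(wav_path)).export(
        str(mp3_path), format="mp3", bitrate=bitrate
    )
    return mp3_path
```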
streamlit run app.py
Or use the helper script:
python run_app.py
App will open at: http://localhost:8502
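
The helper script is described as a cross-platform launcher that opens the browser automatically. A rough sketch of that idea is shown below, assuming the port 8502 mentioned above; `build_command` and `launch` are illustrative names, and the real `run_app.py` may work differently.

```python
import subprocess
import sys
import threading
import webbrowser

PORT = 8502  # port stated in this README


def build_command(script="app.py", port=PORT):
    """Build the Streamlit command using the current Python interpreter."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
        # Headless mode stops Streamlit from opening a browser tab itself.
        "--server.headless", "true",
    ]


def launch(script="app.py", port=PORT, delay=2.0):
    """Start Streamlit and open the app URL in the browser after `delay` seconds."""
    url = f"http://localhost:{port}"
    threading.Timer(delay, webbrowser.open, args=(url,)).start()
    # Blocks until the Streamlit server exits; returns its exit code.
    return subprocess.call(build_command(script, port))
```

Calling `launch()` from a script starts the server. Using `sys.executable -m streamlit` instead of a bare `streamlit` command keeps the launcher inside the active virtual environment on every platform.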
Ebrahim Abdul Raoof
This project is licensed under the MIT License. See LICENSE for details.