A local text-to-speech application powered by Qwen3-TTS with a premium React web UI.
Generate natural, expressive speech locally — custom voices, voice design from natural language descriptions, and voice cloning from audio samples.
| Feature | Description |
|---|---|
| 🎤 Custom Voice | 10+ built-in speakers across 10 languages, with emotional control |
| 🎨 Voice Design | Describe your ideal voice in plain English — the AI creates it |
| 🔄 Clone Speaker | Clone a voice from an audio sample and use it to generate new speech |
| 📚 Voice Library | Save, manage, and reuse your custom voice profiles |
| ⚡ Ultra Low Latency | Streaming generation with end-to-end latency as low as 97ms |
| 🌍 10 Languages | Chinese, English, Japanese, Korean, German, French, and more |
| 🖥️ Premium Web UI | Dark-themed React interface with glassmorphism and animations |
| 🍎 Apple Silicon | Drop-in Metal GPU acceleration (MPS) for Mac users |
This project is organized into two independent modules:
| Module | Description | Guide |
|---|---|---|
| 🔧 backend/ | FastAPI server + Qwen3-TTS engine + CLI | Setup Guide → |
| 🎨 frontend/ | React web UI (Vite + Zustand) | Setup Guide → |
Each module can be developed and tested independently. See their respective READMEs for setup instructions.
Run the backend and frontend separately to get live reloading on both ends.
# Terminal 1: Start backend
cd backend
make install
make serve
# Terminal 2: Start frontend dev server
cd frontend
npm install
npm run dev

Open http://localhost:5173. The frontend dev server automatically proxies API requests to the backend.
Run both frontend and backend together via the root Makefile for an easy one-click setup.
# first run
npm install
make serve

Open http://localhost:8765 to view the application in production mode.
All data is stored under ~/.local-tts/ by default. Override with the LOCAL_TTS_DATA_DIR environment variable or the --data-dir CLI flag.
| Path | Contents |
|---|---|
| ~/.local-tts/models/ | Downloaded model weights |
| ~/.local-tts/voices/ | Saved voice profiles & cloned voice samples |
| ~/.local-tts/history/ | Generated audio files (up to 100, then auto-pruned) |
| ~/.local-tts/config.json | Persisted server configuration |
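
As an illustration of how the override and the layout above fit together, here is a minimal Python sketch. The function names `resolve_data_dir` and `layout` are hypothetical, and the precedence shown (CLI flag first, then `LOCAL_TTS_DATA_DIR`, then the default) is an assumption; this README does not state which one wins when both are set.

```python
import os
from pathlib import Path

# Default location named in this README.
DEFAULT_DATA_DIR = Path.home() / ".local-tts"


def resolve_data_dir(cli_data_dir=None, env=None):
    """Pick the data directory.

    Assumed precedence (not confirmed by the project):
    --data-dir value, then LOCAL_TTS_DATA_DIR, then ~/.local-tts.
    """
    env = os.environ if env is None else env
    chosen = cli_data_dir or env.get("LOCAL_TTS_DATA_DIR") or DEFAULT_DATA_DIR
    return Path(chosen).expanduser()


def layout(data_dir):
    """Map the entries from the table above onto a resolved data directory."""
    return {
        "models": data_dir / "models",
        "voices": data_dir / "voices",
        "history": data_dir / "history",
        "config": data_dir / "config.json",
    }
```

For example, `layout(resolve_data_dir())` yields the four default paths listed in the table when neither override is set.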
Disk tip: Models are large (2.5–6.5 GB each). Delete unwanted model folders from ~/.local-tts/models/ to reclaim space.
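
To decide which folders to delete, it can help to see how much space each one takes. The following is a small standalone sketch, not part of the project's CLI; the helper names are hypothetical, and it assumes each model lives in its own subfolder of the models directory.

```python
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files below a folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def model_sizes(models_dir):
    """Return (folder name, size in bytes) pairs, largest first."""
    if not models_dir.is_dir():
        return []
    sizes = [
        (d.name, folder_size_bytes(d))
        for d in models_dir.iterdir()
        if d.is_dir()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Default location from this README; adjust if the data dir is overridden.
    for name, size in model_sizes(Path.home() / ".local-tts" / "models"):
        print(f"{size / 1e9:6.2f} GB  {name}")
```

The script only reports sizes; removing a folder is left as a manual step.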
- Python ≥ 3.11
- Node.js ≥ 18 (for building the web UI)
- GPU (recommended): NVIDIA GPU with ≥ 8 GB VRAM, or Apple Silicon Mac with Metal support
- CPU mode: Works but inference is significantly slower
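
The hardware options above suggest a simple fallback order. The sketch below shows one way to pick a device with PyTorch, assuming the backend runs on PyTorch (this README does not say so explicitly); `pick_device` and `detect_device` are hypothetical names, not the project's API.

```python
def pick_device(cuda_available, mps_available):
    """Prefer an NVIDIA GPU, then Apple Silicon (MPS), then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Probe PyTorch for available accelerators; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Keeping the decision in `pick_device` separate from the probing makes the fallback order easy to check without any GPU present.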
MIT License