🎙️ Local TTS

A local text-to-speech application powered by Qwen3-TTS with a premium React web UI.

Generate natural, expressive speech locally — custom voices, voice design from natural language descriptions, and voice cloning from audio samples.

✨ Features

Feature	Description
🎤 Custom Voice	10+ built-in speakers across 10 languages, with emotional control
🎨 Voice Design	Describe your ideal voice in plain English — the AI creates it
🔄 Clone Speaker	Clone a voice from an audio sample and use it to generate new speech
📚 Voice Library	Save, manage, and reuse your custom voice profiles
⚡ Ultra Low Latency	Streaming generation with end-to-end latency as low as 97 ms
🌍 10 Languages	Chinese, English, Japanese, Korean, German, French, and more
🖥️ Premium Web UI	Dark-themed React interface with glassmorphism and animations
🍎 Apple Silicon	Drop-in Metal GPU acceleration (MPS) for Mac users

📂 Project Structure

This project is organized into two independent modules:

Module	Description	Guide
🔧 backend/	FastAPI server + Qwen3-TTS engine + CLI	Setup Guide →
🎨 frontend/	React web UI (Vite + Zustand)	Setup Guide →

Each module can be developed and tested independently. See their respective READMEs for setup instructions.

🚀 Quick Start

Prerequisites: npm and make, plus the Python and Node.js versions listed under Requirements

Development (Two Terminals)

Run the backend and frontend separately to get live reloading on both ends.

# Terminal 1: Start backend
cd backend
make install
make serve

# Terminal 2: Start frontend dev server
cd frontend
npm install
npm run dev

Open http://localhost:5173 — the frontend automatically proxies API requests to the backend.

Production (Single Process)

Serve the frontend and backend together from a single process using the root Makefile.

On the first run, install the Node.js dependencies from the repository root:

npm install

Then start the server:

make serve

Open http://localhost:8765 to view the application in production mode.

📁 Storage Locations

All data is stored under ~/.local-tts/ by default. Override with the LOCAL_TTS_DATA_DIR environment variable or the --data-dir CLI flag.
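
The override chain can be sketched as below. This is only an illustration: `resolve_data_dir` is a hypothetical helper, not the project's code, and the precedence (CLI flag first, then environment variable, then default) is an assumption, since only the two override mechanisms are named here.

```python
import os
from pathlib import Path

# Default location named in this README.
DEFAULT_DATA_DIR = Path.home() / ".local-tts"


def resolve_data_dir(cli_value=None, env=None):
    """Pick the data directory: --data-dir value, then env var, then default.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    chosen = cli_value or env.get("LOCAL_TTS_DATA_DIR") or DEFAULT_DATA_DIR
    return Path(chosen).expanduser()


# With no overrides, the default under the home directory is used.
print(resolve_data_dir(env={}))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.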

Path	Contents
`~/.local-tts/models/`	Downloaded model weights
`~/.local-tts/voices/`	Saved voice profiles & cloned voice samples
`~/.local-tts/history/`	Generated audio files (up to 100, then auto-pruned)
`~/.local-tts/config.json`	Persisted server configuration
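
For intuition, the history cap could work roughly like the sketch below. It assumes the oldest files (by modification time) are removed first and that clips are `.wav` files; both are guesses, and `prune_history` is a hypothetical name rather than the server's actual function.

```python
import os
import tempfile
from pathlib import Path


def prune_history(history_dir, keep=100):
    """Delete the oldest files in history_dir so that at most `keep` remain.

    Returns the removed paths. Ordering by modification time is an
    assumption; the real server may track age differently.
    """
    files = [p for p in Path(history_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed


# Demo in a throwaway directory: five files, keep the newest three.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        clip = Path(tmp) / f"clip_{i}.wav"
        clip.write_bytes(b"")
        os.utime(clip, (1000 + i, 1000 + i))  # clip_0 is oldest, clip_4 newest
    removed = prune_history(tmp, keep=3)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
    print(remaining)  # ['clip_2.wav', 'clip_3.wav', 'clip_4.wav']
```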

Disk tip: Models are large (2.5–6.5 GB each). Delete unwanted model folders from ~/.local-tts/models/ to reclaim space.
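
To see which model folders are worth deleting, a small script along these lines can list their sizes. It is a sketch that assumes each model lives in its own subfolder of the models directory; `folder_sizes_gb` is a hypothetical helper.

```python
from pathlib import Path


def folder_sizes_gb(models_dir):
    """Map each immediate subfolder of models_dir to its total size in GB."""
    sizes = {}
    for folder in Path(models_dir).iterdir():
        if folder.is_dir():
            total = sum(f.stat().st_size for f in folder.rglob("*") if f.is_file())
            sizes[folder.name] = total / 1e9
    return sizes


# Default models path from the table above; adjust if the data dir is overridden.
models_dir = Path.home() / ".local-tts" / "models"
if models_dir.exists():
    ranked = sorted(folder_sizes_gb(models_dir).items(), key=lambda kv: -kv[1])
    for name, gb in ranked:
        print(f"{gb:6.2f} GB  {name}")
```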

⚠️ Requirements

Python ≥ 3.11
Node.js ≥ 18 (for building the web UI)
GPU (recommended): NVIDIA GPU with ≥ 8 GB VRAM, or Apple Silicon Mac with Metal support
CPU mode: Works but inference is significantly slower
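
The version requirements above can be checked with a short script before installing. This is a sketch, not part of the project: the helper names are hypothetical, and it only covers Python and Node.js (GPU detection depends on the installed ML framework and is left out).

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 11)
MIN_NODE_MAJOR = 18


def python_ok(version=None):
    """True if the given (or running) Python version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def node_major(version_text):
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def node_ok():
    """True if a `node` binary is on PATH and reports a new enough version."""
    node = shutil.which("node")
    if node is None:
        return False
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
    try:
        return node_major(result.stdout) >= MIN_NODE_MAJOR
    except ValueError:
        return False


print("Python >= 3.11:", python_ok())
print("Node.js >= 18:", node_ok())
```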

📄 License

MIT License
