A high-quality, lightweight text-to-speech solution powered by KittenTTS with a beautiful web interface and OpenAI-compatible API.
- OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API
- Modern WebUI: Beautiful, responsive web interface
- API Key Authentication: Optional security for production use
- Docker Ready: Easy deployment with Docker Compose
- Lightweight: Only ~79MB model size, works on CPU
- Multiple Voices: 8 different voice options
- Multiple Formats: MP3, WAV, Opus, FLAC support
The easiest and most consistent way to run Kitten TTS.
```bash
docker-compose up -d
```

With API key authentication:

```bash
# Set your API key
export API_KEY="your-secret-api-key"

# Start with authentication
docker-compose -f docker-compose.api-key.yml up -d
```

Access the WebUI: http://localhost:8000
View logs: `docker-compose logs -f`
Stop: `docker-compose down`
The start.sh script automatically detects if Docker is available and chooses the best method.
```bash
# Make executable (first time only)
chmod +x start.sh

# Run the script
./start.sh
```

What it does:
- If Docker is available → uses Docker Compose
- If Docker is not available → sets up a Python virtual environment and runs locally
- Automatically creates the `.env` file if missing
- Handles API key configuration

Access the WebUI: http://localhost:8000
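The decision the script makes boils down to a single branch; here is a Python sketch of that logic (the `choose_method` helper is hypothetical, for illustration only — the real logic lives in `start.sh`):

```python
import shutil

def choose_method(docker_available: bool) -> str:
    """Mirror start.sh: prefer Docker Compose, fall back to a local venv."""
    return "docker-compose" if docker_available else "python-venv"

# Detect Docker the way a shell script would: is `docker` on PATH?
method = choose_method(shutil.which("docker") is not None)
print(f"Starting via: {method}")
```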
Run directly with Python without Docker.
```bash
# Install dependencies
pip install -r requirements.txt

# Install KittenTTS
pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl

# Run the server
python app.py
```

With API key:

```bash
export API_KEY="your-secret-key"
python app.py
```

Access the WebUI: http://localhost:8000
Stop: Press Ctrl+C
If you just want to serve the WebUI and connect to a remote Kitten TTS API:
```bash
# Using Python's built-in HTTP server
cd static
python -m http.server 3000

# Or using Node.js http-server
npm install -g http-server
cd static
http-server -p 3000
```

Then edit `static/index.html` to change the API endpoint from `/v1/audio/speech` to your remote server URL (e.g., https://tts.yourdomain.com/v1/audio/speech).
Access the WebUI: http://localhost:3000
| Environment Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server host |
| `PORT` | `8000` | Server port |
| `MODEL_NAME` | `KittenML/kitten-tts-mini-0.8` | Model to use |
| `API_KEY` | (empty) | API key for authentication (optional) |
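Reading these variables with standard-library defaults matching the table can be sketched as follows (the names match the table, but the exact loading code in `app.py` may differ):

```python
import os

# Defaults mirror the configuration table above
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
MODEL_NAME = os.getenv("MODEL_NAME", "KittenML/kitten-tts-mini-0.8")
API_KEY = os.getenv("API_KEY", "")  # empty string disables authentication

print(f"Serving {MODEL_NAME} on {HOST}:{PORT} "
      f"(auth {'enabled' if API_KEY else 'disabled'})")
```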
Via .env file:
```bash
cp .env.example .env
# Edit .env with your settings
```

Via environment variables:

```bash
export API_KEY="my-secret-key"
export PORT=8000
```

Via Docker Compose:

```yaml
environment:
  - API_KEY=my-secret-key
  - PORT=8000
```

Generate speech:

```bash
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kitten-tts-mini-0.8",
    "input": "Hello, this is a test!",
    "voice": "Jasper",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

Voices: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
Formats: mp3, wav, opus, flac

OpenAI Voice Mapping:
- alloy → Jasper
- echo → Bruno
- fable → Bella
- onyx → Hugo
- nova → Luna
- shimmer → Rosie
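The mapping above can be expressed as a lookup with a fallback; a sketch (the fallback to Jasper for unknown names is an assumption, not necessarily what the server does):

```python
# OpenAI voice names -> Kitten TTS voices, per the mapping above
OPENAI_VOICE_MAP = {
    "alloy": "Jasper",
    "echo": "Bruno",
    "fable": "Bella",
    "onyx": "Hugo",
    "nova": "Luna",
    "shimmer": "Rosie",
}

KITTEN_VOICES = {"Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo"}

def resolve_voice(requested: str) -> str:
    """Accept either an OpenAI voice alias or a native Kitten voice name."""
    if requested in KITTEN_VOICES:
        return requested
    # Assumed fallback for unrecognized names
    return OPENAI_VOICE_MAP.get(requested, "Jasper")
```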
List available voices:

```bash
curl http://localhost:8000/v1/audio/voices \
  -H "Authorization: Bearer YOUR_API_KEY"
```

List models:

```bash
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Python:

```python
import requests

# Without API key
response = requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={
        "model": "kitten-tts-mini-0.8",
        "input": "Hello world!",
        "voice": "Jasper",
        "response_format": "mp3"
    }
)

# With API key
headers = {"Authorization": "Bearer YOUR_API_KEY"}
response = requests.post(
    "http://localhost:8000/v1/audio/speech",
    headers=headers,
    json={
        "model": "kitten-tts-mini-0.8",
        "input": "Hello world!",
        "voice": "Jasper",
        "response_format": "mp3"
    }
)

# Save audio
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```

JavaScript:

```javascript
const response = await fetch('http://localhost:8000/v1/audio/speech', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
        model: 'kitten-tts-mini-0.8',
        input: 'Hello world!',
        voice: 'Jasper',
        response_format: 'mp3'
    })
});

const blob = await response.blob();
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
audio.play();
```

Project structure:

```
kitten-tts/
├── app.py                      # Main FastAPI application
├── requirements.txt            # Python dependencies
├── Dockerfile                  # Docker build configuration
├── docker-compose.yml          # Docker Compose (no auth)
├── docker-compose.api-key.yml  # Docker Compose (with auth)
├── start.sh                    # Smart startup script
├── .env.example                # Environment template
├── .dockerignore               # Docker ignore patterns
│
├── static/
│   └── index.html              # WebUI
│
├── examples/
│   ├── usage_examples.py       # Python examples
│   └── usage_examples.js       # JavaScript examples
│
├── test_api.py                 # API test suite
│
├── README.md                   # This file
├── QUICKSTART.md               # 5-minute quick start
├── PUBLISHING_GUIDE.md         # Docker publishing guide
├── ARCHITECTURE.md             # System architecture
└── PROJECT_SUMMARY.md          # Project overview
```
- API Key: Always set `API_KEY` in production environments
- HTTPS: Use a reverse proxy (nginx, traefik) for HTTPS in production
- Rate Limiting: Implement rate limiting for public deployments
- Network: Don't expose the container directly to the internet
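Bearer-token checking of the kind the API performs can be sketched as a plain function (illustrative only; the actual check in `app.py` may differ):

```python
import hmac
from typing import Optional

def is_authorized(auth_header: Optional[str], api_key: str) -> bool:
    """Validate an `Authorization: Bearer <key>` header against API_KEY.

    An empty API_KEY means authentication is disabled (every request passes).
    """
    if not api_key:
        return True
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking the key via timing differences
    return hmac.compare_digest(token, api_key)
```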
```nginx
server {
    listen 443 ssl;
    server_name tts.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Rate limiting (requires a matching limit_req_zone directive in the
        # http block, e.g. `limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;`)
        limit_req zone=one burst=10 nodelay;
    }
}
```

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "model_loaded": true
}
```

- Model: KittenTTS Mini 0.8
- Parameters: 80 million
- Size: ~79MB
- Architecture: StyleTTS 2
- Sample Rate: 24kHz
- GPU Required: No (works on CPU)
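Given the 24 kHz sample rate, the playback length of raw 16-bit mono PCM can be estimated from its byte count (a hypothetical helper for sizing output, not part of the project):

```python
SAMPLE_RATE = 24_000   # Hz, per the model specs above
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM

def pcm_duration_seconds(num_bytes: int) -> float:
    """Estimate the duration of raw 16-bit mono PCM audio at 24 kHz."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE)

print(pcm_duration_seconds(48_000))  # 48,000 bytes -> 1.0 second of audio
```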
- KittenML for the amazing TTS model
- StyleTTS 2 architecture
This project follows the license of the underlying KittenTTS model. Please check the original repository for licensing details.
If the model fails to download on first run:
- Ensure you have internet connectivity
- The model will be cached in the Docker volume for subsequent runs
- Manual cache location: `/root/.cache/huggingface`
The model requires approximately 500MB-1GB of RAM:
```bash
# Increase container memory
docker update --memory 2g kitten-tts
```

If port 8000 is already in use:

```bash
# Find what's using port 8000
lsof -i :8000

# Change port in .env
PORT=8001
```

- Try different voices to find the best match
- Ensure your audio player supports the output format
- Try WAV format for highest quality
If you modify code and need to rebuild:
```bash
# Stop current container
docker-compose down

# Rebuild image
docker-compose build --no-cache

# Start updated service
docker-compose up -d

# View logs to verify
docker-compose logs -f
```

- Model Issues: KittenTTS HuggingFace
- API/Deployment Issues: See documentation in this repository
- Examples: Check the `examples/` folder for usage code
| Document | Purpose |
|---|---|
| `QUICKSTART.md` | Get started in 5 minutes |
| `PUBLISHING_GUIDE.md` | Publish Docker images to registries |
| `ARCHITECTURE.md` | System architecture and design |
| `PROJECT_SUMMARY.md` | Complete project overview |
| `examples/` | Code examples in Python and JavaScript |
Happy Text-to-Speech!