
Kokoro TTS FastAPI Server


OpenAI-compatible Text-to-Speech server powered by Kokoro. Runs entirely on CPU with no external API calls -- drop it into any OpenAI-compatible client as a local TTS backend.

Features

  • OpenAI-compatible endpoint -- same request/response format as OpenAI's TTS API
  • 54 voices across 10+ languages and accents
  • Multiple output formats -- WAV, MP3, OGG, M4A (converted via ffmpeg)
  • Local inference -- runs entirely on CPU, no external API calls
  • Concurrent request handling -- semaphore + thread pool for parallel generation
  • Environment-driven config -- host, port, concurrency, defaults, and auth all via env vars

Quick Start

git clone https://github.com/lobstersyrup/kokoro-tts.git
cd kokoro-tts

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install system dependency
sudo apt install ffmpeg

# Install Python dependencies
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

# Run
python kokoro_server.py

The server starts on http://0.0.0.0:8880. The Kokoro model (~100MB) is downloaded automatically on first run.

Usage

curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, this is a test of the Kokoro TTS system.",
    "voice": "af_heart",
    "speed": 1.0,
    "response_format": "mp3"
  }' \
  --output speech.mp3

OpenAI Python Client

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8880/v1"
)

response = client.audio.speech.create(
    model="tts-1",
    input="Hello from Kokoro!",
    voice="af_heart",
    response_format="mp3"
)
response.stream_to_file("output.mp3")

Configuration

All settings are controlled via environment variables:

Variable Default Description
KOKORO_HOST 0.0.0.0 Bind address
KOKORO_PORT 8880 Listen port
KOKORO_MAX_CONCURRENT 4 Max parallel TTS generations
KOKORO_DEFAULT_VOICE af_heart Voice used when none specified
KOKORO_DEFAULT_MODEL tts-1 Model field default (OpenAI compat)
KOKORO_API_KEY (empty) API key for auth; empty = no auth
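
As an example, a launch with overridden settings could look like this (the values here are illustrative, not recommendations):

```shell
# Run on a different port, with lower concurrency and auth enabled
export KOKORO_PORT=8881
export KOKORO_MAX_CONCURRENT=2
export KOKORO_API_KEY=your-secret-key
python kokoro_server.py
```

With KOKORO_API_KEY set, clients must send the key as a bearer token, matching the OpenAI client's api_key parameter.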

Voice Fallback Logic

The model and voice fields both accept a Kokoro voice name. The server resolves the active voice like this:

  1. If voice is set to something other than KOKORO_DEFAULT_VOICE → use it
  2. Otherwise, if model is set to something other than KOKORO_DEFAULT_MODEL → use it as the voice
  3. Otherwise → use KOKORO_DEFAULT_VOICE

This means OpenAI clients sending model="tts-1", voice="af_heart" get the default voice, while model="af_bella" works as a shorthand for voice selection.
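
The three rules above can be sketched as a small function (names here are illustrative, not the server's actual internals):

```python
# Sketch of the voice-resolution rule described above.
DEFAULT_VOICE = "af_heart"  # KOKORO_DEFAULT_VOICE
DEFAULT_MODEL = "tts-1"     # KOKORO_DEFAULT_MODEL

def resolve_voice(model: str, voice: str) -> str:
    """Return the active Kokoro voice for a request."""
    if voice != DEFAULT_VOICE:   # 1. an explicit non-default voice wins
        return voice
    if model != DEFAULT_MODEL:   # 2. a non-default model doubles as a voice
        return model
    return DEFAULT_VOICE         # 3. otherwise, fall back to the default
```

So resolve_voice("tts-1", "af_heart") yields "af_heart", while resolve_voice("af_bella", "af_heart") yields "af_bella".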

Deployment

systemd (Linux)

Create a user service for automatic startup and crash recovery:

# Create the unit file (ensure the directory exists first)
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/kokoro-tts.service << 'EOF'
[Unit]
Description=Kokoro TTS Server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/venv/bin/python /path/to/kokoro-tts/kokoro_server.py
Restart=always
RestartSec=5
WorkingDirectory=/path/to/kokoro-tts
Environment=HOME=%h
Environment=PATH=/path/to/venv/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

# Enable and start
systemctl --user daemon-reload
systemctl --user enable --now kokoro-tts

# Verify
systemctl --user status kokoro-tts
curl http://localhost:8880/health

To override config, add Environment= lines (one per variable) to the [Service] section and restart the service:

Environment=KOKORO_PORT=8881
Environment=KOKORO_MAX_CONCURRENT=2
Environment=KOKORO_API_KEY=your-secret-key

Docker

# Build and start with defaults
docker compose up -d

# Or with custom configuration
KOKORO_PORT=8881 KOKORO_MAX_CONCURRENT=2 docker compose up -d

# Check health
curl http://localhost:8880/health

The Docker image includes ffmpeg and all Python dependencies, and pre-downloads the Kokoro model so the first request doesn't stall on a download. A named volume (kokoro-model-cache) persists the model across container rebuilds.

Custom Docker Compose Overrides

Create a docker-compose.override.yml for persistent configuration:

services:
  kokoro-tts:
    ports:
      - "8881:8881"
    environment:
      - KOKORO_PORT=8881
      - KOKORO_MAX_CONCURRENT=2
      - KOKORO_API_KEY=your-secret-key

Endpoints

Endpoint Method Description
/v1/audio/speech POST Generate speech (OpenAI-compatible)
/v1/models GET List all 54 available voices
/health GET Health check with voice count, formats, uptime

Supported Formats

Format Extension Notes
wav .wav PCM, no conversion -- fastest
mp3 .mp3 MPEG audio (default)
ogg .ogg Ogg Vorbis
m4a .m4a AAC in M4A container
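
The non-WAV formats above are produced by piping the raw audio through ffmpeg. A minimal sketch of what that conversion step could look like (illustrative only; the `convert` helper and the argument table are not the server's actual code, and m4a would need an AAC encoder plus an MP4-family muxer):

```python
import subprocess

# Illustrative ffmpeg argument sets per output format.
FFMPEG_ARGS = {
    "mp3": ["-f", "mp3"],
    "ogg": ["-f", "ogg", "-c:a", "libvorbis"],
}

def convert(wav_bytes: bytes, fmt: str) -> bytes:
    """Convert WAV bytes to the requested format; 'wav' is a passthrough."""
    if fmt == "wav":  # fastest path: no conversion at all
        return wav_bytes
    proc = subprocess.run(
        ["ffmpeg", "-i", "pipe:0", *FFMPEG_ARGS[fmt], "pipe:1"],
        input=wav_bytes, capture_output=True, check=True,
    )
    return proc.stdout
```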

Voice List

All 54 available voices. Prefix indicates language/accent:

Prefix Language/Accent
af_ American Female
am_ American Male
bf_ British Female
bm_ British Male
ef_ European Female
em_ European Male
ff_ French Female
hf_ Hindi Female
hm_ Hindi Male
if_ Italian Female
im_ Italian Male
jf_ Japanese Female
jm_ Japanese Male
pf_ Polish Female
pm_ Polish Male
zf_ Mandarin Female
zm_ Mandarin Male

American Female (af_)

Voice Description
af_alloy Warm, versatile. Good all-rounder
af_aoede Soft, melodic with a gentle tone
af_bella Bright, warm. Popular choice
af_heart Expressive, emotionally rich. Great for engaging content
af_jessica Clear, professional
af_kore Neutral, balanced
af_nicole Soft-spoken, calm
af_nova Upbeat, energetic
af_river Smooth, flowing with a relaxed cadence
af_sarah Neutral, clear
af_sky Bright, cheerful

American Male (am_)

Voice Description
am_adam Deep, authoritative
am_echo Warm, steady with a calm presence
am_eric Friendly, conversational
am_fenrir Serious, deep
am_liam Clear, young
am_michael Warm, mature
am_onyx Deep, rich
am_puck Lighter, youthful
am_santa Deep, jolly (character)

British Female (bf_)

Voice Description
bf_alice Elegant, refined
bf_emma Warm, sophisticated
bf_isabella Clear, articulate
bf_lily Bright, pleasant

British Male (bm_)

Voice Description
bm_daniel Clear, professional
bm_fable Expressive, storytelling
bm_george Deep, authoritative
bm_lewis Warm, friendly

European & French (ef_ / em_ / ff_)

Voice Description
ef_dora European female
em_alex European male, versatile
em_santa European male, deeper tone
ff_siwis French Swiss female

Hindi (hf_ / hm_)

Voice Description
hf_alpha Hindi female, clear and precise
hf_beta Hindi female, softer tone
hm_omega Hindi male, deep
hm_psi Hindi male, calm

Italian (if_ / im_)

Voice Description
if_sara Italian female, expressive
im_nicola Italian male, warm

Japanese (jf_ / jm_)

Voice Description
jf_alpha Japanese female, clear
jf_gongitsune Japanese female, softer
jf_nezumi Japanese female, gentle
jf_tebukuro Japanese female, warm
jm_kumo Japanese male

Polish (pf_ / pm_)

Voice Description
pf_dora Polish female
pm_alex Polish male, warm
pm_santa Polish male, deeper

Mandarin Female (zf_)

Voice Description
zf_xiaobei Northern accent
zf_xiaoni Youthful
zf_xiaoxiao Standard
zf_xiaoyi Gentle

Mandarin Male (zm_)

Voice Description
zm_yunjian Clear
zm_yunxi Expressive
zm_yunxia Warm
zm_yunyang Deep

Architecture

  • FastAPI -- HTTP endpoints and request validation
  • Uvicorn -- ASGI server (single worker, async concurrency)
  • Kokoro -- local TTS inference pipeline
  • Semaphore + ThreadPoolExecutor -- safe concurrent generation on CPU
  • Soundfile -- WAV encoding; ffmpeg handles format conversion

The Kokoro pipeline is not thread-safe, so inference runs in a thread pool gated by a semaphore (default 4 concurrent). This avoids OOM while maintaining reasonable throughput. A single Uvicorn worker is used to avoid loading multiple model copies into memory.
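
The gating pattern described above can be sketched as follows (a minimal illustration of the semaphore + thread pool combination, not the server's actual code; `generate_audio` is a stand-in for the Kokoro pipeline call):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 4  # KOKORO_MAX_CONCURRENT
semaphore = asyncio.Semaphore(MAX_CONCURRENT)
pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT)

def generate_audio(text: str) -> bytes:
    # Stand-in for the (not thread-safe) Kokoro inference call.
    return text.encode()

async def synthesize(text: str) -> bytes:
    # The semaphore caps in-flight generations; the thread pool keeps
    # blocking inference off the event loop so other requests stay responsive.
    async with semaphore:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(pool, generate_audio, text)
```

Because each generation holds a semaphore slot for its full duration, at most MAX_CONCURRENT model invocations run at once regardless of how many requests arrive.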

Ports

Default: 8880 (configurable via KOKORO_PORT)
