supertonic-tts-web-server

A text-to-speech web server built with the Supertonic-2 model. Easy to install, use, and manage. No GPU required.

Features

FastAPI server with HTTP Basic Auth
Supertonic TTS integration (auto-downloads model at startup)
Concurrent request control via semaphore
WAV output
Docker-friendly entrypoint

Requirements

Python 3.13
- It is likely compatible with any Python version from 3.10 onward that ONNX supports.

Quick Start with Linux Containers

Pull and run the published image, which is built from the provided Containerfile and entrypoint.sh:

docker pull docker.io/hufs24/supertonic-tts-web-server:0.0.1
docker run --rm -p 8080:80 --env-file .env docker.io/hufs24/supertonic-tts-web-server:0.0.1

You can also build the image locally:

docker build -t supertonic-tts-web-server:0.0.1 -f Containerfile .
docker run --rm -p 8080:80 --env-file .env supertonic-tts-web-server:0.0.1

Note

The container runs gunicorn on port 80 by default.

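To run the server without a container, for example on Windows, set it up locally.

Install Python 3.13
Install dependencies (on Windows, activate the environment with venv\Scripts\activate instead of the source command):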

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Start the server

For the test server:

python main.py

Or you can use uvicorn in a command line:

python -m uvicorn src.main:app --host 0.0.0.0 --port 80 --workers 1

Gunicorn is recommended for production:

pip install gunicorn

gunicorn src.main:app -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:80

Data Persistence

All generated WAV files are stored in the temp directory unless you set the TEMP_DIR environment variable. To persist generated files in a containerized deployment, mount a volume to /app/temp or to the custom directory specified by TEMP_DIR.

Usage

The server is designed around a simple API workflow. For the request and response schemas, refer to openapi.json (available at hufstech.com). The typical lifecycle is easier to understand as an end-to-end flow:

Authenticate every request with HTTP Basic Auth.
- All TTS and file-management endpoints require the same HTTP Basic credentials.
Send a POST /synthesize request with the text and synthesis options.
Receive the generated WAV file immediately in the response body.
Reuse the saved file later from the temporary storage if you need to download it again, inspect what has been generated, or clean it up.

Typical Flow

Start the server and make sure your client can reach it.
Send a synthesis request to POST /synthesize.
Save the response body as a .wav file on the client side if you want to play it immediately.
If you need to manage previously generated files, call GET /files to inspect what is stored in the server's temporary directory.
Download a specific stored file again with GET /file/{file_name}.
Delete a single file with DELETE /file/{file_name} or clear the whole temporary directory with DELETE /files or DELETE /clean.
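
The flow above can be sketched as a small Python client that uses only the standard library. This is an illustration, not part of the project: the helper names build_request and send are hypothetical, the base URL assumes the 8080:80 port mapping from the container example, and the credentials are the documented defaults.

```python
import base64
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # assumption: port mapping from the container example


def build_request(method: str, path: str, username: str, password: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for one API call."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # HTTP Basic Auth: base64 of "username:password"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request: urllib.request.Request) -> bytes:
    """Send a request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# The lifecycle expressed as requests (nothing is sent until send() is called).
synthesize = build_request("POST", "/synthesize", "admin", "admin", {
    "text": "Hello, good morning!",
    "voice": "F1",
    "total_steps": 5,
    "speed": 1.05,
    "lang": "en",
})
list_files = build_request("GET", "/files", "admin", "admin")
clean_all = build_request("DELETE", "/files", "admin", "admin")
```

With a server running, writing the result of send(synthesize) to a file opened in binary mode saves the WAV locally, and send(list_files) returns the listing of stored files.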

What Happens When You Call `/synthesize`

The server validates the request body and checks authentication.
The input text is rejected if it exceeds the length set by the MAXIMUM_LENGTH environment variable.
Character substitutions from the CHARACTER_SUBSTITUTION environment variable are applied before synthesis.
The Supertonic model generates audio and the server writes it to the temporary directory as a WAV file.
The same WAV file is also returned directly in the HTTP response as audio/wav.

This means /synthesize is both a generation endpoint and a persistence step. You do not need to call another endpoint to make the file available for later download.
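
The text-handling steps can be pictured with a minimal sketch. It is not the server's actual code: prepare_text and TextTooLongError are hypothetical names, the defaults mirror the documented values of MAXIMUM_LENGTH and CHARACTER_SUBSTITUTION, and measuring length in characters before substitution is an assumption.

```python
from typing import Dict, Optional

DEFAULT_MAXIMUM_LENGTH = 300
DEFAULT_SUBSTITUTIONS = {"「": '"', "」": '"', "·": ","}


class TextTooLongError(ValueError):
    """Hypothetical error raised when the input exceeds the length limit."""


def prepare_text(text: str,
                 maximum_length: int = DEFAULT_MAXIMUM_LENGTH,
                 substitutions: Optional[Dict[str, str]] = None) -> str:
    """Reject over-long input, then apply character substitutions."""
    # Assumption: the limit counts characters and is checked before substitution.
    if len(text) > maximum_length:
        raise TextTooLongError(
            f"text has {len(text)} characters, limit is {maximum_length}")
    table = DEFAULT_SUBSTITUTIONS if substitutions is None else substitutions
    for old, new in table.items():
        text = text.replace(old, new)
    return text


print(prepare_text("「Hello」 · world"))  # prints: "Hello" , world
```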

Minimal Example

Generate audio and save the response locally:

curl -u admin:admin \
  -X POST http://localhost:8080/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, good morning!",
    "voice": "F1",
    "total_steps": 5,
    "speed": 1.05,
    "lang": "en"
  }' \
  --output hello.wav

List saved files on the server:

curl -u admin:admin http://localhost:8080/files

Download one of the saved files again:

curl -u admin:admin \
  http://localhost:8080/file/<file_name> \
  --output downloaded.wav

Delete a single saved file:

curl -u admin:admin -X DELETE http://localhost:8080/file/<file_name>

Clean the entire temporary directory:

curl -u admin:admin -X DELETE http://localhost:8080/files

Operational Notes

POST /synthesize can return 429 Too Many Requests when the per-worker concurrency limit is exhausted.
Generated files remain in TEMP_DIR until you delete them or clean the directory.
DELETE /files and DELETE /clean are equivalent aliases.
If you enable WEB_PLAYGROUND in the EXTRA_FEATURES environment variable, the browser UI becomes an optional convenience layer on top of the same API workflow. See the Features section under Configuration for details.
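
Because a busy worker answers 429, a client may want to retry with a delay. The following is one possible sketch using the standard library; call_with_retry and its parameters are hypothetical, and the exponential backoff schedule is a design choice of this example rather than something the server prescribes.

```python
import time
import urllib.error
from typing import Callable


def call_with_retry(send: Callable[[], bytes],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call send(), retrying with exponential backoff while the server answers 429."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Give up on any other status, or when the last attempt also fails.
            if error.code != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

Here send is any zero-argument callable that performs the request with urllib and returns the body, for example a lambda wrapping urllib.request.urlopen. Injecting sleep keeps the function easy to test without real delays.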

Configuration

The server loads environment variables from .env at startup (absolute path /app/.env in the Linux container). Use .env.example as a template. You can either create a .env file or set the variables directly in the environment.

HF_TOKEN: HuggingFace API token (optional)
- Providing a token can speed up the model download.

Server and Runtime

WORKERS
- Gunicorn worker count (default: 2)
- Must be a non-negative integer
GUNICORN_EXTRA_ARGS
- Extra gunicorn CLI args (optional)

Authentication

USERNAME
- HTTP Basic Auth username (default: admin)
PASSWORD
- HTTP Basic Auth password (default: admin)

Concurrency and Performance

MAXIMUM_CONCURRENT_INFERENCE
- Max concurrent synthesis requests per worker (default: 1)
SUPERTONIC_INTRA_OP_THREADS
- ONNX intra-op thread count (optional)
SUPERTONIC_INTER_OP_THREADS
- ONNX inter-op thread count (optional, recommended 1 if overriding)
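
As an illustration of what the two thread variables control, the sketch below maps them onto the intra_op_num_threads and inter_op_num_threads attributes of ONNX Runtime's SessionOptions. The helper names are hypothetical, and whether the server applies the values in exactly this way is an assumption.

```python
import os
from typing import Dict, Mapping

# Environment variable name -> onnxruntime.SessionOptions attribute
_THREAD_OPTIONS = (
    ("SUPERTONIC_INTRA_OP_THREADS", "intra_op_num_threads"),
    ("SUPERTONIC_INTER_OP_THREADS", "inter_op_num_threads"),
)


def thread_settings(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Collect the thread overrides that are set; unset variables are skipped."""
    settings = {}
    for env_name, option in _THREAD_OPTIONS:
        raw = env.get(env_name)
        if raw:
            settings[option] = int(raw)
    return settings


def make_session_options(env: Mapping[str, str] = os.environ):
    """Apply the overrides to a SessionOptions object (requires onnxruntime)."""
    import onnxruntime as ort  # imported lazily so thread_settings works without it

    options = ort.SessionOptions()
    for option, value in thread_settings(env).items():
        setattr(options, option, value)
    return options
```

Leaving both variables unset keeps ONNX Runtime's own defaults, which matches their optional status above.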

Resource Management

ACQUIRE_TIMEOUT_SECONDS
- Semaphore acquire timeout in seconds (default: 1.0)
TEMP_DIR
- Directory for temporary WAV files (default: temp)
MAXIMUM_LENGTH
- Max input text length (default: 300 chars)
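
To show how a semaphore with an acquire timeout limits concurrency, here is a minimal asyncio sketch. It is not the server's implementation: InferenceGate is a hypothetical class, its two parameters stand in for MAXIMUM_CONCURRENT_INFERENCE and ACQUIRE_TIMEOUT_SECONDS, and a request handler would presumably translate the timeout into the 429 response.

```python
import asyncio


class InferenceGate:
    """Hypothetical helper that caps concurrent synthesis calls in one worker."""

    def __init__(self, max_concurrent: int = 1, acquire_timeout: float = 1.0) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._acquire_timeout = acquire_timeout

    async def run(self, job):
        """Await job() if a slot frees up in time, otherwise raise TimeoutError."""
        try:
            await asyncio.wait_for(self._semaphore.acquire(), self._acquire_timeout)
        except asyncio.TimeoutError:
            raise TimeoutError("no free inference slot") from None
        try:
            return await job()
        finally:
            self._semaphore.release()


async def demo():
    gate = InferenceGate(max_concurrent=1, acquire_timeout=0.05)

    async def slow_job():
        await asyncio.sleep(0.2)  # stands in for model inference
        return "audio"

    # The second call cannot get the single slot within 0.05 s and times out.
    return await asyncio.gather(gate.run(slow_job), gate.run(slow_job),
                                return_exceptions=True)


results = asyncio.run(demo())
print(results)
```

The first call completes normally, while the second gives up after the timeout instead of queueing indefinitely, which keeps a saturated worker responsive.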

Features

CHARACTER_SUBSTITUTION
- JSON object of characters to remove or replace in the input text before synthesis (default: {"「": "\"", "」": "\"", "·": ","})
EXTRA_FEATURES
- JSON list of additional features to enable (default: [])
- Available features:
  - WEB_PLAYGROUND: Enables an HTML page for using the server from a browser. It is not suitable for production; use it only for personal use or testing.
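
Both variables hold JSON, so they can be parsed with the standard json module. The sketch below is an assumption about how such parsing might look, not the server's code; load_feature_settings is a hypothetical name and the type checks are this example's own.

```python
import json
import os
from typing import Dict, List, Mapping, Tuple

DEFAULT_SUBSTITUTIONS = {"「": '"', "」": '"', "·": ","}


def load_feature_settings(
        env: Mapping[str, str] = os.environ) -> Tuple[Dict[str, str], List[str]]:
    """Parse CHARACTER_SUBSTITUTION and EXTRA_FEATURES, falling back to the defaults."""
    raw_substitutions = env.get("CHARACTER_SUBSTITUTION")
    raw_features = env.get("EXTRA_FEATURES")
    substitutions = (json.loads(raw_substitutions) if raw_substitutions
                     else dict(DEFAULT_SUBSTITUTIONS))
    features = json.loads(raw_features) if raw_features else []

    if not isinstance(substitutions, dict) or not all(
            isinstance(value, str) for value in substitutions.values()):
        raise ValueError("CHARACTER_SUBSTITUTION must be a JSON object of strings")
    if not isinstance(features, list) or not all(
            isinstance(item, str) for item in features):
        raise ValueError("EXTRA_FEATURES must be a JSON list of strings")
    return substitutions, features


substitutions, features = load_feature_settings(
    {"EXTRA_FEATURES": '["WEB_PLAYGROUND"]'})
print(features)  # prints: ['WEB_PLAYGROUND']
```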

Logging Configuration

The server logs with Loguru, which can be configured through the following environment variables.

LOG_LEVEL
- Minimum severity level to log (default: INFO)
- Options: DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL
LOG_FORMAT
- Custom format string for log messages
- Default: <green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>
LOG_FILE
- Path to a file where logs will be saved (optional)
- If set, logs will be written to this file in addition to standard output.
LOG_ROTATION
- Condition for rotating the log file (default: 10 MB)
- This environment variable is only used if LOG_FILE is set.
- Examples: 100 MB, 00:00, 1 week, 10 days
LOG_RETENTION
- Duration or number of files to keep for old logs (default: 1 week)
- This environment variable is only used if LOG_FILE is set.
- Examples: 10 days, 2 months
LOG_SERIALIZE
- Whether to output logs in JSON format (default: false)
- Set to true for structured logging, useful for log aggregation systems.
LOG_ENQUEUE
- Whether to enable asynchronous, non-blocking logging (default: true)
- Highly recommended for FastAPI environments to ensure logging operations do not block the event loop.
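
The variables above correspond closely to keyword arguments of Loguru's logger.add. The sketch below shows one way the mapping could look; build_sinks is a hypothetical helper, the accepted spellings for boolean values are an assumption, and the server's real wiring may differ.

```python
import os
import sys
from typing import Any, Dict, List, Mapping, Tuple

DEFAULT_FORMAT = (
    "<green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | <level>{level: <8}</level> | "
    "<cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - "
    "<level>{message}</level>"
)


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; the server may accept others.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_sinks(env: Mapping[str, str] = os.environ) -> List[Tuple[Any, Dict[str, Any]]]:
    """Return (sink, kwargs) pairs suitable for logger.add(sink, **kwargs)."""
    common = {
        "level": env.get("LOG_LEVEL", "INFO"),
        "format": env.get("LOG_FORMAT", DEFAULT_FORMAT),
        "serialize": _as_bool(env.get("LOG_SERIALIZE", "false")),
        "enqueue": _as_bool(env.get("LOG_ENQUEUE", "true")),
    }
    sinks: List[Tuple[Any, Dict[str, Any]]] = [(sys.stdout, dict(common))]
    log_file = env.get("LOG_FILE")
    if log_file:
        # Rotation and retention only apply to the file sink.
        sinks.append((log_file, {
            **common,
            "rotation": env.get("LOG_ROTATION", "10 MB"),
            "retention": env.get("LOG_RETENTION", "1 week"),
        }))
    return sinks
```

With Loguru installed, calling logger.remove() and then logger.add(sink, **kwargs) for each returned pair would apply the configuration.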

Development

Run tests:

pytest

License

The web server is licensed under the BSD 2-Clause License.

The Supertonic model adopts the BigScience Open RAIL-M License. Supertonic is a trademark of Supertone.
