HUFSTech/supertonic-tts-web-server

supertonic-tts-web-server

A text-to-speech web server built with the Supertonic-2 model. Easy to install, use, and manage. No GPU required.

Table of Contents

  1. Features
  2. Requirements
  3. Quick Start with Linux Containers
    1. You can also build the image locally
    2. Data Persistence
    3. Usage
  4. Configuration

Features

  • FastAPI server with HTTP Basic Auth
  • Supertonic TTS integration (auto-downloads model at startup)
  • Concurrent request control via semaphore
  • WAV output
  • Docker-friendly entrypoint

Requirements

  • Python 3.13
    • Other versions are likely to work as well, from Python 3.10 onward, as long as ONNX supports them.

Quick Start with Linux Containers

Pull the published image, which is built from the provided Containerfile and entrypoint.sh, and run it:

docker pull docker.io/hufs24/supertonic-tts-web-server:0.0.1
docker run --rm -p 8080:80 --env-file .env docker.io/hufs24/supertonic-tts-web-server:0.0.1

You can also build the image locally:

docker build -t supertonic-tts-web-server:0.0.1 -f Containerfile .
docker run --rm -p 8080:80 --env-file .env supertonic-tts-web-server:0.0.1

Note

The container runs gunicorn on port 80 by default.

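You can also run the server without a container, for example on Windows:
  1. Install Python 3.13
  2. Install dependencies (on Windows, activate the environment with venv\Scripts\activate instead of the source command):
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt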

Start the server

  • For the test server:
python main.py
  • Or run uvicorn from the command line:
python -m uvicorn src.main:app --host 0.0.0.0 --port 80 --workers 1
  • Gunicorn is recommended for production:
pip install gunicorn
gunicorn src.main:app -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:80

Data Persistence

All generated WAV files are stored in the temp directory unless you set the TEMP_DIR environment variable. To persist generated files in a containerized deployment, mount a volume to /app/temp or to the custom directory specified by TEMP_DIR.

Usage

This server is designed around a simple API workflow. For request and response schemas, see openapi.json (available at hufstech.com). The typical lifecycle is easier to understand as an end-to-end flow:

  • Authenticate every request with HTTP Basic Auth.
    • All TTS and file-management endpoints require the same HTTP Basic credentials.
  • Send a POST /synthesize request with the text and synthesis options.
  • Receive the generated WAV file immediately in the response body.
  • Reuse the saved file later from the temporary storage if you need to download it again, inspect what has been generated, or clean it up.

Typical Flow

  1. Start the server and make sure your client can reach it.
  2. Send a synthesis request to POST /synthesize.
  3. Save the response body as a .wav file on the client side if you want to play it immediately.
  4. If you need to manage previously generated files, call GET /files to inspect what is stored in the server's temporary directory.
  5. Download a specific stored file again with GET /file/{file_name}.
  6. Delete a single file with DELETE /file/{file_name} or clear the whole temporary directory with DELETE /files or DELETE /clean.

What Happens When You Call /synthesize

  • The server validates the request body and checks authentication.
  • The input text is rejected if it exceeds the limit set by the MAXIMUM_LENGTH environment variable.
  • Character substitutions from the CHARACTER_SUBSTITUTION environment variable are applied before synthesis.
  • The Supertonic model generates audio and the server writes it to the temporary directory as a WAV file.
  • The same WAV file is also returned directly in the HTTP response as audio/wav.

This means /synthesize is both a generation endpoint and a persistence step. You do not need to call another endpoint to make the file available for later download.

Minimal Example

Generate audio and save the response locally:

curl -u admin:admin \
  -X POST http://localhost:8080/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, good morning!",
    "voice": "F1",
    "total_steps": 5,
    "speed": 1.05,
    "lang": "en"
  }' \
  --output hello.wav

List saved files on the server:

curl -u admin:admin http://localhost:8080/files

Download one of the saved files again:

curl -u admin:admin \
  http://localhost:8080/file/<file_name> \
  --output downloaded.wav

Delete a single saved file:

curl -u admin:admin -X DELETE http://localhost:8080/file/<file_name>

Clean the entire temporary directory:

curl -u admin:admin -X DELETE http://localhost:8080/files
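
The same synthesis call can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the default admin/admin credentials and the 8080 port mapping from the Quick Start; the helper names basic_auth_header, build_synthesize_request, and synthesize_to_file are hypothetical. The request body mirrors the curl example above.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: port mapping used in the Quick Start


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_synthesize_request(
    text: str, username: str = "admin", password: str = "admin"
) -> urllib.request.Request:
    """Build (but do not send) the POST /synthesize request from the curl example."""
    payload = {"text": text, "voice": "F1", "total_steps": 5, "speed": 1.05, "lang": "en"}
    return urllib.request.Request(
        f"{BASE_URL}/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str) -> None:
    """Send the request and save the audio/wav response body to out_path."""
    with urllib.request.urlopen(build_synthesize_request(text)) as response:
        with open(out_path, "wb") as f:
            f.write(response.read())
```

With a server running, synthesize_to_file("Hello, good morning!", "hello.wav") would write the returned WAV file to the current directory.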

Operational Notes

  • POST /synthesize can return 429 Too Many Requests when the per-worker concurrency limit is exhausted.
  • Generated files remain in TEMP_DIR until you delete them or clean the directory.
  • DELETE /files and DELETE /clean are equivalent aliases.
  • If you enable WEB_PLAYGROUND in the EXTRA_FEATURES environment variable, the browser UI becomes an optional convenience layer on top of the same API workflow. See the Configuration section for details.
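
Because /synthesize can answer 429 Too Many Requests, a client may want to retry after a short wait. The sketch below shows one way to do that with exponential backoff. It assumes the client uses urllib, which raises urllib.error.HTTPError for error responses; call_with_retry and its parameters are hypothetical names, and the delays are illustrative rather than recommended values.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    send: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on HTTP 429, wait with exponential backoff and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            # Retry only when the server reports that its concurrency limit is
            # exhausted, and re-raise once the last attempt has failed.
            if exc.code != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Here send is any zero-argument callable that performs the request, for example a lambda wrapping urllib.request.urlopen.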

Configuration

The server loads environment variables from .env (absolute path /app/.env when using a Linux container) at startup. Use .env.example as a template. You can either create a .env file or set the same variables directly in the environment.

Server and Runtime

  • WORKERS
    • Gunicorn worker count (default: 2)
    • Must be a non-negative integer
  • GUNICORN_EXTRA_ARGS
    • Extra gunicorn CLI args (optional)

Authentication

  • USERNAME
    • HTTP Basic Auth username (default: admin)
  • PASSWORD
    • HTTP Basic Auth password (default: admin)
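
For readers unfamiliar with HTTP Basic Auth, the sketch below shows how an Authorization header can be checked against the configured USERNAME and PASSWORD. It is an illustration only and is not taken from the server's source; check_basic_auth is a hypothetical helper. It uses secrets.compare_digest so the comparison time does not depend on how many characters match.

```python
import base64
import binascii
import secrets


def check_basic_auth(header_value: str, username: str, password: str) -> bool:
    """Return True if an Authorization header carries the expected Basic credentials."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    given_user, separator, given_password = decoded.partition(":")
    if not separator:
        return False
    # Evaluate both comparisons so a wrong username and a wrong password
    # take the same code path.
    user_ok = secrets.compare_digest(given_user.encode("utf-8"), username.encode("utf-8"))
    password_ok = secrets.compare_digest(given_password.encode("utf-8"), password.encode("utf-8"))
    return user_ok and password_ok
```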

Concurrency and Performance

  • MAXIMUM_CONCURRENT_INFERENCE
    • Max concurrent synthesis requests per worker (default: 1)
  • SUPERTONIC_INTRA_OP_THREADS
    • ONNX intra-op thread count (optional)
  • SUPERTONIC_INTER_OP_THREADS
    • ONNX inter-op thread count (optional, recommended 1 if overriding)

Resource Management

  • ACQUIRE_TIMEOUT_SECONDS
    • Semaphore acquire timeout in seconds (default: 1.0)
  • TEMP_DIR
    • Directory for temporary WAV files (default: temp)
  • MAXIMUM_LENGTH
    • Max input text length (default: 300 chars)
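
To illustrate how MAXIMUM_CONCURRENT_INFERENCE and ACQUIRE_TIMEOUT_SECONDS interact, here is a minimal asyncio sketch of a semaphore with an acquire timeout. It is an assumption about the general technique, not the server's actual code; run_limited and TooManyRequests are hypothetical names, with the exception standing in for a 429 response.

```python
import asyncio

MAXIMUM_CONCURRENT_INFERENCE = 1  # documented default
ACQUIRE_TIMEOUT_SECONDS = 1.0     # documented default


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


async def run_limited(semaphore, job, timeout=ACQUIRE_TIMEOUT_SECONDS):
    """Run the coroutine function job once a slot is free, or give up after timeout seconds."""
    try:
        await asyncio.wait_for(semaphore.acquire(), timeout=timeout)
    except asyncio.TimeoutError:
        raise TooManyRequests("concurrency limit exhausted") from None
    try:
        return await job()
    finally:
        # Always free the slot, even if the job raised.
        semaphore.release()


async def demo():
    semaphore = asyncio.Semaphore(MAXIMUM_CONCURRENT_INFERENCE)

    async def slow_job():
        await asyncio.sleep(0.2)  # pretend this is model inference
        return "audio"

    # With one slot and a 0.05 s timeout, the second call cannot get a slot
    # while the first is still running, so it is rejected.
    return await asyncio.gather(
        run_limited(semaphore, slow_job, timeout=0.05),
        run_limited(semaphore, slow_job, timeout=0.05),
        return_exceptions=True,
    )
```

Running asyncio.run(demo()) yields one successful result and one TooManyRequests instance. Since the limit is per worker, total capacity under this model scales with WORKERS.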

Features

  • CHARACTER_SUBSTITUTION

    • JSON object of characters to remove or replace in the input text before Supertonic synthesizes it (default: {"「": "\"", "」": "\"", "·": ","})
  • EXTRA_FEATURES

    • JSON list of additional features to enable (default: [])
    • Available features:
      • WEB_PLAYGROUND: Enables an HTML page that lets you use the server from a browser. It is not suitable for production and should only be used for personal use or testing.
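
Since CHARACTER_SUBSTITUTION is a JSON object, its effect can be sketched as a simple replace loop. This is a hedged illustration, not the server's implementation: load_substitutions and apply_substitutions are hypothetical names, and treating an empty-string value as "remove this character" is an assumption about how removal is expressed.

```python
import json
import os

DEFAULT_SUBSTITUTION = {"「": "\"", "」": "\"", "·": ","}  # documented default


def load_substitutions(env=os.environ) -> dict:
    """Parse CHARACTER_SUBSTITUTION as a JSON object, falling back to the default."""
    raw = env.get("CHARACTER_SUBSTITUTION")
    if not raw:
        return dict(DEFAULT_SUBSTITUTION)
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("CHARACTER_SUBSTITUTION must be a JSON object")
    return value


def apply_substitutions(text: str, table: dict) -> str:
    """Replace each key with its value; an empty value removes the character (assumption)."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text
```

In a .env file the value would be written as a single JSON string, for example CHARACTER_SUBSTITUTION={"~": ""}.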

Logging Configuration

The server uses Loguru for logging, which can be configured through the following environment variables.

  • LOG_LEVEL

    • Minimum severity level to log (default: INFO)
    • Options: DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL
  • LOG_FORMAT

    • Custom format string for log messages
    • Default: <green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>
  • LOG_FILE

    • Path to a file where logs will be saved (optional)
    • If set, logs will be written to this file in addition to standard output.
  • LOG_ROTATION

    • Condition for rotating the log file (default: 10 MB)
    • This environment variable is only used if LOG_FILE is set.
    • Examples: 100 MB, 00:00, 1 week, 10 days
  • LOG_RETENTION

    • Duration or number of files to keep for old logs (default: 1 week)
    • This environment variable is only used if LOG_FILE is set.
    • Examples: 10 days, 2 months
  • LOG_SERIALIZE

    • Whether to output logs in JSON format (default: false)
    • Set to true for structured logging, useful for log aggregation systems.
  • LOG_ENQUEUE

    • Whether to enable asynchronous, non-blocking logging (default: true)
    • Highly recommended for FastAPI environments to ensure logging operations do not block the event loop.
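
The variables above map closely onto keyword arguments of Loguru's logger.add(). The sketch below collects them into dictionaries without importing Loguru, so it runs anywhere. It is an assumption about how such a mapping could look, not the server's code; the function names and the set of strings accepted as true are hypothetical.

```python
import os


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings (assumption: the server may accept fewer forms)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def loguru_sink_options(env=os.environ) -> dict:
    """Keyword arguments for logger.add() that apply to any sink."""
    options = {
        "level": env.get("LOG_LEVEL", "INFO"),
        "serialize": _as_bool(env.get("LOG_SERIALIZE", "false")),
        "enqueue": _as_bool(env.get("LOG_ENQUEUE", "true")),
    }
    if "LOG_FORMAT" in env:
        options["format"] = env["LOG_FORMAT"]
    return options


def loguru_file_options(env=os.environ) -> dict:
    """Same options plus rotation and retention, which only apply to file sinks."""
    options = loguru_sink_options(env)
    options["rotation"] = env.get("LOG_ROTATION", "10 MB")
    options["retention"] = env.get("LOG_RETENTION", "1 week")
    return options


# Possible usage, assuming loguru is installed:
#   import sys
#   from loguru import logger
#   logger.add(sys.stderr, **loguru_sink_options())
#   if os.environ.get("LOG_FILE"):
#       logger.add(os.environ["LOG_FILE"], **loguru_file_options())
```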

Development

  • Run tests:
pytest

License

The web server license is BSD 2-clause.

The Supertonic model adopts the BigScience Open RAIL-M License. Supertonic is a trademark of Supertone.
