A text-to-speech web server built with the Supertonic-2 model. Easy to install, use, and manage. No GPU required.
- FastAPI server with HTTP Basic Auth
- Supertonic TTS integration (auto-downloads model at startup)
- Concurrent request control via semaphore
- WAV output
- Docker-friendly entrypoint
- Python 3.13
- Likely compatible with any ONNX-supported Python version from 3.10 onward.
Pull and run the prebuilt image:
docker pull docker.io/hufs24/supertonic-tts-web-server:0.0.1
docker run --rm -p 8080:80 --env-file .env docker.io/hufs24/supertonic-tts-web-server:0.0.1

Or build and run using the provided Containerfile and entrypoint.sh:
docker build -t supertonic-tts-web-server:0.0.1 -f Containerfile .
docker run --rm -p 8080:80 --env-file .env supertonic-tts-web-server:0.0.1

Note: The container runs gunicorn on port 80 by default.
You can also install and run the server locally, which covers other platforms such as Windows.
- Install Python 3.13
- Install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- For the test server:
python main.py
- Or run uvicorn from the command line:
python -m uvicorn src.main:app --host 0.0.0.0 --port 80 --workers 1
- Gunicorn is recommended for production:
pip install gunicorn
gunicorn src.main:app -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:80

All generated WAV files are stored in the temp directory unless you set the TEMP_DIR environment variable. To persist generated files in a containerized deployment, mount a volume to /app/temp or to the custom directory specified by TEMP_DIR.
This server is designed around a simple API workflow. The openapi.json document (available at hufstech.com) is the right place to check request and response schemas, but the typical lifecycle is easier to understand as an end-to-end flow:
- Authenticate every request with HTTP Basic Auth.
  - All TTS and file-management endpoints require the same HTTP Basic credentials.
- Send a POST /synthesize request with the text and synthesis options.
- Receive the generated WAV file immediately in the response body.
- Reuse the saved file later from the temporary storage if you need to download it again, inspect what has been generated, or clean it up.

- Start the server and make sure your client can reach it.
- Send a synthesis request to POST /synthesize.
- Save the response body as a .wav file on the client side if you want to play it immediately.
- If you need to manage previously generated files, call GET /files to inspect what is stored in the server's temporary directory.
- Download a specific stored file again with GET /file/{file_name}.
- Delete a single file with DELETE /file/{file_name} or clear the whole temporary directory with DELETE /files or DELETE /clean.
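
The synthesis step of this workflow can also be driven from Python. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at the base URL you pass in and that the JSON fields match the curl example in this README; the function names are hypothetical helpers, not part of the server.

```python
import base64
import json
import urllib.request


def build_synthesize_request(base_url, username, password, text,
                             voice="F1", total_steps=5, speed=1.05, lang="en"):
    """Build (but do not send) a POST /synthesize request with HTTP Basic Auth."""
    payload = {
        "text": text,
        "voice": voice,
        "total_steps": total_steps,
        "speed": speed,
        "lang": lang,
    }
    # HTTP Basic Auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, output_path):
    """Send the request and write the returned WAV bytes to output_path."""
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(output_path, "wb") as wav_file:
        wav_file.write(audio)
    return len(audio)


# Example usage against a running server (not executed here):
#   req = build_synthesize_request("http://localhost:8080", "admin", "admin",
#                                  "Hello, good morning!")
#   synthesize_to_file(req, "hello.wav")
```

Building the request separately from sending it keeps the authentication and payload logic easy to inspect without a running server.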

- The server validates the request body and checks authentication.
- The input text is rejected if it exceeds the length set by the MAXIMUM_LENGTH environment variable.
- Character substitutions from the CHARACTER_SUBSTITUTION environment variable are applied before synthesis.
- The Supertonic model generates audio and the server writes it to the temporary directory as a WAV file.
- The same WAV file is also returned directly in the HTTP response as audio/wav.

This means /synthesize is both a generation endpoint and a persistence step. You do not need to call another endpoint to make the file available for later download.
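
The length check and character substitution steps can be illustrated with a small sketch. This is not the server's code: it assumes the length is counted in Python characters, that the check runs before substitution (the order in which the steps are listed here), and it uses a hypothetical ValueError where the real server would return an HTTP error.

```python
# Defaults taken from the MAXIMUM_LENGTH and CHARACTER_SUBSTITUTION settings
# documented in this README.
DEFAULT_MAXIMUM_LENGTH = 300
DEFAULT_SUBSTITUTIONS = {"「": '"', "」": '"', "·": ","}


def apply_substitutions(text, substitutions):
    """Replace each key with its value; an empty value removes the character."""
    for old, new in substitutions.items():
        text = text.replace(old, new)
    return text


def prepare_text(text, maximum_length=DEFAULT_MAXIMUM_LENGTH, substitutions=None):
    """Reject over-long input, then apply character substitutions."""
    if len(text) > maximum_length:
        # Assumption: the real server signals this with an HTTP error response.
        raise ValueError(f"text exceeds {maximum_length} characters")
    if substitutions is None:
        substitutions = DEFAULT_SUBSTITUTIONS
    return apply_substitutions(text, substitutions)
```

With the default mapping, prepare_text("「Hello」·world") returns "\"Hello\",world", and mapping a character to an empty string drops it from the text.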
Generate audio and save the response locally:
curl -u admin:admin \
-X POST http://localhost:8080/synthesize \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, good morning!",
"voice": "F1",
"total_steps": 5,
"speed": 1.05,
"lang": "en"
}' \
--output hello.wav

List saved files on the server:
curl -u admin:admin http://localhost:8080/files

Download one of the saved files again:
curl -u admin:admin \
http://localhost:8080/file/<file_name> \
--output downloaded.wav

Delete a single saved file:
curl -u admin:admin -X DELETE http://localhost:8080/file/<file_name>

Clean the entire temporary directory:
curl -u admin:admin -X DELETE http://localhost:8080/files

- POST /synthesize can return 429 Too Many Requests when the per-worker concurrency limit is exhausted.
- Generated files remain in TEMP_DIR until you delete them or clean the directory.
- DELETE /files and DELETE /clean are equivalent aliases.
- If you enable WEB_PLAYGROUND in the EXTRA_FEATURES environment variable, the browser UI becomes an optional convenience layer on top of the same API workflow. See the section below for more details.
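
To make the 429 behavior concrete, here is one way a per-worker concurrency limit with an acquire timeout could be built on asyncio. It is a sketch under assumptions, not the server's actual implementation: InferenceLimiter and TooManyRequests are hypothetical names, and the two constructor arguments stand in for MAXIMUM_CONCURRENT_INFERENCE and ACQUIRE_TIMEOUT_SECONDS.

```python
import asyncio


class TooManyRequests(Exception):
    """Hypothetical error that a web handler would translate into HTTP 429."""


class InferenceLimiter:
    """Allow at most `max_concurrent` jobs at once within one worker process."""

    def __init__(self, max_concurrent=1, acquire_timeout=1.0):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._acquire_timeout = acquire_timeout

    async def run(self, job):
        """Await the coroutine function `job` if a slot frees up in time."""
        try:
            await asyncio.wait_for(
                self._semaphore.acquire(), timeout=self._acquire_timeout
            )
        except asyncio.TimeoutError:
            # No slot became free within the timeout: give up instead of queueing.
            raise TooManyRequests("concurrency limit exhausted") from None
        try:
            return await job()
        finally:
            # Always free the slot, even if the job raised.
            self._semaphore.release()
```

Because each Gunicorn worker is a separate process, a limiter like this only bounds concurrency inside one worker, which matches the "per-worker" wording above. A client that receives 429 can simply retry after a short delay.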
The server loads environment variables from .env (absolute path /app/.env in the Linux container) at startup. Use .env.example as a template. You can either create a .env file or set the values directly as environment variables.
- HF_TOKEN
  - Hugging Face API token (optional)
- WORKERS
  - Gunicorn worker count (default: 2)
  - Must be a non-negative integer
- GUNICORN_EXTRA_ARGS
  - Extra gunicorn CLI args (optional)
- USERNAME
  - HTTP Basic Auth username (default: admin)
- PASSWORD
  - HTTP Basic Auth password (default: admin)
- MAXIMUM_CONCURRENT_INFERENCE
  - Max concurrent synthesis requests per worker (default: 1)
- SUPERTONIC_INTRA_OP_THREADS
  - ONNX intra-op thread count (optional)
- SUPERTONIC_INTER_OP_THREADS
  - ONNX inter-op thread count (optional, recommended 1 if overriding)
- ACQUIRE_TIMEOUT_SECONDS
  - Semaphore acquire timeout in seconds (default: 1.0)
- TEMP_DIR
  - Directory for temporary WAV files (default: temp)
- MAXIMUM_LENGTH
  - Max input text length (default: 300 chars)
- CHARACTER_SUBSTITUTION
  - JSON object of characters to remove or replace in the text when Supertonic generates TTS (default: {"「": "\"", "」": "\"", "·": ","})
- EXTRA_FEATURES
  - JSON list of additional features to enable (default: [])
  - Available features:
    - WEB_PLAYGROUND: Enables the HTML page that allows you to use the server from a browser. It is not suitable for production and should only be used for personal use or testing.

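
As an illustration of how the settings above might be read, the sketch below parses a few of them from a mapping of environment variables, applying the documented defaults. The load_settings function and the keys of the returned dictionary are hypothetical; the server's real parsing and validation may differ.

```python
import json
import os

# Default taken from the CHARACTER_SUBSTITUTION entry in this README.
DEFAULT_SUBSTITUTIONS = {"「": '"', "」": '"', "·": ","}


def load_settings(env=None):
    """Read selected settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    workers = int(env.get("WORKERS", "2"))
    if workers < 0:
        raise ValueError("WORKERS must be a non-negative integer")

    # JSON-valued settings fall back to their defaults when unset or empty.
    raw_substitutions = env.get("CHARACTER_SUBSTITUTION")
    substitutions = (
        json.loads(raw_substitutions) if raw_substitutions else dict(DEFAULT_SUBSTITUTIONS)
    )
    if not isinstance(substitutions, dict):
        raise ValueError("CHARACTER_SUBSTITUTION must be a JSON object")

    raw_features = env.get("EXTRA_FEATURES")
    features = json.loads(raw_features) if raw_features else []
    if not isinstance(features, list):
        raise ValueError("EXTRA_FEATURES must be a JSON list")

    return {
        "workers": workers,
        "username": env.get("USERNAME", "admin"),
        "password": env.get("PASSWORD", "admin"),
        "maximum_concurrent_inference": int(env.get("MAXIMUM_CONCURRENT_INFERENCE", "1")),
        "acquire_timeout_seconds": float(env.get("ACQUIRE_TIMEOUT_SECONDS", "1.0")),
        "temp_dir": env.get("TEMP_DIR", "temp"),
        "maximum_length": int(env.get("MAXIMUM_LENGTH", "300")),
        "character_substitution": substitutions,
        "extra_features": features,
    }
```

Passing an explicit dictionary, for example load_settings({"EXTRA_FEATURES": '["WEB_PLAYGROUND"]'}), makes it easy to check a configuration without touching the process environment.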
The server uses Loguru for logging, which can be configured through the following environment variables.
- LOG_LEVEL
  - Minimum severity level to log (default: INFO)
  - Options: DEBUG, INFO, SUCCESS, WARNING, ERROR, CRITICAL
- LOG_FORMAT
  - Custom format string for log messages
  - Default: <green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - <level>{message}</level>
- LOG_FILE
  - Path to a file where logs will be saved (optional)
  - If set, logs will be written to this file in addition to standard output.
- LOG_ROTATION
  - Condition for rotating the log file (default: 10 MB)
  - This environment variable is only used if LOG_FILE is set.
  - Examples: 100 MB, 00:00, 1 week, 10 days
- LOG_RETENTION
  - Duration or number of files to keep for old logs (default: 1 week)
  - This environment variable is only used if LOG_FILE is set.
  - Examples: 10 days, 2 months
- LOG_SERIALIZE
  - Whether to output logs in JSON format (default: false)
  - Set to true for structured logging, useful for log aggregation systems.
- LOG_ENQUEUE
  - Whether to enable asynchronous, non-blocking logging (default: true)
  - Highly recommended for FastAPI environments to ensure logging operations do not block the event loop.

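
The sketch below shows how these variables could map onto keyword arguments for Loguru's logger.add. It only builds the argument dictionaries, so it runs without Loguru installed. The build_sink_configs helper and the accepted boolean spellings are assumptions, not the server's actual code.

```python
import sys

# Default format string as documented for LOG_FORMAT.
DEFAULT_FORMAT = (
    "<green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | <level>{level: <8}</level> | "
    "<cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - "
    "<level>{message}</level>"
)


def _as_bool(value):
    """Assumed parsing: common 'truthy' spellings count as True."""
    return value.strip().lower() in ("1", "true", "yes", "on")


def build_sink_configs(env):
    """Return one kwargs dict per sink, suitable for logger.add(**kwargs)."""
    common = {
        "level": env.get("LOG_LEVEL", "INFO"),
        "format": env.get("LOG_FORMAT", DEFAULT_FORMAT),
        "serialize": _as_bool(env.get("LOG_SERIALIZE", "false")),
        "enqueue": _as_bool(env.get("LOG_ENQUEUE", "true")),
    }
    # Standard output is always a sink.
    sinks = [{"sink": sys.stdout, **common}]
    log_file = env.get("LOG_FILE")
    if log_file:
        # Rotation and retention only apply to the file sink.
        sinks.append({
            "sink": log_file,
            "rotation": env.get("LOG_ROTATION", "10 MB"),
            "retention": env.get("LOG_RETENTION", "1 week"),
            **common,
        })
    return sinks


# Example wiring with Loguru installed (not executed here):
#   import os
#   from loguru import logger
#   logger.remove()
#   for config in build_sink_configs(os.environ):
#       logger.add(**config)
```

Keeping rotation and retention out of the standard-output configuration mirrors the note that those two variables are only used when LOG_FILE is set.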
- Run tests:
pytest

The web server is licensed under the BSD 2-Clause License.
The Supertonic model is released under the BigScience Open RAIL-M License. Supertonic is a trademark of Supertone.