A high-quality Text-to-Speech API built with Modal (serverless GPU) and Coqui XTTS v2.
https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run
```bash
curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "voiceId": "default"}'
```

Endpoint: `POST https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run`
Headers:
| Header | Value |
|---|---|
| Content-Type | application/json |
Request Body:
```json
{
  "text": "The text you want to convert to speech",
  "voiceId": "default"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | The text to synthesize (any length) |
| `voiceId` | string | No | Voice to use (default: "default"). Corresponds to WAV files in the `voices/` folder |
Response (Success - 200):
```json
{
  "audioContent": "UklGRkTMAQBXQVZF...",
  "mimeType": "audio/wav",
  "isMock": false
}
```

| Field | Type | Description |
|---|---|---|
| `audioContent` | string | Base64-encoded WAV audio data |
| `mimeType` | string | Always "audio/wav" |
| `isMock` | boolean | Always false for real synthesis |
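The `audioContent` payload is simply a WAV file encoded as base64, so the decoded bytes should begin with a standard RIFF/WAVE header. As a quick client-side sanity check (this helper is illustrative, not part of the API):

```python
import base64

def looks_like_wav(audio_content: str) -> bool:
    """Return True if a base64 audioContent field decodes to a RIFF/WAVE file."""
    audio = base64.b64decode(audio_content)
    # WAV files start with "RIFF" and carry "WAVE" at byte offset 8.
    return audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"
```

Call it as `looks_like_wav(data["audioContent"])` after parsing the JSON response.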
Response (Error):
```json
{
  "error": "Text is required"
}
```

```bash
# Basic request
curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voiceId": "default"}'

# Save audio to file
curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voiceId": "default"}' \
  | jq -r '.audioContent' | base64 -d > output.wav
```

```bash
./test_api.sh "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" "Hello world" default
# Creates output.wav in current directory
```

```javascript
async function synthesizeSpeech(text, voiceId = "default") {
  const response = await fetch(
    "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, voiceId }),
    }
  );
  const data = await response.json();
  if (data.error) {
    throw new Error(data.error);
  }
  // Convert base64 to audio blob
  const audioBytes = atob(data.audioContent);
  const audioArray = new Uint8Array(audioBytes.length);
  for (let i = 0; i < audioBytes.length; i++) {
    audioArray[i] = audioBytes.charCodeAt(i);
  }
  return new Blob([audioArray], { type: "audio/wav" });
}

// Usage
const audioBlob = await synthesizeSpeech("Hello, how are you?");
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();
```

```python
import requests
import base64

def synthesize_speech(text: str, voice_id: str = "default") -> bytes:
    response = requests.post(
        "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run",
        json={"text": text, "voiceId": voice_id},
        headers={"Content-Type": "application/json"}
    )
    data = response.json()
    if "error" in data:
        raise Exception(data["error"])
    return base64.b64decode(data["audioContent"])

# Usage
audio_data = synthesize_speech("Hello, this is a test.")
with open("output.wav", "wb") as f:
    f.write(audio_data)
```

See `integration_route.ts` for a complete example. Set the TTS_API_URL environment variable:
```bash
TTS_API_URL=https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run
```

To add your own voice:

- Record or obtain a WAV file of the voice you want to clone (3-10 seconds of clear speech works best)
- Place it in the `voices/` directory:

  ```
  voices/
  ├── default.wav
  ├── journey.wav
  └── myvoice.wav
  ```

- Redeploy: `modal deploy app.py`
- Use your voice: `{"text": "Hello", "voiceId": "myvoice"}`
Voice file requirements:
- Format: WAV
- Sample rate: 22050 Hz or 24000 Hz recommended
- Duration: 3-10 seconds of clear speech
- Quality: Clean audio without background noise
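If you want to check a clip against these requirements before redeploying, Python's standard-library `wave` module can report the sample rate and duration. This helper is illustrative and not part of the repo:

```python
import wave

def check_voice_clip(path: str) -> None:
    """Print basic properties of a reference WAV and flag obvious problems."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
        channels = wav.getnchannels()
    print(f"{path}: {rate} Hz, {duration:.1f} s, {channels} channel(s)")
    if rate not in (22050, 24000):
        print("  warning: 22050 Hz or 24000 Hz is recommended")
    if not 3 <= duration <= 10:
        print("  warning: 3-10 seconds of clear speech works best")

check_voice_clip("voices/myvoice.wav")
```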
```bash
pip install modal
modal setup   # Opens browser to authenticate
modal deploy app.py
modal app logs oreado-tts
```

Visit: https://modal.com/apps/mrcn/main/deployed/oreado-tts
| Component | Description |
|---|---|
| Modal | Serverless GPU platform - handles scaling, cold starts |
| XTTS v2 | Coqui's multilingual TTS model with voice cloning |
| GPU | Any available GPU (auto-selected by Modal) |
| Scaledown | 5 minutes - containers stay warm for 5 min after last request |
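The deployed `app.py` is not reproduced in this README, but a Modal app with the shape described above might look roughly like the sketch below. The class and method names, the reference-clip path, the image contents, and the exact decorator arguments are assumptions inferred from the endpoint URL and this table, not the actual source.

```python
import base64
import modal

# Assumed image: Debian slim plus Coqui TTS (XTTS v2) and FastAPI for the web endpoint.
image = modal.Image.debian_slim().pip_install("TTS", "fastapi[standard]")

app = modal.App("oreado-tts")

@app.cls(gpu="any", image=image, scaledown_window=300)  # stay warm ~5 min after the last request
class TTSModel:
    @modal.enter()
    def load(self):
        # Runs once per container start, so warm requests skip model loading.
        from TTS.api import TTS
        self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

    @modal.fastapi_endpoint(method="POST")
    def synthesize(self, body: dict):
        text = body.get("text")
        if not text:
            return {"error": "Text is required"}
        voice = body.get("voiceId", "default")
        out_path = "/tmp/output.wav"
        # speaker_wav points at the reference clip baked into the image (path assumed).
        self.tts.tts_to_file(text=text, language="en",
                             speaker_wav=f"/voices/{voice}.wav", file_path=out_path)
        with open(out_path, "rb") as f:
            audio = f.read()
        return {"audioContent": base64.b64encode(audio).decode("ascii"),
                "mimeType": "audio/wav", "isMock": False}
```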
- First request: ~15-30 seconds (model loads into GPU)
- Subsequent requests: ~2-5 seconds (warm container)
- After 5 min idle: Container scales down, next request is cold
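One practical consequence: a client that may hit a cold container should allow a generous timeout and retry once or twice. A minimal sketch using `requests` (the function name and retry policy are illustrative, not part of the API):

```python
import time
import requests

TTS_URL = "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run"

def synthesize_with_retry(text: str, voice_id: str = "default",
                          attempts: int = 3, timeout: float = 60.0) -> dict:
    """Call the endpoint with a long timeout and simple backoff to absorb cold starts."""
    last_error = None
    for attempt in range(attempts):
        try:
            resp = requests.post(TTS_URL, json={"text": text, "voiceId": voice_id},
                                 timeout=timeout)
            return resp.json()
        except requests.RequestException as exc:
            last_error = exc
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s ... before retrying
    raise RuntimeError(f"TTS request failed after {attempts} attempts") from last_error
```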
Modal charges per GPU-second. Approximate costs:
- ~$0.0003/second for GPU time (a warm request taking ~3 seconds therefore costs on the order of $0.001)
- Model is pre-downloaded in image (no download time on cold start)
- Containers stay warm for 5 minutes (configurable via `scaledown_window`)
- You're making a GET request (e.g., visiting the URL in a browser). Use POST instead:

  ```bash
  curl -X POST "https://..." -H "Content-Type: application/json" -d '{"text": "test"}'
  ```

- Cold start is taking too long. Wait and retry; subsequent requests will be faster.
- The voiceId doesn't match any WAV file in `voices/`. Use "default" or add your voice file.
- For anything else, check the logs: `modal app logs oreado-tts`