
Oreado TTS Service

A high-quality Text-to-Speech API built with Modal (serverless GPU) and Coqui XTTS v2.


Quick Start

Your API Endpoint

https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run

Make a Request

curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "voiceId": "default"}'

API Reference

Synthesize Speech

Endpoint: POST https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run

Headers:

| Header | Value |
| --- | --- |
| Content-Type | application/json |

Request Body:

{
  "text": "The text you want to convert to speech",
  "voiceId": "default"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The text to synthesize (any length) |
| voiceId | string | No | Voice to use (default: "default"). Corresponds to WAV files in the voices/ folder |

Response (Success - 200):

{
  "audioContent": "UklGRkTMAQBXQVZF...",
  "mimeType": "audio/wav",
  "isMock": false
}
| Field | Type | Description |
| --- | --- | --- |
| audioContent | string | Base64-encoded WAV audio data |
| mimeType | string | Always "audio/wav" |
| isMock | boolean | Always false for real synthesis |

Response (Error):

{
  "error": "Text is required"
}

Usage Examples

cURL

# Basic request
curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voiceId": "default"}'

# Save audio to file
curl -X POST "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voiceId": "default"}' \
  | jq -r '.audioContent' | base64 -d > output.wav

Test Script

./test_api.sh "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run" "Hello world" default
# Creates output.wav in current directory

JavaScript / TypeScript

async function synthesizeSpeech(text, voiceId = "default") {
  const response = await fetch(
    "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, voiceId }),
    }
  );

  const data = await response.json();

  if (data.error) {
    throw new Error(data.error);
  }

  // Convert base64 to audio blob
  const audioBytes = atob(data.audioContent);
  const audioArray = new Uint8Array(audioBytes.length);
  for (let i = 0; i < audioBytes.length; i++) {
    audioArray[i] = audioBytes.charCodeAt(i);
  }

  return new Blob([audioArray], { type: "audio/wav" });
}

// Usage
const audioBlob = await synthesizeSpeech("Hello, how are you?");
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Python

import requests
import base64

def synthesize_speech(text: str, voice_id: str = "default") -> bytes:
    response = requests.post(
        "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run",
        json={"text": text, "voiceId": voice_id},
        headers={"Content-Type": "application/json"}
    )

    data = response.json()

    if "error" in data:
        raise Exception(data["error"])

    return base64.b64decode(data["audioContent"])

# Usage
audio_data = synthesize_speech("Hello, this is a test.")
with open("output.wav", "wb") as f:
    f.write(audio_data)

Next.js API Route

See integration_route.ts for a complete example. Set the TTS_API_URL environment variable:

TTS_API_URL=https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run

Adding Custom Voices

  1. Record or obtain a WAV file of the voice you want to clone (3-10 seconds of clear speech works best)
  2. Place it in the voices/ directory:
    voices/
    ├── default.wav
    ├── journey.wav
    └── myvoice.wav
    
  3. Redeploy:
    modal deploy app.py
  4. Use your voice:
    {"text": "Hello", "voiceId": "myvoice"}

Voice file requirements:

  • Format: WAV
  • Sample rate: 22050 Hz or 24000 Hz recommended
  • Duration: 3-10 seconds of clear speech
  • Quality: Clean audio without background noise
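Before redeploying, you can sanity-check a candidate voice file against these requirements with Python's standard wave module. A minimal sketch; the function name check_voice_file is just for illustration and is not part of this repository:

```python
import contextlib
import wave

def check_voice_file(path: str) -> tuple[int, float]:
    """Return (sample_rate, duration_seconds) for a WAV voice sample."""
    with contextlib.closing(wave.open(path, "rb")) as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return rate, duration

# e.g. rate, duration = check_voice_file("voices/myvoice.wav")
# then verify rate is 22050 or 24000 and duration is between 3 and 10 s
```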

Deployment

Prerequisites

pip install modal
modal setup  # Opens browser to authenticate

Deploy

modal deploy app.py

View Logs

modal app logs oreado-tts

Dashboard

Visit: https://modal.com/apps/mrcn/main/deployed/oreado-tts


Architecture

| Component | Description |
| --- | --- |
| Modal | Serverless GPU platform; handles scaling and cold starts |
| XTTS v2 | Coqui's multilingual TTS model with voice cloning |
| GPU | Any available GPU (auto-selected by Modal) |
| Scaledown | 5 minutes; containers stay warm for 5 min after the last request |

Cold Start Performance

  • First request: ~15-30 seconds (model loads into GPU)
  • Subsequent requests: ~2-5 seconds (warm container)
  • After 5 min idle: Container scales down, next request is cold

Costs

Modal charges per GPU-second. Approximate costs:

  • ~$0.0003/second for GPU time
  • Model is pre-downloaded in image (no download time on cold start)
  • Containers stay warm for 5 minutes (configurable via scaledown_window)
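As a rough back-of-the-envelope check (using the approximate rate above; actual Modal pricing varies by GPU type), each burst of traffic pays for its synthesis time plus the 5-minute warm window:

```python
GPU_RATE = 0.0003        # $/GPU-second, approximate figure from above
SCALEDOWN_WINDOW = 300   # 5-minute warm window, in seconds

def burst_cost(synthesis_seconds: float) -> float:
    """Estimated cost of one traffic burst: active time plus the warm window."""
    return (synthesis_seconds + SCALEDOWN_WINDOW) * GPU_RATE

# A burst with 60 s of synthesis costs about (60 + 300) * 0.0003 = $0.108
```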

Troubleshooting

"Method Not Allowed"

You're making a GET request (e.g., visiting the URL in a browser). Use POST instead:

curl -X POST "https://..." -H "Content-Type: application/json" -d '{"text": "test"}'

Timeout / Connection Reset

The cold start is taking too long. Wait and retry; once a container is warm, subsequent requests will be faster.
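A client-side retry with a generous timeout rides out cold starts. A minimal sketch using requests; the 60-second timeout, attempt count, and backoff values are illustrative choices, not part of the API:

```python
import time
import requests

URL = "https://mrcn--oreado-tts-ttsmodel-synthesize.modal.run"

def synthesize_with_retry(text: str, voice_id: str = "default",
                          attempts: int = 3, backoff: float = 5.0) -> dict:
    """POST to the TTS endpoint, retrying to ride out a cold start."""
    for i in range(attempts):
        try:
            resp = requests.post(URL, json={"text": text, "voiceId": voice_id},
                                 timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if i == attempts - 1:
                raise
            time.sleep(backoff * (i + 1))  # linear backoff between attempts
```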

"Voice reference not found"

The voiceId doesn't match any WAV file in voices/. Use "default" or add your voice file.

Check Logs

modal app logs oreado-tts
