Uzbek TTS API Documentation

Complete API reference for the Uzbek TTS service.

Base URL

http://localhost:8000

Replace with your actual domain when deployed.

Authentication

No authentication is currently required. Add authentication middleware before exposing the service in production.


API Endpoints

Health & Monitoring

GET /

Root endpoint with service information.

Response:

{
  "service": "Uzbek TTS API",
  "version": "1.0.0",
  "status": "running",
  "docs": "/docs",
  "health": "/health"
}

GET /health

Health check endpoint.

Response:

{
  "status": "healthy",
  "gpu_available": true,
  "model_loaded": true,
  "cache_size": 5,
  "uptime": 3600.5
}
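A client that starts at the same time as the server may want to wait until the model has loaded before sending requests. The sketch below polls /health using only the Python standard library. The function names, the 1-second poll interval, and the readiness rule (`status` is `"healthy"` and `model_loaded` is true) are assumptions, not documented behavior. The HTTP call is passed in as `fetch` so the polling logic can be tried without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"


def fetch_health(base_url: str = BASE_URL) -> dict:
    """GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    """Assumed readiness rule: healthy status and a loaded model."""
    return health.get("status") == "healthy" and bool(health.get("model_loaded"))


def wait_until_ready(
    fetch: Callable[[], dict] = fetch_health,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_ready() holds; give up after roughly `timeout` seconds."""
    waited = 0.0
    while True:
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            pass  # connection refused / URLError: server not up yet
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_ready()` with no arguments polls http://localhost:8000/health once per second for up to a minute.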

GET /stats

Performance statistics.

Response:

{
  "total_generations": 150,
  "cache_hits": 120,
  "cache_misses": 30,
  "cache_hit_rate": 0.8,
  "avg_latency": 1.2,
  "cache_size": 5
}
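The example numbers are consistent with `cache_hit_rate = cache_hits / (cache_hits + cache_misses)` (120 / 150 = 0.8). Assuming the server computes it that way, a monitoring script can recompute the rate and print a one-line summary. `hit_rate` and `summarize_stats` are hypothetical helper names.

```python
def hit_rate(stats: dict) -> float:
    """Recompute the cache hit rate; 0.0 before any request has been served."""
    lookups = stats["cache_hits"] + stats["cache_misses"]
    return stats["cache_hits"] / lookups if lookups else 0.0


def summarize_stats(stats: dict) -> str:
    """One-line summary of a /stats response (avg_latency assumed to be seconds)."""
    return (
        f"{stats['total_generations']} generations, "
        f"hit rate {hit_rate(stats):.0%}, "
        f"avg latency {stats['avg_latency']:.2f}s"
    )
```

For the response above this yields `150 generations, hit rate 80%, avg latency 1.20s`.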

GET /gpu/stats

GPU statistics (requires NVIDIA GPU).

Response:

{
  "gpu_name": "NVIDIA A40",
  "gpu_utilization": 65.5,
  "memory_used": 12288.5,
  "memory_total": 49152.0,
  "memory_utilization": 25.0,
  "temperature": 72.0,
  "power_usage": 180.5
}
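In the example, `memory_utilization` matches `memory_used / memory_total * 100` (12288.5 / 49152.0 is about 25.0%), and 49152 equals 48 × 1024, so the memory fields appear to be in MiB; temperature is presumably in °C. Under those assumptions, a watchdog could derive free memory and flag a GPU under pressure. The thresholds (90% memory, 85 °C) and function names are hypothetical.

```python
def free_memory_mib(gpu: dict) -> float:
    """Unused GPU memory, assuming memory_* fields are in MiB."""
    return gpu["memory_total"] - gpu["memory_used"]


def memory_percent(gpu: dict) -> float:
    """Recompute memory utilization as a percentage of total memory."""
    return 100.0 * gpu["memory_used"] / gpu["memory_total"]


def under_pressure(gpu: dict, max_mem_pct: float = 90.0, max_temp_c: float = 85.0) -> bool:
    """Hypothetical alert rule: memory nearly full or GPU running hot."""
    return memory_percent(gpu) >= max_mem_pct or gpu["temperature"] >= max_temp_c
```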

Reference Audio Management

POST /v1/reference/upload

Upload a reference audio file for voice cloning.

Request:

  • Method: POST
  • Content-Type: multipart/form-data

Parameters:

  • audio_file (file, required): Audio file (WAV, MP3, OGG)
  • ref_id (string, required): Unique identifier for this reference
  • ref_text (string, required): Transcription of the reference audio
  • description (string, optional): Free-text description of the voice

Example:

curl -X POST http://localhost:8000/v1/reference/upload \
  -F "audio_file=@my_voice.wav" \
  -F "ref_id=my_uzbek_voice" \
  -F "ref_text=Salom, mening ismim Ali." \
  -F "description=Male voice, clear pronunciation"

Response:

{
  "ref_id": "my_uzbek_voice",
  "ref_text": "Salom, mening ismim Ali.",
  "description": "Male voice, clear pronunciation",
  "duration": 3.5,
  "sample_rate": 24000,
  "created_at": "2025-01-28T10:30:00"
}
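Catching obvious mistakes before uploading saves a round trip. This sketch checks the documented required fields and the listed formats (WAV, MP3, OGG). It is a client-side heuristic only: it looks at the file extension, whereas the server may inspect the actual audio content. `upload_problems` is a hypothetical name.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg"}  # formats listed for audio_file


def upload_problems(audio_path: str, ref_id: str, ref_text: str) -> list[str]:
    """Return a list of problems; an empty list means the upload looks valid."""
    problems = []
    if Path(audio_path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported audio format: {audio_path}")
    if not ref_id.strip():
        problems.append("ref_id is required")
    if not ref_text.strip():
        problems.append("ref_text is required")
    return problems
```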

GET /v1/reference/list

List all uploaded reference audio files.

Response:

[
  {
    "ref_id": "voice1",
    "ref_text": "Salom dunyo",
    "description": "Test voice",
    "duration": 2.5,
    "sample_rate": 24000,
    "created_at": "2025-01-28T10:00:00"
  },
  {
    "ref_id": "voice2",
    "ref_text": "Assalomu alaykum",
    "description": "",
    "duration": 3.0,
    "sample_rate": 24000,
    "created_at": "2025-01-28T10:15:00"
  }
]
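Because a reference only needs to be uploaded once, a client can consult this list before uploading. A minimal sketch operating on the parsed list response; the helper names are hypothetical and `duration` is assumed to be in seconds.

```python
from typing import Optional


def find_reference(refs: list[dict], ref_id: str) -> Optional[dict]:
    """Return the entry with the given ref_id, or None if it is not uploaded."""
    return next((ref for ref in refs if ref["ref_id"] == ref_id), None)


def total_duration(refs: list[dict]) -> float:
    """Total length of stored reference audio (assumed seconds)."""
    return sum(ref["duration"] for ref in refs)
```

If `find_reference(refs, "my_voice")` returns None, upload the file; otherwise reuse the existing ID.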

GET /v1/reference/{ref_id}

Get information about a specific reference.

Parameters:

  • ref_id (path, required): Reference ID

Example:

curl http://localhost:8000/v1/reference/my_uzbek_voice

Response:

{
  "ref_id": "my_uzbek_voice",
  "ref_text": "Salom, mening ismim Ali.",
  "description": "Male voice, clear pronunciation",
  "duration": 3.5,
  "sample_rate": 24000,
  "created_at": "2025-01-28T10:30:00"
}
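`ref_id` is part of the URL path, so characters such as spaces must be percent-encoded before calling this endpoint or DELETE /v1/reference/{ref_id}. A sketch using `urllib.parse.quote`; `reference_url` is a hypothetical helper. Even when encoded, an ID containing `/` may still fail to route on the server, so restricting IDs to letters, digits, `_` and `-` is the safer choice.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"


def reference_url(ref_id: str, base_url: str = BASE_URL) -> str:
    """Build /v1/reference/{ref_id}, encoding every reserved character."""
    return f"{base_url}/v1/reference/{quote(ref_id, safe='')}"
```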

DELETE /v1/reference/{ref_id}

Delete an uploaded reference audio file.

Example:

curl -X DELETE http://localhost:8000/v1/reference/my_uzbek_voice

Response:

{
  "success": true,
  "message": "Reference 'my_uzbek_voice' deleted"
}

TTS Generation

POST /v1/tts/generate

Generate speech from text (returns base64-encoded audio).

Request:

{
  "text": "Assalomu alaykum! Qalaysizlar?",
  "ref_audio_id": "my_uzbek_voice",
  "speed": 1.0
}

Parameters:

  • text (string, required): Text to synthesize (1-5000 characters)
  • ref_audio_id (string, required): Reference audio ID
  • speed (float, optional): Speech speed (0.5-2.0, default: 1.0)
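The limits above can be enforced on the client so invalid requests fail fast instead of returning 400. This sketch assumes both ranges are inclusive and that the 5000-character limit counts Unicode code points (what Python's `len` returns); the server may additionally reject whitespace-only text. `build_generate_payload` is a hypothetical name.

```python
def build_generate_payload(text: str, ref_audio_id: str, speed: float = 1.0) -> dict:
    """Validate inputs against the documented limits and build the JSON body."""
    if not 1 <= len(text) <= 5000:
        raise ValueError(f"text must be 1-5000 characters, got {len(text)}")
    if not ref_audio_id:
        raise ValueError("ref_audio_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    return {"text": text, "ref_audio_id": ref_audio_id, "speed": speed}
```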

Example:

curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Assalomu alaykum! Qalaysizlar?",
    "ref_audio_id": "my_uzbek_voice",
    "speed": 1.0
  }'

Response:

{
  "success": true,
  "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAAB...",
  "duration": 2.5,
  "latency": 1.2,
  "message": "Generation successful"
}

Decode audio in Python:

import base64
import io

import requests
from pydub import AudioSegment  # pip install pydub; MP3 export also needs ffmpeg

# Request synthesized speech
response = requests.post("http://localhost:8000/v1/tts/generate", json={
    "text": "Salom dunyo",
    "ref_audio_id": "my_voice"
})
response.raise_for_status()  # raise on 4xx/5xx instead of failing on a missing key

# Decode the base64 payload into raw WAV bytes
audio_base64 = response.json()["audio_base64"]
audio_bytes = base64.b64decode(audio_base64)

# Save to file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

# Or load with pydub and convert to MP3
audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="wav")
audio.export("output.mp3", format="mp3")
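The sample `audio_base64` value starts with `UklGRi`, which decodes to the ASCII bytes `RIFF`, the start of a WAV header. Assuming the payload is always a complete PCM WAV file, the standard-library `wave` module can read it without pydub, for instance to check the length against the response's `duration` field. `wav_duration` is a hypothetical helper; if the server ever writes non-PCM WAV (such as 32-bit float), `wave` raises `wave.Error` and pydub or another decoder is needed.

```python
import base64
import io
import wave


def wav_duration(audio_base64: str) -> float:
    """Decode a base64 WAV payload and return its length in seconds."""
    audio_bytes = base64.b64decode(audio_base64)
    # Bytes 0-3 must be "RIFF" and bytes 8-11 "WAVE" in a WAV file
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("payload is not a WAV file")
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```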

POST /v1/tts/generate/file

Generate speech and return as downloadable audio file.

Request: same JSON body and parameters as /v1/tts/generate

Example:

curl -X POST http://localhost:8000/v1/tts/generate/file \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Assalomu alaykum!",
    "ref_audio_id": "my_voice",
    "speed": 1.0
  }' \
  --output generated_speech.wav

Response: Audio file (WAV format)
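Because this endpoint returns the WAV body directly, a client can stream it to disk without base64 decoding. A standard-library sketch; the function names and the 120-second timeout are assumptions. Building the request is separated from sending it, so the request shape can be inspected offline.

```python
import json
import shutil
import urllib.request

BASE_URL = "http://localhost:8000"


def make_file_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /v1/tts/generate/file."""
    return urllib.request.Request(
        f"{base_url}/v1/tts/generate/file",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_speech(payload: dict, out_path: str, base_url: str = BASE_URL) -> None:
    """Stream the returned WAV body straight to disk (needs a running server)."""
    with urllib.request.urlopen(make_file_request(payload, base_url), timeout=120) as resp:
        with open(out_path, "wb") as f:
            shutil.copyfileobj(resp, f)
```

For example: `download_speech({"text": "Assalomu alaykum!", "ref_audio_id": "my_voice"}, "generated_speech.wav")`.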

POST /v1/tts/batch

Generate speech for multiple texts (batch processing).

Request:

{
  "texts": [
    "Birinchi matn",
    "Ikkinchi matn",
    "Uchinchi matn"
  ],
  "ref_audio_id": "my_uzbek_voice",
  "speed": 1.0
}

Parameters:

  • texts (array, required): List of texts (1-8 items)
  • ref_audio_id (string, required): Reference audio ID
  • speed (float, optional): Speech speed (0.5-2.0, default: 1.0)
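Since one batch request accepts at most 8 texts, longer lists have to be split on the client and sent as several /v1/tts/batch requests, concatenating the results in order. A minimal sketch of the splitting step; `chunk_texts` is a hypothetical name.

```python
def chunk_texts(texts: list[str], max_batch: int = 8) -> list[list[str]]:
    """Split texts into consecutive groups of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be positive")
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```

Twenty texts become three requests of 8, 8 and 4 texts.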

Example:

curl -X POST http://localhost:8000/v1/tts/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": [
      "Assalomu alaykum!",
      "Xayr, ko'\''rishguncha!",
      "Rahmat sizga!"
    ],
    "ref_audio_id": "my_voice",
    "speed": 1.0
  }'

Response:

{
  "success": true,
  "results": [
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.5,
      "latency": 0.8,
      "message": "Generation successful"
    },
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.8,
      "latency": 0.9,
      "message": "Generation successful"
    },
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.2,
      "latency": 0.7,
      "message": "Generation successful"
    }
  ],
  "total_latency": 2.5
}
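Each item in `results` carries its own `success` flag, so one failed text does not necessarily fail the whole batch. This sketch separates decoded audio from per-item errors, keyed by position. It assumes `results` is in the same order as the input `texts` and that failed items include a `message`; `decode_batch` is a hypothetical name.

```python
import base64


def decode_batch(batch: dict) -> tuple[dict[int, bytes], dict[int, str]]:
    """Split a /v1/tts/batch response into decoded audio and errors, by input index."""
    audio, errors = {}, {}
    for i, item in enumerate(batch.get("results", [])):
        if item.get("success"):
            audio[i] = base64.b64decode(item["audio_base64"])
        else:
            errors[i] = item.get("message", "unknown error")
    return audio, errors
```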

Cache Management

POST /v1/cache/clear

Clear all caches (reference audio embeddings, text embeddings).

Example:

curl -X POST http://localhost:8000/v1/cache/clear

Response:

{
  "success": true,
  "message": "Cache cleared"
}

Error Responses

All endpoints return errors in this format:

{
  "success": false,
  "error": "ErrorType",
  "message": "Human-readable error message",
  "detail": "Optional detailed information"
}

Common Error Codes

  • 400 Bad Request: Invalid parameters
  • 404 Not Found: Reference audio not found
  • 500 Internal Server Error: Generation error
  • 503 Service Unavailable: Service not initialized

Example errors:

{
  "success": false,
  "error": "NotFound",
  "message": "Reference 'invalid_id' not found"
}
{
  "success": false,
  "error": "ValidationError",
  "message": "Text cannot be empty!"
}
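Wrapping this error format in an exception keeps status handling in one place. In this sketch every field is read with a default, so a body missing a field does not raise a `KeyError`. The class name and the retry policy (retry only on 503, since the service may still be initializing) are hypothetical.

```python
class TTSError(Exception):
    """Error raised for a non-2xx response from the TTS API."""

    def __init__(self, status: int, body: dict):
        self.status = status
        self.error = body.get("error", "UnknownError")
        self.message = body.get("message", "")
        self.detail = body.get("detail")  # optional field
        super().__init__(f"{status} {self.error}: {self.message}")

    @property
    def retryable(self) -> bool:
        # Hypothetical policy: only 503 (service not yet initialized) is retried.
        return self.status == 503
```

With `urllib`, for example, one could write `except urllib.error.HTTPError as err: raise TTSError(err.code, json.load(err))`.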

Usage Examples

Python

import base64

import requests

# Base URL
BASE_URL = "http://localhost:8000"

# 1. Upload reference audio
with open("my_voice.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v1/reference/upload",
        files={"audio_file": f},
        data={
            "ref_id": "my_voice",
            "ref_text": "Salom, bu test audiosi",
            "description": "My voice sample"
        }
    )
    print("Upload:", response.json())

# 2. Generate speech
response = requests.post(
    f"{BASE_URL}/v1/tts/generate",
    json={
        "text": "Assalomu alaykum! Bu avtomatik yaratilgan ovoz.",
        "ref_audio_id": "my_voice",
        "speed": 1.0
    }
)

result = response.json()
if result["success"]:
    # Decode base64 audio
    audio_data = base64.b64decode(result["audio_base64"])

    # Save to file
    with open("generated_speech.wav", "wb") as f:
        f.write(audio_data)

    print(f"Generated {result['duration']:.2f}s audio in {result['latency']:.2f}s")

# 3. Batch generation
response = requests.post(
    f"{BASE_URL}/v1/tts/batch",
    json={
        "texts": [
            "Birinchi gap",
            "Ikkinchi gap",
            "Uchinchi gap"
        ],
        "ref_audio_id": "my_voice"
    }
)

batch_result = response.json()
for i, result in enumerate(batch_result["results"]):
    if result["success"]:
        audio_data = base64.b64decode(result["audio_base64"])
        with open(f"batch_{i}.wav", "wb") as f:
            f.write(audio_data)

# 4. Check stats
response = requests.get(f"{BASE_URL}/stats")
print("Stats:", response.json())

# 5. Check GPU
response = requests.get(f"{BASE_URL}/gpu/stats")
print("GPU:", response.json())

JavaScript/Node.js

const axios = require('axios');
const FormData = require('form-data');  // npm install axios form-data
const fs = require('fs');

const BASE_URL = 'http://localhost:8000';

// Upload reference audio (multipart/form-data)
async function uploadReference() {
    const form = new FormData();

    form.append('audio_file', fs.createReadStream('my_voice.wav'));
    form.append('ref_id', 'my_voice');
    form.append('ref_text', 'Salom, bu test audiosi');

    const response = await axios.post(
        `${BASE_URL}/v1/reference/upload`,
        form,
        { headers: form.getHeaders() }
    );

    console.log('Upload:', response.data);
}

// Generate speech
async function generateSpeech() {
    const response = await axios.post(
        `${BASE_URL}/v1/tts/generate`,
        {
            text: 'Assalomu alaykum!',
            ref_audio_id: 'my_voice',
            speed: 1.0
        }
    );

    if (response.data.success) {
        // Decode base64 and save
        const audioBuffer = Buffer.from(response.data.audio_base64, 'base64');
        fs.writeFileSync('generated.wav', audioBuffer);

        console.log(`Generated ${response.data.duration}s audio`);
    }
}

// Run
uploadReference()
    .then(() => generateSpeech())
    .catch(console.error);

cURL Examples

# Upload reference
curl -X POST http://localhost:8000/v1/reference/upload \
  -F "audio_file=@voice.wav" \
  -F "ref_id=test_voice" \
  -F "ref_text=Test audio sample"

# Generate speech (save response)
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Salom dunyo!",
    "ref_audio_id": "test_voice"
  }' | jq -r '.audio_base64' | base64 -d > output.wav

# Generate speech (direct file download)
curl -X POST http://localhost:8000/v1/tts/generate/file \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Salom dunyo!",
    "ref_audio_id": "test_voice"
  }' --output speech.wav

# Check health
curl http://localhost:8000/health

# Check GPU stats
curl http://localhost:8000/gpu/stats

# List references
curl http://localhost:8000/v1/reference/list

# Delete reference
curl -X DELETE http://localhost:8000/v1/reference/test_voice

Interactive Documentation

Visit http://localhost:8000/docs for interactive Swagger UI documentation where you can test all endpoints directly in your browser.


Rate Limits

The current implementation has no rate limits. For production, consider adding:

  • Rate limiting middleware (e.g., slowapi)
  • Request queue management
  • Per-user quotas
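To illustrate what such middleware does, here is a minimal in-memory sliding-window limiter. It is a generic technique, not slowapi's API; the class name and the choice of key (an API key or client IP) are hypothetical. An in-process dict only works for a single server process; multiple workers would need shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

One way to use it would be to call `allow()` at the start of each TTS request handler and respond with HTTP 429 when it returns False.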

Best Practices

  1. Reuse reference audio: upload each voice once, then reuse its ref_id
  2. Use the batch endpoint: for multiple texts with the same voice
  3. Monitor performance: check /stats and /gpu/stats regularly
  4. Adjust speed: use the speed parameter (0.5-2.0) to set the desired pace
  5. Clean up: delete unused references to free storage

Support

When troubleshooting, check:

  • Service health: /health
  • GPU status: /gpu/stats
  • Performance stats: /stats
  • Server logs

Happy coding! 🎉