Complete API reference for the Uzbek TTS service.

Base URL: `http://localhost:8000`

Replace this with your actual domain when deployed.

Authentication: none is currently required. Add authentication middleware if you need it in production.
`GET /`: Root endpoint with service information.

Response:

```json
{
  "service": "Uzbek TTS API",
  "version": "1.0.0",
  "status": "running",
  "docs": "/docs",
  "health": "/health"
}
```

`GET /health`: Health check endpoint.
Response:

```json
{
  "status": "healthy",
  "gpu_available": true,
  "model_loaded": true,
  "cache_size": 5,
  "uptime": 3600.5
}
```

`GET /stats`: Performance statistics.
Response:

```json
{
  "total_generations": 150,
  "cache_hits": 120,
  "cache_misses": 30,
  "cache_hit_rate": 0.8,
  "avg_latency": 1.2,
  "cache_size": 5
}
```

`GET /gpu/stats`: GPU statistics (requires an NVIDIA GPU).
Response:

```json
{
  "gpu_name": "NVIDIA A40",
  "gpu_utilization": 65.5,
  "memory_used": 12288.5,
  "memory_total": 49152.0,
  "memory_utilization": 25.0,
  "temperature": 72.0,
  "power_usage": 180.5
}
```

`POST /v1/reference/upload`: Upload a reference audio file for voice cloning.
Request:

- Method: `POST`
- Content-Type: `multipart/form-data`

Parameters:

- `audio_file` (file, required): Audio file (WAV, MP3, or OGG)
- `ref_id` (string, required): Unique identifier for this reference
- `ref_text` (string, required): Transcription of the reference audio
- `description` (string, optional): Free-text description of the voice
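A rejected upload costs a full round trip, so a client can check the form fields locally before sending them. The sketch below is illustrative only. The helper name `validate_upload_fields` is hypothetical, and it checks only what this reference states: a WAV, MP3, or OGG file plus non-empty `ref_id` and `ref_text`. The server may enforce additional rules.

```python
from pathlib import Path

# Extensions listed in this reference for audio_file (assumption: checked by suffix only).
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def validate_upload_fields(audio_path: str, ref_id: str, ref_text: str) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid.

    Hypothetical client-side helper, not part of the service API.
    """
    problems = []
    if Path(audio_path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported audio format: {audio_path!r}")
    if not ref_id.strip():
        problems.append("ref_id is required")
    if not ref_text.strip():
        problems.append("ref_text is required")
    return problems


print(validate_upload_fields("my_voice.wav", "my_uzbek_voice", "Salom, mening ismim Ali."))
```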
Example:

```bash
curl -X POST http://localhost:8000/v1/reference/upload \
  -F "audio_file=@my_voice.wav" \
  -F "ref_id=my_uzbek_voice" \
  -F "ref_text=Salom, mening ismim Ali." \
  -F "description=Male voice, clear pronunciation"
```

Response:

```json
{
  "ref_id": "my_uzbek_voice",
  "ref_text": "Salom, mening ismim Ali.",
  "description": "Male voice, clear pronunciation",
  "duration": 3.5,
  "sample_rate": 24000,
  "created_at": "2025-01-28T10:30:00"
}
```

`GET /v1/reference/list`: List all uploaded reference audio files.
Response:

```json
[
  {
    "ref_id": "voice1",
    "ref_text": "Salom dunyo",
    "description": "Test voice",
    "duration": 2.5,
    "sample_rate": 24000,
    "created_at": "2025-01-28T10:00:00"
  },
  {
    "ref_id": "voice2",
    "ref_text": "Assalomu alaykum",
    "description": "",
    "duration": 3.0,
    "sample_rate": 24000,
    "created_at": "2025-01-28T10:15:00"
  }
]
```

`GET /v1/reference/{ref_id}`: Get information about a specific reference.
Parameters:
- `ref_id` (path, required): Reference ID
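`ref_id` is part of the URL path, so an ID containing spaces, slashes, or other reserved characters should be percent-encoded before it goes into the path. Below is a minimal standard-library sketch. `reference_url` is a hypothetical helper, and the code assumes the server decodes percent-encoded path segments. Simple IDs such as `my_uzbek_voice` avoid the issue entirely.

```python
from urllib.parse import quote


def reference_url(base_url: str, ref_id: str) -> str:
    """Build the URL for GET/DELETE /v1/reference/{ref_id} with ref_id percent-encoded.

    Hypothetical helper. safe="" makes quote() encode "/" as well, so an ID
    cannot spill into a second path segment.
    """
    return f"{base_url.rstrip('/')}/v1/reference/{quote(ref_id, safe='')}"


print(reference_url("http://localhost:8000", "my_uzbek_voice"))
# http://localhost:8000/v1/reference/my_uzbek_voice
```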
Example:

```bash
curl http://localhost:8000/v1/reference/my_uzbek_voice
```

Response:

```json
{
  "ref_id": "my_uzbek_voice",
  "ref_text": "Salom, mening ismim Ali.",
  "description": "Male voice, clear pronunciation",
  "duration": 3.5,
  "sample_rate": 24000,
  "created_at": "2025-01-28T10:30:00"
}
```

`DELETE /v1/reference/{ref_id}`: Delete a reference audio file.
Example:

```bash
curl -X DELETE http://localhost:8000/v1/reference/my_uzbek_voice
```

Response:

```json
{
  "success": true,
  "message": "Reference 'my_uzbek_voice' deleted"
}
```

`POST /v1/tts/generate`: Generate speech from text (returns base64-encoded audio).
Request:

```json
{
  "text": "Assalomu alaykum! Qalaysizlar?",
  "ref_audio_id": "my_uzbek_voice",
  "speed": 1.0
}
```

Parameters:

- `text` (string, required): Text to synthesize (1-5000 characters)
- `ref_audio_id` (string, required): Reference audio ID
- `speed` (float, optional): Speech speed (0.5-2.0, default: 1.0)
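The documented limits can be enforced on the client before a request is sent. Below is a minimal sketch. `build_generate_payload` is a hypothetical helper, and the code assumes the server counts characters the same way Python's `len()` does (Unicode code points).

```python
def build_generate_payload(text: str, ref_audio_id: str, speed: float = 1.0) -> dict:
    """Return a JSON-ready body for POST /v1/tts/generate, or raise ValueError.

    Hypothetical client-side check of the documented limits:
    text is 1-5000 characters and speed is 0.5-2.0.
    """
    if not 1 <= len(text) <= 5000:
        raise ValueError(f"text must be 1-5000 characters, got {len(text)}")
    if not ref_audio_id:
        raise ValueError("ref_audio_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    return {"text": text, "ref_audio_id": ref_audio_id, "speed": speed}


payload = build_generate_payload("Assalomu alaykum! Qalaysizlar?", "my_uzbek_voice")
print(payload)
```

Pass the returned dictionary as the `json=` body of the request, for example `requests.post(f"{BASE_URL}/v1/tts/generate", json=payload)`.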
Example:

```bash
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Assalomu alaykum! Qalaysizlar?",
    "ref_audio_id": "my_uzbek_voice",
    "speed": 1.0
  }'
```

Response:

```json
{
  "success": true,
  "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAAB...",
  "duration": 2.5,
  "latency": 1.2,
  "message": "Generation successful"
}
```

Decode the audio in Python:
```python
import base64
import io

import requests
from pydub import AudioSegment

# Request speech generation
response = requests.post("http://localhost:8000/v1/tts/generate", json={
    "text": "Salom dunyo",
    "ref_audio_id": "my_voice",
})
response.raise_for_status()

# Decode the base64 payload into raw WAV bytes
audio_base64 = response.json()["audio_base64"]
audio_bytes = base64.b64decode(audio_base64)

# Save to file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

# Or load with pydub (exporting to MP3 requires ffmpeg)
audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="wav")
audio.export("output.mp3", format="mp3")
```

`POST /v1/tts/generate/file`: Generate speech and return it as a downloadable audio file.
Request: same JSON body as `POST /v1/tts/generate`.

Example:

```bash
curl -X POST http://localhost:8000/v1/tts/generate/file \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Assalomu alaykum!",
    "ref_audio_id": "my_voice",
    "speed": 1.0
  }' \
  --output generated_speech.wav
```

Response: an audio file (WAV format).
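You can sanity-check a downloaded file, or the decoded `audio_base64` bytes, by reading the WAV header with Python's standard `wave` module. Below is a minimal sketch. `wav_duration` and `make_silent_wav` are hypothetical helpers, and the code assumes the service returns uncompressed PCM WAV, which is the only format `wave` can read. The measured duration should roughly match the `duration` field returned by `/v1/tts/generate`.

```python
import io
import wave


def wav_duration(wav_bytes: bytes) -> tuple[float, int]:
    """Return (duration_seconds, sample_rate) for PCM WAV data held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate(), wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 24000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(wav_duration(make_silent_wav(2.5)))  # (2.5, 24000)
```

For a downloaded file, pass the file's bytes instead, for example `wav_duration(Path("generated_speech.wav").read_bytes())` (this needs `from pathlib import Path`).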
`POST /v1/tts/batch`: Generate speech for multiple texts (batch processing).

Request:

```json
{
  "texts": [
    "Birinchi matn",
    "Ikkinchi matn",
    "Uchinchi matn"
  ],
  "ref_audio_id": "my_uzbek_voice",
  "speed": 1.0
}
```

Parameters:

- `texts` (array, required): List of texts (1-8 items)
- `ref_audio_id` (string, required): Reference audio ID
- `speed` (float, optional): Speech speed (0.5-2.0, default: 1.0)
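A single batch request accepts at most 8 texts, so a longer list has to be split across several requests. Below is a minimal sketch. `chunk_texts` and `batch_payloads` are hypothetical helpers that build request bodies in the shape shown above. Concatenating the `results` arrays in order should yield one result per input text, assuming the server preserves input order.

```python
from typing import Iterator

MAX_BATCH_SIZE = 8  # documented upper limit for "texts"


def chunk_texts(texts: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def batch_payloads(texts: list[str], ref_audio_id: str, speed: float = 1.0) -> list[dict]:
    """Build one POST /v1/tts/batch body per chunk of texts."""
    return [
        {"texts": chunk, "ref_audio_id": ref_audio_id, "speed": speed}
        for chunk in chunk_texts(texts)
    ]


sentences = [f"Gap {i}" for i in range(1, 20)]  # 19 texts
print([len(p["texts"]) for p in batch_payloads(sentences, "my_uzbek_voice")])  # [8, 8, 3]
```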
Example:

```bash
curl -X POST http://localhost:8000/v1/tts/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": [
      "Assalomu alaykum!",
      "Xayr, ko'\''rishguncha!",
      "Rahmat sizga!"
    ],
    "ref_audio_id": "my_voice",
    "speed": 1.0
  }'
```

Response:

```json
{
  "success": true,
  "results": [
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.5,
      "latency": 0.8,
      "message": "Generation successful"
    },
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.8,
      "latency": 0.9,
      "message": "Generation successful"
    },
    {
      "success": true,
      "audio_base64": "UklGRiQAAABXQVZF...",
      "duration": 1.2,
      "latency": 0.7,
      "message": "Generation successful"
    }
  ],
  "total_latency": 2.5
}
```

`POST /v1/cache/clear`: Clear all caches (reference audio embeddings and text embeddings).
Example:

```bash
curl -X POST http://localhost:8000/v1/cache/clear
```

Response:

```json
{
  "success": true,
  "message": "Cache cleared"
}
```

All endpoints return errors in this format:
```json
{
  "success": false,
  "error": "ErrorType",
  "message": "Human-readable error message",
  "detail": "Optional detailed information"
}
```

HTTP status codes:

- 400 Bad Request: Invalid parameters
- 404 Not Found: Reference audio not found
- 500 Internal Server Error: Generation error
- 503 Service Unavailable: Service not initialized
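A client can turn these responses into exceptions so that callers handle each case explicitly. Below is a minimal sketch that assumes the error body follows the format above. The exception classes and `raise_for_api_error` are hypothetical names, and treating 503 as retryable is a design choice, not documented behavior.

```python
class TTSAPIError(Exception):
    """Base error for failed Uzbek TTS API calls (hypothetical client class)."""

    def __init__(self, status: int, body: dict):
        self.status = status
        self.error = body.get("error", "UnknownError")
        self.detail = body.get("detail")
        super().__init__(f"{status} {self.error}: {body.get('message', '')}")


class ReferenceNotFound(TTSAPIError):
    """404: the requested ref_id / ref_audio_id does not exist."""


class ServiceUnavailable(TTSAPIError):
    """503: the service is not initialized yet; retrying later may succeed."""


_STATUS_TO_ERROR = {404: ReferenceNotFound, 503: ServiceUnavailable}


def raise_for_api_error(status: int, body: dict) -> dict:
    """Return body unchanged on success; otherwise raise the matching error class."""
    if status < 400 and body.get("success", True):
        return body
    raise _STATUS_TO_ERROR.get(status, TTSAPIError)(status, body)
```

With `requests`, call it as `raise_for_api_error(response.status_code, response.json())`.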
Example errors:

```json
{
  "success": false,
  "error": "NotFound",
  "message": "Reference 'invalid_id' not found"
}
```

```json
{
  "success": false,
  "error": "ValidationError",
  "message": "Text cannot be empty!"
}
```

Python client example:

```python
import base64

import requests

# Base URL
BASE_URL = "http://localhost:8000"

# 1. Upload reference audio
with open("my_voice.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v1/reference/upload",
        files={"audio_file": f},
        data={
            "ref_id": "my_voice",
            "ref_text": "Salom, bu test audiosi",
            "description": "My voice sample",
        },
    )
print("Upload:", response.json())

# 2. Generate speech
response = requests.post(
    f"{BASE_URL}/v1/tts/generate",
    json={
        "text": "Assalomu alaykum! Bu avtomatik yaratilgan ovoz.",
        "ref_audio_id": "my_voice",
        "speed": 1.0,
    },
)
result = response.json()
if result["success"]:
    # Decode the base64 audio and save it to a file
    audio_data = base64.b64decode(result["audio_base64"])
    with open("generated_speech.wav", "wb") as f:
        f.write(audio_data)
    print(f"Generated {result['duration']:.2f}s audio in {result['latency']:.2f}s")

# 3. Batch generation
response = requests.post(
    f"{BASE_URL}/v1/tts/batch",
    json={
        "texts": ["Birinchi gap", "Ikkinchi gap", "Uchinchi gap"],
        "ref_audio_id": "my_voice",
    },
)
batch_result = response.json()
# An error response has no "results" key, so fall back to an empty list
for i, item in enumerate(batch_result.get("results", [])):
    if item["success"]:
        audio_data = base64.b64decode(item["audio_base64"])
        with open(f"batch_{i}.wav", "wb") as f:
            f.write(audio_data)

# 4. Check stats
response = requests.get(f"{BASE_URL}/stats")
print("Stats:", response.json())

# 5. Check GPU
response = requests.get(f"{BASE_URL}/gpu/stats")
print("GPU:", response.json())
```

JavaScript (Node.js) client example, using the `axios` and `form-data` packages:

```javascript
const axios = require('axios');
const fs = require('fs');
const FormData = require('form-data');

const BASE_URL = 'http://localhost:8000';

// Upload reference
async function uploadReference() {
  const form = new FormData();
  form.append('audio_file', fs.createReadStream('my_voice.wav'));
  form.append('ref_id', 'my_voice');
  form.append('ref_text', 'Salom, bu test audiosi');

  const response = await axios.post(
    `${BASE_URL}/v1/reference/upload`,
    form,
    { headers: form.getHeaders() }
  );
  console.log('Upload:', response.data);
}

// Generate speech
async function generateSpeech() {
  const response = await axios.post(`${BASE_URL}/v1/tts/generate`, {
    text: 'Assalomu alaykum!',
    ref_audio_id: 'my_voice',
    speed: 1.0,
  });

  if (response.data.success) {
    // Decode base64 and save
    const audioBuffer = Buffer.from(response.data.audio_base64, 'base64');
    fs.writeFileSync('generated.wav', audioBuffer);
    console.log(`Generated ${response.data.duration}s audio`);
  }
}

// Run
uploadReference()
  .then(() => generateSpeech())
  .catch(console.error);
```

cURL examples:

```bash
# Upload reference
curl -X POST http://localhost:8000/v1/reference/upload \
  -F "audio_file=@voice.wav" \
  -F "ref_id=test_voice" \
  -F "ref_text=Test audio sample"

# Generate speech and decode the base64 response (requires jq)
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Salom dunyo!",
    "ref_audio_id": "test_voice"
  }' | jq -r '.audio_base64' | base64 -d > output.wav

# Generate speech (direct file download)
curl -X POST http://localhost:8000/v1/tts/generate/file \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Salom dunyo!",
    "ref_audio_id": "test_voice"
  }' --output speech.wav

# Check health
curl http://localhost:8000/health

# Check GPU stats
curl http://localhost:8000/gpu/stats

# List references
curl http://localhost:8000/v1/reference/list

# Delete reference
curl -X DELETE http://localhost:8000/v1/reference/test_voice
```

Visit http://localhost:8000/docs for the interactive Swagger UI documentation, where you can test every endpoint directly in your browser.
The current implementation has no rate limits. For production, consider adding:
- Rate limiting middleware (e.g., slowapi)
- Request queue management
- Per-user quotas
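To illustrate the idea behind rate limiting and per-user quotas, here is a minimal token-bucket limiter in plain Python. This is a generic sketch, not slowapi's API. `TokenBucket`, its parameters, and keying buckets by a user ID are all assumptions. In a real deployment this check would run in server middleware, and a limiter shared across threads would need a lock.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can control time
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means reject (e.g. with HTTP 429)."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per user gives a simple per-user quota (hypothetical keying scheme).
buckets: dict[str, TokenBucket] = {}


def allow_request(user_id: str) -> bool:
    if user_id not in buckets:
        buckets[user_id] = TokenBucket(capacity=5, rate=1.0)
    return buckets[user_id].allow()
```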
Best practices:

- Cache reference audio: upload once and reuse the `ref_id`.
- Use the batch endpoint for multiple texts with the same voice.
- Monitor performance: check `/stats` and `/gpu/stats` regularly.
- Adjust speed: use the `speed` parameter (0.5-2.0) to set the desired pace.
- Clean up: delete unused references to free storage.
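The `/stats` fields appear to be running totals, so `cache_hit_rate` would then describe the whole uptime. To see how the cache behaves over a recent window, compare two snapshots taken some time apart. Below is a minimal sketch. `interval_hit_rate` is a hypothetical helper that reads only the `cache_hits` and `cache_misses` fields shown in the `/stats` response. It assumes the counters are not reset between snapshots, for example by `/v1/cache/clear` or a restart.

```python
from typing import Optional


def interval_hit_rate(before: dict, after: dict) -> Optional[float]:
    """Cache hit rate between two /stats snapshots, or None if no requests occurred."""
    hits = after["cache_hits"] - before["cache_hits"]
    misses = after["cache_misses"] - before["cache_misses"]
    if hits < 0 or misses < 0:
        raise ValueError("counters went backwards; were the stats reset?")
    total = hits + misses
    return hits / total if total else None


earlier = {"cache_hits": 120, "cache_misses": 30}
later = {"cache_hits": 150, "cache_misses": 40}
print(interval_hit_rate(earlier, later))  # 0.75
```

In practice, each snapshot would come from `requests.get(f"{BASE_URL}/stats").json()`.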
For issues, check:

- Service health: `/health`
- GPU status: `/gpu/stats`
- Performance stats: `/stats`
- Server logs
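A first-pass check of the `/health` response can be scripted. Below is a minimal sketch. `diagnose_health` is a hypothetical helper that inspects only the fields shown in the `/health` example. It assumes `"healthy"` is the only status value that indicates a working service.

```python
def diagnose_health(health: dict) -> list[str]:
    """Return human-readable problems found in a /health response (empty if none)."""
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"service status is {health.get('status')!r}")
    if not health.get("model_loaded", False):
        problems.append("model is not loaded; generation requests may fail (503)")
    if not health.get("gpu_available", False):
        problems.append("no GPU available; check /gpu/stats and the server logs")
    return problems


example = {"status": "healthy", "gpu_available": True, "model_loaded": True, "cache_size": 5, "uptime": 3600.5}
print(diagnose_health(example))  # []
```

Against a running service, call `diagnose_health(requests.get(f"{BASE_URL}/health").json())`.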
Happy coding! 🎉