Skip to content

TechWithTy/eleven-labs-python-template

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

ElevenLabs Python Template

A comprehensive Python template for integrating with ElevenLabs' AI voice technology. This template provides complete implementation of every function call available in the official ElevenLabs Python SDK, with primary focus on voice cloning capabilities and plans to expand as new features are released.

Complete SDK Coverage

This template implements 100% of the ElevenLabs Python SDK functionality, including:

  • All voice management operations
  • Complete text-to-speech capabilities
  • Full audio manipulation utilities
  • Every configuration option available through the API
  • Comprehensive error handling

Current Features

Voice Cloning

  • Voice Creation: Generate new AI voices from audio samples
  • Voice Management: Create, retrieve, update, and delete custom voices
  • Sample Optimization: Tools for preparing and optimizing audio samples
  • Voice Settings: Control stability, clarity, and style of cloned voices
  • Voice Library Access: Full access to manage your voice library

Text-to-Speech

  • Speech Generation: Convert text to speech using cloned voices
  • Formatting Control: Add emphasis, pauses, and tone variations using SSML
  • Batch Processing: Efficiently process multiple text-to-speech requests
  • Model Selection: Choose between all available ElevenLabs models
  • Voice Customization: Fine-tune voice parameters for perfect output

Audio Processing

  • Format Conversion: Convert between different audio formats
  • Audio Chunking: Split long audio files into manageable segments
  • Quality Enhancement: Improve the quality of input audio samples
  • Sample Analysis: Analyze audio samples for optimal voice cloning

Planned Features

As ElevenLabs expands their SDK capabilities, our template will immediately incorporate:

  • Voice Design: Create custom voices through the Voice Design interface
  • Voice Library: Access ElevenLabs' growing library of premade voices
  • Speech-to-Speech: Modify existing speech recordings
  • Real-time Streaming: Stream audio in real-time for interactive applications
  • Multi-language Support: Generate speech in multiple languages
  • Voice Sharing: Easily share custom voices across projects
  • Voice Analytics: Analyze voice usage and performance metrics
  • Dubbing: Automated video dubbing capabilities

Installation

# Install directly from PyPI
pip install elevenlabs-template

# Or clone the repository and install
git clone https://github.com/yourusername/elevenlabs-template.git
cd elevenlabs-template
pip install .

Quick Start

from elevenlabs_template import ElevenLabsClient

# Initialize with your API key
client = ElevenLabsClient(api_key="your_api_key_here")

# Clone a voice from an audio file
voice = client.clone_voice(
    name="My Custom Voice",
    files=["sample1.mp3", "sample2.mp3"],
    description="Professional narrator voice"
)

# Generate speech with your cloned voice
audio = client.generate_speech(
    text="Hello world! This is my cloned voice speaking.",
    voice_id=voice.voice_id
)

# Save the generated audio
with open("output.mp3", "wb") as f:
    f.write(audio)

Voice Cloning

Create a Voice Clone

# Clone a voice with more options
voice = client.clone_voice(
    name="Narrator Voice",
    files=["sample1.mp3", "sample2.mp3", "sample3.mp3"],
    description="Professional narration voice with warm tone",
    labels={
        "accent": "british",
        "age": "middle-aged",
        "gender": "male"
    }
)

print(f"Voice created with ID: {voice.voice_id}")

Voice Sample Requirements

For best results with voice cloning:

  • Provide 1-3 minutes of clear speech samples
  • Use samples with minimal background noise
  • Ensure samples demonstrate the desired tone and style
  • Include varied speech patterns and intonations
# Check if samples meet quality requirements
from elevenlabs_template.utils import analyze_samples

sample_quality = analyze_samples(["sample1.mp3", "sample2.mp3"])
for sample, metrics in sample_quality.items():
    print(f"{sample}: Noise level: {metrics['noise_level']}, Quality: {metrics['quality']}")

Manage Voice Clones

# List all your custom voices
voices = client.get_voices()
for voice in voices:
    print(f"Voice: {voice.name}, ID: {voice.voice_id}")

# Get a specific voice
voice = client.get_voice("voice_id_here")

# Delete a voice
client.delete_voice("voice_id_here")

# Update voice metadata
client.update_voice(
    voice_id="voice_id_here",
    name="Updated Voice Name",
    description="New description"
)

Text-to-Speech Generation

Basic Text-to-Speech

# Generate speech with default settings
audio = client.generate_speech(
    text="This is a simple text-to-speech example.",
    voice_id="voice_id_here"
)

# Save to file
with open("speech.mp3", "wb") as f:
    f.write(audio)

Advanced Options

# Generate speech with more control
audio = client.generate_speech(
    text="This is an example with custom settings.",
    voice_id="voice_id_here",
    model_id="eleven_multilingual_v2",  # Choose specific model
    voice_settings={
        "stability": 0.7,  # Higher stability = less variation
        "similarity_boost": 0.8,  # Higher = more similar to original
        "style": 0.5,  # Speech style transfer intensity
        "use_speaker_boost": True  # Enhance speaker clarity
    }
)

SSML Support

# Use SSML for more control over speech
ssml_text = """
<speak>
    <p>Welcome to our <emphasis level="strong">amazing</emphasis> new product!</p>
    <break time="1s"/>
    <p>It will <prosody rate="slow" pitch="+20%">completely transform</prosody> how you work.</p>
</speak>
"""

audio = client.generate_speech_ssml(
    ssml=ssml_text,
    voice_id="voice_id_here"
)

Batch Processing

# Process multiple text segments efficiently
texts = [
    "This is the first paragraph.",
    "This is the second paragraph.",
    "This is the third paragraph."
]

# Generate all speech segments with the same voice
audio_segments = client.batch_generate_speech(
    texts=texts,
    voice_id="voice_id_here"
)

# Save each segment
for i, audio in enumerate(audio_segments):
    with open(f"segment_{i}.mp3", "wb") as f:
        f.write(audio)

Utilities

Audio Processing

from elevenlabs_template.utils import process_audio

# Improve sample quality
processed_audio = process_audio(
    file_path="raw_sample.wav",
    normalize=True,
    remove_noise=True,
    trim_silence=True
)

# Save processed audio
with open("processed_sample.mp3", "wb") as f:
    f.write(processed_audio)

Format Conversion

from elevenlabs_template.utils import convert_format

# Convert audio format
mp3_audio = convert_format(
    file_path="audio.wav",
    output_format="mp3",
    bitrate="192k"
)

# Save converted audio
with open("converted.mp3", "wb") as f:
    f.write(mp3_audio)

Advanced Usage

Custom Voice Settings

# Experiment with different voice settings
voice_variations = []

# Create variations with different settings
for stability in [0.3, 0.5, 0.7]:
    for similarity in [0.3, 0.6, 0.9]:
        audio = client.generate_speech(
            text="The same text with different voice settings.",
            voice_id="voice_id_here",
            voice_settings={
                "stability": stability,
                "similarity_boost": similarity
            }
        )
        voice_variations.append((stability, similarity, audio))

# Save all variations
for i, (stability, similarity, audio) in enumerate(voice_variations):
    with open(f"variation_s{stability}_sim{similarity}.mp3", "wb") as f:
        f.write(audio)

Streaming Audio

# Stream audio for real-time applications
audio_stream = client.stream_speech(
    text="This is streaming audio that can be played while generating.",
    voice_id="voice_id_here"
)

# Process the stream
for chunk in audio_stream:
    # Process each audio chunk as it arrives
    # Example: play_audio_chunk(chunk)
    pass

API Coverage Map

This template includes every function from the ElevenLabs Python SDK:

Functionality SDK Methods Template Coverage
Voice Management get_voices(), get_voice(), delete_voice(), add_voice() ✅ 100%
Voice Generation clone_voice(), generate_voice() ✅ 100%
Text-to-Speech generate_speech(), stream_speech() ✅ 100%
Audio Processing convert_format(), process_audio(), analyze_samples() ✅ 100%
User Management get_user_info(), get_user_subscription() ✅ 100%
Models get_models(), get_model() ✅ 100%
Projects get_projects(), get_project(), add_project() ✅ 100%

Best Practices

  1. Sample Quality: Use high-quality, clear audio samples for voice cloning
  2. Text Preparation: Break long texts into natural paragraphs or sentences
  3. Voice Settings: Start with default settings, then adjust for desired results
  4. SSML Usage: Use SSML for precise control over speech patterns
  5. API Key Security: Never hardcode your API key; use environment variables

Error Handling

from elevenlabs_template.exceptions import ElevenLabsError

try:
    audio = client.generate_speech(
        text="This is a test.",
        voice_id="invalid_voice_id"
    )
except ElevenLabsError as e:
    print(f"Error: {e.message}")
    print(f"Error code: {e.status_code}")
    print(f"Request ID: {e.request_id}")

Resources

License

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published