🎙️ AI Voice Clone with Qwen3-TTS

Clone any voice, design new voices, or use preset speakers — powered by Qwen3-TTS.

✨ Features

| Mode | Description |
| --- | --- |
| 🔁 Voice Cloning | Clone any voice from a 3–30 second reference audio |
| 🎨 Voice Design | Create a new voice from a natural language description |
| 🗣️ Custom Voice | Use 9 built-in high-quality preset voices |
| 🌏 Multilingual | Supports 10 languages with cross-lingual cloning |

🚀 Quick Start

Option 1 — Google Colab (Recommended)

1. Open `Qwen3_TTS_Voice_Clone_Full.ipynb` in Google Colab
2. Go to Runtime → Change runtime type → T4 GPU
3. Run cells from top to bottom

Option 2 — Local Installation

pip install qwen-tts soundfile librosa

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    'Qwen/Qwen3-TTS-12Hz-1.7B-Base',
    device_map='cuda:0',
    dtype=torch.bfloat16,
)

wavs, sr = model.generate_voice_clone(
    text="Hello! This is a cloned voice.",
    ref_audio='path/to/reference.wav',
    ref_text='Transcript of the reference audio.',
)

sf.write('output.wav', wavs[0], sr)

📋 Requirements

- Python 3.9+
- CUDA GPU (8 GB+ VRAM for 1.7B, 4 GB+ for 0.6B)
- PyTorch 2.0+

🧠 Model Sizes

| Model | VRAM | Quality | Speed |
| --- | --- | --- | --- |
| `1.7B` (recommended) | ~8 GB | ⭐⭐⭐⭐⭐ | Slower |
| `0.6B` (lightweight) | ~4 GB | ⭐⭐⭐⭐ | Faster |
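
The table above can be turned into a small selection helper. This is only a sketch: `pick_model_size` and `base_model_id` are hypothetical names, not part of the `qwen-tts` package, and the thresholds are simply the approximate VRAM figures listed above.

```python
def pick_model_size(vram_gb: float) -> str:
    """Return the largest model size whose approximate VRAM need fits."""
    # Thresholds mirror the table above (~8 GB for 1.7B, ~4 GB for 0.6B).
    if vram_gb >= 8:
        return "1.7B"
    if vram_gb >= 4:
        return "0.6B"
    raise ValueError(f"{vram_gb} GB VRAM is below the ~4 GB needed for 0.6B")


def base_model_id(size: str) -> str:
    """Build the Base checkpoint name used for voice cloning."""
    return f"Qwen/Qwen3-TTS-12Hz-{size}-Base"
```

On a CUDA machine, one way to obtain `vram_gb` is `torch.cuda.get_device_properties(0).total_memory / 1024**3`. The reported total is often slightly below the card's nominal size, so a little margin on the thresholds may be worthwhile.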

🔁 Voice Cloning

Clone a voice by providing a reference audio file and its transcript.

wavs, sr = model.generate_voice_clone(
    text="Text you want to synthesize.",
    ref_audio='reference.wav',   # Audio to clone (3–30 seconds)
    ref_text='Exact transcript of reference.wav',  # Must match the audio
)

Important: `ref_text` must be the exact transcript of what is spoken in `ref_audio`. Mismatched text will reduce quality significantly.

Reference Audio Tips

✅ 3–10 seconds — ideal
✅ 10–30 seconds — works well
⚠️ 30+ seconds — may cause out-of-memory errors
🎯 Use clean audio with minimal background noise
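
The length guidance above can be checked before a clip is passed to the model. The following is a minimal sketch using only the standard-library `wave` module, so it assumes an uncompressed PCM WAV file. The function names and the "too short" label for clips under 3 seconds are illustrative, not part of `qwen-tts`.

```python
import wave


def reference_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_reference(seconds: float) -> str:
    """Map a duration onto the guidance in the tips above."""
    if seconds < 3:
        return "too short"
    if seconds <= 10:
        return "ideal"
    if seconds <= 30:
        return "works well"
    return "too long (risk of out-of-memory)"
```

For example, `classify_reference(reference_duration_seconds('reference.wav'))` gives a quick sanity check on a candidate clip. Other formats such as MP3 or FLAC would need a library like `soundfile` or `librosa` instead.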

🎨 Voice Design

Create a new voice from a text description.

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    'Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign',
    device_map='cuda:0',
    dtype=torch.bfloat16,
)

wavs, sr = model.generate_voice_design(
    text="Welcome to the show!",
    instruct="A warm and energetic female voice with a clear and engaging tone.",
)

sf.write('designed_voice.wav', wavs[0], sr)  # Save the first generated waveform

🗣️ Custom Voice (Preset Speakers)

Use one of 9 built-in voices with optional style control.

import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    'Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice',
    device_map='cuda:0',
    dtype=torch.bfloat16,
)

wavs, sr = model.generate_custom_voice(
    text="Hello! Nice to meet you.",
    language='English',
    speaker='Vivian',
    instruct='Speak warmly and clearly.',  # Optional
)

sf.write('custom_voice.wav', wavs[0], sr)  # Save the first generated waveform

Available Speakers

Vivian · Ryan · Ava · Liam · Emma · Noah · Sophia · Oliver · Isabella

Supported Languages

Chinese · English · Japanese · Korean · German · French · Russian · Portuguese · Spanish · Italian

🌏 Cross-lingual Cloning

You can clone a voice in one language and generate speech in another.

# Reference audio in English → output in Japanese
wavs, sr = model.generate_voice_clone(
    text="こんにちは！音声クローニングのデモです。",
    ref_audio='english_voice.wav',
    ref_text='Transcript of the English reference audio.',
)

💾 Reuse a Cloned Voice Across Projects

Qwen3-TTS uses zero-shot cloning — there is no separate voice model file to save. To reuse a cloned voice, keep the reference audio file and its transcript, then pass them into every generation call.

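import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel


class VoiceCloner:
    """Loads a Base model once and remembers one reference voice."""

    def __init__(self, model_size='1.7B'):
        self.model = Qwen3TTSModel.from_pretrained(
            f'Qwen/Qwen3-TTS-12Hz-{model_size}-Base',
            device_map='cuda:0',
            dtype=torch.bfloat16,
        )
        self.ref_audio = None
        self.ref_text  = None

    def set_voice(self, ref_audio_path, ref_text):
        # Store the reference clip and its exact transcript for later calls
        self.ref_audio = ref_audio_path
        self.ref_text  = ref_text

    def speak(self, text, output_path='output.wav'):
        # Fail early instead of passing None to the model
        if self.ref_audio is None or self.ref_text is None:
            raise ValueError('Call set_voice() before speak().')
        wavs, sr = self.model.generate_voice_clone(
            text=text,
            ref_audio=self.ref_audio,
            ref_text=self.ref_text,
        )
        sf.write(output_path, wavs[0], sr)
        return output_path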

Key point: The only things you need to keep are the reference audio file (for example `reference.wav`) and its transcript (the `ref_text` string). Together, these two items define the voice identity.
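
Because a voice is fully described by an audio path and a transcript, the pair can be stored in a small JSON file and reloaded in another project. This is a sketch of one possible layout; the helper names and the JSON keys are assumptions for illustration, not a format defined by Qwen3-TTS.

```python
import json
from pathlib import Path


def save_voice_profile(profile_path, ref_audio, ref_text):
    """Store the reference audio path and its transcript as JSON."""
    data = {"ref_audio": str(ref_audio), "ref_text": ref_text}
    Path(profile_path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_voice_profile(profile_path):
    """Return (ref_audio, ref_text) from a saved profile."""
    data = json.loads(Path(profile_path).read_text(encoding="utf-8"))
    return data["ref_audio"], data["ref_text"]
```

The returned pair can be handed straight to the `VoiceCloner` class above, for example `cloner.set_voice(*load_voice_profile('narrator.json'))`, where `narrator.json` is a hypothetical file name. The profile stores only the path, so the audio file itself still has to travel with it.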

🛠️ Troubleshooting

| Problem | Solution |
| --- | --- |
| CUDA out of memory | Switch to the `0.6B` model or shorten the reference audio |
| Poor cloning quality | Use cleaner audio; ensure `ref_text` exactly matches `ref_audio` |
| Slow generation | Install `flash-attn`: `pip install flash-attn --no-build-isolation` |
| Model download fails | Check your internet connection; models are 1–3.5 GB |
| No GPU in Colab | Go to Runtime → Change runtime type → T4 GPU |
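
The first row of the table can be automated by trying the larger model and falling back to the smaller one. The sketch below is written against a caller-supplied `load_model` function (a hypothetical parameter) so that it does not depend on any particular loader. It assumes that an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch's CUDA out-of-memory error normally presents.

```python
def load_with_fallback(load_model, sizes=("1.7B", "0.6B")):
    """Try each size in order; move to the next one on an out-of-memory error."""
    last_error = None
    for size in sizes:
        try:
            return size, load_model(size)
        except RuntimeError as err:
            # Only out-of-memory failures trigger a fallback;
            # any other RuntimeError is re-raised unchanged.
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
    raise RuntimeError(f"No model size in {sizes} fit in memory") from last_error
```

A loader for the cloning model could be `lambda size: Qwen3TTSModel.from_pretrained(f'Qwen/Qwen3-TTS-12Hz-{size}-Base', device_map='cuda:0', dtype=torch.bfloat16)`. Calling `torch.cuda.empty_cache()` between attempts may help release memory left over from the failed load.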

📁 Project Structure

.
├── Qwen3_TTS_Voice_Clone_Full.ipynb  # Full Google Colab notebook
├── voice_cloner.py                   # Reusable VoiceCloner class
├── api.py                            # FastAPI server (optional)
├── README.md
└── samples/
    └── reference.wav                 # Your reference audio files

📄 License

This project uses Qwen3-TTS, which is released under the Apache 2.0 License.
