Urdu Text-to-Speech with Voice Cloning using SpeechT5

A SpeechT5 model fine-tuned for Urdu text-to-speech generation, with voice cloning capabilities. The model accepts text in both Urdu script and Roman Urdu, and lets you select a speaker for personalized speech synthesis.

Features

🗣️ Urdu TTS: Text-to-speech synthesis for the Urdu language
🔊 Voice Cloning: Generate speech in the voice of specific speakers (including Zia Mohiuddin)
🌐 Dual Script Support: Works with both Urdu in its native Perso-Arabic script (e.g. یہ ایک مثال ہے) and Roman Urdu (Urdu written in Latin script)
🎛️ Speaker Selection: Choose between different voice profiles
🚀 FastAPI Demo: Interactive web interface for testing the model

Model Details

This implementation is based on SpeechT5, Microsoft Research's unified-modal encoder-decoder model for speech and text tasks, including text-to-speech. Key modifications include:

Tokenization: Character-level tokenization specifically adapted for Urdu script
Preprocessing: Updated tokenizer and processor to handle Urdu phonetics and pronunciation
Architecture: The SpeechT5 architecture, fine-tuned to accept Urdu and Roman Urdu input
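To illustrate what character-level tokenization means for Urdu script, here is a minimal sketch that builds a vocabulary from every distinct character in a corpus and encodes text one character at a time. It is not the repository's tokenizer. The special-token names and the NFC normalization step are assumptions made for the example.

```python
# Hypothetical character-level tokenizer sketch for Urdu / Roman Urdu text.
# Not the repository's implementation; special tokens are assumptions.
import unicodedata

SPECIAL_TOKENS = ["<pad>", "<unk>", "</s>"]


def build_char_vocab(texts):
    """Map every distinct character in the corpus to an integer id."""
    chars = set()
    for text in texts:
        # NFC normalization so visually identical strings share code points
        chars.update(unicodedata.normalize("NFC", text))
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for ch in sorted(chars):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Encode text to one id per character, appending the end-of-sequence id."""
    unk = vocab["<unk>"]
    ids = [vocab.get(ch, unk) for ch in unicodedata.normalize("NFC", text)]
    ids.append(vocab["</s>"])
    return ids


corpus = ["یہ ایک مثال ہے", "Ye aik misaal hai"]
vocab = build_char_vocab(corpus)
ids = encode("یہ ایک مثال", vocab)
```

Characters never seen in training fall back to `<unk>`, which is why a character-level vocabulary must be built from (or extended with) the Urdu training text rather than reused unchanged from an English checkpoint.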

Dataset

The model was trained on a merged dataset comprising:

xcollab tts 15k dataset: A comprehensive Urdu speech dataset with 15,000+ recordings
Zia Mohiuddin Dataset: 350 high-quality recordings of the renowned Pakistani broadcaster

This combination enables both general Urdu TTS and voice cloning capabilities for specific speakers.
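A merged dataset of this kind needs a speaker label on every example so that a specific voice can be selected later. A hypothetical sketch follows. The `Example` record, the speaker labels (chosen to match the usage examples below), and the assumption that the whole xcollab set counts as one "default" speaker are all illustrative, not taken from the repository.

```python
# Hypothetical sketch: merge two TTS corpora, tagging each clip with a speaker.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    audio_path: str
    speaker: str


def merge_datasets(sources):
    """sources: mapping of speaker label -> list of (text, audio_path) pairs."""
    merged = []
    for speaker, pairs in sources.items():
        for text, audio_path in pairs:
            merged.append(Example(text=text, audio_path=audio_path, speaker=speaker))
    return merged


def speaker_counts(examples):
    """Number of clips per speaker, useful for spotting imbalance."""
    return dict(Counter(ex.speaker for ex in examples))


# Placeholder file names; the real corpora are far larger.
merged = merge_datasets({
    "default": [("یہ ایک مثال ہے", "xcollab/0001.wav"), ("Ye aik misaal hai", "xcollab/0002.wav")],
    "zia_mohiuddin": [("ایک اور مثال", "zia/0001.wav")],
})
```

With roughly 15,000 general clips against 350 from a single speaker, the per-speaker counts make the imbalance explicit, which is worth keeping in mind when judging how well the cloned voice is learned.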

Training

Epochs: 50 (significant improvement observed after 40 epochs)
Batch Size: 6-8 (smaller batch sizes yielded better results)
Hardware: GPU-accelerated training
Performance: Mid-level quality with potential for improvement through:
- Longer training (100+ epochs recommended)
- Larger model variants
- Additional high-quality data
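The training figures above translate into a concrete number of optimizer updates. The following worked calculation assumes the merged corpus has about 15,350 clips (15,000 from xcollab plus 350 Zia Mohiuddin recordings; the README gives only "15,000+" for the former) and no gradient accumulation:

```python
import math


def optimizer_steps(num_examples, batch_size, epochs):
    """Total parameter updates when every epoch ends with a partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


NUM_EXAMPLES = 15_350  # assumption: 15,000 + 350 clips
steps_bs8 = optimizer_steps(NUM_EXAMPLES, batch_size=8, epochs=50)
steps_bs6 = optimizer_steps(NUM_EXAMPLES, batch_size=6, epochs=50)
```

At a batch size of 8 this gives 1,919 steps per epoch (95,950 over 50 epochs). At a batch size of 6 it gives 2,559 per epoch (127,950 total). For the same number of epochs, smaller batches therefore also mean more parameter updates.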

Demo

Try the interactive demo at [your-demo-link-here] or run locally:

Clone the repository
Install dependencies: pip install -r requirements.txt
Run the server: python app.py
Open your browser to http://localhost:8000

Demo Features:

Text input in Urdu or Roman Urdu
Speaker selection dropdown
Real-time audio generation
Responsive web interface
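To play generated audio, the server has to send the waveform to the browser in a playable format. SpeechT5 with its HiFi-GAN vocoder produces 16 kHz floating-point samples. The sketch below uses only the standard library to wrap those samples in a 16-bit mono WAV file in memory. The function name is hypothetical and does not come from this repository.

```python
import io
import math
import struct
import wave


def float_to_wav_bytes(samples, sample_rate=16000):
    """Encode float samples in [-1, 1] as a 16-bit mono PCM WAV file in memory."""
    # Clip before scaling so out-of-range samples cannot overflow int16
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)  # little-endian, as WAV requires
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Usage: one second of a 440 Hz tone standing in for model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
wav_bytes = float_to_wav_bytes(tone)
```

In a FastAPI handler, these bytes could be returned with `Response(content=wav_bytes, media_type="audio/wav")`. This README does not confirm whether the demo server works exactly this way.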

Installation

git clone https://github.com/your-username/urdu-tts-voice-cloning.git
cd urdu-tts-voice-cloning
pip install -r requirements.txt

Usage

Python API

from model import UrduTTS

tts = UrduTTS()
audio = tts.generate_text_to_speech(
    text="یہ ایک مثال ہے",  # Urdu text
    speaker="zia_mohiuddin"  # Optional speaker selection
)
audio.save("output.wav")
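The `speaker` argument suggests that each name maps to a stored voice. SpeechT5's TTS model is conditioned on a 512-dimensional speaker embedding (x-vector), so one plausible design is a lookup table with a default fallback. The sketch below uses a hypothetical `SpeakerRegistry` class that is not part of this repository, and the embedding values are placeholders.

```python
# Hypothetical speaker lookup; not the repository's implementation.
EMBEDDING_DIM = 512  # size of a SpeechT5 speaker embedding (x-vector)


class SpeakerRegistry:
    def __init__(self, default="default"):
        self._embeddings = {}
        self.default = default

    def register(self, name, embedding):
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
        self._embeddings[name] = list(embedding)

    def get(self, name=None):
        """Return the embedding for `name`, or the default speaker if None."""
        key = name or self.default
        if key not in self._embeddings:
            known = ", ".join(sorted(self._embeddings))
            raise KeyError(f"unknown speaker {key!r}; available: {known}")
        return self._embeddings[key]


registry = SpeakerRegistry()
registry.register("default", [0.0] * EMBEDDING_DIM)         # placeholder values
registry.register("zia_mohiuddin", [0.05] * EMBEDDING_DIM)  # placeholder values
embedding = registry.get("zia_mohiuddin")
```

With Hugging Face Transformers, the selected vector would be wrapped as a tensor of shape (1, 512) and passed as `speaker_embeddings` to `SpeechT5ForTextToSpeech.generate_speech`.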

Roman Urdu Support

audio = tts.generate_text_to_speech(
    text="Ye aik misaal hai",  # Roman Urdu
    speaker="default"
)
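Roman Urdu normalization is listed under Future Improvements as not yet implemented. Because Roman Urdu has no standard spelling ("aik" or "ek", "hai" or "hy"), a normalization pass before tokenization could map spelling variants to a single form. The following is a hypothetical sketch. The variant table is illustrative only and is not derived from this project's data.

```python
import re

# Illustrative spelling-variant table (assumption); a real one would be data-driven.
VARIANTS = {
    "ek": "aik",
    "hy": "hai",
    "hay": "hai",
}


def normalize_roman_urdu(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s']", " ", text)      # drop digits and punctuation
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # cap letter repeats at two: "haiiii" -> "haii"
    words = [VARIANTS.get(w, w) for w in text.split()]
    return " ".join(words)
```

Repeats are capped at two letters rather than one because genuine double vowels occur in common spellings such as "misaal".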

Performance Notes

The current model produces natural-sounding output at mid-level quality
Best results obtained with:
- 40+ training epochs
- Batch sizes of 6-8
- Adequate GPU memory (recommended: 16GB+)
Voice cloning works best with clear reference recordings
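Because cloning depends on clean reference audio, one common way to build a speaker embedding from several recordings is to extract an x-vector per clip and average them. Hugging Face's SpeechT5 fine-tuning guide extracts x-vectors with SpeechBrain's `spkrec-xvect-voxceleb` model. The averaging and L2 normalization below are an assumption, not a confirmed detail of this project. The inputs are plain lists of floats, so the sketch is independent of any extractor.

```python
import math


def average_speaker_embedding(xvectors):
    """Mean of per-utterance x-vectors, rescaled to unit L2 norm."""
    if not xvectors:
        raise ValueError("need at least one reference embedding")
    dim = len(xvectors[0])
    if any(len(v) != dim for v in xvectors):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(v[i] for v in xvectors) / len(xvectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean
```

Noisy clips pull the average away from the speaker's true voice, so it helps to drop them before averaging.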

Future Improvements

Increase training epochs to 100+
Experiment with larger SpeechT5 variants
Expand dataset with more diverse speakers
Implement Roman Urdu normalization
Add prosody control features
Optimize for real-time applications
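Progress toward real-time use is usually tracked with the real-time factor (RTF): synthesis time divided by the duration of the audio produced. An RTF below 1 means speech is generated faster than it plays back. The following minimal measurement sketch assumes 16 kHz output and a hypothetical `synthesize` callable that returns a list of samples:

```python
import time


def real_time_factor(synthesis_seconds, num_samples, sample_rate=16000):
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if num_samples == 0:
        raise ValueError("no audio was generated")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate=16000):
    """Time one call of a (hypothetical) synthesize(text) -> samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)
```

For example, 0.5 s spent generating 2 s of audio (32,000 samples) gives an RTF of 0.25.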

Tech Stack

Model: SpeechT5 (fine-tuned)
Backend: FastAPI
Frontend: HTML/CSS/JavaScript
Audio Processing: Librosa, SoundFile
ML Framework: PyTorch, Transformers

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

Original SpeechT5 model by Microsoft Research
xcollab tts dataset contributors
Zia Mohiuddin dataset providers
