TTS 😮
Zero-Shot Speech Editing and Text-to-Speech in the Wild
eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Multilingual Voice Understanding Model
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A Gradio web UI for Large Language Models with support for multiple inference backends.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Simple text-to-phones converter for multiple languages
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Open-source framework for voice and multimodal conversational AI