🎨 Qwen3-TTS Engine - Create Voices from Text!
Major new engine addition! Qwen3-TTS brings a unique Voice Designer feature that lets you create custom voices from natural language descriptions. Plus three distinct model types for different use cases!
✨ New Features
Qwen3-TTS Engine
- 🎨 Voice Designer - Create custom voices from text descriptions! "A calm female voice with British accent" → instant voice generation
- Three model types with different capabilities:
  - CustomVoice: 9 high-quality preset speakers (Vivian, Serena, Dylan, Eric, Ryan, etc.)
  - VoiceDesign: Text-to-voice creation - describe your ideal voice and generate it
  - Base: Zero-shot voice cloning from audio samples
- Support for 10 languages - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- Model sizes: 0.6B (low VRAM) and 1.7B (high quality) variants
- Character voice switching with [CharacterName] syntax - automatic preset mapping
- SRT subtitle timing support with all timing modes (stretch_to_fit, pad_with_silence, etc.)
- Inline edit tags - Apply Step Audio EditX post-processing (emotions, styles, paralinguistic effects)
- Sage attention support - Improved VRAM efficiency with sageattention backend
- Smart caching - Prevents duplicate voice generation, skips model loading for existing voices
- Per-segment parameters - Control [seed:42], [temperature:0.8] inline
- Auto-download system - All 6 model variants downloaded automatically when needed
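To illustrate how the [CharacterName] and [seed:42] style tags can divide a script into segments, here is a minimal Python sketch. It is not the suite's parser: the names `parse_segments` and `Segment`, the "narrator" default, and the rule that a character tag resets previously set parameters are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar: [Name] switches character, [key:value] sets a parameter.
TAG = re.compile(r"\[([^\[\]:]+)(?::([^\[\]]*))?\]")


@dataclass
class Segment:
    character: str
    text: str
    params: dict = field(default_factory=dict)


def _coerce(value):
    """Turn '42' into 42 and '0.8' into 0.8; leave other strings alone."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_segments(script, default_character="narrator"):
    """Split a tagged script into (character, text, params) segments."""
    segments = []
    character, params, pos = default_character, {}, 0
    # The trailing None acts as a sentinel so the last run of text is flushed.
    for match in list(TAG.finditer(script)) + [None]:
        end = match.start() if match else len(script)
        text = script[pos:end].strip()
        if text:
            segments.append(Segment(character, text, dict(params)))
        if match is None:
            break
        name, value = match.group(1).strip(), match.group(2)
        if value is None:
            # Assumption: a character tag starts fresh, with no parameters.
            character, params = name, {}
        else:
            params[name] = _coerce(value.strip())
        pos = match.end()
    return segments


for seg in parse_segments("[Alice] [seed:42] [temperature:0.8] Hello there. [Bob] Hi!"):
    print(seg)
```

In this sketch, parameters apply to the text that follows them until the next character tag; the actual engine may scope them differently.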
🎙️ Voice Designer Node
The standout feature of this release! Create voices without audio samples:
- Natural language input - Describe voice characteristics in plain English
- Disk caching - Saved voices load instantly without regeneration
- Standard format - Works seamlessly with Character Voices system
- Unified output - Compatible with all TTS nodes via NARRATOR_VOICE format
Example descriptions:
- "A calm female voice with British accent"
- "Deep male voice, authoritative and professional"
- "Young cheerful woman, slightly high-pitched"
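The disk caching described above can be pictured as a lookup keyed by the voice description. The following is a rough Python sketch under stated assumptions, not the node's actual code: `cache_key`, `get_or_create_voice`, the `.bin` file layout, and the `generate` callback are hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def cache_key(description, model="1.7B", language="English"):
    """Derive a stable key from the description and (assumed) settings."""
    payload = json.dumps(
        {
            # Collapse whitespace so trivially different inputs share a key.
            "description": " ".join(description.split()),
            "model": model,
            "language": language,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def get_or_create_voice(description, generate, cache_dir, **settings):
    """Return cached voice bytes, calling generate() only on a cache miss.

    `generate` is a hypothetical callable that takes the description and
    returns the voice data as bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(description, **settings)}.bin"
    if path.exists():
        return path.read_bytes()
    data = generate(description)
    path.write_bytes(data)
    return data
```

Including the model size and language in the key is a design choice for this sketch: the same description would otherwise return a voice generated with different settings.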
📚 Documentation
- YAML-driven engine tables - Auto-generated comparison tables
- Condensed engine overview in README
- Portuguese accent guidance - Clear documentation of model limitations and workarounds
🎯 Technical Highlights
- Official Qwen3-TTS implementation bundled for stability
- 24kHz mono audio output
- Progress bars with real-time token generation tracking
- VRAM management with automatic model reload and device checking
- Full unified architecture integration
- Interrupt handling for cancellation support
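Because output is 24kHz mono, sample counts convert directly to durations, which is what an SRT timing mode such as pad_with_silence has to work with. The sketch below is only an illustration of that arithmetic; the function names and the choice to leave over-long clips untouched are assumptions, not the suite's behavior.

```python
SAMPLE_RATE = 24_000  # 24kHz mono, as listed in the highlights above


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Length of a mono clip in seconds."""
    return num_samples / sample_rate


def pad_with_silence(samples, slot_seconds, sample_rate=SAMPLE_RATE):
    """Append zeros so the clip fills a subtitle slot of slot_seconds.

    Assumption: clips already longer than the slot are returned unchanged.
    """
    target = round(slot_seconds * sample_rate)
    missing = target - len(samples)
    if missing <= 0:
        return list(samples)
    return list(samples) + [0.0] * missing


clip = [0.1] * 12_000                 # half a second of audio
padded = pad_with_silence(clip, 1.0)  # fill a one-second subtitle slot
print(duration_seconds(len(clip)), duration_seconds(len(padded)))
```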
With Qwen3-TTS, the suite now includes 10 TTS engines in total, each with unique capabilities. Voice Designer is a first-of-its-kind feature in ComfyUI TTS extensions!