v4.19.0 - Qwen3-TTS Engine with Voice Designer

diodiogod released this 28 Jan 14:22

🎨 Qwen3-TTS Engine - Create Voices from Text!

Major new engine addition! Qwen3-TTS brings a unique Voice Designer feature that lets you create custom voices from natural language descriptions, plus three distinct model types for different use cases!

✨ New Features

Qwen3-TTS Engine

  • 🎨 Voice Designer - Create custom voices from text descriptions! "A calm female voice with British accent" → instant voice generation
  • Three model types with different capabilities:
    • CustomVoice: 9 high-quality preset speakers (Vivian, Serena, Dylan, Eric, Ryan, etc.)
    • VoiceDesign: Text-to-voice creation - describe your ideal voice and generate it
    • Base: Zero-shot voice cloning from audio samples
  • Support for 10 languages - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
  • Model sizes: 0.6B (low VRAM) and 1.7B (high quality) variants
  • Character voice switching with [CharacterName] syntax - automatic preset mapping
  • SRT subtitle timing support with all timing modes (stretch_to_fit, pad_with_silence, etc.)
  • Inline edit tags - Apply Step Audio EditX post-processing (emotions, styles, paralinguistic effects)
  • Sage attention support - Improved VRAM efficiency with the sageattention backend
  • Smart caching - Prevents duplicate voice generation, skips model loading for existing voices
  • Per-segment parameters - Set values such as [seed:42] and [temperature:0.8] inline
  • Auto-download system - All 6 model variants are downloaded automatically when needed
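
To make the bracket syntax above more concrete, here is a minimal Python sketch of how a script with [CharacterName] and [key:value] tags could be split into per-character segments. It is an illustration only, not the suite's actual parser: the names `Segment` and `parse_segments`, the "narrator" default, and the rule that parameters apply until the next character tag are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

# Matches any [ ... ] tag that does not contain nested brackets.
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class Segment:
    character: str
    text: str
    params: dict = field(default_factory=dict)


def _coerce(value):
    """Turn '42' into 42 and '0.8' into 0.8; leave other values as strings."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_segments(script, default_character="narrator"):
    """Split a script into segments (hypothetical helper, not the suite's API)."""
    segments = []
    character, params, pos = default_character, {}, 0

    def flush(chunk):
        # Emit the text collected so far with the current character and params.
        text = chunk.strip()
        if text:
            segments.append(Segment(character, text, dict(params)))

    for match in TAG_RE.finditer(script):
        flush(script[pos:match.start()])
        pos = match.end()
        tag = match.group(1).strip()
        if ":" in tag:
            # Assumption: [key:value] sets a parameter for the text that follows.
            key, _, value = tag.partition(":")
            params[key.strip()] = _coerce(value.strip())
        else:
            # Assumption: a bare [Name] switches character and resets parameters.
            character, params = tag, {}
    flush(script[pos:])
    return segments


if __name__ == "__main__":
    demo = "[Alice] Hello there. [seed:42] How are you? [Bob] [temperature:0.8] Fine, thanks."
    for seg in parse_segments(demo):
        print(seg)
```

In this sketch, text before the first tag falls back to the default narrator, and each character switch starts from a clean parameter set. The real engine may scope parameters differently.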

🎙️ Voice Designer Node

The standout feature of this release! Create voices without audio samples:

  • Natural language input - Describe voice characteristics in plain English
  • Disk caching - Saved voices load instantly without regeneration
  • Standard format - Works seamlessly with the Character Voices system
  • Unified output - Compatible with all TTS nodes via the NARRATOR_VOICE format
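
One common way to get the disk-caching behavior described above is to derive a file name from a hash of the description and only generate a voice when that file is missing. The Python sketch below shows the idea under stated assumptions: `voice_cache_path`, `get_or_create_voice`, the `voice_cache` directory, the `.wav` extension, and the whitespace and case normalization are all hypothetical choices for illustration, not the node's real implementation.

```python
import hashlib
import json
from pathlib import Path


def voice_cache_path(description, settings=None, cache_dir="voice_cache"):
    """Map a description (plus optional settings) to a stable cache file path."""
    # Assumption: descriptions differing only in case or spacing share one voice.
    normalized = " ".join(description.split()).lower()
    payload = json.dumps(
        {"description": normalized, "settings": settings or {}}, sort_keys=True
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.wav"


def get_or_create_voice(description, generate, settings=None, cache_dir="voice_cache"):
    """Return (path, cache_hit). `generate` is a caller-supplied function
    that takes the description and returns the voice audio as bytes."""
    path = voice_cache_path(description, settings, cache_dir)
    if path.exists():
        return path, True  # cached: no model load or generation needed
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(generate(description))
    return path, False
```

Hashing the settings together with the description means that changing any setting produces a new cache entry instead of silently reusing an old voice.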

Example descriptions:

  • "A calm female voice with British accent"
  • "Deep male voice, authoritative and professional"
  • "Young cheerful woman, slightly high-pitched"

📚 Documentation

  • YAML-driven engine tables - Auto-generated comparison tables
  • Condensed engine overview in README
  • Portuguese accent guidance - Clear documentation of model limitations and workarounds

🎯 Technical Highlights

  • Official Qwen3-TTS implementation bundled for stability
  • 24 kHz mono audio output
  • Progress bars with real-time token generation tracking
  • VRAM management with automatic model reload and device checking
  • Full unified architecture integration
  • Interrupt handling for cancellation support
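
Since output is 24 kHz mono, the arithmetic behind a mode like pad_with_silence is straightforward: an SRT slot of N seconds holds N × 24000 samples, and a shorter clip is extended with zeros. The Python sketch below illustrates this. The function names are hypothetical, and leaving clips that are longer than their slot unchanged is an assumption of the example, not documented engine behavior.

```python
SAMPLE_RATE = 24_000  # Hz, mono, as stated in the release notes


def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:00:01,500' to seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def pad_to_slot(samples, start, end, sample_rate=SAMPLE_RATE):
    """Pad a mono clip with trailing silence so it fills its subtitle slot."""
    slot_seconds = srt_time_to_seconds(end) - srt_time_to_seconds(start)
    slot_len = round(slot_seconds * sample_rate)
    # Assumption: clips longer than the slot are returned unchanged.
    missing = max(0, slot_len - len(samples))
    return list(samples) + [0.0] * missing


if __name__ == "__main__":
    clip = [0.1] * 12_000  # 0.5 s of audio at 24 kHz
    padded = pad_to_slot(clip, "00:00:01,000", "00:00:02,500")
    print(len(padded) / SAMPLE_RATE, "seconds")
```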

With Qwen3-TTS, the suite now includes a total of 10 TTS engines, each with unique capabilities. Voice Designer is a first-of-its-kind feature in ComfyUI TTS extensions!