The most comprehensive Kokoro TTS implementation for ComfyUI with ALL 54+ voices across 9 languages, voice blending, and advanced voice modification effects.
Special Note: The Advanced Voice Mod node is experimental and still under construction, so it will not function as intended at the moment; ignore it for now. Japanese voices do not work yet either and will require a custom wheel.
- 🌍 54+ Voices: Complete support for all Kokoro-82M voices across 9 languages
- 🎯 9 Languages: US English, UK English, Japanese (not working yet, needs a custom wheel build), Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese
- 🔀 Advanced Voice Blending: Mix any two voices with adjustable blend ratios
- 🐍 Python 3.12: Fully tested and optimized for the latest Python versions
- 📦 Modern Architecture: Completely rewritten following 2025 ComfyUI best practices
- ⚡ Improved Performance: Better memory management and processing speed
- 🛡️ Enhanced Reliability: Robust error handling and fallback mechanisms
- ✅ ALL 54+ Kokoro-82M voices (nothing left out!)
- ✅ Voice blending with linear interpolation
- ✅ NEW: Guided Voice Morphing - Use any audio file to guide voice transformation
- ✅ NEW: Autotune-style Pitch Correction - Match pitch to reference audio
- ✅ NEW: Advanced Spectral Morphing - Match tone, timbre, and character
- ✅ NEW: 18 Voice Profiles - Professional presets for instant transformations
- ✅ Advanced voice modification effects (pitch, formant, reverb, etc.)
- ✅ Intelligent text chunking that preserves sentence order
- ✅ GPU acceleration with automatic CPU fallback
- ✅ Multi-language support with proper phoneme handling
- ✅ Professional audio processing pipeline with Dynamic Time Warping
- ✅ ComfyUI v3.49+ compatibility
- Installation
- Complete Voice List
- Usage Guide
- Voice Blending
- Guided Voice Morphing (NEW!)
- Voice Modification Effects
- Technical Details
- Troubleshooting
- Credits
- ComfyUI v3.49+ (or compatible version)
- Python 3.9, 3.10, 3.11, or 3.12 (3.13+ not supported)
- PyTorch 2.0+ (usually included with ComfyUI)
- 4GB+ RAM (8GB recommended for longer texts)
- Optional: CUDA-capable GPU for faster processing
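The requirements above can be checked before installing. The following is a minimal sketch, not part of this repository: the helper names `python_supported` and `missing_packages` are hypothetical, and the version bounds are simply copied from the list above.

```python
import importlib.util
import sys

# Supported range copied from the requirements list (3.13+ is not supported).
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def missing_packages(names=("torch", "kokoro")):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("Missing packages:", missing_packages() or "none")
```

Run it with the same Python interpreter that ComfyUI uses, otherwise the result describes a different environment.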
- Open ComfyUI and navigate to "Manager"
- Click "Install Custom Nodes"
- Search for "Geeky Kokoro TTS"
- Click "Install" and restart ComfyUI
- Done! Nodes will appear in the "audio" category
# Navigate to your ComfyUI custom nodes directory
cd ComfyUI/custom_nodes
# Clone this repository
git clone https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS.git
# Navigate into the directory
cd ComfyUI-Geeky-Kokoro-TTS
# Install Python dependencies
pip install -r requirements.txt
# Optional: Run installation verification script
python install.py

REM Navigate to custom nodes directory
cd ComfyUI_windows_portable\ComfyUI\custom_nodes
REM Clone repository
git clone https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS.git
REM Navigate into directory
cd ComfyUI-Geeky-Kokoro-TTS
REM Install with portable Python
..\..\..\python_embeded\python.exe -m pip install -r requirements.txt

For best phoneme processing, install espeak-ng:
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install espeak-ng
macOS:
brew install espeak-ng
Windows: Download and install from: https://github.com/espeak-ng/espeak-ng/releases
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| Heart ❤️ | af_heart | Warm, friendly, natural | Narration, audiobooks, general purpose |
| Bella 🔥 | af_bella | Energetic, dynamic, engaging | Marketing, announcements, enthusiastic content |
| Nicole 🎧 | af_nicole | Clear, professional, articulate | Training videos, tutorials, instructional content |
| Aoede 🎵 | af_aoede | Musical, expressive, artistic | Creative content, storytelling, entertainment |
| Kore | af_kore | Balanced, versatile | General purpose, business content |
| Sarah | af_sarah | Neutral, calm, reliable | Documentation, formal content, reports |
| Nova ⭐ | af_nova | Bright, modern, upbeat | Social media, vlogs, casual content |
| Sky ☁️ | af_sky | Soft, gentle, soothing | Meditation, relaxation, ASMR |
| Alloy | af_alloy | Professional, authoritative | Corporate, presentations, business |
| Jessica | af_jessica | Friendly, approachable | Customer service, help content, guides |
| River 🌊 | af_river | Flowing, natural, smooth | Long-form narration, podcasts |
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| Michael | am_michael | Deep, authoritative, commanding | Documentary, serious content, news |
| Fenrir 🐺 | am_fenrir | Strong, bold, powerful | Action content, gaming, intense narration |
| Puck 🎭 | am_puck | Playful, character-driven, versatile | Entertainment, comedy, character voices |
| Echo 🔊 | am_echo | Clear, resonant, memorable | Announcements, radio-style content |
| Eric | am_eric | Reliable, professional | Business, training, educational content |
| Liam | am_liam | Modern, relatable, friendly | Casual content, social media, vlogs |
| Onyx 💎 | am_onyx | Rich, deep, elegant | Premium content, luxury brands, sophistication |
| Adam | am_adam | Classic, versatile, dependable | General purpose, all-around use |
| Santa 🎅 | am_santa | Warm, jolly, festive | Holiday content, cheerful narration |
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| Emma | bf_emma | Refined, elegant, sophisticated | Formal content, literature, high-end narration |
| Isabella | bf_isabella | Professional, articulate | Business, corporate, presentations |
| Alice 📚 | bf_alice | Clear, storytelling, engaging | Children's content, education, books |
| Lily 🌸 | bf_lily | Gentle, pleasant, approachable | General content, tutorials, friendly narration |
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| George | bm_george | Authoritative, professional, commanding | Business, education, serious content |
| Fable 📖 | bm_fable | Narrative, expressive, storytelling | Audiobooks, tales, creative content |
| Lewis | bm_lewis | Reliable, clear, articulate | Training, documentation, instructional content |
| Daniel | bm_daniel | Modern, professional, versatile | General purpose, business, presentations |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Hina ひな | jf_hina | Female | Gentle, youthful, sweet | Anime, casual content, friendly narration |
| Yuki 雪 | jf_yuki | Female | Cool, elegant, refined | Formal content, professional narration |
| Sakura 桜 | jf_sakura | Female | Warm, traditional, pleasant | Cultural content, storytelling |
| Sora 空 | jf_sora | Female | Bright, energetic, cheerful | Entertainment, upbeat content |
| Kaito 海斗 | jm_kaito | Male | Strong, confident, clear | News, serious content, professional narration |
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| Xiaoxiao 小小 | zf_xiaoxiao | Gentle, friendly, approachable | General purpose, casual content |
| Yunxi 云希 | zf_yunxi | Professional, clear, articulate | Business, news, formal content |
| Xiaoyi 小艺 | zf_xiaoyi | Energetic, youthful, lively | Entertainment, social media |
| Xiaoxuan 小萱 | zf_xiaoxuan | Warm, expressive, engaging | Storytelling, narration |
| Voice Name | Code | Character | Best For |
|---|---|---|---|
| Yunyang 云扬 | zm_yunyang | Strong, authoritative, commanding | News, serious content, professional |
| Yunfeng 云枫 | zm_yunfeng | Calm, mature, reliable | Documentation, education |
| Yunhao 云昊 | zm_yunhao | Clear, professional, articulate | Business, presentations |
| Yunxia 云霞 | zm_yunxia | Versatile, balanced | General purpose content |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Sofia | ef_sofia | Female | Warm, friendly, engaging | General content, narration, education |
| Diego | em_diego | Male | Confident, clear, professional | Business, formal content, news |
| Carlos | em_carlos | Male | Friendly, approachable, versatile | Casual content, tutorials |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Céline | ff_celine | Female | Elegant, refined, sophisticated | All French content, narration, professional |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Priya | hf_priya | Female | Friendly, warm, approachable | General content, education |
| Anjali | hf_anjali | Female | Professional, clear, articulate | Business, formal content |
| Arjun | hm_arjun | Male | Strong, confident, authoritative | News, serious content |
| Raj | hm_raj | Male | Friendly, versatile, engaging | General purpose, casual content |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Giulia | if_giulia | Female | Expressive, warm, engaging | Narration, storytelling, general content |
| Marco | im_marco | Male | Confident, professional, clear | Business, formal content, presentations |
| Voice Name | Code | Gender | Character | Best For |
|---|---|---|---|---|
| Lúcia | pf_lucia | Female | Warm, friendly, natural | General content, education, narration |
| João | pm_joao | Male | Professional, clear, reliable | Business, news, formal content |
| Pedro | pm_pedro | Male | Friendly, approachable, versatile | Casual content, tutorials, general purpose |
- Add the Node: In ComfyUI, add "🔊 Geeky Kokoro TTS (2025)" node to your workflow
- Enter Text: Type or paste your text in the multiline text field
- Select Voice: Choose from 54+ voices in the dropdown
- Adjust Speed: Set speed from 0.5x (slower) to 2.0x (faster)
- GPU Option: Enable "use_gpu" if you have a CUDA-capable GPU
- Generate: Connect to audio output or preview node
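The "use_gpu" option with automatic CPU fallback can be pictured roughly as follows. This is a sketch under assumptions, not the node's actual code: `pick_device` is a hypothetical helper, and the only PyTorch call used is `torch.cuda.is_available()`.

```python
def pick_device(use_gpu):
    """Return "cuda" when requested and available, otherwise "cpu"."""
    if not use_gpu:
        return "cpu"
    try:
        import torch  # Imported lazily so the helper also works without PyTorch.
    except ImportError:
        return "cpu"
    # Fall back to the CPU when no CUDA-capable GPU is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device(use_gpu=True))
```

Falling back silently keeps a workflow runnable on machines without a GPU, at the cost of slower generation.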
Voice blending allows you to create unique vocal characteristics by mixing two voices:
- Enable Blending: Check the "enable_blending" checkbox
- Select Second Voice: Choose a second voice from the dropdown
- Adjust Blend Ratio:
  - 1.0 = 100% primary voice (no blending)
  - 0.7 = 70% primary, 30% secondary (subtle blend)
  - 0.5 = 50/50 mix (balanced blend)
  - 0.3 = 30% primary, 70% secondary (secondary dominant)
  - 0.0 = 100% secondary voice
Blending Tips:
- Mix voices from the same language for best results
- Blend male + female voices for androgynous effects
- Try Heart + Bella at 0.6 for energetic yet warm narration
- Try Michael + Adam at 0.5 for a rich, authoritative voice
- Experiment with ratios to find your perfect voice!
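The blend ratio behaves like a linear interpolation weight on the primary voice. The sketch below illustrates the arithmetic on plain Python lists; it assumes each voice is represented by a numeric embedding of equal length, and `blend_voices` is a hypothetical name rather than the node's real function.

```python
def blend_voices(primary, secondary, ratio):
    """Linearly interpolate two equal-length voice embeddings.

    ratio=1.0 returns the primary voice, ratio=0.0 the secondary voice.
    """
    if len(primary) != len(secondary):
        raise ValueError("voice embeddings must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    return [ratio * p + (1.0 - ratio) * s for p, s in zip(primary, secondary)]


# Toy three-value "embeddings"; real voice tensors are much larger.
heart = [1.0, 0.0, 2.0]
bella = [0.0, 1.0, 4.0]
print(blend_voices(heart, bella, 0.5))  # [0.5, 0.5, 3.0]
```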
The game-changing feature that makes voices sing, match, and transform!
The Advanced Voice node now supports guided voice morphing - using a secondary audio file (like a song or reference voice) to guide the transformation of your TTS output. Perfect for:
- Making TTS voices "sing" along to music
- Matching tone and style of reference speakers
- Creating autotune-style effects
- Professional voice-over matching
- Connect Guide Audio:
  - Load your guide audio (song, reference voice, etc.)
  - Connect it to the guide_audio input on the Advanced Voice node
- Enable Morphing:
  - Check the enable_guided_morph checkbox
- Adjust Morph Parameters (0.0 to 1.0):
  - Pitch Morph: Match pitch contour to guide audio (autotune effect)
  - Formant Morph: Match vocal character and tone
  - Spectral Morph: Match overall timbre and frequency balance
  - Amplitude Morph: Match dynamics and volume envelope
Pitch Morph:
- 0.0: No pitch change (original TTS pitch)
- 0.3-0.5: Subtle pitch guidance (natural autotune)
- 0.7-0.9: Strong pitch matching (follows melody closely)
- 1.0: Complete pitch matching (perfect autotune)
Use Cases:
- Music: 0.7-1.0 to make voice follow melody
- Speech matching: 0.3-0.5 for natural intonation
- Character voice: 0.0 (use manual pitch shift instead)
Formant Morph:
- Matches the vocal tract characteristics
- Affects perceived age, gender, and character
- 0.0: Original voice character
- 0.5: Blend of both voices
- 1.0: Fully matched character
Use Cases:
- Voice cloning: 0.6-0.9
- Gender transformation: 0.5-0.7
- Age adjustment: 0.4-0.6
Spectral Morph:
- Matches overall frequency spectrum and timbre
- Affects "brightness", "warmth", and tonal quality
- Most subtle but powerful for natural matching
Use Cases:
- Microphone matching: 0.5-0.7
- Tone matching: 0.6-0.8
- Style transfer: 0.4-0.6
Amplitude Morph:
- Matches volume dynamics and expression
- Follows the energy and intensity patterns
- Great for emotional expression
Use Cases:
- Dynamic speech: 0.5-0.7
- Singing expression: 0.6-0.8
- Whisper/shout: 0.4-0.6
Setup:
1. Generate TTS with lyrics text
2. Load instrumental or vocal track as guide_audio
3. Enable guided morph
4. Set: pitch_morph=0.8, formant_morph=0.3, spectral_morph=0.4
Result: Voice follows melody while maintaining TTS character
Setup:
1. Generate TTS with script
2. Load reference speaker audio as guide_audio
3. Enable guided morph
4. Set: pitch_morph=0.4, formant_morph=0.7, spectral_morph=0.6
Result: TTS matches speaking style and voice character
Setup:
1. Generate TTS with any text
2. Load musical scale or melody as guide_audio
3. Enable guided morph
4. Set: pitch_morph=1.0, formant_morph=0.0, spectral_morph=0.2
Result: Perfect pitch-corrected robotic singing effect
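The three setups above can be written down as reusable presets. This is only an illustrative sketch: the parameter names come from the setups, but the preset keys (`singing`, `voice_match`, `autotune`) and the `morph_settings` helper are hypothetical and not part of the node.

```python
# Values copied from the three example setups; the preset keys are made up.
MORPH_PRESETS = {
    "singing": {"pitch_morph": 0.8, "formant_morph": 0.3, "spectral_morph": 0.4},
    "voice_match": {"pitch_morph": 0.4, "formant_morph": 0.7, "spectral_morph": 0.6},
    "autotune": {"pitch_morph": 1.0, "formant_morph": 0.0, "spectral_morph": 0.2},
}


def morph_settings(preset, **overrides):
    """Return a preset with optional overrides, clamped to the 0.0-1.0 range."""
    settings = dict(MORPH_PRESETS[preset])
    settings.update(overrides)
    return {name: min(1.0, max(0.0, float(value))) for name, value in settings.items()}


print(morph_settings("singing", pitch_morph=0.9))
```

Clamping mirrors the documented 0.0 to 1.0 range, so an out-of-range override cannot produce a setting the node would reject.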
Connect the TTS output to "🎛️ Geeky Kokoro Advanced Voice (2025)" node for effects:
Original Profiles:
- Cinematic: Deep, movie-trailer style (-3 semitones, reverb, compression)
- Monster: Growling creature voice (-6 semitones, formant shift, distortion)
- Robot: Mechanical, synthesized voice (band-pass filter, modulation)
- Child: Young character voice (+3 semitones, formant shift)
- Darth Vader: Deep, breathing villain voice (-4 semitones, echo, modulation)
- Singer: Optimized for vocal content (compression, EQ, reverb)
NEW Profiles:
- Alien: Otherworldly voice (-8 semitones, extreme formant shift, modulation)
- Deep Voice: Professional bass voice (-5 semitones, bass boost)
- Chipmunk: High-pitched cartoon voice (+6 semitones, formant shift up)
- Telephone: Classic phone quality (300-3400Hz bandpass, compression)
- Radio: Broadcast radio sound (100-5000Hz, compression, EQ)
- Cathedral: Large reverberant space (heavy reverb, echo)
- Cave: Echo chamber effect (reverb, echo with feedback)
- Metallic: Robotic metallic sound (ring modulation, bandpass)
- Whisper: Quiet breathy voice (noise, reduced bass)
- Shout: Loud emphasized voice (compression, distortion, mid boost)
- Custom: Full manual control of all parameters
- Pitch Shift: ±12 semitones (0.1 step precision)
- Formant Shift: Vocal tract size adjustment (-5 to +5)
- Time Stretch: Speed without pitch change (0.5x to 2.0x)
- Reverb: Room ambiance with room size control
- Echo: Discrete repeats with adjustable feedback
- Distortion: Harmonic saturation (0.0 to 1.0)
- Compression: Dynamic range control
- 3-Band EQ: Bass, Mid, Treble (-1.0 to +1.0)
- Brightness: High-frequency emphasis (-1.0 to +1.0)
- Warmth: Low-frequency emphasis (-1.0 to +1.0)
- Effect Blend: Mix with original audio (0.0 to 1.0)
- Output Volume: -60dB to +60dB
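For intuition about the units in the list above, the sketch below converts semitones to a frequency ratio, decibels to a linear gain, and shows a wet/dry mix for the effect blend. It assumes twelve-tone equal temperament and the amplitude convention for decibels; the node's internal formulas may differ, and the function names are hypothetical.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def db_to_gain(db):
    """Linear amplitude gain for a level change in decibels."""
    return 10.0 ** (db / 20.0)


def mix_effect(dry, wet, effect_blend):
    """Blend processed (wet) audio with the original (dry) signal."""
    return [(1.0 - effect_blend) * d + effect_blend * w for d, w in zip(dry, wet)]


print(semitones_to_ratio(12))   # 2.0, one octave up
print(semitones_to_ratio(-12))  # 0.5, one octave down
print(db_to_gain(0))            # 1.0, unchanged volume
```

Under these assumptions the +60dB end of the output range multiplies the amplitude by 1000, which is why large positive values clip easily.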
- Model: Kokoro-82M v0.19
- Parameters: 82 million
- Architecture: Decoder-only based on StyleTTS 2 + ISTFTNet
- Sample Rate: 24kHz
- License: Apache 2.0
- Repository: hexgrad/Kokoro-82M
Processing Speed (Python 3.12, CUDA GPU):
- Short text (< 200 chars): ~2-3 seconds
- Medium text (200-800 chars): ~5-10 seconds
- Long text (800+ chars): ~15-30 seconds
- Voice blending: +20% processing time
- Voice effects: +5-15% processing time
- Guided morphing: +30-50% processing time (feature extraction + morphing)
Memory Usage:
- Base model: ~2GB VRAM/RAM
- With GPU acceleration: ~3GB VRAM
- Voice effects processing: +500MB
- Voice blending: +200MB temporary
- Guided morphing: +800MB-1.5GB (feature extraction + DTW alignment)
Feature Extraction:
- Pitch Tracking: PYIN algorithm with autocorrelation fallback
- Formant Analysis: LPC (Linear Predictive Coding) with Levinson-Durbin recursion
- Spectral Envelope: Cepstral smoothing with liftering
- Amplitude Envelope: RMS energy tracking
- MFCC: 13-coefficient mel-frequency cepstral analysis
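Of the features listed above, RMS energy tracking is the simplest to illustrate. The sketch below computes a frame-wise RMS envelope in plain Python; the frame and hop sizes are arbitrary illustrative defaults, not values taken from this project.

```python
import math


def rms_envelope(samples, frame_length=1024, hop_length=256):
    """Return the RMS energy of each frame of a mono signal."""
    envelope = []
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        if not frame:
            break  # Empty input produces an empty envelope.
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return envelope


# A constant-amplitude signal has a flat envelope.
print(rms_envelope([1.0] * 8, frame_length=4, hop_length=2))  # [1.0, 1.0, 1.0]
```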
Morphing Algorithms:
- Dynamic Time Warping (DTW): Aligns feature sequences between source and guide
- Phase Vocoder: Time-varying pitch shifting with STFT
- Spectral Transfer: Magnitude envelope morphing with phase preservation
- Volume Matching: RMS-based amplitude envelope transfer
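Dynamic Time Warping is what lets a short TTS clip follow a guide of a different length. The following is a textbook DTW sketch over one-dimensional feature sequences (for example, pitch values), not this project's implementation; `dtw_path` is a hypothetical name and the absolute difference is an assumed distance measure.

```python
def dtw_path(source, guide):
    """Return (total_cost, path) aligning two 1-D feature sequences."""
    n, m = len(source), len(guide)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = cheapest alignment of source[:i] with guide[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(source[i - 1] - guide[j - 1])
            cost[i][j] = distance + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Walk back from the end, always taking the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(total)  # 0.0
print(path)   # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

In the example the middle source frame is matched to both repeated guide frames, which is the stretching behaviour that alignment relies on.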
Supported Guide Audio:
- Any sample rate (auto-resampling to 24kHz)
- Mono or stereo (auto-converted to mono)
- WAV, MP3, FLAC, OGG formats
- Recommended: 16kHz+ sample rate for best results
- Intelligent Chunking: Automatically splits long texts while preserving sentence order
- Chunk Size: 350 characters (configurable)
- Gap Insertion: 150ms natural pauses between chunks
- Paragraph Awareness: Respects paragraph breaks and structure
- Punctuation Handling: Proper sentence boundary detection
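The chunking behaviour described above can be approximated with a short sketch. It assumes sentences end in ".", "!" or "?" followed by whitespace, which is simpler than the node's real boundary detection, and it leaves a single sentence longer than the limit as one oversized chunk. The function names are hypothetical.

```python
import re

SAMPLE_RATE = 24_000  # Kokoro output sample rate
MAX_CHARS = 350       # documented default chunk size
GAP_MS = 150          # documented pause between chunks


def chunk_text(text, max_chars=MAX_CHARS):
    """Group whole sentences, in order, into chunks of at most max_chars."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def gap_samples(gap_ms=GAP_MS, sample_rate=SAMPLE_RATE):
    """Number of silent samples needed for a pause of gap_ms milliseconds."""
    return sample_rate * gap_ms // 1000


print(chunk_text("One. Two! Three?", max_chars=9))  # ['One. Two!', 'Three?']
print(gap_samples())  # 3600
```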
- a: American English
- b: British English
- j: Japanese
- z: Mandarin Chinese
- e: Spanish
- f: French
- h: Hindi
- i: Italian
- p: Brazilian Portuguese
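Voice codes in the tables combine a language letter, a gender letter, and a name, as in af_heart or pm_joao. The sketch below decodes such a code; the gender mapping is inferred from the voice tables, and `describe_voice` is a hypothetical helper.

```python
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
}
GENDERS = {"f": "female", "m": "male"}  # Inferred from the voice tables.


def describe_voice(code):
    """Split a voice code such as "af_heart" into (language, gender, name)."""
    prefix, _, name = code.partition("_")
    if (
        len(prefix) != 2
        or not name
        or prefix[0] not in LANGUAGES
        or prefix[1] not in GENDERS
    ):
        raise ValueError(f"unrecognised voice code: {code!r}")
    return LANGUAGES[prefix[0]], GENDERS[prefix[1]], name


print(describe_voice("af_heart"))  # ('American English', 'female', 'heart')
```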
Solution:
pip install --upgrade "kokoro>=0.9.4"
Solution:
- Restart ComfyUI completely
- Check console for specific error messages
- Ensure all dependencies are installed
- Try reinstalling: pip install --force-reinstall kokoro
Solutions:
- Disable "use_gpu" option
- Reduce text length
- Close other GPU-intensive applications
- Use CPU mode for very long texts
Solutions:
- Reduce "output_volume" in Voice Mod node
- Lower "effect_blend" ratio (start at 0.3-0.5)
- Reduce distortion and compression amounts
- Check that input audio isn't already clipping
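To check whether audio is clipping before adjusting the settings above, a peak measurement is usually enough. This sketch assumes floating-point samples where full scale is 1.0; `peak` and `limit_peak` are hypothetical helpers and the 0.95 ceiling is an arbitrary safety margin.

```python
def peak(samples):
    """Largest absolute sample value (0.0 for empty input)."""
    return max((abs(x) for x in samples), default=0.0)


def limit_peak(samples, ceiling=0.95):
    """Scale the signal down only if its peak exceeds the ceiling."""
    current = peak(samples)
    if current <= ceiling:
        return list(samples)
    scale = ceiling / current
    return [x * scale for x in samples]


loud = [2.0, -1.0, 0.5]
print(peak(loud))                     # 2.0, well above full scale
print(limit_peak(loud, ceiling=1.0))  # [1.0, -0.5, 0.25]
```

Scaling the whole signal preserves its dynamics, unlike hard clipping, which flattens the peaks and adds distortion.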
Solution:
python --version # Check your version
# Must be 3.9, 3.10, 3.11, or 3.12 (3.13+ is not supported)
pip install --upgrade pip
pip install -r requirements.txt --force-reinstall
Solution:
- Ubuntu/Debian: sudo apt-get install espeak-ng
- macOS: brew install espeak-ng
- Windows: Download from espeak-ng GitHub releases
- For long texts: Enable GPU acceleration
- For short texts: CPU mode is often faster
- Memory management: Process texts in batches if needed
- Effect intensity: Start low (30-50%) and increase gradually
- Voice blending: Keep both voices in the same language family
Contributions are welcome! Areas where help is appreciated:
- Additional voice profile presets
- Performance optimizations
- Bug reports and fixes
- Documentation improvements
- Testing on different platforms
- License: MIT License
- Author: GeekyGhost
- Repository: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS
- License: Apache 2.0
- Author: hexgrad
- Model: https://huggingface.co/hexgrad/Kokoro-82M
- librosa: Audio processing (ISC License)
- scipy: Scientific computing (BSD License)
- PyTorch: Deep learning framework (BSD License)
- soundfile: Audio I/O (BSD License)
- hexgrad for the incredible Kokoro-82M model
- ComfyUI Team for the amazing framework
- Community testers and contributors
- Audio processing library developers
- Kokoro Model Page: https://huggingface.co/hexgrad/Kokoro-82M
- ComfyUI Documentation: https://docs.comfy.org
- Issue Tracker: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS/issues
- Discussions: https://github.com/GeekyGhost/ComfyUI-Geeky-Kokoro-TTS/discussions
- StyleTTS 2 architecture
- ISTFTNet vocoder
- Phase vocoder techniques
- Voice morphing and blending
Node: Geeky Kokoro TTS (2025)
Text: "Welcome to my tutorial on advanced AI techniques."
Voice: 🇺🇸 🚺 Nicole 🎧
Speed: 1.0
GPU: true
Node 1: Geeky Kokoro TTS (2025)
Voice: 🇺🇸 🚹 Puck 🎭
Text: "The villain laughed menacingly."
Node 2: Geeky Kokoro Advanced Voice
Profile: Monster
Intensity: 0.7
Node: Geeky Kokoro TTS (2025)
Voice: 🇺🇸 🚺 Heart ❤️
Enable Blending: true
Second Voice: 🇺🇸 🚺 Bella 🔥
Blend Ratio: 0.6
Text: "This creates a warm yet energetic voice perfect for marketing."
Made with ❤️ for the ComfyUI community
Enjoy natural, high-quality text-to-speech with 54+ voices and unlimited creative possibilities! 🎉