Releases: wildminder/ComfyUI-VibeVoice
v1.5.1 - Maintenance release
This is a maintenance release addressing compatibility and reliability issues reported by users.
Fixes and Improvements
- **Resolved `transformers` Cache Compatibility Issue:** Fixed an `AttributeError: 'VibeVoiceConfig' object has no attribute 'num_hidden_layers'` that occurred during generation with certain versions of the `transformers` library.
- **Improved `tokenizer.json` Acquisition Reliability:** Implemented a robust fallback mechanism for obtaining the required `tokenizer.json` file. The node now attempts to locate the file in the following order:
  1. The local model directory.
  2. A pre-packaged version within the custom node's `vibevoice/configs/` directory.
  3. Download from the primary Hugging Face repository (`Qwen/Qwen2.5-1.5B`).
  4. Download from a secondary Hugging Face repository (`Qwen/Qwen2.5-7B`).

  If all methods fail, the node now raises a `RuntimeError` with clear instructions for manual download and placement of the file. This prevents crashes due to network or permission issues.
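The lookup order above can be sketched as a simple fallback chain (the function name and download URLs here are illustrative assumptions, not the node's actual code):

```python
import os
import urllib.request

def find_tokenizer_json(model_dir, packaged_dir):
    """Locate tokenizer.json, trying each source in priority order."""
    # 1. The local model directory
    local = os.path.join(model_dir, "tokenizer.json")
    if os.path.isfile(local):
        return local
    # 2. The copy pre-packaged with the custom node
    packaged = os.path.join(packaged_dir, "tokenizer.json")
    if os.path.isfile(packaged):
        return packaged
    # 3./4. Primary, then secondary Hugging Face repository
    for repo in ("Qwen/Qwen2.5-1.5B", "Qwen/Qwen2.5-7B"):
        url = f"https://huggingface.co/{repo}/resolve/main/tokenizer.json"
        try:
            urllib.request.urlretrieve(url, local)
            return local
        except OSError:
            continue  # network/permission failure: try the next source
    raise RuntimeError(
        "Could not obtain tokenizer.json; please download it manually "
        "and place it in the model directory."
    )
```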
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
Thank you to everyone who provided feedback and bug reports! We hope this maintenance release improves reliability for you.
v1.5.0 - Enhanced Stability, Cloning Fidelity, and Hybrid Generation
This release focuses on significant improvements to voice cloning accuracy, generation stability, and script parsing flexibility. The core prompting mechanism has been reworked to align more closely with the model's training data, resolving issues of voice swapping and audio artifacts.
🚀 Key Improvements
- **Improved Voice Cloning Fidelity:** Prompt construction now aligns more closely with the model's training data, resolving the voice swapping seen in earlier versions.
- **Enhanced Generation Stability:** Reworked prompt logic resolves previous instabilities that could lead to audio artifacts or gibberish output.
- **Flexible Speaker Tagging:** The script parser now supports two formats for defining speakers, providing more convenience for script writing:
  - Classic format: `Speaker 1: ...`
  - Concise format: `[1] ...` or `[1]: ...`

  Both formats are parsed identically, ensuring consistent output.
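As a rough illustration of how the two tag styles can be normalized to the same speaker index (this regex is an assumption for demonstration, not the node's actual parser):

```python
import re

# Matches "Speaker 1:" as well as "[1]" or "[1]:" at the start of a line
SPEAKER_TAG = re.compile(r"^(?:Speaker\s+(\d+)\s*:|\[(\d+)\]\s*:?)\s*(.*)$")

def parse_line(line):
    """Return (speaker_index, text), or (None, line) for untagged lines."""
    m = SPEAKER_TAG.match(line.strip())
    if not m:
        return None, line
    speaker = int(m.group(1) or m.group(2))  # whichever format matched
    return speaker, m.group(3)
```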
- **Hybrid Voice Generation (Cloning + Zero-Shot):** The node now supports mixed-mode generation within a single script.
  - If a reference audio is provided for a speaker (e.g., `speaker_1_voice`), that voice will be cloned.
  - If a speaker's voice input is left empty, a unique zero-shot TTS voice will be generated for that speaker.

  This allows for flexible combinations of cloned and generated voices in the same dialogue.
How to Use
- **For Voice Cloning:** Connect a reference audio clip to the corresponding `speaker_X_voice` input. In the text, use `Speaker X:` or `[X]` to assign that voice to a line.
- **For Zero-Shot TTS:** Omit the reference audio for a speaker. The model will generate a voice for any speaker tag in the text that does not have a corresponding audio input.
- **For Single Utterance (Zero-Shot):** Provide text without any speaker tags and connect no audio inputs. The entire text will be synthesized with a default zero-shot voice.
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
Thank you to everyone who provided feedback and bug reports! We hope you enjoy the new flexibility and stability improvements in this release.
v1.4.0 - The Flexibility & Performance Update
This update focuses on improving model loading flexibility, fixing compatibility with the latest hardware, and incorporating valuable user feedback.
The entire node has undergone a major refactoring for a cleaner, more maintainable file structure, paving the way for easier future development.
🚀 New Features
1. Standalone Model Loading (.safetensors support)
You are no longer limited to the official Hugging Face directory structure! You can now use single-file VibeVoice models directly.
- **How it works:** Simply place your `.safetensors` file (e.g., `my_custom_voice.safetensors`) inside your `ComfyUI/models/tts/VibeVoice/` folder.
- **Configuration:** The node will automatically look for a sidecar configuration file with the same name, but ending in `.config.json` (e.g., `my_custom_voice.config.json`).
- **Fallback:** If no config file is found, the node will intelligently fall back to the default config for either the 1.5B or Large model based on the filename.
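The sidecar lookup can be sketched roughly as follows (the default config paths are hypothetical placeholders, not the node's real file layout):

```python
import os

def resolve_config(model_path):
    """Choose a config for a standalone .safetensors model file."""
    base, _ = os.path.splitext(model_path)
    sidecar = base + ".config.json"      # e.g. my_custom_voice.config.json
    if os.path.isfile(sidecar):          # a sidecar config wins if present
        return sidecar
    # Otherwise guess the default config from the filename
    name = os.path.basename(model_path).lower()
    if "large" in name:
        return "configs/vibevoice_large.json"  # hypothetical default path
    return "configs/vibevoice_1.5b.json"       # hypothetical default path
```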
2. Support for Custom Model Folders
The node now fully respects ComfyUI's `extra_model_paths.yaml` file. It will automatically scan all your configured `tts` directories for a `VibeVoice` subfolder and discover any models within, whether they are in the standard directory format or as standalone files.
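For example, an entry along these lines in `extra_model_paths.yaml` (the name and paths are illustrative) would make the node scan `/data/models/tts/VibeVoice/` for models:

```yaml
my_models:
    base_path: /data/models
    tts: tts/
```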
⚡️ Fixes & Improvements
1. Sage Attention Fix for NVIDIA Blackwell (SM90+) GPUs
- **Fix:** Resolved an incorrect assertion that caused a crash when using Sage Attention on the latest NVIDIA Blackwell GPUs. The selected attention function now correctly matches Sage Attention's own implementation for Ada (SM89) and Hopper architectures.
- Performance: This fix unlocks significant performance gains. On compatible hardware, Sage Attention is now ~10% faster on average compared to SDPA.
- Validation: Tested successfully with PyTorch 2.9 nightly builds and CUDA 13.
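Conceptually, the fix amounts to mapping the GPU's compute capability to a kernel family Sage Attention supports, rather than asserting on an exact match; a simplified sketch (not the actual patch):

```python
def sage_kernel_arch(major, minor):
    """Map a CUDA compute capability (as returned by
    torch.cuda.get_device_capability()) to a Sage Attention kernel family."""
    sm = major * 10 + minor
    if sm >= 90:      # Hopper and newer, including Blackwell
        return "sm90"
    if sm == 89:      # Ada Lovelace
        return "sm89"
    if sm >= 80:      # Ampere
        return "sm80"
    raise RuntimeError(f"Sage Attention is not supported on SM{sm}")
```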
2. Increased CFG Scale Limit
- **Change:** The maximum value for `cfg_scale` has been increased from `2.0` to `10.0`.
- **Reason:** This change is based on community feedback. Some users have reported achieving excellent, often better, results with unconventional settings, such as a high CFG scale at a very low step count (e.g., `cfg_scale: 3.0` at just 3 steps). This update opens the door for more creative experimentation.
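For context, classifier-free guidance extrapolates from an unconditional prediction toward the conditional one, so larger `cfg_scale` values push the output harder toward the conditioning (a generic formulation; VibeVoice's exact implementation may differ):

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Classifier-free guidance on scalar or array-like predictions."""
    # cfg_scale = 1.0 reproduces the conditional prediction exactly;
    # larger values amplify the conditional direction.
    return uncond + cfg_scale * (cond - uncond)
```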
3. Major Code Refactoring
- The node's internal logic has been completely reorganized into a cleaner, more modular structure. This makes the code easier to understand, maintain, and build upon for future updates.
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
Thank you to everyone who provided feedback and bug reports! We hope you enjoy the new flexibility and performance improvements in this release.
What's Changed
- Update sage_attention_patch.py by @Jarvik7 in #35
- Update vibevoice_nodes.py by @RodriMora in #43
New Contributors
- @Jarvik7 made their first contribution in #35
- @RodriMora made their first contribution in #43
v1.3.2 - Minor fixes
🚀 Improvements & Fixes
- `tokenizer.json`: It can be downloaded manually and placed in the model folder.
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
v1.3.1 - Minor fixes
A small fix for older `transformers` versions.
🚀 Improvements & Fixes
- **Backward Compatibility:** The node is now compatible with `transformers` library versions v4.51.3 and newer.
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
v1.3.0 - SageAttention Integration, Performance Enhancements
Introduced support for the SageAttention library and completely overhauled the 4-bit LLM quantization feature to be robust and reliable across different configurations.
The node is now smarter, automatically selecting the most stable settings for your chosen combination of features and providing graceful fallbacks to prevent crashes.
✨ New Features
- **SageAttention Support:** Full integration with the `sageattention` library. This provides a high-performance, mixed-precision attention mechanism that uses `int8` and `fp8` quantization for the attention calculation, offering a new trade-off between speed and memory.
- **Robust 4-Bit LLM Quantization:** The "Quantize LLM (4-bit)" option is now highly stable and delivers significant VRAM savings. The node intelligently configures `bitsandbytes` for either maximum memory efficiency or maximum numerical stability, depending on the chosen attention mechanism.
- **Smart Configuration & Fallbacks:** The node now automatically handles incompatible settings to prevent errors:
  - If 4-bit quantization is enabled with `eager` or `flash_attention_2`, it will gracefully fall back to the more stable `sdpa` mode and notify the user.
  - The underlying data types are now managed dynamically to ensure compatibility between all libraries (`bitsandbytes`, `transformers`, `sageattention`, `pytorch`).
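The fallback behavior described above can be sketched like this (a simplified illustration, not the node's actual code):

```python
def select_attention_mode(requested, quantize_4bit):
    """Fall back to sdpa when 4-bit quantization meets an unstable mode."""
    unstable_with_4bit = {"eager", "flash_attention_2"}
    if quantize_4bit and requested in unstable_with_4bit:
        # Notify the user instead of crashing later in generation
        print(f"[VibeVoice] '{requested}' is unstable with 4-bit "
              "quantization; falling back to 'sdpa'.")
        return "sdpa"
    return requested
```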
🐛 Bug Fixes & Stability Improvements
- **Fixed SageAttention Crashes on Windows:** Resolved Triton JIT compilation errors. This allows `sage` mode to work out-of-the-box on standard ComfyUI installations.
- **Fixed Numerical Instability (`NaN`/`Inf` Errors):** Corrected CUDA assertion errors (`Assertion 'input[0] != 0' failed`) that occurred when combining 4-bit quantization with SageAttention, by forcing a more stable `fp32` compute data type for that specific combination.
- **Resolved All `dtype` Mismatches:** Fixed `RuntimeError` crashes in full-precision mode caused by `dtype` conflicts (`BFloat16 != Half`) between the SageAttention output and the model's linear layers.
- **Corrected SageAttention Kernel Assertions:** Fixed a low-level CUDA assertion (`value.size(1) == num_kv_heads`) by ensuring tensors passed to the SageAttention kernel have the correct shape for Grouped-Query Attention.
- **Prevented 4-Bit Quantization Crashes:** Fixed errors when using `eager` (`.float() is not supported`) and `flash_attention_2` (`only support fp16 and bf16`) modes with 4-bit quantization via the new smart fallback system.
- **Addressed a Deprecation Warning:** Updated the model loading call to use `dtype` instead of the deprecated `torch_dtype` argument, cleaning up console warnings.
✅ Tested Environments
- Python 3.12, PyTorch `2.9.0.dev20250902`, CUDA 12.8
- Python 3.12, PyTorch `2.3.8`, CUDA 12.8
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
Thank you for your support and collaboration in making this node better!
v1.2.0 - Major Compatibility Update & VRAM Management
This is a significant update that resolves critical compatibility issues with the latest `transformers` library, introduces a powerful new VRAM management feature, and fixes several stability bugs. Users are strongly encouraged to update.
🚀 Key Features & Improvements
- ✅ **Transformers 4.56+ Compatibility:** This release fixes the critical `_prepare_cache_for_generation() takes 6 positional arguments but 7 were given` error, making the node fully compatible with the latest versions of the `transformers` library. The fix is backwards compatible, so it will continue to work seamlessly with older versions as well.
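A common way to survive such signature changes is to inspect the callee at runtime and pass only the positional arguments it accepts; a generic sketch of the pattern (the node's actual fix may differ):

```python
import inspect

def call_compat(fn, *args):
    """Call fn with at most as many positional args as it declares."""
    positional = [
        p for p in inspect.signature(fn).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    # Extra trailing arguments are silently dropped for older signatures
    return fn(*args[:len(positional)])
```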
- 🔧 **New `force_offload` Parameter:** A new toggle has been added to the node to force the model to offload from VRAM after each generation. This is incredibly useful for complex workflows or systems with limited VRAM, helping to prevent out-of-memory errors.
  - Keep it enabled for maximum memory savings between runs.
  - Keep it disabled for faster subsequent generations if you have sufficient VRAM.
- 🗣️ **Enhanced Multi-Speaker Stability:** Fixed a critical bug related to `DynamicCache` in newer `transformers` versions that caused errors during multi-speaker audio generation. You can now reliably use multiple speakers without issues.
🐛 Bug Fixes
This release also addresses several underlying bugs to improve stability and compatibility with ComfyUI:
- **Fixed ComfyUI API Incompatibility:** Resolved an error by replacing a call to the non-existent `unload_model_clones()` with the correct `unload_all_models()` function from ComfyUI's model management API.
- **Fixed `AttributeError` on Offload:** Corrected an `AttributeError: 'VibeVoicePatcher' object has no attribute 'is_loaded'` that occurred when using the new `force_offload` feature.
- **Fixed `DynamicCache` Error:** The code no longer incorrectly attempts to access `.key_cache` on `DynamicCache` objects, which resolves errors and ensures multi-speaker functionality works correctly with recent library updates.
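Version-tolerant cache handling usually guards attribute access rather than assuming a layout; a rough sketch of the pattern (illustrative only, since `DynamicCache` internals have changed across `transformers` releases):

```python
def cache_length(cache):
    """Best-effort sequence length for old and new cache layouts."""
    if hasattr(cache, "get_seq_length"):   # current Cache API
        return cache.get_seq_length()
    if hasattr(cache, "key_cache"):        # legacy DynamicCache attribute
        return cache.key_cache[0].shape[-2] if cache.key_cache else 0
    return 0                               # unknown object: assume empty
```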
✅ What This Means For You
- **Upgrade Safely:** You can now update your `transformers` library without worrying about breaking the VibeVoice node.
- **Better VRAM Management:** Use the `force_offload` option to free up precious GPU memory for other tasks in your workflow.
- **More Reliability:** Multi-speaker generation is now stable on the latest libraries, and interactions with ComfyUI's core are more robust.
💾 How to Install

Via the ComfyUI Manager, or do it manually:

```
git clone https://github.com/wildminder/ComfyUI-VibeVoice
```

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your `ComfyUI/custom_nodes/ComfyUI-VibeVoice` directory and run:

```
git pull
```

Then, restart ComfyUI.
A huge thank you to the community for reporting these issues. This update makes the VibeVoice node more stable, flexible, and future-proof.