
Releases: wildminder/ComfyUI-VibeVoice

v1.5.1 - Maintenance release

25 Sep 10:19
99a9803


This is a maintenance release addressing compatibility and reliability issues reported by users.

Fixes and Improvements

  • Resolved transformers Cache Compatibility Issue:

    • Fixed an AttributeError: 'VibeVoiceConfig' object has no attribute 'num_hidden_layers' that occurred during generation with certain versions of the transformers library.
  • Improved tokenizer.json Acquisition Reliability:

    • Implemented a robust fallback mechanism for obtaining the required tokenizer.json file. The node now attempts to locate the file in the following order:
      1. The local model directory.
      2. A pre-packaged version within the custom node's vibevoice/configs/ directory.
      3. Download from the primary Hugging Face repository (Qwen/Qwen2.5-1.5B).
      4. Download from a secondary Hugging Face repository (Qwen/Qwen2.5-7B).
    • If all methods fail, the node now raises a RuntimeError with clear instructions for manual download and placement of the file. This prevents crashes due to network or permission issues.
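The fallback order above can be sketched as a simple loop over candidate sources. This is an illustrative sketch, not the node's actual implementation; the function name and the download mechanism are assumptions, and only the search order mirrors the release notes:

```python
import os
import shutil
import urllib.request

def acquire_tokenizer(model_dir, packaged_dir, repo_urls, dest_name="tokenizer.json"):
    """Try each source in order and return the path of a usable tokenizer.json."""
    dest = os.path.join(model_dir, dest_name)
    # 1. Already present in the local model directory.
    if os.path.isfile(dest):
        return dest
    # 2. Copy the pre-packaged version shipped with the custom node.
    packaged = os.path.join(packaged_dir, dest_name)
    if os.path.isfile(packaged):
        shutil.copy(packaged, dest)
        return dest
    # 3./4. Try the primary, then the secondary, download location.
    for url in repo_urls:
        try:
            urllib.request.urlretrieve(url, dest)
            return dest
        except OSError:
            continue
    # All sources exhausted: fail with actionable instructions instead of crashing.
    raise RuntimeError(
        f"Could not obtain {dest_name}. Please download it manually and "
        f"place it in {model_dir}."
    )
```

The key design point is that the error is only raised after every source has been tried, so transient network or permission failures on one source do not abort the whole chain.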

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


Thank you to everyone who provided feedback and bug reports! We hope this maintenance release makes the node more reliable for you.

v1.5.0 - Enhanced Stability, Cloning Fidelity, and Hybrid Generation

24 Sep 15:42
9c750be


This release focuses on significant improvements to voice cloning accuracy, generation stability, and script parsing flexibility. The core prompting mechanism has been reworked to align more closely with the model's training data, resolving issues of voice swapping and audio artifacts.

🚀 Key Improvements

  • Improved Voice Cloning Fidelity: The core prompting mechanism has been reworked to align more closely with the model's training data, significantly improving cloning accuracy and resolving voice swapping between speakers.

  • Enhanced Generation Stability: Reworked prompt logic resolves previous instabilities that could lead to audio artifacts or gibberish output.

  • Flexible Speaker Tagging: The script parser now supports two formats for defining speakers, providing more convenience for script writing:

    • Classic Format: Speaker 1: ...
    • Concise Format: [1] ... or [1]: ...
      Both formats are parsed identically, ensuring consistent output.
  • Hybrid Voice Generation (Cloning + Zero-Shot): The node now supports mixed-mode generation within a single script.

    • If a reference audio is provided for a speaker (e.g., speaker_1_voice), that voice will be cloned.
    • If a speaker's voice input is left empty, a unique zero-shot TTS voice will be generated for that speaker.
      This allows for flexible combinations of cloned and generated voices in the same dialogue.
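A minimal sketch of how the two tag formats could be normalized to the same speaker index. The regex and function name here are illustrative, not the node's actual parser, and the choice to default untagged lines to speaker 1 is an assumption of this sketch:

```python
import re

# Matches "Speaker 1:", "[1]", or "[1]:" at the start of a line.
_TAG = re.compile(r"^(?:Speaker\s+(\d+)\s*:|\[(\d+)\]\s*:?)\s*", re.IGNORECASE)

def parse_script(text):
    """Return a list of (speaker_index, line_text) pairs."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = _TAG.match(line)
        if m:
            # Whichever capture group matched holds the speaker number.
            speaker = int(m.group(1) or m.group(2))
            segments.append((speaker, line[m.end():]))
        else:
            # Untagged line: default to speaker 1 (an assumption for this sketch).
            segments.append((1, line))
    return segments
```

For example, `parse_script("Speaker 1: Hello\n[2]: Hi there")` returns `[(1, "Hello"), (2, "Hi there")]`, so both tag styles map to the same speaker indices.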

How to Use

  • For Voice Cloning: Connect a reference audio clip to the corresponding speaker_X_voice input. In the text, use Speaker X: or [X] to assign that voice to a line.
  • For Zero-Shot TTS: Omit the reference audio for a speaker. The model will generate a voice for any speaker tag in the text that does not have a corresponding audio input.
  • For Single Utterance (Zero-Shot): Provide text without any speaker tags and connect no audio inputs. The entire text will be synthesized with a default zero-shot voice.

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


Thank you to everyone who provided feedback and bug reports! We hope you enjoy the improved stability, cloning fidelity, and hybrid generation in this release.

v1.4.0 - The Flexibility & Performance Update

10 Sep 09:46
5dd167a


This update focuses on improving model loading flexibility, fixing compatibility with the latest hardware, and incorporating valuable user feedback.

The entire node has undergone a major refactoring for a cleaner, more maintainable file structure, paving the way for easier future development.

🚀 New Features

1. Standalone Model Loading (.safetensors support)

You are no longer limited to the official Hugging Face directory structure! You can now use single-file VibeVoice models directly.

  • How it works: Simply place your .safetensors file (e.g., my_custom_voice.safetensors) inside your ComfyUI/models/tts/VibeVoice/ folder.
  • Configuration: The node will automatically look for a sidecar configuration file with the same name, but ending in .config.json (e.g., my_custom_voice.config.json).
  • Fallback: If no config file is found, the node will intelligently fall back to the default config for either the 1.5B or Large model based on the filename.
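The sidecar lookup and filename-based fallback can be sketched as below. This is a hedged illustration: the returned default dict is a placeholder for the bundled default configs, and the "large" substring check is an assumption about how the filename is interpreted:

```python
import os
import json

def resolve_config(safetensors_path):
    """Find the config for a standalone .safetensors model file."""
    base, _ = os.path.splitext(safetensors_path)
    sidecar = base + ".config.json"
    # 1. Prefer a sidecar file such as my_custom_voice.config.json.
    if os.path.isfile(sidecar):
        with open(sidecar) as f:
            return json.load(f)
    # 2. Otherwise guess the default config from the filename:
    #    "large" in the name selects the Large preset, anything else 1.5B.
    name = os.path.basename(base).lower()
    preset = "large" if "large" in name else "1.5b"
    return {"default_preset": preset}  # placeholder for the bundled default config
```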

2. Support for Custom Model Folders

The node now fully respects ComfyUI's extra_model_paths.yaml file. It will automatically scan all your configured tts directories for a VibeVoice subfolder and discover any models within, whether they are in the standard directory format or as standalone files.
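Model discovery across the configured directories might look like the following sketch. The directory names follow the release notes, while the function name and the exact filtering rules are assumptions for illustration:

```python
import os

def discover_models(tts_dirs):
    """Scan each configured tts directory for a VibeVoice subfolder and return
    discovered models: subdirectories (standard HF layout) and standalone
    .safetensors files."""
    found = []
    for tts_dir in tts_dirs:
        root = os.path.join(tts_dir, "VibeVoice")
        if not os.path.isdir(root):
            continue
        for entry in sorted(os.listdir(root)):
            path = os.path.join(root, entry)
            if os.path.isdir(path):
                found.append(path)                 # standard directory-format model
            elif entry.endswith(".safetensors"):
                found.append(path)                 # standalone model file
    return found
```

In practice the list of `tts_dirs` would come from ComfyUI's own path resolution (which already merges extra_model_paths.yaml entries), so the node only has to walk each candidate root.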


⚡️ Fixes & Improvements

1. Sage Attention Fix for NVIDIA Blackwell (SM90+) GPUs

  • Fix: Resolved an incorrect assertion that caused a crash when using Sage Attention on the latest NVIDIA Blackwell GPUs. The selected attention function now correctly matches Sage Attention's own implementation for Ada (SM89) and Hopper architectures.
  • Performance: This fix unlocks significant performance gains. On compatible hardware, Sage Attention is now ~10% faster on average compared to SDPA.
  • Validation: Tested successfully with PyTorch 2.9 nightly builds and CUDA 13.
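The fix amounts to selecting the attention backend from the GPU's compute capability instead of a hard assertion. A hedged sketch of the idea (the SM-to-backend mapping below is only illustrative; consult sageattention itself for the real dispatch):

```python
def pick_attention(sm_major, sm_minor, sage_available=True):
    """Choose an attention backend from a compute capability tuple,
    e.g. (8, 9) for Ada, (9, 0) for Hopper, higher for newer GPUs."""
    if sage_available and (sm_major, sm_minor) >= (8, 9):
        return "sage"   # Ada, Hopper and later can use Sage Attention
    return "sdpa"       # safe default everywhere else
```

At runtime the tuple would come from `torch.cuda.get_device_capability()`; comparing the tuple as a whole (rather than asserting an exact architecture) is what keeps newer GPUs from crashing.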

2. Increased CFG Scale Limit

  • Change: The maximum value for cfg_scale has been increased from 2.0 to 10.0.
  • Reason: This change is based on community feedback. Some users report excellent, and often better, results with unconventional settings, such as a high CFG scale at a very low step count (e.g., cfg_scale: 3.0 at just 3 steps). This update opens the door for more creative experimentation.

3. Major Code Refactoring

  • The node's internal logic has been completely reorganized into a cleaner, more modular structure. This makes the code easier to understand, maintain, and build upon for future updates.

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


Thank you to everyone who provided feedback and bug reports! We hope you enjoy the new flexibility and performance improvements in this release.


v1.3.2 - Minor fixes

03 Sep 18:00
58b46e2


🚀 Improvements & Fixes

  • tokenizer.json: If the automatic download fails, the file can now be downloaded manually and placed in the model folder.

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


v1.3.1 - Minor fixes

03 Sep 15:27
4fe38c9


A small fix for older versions of the transformers library.

🚀 Improvements & Fixes

  • Backward Compatibility: The node is now compatible with transformers library versions v4.51.3 and newer.

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


v1.3.0 - SageAttention Integration, Performance Enhancements

03 Sep 09:42
0213ce5


Introduced support for the SageAttention library and completely overhauled the 4-bit LLM quantization feature to be robust and reliable across different configurations.

The node is now smarter, automatically selecting the most stable settings for your chosen combination of features and providing graceful fallbacks to prevent crashes.

✨ New Features

  • SageAttention Support: Full integration with the sageattention library. This provides a high-performance, mixed-precision attention mechanism that uses int8 and fp8 quantization for the attention calculation, offering a new trade-off between speed and memory.
  • Robust 4-Bit LLM Quantization: The "Quantize LLM (4-bit)" option is now highly stable and delivers significant VRAM savings. The node intelligently configures bitsandbytes for either maximum memory efficiency or maximum numerical stability depending on the chosen attention mechanism.
  • Smart Configuration & Fallbacks: The node now automatically handles incompatible settings to prevent errors:
    • If 4-bit quantization is enabled with eager or flash_attention_2, it will gracefully fall back to the more stable sdpa mode and notify the user.
    • The underlying data types are now managed dynamically to ensure compatibility between all libraries (bitsandbytes, transformers, sageattention, pytorch).
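The fallback rules above can be sketched as a small helper. The function name is illustrative, and the dtype strings are simplified placeholders for the torch dtypes the node would actually pass around:

```python
def effective_settings(attention_mode, quantize_llm_4bit):
    """Return the (attention_mode, compute_dtype) pair actually used after
    applying the compatibility rules."""
    if quantize_llm_4bit and attention_mode in ("eager", "flash_attention_2"):
        # These modes are unstable with 4-bit weights: fall back and notify.
        print(f"[VibeVoice] {attention_mode} is incompatible with 4-bit "
              "quantization; falling back to sdpa.")
        attention_mode = "sdpa"
    if quantize_llm_4bit and attention_mode == "sage":
        # 4-bit + SageAttention needs a more stable fp32 compute dtype.
        return attention_mode, "fp32"
    return attention_mode, "bf16"
```

Centralizing the rules in one place means every downstream component (bitsandbytes config, attention kernel, model dtype) sees the same, already-validated combination.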

🐛 Bug Fixes & Stability Improvements

  • Fixed SageAttention Crashes on Windows: Resolved Triton JIT compilation errors, allowing sage mode to work out-of-the-box on standard ComfyUI installations.
  • Fixed Numerical Instability (NaN/Inf Errors): Corrected CUDA assertion errors (Assertion 'input[0] != 0' failed) that occurred when combining 4-bit quantization with SageAttention by forcing a more stable fp32 compute data type for that specific combination.
  • Resolved All dtype Mismatches: Fixed RuntimeError crashes in full-precision mode caused by dtype conflicts (BFloat16 != Half) between the SageAttention output and the model's linear layers.
  • Corrected SageAttention Kernel Assertions: Fixed a low-level CUDA assertion (value.size(1) == num_kv_heads) by ensuring tensors passed to the SageAttention kernel have the correct shape for Grouped-Query Attention.
  • Prevented 4-Bit Quantization Crashes: Fixed errors when using eager (.float() is not supported) and flash_attention_2 (only support fp16 and bf16) modes with 4-bit quantization via the new smart fallback system.
  • Addressed Deprecation Warning: Updated the model loading call to use dtype instead of the deprecated torch_dtype argument, cleaning up console warnings.

✅ Tested Environments

  • Python 3.12, PyTorch 2.9.0.dev20250902, CUDA 12.8
  • Python 3.12, PyTorch 2.3.8, CUDA 12.8

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


Thank you for your support and collaboration in making this node better

v1.2.0 - Major Compatibility Update & VRAM Management

03 Sep 06:54
2aa03a8


This is a significant update that resolves critical compatibility issues with the latest transformers library, introduces a powerful new VRAM management feature, and fixes several stability bugs. Users are strongly encouraged to update.

🚀 Key Features & Improvements

  • Transformers 4.56+ Compatibility: This release fixes the critical _prepare_cache_for_generation() takes 6 positional arguments but 7 were given error, making the node fully compatible with the latest versions of the transformers library. The fix is backwards compatible, so it will continue to work seamlessly with older versions as well.

  • 🔧 New force_offload Parameter: A new toggle has been added to the node to force the model to offload from VRAM after each generation. This is incredibly useful for complex workflows or systems with limited VRAM, helping to prevent out-of-memory errors.

    • Keep it enabled for maximum memory savings between runs.
    • Keep it disabled for faster subsequent generations if you have sufficient VRAM.
  • 🗣️ Enhanced Multi-Speaker Stability: Fixed a critical bug related to DynamicCache in newer transformers versions that caused errors during multi-speaker audio generation. You can now reliably use multiple speakers without issues.
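The backwards-compatible fix for the changed internal signature works by adapting to whichever version is installed. A hedged sketch of the general pattern using `inspect` (the wrapped functions here are stand-ins, not the actual transformers internals):

```python
import inspect

def call_compat(fn, *args, optional_arg=None):
    """Call fn with an extra positional argument only if its signature accepts
    it, so one code path supports both old and new library versions."""
    n_params = len(inspect.signature(fn).parameters)
    if n_params >= len(args) + 1:
        return fn(*args, optional_arg)  # older signature: pass the extra argument
    return fn(*args)                    # newer signature: omit it
```

Inspecting the signature at call time, rather than pinning a library version, is what lets the same node code run against transformers releases on both sides of the change.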

🐛 Bug Fixes

This release also addresses several underlying bugs to improve stability and compatibility with ComfyUI:

  • Fixed ComfyUI API Incompatibility: Resolved an error by replacing a call to the non-existent unload_model_clones() with the correct unload_all_models() function from ComfyUI's model management API.
  • Fixed AttributeError on Offload: Corrected an AttributeError: 'VibeVoicePatcher' object has no attribute 'is_loaded' that occurred when using the new force_offload feature.
  • Fixed DynamicCache Error: The code no longer incorrectly attempts to access .key_cache on DynamicCache objects, which resolves errors and ensures multi-speaker functionality works correctly with recent library updates.

✅ What This Means For You

  • Upgrade Safely: You can now update your transformers library without worrying about breaking the VibeVoice node.
  • Better VRAM Management: Use the force_offload option to free up precious GPU memory for other tasks in your workflow.
  • More Reliability: Multi-speaker generation is now stable on the latest libraries, and interactions with ComfyUI's core are more robust.

💾 How to Install

Install via the ComfyUI Manager, or clone the repository manually into ComfyUI/custom_nodes/:

git clone https://github.com/wildminder/ComfyUI-VibeVoice

💾 How to Upgrade

Update via the ComfyUI Manager, or navigate to your ComfyUI/custom_nodes/ComfyUI-VibeVoice directory and run:

git pull

Then, restart ComfyUI.


A huge thank you to the community for reporting these issues. This update makes the VibeVoice node more stable, flexible, and future-proof.