Paper | Website | Blog | Demo | HuggingFace | ModelScope


This repository provides the ComfyUI node for Nunchaku, an efficient inference engine for 4-bit neural networks quantized with SVDQuant. For the quantization library, check out DeepCompressor.

Join our user groups on Slack, Discord, and WeChat for discussions (details here). If you have any questions, run into issues, or are interested in contributing, feel free to share your thoughts with us!

Nunchaku ComfyUI Node

News

[2025-06-29] 🔥 v0.3.3 now supports FLUX.1-Kontext-dev! Download the quantized model from HuggingFace or ModelScope and use this workflow to get started.
[2025-06-11] Starting from v0.3.2, you can now easily install or update the Nunchaku wheel using this workflow!
[2025-06-07] 🚀 Release Patch v0.3.1! We bring back FB Cache support and fix 4-bit text encoder loading. PuLID nodes are now optional and won’t interfere with other nodes. We've also added a NunchakuWheelInstaller node to help you install the correct Nunchaku wheel.
[2025-06-01] 🚀 Release v0.3.0! This update adds support for multiple-batch inference, ControlNet-Union-Pro 2.0 and initial integration of PuLID. You can now load Nunchaku FLUX models as a single file, and our upgraded 4-bit T5 encoder now matches FP8 T5 in quality!
[2025-04-16] 🎥 Released tutorial videos in both English and Chinese to assist installation and usage.
[2025-04-09] 📢 Published the April roadmap and an FAQ to help the community get started and stay up to date with Nunchaku’s development.
[2025-04-05] 🚀 Release v0.2.0! This release introduces multi-LoRA and ControlNet support, with enhanced performance using FP16 attention and First-Block Cache. We've also added 20-series GPU compatibility and official workflows for FLUX.1-redux!

Installation

We provide tutorial videos to help you install and use Nunchaku on Windows, available in both English and Chinese. You can also follow the corresponding step-by-step text guide at docs/setup_windows.md. If you run into issues, these resources are a good place to start.

Step 1: Install the ComfyUI Plugin

You can install the ComfyUI-nunchaku plugin in any of the following ways.

Comfy-CLI

You can easily use comfy-cli to run ComfyUI with Nunchaku:

pip install comfy-cli  # Install ComfyUI CLI
comfy install          # Install ComfyUI
comfy node registry-install ComfyUI-nunchaku  # Install Nunchaku

ComfyUI-Manager

Install ComfyUI with the following commands:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Install ComfyUI-Manager with the following commands:

cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager

Launch ComfyUI:

cd ..  # Return to the ComfyUI root directory
python main.py

Open the Manager, search for ComfyUI-nunchaku in the Custom Nodes Manager, and install it.

Manual Installation

Set up ComfyUI with the following commands:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Clone this repository into the custom_nodes directory inside ComfyUI:

cd custom_nodes
git clone https://github.com/mit-han-lab/ComfyUI-nunchaku nunchaku_nodes

Step 2: Install the Nunchaku Backend

Starting from ComfyUI-nunchaku v0.3.2, you can easily install or update the Nunchaku wheel using this workflow file, once all dependencies are installed.

Alternatively, you can follow the manual installation instructions in the Nunchaku README.
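
After Step 2, a quick check from the same Python environment that runs ComfyUI can confirm that the backend is visible. The sketch below assumes the backend is importable under the package name `nunchaku`; the helper name `backend_status` is made up for this example and is not part of Nunchaku.

```python
from importlib import metadata, util


def backend_status(package="nunchaku"):
    """Report whether a package is importable and, if known, its installed version."""
    # "nunchaku" as the import name is an assumption of this sketch.
    if util.find_spec(package) is None:
        return f"{package}: not installed"
    try:
        return f"{package}: installed, version {metadata.version(package)}"
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata is registered under this name.
        return f"{package}: importable, version unknown"
```

Calling `print(backend_status())` reports the status for the current interpreter. If it says "not installed", the wheel most likely went into a different Python environment than the one ComfyUI uses.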

Usage

Set Up ComfyUI and Nunchaku:

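Nunchaku workflows can be found in the example_workflows directory of this repository. To use them, copy the files to user/default/workflows in the ComfyUI root directory. The commands below assume the plugin was cloned as nunchaku_nodes, as in the manual installation; adjust the folder name if your installation method used a different one:

cd ComfyUI

# Create the workflows directory if it doesn't exist
mkdir -p user/default/workflows

# Copy the example workflows
cp custom_nodes/nunchaku_nodes/example_workflows/* user/default/workflows/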

Install any missing nodes (e.g., comfyui-inpainteasy) by following this tutorial.
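
On Windows, where `mkdir -p` and `cp` may not be available, the workflow copy step can be done with Python's standard library instead. This is a minimal sketch: the function name `copy_workflows` is hypothetical, the default folder name `nunchaku_nodes` matches the manual installation above, and the destination is the user/default/workflows directory named above.

```python
import shutil
from pathlib import Path


def copy_workflows(comfy_root, node_dir="nunchaku_nodes"):
    """Copy the plugin's example workflows into ComfyUI's user workflow folder.

    Returns the sorted names of the files that were copied.
    """
    root = Path(comfy_root)
    src = root / "custom_nodes" / node_dir / "example_workflows"
    dst = root / "user" / "default" / "workflows"
    dst.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

Call it with the path to the ComfyUI root, for example `copy_workflows("ComfyUI")`. Pass a different `node_dir` if the plugin folder has another name.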

Download Required Models: Follow this tutorial to download the necessary models into the appropriate directories. Alternatively, use the following commands:

huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir models/vae
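
Before launching ComfyUI, it can help to confirm that the three files from the commands above ended up where the workflows expect them. The sketch below only checks for file presence; `missing_models` and `REQUIRED_FILES` are names invented for this example, and the paths assume the download commands were run from the ComfyUI root directory.

```python
from pathlib import Path

# Files fetched by the three download commands above, relative to the ComfyUI root.
REQUIRED_FILES = [
    "models/text_encoders/clip_l.safetensors",
    "models/text_encoders/t5xxl_fp16.safetensors",
    "models/vae/ae.safetensors",
]


def missing_models(comfy_root, required=REQUIRED_FILES):
    """Return the required files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

An empty list from `missing_models("ComfyUI")` means all three files are in place. Any path it returns points at a download that still needs to be run.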

Run ComfyUI: To start ComfyUI, navigate to its root directory and run python main.py. If you are using comfy-cli, simply run comfy launch.
Select the Nunchaku Workflow: Choose one of the Nunchaku workflows (workflows that start with nunchaku-) to get started. For the flux.1-fill workflow, you can use the built-in MaskEditor tool to apply a mask over an image.
All the 4-bit models are available in our HuggingFace or ModelScope collection. Except for svdq-flux.1-t5, download each model's entire folder to models/diffusion_models (svdq-flux.1-t5 goes in models/text_encoders instead).

Nunchaku Nodes

Nunchaku Flux DiT Loader: A node for loading the FLUX diffusion model.
- model_path: Path to the model folder. You must manually download the model from our Hugging Face collection or ModelScope collection. Once downloaded, set model_path to the corresponding directory.
  
  Note: Legacy model folders are still supported but will be deprecated in v0.4. To migrate, use our merge_safetensors.json workflow to merge your legacy folder into a single .safetensors file or redownload the model from the above collections.
- cache_threshold: Controls the First-Block Cache tolerance, similar to residual_diff_threshold in WaveSpeed. Increasing this value improves speed but may reduce quality. A typical value is 0.12. Setting it to 0 disables the effect.
- attention: Defines the attention implementation. Choose either flash-attention2 or nunchaku-fp16. Our nunchaku-fp16 is approximately 1.2× faster than flash-attention2 without compromising precision. On Turing GPUs (20-series), where flash-attention2 is unsupported, you must use nunchaku-fp16.
- cpu_offload: Enables CPU offloading for the transformer model. While this reduces GPU memory usage, it may slow down inference.
  - When set to auto, the node detects your available GPU memory. If your GPU has more than 14 GiB of memory, offloading is disabled; otherwise, it is enabled.
  - Memory usage in this node will be optimized further in a later release.
- device_id: Indicates the GPU ID for running the model.
- data_type: Defines the data type for the dequantized tensors. Turing GPUs (20-series) do not support bfloat16 and can only use float16.
- i2f_mode: For Turing (20-series) GPUs, this option controls the GEMM implementation mode. enabled and always modes exhibit minor differences. This option is ignored on other GPU architectures.
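
Three of the rules described for this loader can be written down as small pure functions. This is a sketch for illustration, not the node's actual code: the function names, the "enable"/"disable" option strings, and the relative-difference metric used for the cache test are assumptions. The 14 GiB cutoff and the Turing restrictions come from the parameter descriptions above; Turing GPUs report CUDA compute capability 7.5.

```python
def resolve_cpu_offload(setting, gpu_memory_gib):
    """Turn the cpu_offload setting into a boolean.

    "auto" follows the rule stated above: more than 14 GiB of GPU memory
    disables offloading, anything else enables it.
    """
    if setting == "auto":
        return not gpu_memory_gib > 14
    return setting == "enable"  # assumed option string


def turing_constraints(compute_capability):
    """Return the settings forced on Turing GPUs (compute capability 7.5), else {}."""
    if tuple(compute_capability) == (7, 5):
        return {"attention": "nunchaku-fp16", "data_type": "float16"}
    return {}


def can_reuse_cache(first_block_residual, cached_residual, cache_threshold):
    """First-Block Cache test on two equal-length lists of floats.

    Assumed metric: if the first block's residual changed by less than
    cache_threshold (relative to the cached one), the remaining blocks
    can be skipped and their cached output reused.
    """
    if cache_threshold <= 0 or cached_residual is None:
        return False  # a threshold of 0 disables caching
    diff = sum(abs(a - b) for a, b in zip(first_block_residual, cached_residual))
    scale = sum(abs(b) for b in cached_residual)
    return scale > 0 and diff / scale < cache_threshold
```

With PyTorch installed, `torch.cuda.get_device_properties(device_id).total_memory / 2**30` and `torch.cuda.get_device_capability(device_id)` supply the two hardware inputs. The cache test also shows why a larger cache_threshold is faster but riskier: more steps pass the test and reuse stale results.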
Nunchaku FLUX LoRA Loader: A node for loading LoRA modules for SVDQuant FLUX models.
- Place your LoRA checkpoints in the models/loras directory. These will appear as selectable options under lora_name.
- lora_strength: Controls the strength of the LoRA module.
- You can connect multiple LoRA nodes together.
- Note: Starting from version 0.2.0, there is no need to convert LoRAs. Simply provide the original LoRA files to the loader.
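
Chaining LoRA nodes means each loader takes the model output of the previous one. The sketch below builds such a chain in ComfyUI's API prompt format, where a link is written as `[node_id, output_index]`. The class name `NunchakuFluxLoraLoader` and the `model` input name are assumptions for illustration; `lora_name` and `lora_strength` are the parameters described above, and `chain_loras` is a hypothetical helper.

```python
def chain_loras(model_link, loras, first_id=100):
    """Build API-format nodes that apply several LoRAs one after another.

    model_link: [node_id, output_index] of the node providing the model.
    loras: list of (lora_name, lora_strength) pairs, applied in order.
    Returns (nodes, final_model_link).
    """
    nodes = {}
    link = list(model_link)
    for offset, (name, strength) in enumerate(loras):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "NunchakuFluxLoraLoader",  # assumed class name
            "inputs": {"model": link, "lora_name": name, "lora_strength": strength},
        }
        link = [node_id, 0]  # the next loader consumes this loader's output
    return nodes, link
```

The returned `final_model_link` is what a downstream sampler node would take as its model input, so adding or removing a LoRA only changes the chain, not the rest of the graph.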
Nunchaku Text Encoder Loader V2: A node for loading the text encoders.
- Select the CLIP and T5 models to use as text_encoder1 and text_encoder2, following the same convention as in DualCLIPLoader. You may also choose our enhanced 4-bit T5XXL model to save more GPU memory.
- t5_min_length: Sets the minimum sequence length for T5 text embeddings. The default in DualCLIPLoader is hardcoded to 256, but for better image quality, use 512 here.
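
One way to picture t5_min_length is as right-padding of the token sequence up to a floor. The sketch below assumes that is how the minimum is enforced, which the description above does not confirm. The helper name `pad_to_min_length` is made up for this example; `pad_id=0` is the pad token id of the T5 tokenizer.

```python
def pad_to_min_length(token_ids, min_length=512, pad_id=0):
    """Right-pad a token list so it holds at least min_length entries."""
    shortfall = max(0, min_length - len(token_ids))
    return list(token_ids) + [pad_id] * shortfall
```

Sequences that are already longer than `min_length` pass through unchanged, so the setting acts as a lower bound only.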
Nunchaku Wheel Installer: A utility node for automatically installing the correct version of Nunchaku wheels. After installation, please restart ComfyUI to apply the changes.
- source: Select the source of the wheel. Available options include GitHub Release, HuggingFace, and ModelScope.
- version: Choose the compatible Nunchaku version to install.
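
If the installer node fails and you fall back to choosing a wheel by hand, the facts below are the ones a prebuilt wheel usually has to match. Which of them the node itself checks is not stated here, so treat this as a generic sketch: the helper names are hypothetical, and the `cp` tag assumes a CPython interpreter.

```python
import platform
import sys
from importlib import metadata


def python_tag(version_info=sys.version_info):
    """CPython wheel tag for an interpreter version, e.g. (3, 11) -> 'cp311'."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_environment():
    """Collect the facts a prebuilt wheel usually has to match."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch is not installed in this environment
    return {
        "python": python_tag(),
        "platform": f"{platform.system().lower()}_{platform.machine().lower()}",
        "torch": torch_version,
    }
```

Run `wheel_environment()` with the same interpreter that launches ComfyUI; a portable or embedded Python often differs from the system one.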
Nunchaku Text Encoder Loader (will be deprecated in v0.4): A node for loading the text encoders.
- For FLUX, use the following files:
  - text_encoder1: t5xxl_fp16.safetensors (or FP8/GGUF versions of T5 encoders).
  - text_encoder2: clip_l.safetensors
- t5_min_length: Sets the minimum sequence length for T5 text embeddings. The default in DualCLIPLoader is hardcoded to 256, but for better image quality, use 512 here.
- use_4bit_t5: Specifies whether to use our quantized 4-bit T5 to save GPU memory.
- int4_model: Specifies the INT4 T5 location. This option is only used when use_4bit_t5 is enabled. You can download our INT4 T5 model folder to models/text_encoders from HuggingFace or ModelScope. For example, you can run the following command:
```
huggingface-cli download mit-han-lab/svdq-flux.1-t5 --local-dir models/text_encoders/svdq-flux.1-t5
```
  After downloading, specify the corresponding folder name as the int4_model.
FLUX.1 Depth Preprocessor (will be deprecated in v0.4): A legacy node for loading a depth estimation model and producing a corresponding depth map. The model_path parameter specifies the location of the model checkpoint. You can manually download the model repository from Hugging Face and place it under the models/checkpoints directory. Alternatively, use the following CLI command:
```
huggingface-cli download LiheYoung/depth-anything-large-hf --local-dir models/checkpoints/depth-anything-large-hf
```
Note: This node is deprecated and will be removed in a future release. Please use the updated "Depth Anything" node with the depth_anything_vitl14.pth model file instead.
