feat: enable dynamic cpu-offloading for qwen image based models #791

Open
Cyclica-bin wants to merge 1 commit into nunchaku-ai:main from Cyclica-bin:feat/qwen-dynamic-offload
Conversation

@Cyclica-bin

Motivation

This PR automates CPU offloading in NunchakuQwenImageDiTLoader, replacing the manual block-count parameter with a dynamic offloading system.

Currently, users must manually specify the number of blocks to load into VRAM. This leads to:

  • Trial and error: users have to guess how many blocks fit their specific GPU.
  • Crashes: if other processes (or browser tabs) consume VRAM, the static block count causes OOM (out-of-memory) errors.

Dynamic offloading also speeds up workflow execution, even without the --lowvram flag, when the user has sufficient RAM.

Modifications

  1. Removal of num_blocks_on_gpu: removed the manual input parameter from the ComfyUI node, since users no longer need to guess how many transformer blocks their free VRAM can hold.
  2. API update on load_model: removed the same parameter from the load_model method's argument list.
  3. Uses ComfyUI's model-management utilities to:
    • calculate the GPU's available free memory,
    • calculate the transformer's per-block size,
    • identify the blocks that must stay resident in GPU memory because their sizes vary by nature (a minor performance improvement).
  4. Added clamping logic to guard against architectural changes in future base-model revisions and avoid unexpected crashes.
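The budget calculation and clamping described in items 3–4 can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the helper name `estimate_blocks_on_gpu` and its parameters are made up here, and in the real node the free-VRAM and per-block-size figures would come from ComfyUI's model-management utilities rather than being passed in directly.

```python
def estimate_blocks_on_gpu(free_vram_bytes: int,
                           block_size_bytes: int,
                           total_blocks: int,
                           reserved_bytes: int = 0) -> int:
    """Estimate how many transformer blocks fit in free VRAM.

    Hypothetical sketch: in practice the inputs would be measured at
    load time (e.g. via ComfyUI's model-management helpers), so the
    budget adapts to whatever VRAM is actually free.
    """
    if block_size_bytes <= 0:
        # Defensive: avoid division by zero on a degenerate measurement.
        return 0
    # Keep headroom for variable-size blocks and other allocations.
    usable = max(free_vram_bytes - reserved_bytes, 0)
    n = usable // block_size_bytes
    # Clamp: never negative, never more blocks than the model has,
    # guarding against unexpected changes in the model architecture.
    return max(0, min(n, total_blocks))

# Example: 8 GiB free, 0.5 GiB per block, 60-block model, 1 GiB reserved
GiB = 1024 ** 3
print(estimate_blocks_on_gpu(8 * GiB, GiB // 2, 60, GiB))  # 14
```

Because the count is recomputed from the VRAM that is actually free at load time, other processes consuming VRAM shrink the budget instead of triggering an OOM crash.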

Checklist

  • Code is formatted using Pre-Commit hooks (run pre-commit run --all-files).
  • Tested on hardware with 8 GB VRAM and 32 GB RAM.
  • Tested on hardware with more than 8 GB VRAM.

Note: the other checklist items are not applicable, as this PR is an extension of an existing node.

  • For reviewers: If you're only helping merge the main branch and haven't contributed code to this PR, please remove yourself as a co-author when merging.
  • Please feel free to join our Discord or WeChat to discuss your PR.
