Hello everyone,
As promised, last month we delivered multiple-LoRA and ControlNet-Union-Pro support along with faster generation, and we expanded support to 20-series GPUs. We understand some of you may still have run into issues; rest assured, we're actively working on refining the codebase for better stability, compatibility, and user experience.
This roadmap outlines our key development goals for April 2025. The next release is scheduled for mid-May. As always, we welcome your contributions and feedback!
April Focus Areas
- Simplify the deepcompressor backend to reduce quantization costs.
- More comprehensive control support.
- Address memory-related issues to improve stability.
Quantization
- Simplify the deepcompressor backend to ease its use and reduce quantization cost (@synxlin, Can't other models be selected for the main model? #31)
- Add customized model quantization support in ComfyUI-nunchaku (@lmxyy)
- Improve fidelity of the 4-bit T5 text encoder (@Aprilhuu)
LoRA
- Add FLUX-turbo LoRA support with the FLUX-fill base model (@lmxyy, Can't use lora with Nunchaku Flux Fill workflow #46)
- Support additional LoRA formats (@lmxyy, Cannot use LoRAs trained with OneTrainer #64, special_lora_error nunchaku#265)
- Fix LoRA combination bugs (@lmxyy, [Bug] Flux FP4 doesn't handle multiple LoRA #71)
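For context, the multi-LoRA composition targeted above looks roughly like the standard diffusers PEFT flow sketched below. The LoRA repo IDs and adapter names are placeholders, and nunchaku's quantized transformer has its own LoRA loading path, so read this as a sketch of the intended behavior rather than the final API.

```python
# Sketch of the composition we want on the quantized transformer, expressed
# with the standard diffusers PEFT API. Repo IDs and adapter names below are
# placeholders, not real checkpoints.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load two LoRAs and blend them with per-adapter weights.
pipe.load_lora_weights("your-org/flux-style-lora", adapter_name="style")    # placeholder
pipe.load_lora_weights("your-org/flux-detail-lora", adapter_name="detail")  # placeholder
pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.5])

image = pipe("a watercolor fox in a forest", num_inference_steps=28).images[0]
image.save("multi_lora.png")
```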
Controls
- FP8 ControlNet-Union-Pro support (@ita9naiwa, Please Support -> Shakker-Labs_FLUX.1-dev-ControlNet-Union-Pro-fp8.safetensors nunchaku#241, Does not work with the ControlNet Upscale model #37); see the usage sketch after this list
- Expand support for other ControlNet models (@ita9naiwa, Does not work with the ControlNet Upscale model #37)
- Add EasyControl support
- Add PuLID support (@bowen, Is there any way to accelerate PuLID as well? This acceleration currently doesn't seem applicable to PuLID #50, PuLID with Nunckaku nunchaku#258)
- INT4/FP4 ControlNets (Does not work with the ControlNet Upscale model #37, Any possibility to get int4 for Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro nunchaku#256)
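For reference, the sketch below shows roughly how ControlNet-Union-Pro is driven through the stock diffusers pipeline; the control-image URL is a placeholder and the `control_mode` index should be checked against the Union-Pro model card. The nunchaku-specific work tracked above is making the quantized transformer and low-precision ControlNet weights drop into this flow.

```python
# Reference plumbing for ControlNet-Union-Pro via the stock diffusers pipeline.
# The control image URL is a placeholder.
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

controlnet = FluxControlNetModel.from_pretrained(
    "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

control_image = load_image("https://example.com/canny_map.png")  # placeholder
image = pipe(
    "a futuristic city at dusk",
    control_image=control_image,
    control_mode=0,                      # union-mode index, e.g. canny
    controlnet_conditioning_scale=0.7,
    num_inference_steps=28,
).images[0]
image.save("controlnet_union.png")
```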
Speed
- Implement fine-grained First-Block Cache (@Bluear7878)
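For readers unfamiliar with the technique: First-Block Cache runs only the first transformer block each denoising step and, if its residual has barely changed since the previous step, reuses the cached contribution of the remaining blocks; the fine-grained variant applies the same idea at a finer granularity. Below is a minimal, illustrative sketch of the coarse decision with hypothetical names and an illustrative threshold, not the nunchaku implementation.

```python
def step_with_first_block_cache(blocks, hidden, cache, residual_diff_threshold=0.1):
    """Illustrative (hypothetical) first-block cache step, not the nunchaku API.

    `blocks` is a list of transformer blocks mapping a tensor to a tensor;
    `cache` is a dict carried across denoising steps.
    """
    first_out = blocks[0](hidden)
    residual = first_out - hidden

    prev = cache.get("first_block_residual")
    if prev is not None:
        # Relative change of the first block's residual since the previous step.
        diff = (residual - prev).abs().mean() / (prev.abs().mean() + 1e-6)
        if diff.item() < residual_diff_threshold:
            # Barely changed: reuse the cached contribution of the remaining blocks.
            cache["first_block_residual"] = residual
            return first_out + cache["remaining_delta"]

    # Otherwise run the remaining blocks and refresh the cache.
    out = first_out
    for block in blocks[1:]:
        out = block(out)
    cache["first_block_residual"] = residual
    cache["remaining_delta"] = out - first_out
    return out
```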
Memory & Stability
- Optimize memory usage when loading T5 (@Aprilhuu)
- Clean the memory cache when deleting models (@lmxyy @sxtyzhangzk, mysterious OOM issue #65, Memory is not released and causes 100% CPU usage #57); see the cleanup sketch after this list
- Fix serialization errors (@sxtyzhangzk, C:\Users\muyang\Desktop\nunchaku-dev\src\Serialization.cpp:130? #60)
- Improve CPU offloading speed in ComfyUI (@lmxyy)
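As a stopgap for the unreleased-memory reports above, the usual manual PyTorch cleanup sequence is sketched below; the roadmap items aim to have the ComfyUI nodes do the equivalent automatically when a model is deleted. The helper name is ours, not part of nunchaku.

```python
import gc
import torch

def release_model(model):
    """Generic PyTorch cleanup: drop a model and return its memory to the allocator."""
    model.to("cpu")            # move weights off the GPU first
    del model                  # drop this reference (callers must drop theirs too)
    gc.collect()               # collect reference cycles that still hold tensors
    torch.cuda.empty_cache()   # hand cached CUDA blocks back to the driver
    torch.cuda.ipc_collect()   # release memory tied up in CUDA IPC handles
```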
Quality
- Investigate FLUX.1-fill output quality (@lmxyy)
- Resolve quality issues when combining ACE-plus with FLUX.1-fill (@lmxyy)
Installation
Other Fixes & Improvements
- Enable multiple-batch inference (@sxtyzhangzk @Bluear7878, AssertionError: assert image_rotary_emb.shape[2] == batch_size * (txt_tokens + img_tokens) nunchaku#148)
- Improve Hugging Face and ModelScope model documentation (@lmxyy)
- Fix device ID setting (@sxtyzhangzk, Can multi-GPU node chaining be supported so that multi-GPU users can select the corresponding CUDA device? #45)
- Fix `cache_dir` handling in model downloading (@lmxyy, Error during inference: diffusers.configuration_utils.ConfigMixin.load_config() got multiple values for keyword argument 'cache_dir' nunchaku#255); the failure mode is illustrated after this list
- Reset `residual_diff_threshold` in First-Block Cache (@ita9naiwa, Anyway to disable set_attention_impl and apply_cache_on_pipe nunchaku#242)
- Add autotests and deployment CI.
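The `cache_dir` error in nunchaku#255 is the classic duplicate-keyword failure: the loader passes `cache_dir` explicitly while also forwarding a kwargs dict that still contains it. A generic illustration follows; the function name is a stand-in, not a real diffusers API.

```python
# Generic illustration of the duplicate-keyword failure; load_config_like is a
# stand-in name, not a real diffusers function.
def load_config_like(name, cache_dir=None, **kwargs):
    print(name, cache_dir, kwargs)

user_kwargs = {"cache_dir": "/data/hf-cache", "revision": "main"}

# Buggy: cache_dir arrives both explicitly and inside **user_kwargs -> TypeError:
# load_config_like("model", cache_dir="/data/hf-cache", **user_kwargs)

# Fixed: pop it from the forwarded kwargs so it is passed exactly once.
cache_dir = user_kwargs.pop("cache_dir", None)
load_config_like("model", cache_dir=cache_dir, **user_kwargs)
```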
Planned future features
- Wan2.1 support.
- 8-bit model support.
- Operator modularization.