
Conversation

@Bluear7878
Contributor

This PR adds float16 support for the v2 Flux model.

Key changes

  • Fixed CUDA kernel: Updated the AWQ CUDA kernel to dispatch on the input tensor dtype instead of the scaling_factors dtype
  • Added dtype utilities: Created nunchaku/dtype_utils.py for centralized dtype management and automatic GPU-based dtype selection (see the sketch after this list)
  • Test coverage: Added float16/bfloat16 parametrized tests in test_flux_dev.py
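
A minimal sketch of the dtype-selection idea, assuming a helper lives in nunchaku/dtype_utils.py; the function name and exact policy here are illustrative, not the PR's verbatim API:

import torch

def select_torch_dtype(device_index: int = 0) -> torch.dtype:
    # Prefer bfloat16 where it is natively supported (Ampere and newer, SM >= 8.0);
    # fall back to float16 on older GPUs such as Turing (SM 7.5).
    if not torch.cuda.is_available():
        return torch.float32
    major, _ = torch.cuda.get_device_capability(device_index)
    return torch.bfloat16 if major >= 8 else torch.float16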

@lmxyy lmxyy changed the title v2 float16 support. feat: Python Backend Float16 Support for Turing GPUs Sep 12, 2025
@Bluear7878 Bluear7878 closed this Sep 12, 2025
@Bluear7878 Bluear7878 reopened this Sep 12, 2025
@Juste-Leo2

Would it be possible to also add support for the Pascal architecture? Even if the computation speed is slower (since int4 computation isn’t supported), could an on-the-fly conversion to Float16 be performed?


precision = get_precision() # auto-detect your precision is 'int4' or 'fp4' based on your GPU

torch_dtype = torch.float16 # Auto-selects bfloat16 on Ampere+ GPUs, float16 otherwise
Collaborator


this variable seems to have no use

@lmxyy
Collaborator

lmxyy commented Sep 13, 2025

I think you can choose the torch_dtype by passing torch_dtype in

f"nunchaku-tech/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
.
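
A hedged example of what that could look like, assuming the quantized Flux transformer is loaded through nunchaku's from_pretrained and that it accepts torch_dtype (the class name below is an assumption, not confirmed by this thread):

import torch
from nunchaku import NunchakuFluxTransformer2dModel  # exact class name assumed
from nunchaku.utils import get_precision

precision = get_precision()  # 'int4' or 'fp4' depending on the GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"nunchaku-tech/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors",
    torch_dtype=torch.float16,  # float16 for Turing, bfloat16 on Ampere and newer
)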

@Zhenyi-Wang

Hi, is this for Flux only, or for others like Qwen-Image as well?

@Bluear7878
Contributor Author

@lmxyy Thank you for the helpful suggestion. It allowed me to fix the torch_dtype related code.
@Zhenyi-Wang Not directly at the moment, but it should be applicable to qwen-image after some minor tweaks. I'll look into it soon.

@Zhenyi-Wang

@lmxyy Thank you for the helpful suggestion. It allowed me to fix the torch_dtype related code. @Zhenyi-Wang Not directly at the moment, but it should be applicable to qwen-image after some minor tweaks. I'll look into it soon.

That would be fantastic! You dalaos (experts) are so great!

Collaborator

@lmxyy lmxyy left a comment


I think you also need to support FP16 for Qwen-Image.

Tensor gemv_awq(
    Tensor _in_feats, Tensor _kernel, Tensor _scaling_factors, Tensor _zeros, int m, int n, int k, int group_size) {
-   return dispatchFloat16(_scaling_factors.scalar_type(), [&]<typename half_t>() {
+   return dispatchFloat16(_in_feats.scalar_type(), [&]<typename half_t>() {
Collaborator


Why do you need to change this?

transformer.load_state_dict(converted_state_dict)

# Convert all quantization buffers to model dtype
convert_awq_buffers_to_dtype(transformer, transformer._dtype)
Collaborator


Why do we need to convert the dtype? We can initialize the modules with the correct dtype by passing torch_dtype in _patch_model.

# Load the wtscale from the converted state dict.
# Match floating-point tensors to model dtype
v = converted_state_dict[k]
if isinstance(v, torch.Tensor) and v.is_floating_point():
Collaborator


Have you tested this with the FP4 models? This doesn't look correct.

@Bluear7878 Bluear7878 closed this Sep 15, 2025
@Zhenyi-Wang

Closed?

@Bluear7878 Bluear7878 reopened this Sep 25, 2025
@Bluear7878
Contributor Author

Bluear7878 commented Sep 25, 2025

[Screenshot: logged tensor values exceeding the float16 maximum]

I have identified critical overflow errors and image quality degradation when using low-precision quantization schemes (FP4, INT4) with torch.float16. The issues seem to originate from intermediate tensor values exceeding the maximum representable range of float16, leading to either crashes or corrupted outputs.

  1. FP4 + Float16 Causes Overflow:

    • When employing FP4 quantization, operations consistently result in an overflow error when float16 is used for computations. This suggests that the dynamic range of the model's activations or weights is too wide for the limited range of float16.
  2. Qwen-Image Model Overflow with INT4 + Float16:

    • A similar overflow issue was observed specifically with the Qwen-Image model using INT4 quantization and float16.
    • As shown in the attached screenshot, the values significantly exceed the maximum limit of float16 (~65,504), confirming that an overflow is occurring (a minimal illustration of this limit follows this list).
  3. Image Degradation in Flux Model with INT4 + Float16:

    • While the combination of INT4 quantization, float16, and the Flux model is operational and does not crash, it produces severe image degradation.
    • This is likely due to aggressive clipping, where out-of-range values are forced to the maximum/minimum float16 value instead of throwing an error. This process leads to a significant loss of information and corrupts the final generated image.
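
For reference, a tiny standalone snippet (not from the PR) showing the range difference that drives the overflows described above:

import torch

x = torch.tensor([30000.0, 70000.0, 1.0e6])
print(x.to(torch.float16))   # anything past ~65,504 overflows to inf in float16
print(x.to(torch.bfloat16))  # bfloat16 keeps these values finite, trading away mantissa precision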

@Zhenyi-Wang

As a 2080ti user, I really appreciate your work on this float16 support—could this PR also address issue #492, or would it be possible to tackle that together if not?

@Zhenyi-Wang

As a 2080ti user, I really appreciate your work on this float16 support—could this PR also address issue #492, or would it be possible to tackle that together if not?

Sorry I mean #420

@Bluear7878
Contributor Author

As a 2080ti user, I really appreciate your work on this float16 support—could this PR also address issue #492, or would it be possible to tackle that together if not?

Sorry I mean #420

With this PR, using float16 no longer causes any code-level errors.
However, if the model's required numerical range significantly exceeds what float16 can represent, it will likely result in a black screen due to overflow issues. The Flux model, on the other hand, does generate images successfully with int4+float16, albeit with somewhat degraded quality.
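
A small generic check (an illustration only, not part of this PR) for confirming whether a black output really is float16 overflow, by scanning an intermediate tensor for non-finite values:

import torch

def report_non_finite(name: str, t: torch.Tensor) -> None:
    # Count inf/NaN entries and report the largest magnitude seen.
    bad = (~torch.isfinite(t)).sum().item()
    if bad:
        print(f"{name}: {bad} non-finite values, max |x| = {t.abs().max().item()}")

For example, calling report_non_finite("latents", latents) after a denoising step would flag the overflow before the decoded image turns black.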

@senb-ent

senb-ent commented Oct 14, 2025

I am trying to use this branch.
I compiled the branch with Docker based on Dockerfile.torch28.

Then I tried running qwen-image-lightning.py
but got this error:
TypeError: Qwen2_5_VLForConditionalGeneration.__init__() got an unexpected keyword argument 'offload_state_dict'

My GPU: 2080Ti

@Bluear7878 Do you know what's not working?

Thanks in advance

full log:

  warnings.warn("[Nunchaku] Device does not support bfloat16; falling back to float16.")
The config attributes {'pooled_projection_dim': 768} were passed to NunchakuQwenImageTransformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.

Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]`torch_dtype` is deprecated! Use `dtype` instead!

Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/nunchaku/qwen-image-lightning.py", line 43, in <module>
    pipe = QwenImagePipeline.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nunchaku/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nunchaku/lib/python3.11/site-packages/diffusers/pipelines/pipeline_utils.py", line 1025, in from_pretrained
    loaded_sub_model = load_sub_model(
                       ^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nunchaku/lib/python3.11/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 849, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nunchaku/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nunchaku/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4974, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Qwen2_5_VLForConditionalGeneration.__init__() got an unexpected keyword argument 'offload_state_dict'

ERROR conda.cli.main_run:execute(127): `conda run /bin/bash -c python qwen-image-lightning.py` failed. (See above for error)
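
The TypeError above most likely comes from a diffusers/transformers version mismatch (diffusers forwards offload_state_dict, which newer transformers no longer accepts) rather than from this branch itself. One possible way to sidestep it, shown as an untested sketch with the repo id and subfolder assumed and the nunchaku transformer setup omitted, is to load the text encoder manually so diffusers skips its own loading path for that component:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from diffusers import QwenImagePipeline

# Repo id and subfolder are assumptions; newer transformers prefers dtype= over
# torch_dtype= (see the deprecation warning in the log above).
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen-Image", subfolder="text_encoder", torch_dtype=torch.float16
)
pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image", text_encoder=text_encoder, torch_dtype=torch.float16
)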

@otherV

otherV commented Nov 8, 2025

Any update on this? nunchaku-t5 does not work on Turing GPUs yet.

@silent-rain

Looking forward to support for the 20 series.

@Ph0rk0z

Ph0rk0z commented Nov 29, 2025

I am using models on a 2080 in ComfyUI. Does this only fix the examples? They don't get NaNs and work alright.

@otherV

otherV commented Nov 29, 2025

@Ph0rk0z This is supposed to bring AWQ kernel support to Turing GPUs, so that we can use models like nunchaku-t5 (awq-int4-flux.1-t5xxl.safetensors). I tried this PR; it didn't work. If you are using models that don't depend on AWQ, everything works.

@Ph0rk0z

Ph0rk0z commented Nov 29, 2025

I thought all of nunchaku used AWQ kernels, and that what was really missing was the MMQ workarounds in them. I never tried to run T5 with this though, only full models like Flux, etc.

@sarim

sarim commented Dec 17, 2025

As a 2080ti user, I really appreciate your work on this float16 support—could this PR also address issue #492, or would it be possible to tackle that together if not?

Sorry I mean #420

With this PR, using float16 no longer causes any code-level errors. However, if the model's required numerical range significantly exceeds what float16 can represent, it will likely result in a black screen due to overflow issues. The Flux model, on the other hand, does generate images successfully with int4+float16, albeit with somewhat degraded quality.

@Bluear7878 So is there any clever workaround to make Turing work with existing model files? I know almost nothing about the underlying science here, but the only thing I can think of is to quantize the models again with custom logic for Turing, where all weights would be scaled down to fit in the float16 range. Is that the only way forward?

@Bluear7878
Contributor Author

@senb-ent @otherV

Sorry for the late answer. I will take care of this issue.

@sarim
I believe the approach you mentioned could be used to solve this problem, but the goal of nunchaku is to support acceleration without modifying the existing model weights.
I've mentioned this issue to lmxyy and we are discussing it, so please wait patiently.
