feat: Python Backend Float16 Support for Turing GPUs #686
base: main
Conversation
Would it be possible to also add support for the Pascal architecture? Even if the computation speed is slower (since int4 computation isn't supported), could an on-the-fly conversion to Float16 be performed?
examples/v1/flux.1-dev_v2.py
Outdated
precision = get_precision()  # auto-detect your precision is 'int4' or 'fp4' based on your GPU
torch_dtype = torch.float16  # Auto-selects bfloat16 on Ampere+ GPUs, float16 otherwise
this variable seems to have no use
I think you can use the approach at nunchaku/examples/v1/flux1-dev.py, line 9 in 5809e9f.
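For readers following along, the kind of dtype auto-selection being suggested might look like the sketch below. The compute-capability cutoff and the variable name are assumptions on my part, not necessarily what flux1-dev.py line 9 actually does.

```python
import torch

# Hedged sketch: choose bfloat16 on GPUs with native bf16 support
# (Ampere and newer, compute capability >= 8.0) and fall back to
# float16 on older architectures such as Turing (sm_75).
# The cutoff and naming here are illustrative assumptions.
major, _minor = torch.cuda.get_device_capability()
torch_dtype = torch.bfloat16 if major >= 8 else torch.float16
```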
Hi, is this for Flux only, or for others like Qwen-Image as well?
@lmxyy Thank you for the helpful suggestion. It allowed me to fix the torch_dtype-related code.
That would be fantastic! You experts are so great!
lmxyy left a comment
I think you also need to support FP16 Qwen-Image.
src/kernels/awq/gemv_awq.cu
Outdated
Tensor gemv_awq(
    Tensor _in_feats, Tensor _kernel, Tensor _scaling_factors, Tensor _zeros, int m, int n, int k, int group_size) {
-   return dispatchFloat16(_scaling_factors.scalar_type(), [&]<typename half_t>() {
+   return dispatchFloat16(_in_feats.scalar_type(), [&]<typename half_t>() {
Why do you need to change this?
transformer.load_state_dict(converted_state_dict)

# Convert all quantization buffers to model dtype
convert_awq_buffers_to_dtype(transformer, transformer._dtype)
Why do we need to convert the dtype? We can initialize the modules with the correct dtype by passing torch_dtype in _patch_model.
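To illustrate the distinction the reviewer is drawing, here is a minimal sketch of the two approaches. The helper names and module construction below are stand-ins, not nunchaku's actual _patch_model or convert_awq_buffers_to_dtype.

```python
import torch
import torch.nn as nn

# Option A (what the diff above does): load weights first, then walk the
# model and cast every floating-point buffer to the target dtype.
def convert_buffers_to_dtype(model: nn.Module, dtype: torch.dtype) -> None:
    for _name, buf in model.named_buffers():
        if buf.is_floating_point():
            buf.data = buf.data.to(dtype)

# Option B (the reviewer's suggestion, sketched): construct each module in
# the target dtype from the start, so no post-hoc conversion pass is needed.
def build_linear_with_dtype(in_features: int, out_features: int,
                            dtype: torch.dtype) -> nn.Linear:
    return nn.Linear(in_features, out_features, dtype=dtype)
```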
# Load the wtscale from the converted state dict.
# Match floating-point tensors to model dtype
v = converted_state_dict[k]
if isinstance(v, torch.Tensor) and v.is_floating_point():
Have you tested this on the FP4 models? This doesn't look correct.
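For context, the loop excerpted above appears to cast floating-point entries of the converted state dict to the model dtype. A hedged sketch of that idea is below (the cast target is an assumption); the reviewer's concern is that for FP4 checkpoints some of these tensors may need to keep their original dtype.

```python
import torch

# Hedged sketch of the dtype-matching idea from the diff above: cast every
# floating-point tensor in the converted state dict to the model dtype.
# For FP4 checkpoints, scale/auxiliary tensors stored in a specific dtype
# may need to be excluded from this blanket cast.
def match_state_dict_dtype(converted_state_dict: dict,
                           model_dtype: torch.dtype) -> dict:
    for k, v in converted_state_dict.items():
        if isinstance(v, torch.Tensor) and v.is_floating_point():
            converted_state_dict[k] = v.to(model_dtype)
    return converted_state_dict
```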
Closed?
As a 2080 Ti user, I really appreciate your work on this float16 support. Could this PR also address issue #492, or would it be possible to tackle that together if not?
With this PR, using float16 no longer causes any code-level errors.
I am trying to use this branch. Then I tried running it. My GPU: 2080 Ti. @Bluear7878 Do you know what's not working? Thanks in advance. Full log:
Any update on this? nunchaku-t5 does not work on Turing GPUs yet.
Looking forward to support for the 20 series.
I am using models on a 2080 in ComfyUI. Does this only fix the examples? They don't get NaNs and work alright.
@Ph0rk0z This is supposed to bring AWQ kernel support to Turing GPUs, so that we can use models like nunchaku-t5 (awq-int4-flux.1-t5xxl.safetensors). I tried this PR; it didn't work. If you are using models that don't depend on AWQ, everything works.
I thought all of nunchaku uses AWQ kernels, and what was really missing was the MMQ workarounds in them. I never tried to run T5 with this, though; only full models like Flux, etc.
@Bluear7878 So is there any clever workaround to make Turing work with existing model files? I know almost nothing about the underlying science here, but the only thing I can think of is to quantize the models again with custom logic for Turing, where all weights would be scaled down to fit in the float16 range. Is that the only way forward?
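For what it's worth, the "scale the weights down so they fit in float16" idea from the question above could look like the sketch below. This is purely illustrative and not how nunchaku actually quantizes or converts models.

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

# Hypothetical sketch: if a weight tensor has values outside float16 range,
# fold an extra scale factor into it so the cast does not overflow, and
# return that factor so outputs can be rescaled afterwards.
def fit_to_float16(weight: torch.Tensor) -> tuple[torch.Tensor, float]:
    peak = weight.abs().max().item()
    scale = 1.0
    if peak > FP16_MAX:
        scale = peak / FP16_MAX
        weight = weight / scale
    return weight.to(torch.float16), scale
```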
Sorry for the late answer. I will take care of this issue. @sarim

This is a PR that adds float16 support for the v2 Flux model.
Key change