Enable AWQ on Intel GPU. #2248
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2248. Note: links to docs will display an error until the docs builds have completed.
✅ No Failures as of commit f60041c with merge base d963a88. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@liangan1 Can you help review this PR?
How about perplexity on CUDA?
LGTM
@@ -429,15 +428,14 @@ def get_plain(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        # TODO: move this to `unpack_tinygemm_scales_and_zeros`?
        scale = scale.reshape(scale.shape[:-1]).contiguous()
        zero = zero.reshape(zero.shape[:-1]).contiguous()
-       int_data = quantize_affine(
+       int_data = quantize_affine_float_zero_point(
This is actually specific to fbgemm, I think; we'd need to rename it in a future PR. cc @jainapurva
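For context, a minimal sketch of the two zero-point conventions being discussed. This is illustrative tensor math only, not the torchao implementation; the function names here are made up for the example.

```python
import torch

# Integer zero point: the zero point lives in the quantized (integer) domain,
# so dequantization is (q - zero_point) * scale.
def quantize_int_zp(x, scale, zero_point, quant_min=0, quant_max=15):
    return torch.clamp(torch.round(x / scale) + zero_point, quant_min, quant_max).to(torch.uint8)

def dequantize_int_zp(q, scale, zero_point):
    return (q.to(torch.float32) - zero_point) * scale

# Float zero point (the tinygemm-style path that quantize_affine_float_zero_point
# serves): the zero point lives in the float domain, so dequantization is
# (q - mid_point) * scale + zero.
def quantize_float_zp(x, scale, zero, quant_min=0, quant_max=15):
    mid_point = (quant_max + quant_min + 1) / 2
    return torch.clamp(torch.round((x - zero) / scale) + mid_point, quant_min, quant_max).to(torch.uint8)

def dequantize_float_zp(q, scale, zero, quant_min=0, quant_max=15):
    mid_point = (quant_max + quant_min + 1) / 2
    return (q.to(torch.float32) - mid_point) * scale + zero

# Round-trip sanity check on random data.
x = torch.randn(4, 8)
scale, zero = x.abs().amax() / 7.5, x.mean()
approx = dequantize_float_zp(quantize_float_zp(x, scale, zero), scale, zero)
```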
torchao/prototype/awq/api.py
Outdated
if "xpu" in device.type: | ||
_layout = Int4XPULayout() | ||
else: | ||
_layout = TensorCoreTiledLayout(inner_k_tiles=8) |
can layout be explicitly passed in instead of inferred from device?
I think it should be OK. We should follow Int4WeightOnlyConfig and let the user specify the layout information.
Yes, modified; done.
torchao/prototype/awq/api.py
Outdated
@@ -114,6 +116,7 @@ class AWQUIntXConfig(AOBaseConfig):
    group_size: int = 64
    use_hqq: bool = False
    set_inductor_config: bool = True
+   zero_point_domain: Optional[ZeroPointDomain] = ZeroPointDomain.FLOAT
can this be removed if we have layout?
Yes, I agree. Following the logic of #2149, preserve_zero and zero_point_domain are too complex to expose in the user-facing UX. It is better to use the layout to decide the zero_point_domain information.
Yes, modified; done.
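To make the agreed direction concrete, a hypothetical usage sketch: the layout is passed in explicitly (mirroring Int4WeightOnlyConfig) and implies the zero-point handling, so no zero_point_domain field is needed. The quant_dtype and layout field names are assumptions for illustration, not the final API.

```python
import torch
from torchao.dtypes import Int4XPULayout, TensorCoreTiledLayout
from torchao.prototype.awq.api import AWQUIntXConfig

# Pick the layout explicitly instead of inferring it from the device.
if torch.xpu.is_available():
    layout = Int4XPULayout()  # Intel GPU int4 layout (integer zero point)
else:
    layout = TensorCoreTiledLayout(inner_k_tiles=8)  # tinygemm (float zero point)

# Hypothetical config; field names assumed to mirror Int4WeightOnlyConfig.
config = AWQUIntXConfig(
    quant_dtype=torch.uint4,
    group_size=64,
    layout=layout,
)
```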
torchao/prototype/awq/example.py
Outdated
from torchao.dtypes import Int4XPULayout
...
zero_point_domain_dict = {"float": ZeroPointDomain.FLOAT, "int": ZeroPointDomain.INT, "none": ZeroPointDomain.NONE}
FYI, we used to use this to distinguish between different types of kernels, but now we keep integer zero point with preserve_zero as the default/common path, and split out the other q/dq ops for specific kernels like tinygemm: #2149
I think these are just different ways to implement things, and we don't necessarily need categorizations like zero_point_domain and preserve_zero since they might complicate the UX.
Yes, modified; done.
I feel we can use layout as a user-facing interface.
@@ -473,6 +459,8 @@ def groupwise_affine_quantize_tensor_from_qparams(
        not (check_xpu_version(int_data.device))
    ):
        int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)
+   if check_xpu_version(int_data.device):
should probably encapsulate these better when we have a better design for layout conversions: #2249
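For readers following along, a sketch of what the existing packing line in this hunk does on non-XPU devices. The XPU branch added by this PR packs differently, and its body is not shown above, so only the visible path is illustrated here.

```python
import torch

# Pack pairs of int4 values along the last dimension into single uint8 bytes:
# high nibble = even column, low nibble = odd column.
int_data = torch.randint(0, 16, (4, 8), dtype=torch.int32)
packed = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)

# Unpacking recovers the original values.
high = (packed >> 4).to(torch.int32)
low = (packed & 0xF).to(torch.int32)
unpacked = torch.stack([high, low], dim=-1).reshape(int_data.shape)
assert torch.equal(unpacked, int_data)
```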
torchao/prototype/awq/api.py
Outdated
@@ -5,6 +5,7 @@
 # LICENSE file in the root directory of this source tree.
 import types
 from dataclasses import dataclass
+from typing import Any, Callable, Dict, Optional, Tuple, Union
Suggested change:
-from typing import Any, Callable, Dict, Optional, Tuple, Union
+from typing import Optional
done
@pytorchbot label topic: new feature
Didn't find following labels among repository labels: topic:,new,feature
Force-pushed from 2617ab1 to 4c8f036. Force-pushed from 4c8f036 to 4f11061.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Following the request in pytorch/pytorch#153019, we enable awq-uint4 for Intel GPU in pytorch/ao now that RTN is ready.
How to run AWQ quantization on a model:
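The original command block was not captured on this page. As a hedged sketch, the calibrate-then-quantize flow via the prototype API looks roughly like the following; the names follow torchao/prototype/awq, but treat the exact signatures and keyword arguments as assumptions.

```python
import torch
from torchao.quantization import quantize_
from torchao.prototype.awq import AWQObservedLinear, AWQUIntXConfig, insert_awq_observer_

device = "xpu" if torch.xpu.is_available() else "cuda"
model = torch.nn.Sequential(torch.nn.Linear(512, 512)).to(device)  # stand-in for the LLM

# 1) Insert AWQ observers so calibration can record activation statistics.
insert_awq_observer_(
    model,
    n_validation_examples=10,
    validation_sequence_len=512,
    quant_dtype=torch.uint4,
    group_size=64,
)

# 2) Calibrate by running sample data through the model.
for _ in range(10):
    model(torch.randn(1, 512, device=device))

# 3) Replace observed linears with AWQ-quantized int4 weights.
is_observed_linear = lambda m, fqn: isinstance(m, AWQObservedLinear)
quantize_(model, AWQUIntXConfig(quant_dtype=torch.uint4, group_size=64), is_observed_linear)
```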
# Results of meta-llama/Llama-3.1-8B-Instruct on Intel GPU:
{'perplexity': {'perplexity': 10.099576950073242, 'prediction_time': 0.20489671968780787}}
# Results of meta-llama/Llama-3.1-8B-Instruct on NVIDIA A100 GPU:
{'perplexity': {'perplexity': 10.160041809082031, 'prediction_time': 0.4466673863672577}}