[CPU] Fix registration of int4wo linear implementation on CPU #1578

Xia-Weiwen · 2025-01-17T09:00:03Z

Summary
Int4wo on CPU does not dispatch to the expected mm op (torch.ops.aten._weight_int4pack_mm_for_cpu). This appears to be a regression introduced by an earlier refactoring of the related code. This PR fixes it by registering a linear implementation for Int4CPULayout that calls torch.ops.aten._weight_int4pack_mm_for_cpu for the computation. The new implementation is enabled for torch>=2.6 and does not require bfloat16: it supports fp32, fp16, and bf16 for both weight and activation.
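The fix is a per-layout registration: a linear implementation is registered for Int4CPULayout and enabled only behind a torch>=2.6 gate. The pure-Python sketch below models that idea with a hypothetical dispatch table; `FakeQuantizedWeight`, `register_linear_dispatch`, `quantized_linear`, and `version_at_least` are illustrative names, not torchao APIs. It also assumes, for illustration only, that a weight with no matching registration falls back to dequantize + plain matmul, which is one plausible way to end up with a graph that ends in a stray aten.mm.

```python
# Hypothetical sketch of a layout-keyed linear dispatch table. None of these
# names are torchao APIs; they only model "register an impl for a layout".
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FakeQuantizedWeight:
    layout: str  # e.g. "Int4CPULayout"
    device: str  # e.g. "cpu"


Condition = Callable[[FakeQuantizedWeight], bool]
Impl = Callable[[FakeQuantizedWeight], str]

# Ordered (condition, impl) pairs; the first matching condition wins.
_LINEAR_DISPATCH: List[Tuple[Condition, Impl]] = []


def register_linear_dispatch(condition: Condition, impl: Impl) -> None:
    _LINEAR_DISPATCH.append((condition, impl))


def quantized_linear(weight: FakeQuantizedWeight) -> str:
    """Return the name of the op that would run for this weight."""
    for condition, impl in _LINEAR_DISPATCH:
        if condition(weight):
            return impl(weight)
    # Assumed fallback when nothing is registered: dequantize, then plain mm.
    return "dequantize + aten.mm"


def version_at_least(version: str, minimum: Tuple[int, int]) -> bool:
    """Compare the major.minor part of a version string such as "2.6.0+cpu"."""
    major, minor = (int(part) for part in version.split("+")[0].split(".")[:2])
    return (major, minor) >= minimum


TORCH_VERSION = "2.6.0"  # hypothetical; real code would inspect torch.__version__

if version_at_least(TORCH_VERSION, (2, 6)):
    register_linear_dispatch(
        lambda w: w.layout == "Int4CPULayout" and w.device == "cpu",
        lambda w: "aten._weight_int4pack_mm_for_cpu",
    )
```

With the registration in place, `quantized_linear(FakeQuantizedWeight("Int4CPULayout", "cpu"))` returns "aten._weight_int4pack_mm_for_cpu". If `TORCH_VERSION` were "2.5.1", nothing would be registered and the same call would return the assumed fallback, which is the kind of silent misrouting this PR describes.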

Test plan

python test/quantization/test_quant_api.py -k test_int4wo_cpu

pytorch-bot · 2025-01-17T09:00:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1578


✅ No Failures

As of commit 6783a22 with merge base de5c6e1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel · 2025-01-18T11:17:07Z

test/quantization/test_quant_api.py

+        if x_dim == 3:
+            example_inputs = (example_inputs[0].unsqueeze(0),)
+
+        with torch.no_grad(), torch.autocast(


Since the model and inputs have already been converted to the target data type, why do we still need to enable autocast here?
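The test diff above also feeds 3D activations (`x_dim == 3`) into the quantized linear. Assuming, as the "Fix bug for 3d input" commit in this PR suggests, that the int4 mm kernel takes a 2D activation, the usual approach is to fold the leading dimensions into rows (in PyTorch, `x.reshape(-1, x.shape[-1])`), run the 2D kernel, and reshape the result back. The pure-Python sketch below shows that round trip; `mm_2d` and `linear_3d` are hypothetical names, not torchao functions.

```python
# Hypothetical sketch: let a 2D-only matmul accept 3D activations by folding
# the leading (batch, seq) dims into rows and unfolding them afterwards.
from typing import List

Matrix = List[List[float]]


def mm_2d(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a kernel that only takes 2D inputs: (M, K) @ (K, N)."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)] for row in a]


def linear_3d(x: List[Matrix], w_t: Matrix) -> List[Matrix]:
    """x: (B, S, K), w_t: (K, N) -> (B, S, N)."""
    batch, seq = len(x), len(x[0])
    rows = [row for mat in x for row in mat]  # fold to (B * S, K)
    y = mm_2d(rows, w_t)  # (B * S, N)
    return [y[b * seq:(b + 1) * seq] for b in range(batch)]  # unfold to (B, S, N)
```

A 2D input passed through `unsqueeze(0)`, as in the test, is simply the `B == 1` case of this round trip.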

leslie-fang-intel · 2025-01-18T11:27:43Z

torchao/dtypes/uintx/tensor_core_tiled_layout.py

-        y = torch.ops.aten._weight_int4pack_mm(
-            act_mat.contiguous(), packed_weight, groupsize, scale_and_zero
-        )
+    y = torch.ops.aten._weight_int4pack_mm(


Why do we change _weight_int4pack_mm_for_cpu to _weight_int4pack_mm? I remember that _weight_int4pack_mm isn't registered for CPU.

In this PR, @Xia-Weiwen moved the CPU implementation to torchao/dtypes/uintx/int4_cpu_layout.py, where support for more activation dtypes is also being added.

This code path is now CUDA-only.

sanchitintel · 2025-01-18T20:47:42Z

Int4wo on CPU does not run into expected mm op (torch.ops.aten._weight_int4pack_mm_for_cpu)

The FX IR pattern you shared with me after running a toy model did contain a call to torch.ops.aten._weight_int4pack_mm_for_cpu, but the surrounding pattern was very odd (it even had an aten.mm call at the end), so something is indeed broken.
If we had run an LLM with torchchat using int4 WoQ and simply searched the model's FX graph for torch.ops.aten._weight_int4pack_mm_for_cpu, we probably would not have discovered this issue.

Is it possible to add a unit test (UT) for a small model that uses torch.compile and also checks whether the pattern is as expected?
Perhaps the corresponding UT in test/integration/test_integration.py could be modified.
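The point above is that searching for the expected op alone is not enough, because the broken graph also contained it. A pattern check therefore needs two assertions: the int4 kernel is present, and no plain aten.mm appears. The helper below is a hypothetical pure-Python sketch of that check on generated code text; the exact op spelling in generated code is an assumption, and in a real UT `torch.testing.FileCheck` with `check` / `check_not` could play the same role.

```python
# Hypothetical helper: check compiled/FX code text for the expected int4 CPU
# kernel AND for the absence of a generic matmul fallback.
import re

EXPECTED_OP = "torch.ops.aten._weight_int4pack_mm_for_cpu"
# Matches "aten.mm" as a whole op name (e.g. "aten.mm.default"), but not
# longer names such as "aten.mm_..." or "_weight_int4pack_mm".
FALLBACK_MM = re.compile(r"\baten\.mm\b")


def check_int4_cpu_pattern(code: str) -> None:
    if EXPECTED_OP not in code:
        raise AssertionError(f"{EXPECTED_OP} not found in generated code")
    if FALLBACK_MM.search(code):
        raise AssertionError("unexpected aten.mm found alongside the int4 kernel")
```

The absence check only makes sense for a toy model in which every linear is int4-quantized; a model with unquantized linears could legitimately contain aten.mm.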

Thanks!

facebook-github-bot added the CLA Signed label Jan 17, 2025

[CPU] Fix registration of int4wo linear implementation on CPU

708d770

Xia-Weiwen force-pushed the weiwen/fix_woq_int4 branch from 30570e4 to 708d770 January 17, 2025 09:02

Xia-Weiwen added the topic: not user facing and topic: bug fix labels Jan 17, 2025

Xia-Weiwen added 4 commits January 17, 2025 01:16

Fix format issues

579002a

Fix format issues (2)

f3e3d4e

Fix bug for 3d input

0777db6

fix format issue

6783a22

Xia-Weiwen requested a review from leslie-fang-intel January 18, 2025 09:43

leslie-fang-intel reviewed Jan 18, 2025
