[TorchAO] Int4XPULayout Lacks MoE Quant Support #1913

### 🚀 The feature, motivation and pitch

Currently, `int4_xpu_layout` assumes that `int_data` is 2-D, but in the MoE case `int_data` is 3-D. As a result, calling `_convert_weight_to_int4pack` fails with an assertion error:

```Bash
_convert_weight_to_int4pack_xpu : expect weight to be 2D tensor.
```
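To make the shape mismatch concrete, here is a minimal sketch. It assumes the MoE expert weights are stacked as `[num_experts, out_features, in_features]`, with one int4 value per `int32` element before packing. `require_2d` is a hypothetical stand-in that reproduces only the XPU op's dimensionality check, not its packing.

```python
import torch

# Assumed MoE layout: one 2-D weight per expert, stacked along dim 0.
num_experts, out_features, in_features = 4, 8, 16
int_data = torch.randint(0, 16, (num_experts, out_features, in_features), dtype=torch.int32)

def require_2d(w: torch.Tensor) -> torch.Tensor:
    # Hypothetical: mirrors only the "expect weight to be 2D tensor" check.
    if w.dim() != 2:
        raise RuntimeError("expect weight to be 2D tensor")
    return w

try:
    require_2d(int_data)  # the stacked 3-D tensor is rejected
    rejected = False
except RuntimeError:
    rejected = True

# Slicing along the expert dimension yields valid 2-D inputs.
per_expert = [require_2d(int_data[e]) for e in range(num_experts)]
```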

### Solution
The current `Int4XPULayout` packing code handles only the 2-D case; see:

https://github.com/pytorch/ao/blob/1493b15f65917477ce37abb94365b262cc3b1d95/torchao/dtypes/uintx/int4_xpu_layout.py#L255-L260

We need logic similar to the CUDA path, roughly:

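```Python
import torch

def quant_2d(int_data_2d):
    # Pack a single 2-D int4 weight using the existing XPU path
    # (the current _convert_weight_to_int4pack call).
    ...

if int_data.dim() == 3:  # MoE case: one 2-D weight per expert along dim 0
    packed_weight = torch.stack([quant_2d(w) for w in int_data])
else:  # normal 2-D case
    packed_weight = quant_2d(int_data)
```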
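A runnable sketch of the proposed dispatch, assuming it mirrors the CUDA path: pack each expert's 2-D slice independently, stack the results along a new expert dimension, and give `scale` / `zero_point` an expert dimension as well. The nibble packing inside `quant_2d` is a hypothetical CPU stand-in for the XPU `_convert_weight_to_int4pack` call; `pack_weight` is likewise a hypothetical name, not a torchao API.

```python
import torch

def quant_2d(int_data_2d: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the XPU packing op: store two 4-bit values
    # per uint8 along the last dim (even column in the high nibble).
    assert int_data_2d.dim() == 2, "expect weight to be 2D tensor"
    hi = int_data_2d[:, ::2] << 4
    lo = int_data_2d[:, 1::2]
    return (hi | lo).to(torch.uint8)

def pack_weight(int_data, scale, zero_point):
    if int_data.dim() == 3:  # MoE case: [num_experts, N, K]
        num_experts, n = int_data.shape[0], int_data.shape[-2]
        # Pack each expert separately, then stack along dim 0.
        packed = torch.stack([quant_2d(int_data[e]) for e in range(num_experts)])
        scale = scale.reshape(num_experts, n, -1)
        if zero_point is not None:
            zero_point = zero_point.reshape(num_experts, n, -1)
    else:  # normal case: [N, K]
        assert int_data.dim() == 2
        packed = quant_2d(int_data)
        scale = scale.reshape(int_data.shape[0], -1)
        if zero_point is not None:
            zero_point = zero_point.reshape(int_data.shape[0], -1)
    return packed, scale, zero_point
```

Looping over experts keeps each call identical to the existing 2-D path, so no change to the XPU kernel is needed; a batched kernel could replace the loop later if packing time matters.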

See the CUDA implementation (`tensor_core_tiled_layout.py`) at:
https://github.com/pytorch/ao/blob/418593c0e903f2b76072cc75a3010b3ef5396a20/torchao/dtypes/uintx/tensor_core_tiled_layout.py#L289


### Affected Test Cases

Four test cases are affected:

```Bash
cd ao/test/quantization
python -m pytest -sv -k test_int4wo test_moe_quant.py

FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_base_0_single_token
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_base_1_multiple_tokens
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_fake_dim_0_single_token
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_fake_dim_1_multiple_tokens
```
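Beyond the four MoE tests, a small unit test could check that per-expert packing is lossless. This is a sketch under stated assumptions: `pack_2d` and `unpack_2d` are hypothetical CPU stand-ins for the XPU packing op (two int4 values per byte, even column in the high nibble), not torchao functions.

```python
import torch

def pack_2d(w: torch.Tensor) -> torch.Tensor:
    # Hypothetical 2-D-only packer: two 4-bit values per uint8 along the last dim.
    if w.dim() != 2:
        raise RuntimeError("expect weight to be 2D tensor")
    return ((w[:, ::2] << 4) | w[:, 1::2]).to(torch.uint8)

def unpack_2d(p: torch.Tensor) -> torch.Tensor:
    # Inverse of pack_2d: split each byte into its high and low nibbles.
    p = p.to(torch.int32)
    hi, lo = p >> 4, p & 0xF
    # Interleave so column 2j is hi[:, j] and column 2j+1 is lo[:, j].
    return torch.stack([hi, lo], dim=-1).reshape(p.shape[0], -1)

def pack_moe(w: torch.Tensor) -> torch.Tensor:
    # Per-expert packing along dim 0, as proposed for the 3-D (MoE) case.
    return torch.stack([pack_2d(w[e]) for e in range(w.shape[0])])

def test_moe_pack_round_trip():
    torch.manual_seed(0)
    w = torch.randint(0, 16, (3, 8, 32), dtype=torch.int32)
    packed = pack_moe(w)
    assert packed.shape == (3, 8, 16)
    for e in range(w.shape[0]):
        assert torch.equal(unpack_2d(packed[e]), w[e])
```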
