Note
As8Wu4: a configuration where activations are quantized as signed 8-bit integers and weights as unsigned 4-bit integers
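As a rough illustration of the As8Wu4 scheme described above, the sketch below quantizes an activation vector to signed 8-bit (symmetric) and a weight vector to unsigned 4-bit (asymmetric, with a zero point). The function names, the per-tensor granularity, and the rounding details are illustrative assumptions, not this project's actual API.

```python
# Illustrative sketch of As8Wu4 quantization (assumed helpers, not the real API).

def quantize_act_s8(x):
    """Symmetric per-tensor quantization of activations to int8 [-128, 127]."""
    scale = max(abs(v) for v in x) / 127.0 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale

def quantize_weight_u4(w):
    """Asymmetric per-tensor quantization of weights to uint4 [0, 15]."""
    lo, hi = min(w), max(w)
    scale = (hi - lo) / 15.0 or 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(15, round(v / scale) + zero_point)) for v in w]
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map quantized integers back to approximate float values."""
    return [(v - zero_point) * scale for v in q]
```

A GEMM in this configuration multiplies the int8 activation by the uint4 weight in integer arithmetic, then applies the scales (and weight zero point) to recover float output.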
Todo
- [issued] Support As8Wu4 for 128-group quantized weight on CPU
- Support As8Wu4 for per-dimension and per-channel quantized data on CPU: [ cpu_backend ] Enable channelwise 4bit quantized GEMM ( A : F32 -> U8 * W : S4 = O : F32) #3485
- Optimize As8Wu4 for per-dimension and per-channel quantized data on CPU with a NEON kernel: [ cpu_backend ] Optimize qai8dxp-qsi4cxp Matmul and qs4cx_f32 quantization with NEON simd #3497
- Apply offline weight packing, classify kernels per GEMM-dimension configuration, and apply multithreading: [ cpu_backend ] Enable practical use of qsi4cxp_qs4cxs1s0 GEMM with openMP multithreading and automatic ukernel candidate selection #3519
- Support As8Wu4 for 128-group quantized weight on GPU with an OpenCL kernel: OpenCL Kernel for 4-Bit Signed Integer GEMV Computation #3480
- Optimize As8Wu4 for per-dimension and per-channel quantized data on CPU with an AVX2 kernel
- Support As8Wu4 GEMM for 64-group quantized data on CPU fallback
- Optimize As8Wu4 GEMM for 64-group quantized data on CPU with AVX2 kernel
- Support As8Wu4 GEMM for 32-group quantized data on CPU fallback
- Optimize As8Wu4 GEMM for 32-group quantized data on CPU with NEON kernel
- Support As8Wu4 for 64-group quantized Weight on CPU
Issues
1. Supporting As8Wu4 for 128-group quantized weight on CPU might be invalid
- Judging from MLAS and KleidiAI, which are among the major CPU computational backends of OpenVINO, a group-quantization size of 128 is tied to the SIMD register size for optimal performance. As a result, neither AVX2 nor NEON has a kernel for GEMM with 128-group quantized weight matrices; only AVX-512 does.
- Implementing such a kernel would not be impossible, but its performance would likely be suboptimal.
- The problem is that 128-group quantized weights then become quite cumbersome to use on the CPU side, and forcibly falling back to inconsistently quantized weights might harm LLM accuracy.
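To make the "N-group quantized weight" terminology in the issue above concrete, the sketch below splits a weight row into contiguous groups (e.g. of 128 values) and gives each group its own scale and zero point, as opposed to per-channel quantization where one scale covers the whole row. The helper name and the asymmetric-uint4 details are assumptions for illustration only.

```python
# Illustrative group-wise (block) quantization: one (scale, zero_point)
# per contiguous group of `group_size` values along a weight row.

GROUP_SIZE = 128  # the group size discussed above; 64 and 32 appear in the Todo list

def quantize_row_groupwise(row, group_size=GROUP_SIZE):
    """Asymmetric uint4 quantization with independent parameters per group."""
    groups = []
    for start in range(0, len(row), group_size):
        g = row[start:start + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15.0 or 1.0  # fall back to 1.0 for a constant group
        zp = round(-lo / scale)
        q = [max(0, min(15, round(v / scale) + zp)) for v in g]
        groups.append((q, scale, zp))
    return groups
```

Because each group carries its own parameters, a SIMD kernel typically wants the group size to match (a multiple of) its register width, which is why the 128-group case maps naturally onto AVX-512 but not onto AVX2 or NEON.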