[Feature Request] W4A4 Quantization Support in torchao #1406
Open
Description
Dear team,
I would like to inquire about the possibility of W4A4 quantization support in torchao.
torchao has proven to be an excellent tool for quantized inference, particularly with its comprehensive W8A8 support. For 4-bit weights, however, the only implementation I have found is W4A8, which currently uses INT8 GEMM operators under the hood. Given that many modern GPUs support INT4 GEMM operators, with promising results, are there any plans to implement W4A4 in torchao?
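To make the request concrete, here is a minimal NumPy sketch of what I mean by W4A4: both the weights and the activations are quantized to signed 4-bit integers, the matmul accumulates in integers, and the result is rescaled afterwards. This only emulates the arithmetic and is not torchao's API. The names `quantize_int4` and `w4a4_linear` are hypothetical, and the symmetric per-token / per-output-channel scaling is just one possible choice. A real kernel would presumably pack two 4-bit values per byte and dispatch to an INT4 GEMM operator instead of widening to int32.

```python
import numpy as np


def quantize_int4(x, axis):
    """Symmetric quantization to the signed 4-bit range [-8, 7].

    Returns the integer codes (stored in int8, since NumPy has no int4 dtype)
    and the floating-point scale used along `axis`.
    """
    max_abs = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.maximum(max_abs, 1e-8) / 7.0  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale


def w4a4_linear(x, w):
    """Emulated W4A4 linear layer, approximating x @ w.T.

    x: (tokens, in_features) activations, quantized per token.
    w: (out_features, in_features) weights, quantized per output channel.
    """
    xq, x_scale = quantize_int4(x, axis=1)  # x_scale: (tokens, 1)
    wq, w_scale = quantize_int4(w, axis=1)  # w_scale: (out_features, 1)
    # Integer matmul with int32 accumulation (stand-in for an INT4 GEMM).
    acc = xq.astype(np.int32) @ wq.astype(np.int32).T
    # Dequantize: each output element is rescaled by its token and channel scale.
    return acc.astype(np.float32) * x_scale * w_scale.T


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((8, 64)).astype(np.float32)

y_ref = x @ w.T
y_q = w4a4_linear(x, w)
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
print(f"relative error of emulated W4A4 vs. float32: {rel_err:.3f}")
```

The difference from the existing W4A8 path, as I understand it, is only in the activation bit width and in which GEMM operator the integer matmul maps to.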
Thank you for your attention to this matter.
Best regards