
[Feature Request] W4A4 Quantization Support in torchao #1406

Open
@xxw11

Description

Dear team,

I would like to inquire about the possibility of W4A4 quantization support in torchao.

torchao has proven to be an excellent tool for quantized inference, particularly with its comprehensive W8A8 support. For 4-bit operations, however, I have only found a W4A8 implementation, which currently uses INT8 GEMM operators under the hood. Given that many modern GPUs now support INT4 GEMM operators with promising results, are there any plans to implement W4A4 in torchao?
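
To make the request concrete, here is a minimal sketch of what I mean by W4A4: both the weights and the activations are quantized to signed 4-bit integers in [-8, 7], the matrix product is accumulated in integers, and the result is rescaled to floating point. This is plain Python for illustration only. The names `quantize_int4` and `w4a4_linear` are hypothetical and are not torchao APIs, the symmetric per-row scaling is an assumption, and the inner integer loop merely stands in for where a hardware INT4 GEMM kernel would run.

```python
def quantize_int4(row):
    """Symmetrically quantize a list of floats to integers in [-8, 7].

    Returns the quantized values and the scale needed to dequantize them.
    Hypothetical helper for illustration, not a torchao function.
    """
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in row]
    return q, scale


def w4a4_linear(x, w):
    """Compute y = x @ w^T with 4-bit activations and 4-bit weights.

    x: list of activation rows (one scale per row).
    w: list of weight rows, one per output channel (one scale per row).
    """
    xq = [quantize_int4(r) for r in x]
    wq = [quantize_int4(r) for r in w]
    out = []
    for xi, sx in xq:
        row = []
        for wi, sw in wq:
            # Integer accumulation: the part an INT4 GEMM kernel would do.
            acc = sum(a * b for a, b in zip(xi, wi))
            # Rescale the integer accumulator back to floating point.
            row.append(acc * sx * sw)
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[7.0, -7.0, 0.0, 3.0]]
    w = [[1.0, 2.0, 3.0, 7.0]]
    print(w4a4_linear(x, w))
```

In W4A8, by contrast, only the weights would pass through a 4-bit quantizer like the one above, while the activations would stay at 8 bits and the product would run on an INT8 GEMM.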

Thank you for your attention to this matter.

Best regards


Labels: topic: new feature, topic: performance
