[MLAS Feature Request] MatMulNBits faster implementation for fp16 input dtype #27251

Description

@qti-ashimaj

Describe the feature request

Please add support for the HQNBIT_CompInt8 computation path in MatMulNBits for the fp16 input datatype. Currently, MatMulNBits is much slower (~6x) for fp16 than for fp32 on CPU.

Describe scenario use case

For the FP16 input dtype, MatMulNBits always falls back to HQNBIT_CompFp16, because no implementation exists for the HQNBIT_CompInt8 compute type.
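A minimal benchmark sketch of the fp16-vs-fp32 gap, assuming the `com.microsoft` MatMulNBits contrib-op spec (4-bit weights packed as a `[N, K/block_size, block_size/2]` uint8 tensor, `accuracy_level=4` requesting int8 compute). The shapes, block size, and iteration counts below are illustrative, not from the issue:

```python
import time

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper
import onnxruntime as ort

# Hypothetical GEMM shape and 4-bit block-quantization parameters.
M, K, N = 1, 4096, 4096
BITS, BLOCK_SIZE = 4, 32
n_blocks = K // BLOCK_SIZE
blob_size = BLOCK_SIZE * BITS // 8  # bytes of packed weights per block


def make_model(elem_type: int, np_dtype) -> onnx.ModelProto:
    """Build a one-node MatMulNBits model with random packed weights."""
    rng = np.random.default_rng(0)
    a = helper.make_tensor_value_info("A", elem_type, [M, K])
    y = helper.make_tensor_value_info("Y", elem_type, [M, N])
    b = numpy_helper.from_array(
        rng.integers(0, 256, (N, n_blocks, blob_size), dtype=np.uint8), "B")
    scales = numpy_helper.from_array(
        rng.random(N * n_blocks).astype(np_dtype), "scales")
    node = helper.make_node(
        "MatMulNBits", ["A", "B", "scales"], ["Y"], domain="com.microsoft",
        K=K, N=N, bits=BITS, block_size=BLOCK_SIZE,
        accuracy_level=4)  # 4 requests int8 compute (CompInt8)
    graph = helper.make_graph([node], "matmulnbits_bench", [a], [y], [b, scales])
    return helper.make_model(graph, opset_imports=[
        helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)])


def bench(model: onnx.ModelProto, np_dtype) -> float:
    """Average per-run latency on the CPU execution provider."""
    sess = ort.InferenceSession(model.SerializeToString(),
                                providers=["CPUExecutionProvider"])
    x = np.random.rand(M, K).astype(np_dtype)
    for _ in range(10):  # warm-up
        sess.run(None, {"A": x})
    t0 = time.perf_counter()
    for _ in range(100):
        sess.run(None, {"A": x})
    return (time.perf_counter() - t0) / 100


t32 = bench(make_model(TensorProto.FLOAT, np.float32), np.float32)
t16 = bench(make_model(TensorProto.FLOAT16, np.float16), np.float16)
print(f"fp32: {t32 * 1e3:.2f} ms/run, fp16: {t16 * 1e3:.2f} ms/run "
      f"({t16 / t32:.1f}x slower)")
```

With fp32 input, `accuracy_level=4` should take the int8 compute path, while with fp16 input it falls back to HQNBIT_CompFp16 as described above, so the printed ratio reflects the gap this request is about.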
