Context
The MatMul operation in OpenVINO assumes an implicit shape alignment for its input arguments. It applies transpositions specified by the optional `transpose_a` and `transpose_b` attributes: OV spec.
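For concreteness, here is a minimal sketch, assuming the standard OpenVINO Python graph-building API (`opset13`); the shapes are illustrative:

```python
import numpy as np
import openvino as ov
from openvino.runtime import opset13 as ops

hidden, seq_len, out_dim = 16, 8, 32

# Activations stored as [hidden, seq_len] instead of [seq_len, hidden].
a = ops.parameter([hidden, seq_len], np.float32, name="activations")
w = ops.constant(np.ones((hidden, out_dim), dtype=np.float32))

# transpose_a=True makes MatMul read `a` as a^T:
# [seq_len, hidden] x [hidden, out_dim] -> [seq_len, out_dim].
mm = ops.matmul(a, w, transpose_a=True, transpose_b=False)
model = ov.Model([mm], [a], "transposed_matmul")

print(mm.output(0).get_shape())  # [8, 32]
```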
Currently, weight compression in NNCF does not support `transpose_a=True`. Here's the check and test.
Potentially, this affects the Mixed-Precision, AWQ, Scale Estimation, and Lora Correction algorithms.
What needs to be done?
The task is to enable the data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, Lora Correction) for models containing matrix multiplications with a transposed input (`transpose_a=True`).
- At least one function, `process_stats`, should be corrected and the check removed (a hypothetical sketch of the kind of change follows this list).
- The test should pass and should be templated so that it covers the OV and Torch backends at once (see the second sketch below).
- Tests that use `LMLinearModel` with the default `transpose_a=False` should also pass with `transpose_a=True`.
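Below is a hypothetical sketch of the kind of `process_stats` change the first item asks for; the real function in NNCF has a different signature, and the axis handling here is only illustrative:

```python
import numpy as np

def process_stats(X: np.ndarray, transpose_a: bool) -> np.ndarray:
    """Illustrative only: align activation statistics with the weight's
    reduction axis instead of rejecting transpose_a=True via a check."""
    if transpose_a:
        # With transpose_a=True the reduction (hidden) dimension of the
        # activation sits on the other axis, so swap it to the position
        # the downstream aggregation expects.
        X = np.swapaxes(X, -1, -2)
    # ... the usual statistics aggregation continues unchanged ...
    return X
```

And a hedged sketch of what a templated test could look like, assuming pytest-style parametrization; `template_backend` and its helper methods are placeholders, not the actual NNCF test-suite API:

```python
import pytest

@pytest.mark.parametrize("transpose_a", [False, True])
def test_compression_with_transposed_input(template_backend, transpose_a):
    # `template_backend` is a placeholder fixture standing in for the
    # templated OV/Torch backend machinery.
    model = template_backend.build_lm_linear_model(transpose_a=transpose_a)
    compressed = template_backend.compress_weights(model)
    template_backend.check_compressed_model(model, compressed)
```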
Example Pull Requests
Resources
- Contribution guide - start here!
- Intel DevHub Discord channel - engage in discussions, ask questions, and talk to OpenVINO developers
- How to link your Pull Request to an issue
Contact points
Ticket
No response