[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.
I have reconsidered and now believe that transposing each weight can increase the total compression time. What about implementing and using a "slice_weight" method with a transpose parameter?
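A minimal sketch of that idea, using plain NumPy for illustration (the name slice_weight and its signature follow the comment above, not the final code in this PR):

```python
import numpy as np


def slice_weight(weight: np.ndarray, idx, transpose_b: bool) -> np.ndarray:
    """Take a slice along the input-channel axis without transposing the weight.

    transpose_b=True  -> weight is [out_features, in_features]: slice columns.
    transpose_b=False -> weight is [in_features, out_features]: slice rows.
    """
    return weight[:, idx] if transpose_b else weight[idx, :]


w_t = np.arange(12).reshape(3, 4)   # [out=3, in=4], i.e. transpose_b=True layout
w_nt = w_t.T                        # [in=4, out=3], i.e. transpose_b=False layout
assert np.array_equal(slice_weight(w_t, 2, True), slice_weight(w_nt, 2, False))
```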
@ljaljushkin I will update the implementation to use a slice_weight helper with a transpose parameter. I'll proceed with this approach and update the PR shortly.
@ljaljushkin I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.
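A hedged sketch of what such a test could look like (the module path slice_helpers is a placeholder for wherever the helper ends up; the real test layout in the PR may differ):

```python
import numpy as np
import pytest
import torch

from slice_helpers import slice_weight  # placeholder import, not the PR's actual module


@pytest.mark.parametrize("to_tensor", [np.asarray, torch.tensor])
@pytest.mark.parametrize("transpose_b", [True, False])
def test_slice_weight(to_tensor, transpose_b):
    data = np.arange(12, dtype=np.float32).reshape(3, 4)  # [out=3, in=4]
    weight = to_tensor(data if transpose_b else data.T)
    result = slice_weight(weight, 1, transpose_b)
    # Whatever the layout, the result must be input channel 1 of the weight.
    assert np.allclose(np.asarray(result), data[:, 1])
```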
```python
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
```
I believe you need just 2 methods:

```python
from typing import Union

from nncf.tensor import Tensor


def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```
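For illustration, usage with a NumPy-backed weight could look like this (assuming the two helpers above and a weight that supports basic indexing):

```python
import numpy as np

weight = np.zeros((4, 8), dtype=np.float32)  # [in, out] layout, i.e. transpose_b=False
col = get_weight_slice(weight, 2, is_transposed=False)       # read input channel 2
set_weight_slice(weight, 2, col + 1.0, is_transposed=False)  # write it back modified
```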
```python
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)

# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
```
The same issue should exist in the other data-aware algorithms: awq, lora_correction, scale_estimation.
Suggestion: copy-paste a test for transpose_b=False for all these methods and check whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
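A rough sketch of such a test for the OpenVINO backend, assuming a hypothetical toy-model builder; the compress_weights keyword arguments are NNCF's public data-aware options, but the exact parametrization in the linked PR differs:

```python
import numpy as np
import openvino as ov
import pytest

import nncf


def build_matmul_model(transpose_b: bool) -> ov.Model:
    # Hypothetical single-MatMul model whose weight layout depends on transpose_b.
    inp = ov.opset13.parameter([1, 8, 16], name="input")
    w_shape = [32, 16] if transpose_b else [16, 32]
    weight = ov.opset13.constant(np.random.rand(*w_shape).astype(np.float32))
    matmul = ov.opset13.matmul(inp, weight, transpose_a=False, transpose_b=transpose_b)
    return ov.Model([matmul], [inp])


@pytest.mark.parametrize(
    "kwargs",
    [{"awq": True}, {"scale_estimation": True}, {"gptq": True}, {"lora_correction": True}],
)
def test_data_aware_transpose_b_false(kwargs):
    model = build_matmul_model(transpose_b=False)
    dataset = nncf.Dataset([np.random.rand(1, 8, 16).astype(np.float32)])
    # The check here is only that compression does not crash on a non-transposed weight.
    nncf.compress_weights(
        model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=1.0,
        group_size=4,
        all_layers=True,
        dataset=dataset,
        **kwargs,
    )
```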
```python
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
```
The utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py
I also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

- Refactored tensor slicing: implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.
- Algorithm support: updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use these new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights (sketched below).
- Testing: added test_compress_weights_algorithms_transpose_b_false, which verifies that all four algorithms work on a model with transpose_b=False without crashing, and added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both NumPy and PyTorch tensors.
- Formatting: ran pre-commit to apply the automatic Ruff formatting.

Ready for review!
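The reduction-axis handling mentioned in the second bullet boils down to something like the following (a sketch, not the exact PR code):

```python
import numpy as np


def input_channel_axis(transpose_b: bool) -> int:
    # transpose_b=True : weight is [out_features, in_features] -> input channels on axis 1
    # transpose_b=False: weight is [in_features, out_features] -> input channels on axis 0
    return 1 if transpose_b else 0


weight = np.random.rand(16, 32).astype(np.float32)  # transpose_b=False layout
# Reducing over the input-channel axis yields one statistic per output channel.
scales = np.max(np.abs(weight), axis=input_channel_axis(False), keepdims=True)
assert scales.shape == (1, 32)
```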
Thanks @Varshith-Yadav!
@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Besides that, I believe we have to expand the tests for each compression algorithm with the transpose_b option for each backend. I'm working on a similar issue right now (support of transpose_a), and when I finish my tests I'll share them with you so you can do the same in your PR.
Thank you!
I'm not sure we really need to make a separate function for that. An example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281
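Roughly, that inline pattern looks like this (illustrative values, not the linked code verbatim):

```python
import numpy as np

transpose_b = False
weight = np.random.rand(16, 32).astype(np.float32)  # [in, out] when transpose_b=False
group = slice(4, 8)                                  # input channels 4..7

# Build the indexing tuple with the built-in slice instead of a dedicated helper.
idx = (slice(None), group) if transpose_b else (group, slice(None))
sub_weight = weight[idx]        # read the group
weight[idx] = sub_weight * 0.5  # write it back
```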
```python
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
```
This looks like OpenVINO/ONNX-specific code; please introduce a backend method to get this value from the model.
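One way to do that is a backend hook along these lines (names and wiring are illustrative stand-ins, not NNCF's actual interface):

```python
from abc import ABC, abstractmethod


class WeightCompressionAlgoBackend(ABC):  # illustrative stand-in for the real base class
    @staticmethod
    @abstractmethod
    def is_weight_transposed(node_with_weight, weight_port_id: int) -> bool:
        """Return True if the MatMul consumes its weight in [out_features, in_features] layout."""


class OVWeightCompressionAlgoBackend(WeightCompressionAlgoBackend):
    @staticmethod
    def is_weight_transposed(node_with_weight, weight_port_id: int) -> bool:
        # The OpenVINO-specific attribute lookup stays in the backend,
        # out of the common algorithm code.
        attrs = node_with_weight.layer_attributes.constant_attributes[weight_port_id]
        return attrs.get("transpose", False)
```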
```python
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
```
OpenVINO/ONNX-specific code as well.
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
```
OpenVINO/ONNX-specific code in a common algorithm.
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
```
ONNX/OpenVINO-specific code in a common algorithm.
@daniil-lyakhov Thanks for the review! I have removed the OpenVINO-specific attribute checks from the common algorithms and introduced a backend method to get the transpose value from the model. I also removed the custom utils.py and switched to using standard Python slice() objects and inline if/else checks, as suggested. I've kept my current integration test (test_compress_weights_algorithms_transpose_b_false).
@Varshith-Yadav, please rebase
Force-pushed from 17c9f54 to 70176cf
Changes
fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).

Related tickets
Tests
pytest tests/openvino/native/quantization/test_weights_compression.py -v (All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)