
Conversation

@Varshith-Yadav

Changes

fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF’s weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True), so data-aware compression did not handle models whose weights are stored non-transposed (transpose_b=False).
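Concretely, transpose_b=True stores the weight as [out_channels, in_channels], while transpose_b=False stores it as [in_channels, out_channels], so per-input-channel statistics must be gathered along a different axis. A minimal NumPy illustration (shapes and variable names are illustrative, not NNCF code):

```python
import numpy as np

x = np.random.rand(4, 8)        # activations: [batch, in_channels]

w_t = np.random.rand(16, 8)     # transpose_b=True:  [out_ch, in_ch]
w_nt = np.random.rand(8, 16)    # transpose_b=False: [in_ch, out_ch]

# Both layouts compute the same MatMul, but the input-channel axis differs:
y_t = x @ w_t.T                 # input channels live on w_t's axis 1
y_nt = x @ w_nt                 # input channels live on w_nt's axis 0
assert y_t.shape == y_nt.shape == (4, 16)

# Per-input-channel statistics, as a data-aware algorithm might collect:
stats_t = np.abs(w_t).max(axis=0)   # this axis choice is wrong for w_nt
stats_nt = np.abs(w_nt).max(axis=1)
assert stats_t.shape == stats_nt.shape == (8,)
```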

Related tickets

Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v
(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)

@Varshith-Yadav Varshith-Yadav requested a review from a team as a code owner November 26, 2025 10:36
@ljaljushkin
Contributor

@Varshith-Yadav, thank you for the contribution!
Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

@ljaljushkin
Contributor

ljaljushkin commented Nov 26, 2025

please also add unit tests.
at least you can copy-paste from #3725: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1955-R1979 and make sure exception is not raised for transpose_b=False

@ljaljushkin
Contributor

> @Varshith-Yadav, thank you for the contribution! Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

I have reconsidered: transposing each weight could increase the total compression time. What about implementing and using a "slice_weight" method with a transpose parameter instead?

@Varshith-Yadav
Author

@ljaljushkin
That makes sense. I agree that explicitly transposing the full weight tensor could introduce unnecessary overhead.

I will update the implementation to use a slice_weight helper method. This way, we can fetch the necessary channels dynamically based on the transpose_b parameter without physically reshaping the underlying tensor.

I'll proceed with this approach and update the PR shortly.

@github-actions github-actions bot added the NNCF OpenVINO Pull requests that updates NNCF OpenVINO label Dec 1, 2025
@Varshith-Yadav
Author

@ljaljushkin
I've updated the implementation as requested. I added a slice_weight helper in utils.py to handle the data access without performing a full transpose, and refactored the GPTQ logic to use it.

I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both Numpy and PyTorch tensors with different transpose_b settings.

Comment on lines 31 to 35
assign_weight_column,
assign_weight_slice,
extract_weight_column,
slice_weight,
zero_mask_columns,
Contributor


I believe you need just 2 methods

def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]

def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
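For reference, a quick NumPy check of how these two helpers are expected to behave (the generic Tensor type is replaced by a plain ndarray here, and type hints are dropped; this is a sketch, not the NNCF implementation):

```python
import numpy as np

def get_weight_slice(weight, slice_obj, is_transposed):
    # Select along the input-channel axis: columns if transposed, rows otherwise.
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]

def set_weight_slice(weight, slice_obj, value, is_transposed):
    # Write back along the same axis, mutating the weight in place.
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value

w = np.arange(12.0).reshape(3, 4)
assert get_weight_slice(w, 1, is_transposed=True).tolist() == [1.0, 5.0, 9.0]        # column 1
assert get_weight_slice(w, 1, is_transposed=False).tolist() == [4.0, 5.0, 6.0, 7.0]  # row 1

set_weight_slice(w, 0, np.zeros(3), is_transposed=True)  # zero out column 0
assert w[:, 0].tolist() == [0.0, 0.0, 0.0]
```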

weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)

# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
Contributor


the same issue should be in other data-aware algorithms: awq, lora_correction, scale_estimation
I suggest copy-pasting a test for transpose_b=False covering all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961

from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
Contributor


utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py


@Varshith-Yadav
Author

@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

Refactored tensor_slicing: I implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.

Algorithm Support: I updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use these new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights.

Testing:

Added test_compress_weights_algorithms_transpose_b_false which successfully verifies that all 4 algorithms work on a model with transpose_b=False without crashing.

Added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both Numpy and PyTorch tensors.

Formatting: Ran pre-commit to apply the automatic Ruff formatting.

Ready for review!

@ljaljushkin
Contributor

Thanks @Varshith-Yadav!
@daniil-lyakhov, could you please help with the review?

@github-actions github-actions bot added the API Public API-impacting changes label Dec 10, 2025
@daniil-lyakhov daniil-lyakhov self-requested a review December 11, 2025 15:10
Collaborator

@daniil-lyakhov daniil-lyakhov left a comment


@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Beyond that, I believe we need to expand the tests for each compression algorithm with the transpose_b option on each backend. I'm working on a similar issue right now (support for transpose_a); once I finish my tests, I'll share them with you so you can do the same in your PR.

Thank you!

Collaborator


I'm not sure we really need a separate function for this. An example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281
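The linked snippet isn't reproduced here, but one way a built-in-slice approach could look is the following sketch (the helper name select_input_channels and all shapes are illustrative, not code from either PR):

```python
import numpy as np

def select_input_channels(weight, idx, transpose_b):
    # The input-channel axis is 1 for transposed weights ([out, in])
    # and 0 otherwise ([in, out]); build a per-axis indexer where every
    # other axis keeps the full range via slice(None).
    axis = 1 if transpose_b else 0
    indexer = [slice(None)] * weight.ndim
    indexer[axis] = idx
    return weight[tuple(indexer)]

w = np.arange(6.0).reshape(2, 3)
assert select_input_channels(w, 2, transpose_b=True).tolist() == [2.0, 5.0]       # column 2
assert select_input_channels(w, 0, transpose_b=False).tolist() == [0.0, 1.0, 2.0]  # row 0
```

This keeps the call sites free of if/else branches while avoiding a full transpose of the weight tensor.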

Comment on lines 126 to 128
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
"transpose"
]
Collaborator


Looks like an openvino/onnx specific code, please introduce a backend method to get this value from the model

Comment on lines 224 to 226
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
"transpose"
]
Collaborator


Openvino/ONNX specific code as well

Comment on lines 144 to 146
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]

Collaborator


Openvino/ONNX specific code in a common algorithm

Comment on lines 225 to 227
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]

Collaborator


Onnx/Openvino specific code in a common algorithm

@Varshith-Yadav
Author

@daniil-lyakhov Thanks for the review!

I have removed the specific OpenVINO attribute checks from the common algorithms. I introduced get_weight_transpose_b in the Backend interface and implemented it in all backends (openvino_backend.py, onnx_backend.py, torch_backend.py, torch_fx_backend.py).

I removed the custom utils.py and switched to using standard Python slice() objects and inline if/else checks as suggested.

I've kept my current integration test (test_compress_weights_algorithms_transpose_b_false in test_utils_slice_weight.py) for now to verify the fix works. I am happy to update or replace it with your standardized test pattern once you share it.
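A rough sketch of what such a backend hook could look like (the method name get_weight_transpose_b comes from the comment above; the class names, attribute layout, and the fake node used for the check are all illustrative, not NNCF's actual interfaces):

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace

class WeightCompressionAlgoBackend(ABC):
    @abstractmethod
    def get_weight_transpose_b(self, node, weight_port_id: int) -> bool:
        """Return True if the MatMul stores its weight transposed ([out, in])."""

class OVBackend(WeightCompressionAlgoBackend):
    def get_weight_transpose_b(self, node, weight_port_id: int) -> bool:
        # OpenVINO-style lookup: the flag lives in the constant's layer attributes.
        return node.layer_attributes.constant_attributes[weight_port_id]["transpose"]

class TorchBackend(WeightCompressionAlgoBackend):
    def get_weight_transpose_b(self, node, weight_port_id: int) -> bool:
        # torch.nn.Linear weights are always stored as [out_features, in_features].
        return True

# Common algorithms then ask the backend instead of poking OV attributes directly:
fake_node = SimpleNamespace(
    layer_attributes=SimpleNamespace(constant_attributes={1: {"transpose": False}})
)
assert OVBackend().get_weight_transpose_b(fake_node, 1) is False
assert TorchBackend().get_weight_transpose_b(fake_node, 1) is True
```

The common algorithm code stays backend-agnostic: it only sees a boolean and never touches framework-specific node attributes.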



Development

Successfully merging this pull request may close these issues.

[Good First Issue][NNCF]: Support not transposed weight for data-aware weight compression methods
