[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.
I have reconsidered and now believe that transposing each weight can increase the total compression time. What about implementing and using a "slice_weight" method with a transpose parameter?
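A minimal sketch of that idea, using plain NumPy for illustration (the name slice_weight and its signature follow the comment above, not the final code in this PR):

```python
import numpy as np


def slice_weight(weight: np.ndarray, idx, transpose_b: bool) -> np.ndarray:
    """Take a slice along the input-channel axis without transposing the weight.

    transpose_b=True  -> weight is [out_features, in_features]: slice columns.
    transpose_b=False -> weight is [in_features, out_features]: slice rows.
    """
    return weight[:, idx] if transpose_b else weight[idx, :]


w_t = np.arange(12).reshape(3, 4)   # [out=3, in=4], i.e. transpose_b=True layout
w_nt = w_t.T                        # [in=4, out=3], i.e. transpose_b=False layout
assert np.array_equal(slice_weight(w_t, 2, True), slice_weight(w_nt, 2, False))
```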
@ljaljushkin I will update the implementation to use a slice_weight helper with a transpose parameter. I'll proceed with this approach and update the PR shortly.
@ljaljushkin I also added a new test file test_utils_slice_weight.py to verify the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.
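A hedged sketch of what such a test could look like (the module path slice_helpers is a placeholder for wherever the helper ends up; the real test layout in the PR may differ):

```python
import numpy as np
import pytest
import torch

from slice_helpers import slice_weight  # placeholder import, not the PR's actual module


@pytest.mark.parametrize("to_tensor", [np.asarray, torch.tensor])
@pytest.mark.parametrize("transpose_b", [True, False])
def test_slice_weight(to_tensor, transpose_b):
    data = np.arange(12, dtype=np.float32).reshape(3, 4)  # [out=3, in=4]
    weight = to_tensor(data if transpose_b else data.T)
    result = slice_weight(weight, 1, transpose_b)
    # Whatever the layout, the result must be input channel 1 of the weight.
    assert np.allclose(np.asarray(result), data[:, 1])
```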
```python
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
```
I believe you need just 2 methods:

```python
from typing import Union

from nncf.tensor import Tensor


def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```
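For illustration, usage with a NumPy-backed weight could look like this (assuming the two helpers above and a weight that supports basic indexing):

```python
import numpy as np

weight = np.zeros((4, 8), dtype=np.float32)  # [in, out] layout, i.e. transpose_b=False
col = get_weight_slice(weight, 2, is_transposed=False)       # read input channel 2
set_weight_slice(weight, 2, col + 1.0, is_transposed=False)  # write it back modified
```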
```python
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)

# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
```
The same issue should exist in the other data-aware algorithms: awq, lora_correction, scale_estimation.
Suggestion: copy-paste a test for transpose_b=False for all these methods and check whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
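A rough sketch of such a test for the OpenVINO backend, assuming a hypothetical toy-model builder; the compress_weights keyword arguments are NNCF's public data-aware options, but the exact parametrization in the linked PR differs:

```python
import numpy as np
import openvino as ov
import pytest

import nncf


def build_matmul_model(transpose_b: bool) -> ov.Model:
    # Hypothetical single-MatMul model whose weight layout depends on transpose_b.
    inp = ov.opset13.parameter([1, 8, 16], name="input")
    w_shape = [32, 16] if transpose_b else [16, 32]
    weight = ov.opset13.constant(np.random.rand(*w_shape).astype(np.float32))
    matmul = ov.opset13.matmul(inp, weight, transpose_a=False, transpose_b=transpose_b)
    return ov.Model([matmul], [inp])


@pytest.mark.parametrize(
    "kwargs",
    [{"awq": True}, {"scale_estimation": True}, {"gptq": True}, {"lora_correction": True}],
)
def test_data_aware_transpose_b_false(kwargs):
    model = build_matmul_model(transpose_b=False)
    dataset = nncf.Dataset([np.random.rand(1, 8, 16).astype(np.float32)])
    # The check here is only that compression does not crash on a non-transposed weight.
    nncf.compress_weights(
        model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=1.0,
        group_size=4,
        all_layers=True,
        dataset=dataset,
        **kwargs,
    )
```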
```python
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
```
The utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py
I also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
@ljaljushkin Thanks for the detailed feedback! I have updated the PR with the following changes:

- Refactored tensor slicing: implemented the simplified get_weight_slice and set_weight_slice helpers exactly as suggested (using generic slicing) in src/nncf/quantization/algorithms/weight_compression/tensor_slicing.py.
- Algorithm support: updated GPTQ, AWQ, Scale Estimation, and LoRA Correction to use these new helpers. They now correctly identify the reduction axis based on transpose_b to handle non-transposed weights (sketched below).
- Testing: added test_compress_weights_algorithms_transpose_b_false, which verifies that all four algorithms work on a model with transpose_b=False without crashing, and added a new test file tests/openvino/native/test_weight_compression_utils.py to unit-test the helpers with both NumPy and PyTorch tensors.
- Formatting: ran pre-commit to apply the automatic Ruff formatting.

Ready for review!
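The reduction-axis handling mentioned in the second bullet boils down to something like the following (a sketch, not the exact PR code):

```python
import numpy as np


def input_channel_axis(transpose_b: bool) -> int:
    # transpose_b=True : weight is [out_features, in_features] -> input channels on axis 1
    # transpose_b=False: weight is [in_features, out_features] -> input channels on axis 0
    return 1 if transpose_b else 0


weight = np.random.rand(16, 32).astype(np.float32)  # transpose_b=False layout
# Reducing over the input-channel axis yields one statistic per output channel.
scales = np.max(np.abs(weight), axis=input_channel_axis(False), keepdims=True)
assert scales.shape == (1, 32)
```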
Thanks @Varshith-Yadav!
@Varshith-Yadav, thank you for your contribution!
My initial comments are below. Besides that, I believe we have to expand the tests for each compression algorithm with the transpose_b option for each backend. I'm working on a similar issue right now (support of transpose_a), and when I finish my tests I'll share them with you so you can do the same in your PR.
Thank you!
I'm not sure we really need to make a separate function for that. An example of how to do slicing using the built-in slice: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-cefaf6a4a2cb473c23106efa01889f05dc899e43c0dfc74ef8e8d60830e8a467R276-R281
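Roughly, that inline pattern looks like this (illustrative values, not the linked code verbatim):

```python
import numpy as np

transpose_b = False
weight = np.random.rand(16, 32).astype(np.float32)  # [in, out] when transpose_b=False
group = slice(4, 8)                                  # input channels 4..7

# Build the indexing tuple with the built-in slice instead of a dedicated helper.
idx = (slice(None), group) if transpose_b else (group, slice(None))
sub_weight = weight[idx]        # read the group
weight[idx] = sub_weight * 0.5  # write it back
```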
```python
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
```
This looks like OpenVINO/ONNX-specific code; please introduce a backend method to get this value from the model.
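One way to do that is a backend hook along these lines (names and wiring are illustrative stand-ins, not NNCF's actual interface):

```python
from abc import ABC, abstractmethod


class WeightCompressionAlgoBackend(ABC):  # illustrative stand-in for the real base class
    @staticmethod
    @abstractmethod
    def is_weight_transposed(node_with_weight, weight_port_id: int) -> bool:
        """Return True if the MatMul consumes its weight in [out_features, in_features] layout."""


class OVWeightCompressionAlgoBackend(WeightCompressionAlgoBackend):
    @staticmethod
    def is_weight_transposed(node_with_weight, weight_port_id: int) -> bool:
        # The OpenVINO-specific attribute lookup stays in the backend,
        # out of the common algorithm code.
        attrs = node_with_weight.layer_attributes.constant_attributes[weight_port_id]
        return attrs.get("transpose", False)
```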
```python
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id][
    "transpose"
]
```
OpenVINO/ONNX-specific code as well.
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
```
OpenVINO/ONNX-specific code in a common algorithm.
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wp.node_with_weight.layer_attributes.constant_attributes[weight_port_id]["transpose"]
```
ONNX/OpenVINO-specific code in a common algorithm.
@daniil-lyakhov Thanks for the review! I have removed the OpenVINO-specific attribute checks from the common algorithms and introduced a backend method to get the transpose value from the model. I also removed the custom utils.py and switched to using standard Python slice() objects and inline if/else checks, as suggested. I've kept my current integration test (test_compress_weights_algorithms_transpose_b_false).
@Varshith-Yadav, please rebase
Force-pushed from 17c9f54 to 70176cf
Changes
fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True).

Related tickets
Tests
pytest tests/openvino/native/quantization/test_weights_compression.py -v (All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)