
[WC] Scale Estimation transpose_a support #3839

Open

daniil-lyakhov wants to merge 7 commits into openvinotoolkit:develop from daniil-lyakhov:dl/sa_transpose_a

Conversation

@daniil-lyakhov
Collaborator

@daniil-lyakhov commented Jan 13, 2026

Changes

  • Support for the transpose_a attribute in the Scale Estimation algorithm for the ONNX/OpenVINO backends
  • The static method activations_to_wc_statistics is moved from the Scale Estimation algorithm to GPTQ, where it is actually used

Reason for changes

  • To enable Scale Estimation for models with SMM (a pattern that has transpose_a in ONNX/OpenVINO); see the sketch below
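
A minimal sketch of such a pattern, assuming the OpenVINO Python API; the opset version, shapes, and names below are illustrative assumptions, not taken from the PR:

import numpy as np
import openvino as ov
from openvino.runtime import opset13 as opset

# A MatMul whose activation input is transposed via the transpose_a
# attribute: (1, 8, 16) -> (1, 16, 8), then (1, 16, 8) @ (8, 4) -> (1, 16, 4).
input_1 = opset.parameter([1, 8, 16], dtype=np.float32, name="input")
weight = opset.constant(np.random.rand(8, 4).astype(np.float32))
matmul = opset.matmul(input_1, weight, transpose_a=True, transpose_b=False)
model = ov.Model([matmul], [input_1], name="TransposeAMatMul")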

Related tickets

173277
#3230

Tests

tests/cross_fw/test_templates/template_test_weights_compression.py::test_scale_estimation is updated with the transpose_a cases

@github-actions bot added the NNCF PT, NNCF OpenVINO, and NNCF ONNX labels on Jan 13, 2026
@daniil-lyakhov marked this pull request as ready for review January 13, 2026 18:45
@daniil-lyakhov requested a review from a team as a code owner January 13, 2026 18:45
act_ch_axis = self._backend_entity.get_activation_channel_axis(
    wp.node_with_weight, activation_port_id, act_shape
)
act_ch_axis %= len(act_shape)
Collaborator

For which case is it needed?
All tests still pass if the line is removed.

Collaborator Author

Outdated


weight = self._backend_entity.get_weight(wp.node_with_weight, weight_port_id, model, graph)

activation_port_id = self._backend_entity.get_activation_port_id(wp.node_with_weight, graph)
Collaborator

This looks like copy-paste from awq.py.
Please consider refactoring it into a shared function.

Collaborator Author

WeightCompressionAlgoBackend.get_activation_channel_axis_and_shape is introduced, please check.
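
For reference, a rough sketch of the shape of the new helper, inferred from the snippet reviewed further below; the actual implementation in backend.py may differ:

def get_activation_channel_axis_and_shape(self, node, graph):
    # Inferred sketch, not the verbatim NNCF code.
    activation_port_id = self.get_activation_port_id(node, graph)
    act_shape = graph.get_input_edge_by_port_id(node, activation_port_id).tensor_shape
    act_ch_axis = self.get_activation_channel_axis(node, activation_port_id, act_shape)
    # Normalize to a positive axis so downstream code can compare it
    # with positive dimension indices regardless of backend convention.
    act_ch_axis %= len(act_shape)
    return act_ch_axis, act_shape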


matmul = opset.matmul(input_1, weight_data, transpose_a=False, transpose_b=False, name="MoE_MatMul")
if transpose_a:
    transpose = opset.transpose(input_1, (0, 2, 1))
Collaborator

Please check; it looks like this branch never runs.

Collaborator Author

@daniil-lyakhov Jan 15, 2026

Good catch. It never runs because of this skip: https://github.com/daniil-lyakhov/nncf/blob/dl/sa_transpose_a/tests/openvino/native/quantization/test_weights_compression.py#L2369

I fixed the test and asked @anzr299 to remove the skip. Ticket 179366

@daniil-lyakhov
Collaborator Author

@AlexanderDokuchaev, @andreyanufr, please take a look

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the transpose_a attribute in the Scale Estimation algorithm for ONNX and OpenVINO backends, enabling Scale Estimation for models with transposed activations (SMM patterns). Additionally, the activations_to_wc_statistics method is moved from Scale Estimation to GPTQ where it is actually used.

Changes:

  • Added transpose_a parameter support throughout the Scale Estimation test infrastructure and model builders
  • Moved activations_to_wc_statistics static method from ScaleEstimation to GPTQ algorithm
  • Removed the check that previously blocked Scale Estimation for transposed activations

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/torch/fx/test_weights_compression.py: added transpose_a parameter to test model factory methods
  • tests/torch/function_hook/quantization/test_weights_compression.py: added transpose_a parameter to test model factory methods
  • tests/openvino/native/quantization/test_weights_compression.py: added transpose_a parameter and test parametrization for Scale Estimation tests
  • tests/openvino/native/models.py: updated model builders to support transpose_a with actual transpose operations
  • tests/onnx/quantization/test_weights_compression.py: added transpose_a support with a GEMM-based implementation for ONNX
  • tests/onnx/common.py: added the add_squeeze helper method to ModelBuilder
  • tests/cross_fw/test_templates/template_test_weights_compression.py: updated the test template to parametrize transpose_a and removed scale_estimation from the transpose skip test
  • src/nncf/quantization/algorithms/weight_compression/scale_estimation.py: removed the transpose check, added the act_ch_axis parameter, removed the activations_to_wc_statistics method
  • src/nncf/quantization/algorithms/weight_compression/gptq.py: added the activations_to_wc_statistics static method moved from ScaleEstimation
  • src/nncf/quantization/algorithms/weight_compression/backend.py: added the get_activation_channel_axis_and_shape helper method
  • src/nncf/quantization/algorithms/weight_compression/awq.py: refactored to use the new backend helper method instead of a private method


activation_port_id = self.get_activation_port_id(node, graph)
act_shape = graph.get_input_edge_by_port_id(node, activation_port_id).tensor_shape
act_ch_axis = self.get_activation_channel_axis(node, activation_port_id, act_shape)
# Mod the activation axis by the length of the activation shape
Collaborator

https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#424-block-and-inline-comments

A comment that merely describes a built-in Python operation is useless here. Please explain why it is needed: in which cases does get_activation_channel_axis return a positive or negative axis, and why is only the positive form required here?

It may also be a problem of the backend-specific get_activation_channel_axis implementations, in which case it would be better to fix it there.

PS. This applies to PT as well, not only to ONNX.

Collaborator Author

Why normalization is required: downstream code in AWQ and activation stats processing compares act_ch_axis with positive dimension indices (for example, in awq.py line 227).

Why not fix the backends instead: while it would be cleaner to standardize the backend implementations, the current approach is correct because:

  • The abstract method get_activation_channel_axis allows backends the flexibility to use their natural conventions
  • The normalization happens at a single point (get_activation_channel_axis_and_shape) that explicitly documents that it returns positive axes
  • This avoids changing multiple backend implementations and maintains backward compatibility
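
A short illustration of the normalization in plain Python, with an assumed example shape:

# Python's modulo maps a negative axis onto its positive equivalent
# for a tensor of the given rank, so both conventions compare equal.
act_shape = (1, 16, 8)        # assumed example shape
for act_ch_axis in (-1, 2):   # a backend may return either convention
    assert act_ch_axis % len(act_shape) == 2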

if transpose_a:
    squeeze = mb.add_squeeze(x)
    transpose = mb.add_transpose(squeeze, (1, 0))
    mb.add_gemm(transpose, shape=(8, 16), output=output, weight_data=weights, trans_a=1)
Collaborator

Why does OV use transpose + matmul while ONNX uses squeeze + transpose + gemm?
Could both be defined with the same operations?

Collaborator Author

The Gemm operation does not support batch dimensions (in contrast to the OpenVINO backend), so aligning them would require rewriting all the tests to eliminate the batch dimensions for all backends.

I can do so, but I don't see any value in that, and it is a lot of effort.
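
A hedged numpy sketch of the dimensional constraint (simplified to a single trans_a, so it is not the exact test graph):

import numpy as np

x = np.random.rand(1, 16, 8).astype(np.float32)  # batched activation
w = np.random.rand(16, 4).astype(np.float32)     # weight

# OpenVINO MatMul accepts the batch dim directly; transpose_a swaps
# the two innermost dims: (1, 16, 8) -> (1, 8, 16), result (1, 8, 4).
ov_out = np.matmul(x.transpose(0, 2, 1), w)

# ONNX Gemm is strictly 2-D, so the batch dim must be squeezed first;
# trans_a=1 then transposes the 2-D input: (8, 16) @ (16, 4) -> (8, 4).
onnx_out = x.squeeze(0).T @ w

assert np.allclose(ov_out.squeeze(0), onnx_out)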


@staticmethod
def get_model_for_test_scale_estimation():
def get_model_for_test_scale_estimation(transpose_a: bool):
Collaborator

Please add assert not transpose_a to avoid any confusion between the method's signature and its result.

Also, using pytest.skip inside the model creation function instead of an extra transpose_a_supported fixture looks shorter and easier to understand and support:

class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        if transpose_a:
            pytest.skip("transpose_a=True is not supported for PT backend")
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))

instead of

class TemplateWeightCompression(ABC):

    @abstractmethod
    @pytest.fixture
    def transpose_a_supported(self) -> bool: ...

    @pytest.mark.parametrize("transpose_a", [False, True])
    @pytest.mark.parametrize("is_moe", [False, True])
    @pytest.mark.parametrize("check_sampling_activation_stats_flow", [False, True])
    def test_scale_estimation(
        self, mocker, transpose_a, is_moe, check_sampling_activation_stats_flow, transpose_a_supported
    ):
        """Checks that scales match the reference."""
        if transpose_a and not transpose_a_supported:
            msg = "Transpose a is not supported for the current backend"
            pytest.skip(msg)


class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @pytest.fixture
    def transpose_a_supported(self) -> bool:
        return False

    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))

Collaborator Author

Done

reduction_axis = reduction_axes[0]

s, X = process_stats(statistics, subset_size)
s, X = process_stats(statistics, subset_size, act_ch_axis=act_ch_axis)
Collaborator

As far as I understand, this is the main change in this PR.
But when I revert it, all tests still pass.

Collaborator Author

Done
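
For context on the main change above, a hypothetical sketch of how an act_ch_axis parameter could be consumed when reducing activation statistics; this is an illustration, not NNCF's actual process_stats:

import numpy as np

def process_stats_sketch(acts: np.ndarray, subset_size: int, act_ch_axis: int = -1):
    # Hypothetical stand-in for process_stats: reduce over every axis
    # except the (normalized, positive) activation channel axis.
    act_ch_axis %= acts.ndim
    reduce_axes = tuple(ax for ax in range(acts.ndim) if ax != act_ch_axis)
    s = np.max(np.abs(acts), axis=reduce_axes)  # per-channel magnitude statistic
    X = np.moveaxis(acts, act_ch_axis, 0).reshape(acts.shape[act_ch_axis], -1)
    return s, X[:, :subset_size]                # cap the number of samples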
