
[WC] Scale Estimation transpose_a support #3839

Open

daniil-lyakhov wants to merge 7 commits into openvinotoolkit:develop from daniil-lyakhov:dl/sa_transpose_a

Conversation

@daniil-lyakhov
Collaborator

@daniil-lyakhov commented Jan 13, 2026

Changes

  • Support for the transpose_a attribute in the Scale Estimation algorithm for the ONNX/OpenVINO backends
  • The static method activations_to_wc_statistics is moved from the Scale Estimation algorithm to GPTQ, where it is actually used

Reason for changes

  • To enable Scale Estimation for models with SMM (a pattern that has transpose_a in ONNX/OpenVINO); see the sketch below
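
A minimal sketch of such a pattern, assuming the OpenVINO Python API; the opset version, shapes, and names below are illustrative assumptions, not taken from the PR:

import numpy as np
import openvino as ov
from openvino.runtime import opset13 as opset

# A MatMul whose activation input is transposed via the transpose_a
# attribute: (1, 8, 16) -> (1, 16, 8), then (1, 16, 8) @ (8, 4) -> (1, 16, 4).
input_1 = opset.parameter([1, 8, 16], dtype=np.float32, name="input")
weight = opset.constant(np.random.rand(8, 4).astype(np.float32))
matmul = opset.matmul(input_1, weight, transpose_a=True, transpose_b=False)
model = ov.Model([matmul], [input_1], name="TransposeAMatMul")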

Related tickets

173277
#3230

Tests

tests/cross_fw/test_templates/template_test_weights_compression.py::test_scale_estimation is updated with the transpose_a cases

@github-actions bot added the NNCF PT, NNCF OpenVINO, and NNCF ONNX labels on Jan 13, 2026
@daniil-lyakhov marked this pull request as ready for review January 13, 2026 18:45
@daniil-lyakhov requested a review from a team as a code owner January 13, 2026 18:45
act_ch_axis = self._backend_entity.get_activation_channel_axis(
    wp.node_with_weight, activation_port_id, act_shape
)
act_ch_axis %= len(act_shape)
Collaborator

For which case is it needed?
All tests still pass if the line is removed.

Collaborator Author

Outdated


weight = self._backend_entity.get_weight(wp.node_with_weight, weight_port_id, model, graph)

activation_port_id = self._backend_entity.get_activation_port_id(wp.node_with_weight, graph)
Collaborator

This looks like copy-paste from awq.py.
Please consider refactoring it into a shared function.

Collaborator Author

WeightCompressionAlgoBackend.get_activation_channel_axis_and_shape is introduced, please check.
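
For reference, a rough sketch of the shape of the new helper, inferred from the snippet reviewed further below; the actual implementation in backend.py may differ:

def get_activation_channel_axis_and_shape(self, node, graph):
    # Inferred sketch, not the verbatim NNCF code.
    activation_port_id = self.get_activation_port_id(node, graph)
    act_shape = graph.get_input_edge_by_port_id(node, activation_port_id).tensor_shape
    act_ch_axis = self.get_activation_channel_axis(node, activation_port_id, act_shape)
    # Normalize to a positive axis so downstream code can compare it
    # with positive dimension indices regardless of backend convention.
    act_ch_axis %= len(act_shape)
    return act_ch_axis, act_shape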


matmul = opset.matmul(input_1, weight_data, transpose_a=False, transpose_b=False, name="MoE_MatMul")
if transpose_a:
    transpose = opset.transpose(input_1, (0, 2, 1))
Collaborator

Please check; it looks like this branch never runs.

Collaborator Author

@daniil-lyakhov Jan 15, 2026

Good catch. It never runs because of this skip: https://github.com/daniil-lyakhov/nncf/blob/dl/sa_transpose_a/tests/openvino/native/quantization/test_weights_compression.py#L2369

I fixed the test and asked @anzr299 to remove the skip. Ticket 179366

@daniil-lyakhov
Collaborator Author

@AlexanderDokuchaev, @andreyanufr, please take a look

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the transpose_a attribute in the Scale Estimation algorithm for ONNX and OpenVINO backends, enabling Scale Estimation for models with transposed activations (SMM patterns). Additionally, the activations_to_wc_statistics method is moved from Scale Estimation to GPTQ where it is actually used.

Changes:

  • Added transpose_a parameter support throughout the Scale Estimation test infrastructure and model builders
  • Moved activations_to_wc_statistics static method from ScaleEstimation to GPTQ algorithm
  • Removed the check that previously blocked Scale Estimation for transposed activations

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/torch/fx/test_weights_compression.py: added transpose_a parameter to test model factory methods
  • tests/torch/function_hook/quantization/test_weights_compression.py: added transpose_a parameter to test model factory methods
  • tests/openvino/native/quantization/test_weights_compression.py: added transpose_a parameter and test parametrization for Scale Estimation tests
  • tests/openvino/native/models.py: updated model builders to support transpose_a with actual transpose operations
  • tests/onnx/quantization/test_weights_compression.py: added transpose_a support with a GEMM-based implementation for ONNX
  • tests/onnx/common.py: added the add_squeeze helper method to ModelBuilder
  • tests/cross_fw/test_templates/template_test_weights_compression.py: updated the test template to parametrize transpose_a and removed scale_estimation from the transpose skip test
  • src/nncf/quantization/algorithms/weight_compression/scale_estimation.py: removed the transpose check, added the act_ch_axis parameter, removed the activations_to_wc_statistics method
  • src/nncf/quantization/algorithms/weight_compression/gptq.py: added the activations_to_wc_statistics static method moved from ScaleEstimation
  • src/nncf/quantization/algorithms/weight_compression/backend.py: added the get_activation_channel_axis_and_shape helper method
  • src/nncf/quantization/algorithms/weight_compression/awq.py: refactored to use the new backend helper method instead of a private method


activation_port_id = self.get_activation_port_id(node, graph)
act_shape = graph.get_input_edge_by_port_id(node, activation_port_id).tensor_shape
act_ch_axis = self.get_activation_channel_axis(node, activation_port_id, act_shape)
# Mod the activation axis by the length of the activation shape
Collaborator

https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#424-block-and-inline-comments

A comment that merely describes a built-in Python operation is useless here. Please explain why it is needed: in which cases does get_activation_channel_axis return a positive or negative axis, and why is only the positive form required here?

It may also be a problem of the backend-specific get_activation_channel_axis implementations, in which case it would be better to fix it there.

PS. This applies to PT as well, not only to ONNX.

Collaborator Author

Why normalization is required: downstream code in AWQ and activation stats processing compares act_ch_axis with positive dimension indices (for example, in awq.py line 227).

Why not fix the backends instead: while it would be cleaner to standardize the backend implementations, the current approach is correct because:

  • The abstract method get_activation_channel_axis allows backends the flexibility to use their natural conventions
  • The normalization happens at a single point (get_activation_channel_axis_and_shape) that explicitly documents that it returns positive axes
  • This avoids changing multiple backend implementations and maintains backward compatibility
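
A short illustration of the normalization in plain Python, with an assumed example shape:

# Python's modulo maps a negative axis onto its positive equivalent
# for a tensor of the given rank, so both conventions compare equal.
act_shape = (1, 16, 8)        # assumed example shape
for act_ch_axis in (-1, 2):   # a backend may return either convention
    assert act_ch_axis % len(act_shape) == 2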

if transpose_a:
    squeeze = mb.add_squeeze(x)
    transpose = mb.add_transpose(squeeze, (1, 0))
    mb.add_gemm(transpose, shape=(8, 16), output=output, weight_data=weights, trans_a=1)
Collaborator

Why does OV use transpose + matmul while ONNX uses squeeze + transpose + gemm?
Could both be defined with the same operations?

Collaborator Author

The Gemm operation does not support batch dimensions (in contrast to the OpenVINO backend), so aligning them would require rewriting all the tests to eliminate the batch dimensions for all backends.

I can do so, but I don't see any value in that, and it is a lot of effort.
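
A hedged numpy sketch of the dimensional constraint (simplified to a single trans_a, so it is not the exact test graph):

import numpy as np

x = np.random.rand(1, 16, 8).astype(np.float32)  # batched activation
w = np.random.rand(16, 4).astype(np.float32)     # weight

# OpenVINO MatMul accepts the batch dim directly; transpose_a swaps
# the two innermost dims: (1, 16, 8) -> (1, 8, 16), result (1, 8, 4).
ov_out = np.matmul(x.transpose(0, 2, 1), w)

# ONNX Gemm is strictly 2-D, so the batch dim must be squeezed first;
# trans_a=1 then transposes the 2-D input: (8, 16) @ (16, 4) -> (8, 4).
onnx_out = x.squeeze(0).T @ w

assert np.allclose(ov_out.squeeze(0), onnx_out)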


@staticmethod
def get_model_for_test_scale_estimation():
def get_model_for_test_scale_estimation(transpose_a: bool):
Collaborator

Please add assert not transpose_a to avoid any confusion between the method's signature and its result.

Also, using pytest.skip inside the model creation function instead of an extra transpose_a_supported fixture looks shorter and easier to understand and support:

class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        if transpose_a:
            pytest.skip("transpose_a=True is not supported for PT backend")
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))

instead of

class TemplateWeightCompression(ABC):

    @abstractmethod
    @pytest.fixture
    def transpose_a_supported(self) -> bool: ...

    @pytest.mark.parametrize("transpose_a", [False, True])
    @pytest.mark.parametrize("is_moe", [False, True])
    @pytest.mark.parametrize("check_sampling_activation_stats_flow", [False, True])
    def test_scale_estimation(
        self, mocker, transpose_a, is_moe, check_sampling_activation_stats_flow, transpose_a_supported
    ):
        """Checks that scales match the reference."""
        if transpose_a and not transpose_a_supported:
            msg = "Transpose a is not supported for the current backend"
            pytest.skip(msg)


class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @pytest.fixture
    def transpose_a_supported(self) -> bool:
        return False

    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))

Collaborator Author

Done

reduction_axis = reduction_axes[0]

s, X = process_stats(statistics, subset_size)
s, X = process_stats(statistics, subset_size, act_ch_axis=act_ch_axis)
Collaborator

As far as I understand, this is the main change in this PR.
But when I revert it, all tests still pass.

Collaborator Author

Done
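
For context on the main change above, a hypothetical sketch of how an act_ch_axis parameter could be consumed when reducing activation statistics; this is an illustration, not NNCF's actual process_stats:

import numpy as np

def process_stats_sketch(acts: np.ndarray, subset_size: int, act_ch_axis: int = -1):
    # Hypothetical stand-in for process_stats: reduce over every axis
    # except the (normalized, positive) activation channel axis.
    act_ch_axis %= acts.ndim
    reduce_axes = tuple(ax for ax in range(acts.ndim) if ax != act_ch_axis)
    s = np.max(np.abs(acts), axis=reduce_axes)  # per-channel magnitude statistic
    X = np.moveaxis(acts, act_ch_axis, 0).reshape(acts.shape[act_ch_axis], -1)
    return s, X[:, :subset_size]                # cap the number of samples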
