[WC] Scale Estimation transpose_a support #3839
daniil-lyakhov wants to merge 7 commits into openvinotoolkit:develop from
Conversation
```python
act_ch_axis = self._backend_entity.get_activation_channel_axis(
    wp.node_with_weight, activation_port_id, act_shape
)
act_ch_axis %= len(act_shape)
```
For which case is this needed?
All tests still pass if the line is removed.
```python
weight = self._backend_entity.get_weight(wp.node_with_weight, weight_port_id, model, graph)
```
```python
activation_port_id = self._backend_entity.get_activation_port_id(wp.node_with_weight, graph)
```
This looks like copy-paste from awq.py.
Please consider refactoring it into a shared function.
WeightCompressionAlgoBackend.get_activation_channel_axis_and_shape is introduced, please check.
```python
matmul = opset.matmul(input_1, weight_data, transpose_a=False, transpose_b=False, name="MoE_MatMul")
if transpose_a:
    transpose = opset.transpose(input_1, (0, 2, 1))
```
Please check; it looks like this branch never runs.
Good catch.
It is, because of this skip: https://github.com/daniil-lyakhov/nncf/blob/dl/sa_transpose_a/tests/openvino/native/quantization/test_weights_compression.py#L2369
I fixed the test and asked @anzr299 to remove the skip. Ticket: 179366
Force-pushed from 1f70ea1 to 35d0cc2
@AlexanderDokuchaev, @andreyanufr, please take a look
Pull request overview
This PR adds support for the transpose_a attribute in the Scale Estimation algorithm for ONNX and OpenVINO backends, enabling Scale Estimation for models with transposed activations (SMM patterns). Additionally, the activations_to_wc_statistics method is moved from Scale Estimation to GPTQ where it is actually used.
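For context, the `transpose_a` attribute tells a MatMul to transpose its first (activation) input before multiplying, which is how the patterns mentioned above feed transposed activations. A minimal NumPy sketch of the semantics (shapes and values are illustrative, not taken from the PR):

```python
import numpy as np

# Activation stored transposed: (in_features, tokens) instead of (tokens, in_features).
act_t = np.arange(6, dtype=np.float32).reshape(3, 2)  # shape (3, 2)
weight = np.ones((3, 4), dtype=np.float32)            # shape (3, 4)

# A MatMul with transpose_a=True computes act_t.T @ weight.
out = act_t.T @ weight                                # shape (2, 4)
assert out.shape == (2, 4)
```

With `transpose_a=False`, the same `(3, 2) x (3, 4)` pair would not be multipliable at all, which is why the algorithm must know which input layout it is dealing with.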
Changes:
- Added `transpose_a` parameter support throughout the Scale Estimation test infrastructure and model builders
- Moved the `activations_to_wc_statistics` static method from ScaleEstimation to the GPTQ algorithm
- Removed the check that previously blocked Scale Estimation for transposed activations
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/torch/fx/test_weights_compression.py | Added transpose_a parameter to test model factory methods |
| tests/torch/function_hook/quantization/test_weights_compression.py | Added transpose_a parameter to test model factory methods |
| tests/openvino/native/quantization/test_weights_compression.py | Added transpose_a parameter and test parametrization for Scale Estimation tests |
| tests/openvino/native/models.py | Updated model builders to support transpose_a with actual transpose operations |
| tests/onnx/quantization/test_weights_compression.py | Added transpose_a support with GEMM-based implementation for ONNX |
| tests/onnx/common.py | Added add_squeeze helper method to ModelBuilder |
| tests/cross_fw/test_templates/template_test_weights_compression.py | Updated test template to parametrize transpose_a and removed scale_estimation from transpose skip test |
| src/nncf/quantization/algorithms/weight_compression/scale_estimation.py | Removed transpose check, added act_ch_axis parameter, removed activations_to_wc_statistics method |
| src/nncf/quantization/algorithms/weight_compression/gptq.py | Added activations_to_wc_statistics static method moved from ScaleEstimation |
| src/nncf/quantization/algorithms/weight_compression/backend.py | Added get_activation_channel_axis_and_shape helper method |
| src/nncf/quantization/algorithms/weight_compression/awq.py | Refactored to use new backend helper method instead of private method |
```python
activation_port_id = self.get_activation_port_id(node, graph)
act_shape = graph.get_input_edge_by_port_id(node, activation_port_id).tensor_shape
act_ch_axis = self.get_activation_channel_axis(node, activation_port_id, act_shape)
# Mod the activation axis by the length of the activation shape
```
A comment that just describes a built-in Python operation is useless here.
Please explain why this is needed (in which cases get_activation_channel_axis returns a positive or negative axis, and why only a positive axis may be used here).
It may also be a problem of the backend-specific get_activation_channel_axis implementation, in which case it would be better to fix it there.
PS. This applies to PT as well, not only to ONNX.
Why Normalization is Required:
Downstream code in AWQ and activation stats processing compares act_ch_axis with positive dimension indices. For example in awq.py line 227:
Why Not Fix Backends Instead:
While it would be cleaner to standardize backend implementations, the current approach is actually correct because:
- The abstract method get_activation_channel_axis allows backends flexibility to use their natural conventions
- The normalization happens at a single point (get_activation_channel_axis_and_shape) that explicitly documents it returns positive axes
- This avoids changing multiple backend implementations and maintains backward compatibility
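To illustrate the point about positive axes (a standalone sketch, not code from the PR): Python's `%` operator maps a negative axis to its positive equivalent, which is exactly what `act_ch_axis %= len(act_shape)` relies on:

```python
def normalize_axis(axis: int, ndim: int) -> int:
    """Map a possibly negative axis to the equivalent positive index,
    mirroring the `act_ch_axis %= len(act_shape)` normalization."""
    return axis % ndim

# A backend may report the channel axis of a rank-3 activation as -1;
# comparisons against positive dimension indices then need 2 instead.
assert normalize_axis(-1, 3) == 2
assert normalize_axis(2, 3) == 2
assert normalize_axis(-2, 4) == 2
```

Already-positive axes pass through unchanged, so the normalization is safe to apply unconditionally.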
tests/cross_fw/test_templates/template_test_weights_compression.py (outdated; resolved)
```python
if transpose_a:
    squeeze = mb.add_squeeze(x)
    transpose = mb.add_transpose(squeeze, (1, 0))
    mb.add_gemm(transpose, shape=(8, 16), output=output, weight_data=weights, trans_a=1)
```
Why does OV use transpose + matmul, but ONNX uses squeeze + transpose + gemm?
Could they be defined with the same operations?
The Gemm operation does not support batch dimensions (in contrast to the OpenVINO backend); to align them, one would have to rewrite all the tests to eliminate the batch dimensions for all backends.
I can do so, but I don't see any value in that, and it is a lot of effort.
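The difference can be seen with a NumPy sketch (shapes are illustrative): a batched matmul, like OpenVINO's MatMul, broadcasts over the leading batch axis, while ONNX Gemm is defined only for rank-2 inputs, which is why the ONNX test model squeezes the batch dimension first:

```python
import numpy as np

x = np.ones((1, 16, 8), dtype=np.float32)  # (batch, tokens, features)
w = np.ones((8, 16), dtype=np.float32)

# np.matmul (and OV MatMul) broadcast over the leading batch dimension.
y_matmul = x @ w
assert y_matmul.shape == (1, 16, 16)

# ONNX Gemm accepts only 2-D inputs, so the batch axis must be squeezed
# before the data can be fed to Gemm -- hence squeeze + transpose + Gemm.
x2d = np.squeeze(x, axis=0)                # shape (16, 8)
y_gemm = x2d @ w
assert y_gemm.shape == (16, 16)
```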
```diff
 @staticmethod
-def get_model_for_test_scale_estimation():
+def get_model_for_test_scale_estimation(transpose_a: bool):
```
Please add `assert transpose_a is False` to avoid any confusion between the method's signature and its result.
Also, using pytest.skip inside the model creation function, instead of an extra transpose_a_supported fixture, looks shorter and easier to understand and support:
```python
class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        if transpose_a:
            pytest.skip("transpose_a=True is not supported for PT backend")
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))
```
instead of
```python
class TemplateWeightCompression(ABC):
    @abstractmethod
    @pytest.fixture
    def transpose_a_supported(self) -> bool: ...

    @pytest.mark.parametrize("transpose_a", [False, True])
    @pytest.mark.parametrize("is_moe", [False, True])
    @pytest.mark.parametrize("check_sampling_activation_stats_flow", [False, True])
    def test_scale_estimation(
        self, mocker, transpose_a, is_moe, check_sampling_activation_stats_flow, transpose_a_supported
    ):
        """Checks that scales match the reference."""
        if transpose_a and not transpose_a_supported:
            msg = "Transpose a is not supported for the current backend"
            pytest.skip(msg)


class TestPTTemplateWeightCompression(TemplateWeightCompression):
    @pytest.fixture
    def transpose_a_supported(self) -> bool:
        return False

    @staticmethod
    def get_model_for_test_scale_estimation(transpose_a: bool):
        return LinearModel(torch.arange(0, 8 * 16, dtype=torch.float32).reshape(16, 8))
```
```diff
 reduction_axis = reduction_axes[0]
-s, X = process_stats(statistics, subset_size)
+s, X = process_stats(statistics, subset_size, act_ch_axis=act_ch_axis)
```
As far as I understand, this is the main change in this PR.
But when I revert it, all tests still pass.
src/nncf/quantization/algorithms/weight_compression/scale_estimation.py (outdated; resolved)
Force-pushed from 246ff46 to 5ef1956
Co-authored-by: Alexander Dokuchaev <alexander.dokuchaev@intel.com>
Force-pushed from 5ef1956 to 817e6d3
Changes
- `transpose_a` attribute support for ONNX/OpenVINO backends in the Scale Estimation algorithm
- `activations_to_wc_statistics` moved from the Scale Estimation algorithm to the GPTQ algorithm

Reason for changes
Related tickets
173277
#3230
Tests
tests/cross_fw/test_templates/template_test_weights_compression.py::test_scale_estimation updated with the `transpose_a` cases