[WC] RoPE ignored pattern without transpose #3930
daniil-lyakhov wants to merge 4 commits into openvinotoolkit:develop from daniil-lyakhov:dl/phi_rot_emb_ig_pattern
Conversation
Pull request overview
Introduces an additional RoPE ignored pattern variant that omits the transpose node, and updates existing test models/tests to cover both graph forms (with and without transpose), targeting coverage for Phi-3.5-moe.
Changes:
- Added a shared `create_rope_pattern` helper that supports both RoPE variants (with/without transpose).
- Updated Torch / OpenVINO / ONNX ignored-pattern registrations to use the shared helper.
- Parameterized RoPE test models and the cross-framework RoPE compression test to validate both variants.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/torch/fx/test_weights_compression.py | Adds with_transpose option to the FX RoPE model fixture. |
| tests/torch/function_hook/quantization/test_weights_compression.py | Adds with_transpose option to the Torch RoPE model fixture. |
| tests/openvino/native/quantization/test_weights_compression.py | Adds with_transpose option to the OpenVINO RoPE model fixture. |
| tests/openvino/native/models.py | Makes OpenVINO RoPE reference model optionally skip transpose. |
| tests/onnx/quantization/test_weights_compression.py | Makes ONNX RoPE model optionally skip transpose and adjusts output shapes accordingly. |
| tests/cross_fw/test_templates/template_test_weights_compression.py | Parameterizes RoPE compression test for both transpose variants; updates template method signature. |
| tests/cross_fw/test_templates/helpers.py | Makes Torch RoPE helper model optionally skip transpose. |
| src/nncf/torch/quantization/ignored_patterns.py | Switches Torch RoPE ignored pattern to shared helper supporting both variants. |
| src/nncf/openvino/quantization/ignored_patterns.py | Switches OpenVINO RoPE ignored pattern to shared helper supporting both variants. |
| src/nncf/onnx/quantization/ignored_patterns.py | Switches ONNX RoPE ignored pattern to shared helper supporting both variants. |
| src/nncf/quantization/ignored_patterns.py | Adds shared RoPE pattern builder with transpose/no-transpose alternatives. |
| src/nncf/common/graph/patterns/patterns.py | Loosens typing for add_node and expands add_edge typing. |
    mm_metatype: OperatorMetatype,
    transpose_metatype: OperatorMetatype,
    concat_metatype: OperatorMetatype,
    cos_metatype: OperatorMetatype,
    sin_metatype: OperatorMetatype,
The metatype arguments are passed around as metatype classes (e.g., om.PTMatMulMetatype), not OperatorMetatype instances. To avoid misleading public typing (and improve static analysis), these parameters should be annotated as type[OperatorMetatype] (and similarly for the others).
Suggested change:
-    mm_metatype: OperatorMetatype,
-    transpose_metatype: OperatorMetatype,
-    concat_metatype: OperatorMetatype,
-    cos_metatype: OperatorMetatype,
-    sin_metatype: OperatorMetatype,
+    mm_metatype: type[OperatorMetatype],
+    transpose_metatype: type[OperatorMetatype],
+    concat_metatype: type[OperatorMetatype],
+    cos_metatype: type[OperatorMetatype],
+    sin_metatype: type[OperatorMetatype],
This comment is valid.
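To illustrate the distinction, here is a minimal, self-contained sketch (the `OperatorMetatype` stand-in below is hypothetical, not NNCF's actual class): callers pass the metatype class itself, so `type[...]` is the accurate annotation.

```python
# Hypothetical stand-in for NNCF's OperatorMetatype; illustration only.
class OperatorMetatype:
    name = "base"


class MatMulMetatype(OperatorMetatype):
    name = "matmul"


def register_node(mm_metatype: type[OperatorMetatype]) -> str:
    # Callers pass the class itself (e.g. om.PTMatMulMetatype), never an
    # instance, so attributes are read directly off the class object.
    return mm_metatype.name


label = register_node(MatMulMetatype)  # the class is passed, not MatMulMetatype()
```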
    Creates Rotational Embedding pattern.

    :param mm_metatype: MatMul metatype.
    :param transpose_metatype: Transpose metatype.
    :param concat_metatype: Concat metatype.
    :param cos_metatype: Cos metatype.
    :param sin_metatype: Sin metatype.
    :return: The Rotational Embedding pattern.
RoPE is commonly expanded as “Rotary Positional Embedding”. The docstring wording “Rotational Embedding” is ambiguous and may be confusing; consider renaming it to “Rotary Positional Embedding (RoPE)” for clarity.
Suggested change:
-    Creates Rotational Embedding pattern.
+    Creates Rotary Positional Embedding (RoPE) pattern.
     :param mm_metatype: MatMul metatype.
     :param transpose_metatype: Transpose metatype.
     :param concat_metatype: Concat metatype.
     :param cos_metatype: Cos metatype.
     :param sin_metatype: Sin metatype.
-    :return: The Rotational Embedding pattern.
+    :return: The Rotary Positional Embedding (RoPE) pattern.
The suggestion looks valid.
Also, add information that the pattern contains two graphs, with and without transpose. You may add a visual graph like in some other patterns:
    (matmul)       (matmul)
        |              |
    (transpose)    (concat)
        |           /    \
    (concat)     (cos)  (sin)
      /    \
    (cos)  (sin)
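The two alternatives sketched above can be represented roughly as follows (a toy, self-contained illustration with hypothetical names; it does not mirror NNCF's GraphPattern API):

```python
# Toy representation of the two RoPE subgraph alternatives as edge lists;
# names are hypothetical and only illustrate the pattern's shape.
def rope_pattern_alternatives() -> list[list[tuple[str, str]]]:
    with_transpose = [
        ("matmul", "transpose"),
        ("transpose", "concat"),
        ("concat", "cos"),
        ("concat", "sin"),
    ]
    without_transpose = [
        ("matmul", "concat"),
        ("concat", "cos"),
        ("concat", "sin"),
    ]
    # A candidate subgraph is ignored if it matches either alternative.
    return [with_transpose, without_transpose]


alternatives = rope_pattern_alternatives()
```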
@AlexanderDokuchaev, could you please take a look?
Force-pushed from f279364 to 6bbf068 (Compare)
    INPUT_SIZE = [1, 10]

-   def __init__(self):
+   def __init__(self, with_transpose: bool = True):
Adding a default like `with_transpose: bool = True` (this applies to all places where it is added) is bad because it hides behavior changes and weakens test coverage.
Old tests may pass without explicitly checking both True and False, so part of the logic remains untested. This creates a false sense of correctness and risks missing bugs related to the transpose path.
For example, tests/torch/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[rope] should be extended too.
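A sketch of how both paths could be exercised explicitly instead of relying on the default (the stub model and names are hypothetical, not the real fixtures):

```python
# Stub standing in for the real RoPE test model; illustration only.
class RoPEModelStub:
    def __init__(self, with_transpose: bool = True):
        self.ops = ["matmul"]
        if with_transpose:
            self.ops.append("transpose")
        self.ops.extend(["concat", "cos", "sin"])


def op_sequences_for_both_variants() -> dict[bool, list[str]]:
    # Iterate over both values so neither path hides behind the default.
    return {wt: RoPEModelStub(with_transpose=wt).ops for wt in (True, False)}


sequences = op_sequences_for_both_variants()
```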
-   def test_rope_weight_compression(self):
-       model = self.get_RoPE_model()
+   @pytest.mark.parametrize("with_transpose", [True, False])
+   def test_rope_weight_compression(self, with_transpose):
How does `test_rope_weight_compression` check RoPE patterns?
If the RoPE patterns are removed, this test still passes.
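The intent behind the check can be sketched as follows (stub logic with hypothetical names; the real test counts decompressor nodes in the compressed model):

```python
# Stub of the assertion logic: with a single matmul in the model, backup
# mode would compress it to INT8 unless the RoPE ignored pattern excludes it.
def expected_int8_nodes(rope_pattern_applied: bool, num_matmuls: int = 1) -> int:
    return 0 if rope_pattern_applied else num_matmuls


# If the RoPE ignored patterns were removed, the count would be nonzero,
# so asserting zero INT8 nodes does verify that the pattern is applied.
```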
Co-authored-by: Alexander Dokuchaev <alexander.dokuchaev@intel.com>
Force-pushed from 26eff12 to 620b411 (Compare)
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
    @staticmethod
    @abstractmethod
    def get_num_int8_nodes(model: TModel):
        "Returns number of int4 nodes."
The docstring for get_num_int8_nodes incorrectly says it returns the number of int4 nodes. This is misleading for implementers of the template and readers of the test expectations; update it to describe int8 nodes.
    # First matmul is always compressed in INT8 format,
    # as there is only one matmul in the target model
    # the check int8 num ref == 0 checks that the
    # ignored ROPE pattern is being applied
The explanatory comment about the RoPE test is confusing/contradictory: it states the MatMul is always compressed to INT8, but then references expecting int8_num_nodes == 0 without explaining that this assertion is specifically checking the ignored-pattern path (i.e., compression would otherwise happen). Reword to clearly tie the expected INT8 compression to the default backup-mode behavior and why zero INT8 nodes indicates RoPE was ignored.
    @staticmethod
    def get_num_int4_nodes(model: torch.fx.GraphModule) -> int:
        def _get_num_typed_nodes(model: torch.fx.GraphModule, types: tuple[BaseWeightsDecompressor, ...]):
            num = 0
types is used as the second argument to isinstance, so it is a tuple of classes, not instances. The type annotation should reflect that (e.g., tuple[type[BaseWeightsDecompressor], ...]) to avoid incorrect typing and improve static analysis.
    @staticmethod
    def get_num_int4_nodes(model: torch.nn.Module) -> int:
        def _get_num_typed_nodes(model: torch.nn.Module, types: tuple[BaseWeightsDecompressor, ...]):
            num = 0
types is passed to isinstance, so it must be a tuple of classes. The current annotation tuple[BaseWeightsDecompressor, ...] implies instances; change it to something like tuple[type[BaseWeightsDecompressor], ...] for correct typing.
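A self-contained sketch of the point (stub classes with hypothetical names): `isinstance` takes classes as its second argument, so the parameter should be annotated with `type[...]`.

```python
# Stub standing in for BaseWeightsDecompressor; illustration only.
class BaseWeightsDecompressor:
    pass


class INT4SymmetricDecompressor(BaseWeightsDecompressor):
    pass


def count_typed_modules(
    modules: list[object],
    types: tuple[type[BaseWeightsDecompressor], ...],
) -> int:
    # isinstance's second argument is a tuple of classes, not instances,
    # hence tuple[type[BaseWeightsDecompressor], ...].
    return sum(1 for m in modules if isinstance(m, types))


mods = [INT4SymmetricDecompressor(), BaseWeightsDecompressor(), object()]
```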
    @staticmethod
    def get_num_int4_nodes(model: onnx.ModelProto) -> int:
        def _get_num_typed_nodes(model: onnx.ModelProto, types: list[onnx.TensorProto]):
            num = 0
            for i in model.graph.initializer:
-               if i.data_type in [onnx.TensorProto.UINT4, onnx.TensorProto.INT4]:
+               if i.data_type in types:
                    num += 1
types is a list of ONNX tensor data type enum values (ints like onnx.TensorProto.INT8), but the annotation list[onnx.TensorProto] refers to the TensorProto message type. Consider annotating as Sequence[int] (or the specific DataType enum type) to match actual usage and avoid misleading typing.
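To illustrate with a self-contained toy (the `DataType` enum below is a stand-in; real code would pass `onnx.TensorProto` constants, which are plain ints at runtime):

```python
from enum import IntEnum
from typing import Sequence


# Toy stand-in for ONNX tensor data-type codes; values are illustrative.
class DataType(IntEnum):
    UINT8 = 2
    INT8 = 3


def count_initializers_of_types(data_types: Sequence[int], wanted: Sequence[int]) -> int:
    # `wanted` holds integer enum values, so Sequence[int] matches usage.
    return sum(1 for dt in data_types if dt in wanted)


n = count_initializers_of_types([DataType.INT8, DataType.UINT8, DataType.INT8], [DataType.INT8])
```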
    (ModelDesc("embedding_model", EmbeddingModel, [1, 10]), {}),
-   (ModelDesc("rope_model", RoPEModel, [1, 10]), {"model_type": ModelType.TRANSFORMER}),
+   (
+       ModelDesc("rope_model", partial(RoPEModel, with_transpose=True, with_reshape=True), [1, 10]),
Please add test cases for the new pattern that you have added.
I've updated the test_quantizer_config.py template to cover all the backends at once
    "/cos/0" [id=7, type=cos, metatype=PTCosMetatype];
    output_0 [id=8, type="nncf_model_output", metatype=PTOutputNoopMetatype];
    output_1 [id=9, type="nncf_model_output", metatype=PTOutputNoopMetatype];
    data [id=2, type="nncf_model_const", metatype=PTConstNoopMetatype];
Why did it change?
Because of the if statement I've added to the original model. Should I revert it, or is it ok to keep this change?
        x = torch.matmul(reshape, x)
        x = torch.transpose(x, 2, 1)

        if self._with_reshape:
Why is this added? Is it part of the pattern?
It was changed because the model with reshape is not compressed by default by the WC algorithm. It is not part of the pattern, but with the reshape it is not possible to check that the ignored pattern worked properly. The reason why the reshape was added in the first place is unclear (#3059).
-   def get_RoPE_model() -> ov.Model:
-       return RoPEModel().ov_model
+   def get_RoPE_model(with_transpose: bool) -> ov.Model:
+       return RoPEModel(with_transpose=with_transpose, with_broadcast=False).ov_model
What is `with_broadcast` needed for?
The original subgraph had a broadcast (#3049), which is why the reference model had it. However, with that broadcast it is not possible to check that the ignored WC pattern has been applied properly, which is why I've added a switch. With the broadcast, the model is still a reference for the quantization.
Force-pushed from 6ee5aa2 to c2a45d1 (Compare)
Changes
Reason for changes
Related tickets
180570
Tests
- `with_transpose` to cover the new ignored pattern: https://github.com/openvinotoolkit/nncf/compare/develop...daniil-lyakhov:dl/phi_rot_emb_ig_pattern?expand=1#diff-32bc7464d48f05ead87262b533f2e0ce9ea2db101e9e8ae59cb64c65d6aba351L448-L449