[WC] RoPE ignored pattern without transpose #3930

Open
daniil-lyakhov wants to merge 4 commits into openvinotoolkit:develop from daniil-lyakhov:dl/phi_rot_emb_ig_pattern

Conversation

@daniil-lyakhov
Collaborator

Changes

  • Additional RoPe ignored pattern without the transpose node is introduced
(image: RoPE ignored pattern diagram)

Reason for changes

  • To cover RoPe pattern for the Phi-3.5-moe model

Related tickets

180570

Tests

@daniil-lyakhov daniil-lyakhov requested a review from a team as a code owner February 12, 2026 13:58
Copilot AI review requested due to automatic review settings February 12, 2026 13:58
@github-actions github-actions bot added the NNCF PT, NNCF OpenVINO, and NNCF ONNX labels Feb 12, 2026
Contributor

Copilot AI left a comment


Pull request overview

Introduces an additional RoPE ignored pattern variant that omits the transpose node, and updates existing test models/tests to cover both graph forms (with and without transpose), targeting coverage for Phi-3.5-moe.

Changes:

  • Added a shared create_rope_pattern helper that supports both RoPE variants (with/without transpose).
  • Updated Torch / OpenVINO / ONNX ignored-pattern registrations to use the shared helper.
  • Parameterized RoPE test models and the cross-framework RoPE compression test to validate both variants.
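
The shared-builder idea described above can be illustrated with a rough, self-contained sketch. The name create_rope_pattern mirrors the PR, but the representation below is a toy stand-in for NNCF's actual GraphPattern class, not its real API:

```python
# Simplified stand-in for NNCF's GraphPattern: each alternative subgraph is a
# list of (source_op, target_op) edges, and the full ignored pattern is the
# union of the alternatives.

def create_rope_pattern(mm_op: str, transpose_op: str, concat_op: str,
                        cos_op: str, sin_op: str) -> list[list[tuple[str, str]]]:
    """Return both RoPE subgraph variants: with and without the transpose node."""
    with_transpose = [
        (mm_op, transpose_op),
        (transpose_op, concat_op),
        (concat_op, cos_op),
        (concat_op, sin_op),
    ]
    without_transpose = [
        (mm_op, concat_op),
        (concat_op, cos_op),
        (concat_op, sin_op),
    ]
    return [with_transpose, without_transpose]

pattern = create_rope_pattern("matmul", "transpose", "concat", "cos", "sin")
```

A matcher that treats the pattern as a union of alternatives would ignore a subgraph if it matches either variant, which is what lets one helper cover both the original and the Phi-3.5-moe graph forms.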

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/torch/fx/test_weights_compression.py: Adds with_transpose option to the FX RoPE model fixture.
  • tests/torch/function_hook/quantization/test_weights_compression.py: Adds with_transpose option to the Torch RoPE model fixture.
  • tests/openvino/native/quantization/test_weights_compression.py: Adds with_transpose option to the OpenVINO RoPE model fixture.
  • tests/openvino/native/models.py: Makes OpenVINO RoPE reference model optionally skip transpose.
  • tests/onnx/quantization/test_weights_compression.py: Makes ONNX RoPE model optionally skip transpose and adjusts output shapes accordingly.
  • tests/cross_fw/test_templates/template_test_weights_compression.py: Parameterizes RoPE compression test for both transpose variants; updates template method signature.
  • tests/cross_fw/test_templates/helpers.py: Makes Torch RoPE helper model optionally skip transpose.
  • src/nncf/torch/quantization/ignored_patterns.py: Switches Torch RoPE ignored pattern to shared helper supporting both variants.
  • src/nncf/openvino/quantization/ignored_patterns.py: Switches OpenVINO RoPE ignored pattern to shared helper supporting both variants.
  • src/nncf/onnx/quantization/ignored_patterns.py: Switches ONNX RoPE ignored pattern to shared helper supporting both variants.
  • src/nncf/quantization/ignored_patterns.py: Adds shared RoPE pattern builder with transpose/no-transpose alternatives.
  • src/nncf/common/graph/patterns/patterns.py: Loosens typing for add_node and expands add_edge typing.


Comment on lines +17 to +21
mm_metatype: OperatorMetatype,
transpose_metatype: OperatorMetatype,
concat_metatype: OperatorMetatype,
cos_metatype: OperatorMetatype,
sin_metatype: OperatorMetatype,

Copilot AI Feb 12, 2026


The metatype arguments are passed around as metatype classes (e.g., om.PTMatMulMetatype), not OperatorMetatype instances. To avoid misleading public typing (and improve static analysis), these parameters should be annotated as type[OperatorMetatype] (and similarly for the others).

Suggested change:
-    mm_metatype: OperatorMetatype,
-    transpose_metatype: OperatorMetatype,
-    concat_metatype: OperatorMetatype,
-    cos_metatype: OperatorMetatype,
-    sin_metatype: OperatorMetatype,
+    mm_metatype: type[OperatorMetatype],
+    transpose_metatype: type[OperatorMetatype],
+    concat_metatype: type[OperatorMetatype],
+    cos_metatype: type[OperatorMetatype],
+    sin_metatype: type[OperatorMetatype],

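
The distinction Copilot raises, instances versus class objects in annotations, can be shown with a minimal self-contained example. The class names below are illustrative stand-ins, not NNCF's real metatypes:

```python
# Illustrative stand-ins; the point is the annotation, not the class bodies.
class OperatorMetatype:
    pass

class MatMulMetatype(OperatorMetatype):
    pass

# Annotated as an instance: static checkers flag callers that pass the class
# object itself, e.g. register_bad(MatMulMetatype).
def register_bad(metatype: OperatorMetatype) -> str:
    return type(metatype).__name__

# Annotated as a class object: matches how the metatypes are actually passed.
def register_good(metatype: type[OperatorMetatype]) -> str:
    return metatype.__name__

name = register_good(MatMulMetatype)  # passing the class, not an instance
```

Both functions happen to run at runtime, since annotations are not enforced; the type[...] form only matters to static analysis, which is exactly the benefit the review comment points at.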
Collaborator


This comment is valid.

Collaborator Author


Done

Comment on lines +24 to +31
Creates Rotational Embedding pattern.

:param mm_metatype: MatMul metatype.
:param transpose_metatype: Transpose metatype.
:param concat_metatype: Concat metatype.
:param cos_metatype: Cos metatype.
:param sin_metatype: Sin metatype.
:return: The Rotational Embedding pattern.

Copilot AI Feb 12, 2026


RoPE is commonly expanded as “Rotary Positional Embedding”. The docstring wording “Rotational Embedding” is ambiguous and may be confusing; consider renaming it to “Rotary Positional Embedding (RoPE)” for clarity.

Suggested change:
-    Creates Rotational Embedding pattern.
+    Creates Rotary Positional Embedding (RoPE) pattern.
     :param mm_metatype: MatMul metatype.
     :param transpose_metatype: Transpose metatype.
     :param concat_metatype: Concat metatype.
     :param cos_metatype: Cos metatype.
     :param sin_metatype: Sin metatype.
-    :return: The Rotational Embedding pattern.
+    :return: The Rotary Positional Embedding (RoPE) pattern.

Collaborator


The suggestion looks valid.

Collaborator


Also add information that the pattern contains two graphs, with and without transpose.
You may add a visual graph, like in some other patterns:

     (matmul)            (matmul)
         |                  |
    (transpose)          (concat)
         |                /    \
      (concat)         (cos) (sin)
       /   \
    (cos) (sin)
 

Collaborator Author


Done

@daniil-lyakhov
Collaborator Author

@AlexanderDokuchaev, could you please take a look?

@daniil-lyakhov daniil-lyakhov force-pushed the dl/phi_rot_emb_ig_pattern branch from f279364 to 6bbf068 Compare February 19, 2026 14:48


     INPUT_SIZE = [1, 10]

-    def __init__(self):
+    def __init__(self, with_transpose: bool = True):
Collaborator


Adding a default like with_transpose: bool = True (this applies to all places where it is added) is bad because it hides behavior changes and weakens test coverage.
Old tests may pass without explicitly checking both True and False, so part of the logic remains untested. This creates a false sense of correctness and risks missing bugs related to the transpose path.

For example, tests/torch/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[rope] should be extended too.

Collaborator Author


Done
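
The reviewer's concern above, that a hidden default lets tests silently exercise only one path, is usually fixed by running both values explicitly. A plain-Python sketch (build_rope_model and the op names are hypothetical stand-ins for the actual fixtures):

```python
# Hypothetical stand-in for the RoPE test-model builder: it returns the op
# sequence of the subgraph so both variants can be asserted on explicitly.
def build_rope_model(with_transpose: bool) -> list[str]:
    ops = ["matmul"]
    if with_transpose:
        ops.append("transpose")
    ops += ["concat", "cos", "sin"]
    return ops

# Exercise BOTH variants instead of relying on the default value:
for with_transpose in (True, False):
    ops = build_rope_model(with_transpose)
    assert ("transpose" in ops) == with_transpose
```

In the actual test suite the same effect is achieved by parametrizing the fixture over both values, so neither path can silently fall out of coverage.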


-    def test_rope_weight_compression(self):
-        model = self.get_RoPE_model()
+    @pytest.mark.parametrize("with_transpose", [True, False])
+    def test_rope_weight_compression(self, with_transpose):
Collaborator


How does `test_rope_weight_compression` check the RoPE patterns?
If the RoPE patterns are removed, this test still passes.

Collaborator Author


Done

Co-authored-by: Alexander Dokuchaev <alexander.dokuchaev@intel.com>
@daniil-lyakhov daniil-lyakhov force-pushed the dl/phi_rot_emb_ig_pattern branch 2 times, most recently from 26eff12 to 620b411 Compare February 24, 2026 13:01
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.



Comment on lines +410 to +413
@staticmethod
@abstractmethod
def get_num_int8_nodes(model: TModel):
"Returns number of int4 nodes."

Copilot AI Mar 9, 2026


The docstring for get_num_int8_nodes incorrectly says it returns the number of int4 nodes. This is misleading for implementers of the template and readers of the test expectations; update it to describe int8 nodes.

Comment on lines +463 to +466
# First matmul is always compressed in INT8 format,
# as there is only one matmul in the target model
# the check int8 num ref == 0 checks that the
# ignored ROPE pattern is being applied

Copilot AI Mar 9, 2026


The explanatory comment about the RoPE test is confusing/contradictory: it states the MatMul is always compressed to INT8, but then references expecting int8_num_nodes == 0 without explaining that this assertion is specifically checking the ignored-pattern path (i.e., compression would otherwise happen). Reword to clearly tie the expected INT8 compression to the default backup-mode behavior and why zero INT8 nodes indicates RoPE was ignored.

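
The assertion logic being discussed can be sketched as follows. Everything here is a hypothetical stand-in, assuming the target model contains exactly one MatMul whose weight would normally be compressed to INT8 by backup mode, so a zero INT8-node count is evidence that the RoPE ignored pattern fired:

```python
# Hypothetical sketch: count nodes of a given type in a flattened graph.
def count_nodes(node_types: list[str], wanted: str) -> int:
    return sum(1 for t in node_types if t == wanted)

# With the RoPE ignored pattern applied, the lone MatMul stays uncompressed;
# without it, backup mode would compress that MatMul to int8.
graph_pattern_ignored = ["matmul", "concat", "cos", "sin"]
graph_pattern_missing = ["int8_matmul", "concat", "cos", "sin"]

assert count_nodes(graph_pattern_ignored, "int8_matmul") == 0  # pattern applied
assert count_nodes(graph_pattern_missing, "int8_matmul") == 1  # pattern absent
```

Phrased this way, the comment in the test can directly tie "expected int8 nodes == 0" to "the ignored pattern suppressed the only compression that would otherwise happen."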
Comment on lines 601 to 603
     @staticmethod
-    def get_num_int4_nodes(model: torch.fx.GraphModule) -> int:
+    def _get_num_typed_nodes(model: torch.fx.GraphModule, types: tuple[BaseWeightsDecompressor, ...]):
         num = 0

Copilot AI Mar 9, 2026


types is used as the second argument to isinstance, so it is a tuple of classes, not instances. The type annotation should reflect that (e.g., tuple[type[BaseWeightsDecompressor], ...]) to avoid incorrect typing and improve static analysis.

Comment on lines 817 to 819
     @staticmethod
-    def get_num_int4_nodes(model: torch.nn.Module) -> int:
+    def _get_num_typed_nodes(model: torch.nn.Module, types: tuple[BaseWeightsDecompressor, ...]):
         num = 0

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types is passed to isinstance, so it must be a tuple of classes. The current annotation tuple[BaseWeightsDecompressor, ...] implies instances; change it to something like tuple[type[BaseWeightsDecompressor], ...] for correct typing.

Comment on lines 817 to 822
     @staticmethod
-    def get_num_int4_nodes(model: onnx.ModelProto) -> int:
+    def _get_num_typed_nodes(model: onnx.ModelProto, types: list[onnx.TensorProto]):
         num = 0
         for i in model.graph.initializer:
-            if i.data_type in [onnx.TensorProto.UINT4, onnx.TensorProto.INT4]:
+            if i.data_type in types:
                 num += 1

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types is a list of ONNX tensor data type enum values (ints like onnx.TensorProto.INT8), but the annotation list[onnx.TensorProto] refers to the TensorProto message type. Consider annotating as Sequence[int] (or the specific DataType enum type) to match actual usage and avoid misleading typing.

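
Copilot's point, that values like onnx.TensorProto.INT4 are plain integer enum members rather than TensorProto messages, can be demonstrated without installing onnx by using IntEnum stand-ins (the class and its values below imitate the ONNX enum but are local to this sketch):

```python
from collections.abc import Sequence
from enum import IntEnum

class DataType(IntEnum):
    """Stand-in for onnx.TensorProto's data-type enum (members are plain ints)."""
    INT8 = 3
    UINT4 = 21
    INT4 = 22

def count_initializers(initializer_types: Sequence[int], wanted: Sequence[int]) -> int:
    # `wanted` holds integer enum values, hence Sequence[int] rather than a
    # message type such as list[onnx.TensorProto].
    return sum(1 for dt in initializer_types if dt in wanted)

model_types = [DataType.INT4, DataType.INT8, DataType.UINT4]
n_int4 = count_initializers(model_types, (DataType.INT4, DataType.UINT4))
```

Because IntEnum members compare equal to their integer values, the same function accepts raw ints and enum members interchangeably, which is exactly why Sequence[int] is the honest annotation.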
     (ModelDesc("embedding_model", EmbeddingModel, [1, 10]), {}),
-    (ModelDesc("rope_model", RoPEModel, [1, 10]), {"model_type": ModelType.TRANSFORMER}),
+    (
+        ModelDesc("rope_model", partial(RoPEModel, with_transpose=True, with_reshape=True), [1, 10]),
Collaborator


Please add test cases for the new pattern that you have added.

Collaborator Author


I've updated the test_quantizer_config.py template to cover all the backends at once

"/cos/0" [id=7, type=cos, metatype=PTCosMetatype];
output_0 [id=8, type="nncf_model_output", metatype=PTOutputNoopMetatype];
output_1 [id=9, type="nncf_model_output", metatype=PTOutputNoopMetatype];
data [id=2, type="nncf_model_const", metatype=PTConstNoopMetatype];
Collaborator


Why did it change?

Collaborator Author


Because of the if statement I've added to the original model. Should I revert it, or is it OK to keep this change?

x = torch.matmul(reshape, x)
x = torch.transpose(x, 2, 1)

if self._with_reshape:
Collaborator


Why is this added? Is it part of the pattern?

Collaborator Author


It was changed because the model with reshape is not compressed by the WC algorithm by default. The reshape is not part of the pattern, but with it, it is not possible to check that the ignored pattern worked properly. The reason the reshape was added in the first place is unclear: #3059

def get_RoPE_model() -> ov.Model:
return RoPEModel().ov_model
def get_RoPE_model(with_transpose: bool) -> ov.Model:
return RoPEModel(with_transpose=with_transpose, with_broadcast=False).ov_model
Collaborator


What is the with_broadcast flag needed for?

Collaborator Author


The original subgraph had a broadcast (#3049), which is why the reference model had one. However, with that broadcast it is not possible to check that the ignored WC pattern has been applied properly, so I've added a switch. With the broadcast, the model is still a reference for quantization.


Labels

Code Freeze, NNCF ONNX, NNCF OpenVINO, NNCF PT


4 participants