Commit d4dd5dc
[Skills][Transformations] Introduce debug-matcher-pass skill (openvinotoolkit#35127)
### Details:
Introduces a new “debug-matcher-pass” skill that documents an end-to-end
workflow for diagnosing why an OpenVINO MatcherPass transformation
doesn’t fire, including a filled example diagnosis report to use as a
quality reference.
### Usage Examples
#### Example 1:
Prompt:
> I run the following command -- `./benchmark_app -d CPU -m
<path>/openvino_model.xml -niter 0` and expect the MoE3GeMMFusion matcher
pass to be applied, but it looks like the matcher pass fails. Help me
investigate why this transformation is not applied.
<details>
<summary>The skill's output (click to expand):</summary>
## MatcherPass Diagnosis: MoE3GeMMFusion
**Root cause:** All three `MatMul` nodes (gate, up, down projections) in
the model's MoE subgraph have `transpose_b=false`, but the
`MoE3GeMMFusion` pattern hard-requires `transpose_b=true` via an
`attrs_match` predicate. No explicit `Transpose` node wraps the weight
input either, so neither of the pattern's two matching paths can
succeed.
**Log evidence:**
```
{ MATCHING PATTERN NODE: WrapType<MatMul>(WrapType, any_input)
├─ AGAINST GRAPH NODE: MatMul(Multiply, Reshape)
├─ PREDICATE `attrs_match({ transpose_b: YES, transpose_a: NO }) && consumers_count(1)` FAILED
} NODES' TYPE MATCHED, but PREDICATE FAILED
```
This phrase appears **30 times** in the matcher log — once per MoE layer
candidate — confirming all 30 match attempts fail at exactly the same
point.
**Failing node:** `ov::op::v0::MatMul` (down-projection, and
symmetrically gate/up-projection MatMuls). Op type is correct; only the
`transpose_b` attribute fails the predicate.
**Resolution:**
- File:
`src/plugins/intel_cpu/src/transformations/cpu_opset/common/pass/moe_matmuls_fusion.cpp`,
lines ~241, 248, 253
- Remove the `{"transpose_b", true}` constraint from all three
`wrap_type<MatMul>` predicate maps and instead read the actual flag
inside the callback via `gate_mm_node->get_transpose_b()`, then
propagate it to `BatchGatherMatmul`.
- Alternative: add a second pattern branch (via `pattern::op::Or`)
covering `transpose_b=false`.
## Reproducer Test
File:
`src/plugins/intel_cpu/tests/unit/transformations/moe_matmuls_fusion_test.cpp`
Test name:
`MoE3GeMMFusion_TransposeBFalse_TestF.MoE3GeMMFusion_TransposeBFalse_NotApplied`
Status before fix: **PASS (green)**: the transformation does not fire, so
the unchanged model equals the auto-cloned `model_ref`. The matcher log
from the test run shows the same `attrs_match({ transpose_b: YES })`
predicate failure as the original model, confirming the reproducer is
faithful.
</details>
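
The occurrence count quoted in the diagnosis above can be reproduced by scanning the matcher log for the failure phrase. The following is a minimal, self-contained sketch: `count_occurrences` is a hypothetical helper (not part of OpenVINO), and it assumes the log text has already been read into a string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts non-overlapping occurrences of `phrase`
// in `log_text`. An empty phrase is treated as "no matches".
std::size_t count_occurrences(const std::string& log_text, const std::string& phrase) {
    if (phrase.empty()) {
        return 0;
    }
    std::size_t count = 0;
    std::size_t pos = log_text.find(phrase);
    while (pos != std::string::npos) {
        ++count;
        // Continue searching just past the match that was found.
        pos = log_text.find(phrase, pos + phrase.size());
    }
    return count;
}
```

Calling `count_occurrences(log_text, "NODES' TYPE MATCHED, but PREDICATE FAILED")` on the captured matcher log gives the number of times the failure phrase appears, which can then be compared against the number of candidate subgraphs.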
#### Example 2:
Prompt:
> I need you to investigate why some transformations in the GPU plugin
pipeline don't work. I run the following command:
> `cd /home/guest/golubevv/openvino/bin/intel64/Release &&
./benchmark_app -d GPU -m subgraph.xml -niter 0`
> and see the issue which is most likely caused by the transformation
pipeline problems. I know that the following transformations should
fire:
>
> 1. FuseVectorizedMOE3GEMM
> 2. ConvertMOEToMOECompressed
> 3. FuseMOE3GemmCompressed
> 4. KeepMOE3GemmConstPrecision
>
> But at least one of them didn't fire. Investigate the root cause.
<details>
<summary>The skill's output (click to expand):</summary>
# MatcherPass Diagnosis: `FuseMOE3GemmCompressed`
**Date:** 2026-04-03
**Model:** Trinity-Mini (`subgraph.xml`)
**Run command:** `./benchmark_app -d GPU -m subgraph.xml -compile_only`
---
## Summary of all four passes
| Pass | Result |
|---|---|
| `FuseVectorizedMOE3GEMM` | ✅ Fired (`CALLBACK SUCCEDED`) |
| `ConvertMOEToMOECompressed` | ✅ Fired (`CALLBACK SUCCEDED`); produced `MOECompressed` nodes |
| **`FuseMOE3GemmCompressed`** | ❌ **Never matched** (root cause) |
| `KeepMOE3GemmConstPrecision` | ❌ Never matched (downstream effect): no `MOE3GemmFusedCompressed` exists because step 3 didn't fire; triggers crash `Input moecompressed: ... hasn't been found in primitive_ids map` |
---
## Root cause
The sigmoid routing normalization branch in the model has an extra
`Multiply(Divide, Constant)` node between the normalization `Divide` and
the `Slice`.
The `FuseMOE3GemmCompressed` pattern's `sig_slice` (defined at line 91
of
[fuse_moe_3gemm_compressed.cpp](src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp))
expects:
```
Slice( Divide(...), ... )
```
But the graph has:
```
Slice( Multiply( Divide(...), Constant ), ... )
```
The extra `Multiply` scales the routing weights by a constant after
normalization. It is present in the Trinity-Mini model but was not
present in the models used when the pattern was originally written.
---
## Log evidence
From both `/tmp/matcher.log` (original run) and the unit test
reproducer:
```
{ ARGUMENT 0: WrapType<Divide>
MATCHING PATTERN NODE: WrapType<Divide>(WrapType, WrapType)
AGAINST GRAPH NODE: Multiply(Divide, Constant)
} NODES' TYPE DIDN'T MATCH. EXPECTED: WrapType<Divide>. OBSERVED: Multiply
```
**Location in the match tree:**
`MOECompressed` → ARGUMENT 1 (`Unsqueeze`) → ARGUMENT 0 (`Reshape`) →
ARGUMENT 0 (`Transpose`) → ARGUMENT 0 (`ScatterElementsUpdate`) →
**ARGUMENT 2** (`patternOr(sm_norm_slice | sig_slice)`) → BRANCH 1
(`sig_slice`) → **ARGUMENT 0** (expected `WrapType<Divide>`, observed
`Multiply`).
**Failing graph node:** `Multiply(Divide, Constant)` — the
routing-weight scaling step inserted between `sig_norm` (Divide) and
`sig_slice` (Slice).
---
## Crash consequence
Because `FuseMOE3GemmCompressed` never fires,
`KeepMOE3GemmConstPrecision` finds no `MOE3GemmFusedCompressed` nodes
and also does nothing. The GPU program builder then fails:
```
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:265:
Input moecompressed:__module.model.layers.2.mlp/aten::sum/ReduceSum/MOECompressed
hasn't been found in primitive_ids map
```
---
## Resolution
In
[src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp](src/plugins/intel_gpu/src/plugin/transformations/fuse_moe_3gemm_compressed.cpp)
around lines 90–91, add an `optional<Multiply>` wrapper between
`sig_norm` and `sig_slice`:
```cpp
// Before:
auto sig_norm = wrap_type<ov::op::v1::Divide>({sig_gather_el, sig_add_eps}, consumers_count(1));
auto sig_slice = wrap_type<ov::op::v8::Slice>({sig_norm, ANY, ANY, ANY, ANY}, consumers_count(1));
// After:
auto sig_norm = wrap_type<ov::op::v1::Divide>({sig_gather_el, sig_add_eps}, consumers_count(1));
auto sig_norm_scaled = optional<ov::op::v1::Multiply>({sig_norm, ANY}); // absorbs optional scaling
auto sig_slice = wrap_type<ov::op::v8::Slice>({sig_norm_scaled, ANY, ANY, ANY, ANY}, consumers_count(1));
```
`optional<Multiply>` will match `Multiply(Divide, Constant)` when
present, or fall through to `Divide` directly for models without the
extra scaling — no change to the existing passing test cases.
> **Note:** Once the pattern fires, also verify that
`MOE3GemmFusedCompressed`'s kernel/op implementation correctly applies
the routing-weight scaling factor. The `Multiply` constant is currently
not passed as an explicit argument to the fused op — the kernel may need
a new input or an internal constant to reproduce correct numerical
output.
---
## Reproducer Test
**File:**
[src/plugins/intel_gpu/tests/unit/transformations/fuse_moe_3gemm_compressed_test.cpp](src/plugins/intel_gpu/tests/unit/transformations/fuse_moe_3gemm_compressed_test.cpp)
**Test name:**
`TransformationTestsF.FuseMOE3GemmCompressed_SigmoidBias_ScaledNorm`
**Build target:** `ov_gpu_unit_tests`
**Run command:**
```bash
cd /home/guest/golubevv/openvino/bin/intel64/Release
OV_MATCHER_LOGGING=true OV_MATCHERS_TO_LOG=FuseMOE3GemmCompressed \
./ov_gpu_unit_tests \
--gtest_filter="*FuseMOE3GemmCompressed_SigmoidBias_ScaledNorm*"
```
**Status before fix:** PASS ✅ — the transformation does not fire so the
model is unchanged and matches the auto-cloned `model_ref`. This
confirms the bug is reproduced.
The test log shows the identical failure phrase:
```
NODES' TYPE DIDN'T MATCH. EXPECTED: WrapType<Divide>. OBSERVED: Multiply
AGAINST GRAPH NODE: Slice(Multiply, Constant, ShapeOf, Constant, Constant)
```
**After fix:** the test will FAIL because `model` is now transformed and
no longer matches the auto-cloned ref. At that point, add an explicit
`model_ref` block with the expected `MOE3GemmFusedCompressed` result
graph to turn it into a proper regression guard.
</details>
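
The resolution in Example 2 relies on an optional pattern node matching either the wrapped op or, when that op is absent, the pattern of its first input. The toy model below illustrates that fall-through behaviour on a simplified graph. It is a sketch under stated assumptions, not OpenVINO's implementation: `Node`, `Pattern`, `node`, `pat`, and `matches` are all hypothetical names, predicates such as `consumers_count(1)` are omitted, and `Slice` is reduced to two inputs for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy graph node: an op type plus its inputs (not an OpenVINO type).
struct Node {
    std::string type;
    std::vector<std::shared_ptr<Node>> inputs;
};

// Toy pattern node. Type "*" matches any node. When `optional` is true,
// the pattern may be skipped and matching continues with its first input.
struct Pattern {
    std::string type;
    std::vector<std::shared_ptr<Pattern>> inputs;
    bool optional = false;
};

std::shared_ptr<Node> node(std::string type, std::vector<std::shared_ptr<Node>> inputs = {}) {
    return std::make_shared<Node>(Node{std::move(type), std::move(inputs)});
}

std::shared_ptr<Pattern> pat(std::string type,
                             std::vector<std::shared_ptr<Pattern>> inputs = {},
                             bool optional = false) {
    return std::make_shared<Pattern>(Pattern{std::move(type), std::move(inputs), optional});
}

bool matches(const std::shared_ptr<Pattern>& p, const std::shared_ptr<Node>& n);

// Strict match: same op type, same arity, and every input matches in order.
bool matches_exact(const Pattern& p, const Node& n) {
    if (p.type == "*") {
        return true;
    }
    if (p.type != n.type || p.inputs.size() != n.inputs.size()) {
        return false;
    }
    for (std::size_t i = 0; i < p.inputs.size(); ++i) {
        if (!matches(p.inputs[i], n.inputs[i])) {
            return false;
        }
    }
    return true;
}

bool matches(const std::shared_ptr<Pattern>& p, const std::shared_ptr<Node>& n) {
    if (matches_exact(*p, *n)) {
        return true;
    }
    // Optional fall-through: skip this pattern node and try its first input.
    return p->optional && !p->inputs.empty() && matches(p->inputs[0], n);
}
```

In this toy model, a `Slice` pattern whose first input is a strict `Divide` rejects `Slice(Multiply(Divide, Constant), ...)`, while the same pattern with an optional `Multiply` placed in front of the `Divide` accepts both the scaled and the unscaled graph.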
### Tickets:
- *N/A*
### AI Assistance:
- *yes*
- *AI was used to improve the skill based on real usage examples*
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 0615733 commit d4dd5dc
2 files changed
Lines changed: 348 additions & 0 deletions