
Commit 47a0077

Fix attention fusion in conformer encoder (#23711)
### Description

This PR updates the attention fusion for conformer-encoder models. It is a follow-up to [this PR](#23528).

### Motivation and Context

Subsequent modeling code updates have changed (and will continue to change) the graph fusions. However, the three trailing attention mask nodes (`Cast --> Unsqueeze --> Equal`) will remain, so the attention fusion should keep working under future modeling code changes when handling the attention mask.
1 parent c7aa9a7 commit 47a0077

File tree

1 file changed, +3 −3 lines changed

onnxruntime/python/tools/transformers/fusion_conformer_attention.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -79,11 +79,11 @@ def fuse(self, normalize_node, input_name_to_nodes, output_name_to_node):
         where_qk = qk_nodes[2]
         mask_nodes = self.model.match_parent_path(
             where_qk,
-            ["Equal", "Unsqueeze", "Cast", "Expand"],
-            [0, 0, 0, 0],
+            ["Equal", "Unsqueeze", "Cast"],
+            [0, 0, 0],
         )
         if mask_nodes is not None:
-            attn_mask = mask_nodes[-2].output[0]
+            attn_mask = mask_nodes[-1].output[0]

         add_qk, matmul_qk = qk_nodes[-2], qk_nodes[-1]
```
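To illustrate what the changed `match_parent_path` call does, here is a minimal self-contained sketch of parent-path matching over a toy node graph. This is not ONNX Runtime's actual implementation (the real helper resolves producers through tensor names via `output_name_to_node`); the `Node` class and graph below are hypothetical stand-ins that mirror the mask pattern this commit matches.

```python
# Hypothetical, simplified model of ONNX-style graph nodes. In real ONNX,
# node inputs/outputs are tensor names; here inputs point directly at
# producer Node objects to keep the sketch short.
class Node:
    def __init__(self, op_type, inputs=(), outputs=("out",)):
        self.op_type = op_type
        self.inputs = list(inputs)    # producer nodes feeding this node
        self.output = list(outputs)   # output tensor names (mirrors NodeProto.output)

def match_parent_path(node, op_types, input_indices):
    """Walk upward from `node`, checking that the producer at each given
    input index has the expected op type. Return the matched parent chain
    (outermost op first), or None if any step fails."""
    path = []
    current = node
    for op_type, idx in zip(op_types, input_indices):
        if idx >= len(current.inputs):
            return None
        parent = current.inputs[idx]
        if parent.op_type != op_type:
            return None
        path.append(parent)
        current = parent
    return path

# Toy graph mirroring the mask pattern: Cast --> Unsqueeze --> Equal --> Where
cast = Node("Cast", outputs=("mask_cast_out",))
unsqueeze = Node("Unsqueeze", [cast])
equal = Node("Equal", [unsqueeze])
where_qk = Node("Where", [equal])

# The shortened three-node pattern from this commit matches...
mask_nodes = match_parent_path(where_qk, ["Equal", "Unsqueeze", "Cast"], [0, 0, 0])
attn_mask = mask_nodes[-1].output[0]  # "mask_cast_out": the Cast node's output

# ...while the old four-node pattern (with a trailing Expand) no longer does.
old_match = match_parent_path(
    where_qk, ["Equal", "Unsqueeze", "Cast", "Expand"], [0, 0, 0, 0]
)
```

This also shows why the second hunk changes `mask_nodes[-2]` to `mask_nodes[-1]`: with the `Expand` dropped from the pattern, the `Cast` node is now the last element of the matched path rather than the second-to-last.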
