[NPUW]Transpose V tensor for Softmax - Slice - Matmul. #33807

Merged
intelgaoxiong merged 2 commits into openvinotoolkit:master from intelgaoxiong:xiong/transpose_v_for_gpt_oss on Jan 28, 2026

@intelgaoxiong
Contributor

@intelgaoxiong intelgaoxiong commented Jan 26, 2026

Details:

GPT-OSS SDPA has a sink input, so there is a Concat/Slice pair around Softmax.
The existing V tensor transpose did not work for this pattern.

This PR extends the V tensor transpose to the GPT-OSS pattern, which eliminates the Permutation in the compiler.
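
For readers unfamiliar with the pattern, here is a minimal standalone sketch of why the rewrite is value-preserving. It is not the NPUW transformation itself. It assumes the sink is a single extra logit concatenated to each row of scores and sliced off after Softmax, and every name in it (`softmax_with_sink`, `matmul_transpose_b`, `outputs_match`) is hypothetical. It only checks that multiplying the sliced probabilities by V laid out as [K, D] gives the same result as multiplying by V stored as [D, K] with a transposed second MatMul input.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Concat -> Softmax -> Slice, as assumed in this sketch: append one sink logit
// per row, normalize over the extended row, then drop the sink column again.
Matrix softmax_with_sink(const Matrix& scores, const std::vector<float>& sink) {
    Matrix probs;
    probs.reserve(scores.size());
    for (std::size_t q = 0; q < scores.size(); ++q) {
        std::vector<float> row = scores[q];
        row.push_back(sink[q]);  // Concat
        const float max_val = *std::max_element(row.begin(), row.end());
        float sum = 0.0f;
        for (float& v : row) { v = std::exp(v - max_val); sum += v; }
        for (float& v : row) v /= sum;  // Softmax over [scores | sink]
        row.pop_back();  // Slice: keep only the columns of real keys
        probs.push_back(row);
    }
    return probs;
}

// Reference MatMul with V laid out as [K, D].
Matrix matmul(const Matrix& p, const Matrix& v) {
    const std::size_t K = v.size(), D = v[0].size();
    Matrix out(p.size(), std::vector<float>(D, 0.0f));
    for (std::size_t q = 0; q < p.size(); ++q)
        for (std::size_t d = 0; d < D; ++d)
            for (std::size_t k = 0; k < K; ++k)
                out[q][d] += p[q][k] * v[k][d];
    return out;
}

// MatMul whose second input is stored transposed, i.e. V^T laid out as [D, K].
Matrix matmul_transpose_b(const Matrix& p, const Matrix& v_t) {
    const std::size_t D = v_t.size(), K = v_t[0].size();
    Matrix out(p.size(), std::vector<float>(D, 0.0f));
    for (std::size_t q = 0; q < p.size(); ++q)
        for (std::size_t d = 0; d < D; ++d)
            for (std::size_t k = 0; k < K; ++k)
                out[q][d] += p[q][k] * v_t[d][k];
    return out;
}

Matrix transpose(const Matrix& m) {
    Matrix t(m[0].size(), std::vector<float>(m.size(), 0.0f));
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[0].size(); ++j)
            t[j][i] = m[i][j];
    return t;
}

// Compares both layouts on a small made-up example (2 queries, 3 keys, D = 2).
bool outputs_match() {
    const Matrix scores = {{0.5f, -1.0f, 2.0f}, {1.0f, 0.0f, -0.5f}};
    const std::vector<float> sink = {0.3f, -0.2f};
    const Matrix v = {{1.0f, 2.0f}, {3.0f, 4.0f}, {5.0f, 6.0f}};
    const Matrix probs = softmax_with_sink(scores, sink);
    const Matrix ref = matmul(probs, v);
    const Matrix alt = matmul_transpose_b(probs, transpose(v));
    for (std::size_t q = 0; q < ref.size(); ++q)
        for (std::size_t d = 0; d < ref[q].size(); ++d)
            if (std::fabs(ref[q][d] - alt[q][d]) > 1e-6f) return false;
    return true;
}
```

Because the sink column is removed before the MatMul, the rows of the sliced probabilities no longer sum to 1, but the layout of V still only affects how the MatMul indexes its second input, not the values it produces.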

Tickets:

@github-actions github-actions bot added the "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) labels Jan 26, 2026
@intelgaoxiong intelgaoxiong marked this pull request as ready for review January 26, 2026 02:38
@intelgaoxiong intelgaoxiong requested review from a team as code owners January 26, 2026 02:38
@intelgaoxiong
Contributor Author

@dmatveev @esmirno Could you please take a look?
Thanks!


// llama2 pattern for value tensor concate
class TransposeValueTensors_llama2 : public TransposeValueTensors {
// MHA (Multi-Head Attention) pattern for value tensor concatenation
Collaborator

Can you implement test(s) for transformation?

Contributor Author

@rkazants Done
Thanks!

@intelgaoxiong intelgaoxiong force-pushed the xiong/transpose_v_for_gpt_oss branch from 55c4fd5 to 68646a4 Compare January 26, 2026 12:20
Contributor

@esmirno esmirno left a comment

Thanks, the unit tests are maintained in the expected way.

@intelgaoxiong intelgaoxiong force-pushed the xiong/transpose_v_for_gpt_oss branch from 68646a4 to 3420d33 Compare January 27, 2026 09:56
@intelgaoxiong
Contributor Author

@dmatveev Could you please take a look?
Thanks!

@intelgaoxiong
Contributor Author

intelgaoxiong commented Jan 27, 2026

Signed-off-by: intelgaoxiong <xiong.gao@intel.com>
Signed-off-by: intelgaoxiong <xiong.gao@intel.com>
@dmatveev
Contributor

@intelgaoxiong in your report, the branch is shown as xiong/gpt-oss_device_routed_rebase, was that the full MoE build including this particular change?

Contributor

@dmatveev dmatveev left a comment

Changes LGTM

@intelgaoxiong
Contributor Author

intelgaoxiong commented Jan 27, 2026

> @intelgaoxiong in your report, the branch is shown as xiong/gpt-oss_device_routed_rebase, was that the full MoE build including this particular change?

The package name is shown below; its commit ID exactly matches the commit ID in this PR.
[image]

The validated package included only the changes in this PR.

@dmatveev dmatveev added this to the 2026.1 milestone Jan 27, 2026
@dmatveev dmatveev added this pull request to the merge queue Jan 27, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2026
@intelgaoxiong
Contributor Author

intelgaoxiong commented Jan 28, 2026

[image]
The PR was removed from the merge queue due to the CI failure shown above.

Adding it to the merge queue again to see whether the failure disappears.

@intelgaoxiong intelgaoxiong added this pull request to the merge queue Jan 28, 2026
Merged via the queue into openvinotoolkit:master with commit a37bcb7 Jan 28, 2026
187 checks passed
@intelgaoxiong intelgaoxiong deleted the xiong/transpose_v_for_gpt_oss branch January 28, 2026 04:48
Naseer-010 pushed a commit to Naseer-010/openvino that referenced this pull request Feb 18, 2026
…it#33807)

### Details:
GPT-OSS SDPA has a sink input, so there is a Concat/Slice pair around Softmax.
The existing V tensor transpose did not work for this pattern.

This PR extends the V tensor transpose to the GPT-OSS pattern, which eliminates the
Permutation in the compiler.

### Tickets:
 - *[EISW-200448](https://jira.devtools.intel.com/browse/EISW-200448)*

---------

Signed-off-by: intelgaoxiong <xiong.gao@intel.com>