[gpt-oss] attention decode optimizations #37190

Merged
sraizada-tt merged 3 commits into main from gpt-attn-optimizations on Feb 6, 2026

Conversation

sraizada-tt (Contributor) commented on Feb 5, 2026

Added padding to o_proj weights and bias to ensure tile-aligned dimensions in CCL operations, avoiding expensive untilize-pad-tilize cycles

https://github.com/tenstorrent/tt-metal/actions/runs/21719380426
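At decode time the expensive part is reshaping activations on device every step; padding the weights once at load time moves that cost to host. A minimal host-side sketch of the idea, with illustrative shapes and a hypothetical pad_to_tile helper (not the PR's actual code):

```python
import torch

TILE = 32  # tt-metal tensors are tiled in 32x32 blocks

def pad_to_tile(dim: int, tile: int = TILE) -> int:
    # Round a dimension up to the next tile multiple.
    return -(-dim // tile) * tile

# Illustrative only: an o_proj whose output columns per CCL shard are not
# tile-aligned, e.g. hidden size 2880 split 8 ways -> 360 columns (11.25 tiles).
w = torch.randn(2880, 360)
b = torch.randn(360)

pad = pad_to_tile(w.shape[-1]) - w.shape[-1]  # 384 - 360 = 24 zero columns
# Zero columns add nothing to the matmul output or to the allreduce sum, so
# padding the weights once on host replaces a per-step untilize-pad-tilize.
w_padded = torch.nn.functional.pad(w, (0, pad))  # (2880, 384)
b_padded = torch.nn.functional.pad(b, (0, pad))  # (384,)
```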

sraizada-tt requested a review from uaydonat as a code owner February 5, 2026 14:31
Copilot AI review requested due to automatic review settings February 5, 2026 14:31
sraizada-tt requested review from a team, handrewsTT and mtairum as code owners February 5, 2026 14:31
Copilot AI left a comment

Pull request overview

This PR introduces optimizations for GPT attention operations, focusing on reducing overhead in tensor parallelism (TP) scenarios through padding for tile alignment and a fused QK RoPE kernel for decode mode.

Changes:

  • Added padding to o_proj weights and bias to ensure tile-aligned dimensions in CCL operations, avoiding expensive untilize-pad-tilize cycles
  • Implemented fused QK RoPE operation and fused KV cache update for decode mode when batch size ≤ 32
  • Added a slice operation after allreduce to remove the padding and restore the original hidden dimensions (see the sketch below)
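Conceptually, the unpadding after the allreduce is just dropping the zero columns again. A minimal sketch in plain torch with illustrative shapes (the PR performs this step on device with a slice op):

```python
import torch

hidden, padded_hidden = 2880, 3072  # illustrative: 2880 padded so each of
                                    # 8 CCL shards is 384 columns = 12 tiles
x = torch.randn(1, 1, 32, padded_hidden)  # decode activation after allreduce

# The trailing columns were zeros on every device, so their allreduce sum is
# still zero; slicing them off restores the original hidden dimension exactly.
x = x[..., :hidden]
assert x.shape[-1] == hidden
```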

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File — Description

  • models/demos/gpt_oss/tt/attention/weights.py — Adds padding logic to o_proj weight and bias for tile alignment in TP operations; updates cache keys to reflect the padding
  • models/demos/gpt_oss/tt/attention/operations.py — Adds a slice operation after allreduce to remove the padding added to support tile-aligned CCL
  • models/demos/gpt_oss/tt/attention/decode.py — Implements the fused QK RoPE optimization for batch_size ≤ 32; adds reshape logic for padded dimensions before allreduce
  • models/demos/gpt_oss/tt/attention/__init__.py — Pre-creates the fused transformation matrix for fused QK RoPE to avoid host writes during trace (see the sketch below)
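On the __init__.py change: tt-metal's rotary-embedding kernels take a small constant "transformation matrix" that implements the rotate-half pairing as a matmul, so building it once at init keeps host writes out of the captured trace. A sketch of such a matrix, reconstructed from the general pattern in tt-metal's transformer models rather than copied from this PR:

```python
import torch

def rot_transformation_mat(dhead: int = 32) -> torch.Tensor:
    # Constant matrix M such that x @ M pairs adjacent elements the way
    # RoPE's rotate-half does: (x0, x1) -> (-x1, x0).
    m = torch.zeros(1, 1, dhead, dhead)
    m[..., torch.arange(0, dhead, 2), torch.arange(1, dhead, 2)] = 1.0
    m[..., torch.arange(1, dhead, 2), torch.arange(0, dhead, 2)] = -1.0
    return m

# RoPE then reduces to matmul-friendly elementwise ops:
#   out = x * cos + (x @ M) * sin
x = torch.randn(1, 1, 32, 32)
cos, sin = torch.randn(1, 1, 32, 32), torch.randn(1, 1, 32, 32)
out = x * cos + (x @ rot_transformation_mat()) * sin
```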

sraizada-tt changed the title from "Gpt attn optimizations" to "[gpt-oss] attention decode optimizations" on Feb 5, 2026
sraizada-tt added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit e1c0a22 Feb 6, 2026
96 of 104 checks passed
sraizada-tt deleted the gpt-attn-optimizations branch February 6, 2026 09:50
handrewsTT added a commit that referenced this pull request Feb 7, 2026
handrewsTT added a commit that referenced this pull request Feb 9, 2026
adrian-pascual-bernal pushed a commit that referenced this pull request Feb 10, 2026
ssundaramTT pushed a commit that referenced this pull request Feb 10, 2026
handrewsTT added a commit that referenced this pull request Feb 16, 2026
handrewsTT added a commit that referenced this pull request Feb 16, 2026