
[Reland][Blackwell] Decide TMEMCopy compatibility for block scales early #10571

Open
masahi wants to merge 9 commits into triton-lang:main from masahi:scale-tmem-copy-rewrite-revisit-ws-fix

@masahi masahi commented Jun 11, 2026

Collaborator

#10530 caused a perf regression for WS kernels. The WS passes were doing an exact SMEM-encoding equality check, which broke once the tensordesc and local_alloc encodings became nominally different (#nvmma_shared vs #shared_linear) even though they describe equivalent layouts. We ended up placing the local_alloc that follows desc_load into the default partition, which generates unnecessary local_load / local_store ops and synchronizations. Commit c366e57 fixes this by checking LinearLayout equivalence instead.
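
To illustrate the nominal-vs-equivalent distinction, here is a minimal toy sketch in Python. It is not Triton's implementation or API: `ToySwizzled`, `ToyLinear`, `to_linear`, and `equivalent` are hypothetical names made up for this example, and the XOR swizzle is just a stand-in for a shared-memory encoding. The point is only that two differently typed descriptors can fail an exact equality check while mapping every index to the same offset.

```python
from dataclasses import dataclass
from itertools import product
from typing import Tuple


@dataclass(frozen=True)
class ToySwizzled:
    """Hypothetical stand-in for a parametric encoding (XOR swizzle)."""
    rows: int
    cols: int  # assumed to be a power of two, with rows <= cols

    def offset(self, r: int, c: int) -> int:
        # Row-major offset with the column XOR-swizzled by the row index.
        return r * self.cols + (c ^ (r % self.cols))


@dataclass(frozen=True)
class ToyLinear:
    """Hypothetical stand-in for a layout given by explicit GF(2) bases."""
    row_bases: Tuple[int, ...]
    col_bases: Tuple[int, ...]

    @property
    def rows(self) -> int:
        return 1 << len(self.row_bases)

    @property
    def cols(self) -> int:
        return 1 << len(self.col_bases)

    def offset(self, r: int, c: int) -> int:
        # XOR together the basis vector of every set bit in (r, c).
        out = 0
        for i, basis in enumerate(self.row_bases):
            if (r >> i) & 1:
                out ^= basis
        for i, basis in enumerate(self.col_bases):
            if (c >> i) & 1:
                out ^= basis
        return out


def to_linear(s: ToySwizzled) -> ToyLinear:
    """Read off the bases by probing each power-of-two row and column."""
    row_bits = s.rows.bit_length() - 1
    col_bits = s.cols.bit_length() - 1
    return ToyLinear(
        row_bases=tuple(s.offset(1 << i, 0) for i in range(row_bits)),
        col_bases=tuple(s.offset(0, 1 << j) for j in range(col_bits)),
    )


def equivalent(a, b) -> bool:
    """Compare the index-to-offset maps instead of the descriptor types."""
    if (a.rows, a.cols) != (b.rows, b.cols):
        return False
    return all(
        a.offset(r, c) == b.offset(r, c)
        for r, c in product(range(a.rows), range(a.cols))
    )


swizzled = ToySwizzled(rows=4, cols=4)
linear = to_linear(swizzled)
unswizzled = ToyLinear(row_bases=(4, 8), col_bases=(1, 2))

exact_match = swizzled == linear                 # different types, so not equal
same_layout = equivalent(swizzled, linear)       # same offsets everywhere
different_layout = equivalent(swizzled, unswizzled)
```

In this toy, `exact_match` is false while `same_layout` is true, which mirrors the situation described above: a check keyed on the encoding's identity rejects a pair that an equivalence check on the underlying map accepts.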

Confirmed no perf regression on a WS version of python/tutorials/10-block-scaled-matmul.py and python/triton_kernels/bench/bench_mlp.py.

@masahi masahi marked this pull request as ready for review June 11, 2026 02:07
@masahi masahi requested review from lezcano and ptillet as code owners June 11, 2026 02:07
@masahi masahi requested review from ThomasRaoux and removed request for ptillet June 11, 2026 02:08