[Membar] Fix non-trivial function smem offsets by lezcano · Pull Request #9327 · triton-lang/triton

lezcano · 2026-01-29T12:53:45Z

Codex rightly identified, at #9318 (comment), that our membar analysis was not taking the shared-memory offsets of functions into account.

Codex then went on to fix it and add a regression test.
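To illustrate why the offset matters, here is a minimal toy model. It is not the code in `Membar.h`; the `Interval` class, `needs_barrier`, and the byte ranges are all hypothetical. It assumes a callee summarizes its shared-memory accesses in callee-local coordinates and that the allocator places the callee's frame at some offset inside the caller's shared memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open byte range [start, end) in shared memory."""
    start: int
    end: int

    def intersects(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end

    def shifted(self, offset: int) -> "Interval":
        return Interval(self.start + offset, self.end + offset)


def needs_barrier(pending, incoming):
    """True if any not-yet-synchronized access overlaps an incoming one."""
    return any(p.intersects(q) for p in pending for q in incoming)


# Hypothetical scenario: the caller read buffer A at [128, 256) and has not
# synchronized since. A is dead at the call, so the allocator reuses its
# bytes for the callee's frame, placed at offset 128.
caller_pending_reads = [Interval(128, 256)]
callee_frame_offset = 128

# The callee's summary is expressed in callee-local coordinates.
callee_local_writes = [Interval(0, 64)]

# Without translation the ranges look disjoint and the hazard is missed.
missed = needs_barrier(caller_pending_reads, callee_local_writes)

# Shifting by the frame offset exposes the write-after-read overlap.
callee_writes_in_caller = [
    w.shifted(callee_frame_offset) for w in callee_local_writes
]
found = needs_barrier(caller_pending_reads, callee_writes_in_caller)

print(missed, found)
```

In this toy model the untranslated check reports no conflict, while the translated one does, which is the kind of case a regression test for this fix would need to cover.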

stack-info: PR: #9327, branch: lezcano/stack/11

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da20ee55ec



Stacked PRs:
* #9327
* #9318
* #9317
* #9221
* #9220
* __->__#9219

--- --- ---

### [BACKEND] Improve and simplify ReduceOp's lowering

We implement a LinearLayout-based `ReduceOp` lowering. This has a number of benefits:

- The logic is noticeably simpler, as we barely have to implement anything. ConvertLayout and some LL helpers do all the heavy lifting.
- We get shmem swizzling for free.
- We sometimes save a shmem round-trip (before, we did it unconditionally).
- It is now clear that we have a `tmpLl` variable we can carefully choose (we'll do so in a future PR).
- It opens the door to returning an arbitrary layout (fusing a `convert_layout` into this op).
- It is now really simple to generalise this op to perform cross-cluster reductions, provided that `convert_layout` supports them.
- We fix some latent issues the previous implementation had when run on arbitrary linear layouts. We add a funky regression test that used to fail and now passes.
- All this while being LOC-neutral!

In future PRs we will improve the choice of `tmpLl` to avoid the last `convert_layout` in many cases, and we will pack the inputs in shmem to be able to vectorize the loads/stores for full reductions with multiple inputs.

This PR was the result of quite a long (but rather successful) vibe-coding session together with `gpt-5.2-codex`. I found it particularly useful to be able to emit a ConvertLayout within this lowering rather than having to call the lowering of the function manually. This simplifies the code quite a bit, and I would have struggled to convince MLIR to do so myself.


Stacked PRs:
* #9327
* #9318
* #9317
* #9221
* __->__#9220

--- --- ---

### [BACKEND] Perform tree reductions on in-thread values

We generate ternary trees for suitable integer ops and binary trees for everything else. We manually generate `{add,mul}.{f16,f32}x2` ops.

This brings a speed-up to some gluon attention kernels.
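As a rough illustration of the tree-reduction idea, the sketch below reduces a list level by level with a configurable arity and reports the tree depth. It is a toy model, not the lowering itself: `tree_reduce` is a hypothetical helper, and the ternary case assumes the target can combine three operands in a single instruction, which the sketch models with two chained binary calls counted as one level.

```python
from operator import add


def tree_reduce(values, combine, arity=2):
    """Reduce `values` by combining groups of `arity` neighbours per level.

    Returns (result, depth), where depth is the number of tree levels,
    i.e. the length of the longest dependency chain.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    level = list(values)
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), arity):
            group = level[i:i + arity]
            acc = group[0]
            # Models one `arity`-input operation (an assumption for arity 3).
            for v in group[1:]:
                acc = combine(acc, v)
            next_level.append(acc)
        level = next_level
        depth += 1
    return level[0], depth


values = list(range(8))
binary = tree_reduce(values, add, arity=2)   # 8 -> 4 -> 2 -> 1
ternary = tree_reduce(values, add, arity=3)  # 8 -> 3 -> 1
print(binary, ternary)
```

A plain left-to-right chain over eight values has a dependency depth of seven, so the shorter trees expose more instruction-level parallelism. Reassociating floating-point operations this way can change rounding, which a real lowering would have to account for.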


Stacked PRs:
* #9327
* #9318
* #9317
* __->__#9221

--- --- ---

### [BACKEND] Implement support for cross-CTA tt.reduce

The title of this PR is a bit of a lie. Even though the lowering is now implemented to support cross-CTA reductions, it depends on `convert_layout` supporting them, and it doesn't currently support LinearLayouts. We should generalise this one first and then enable it here. We should also emit the correct cross-CTA barrier from `targetInfo` in the case of cross-CTA memory reuse.

In this PR, we take the chance to also generalise the lowering to avoid convert layouts whenever possible.


lezcano requested review from Jokeren and ptillet as code owners January 29, 2026 12:53

lezcano force-pushed the lezcano/stack/11 branch from 734ec3e to f373485 Compare January 29, 2026 12:53

lezcano changed the title ~~[Membar] Fix non-trivial function smem offsets~~ [Membar] Correctly offset function smem Jan 29, 2026

lezcano marked this pull request as draft January 29, 2026 18:56

lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 18:56

lezcano force-pushed the lezcano/stack/11 branch from f373485 to da20ee5 Compare January 29, 2026 18:56

lezcano changed the title ~~[Membar] Correctly offset function smem~~ [Membar] Fix non-trivial function smem offsets Jan 29, 2026

lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 18:56

lezcano marked this pull request as ready for review January 29, 2026 18:56

chatgpt-codex-connector Bot reviewed Jan 29, 2026

Comment thread include/triton/Analysis/Membar.h

lezcano marked this pull request as draft January 29, 2026 20:53

lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 20:53

lezcano force-pushed the lezcano/stack/11 branch from da20ee5 to 3959840 Compare January 29, 2026 20:54

lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 20:54

lezcano marked this pull request as ready for review January 29, 2026 20:54

lezcano marked this pull request as draft January 29, 2026 22:52

lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 22:52

lezcano force-pushed the lezcano/stack/11 branch from 3959840 to a97cf3f Compare January 29, 2026 22:52

lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 22:52

lezcano marked this pull request as ready for review January 29, 2026 22:52

lezcano force-pushed the lezcano/stack/11 branch from 3e0a373 to a0f063c Compare February 5, 2026 17:27

lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 17:27

lezcano marked this pull request as ready for review February 5, 2026 17:27

lezcano marked this pull request as draft February 5, 2026 21:03

lezcano changed the base branch from lezcano/stack/10 to main February 5, 2026 21:03

lezcano force-pushed the lezcano/stack/11 branch from a0f063c to 0c6dbac Compare February 5, 2026 21:03

lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 21:03

lezcano marked this pull request as ready for review February 5, 2026 21:03

lezcano marked this pull request as draft February 5, 2026 21:52

lezcano changed the base branch from lezcano/stack/10 to main February 5, 2026 21:52

lezcano force-pushed the lezcano/stack/11 branch from 0c6dbac to 01effe6 Compare February 5, 2026 21:52

lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 21:52

lezcano marked this pull request as ready for review February 5, 2026 21:52

lezcano marked this pull request as draft February 6, 2026 00:01

lezcano changed the base branch from lezcano/stack/10 to main February 6, 2026 00:01

lezcano force-pushed the lezcano/stack/11 branch from 01effe6 to ce31a25 Compare February 6, 2026 00:01

lezcano changed the base branch from main to lezcano/stack/10 February 6, 2026 00:02

lezcano marked this pull request as ready for review February 6, 2026 00:02

lezcano marked this pull request as draft February 8, 2026 09:28

lezcano changed the base branch from lezcano/stack/10 to main February 8, 2026 09:28

lezcano force-pushed the lezcano/stack/11 branch from ce31a25 to a78c787 Compare February 8, 2026 09:28

[Membar] Fix non-trivial function smem offsets

df2616d


lezcano merged 1 commit into main from lezcano/stack/11


2 participants
