[Membar] Fix non-trivial function smem offsets #9327

Merged
lezcano merged 1 commit into main from lezcano/stack/11 on Feb 9, 2026
@lezcano lezcano commented Jan 29, 2026

Codex rightly pointed out in #9318 (comment) that our membar analysis
was not accounting for the shared-memory offsets of functions.

Codex then fixed the issue and added a regression test.
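
To make the failure mode concrete, here is a minimal sketch of why a function's shared-memory offset matters when checking for hazards. It is not Triton's Membar code: `Interval`, `needs_barrier`, and `callee_base` are hypothetical names, and the model assumes that buffers inside a callee are recorded relative to the callee's own allocation and must be shifted by a base offset before being compared with the caller's pending accesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open byte range [start, end) in shared memory (hypothetical model)."""
    start: int
    end: int

    def shift(self, base: int) -> "Interval":
        # Translate a function-local range into the caller's address space.
        return Interval(self.start + base, self.end + base)

    def intersects(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end


def needs_barrier(pending_write: Interval, callee_local: Interval,
                  callee_base: int) -> bool:
    """True if a read of the callee's buffer may alias the caller's pending write.

    callee_base is the assumed offset of the callee's allocation; passing 0
    models an analysis that ignores the function offset.
    """
    return pending_write.intersects(callee_local.shift(callee_base))


# The caller has a pending write to bytes [128, 192).
pending = Interval(128, 192)
# The callee reads its first 64 bytes, and its allocation starts at byte 128.
local = Interval(0, 64)

ignoring_offset = needs_barrier(pending, local, callee_base=0)
with_offset = needs_barrier(pending, local, callee_base=128)
```

In this toy model, ignoring the offset reports no overlap and so misses the hazard, while applying the offset reports the overlap and would call for a barrier.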

lezcano added a commit that referenced this pull request Jan 29, 2026
Codex rightly identified that we were not considering the offsets of
functions in our membar analysis at #9318 (comment)

Codex then went on and fixed it and added a regression test.

stack-info: PR: #9327, branch: lezcano/stack/11
@lezcano lezcano changed the title from "[Membar] Fix non-trivial function smem offsets" to "[Membar] Correctly offset function smem" Jan 29, 2026
@lezcano lezcano marked this pull request as draft January 29, 2026 18:56
@lezcano lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 18:56
lezcano added a commit that referenced this pull request Jan 29, 2026
@lezcano lezcano changed the title from "[Membar] Correctly offset function smem" to "[Membar] Fix non-trivial function smem offsets" Jan 29, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 18:56
@lezcano lezcano marked this pull request as ready for review January 29, 2026 18:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da20ee55ec


Comment thread include/triton/Analysis/Membar.h
@lezcano lezcano marked this pull request as draft January 29, 2026 20:53
@lezcano lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 20:53
lezcano added a commit that referenced this pull request Jan 29, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 20:54
@lezcano lezcano marked this pull request as ready for review January 29, 2026 20:54
@lezcano lezcano marked this pull request as draft January 29, 2026 22:52
@lezcano lezcano changed the base branch from lezcano/stack/10 to main January 29, 2026 22:52
lezcano added a commit that referenced this pull request Jan 29, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 January 29, 2026 22:52
@lezcano lezcano marked this pull request as ready for review January 29, 2026 22:52
lezcano added a commit that referenced this pull request Feb 5, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 17:27
@lezcano lezcano marked this pull request as ready for review February 5, 2026 17:27
@lezcano lezcano marked this pull request as draft February 5, 2026 21:03
@lezcano lezcano changed the base branch from lezcano/stack/10 to main February 5, 2026 21:03
lezcano added a commit that referenced this pull request Feb 5, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 21:03
@lezcano lezcano marked this pull request as ready for review February 5, 2026 21:03
lezcano added a commit that referenced this pull request Feb 5, 2026
Stacked PRs:
 * #9327
 * #9318
 * #9317
 * #9221
 * #9220
 * __->__#9219


--- --- ---

### [BACKEND] Improve and simplify ReduceOp's lowering


We implement a LinearLayout-based `ReduceOp` lowering. This has a
number of benefits:

- The logic is noticeably simpler as we barely have to implement
anything. ConvertLayout and some LL helpers do all the heavy lifting
- We get shmem swizzling for free
- We sometimes save a shmem round-trip (previously we did it
unconditionally)
- It is now clear that we have a `tmpLl` variable we can carefully
choose (we'll do so in a future PR)
- It opens the door to returning an arbitrary layout (fusing a
`convert_layout` into this op)
- It is now really simple to generalise this op to perform cross-cluster
reductions, provided that `convert_layout` supports them.
- We fix some latent issues the previous implementation had when run on
arbitrary linear layouts. We add a funky regression test that used to
fail and now passes.
- All this while being LOC-neutral!

In future PRs we will improve the choice of `tmpLl` to avoid, in many
cases, the last `convert_layout`, and we will pack the inputs in shmem to
be able to vectorize the loads/stores for full reductions with multiple
inputs.

This PR was the result of quite a long (but rather successful)
vibe-coding session together with `gpt-5.2-codex`. I found it particularly
useful to be able to emit a ConvertLayout within this lowering rather
than having to call the lowering of the function manually. This
simplifies the code quite a bit and I would have struggled to convince
MLIR to do so myself.
@lezcano lezcano marked this pull request as draft February 5, 2026 21:52
@lezcano lezcano changed the base branch from lezcano/stack/10 to main February 5, 2026 21:52
lezcano added a commit that referenced this pull request Feb 5, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 February 5, 2026 21:52
@lezcano lezcano marked this pull request as ready for review February 5, 2026 21:52
lezcano added a commit that referenced this pull request Feb 6, 2026
Stacked PRs:
 * #9327
 * #9318
 * #9317
 * #9221
 * __->__#9220


--- --- ---

### [BACKEND] Perform tree reductions on in-thread values


We generate ternary trees for suitable integer ops and binary trees for
everything else.

We manually generate `{add,mul}.{f16,f32}x2` ops. This brings a speed-up
to some gluon attention kernels.
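
As a rough illustration of the tree shape described above, the following sketch reduces a list of per-thread values level by level with a configurable arity. It is not the lowering from the PR: `tree_reduce` is a hypothetical helper, and treating a ternary node as two applications of a binary `combine` is an assumption standing in for whatever three-input operation the backend may emit.

```python
import operator
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: Sequence[T], combine: Callable[[T, T], T],
                arity: int = 2) -> Tuple[T, int]:
    """Reduce values as a tree; return (result, number of tree levels).

    arity=2 builds a binary tree, arity=3 a ternary tree. Each node folds
    up to `arity` neighbouring values; a trailing short group is folded as is.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    if arity < 2:
        raise ValueError("arity must be at least 2")
    level: List[T] = list(values)
    depth = 0
    while len(level) > 1:
        next_level: List[T] = []
        for i in range(0, len(level), arity):
            group = level[i:i + arity]
            acc = group[0]
            for v in group[1:]:
                acc = combine(acc, v)
            next_level.append(acc)
        level = next_level
        depth += 1
    return level[0], depth


binary_sum, binary_depth = tree_reduce(list(range(1, 9)), operator.add, arity=2)
ternary_sum, ternary_depth = tree_reduce(list(range(1, 10)), operator.add, arity=3)
```

The point of the tree shape is a shorter dependency chain than a left-to-right fold: eight values need three binary levels, and nine values need two ternary levels, instead of seven or eight sequential steps.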
@lezcano lezcano marked this pull request as draft February 6, 2026 00:01
@lezcano lezcano changed the base branch from lezcano/stack/10 to main February 6, 2026 00:01
lezcano added a commit that referenced this pull request Feb 6, 2026
@lezcano lezcano changed the base branch from main to lezcano/stack/10 February 6, 2026 00:02
@lezcano lezcano marked this pull request as ready for review February 6, 2026 00:02
lezcano added a commit that referenced this pull request Feb 6, 2026
Stacked PRs:
 * #9327
 * #9318
 * #9317
 * __->__#9221


--- --- ---

### [BACKEND] Implement support for cross-CTA tt.reduce


The title of this PR is a bit of a lie. Even though the lowering is now
implemented to support cross-CTA reductions, it depends on
`convert_layout` supporting them, and it doesn't currently support
LinearLayouts. We should generalise this one first and then enable it
here. We should also emit the correct cross-CTA barrier from
`targetInfo` in the case of cross-CTA memory reuse.

In this PR, we take the chance to also generalise the lowering to avoid
convert layouts whenever possible.
@lezcano lezcano marked this pull request as draft February 8, 2026 09:28
@lezcano lezcano changed the base branch from lezcano/stack/10 to main February 8, 2026 09:28
lezcano added a commit that referenced this pull request Feb 8, 2026