[Membar] Fix non-trivial function smem offsets #9327
Merged
lezcano
added a commit
that referenced
this pull request
Jan 29, 2026
Codex rightly identified that we were not considering the offsets of functions in our membar analysis at #9318 (comment). Codex then went on to fix it and added a regression test.
stack-info: PR: #9327, branch: lezcano/stack/11
734ec3e to
f373485
Compare
This was referenced Jan 29, 2026
lezcano
added a commit
that referenced
this pull request
Jan 29, 2026
f373485 to
da20ee5
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da20ee55ec
lezcano
added a commit
that referenced
this pull request
Jan 29, 2026
da20ee5 to
3959840
Compare
lezcano
added a commit
that referenced
this pull request
Jan 29, 2026
3959840 to
a97cf3f
Compare
lezcano
added a commit
that referenced
this pull request
Feb 5, 2026
3e0a373 to
a0f063c
Compare
lezcano
added a commit
that referenced
this pull request
Feb 5, 2026
a0f063c to
0c6dbac
Compare
lezcano
added a commit
that referenced
this pull request
Feb 5, 2026
Stacked PRs:
* #9327
* #9318
* #9317
* #9221
* #9220
* __->__#9219

--- --- ---

### [BACKEND] Improve and simplify ReduceOp's lowering

We implement a LinearLayout-based `ReduceOp` lowering. This has a number of benefits:
- The logic is noticeably simpler, as we barely have to implement anything. ConvertLayout and some LL helpers do all the heavy lifting.
- We get shmem swizzling for free.
- We sometimes save a shmem round-trip (before, we did it unconditionally).
- It is now clear that we have a `tmpLl` variable we can carefully choose (we'll do so in a future PR).
- It opens the door to returning an arbitrary layout (fusing a `convert_layout` into this op).
- It is now really simple to generalise this op to perform cross-cluster reductions, provided that `convert_layout` supports them.
- We fix some latent issues the previous implementation had when run on arbitrary linear layouts. We add a funky regression test that used to fail and now passes.
- All this while being LOC-neutral!

In future PRs we will improve the choice of `tmpLl` to avoid the last `convert_layout` in many cases, and we will pack the inputs in shmem to be able to vectorize the loads/stores for full reductions with multiple inputs.

This PR was the result of quite a long (but rather successful) vibe-coding session together with `gpt-5.2-codex`. I found it particularly useful to be able to emit a ConvertLayout within this lowering rather than having to call the lowering of the function manually. This simplifies the code quite a bit, and I would have struggled to convince MLIR to do so myself.
lezcano
added a commit
that referenced
this pull request
Feb 5, 2026
0c6dbac to
01effe6
Compare
lezcano
added a commit
that referenced
this pull request
Feb 6, 2026
Stacked PRs:
* #9327
* #9318
* #9317
* #9221
* __->__#9220

--- --- ---

### [BACKEND] Perform tree reductions on in-thread values

We generate ternary trees for suitable integer ops and binary trees for everything else. We manually generate `{add,mul}.{f16,f32}x2` ops.

This brings a speed-up to some gluon attention kernels.
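The idea of reducing in-thread values as a tree rather than a linear chain can be sketched in plain Python. This is only an illustration, not the lowering from the PR: `tree_reduce` and its `arity` parameter are hypothetical names, and a "ternary" node here simply models an operation that combines three inputs in one step. The sketch also assumes the combine function is associative, so regrouping does not change the result.

```python
import operator


def tree_reduce(values, combine, arity=2):
    """Reduce `values` level by level, merging up to `arity` neighbours per node.

    Returns the reduced value and the number of tree levels (the length of
    the dependency chain). Hypothetical helper, for illustration only.
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), arity):
            group = level[i:i + arity]
            acc = group[0]
            for v in group[1:]:
                acc = combine(acc, v)
            next_level.append(acc)
        level = next_level
        depth += 1
    return level[0], depth


vals = list(range(1, 10))  # nine in-thread values
binary_sum, binary_depth = tree_reduce(vals, operator.add, arity=2)
ternary_sum, ternary_depth = tree_reduce(vals, operator.add, arity=3)
print(binary_sum, binary_depth)    # 45 4
print(ternary_sum, ternary_depth)  # 45 2
```

A linear chain over nine values would have eight dependent steps. The tree forms shorten that chain, which is what exposes more instruction-level parallelism.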
lezcano
added a commit
that referenced
this pull request
Feb 6, 2026
01effe6 to
ce31a25
Compare
lezcano
added a commit
that referenced
this pull request
Feb 6, 2026
Stacked PRs:
* #9327
* #9318
* #9317
* __->__#9221

--- --- ---

### [BACKEND] Implement support for cross-CTA tt.reduce

The title of this PR is a bit of a lie. Even though the lowering is now implemented to support cross-CTA reductions, it depends on `convert_layout` supporting them, and it doesn't currently support LinearLayouts. We should generalise this one first and then enable it here. We should also emit the correct cross-CTA barrier from `targetInfo` in the case of cross-CTA memory reuse.

In this PR, we also take the chance to generalise the lowering to avoid convert layouts whenever possible.
lezcano
added a commit
that referenced
this pull request
Feb 8, 2026
ce31a25 to
a78c787
Compare
Codex rightly identified that we were not considering the offsets of functions in our membar analysis at #9318 (comment). Codex then went on to fix it and added a regression test.
stack-info: PR: #9327, branch: lezcano/stack/11
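To make the class of bug concrete, here is a minimal Python sketch of how ignoring a function's shared-memory offset can hide a hazard. It is not Triton's actual membar analysis: `Interval`, `needs_barrier`, and `callee_base` are hypothetical names. The sketch assumes buffers are modelled as half-open byte ranges and that a callee's buffers are recorded relative to the start of its own frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open byte range [start, end) in shared memory (illustrative model)."""
    start: int
    end: int

    def shift(self, base: int) -> "Interval":
        # Translate a frame-relative range into an absolute one.
        return Interval(self.start + base, self.end + base)

    def overlaps(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end


def needs_barrier(pending_writes, callee_reads, callee_base=0):
    """Return True if any callee read aliases a pending caller write.

    `callee_reads` are relative to the callee's frame; `callee_base` is the
    assumed offset of that frame inside the caller's allocation.
    """
    absolute_reads = [r.shift(callee_base) for r in callee_reads]
    return any(w.overlaps(r) for w in pending_writes for r in absolute_reads)


caller_write = Interval(128, 256)  # caller wrote bytes [128, 256)
callee_read = Interval(0, 64)      # callee reads [0, 64) of its own frame

# Treating the callee's range as absolute misses the hazard.
missed = needs_barrier([caller_write], [callee_read])
# Applying the frame offset reveals the overlap.
found = needs_barrier([caller_write], [callee_read], callee_base=128)
print(missed, found)  # False True
```

In this toy model, the first call compares `[0, 64)` against `[128, 256)` and reports no conflict, while the second compares `[128, 192)` against the same write and correctly asks for a barrier.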