[Consan] Support CLC #10052

Open
lezcano wants to merge 5 commits into main from clc_consan2


@lezcano lezcano commented Apr 16, 2026

CLC gets its own partition, running over threads 48-63.

We model CLC the same way we model TMA writes, via `Barrier::EffectWrites`.
The idea of this mode is that we link all the writes of the op to the
barrier. We also annotate in the table `barrierWriteRecipients` which
CTAs' writes become visible once we wait on the associated barrier.

We note something interesting and document it.
`BarrierTrackingMode::Frontier` should be used when we have a
commit/arrive/expect op that affects everything in flight before it.
By contrast, we use `BarrierTrackingMode::EffectWrites` when the PTX op
itself accepts a barrier, so the barrier only signals the completion of
that op's own writes.
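
To make the distinction concrete, here is a minimal toy model in Python. It is not ConSan's implementation: `ToyTracker`, `issue_write`, `signal`, and `wait` are hypothetical names invented for illustration, and only the two mode names come from the description above. It assumes that a Frontier signal covers every write currently in flight, while an EffectWrites signal covers only the writes of the op that carries the barrier.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class BarrierTrackingMode(Enum):
    FRONTIER = auto()       # commit/arrive/expect: covers everything in flight before it
    EFFECT_WRITES = auto()  # op takes a barrier: covers only that op's own writes


@dataclass
class ToyTracker:
    """Hypothetical tracker; names and structure are illustrative only."""
    in_flight: set = field(default_factory=set)  # writes issued, not yet tied to a barrier
    tracked: dict = field(default_factory=dict)  # barrier -> writes it will publish

    def issue_write(self, buf):
        self.in_flight.add(buf)

    def signal(self, barrier, mode, op_writes=()):
        covered = self.tracked.setdefault(barrier, set())
        if mode is BarrierTrackingMode.FRONTIER:
            # The barrier picks up every write that was in flight before it.
            covered |= self.in_flight
            self.in_flight.clear()
        else:
            # The barrier is linked only to the writes of the op that carries it.
            covered |= set(op_writes)

    def wait(self, barrier):
        # Waiting makes the writes linked to this barrier visible.
        return self.tracked.pop(barrier, set())
```

In this sketch, a CLC-like op would call `signal(bar, BarrierTrackingMode.EFFECT_WRITES, op_writes=[result])`, leaving unrelated in-flight writes untouched.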

The other point we add is a flag `bool diagonalEffectRecipientCTAs`.
This distinguishes the TMA behaviour, where after waiting on the barrier
you see the writes from all the CTAs in the multicast group, from the
diagonal version, as in CLC, where waiting on CTAi just makes the thread
see the CTAi memory.
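
As a rough illustration of what the flag changes, the sketch below computes which CTAs' writes become visible after a wait. The function `visible_ctas` and its parameters are hypothetical and only mirror the prose above; the real table `barrierWriteRecipients` is not modelled here.

```python
def visible_ctas(waiting_cta, group, diagonal):
    """Return the CTAs whose writes become visible after `waiting_cta` waits.

    Hypothetical helper: `group` is the multicast group, and `diagonal`
    plays the role of diagonalEffectRecipientCTAs.
    """
    if waiting_cta not in group:
        raise ValueError("waiting CTA must belong to the group")
    if diagonal:
        # CLC-like: waiting on CTAi only exposes CTAi's memory.
        return {waiting_cta}
    # TMA-like: waiting exposes the writes of every CTA in the group.
    return set(group)
```

For a group of four CTAs, `visible_ctas(1, [0, 1, 2, 3], diagonal=False)` returns all four CTAs, while the same call with `diagonal=True` returns only CTA 1.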

lezcano added 3 commits April 16, 2026 14:40
It currently doesn't make sense to have CLC in a multi-CTA context
without multicast. As such, we hide this flag and infer it
automatically.

The reason it doesn't make sense is that in multi-CTA just one
CTA is allowed to request a cancel. In CUDA you can imagine patterns
like doing CLC without multicast and then sharing the result from one CTA
to all the others manually. We don't allow that in Gluon.
Smelly bits:
We execute CLC in the TMA partition to avoid having to create a new
partition for CLC. I think we should create a different partition for
CLC but I wanted to have @pawelszczerbuk's approval before doing it.

@lezcano lezcano requested a review from pawelszczerbuk April 16, 2026 13:20
Base automatically changed from clc_consan to main April 16, 2026 14:57
Comment thread include/triton/Dialect/TritonInstrument/IR/TritonInstrument.md Outdated

@pawelszczerbuk pawelszczerbuk left a comment


Looks great! Small nit in the comments

@lezcano lezcano enabled auto-merge (squash) April 16, 2026 21:28
2 participants