[ttl] Make TRID DMA wait lowering selectable (default: global barriers) by shutovilyaep · Pull Request #267 · tenstorrent/tt-lang

shutovilyaep · 2026-01-23T14:17:01Z

What?

Adds a pass option to choose how ttl.copy / ttl.wait are lowered to TTKernel DMA ops:

Default (unchanged): emit global ttkernel.noc_async_{read,write}_barrier() and no TRID setup.
Opt-in (use-trid-barriers=1): emit TRID-aware barriers ttkernel.noc_async_{read,write}_barrier_with_trid(trid, noc) and *_set_trid in the copy lowering. Each copy is assigned a TRID (0..15); the transfer handle is lowered to an i32 TRID value and waits emit barriers keyed by that TRID. A post-conversion cleanup (DeduplicateConsecutiveTridBarriers) merges consecutive TRID barriers that target the same TRID and NOC.

The option is plumbed through convert-ttl-to-ttkernel and ttl-to-ttkernel-pipeline. TRID-focused lit tests explicitly enable use-trid-barriers=1; Python lit tests keep using the default and continue to validate global barrier behavior.

Why?

Issue: tenstorrent/tt-lang#87 — lower ttl.copy/ttl.wait to TRID-specific TTKernel noc ops. Hardware supports TRID-scoped barriers so a wait can target only the transfers issued by a specific copy; the default path remains global barriers for compatibility.
Reviewer request: implement as a pass option so callers can choose the lowering while the default stays the same as on main.
CI: Python lit tests expect global noc_async_{read,write}_barrier(); switching the default to TRID barriers broke them. This PR keeps the default as global barriers and gates TRID behavior behind the option.

How?

Pass: convert-ttl-to-ttkernel gains option use-trid-barriers (default false). When false, copy/wait lowering emits global barriers only; when true, copy lowering allocates a TRID per copy, emits noc_async_*_set_trid before the tile read/write loop, and replaces the copy result with the TRID (i32); wait lowering emits noc_async_*_barrier_with_trid(trid, noc).
Pipeline: ttl-to-ttkernel-pipeline accepts use-trid-barriers and forwards it to the pass.
Cleanup: TTKernel cleanup patterns add DeduplicateConsecutiveTridBarriers for TRID barrier ops so consecutive barriers with the same TRID/NOC are merged.
Tests: TRID conversion tests (trid_barriers.mlir, dma_single_core.mlir, loopback_dram_copy.mlir) and relevant TTL-to-Cpp tests run with use-trid-barriers=1. trid_barriers.mlir uses tile-grid tensor types (tensor<1x1x!ttcore.tile<32x32,f32>>) so copy lowering's getTileGridShapeFromValue is valid (it expects TileType element type). Python lit tests are unchanged and run with the default (global barriers).
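The cleanup rule described above (merge consecutive barriers that target the same TRID and NOC) can be illustrated with a small standalone model. This is only a sketch in plain C++, not the MLIR rewrite pattern from this PR: `Op`, `isRedundantAfter`, and `dedupConsecutiveTridBarriers` are hypothetical names, and the model assumes that "consecutive" means no other op sits between the two barriers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lowered op; this is not a TTKernel type.
struct Op {
  enum class Kind { ReadBarrierWithTrid, WriteBarrierWithTrid, Other };
  Kind kind;
  uint32_t trid; // only meaningful for the two barrier kinds
  uint32_t noc;  // only meaningful for the two barrier kinds
};

// True when `b` is a barrier that repeats barrier `a` exactly:
// same kind (read vs. write), same TRID, same NOC.
inline bool isRedundantAfter(const Op &a, const Op &b) {
  return a.kind != Op::Kind::Other && a.kind == b.kind &&
         a.trid == b.trid && a.noc == b.noc;
}

// Drops a barrier when the op immediately before it is an identical barrier.
// Any intervening op (Kind::Other, or a barrier on a different TRID/NOC)
// keeps both barriers, since new transfers may have been issued in between.
inline std::vector<Op>
dedupConsecutiveTridBarriers(const std::vector<Op> &ops) {
  std::vector<Op> out;
  for (const Op &op : ops) {
    if (!out.empty() && isRedundantAfter(out.back(), op))
      continue;
    out.push_back(op);
  }
  return out;
}
```

In this model, two back-to-back read barriers on TRID 3 / NOC 0 collapse into one, while a read barrier and a write barrier on the same TRID are both kept because they are different op kinds.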

How to Test?

# Default: global barriers (same as main)
ttlang-opt --convert-ttl-to-ttkernel %s | FileCheck ...

# TRID mode
ttlang-opt --convert-ttl-to-ttkernel="use-trid-barriers=1" %s | FileCheck ...

# Pipeline
ttlang-opt --ttl-to-ttkernel-pipeline="use-trid-barriers=1" %s -o %t.mlir

llvm-lit test/ttlang/Conversion/TTLToTTKernel/ — conversion tests (default + TRID where used).
llvm-lit test/ttlang/Translate/TTLToCpp/ — translate tests that use the pipeline with use-trid-barriers=1.
llvm-lit test/python/ — Python lit tests (default lowering, no change).

Checklist

Self-reviewed (style, logic)
Added/updated tests; TRID tests gated behind use-trid-barriers=1, default path unchanged
PR is focused (pass option + pipeline plumbing + cleanup + tests)
Default behavior matches main (global barriers only)
No scope creep (pass/pipeline option and related tests only)

shutovilyaep · 2026-01-26T16:19:00Z

Comment received from @brnorris03:

It would be great if you can implement this as a pass option so we can choose between different lowerings (there will probably be more optimizations later), keeping the default the same as what's in main now.

Add a convert-ttl-to-ttkernel pass option (use-trid-barriers) and plumb it through ttl-to-ttkernel-pipeline so callers can choose between legacy global barriers and TRID-aware barriers. Keep the default on legacy global barriers to match mainline codegen. Also group ttlang-translate static archives on ELF linkers to avoid link-order dependent failures.

…getTileGridShapeFromValue

The test used tensor<32x32xf32> (element type f32). Copy lowering calls getTileGridShapeFromValue(), which asserts that the tensor has a TileType element type. Use tensor<1x1x!ttcore.tile<32x32,f32>> like the other DMA tests to fix the CI crash (SIGABRT) in TTLToTTKernel conversion. Attempt to fix CI failure in PR tenstorrent#267 / #1222.

Update TRID-focused conversion and translation lit tests to explicitly enable TRID barrier lowering so the default (global barrier) path remains stable.

shutovilyaep · 2026-01-30T13:39:54Z

/codeowners ping

brnorris03

Looks great, thank you! The only more significant issue I see is the lack of runtime tests. I think the best approach for now is to parameterize (some of) the test/me2e tests with the new option; what do you think? I can help with more concrete suggestions on how to do that if you agree.

Some general questions, mainly stemming from my lack of deep knowledge of the low-level semantics of the metal ops.

Is the TRID value semantically meaningful, or does it just need to be unique per copy? I am guessing order doesn't matter? As defined, the generated TRIDs could be nondeterministic (but correctly unique) due to parallel pattern application.
With the new ops requiring an explicit NOC, I see that NOC 0 is always used. Is this appropriate, or is it something that needs to be generalized (perhaps in a later PR)?

Again, thank you for contributing this!!

brnorris03 · 2026-01-30T15:02:55Z

lib/Dialect/TTKernel/Transforms/TTKernelCleanupPatterns.cpp

  patterns.add<DeduplicateConsecutiveBarriers<NocAsyncReadBarrierOp>>(
      patterns.getContext());
  patterns.add<DeduplicateConsecutiveBarriers<NocAsyncWriteBarrierOp>>(
      patterns.getContext());
+  patterns
+      .add<DeduplicateConsecutiveTridBarriers<NocAsyncReadBarrierWithTridOp>>(
+          patterns.getContext());
+  patterns
+      .add<DeduplicateConsecutiveTridBarriers<NocAsyncWriteBarrierWithTridOp>>(
+          patterns.getContext());


Probably doesn't matter that much, but could we make the relevant patterns conditional on the option that enables TRID?

brnorris03 · 2026-01-30T15:04:53Z

include/ttlang/Dialect/TTL/Passes.td


+  let options = [
+    Option<"useTridBarriers", "use-trid-barriers", "bool", "false",
+           "Use TRID-aware DMA waits (barrier_with_trid) instead of global barriers.">,
+  ];


Thank you for adding the option! Not asking you to do this in the PR but it would be interesting to profile the different approaches with a small set of representative benchmarks and set the default based on that (perhaps add a short TODO to that effect here if you agree?).

brnorris03 · 2026-01-30T15:23:25Z

lib/Dialect/TTL/Transforms/ConvertTTLToTTKernel.cpp

+class TridAllocator {
+public:
+  uint32_t allocateTrid() { return nextTrid++ & 0xF; }
+
+private:
+  uint32_t nextTrid = 0;
+};


There is wrapping at 16 TRIDs, but what happens if the transfers on TRID 0 (and so on) have still not completed at that point? Is there any way to check for or detect TRID overflow? Maybe add a TODO for a future improvement to make this more robust.
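One possible direction for the robustness concern above, as a hedged sketch rather than a proposal for this PR: an allocator that tracks which TRIDs have been handed out and refuses to reuse one that has not been released. `TrackingTridAllocator`, `tryAllocate`, and `release` are hypothetical names. The sketch also assumes a TRID can be treated as free once its wait has been lowered, which the PR does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of TridAllocator; not code from this PR.
class TrackingTridAllocator {
public:
  static constexpr uint32_t kNumTrids = 16;

  // Picks the next free TRID in round-robin order and stores it in `trid`.
  // Returns false (leaving `trid` untouched) when all 16 TRIDs are still in
  // flight, instead of silently wrapping onto a TRID that is still in use.
  bool tryAllocate(uint32_t &trid) {
    for (uint32_t i = 0; i < kNumTrids; ++i) {
      uint32_t candidate = (nextTrid + i) & 0xF;
      if ((inFlight & (1u << candidate)) == 0) {
        inFlight |= 1u << candidate;
        nextTrid = candidate + 1;
        trid = candidate;
        return true;
      }
    }
    return false;
  }

  // Marks a TRID as reusable. Assumption: the caller invokes this once the
  // barrier for that TRID has been emitted.
  void release(uint32_t trid) {
    assert(trid < kNumTrids && "TRID out of range");
    inFlight &= ~(1u << trid);
  }

private:
  uint32_t nextTrid = 0;
  uint32_t inFlight = 0; // bit i set => TRID i allocated and not yet released
};
```

A caller could turn a `false` return into a diagnostic, or fall back to a global barrier for that copy. Which of those is appropriate depends on hardware semantics that this thread leaves open.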

shutovilyaep mentioned this pull request Jan 23, 2026

[ttl] Lower ttl.copy ttl.wait to TRID-specific ttkernel noc ops #87

Open

shutovilyaep marked this pull request as ready for review January 26, 2026 13:47

shutovilyaep requested a review from a team as a code owner January 26, 2026 13:47

shutovilyaep force-pushed the feat/lower_copy_wait branch from ca8a242 to d754a28 Compare January 28, 2026 13:10

shutovilyaep force-pushed the feat/lower_copy_wait branch from d754a28 to 42dbe50 Compare January 30, 2026 11:59

shutovilyaep changed the title ~~TTL: Lower async DMA waits to TRID barriers~~ [ttl] Make TRID DMA wait lowering selectable (default: global barriers) Jan 30, 2026

[test] Gate TRID lit checks behind use-trid-barriers

cc01d5b

Update TRID-focused conversion and translation lit tests to explicitly enable TRID barrier lowering so the default (global barrier) path remains stable.

shutovilyaep force-pushed the feat/lower_copy_wait branch from fbe3c1d to cc01d5b Compare January 30, 2026 13:36

brnorris03 reviewed Jan 30, 2026

shutovilyaep wants to merge 2 commits into tenstorrent:main from shutovilyaep:feat/lower_copy_wait
