[TTL] DST optimization: separate pack_tile loop by brnorris03 · Pull Request #229 · tenstorrent/tt-lang

brnorris03 · 2026-01-13T04:19:28Z

Problem

Two related bugs prevent multi-tile (e.g., 2x2) CB shapes from working correctly:

pack_tile interleaving bug (compute side): when lowering ttl.compute for multi-tile blocks, pack_tile ops are emitted interleaved with the math ops inside the tile loops. This is incorrect because pack_tile reads from DST registers, which must hold all computed tiles before packing begins.
Block shape mismatch (data movement side): data movement threads do not copy the blocks that compute expects, because the tensor slices created for multi-tile CBs don't match the CB's block shape. (closes [ttl] Incorrect ttl.copy lowering generates individual tile transfers instead of block transfers #138)
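To make the block-shape requirement concrete, here is a minimal Python sketch of which tiles one CB block should cover. It assumes a 2D tensor tiled row-major with `tiles_per_row` tiles per row and blocks laid out on a regular grid; `block_slice` and `block_tile_ids` are hypothetical helpers for illustration, not tt-lang APIs.

```python
from typing import List, Tuple


def block_slice(block_idx: Tuple[int, int],
                block_shape: Tuple[int, int]) -> Tuple[slice, slice]:
    """Tile-row and tile-column ranges covered by one CB block (hypothetical helper).

    block_idx   -- (row, col) position of the block, in units of blocks
    block_shape -- CB block shape in tiles, e.g. (2, 2)
    """
    bi, bj = block_idx
    br, bc = block_shape
    return slice(bi * br, (bi + 1) * br), slice(bj * bc, (bj + 1) * bc)


def block_tile_ids(block_idx: Tuple[int, int],
                   block_shape: Tuple[int, int],
                   tiles_per_row: int) -> List[int]:
    """Linear ids of the tiles in one block, in row-major order.

    Assumes the tensor is tiled row-major, so tile (r, c) has id
    r * tiles_per_row + c (tiles_per_row acts as the row stride).
    """
    rows, cols = block_slice(block_idx, block_shape)
    return [r * tiles_per_row + c
            for r in range(rows.start, rows.stop)
            for c in range(cols.start, cols.stop)]


if __name__ == "__main__":
    # 4x4-tile tensor split into 2x2 blocks.
    print(block_tile_ids((0, 1), (2, 2), tiles_per_row=4))
```

For a 4x4-tile tensor and a 2x2 CB block, block (0, 1) covers tiles 2, 3, 6 and 7; a single block transfer should move exactly those tiles instead of issuing individual tile transfers that don't line up with the CB's block shape.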

What changed

Split ttl.compute lowering into two separate loop nests (compute phase + pack phase)
Added tile_regs_commit and tile_regs_wait synchronization between loops
Implemented BodyPhases categorization to separate compute ops from pack ops
Dynamic DST index computation for outputs only (inputs reuse DST registers)
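The loop structure after the split can be modeled as a trace of emitted ops. This Python sketch is a simplified model, not the actual MLIR lowering: "compute" stands in for whatever math ops the ttl.compute body contains, each output tile t is assumed to land in DST[out_base + t], and the tile_regs_acquire/tile_regs_release bracket is taken from the tt-metal compute-kernel API rather than from this PR, which only names tile_regs_commit and tile_regs_wait.

```python
from typing import List, Tuple

# One emitted op: (name, DST index it touches); sync ops use -1.
Op = Tuple[str, int]


def emit_two_phase(block_shape: Tuple[int, int], out_base: int) -> List[Op]:
    """Model of the split lowering: compute loop nest, DST hand-off, pack loop nest."""
    rows, cols = block_shape
    ops: List[Op] = [("tile_regs_acquire", -1)]
    # Phase 1 (math): compute every tile of the block into DST.
    for i in range(rows):
        for j in range(cols):
            ops.append(("compute", out_base + i * cols + j))
    # Synchronization between the loop nests: the math side commits DST,
    # and the pack side waits for it before reading any tile.
    ops += [("tile_regs_commit", -1), ("tile_regs_wait", -1)]
    # Phase 2 (pack): copy every result tile out of DST.
    for i in range(rows):
        for j in range(cols):
            ops.append(("pack_tile", out_base + i * cols + j))
    ops.append(("tile_regs_release", -1))
    return ops


def packs_after_wait(ops: List[Op]) -> bool:
    """True if every pack_tile follows tile_regs_wait and reads a DST slot
    that some earlier compute op wrote."""
    names = [name for name, _ in ops]
    if "tile_regs_wait" not in names:
        return False
    wait_at = names.index("tile_regs_wait")
    written = set()
    for pos, (name, dst) in enumerate(ops):
        if name == "compute":
            written.add(dst)
        elif name == "pack_tile" and (pos < wait_at or dst not in written):
            return False
    return True


if __name__ == "__main__":
    for op in emit_two_phase((2, 2), out_base=0):
        print(op)
```

`packs_after_wait` rejects a simplified interleaved trace in which a pack_tile appears before tile_regs_wait, which is the ordering the separate pack loop is meant to rule out.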

1. Allocate DST registers for inputs (block arguments) based on liveness.
2. Allocate DST registers for outputs starting after the inputs.

This ensures outputs get indices >= inputs_footprint, so in multi-tile compute each output maps to DST[inputs_footprint + tile_index].
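The allocation order can be sketched in Python under explicit assumptions: each input's liveness is a (first_def, last_use) interval in body-op order, inputs are packed with a greedy linear scan so non-overlapping inputs share a DST slot, and the DST capacity is supplied by the caller. The function names and the `capacity` parameter are hypothetical; the PR does not spell out its liveness algorithm, and the capacity check only mirrors the idea behind the "check dst capacity for multitile compute" commit.

```python
import heapq
from typing import Dict, List, Tuple


def allocate_inputs(live: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, int], int]:
    """Assign DST slots to inputs by liveness; return (slots, inputs_footprint).

    live maps each input to its (first_def, last_use) position in the body.
    Inputs whose ranges do not overlap may share a slot (greedy linear scan).
    """
    slots: Dict[str, int] = {}
    active: List[Tuple[int, int]] = []  # min-heap of (last_use, slot)
    free: List[int] = []                # min-heap of released slot numbers
    next_slot = 0
    for name, (start, end) in sorted(live.items(), key=lambda kv: kv[1][0]):
        # Release slots whose input was last used before this input starts.
        while active and active[0][0] < start:
            heapq.heappush(free, heapq.heappop(active)[1])
        if free:
            slot = heapq.heappop(free)
        else:
            slot, next_slot = next_slot, next_slot + 1
        slots[name] = slot
        heapq.heappush(active, (end, slot))
    return slots, next_slot


def output_dst_index(tile_index: int, inputs_footprint: int) -> int:
    """Outputs start right after the input slots: DST[inputs_footprint + tile_index]."""
    return inputs_footprint + tile_index


def check_capacity(inputs_footprint: int, num_tiles: int, capacity: int) -> None:
    """Reject blocks whose input slots plus per-tile outputs do not fit in DST."""
    needed = inputs_footprint + num_tiles
    if needed > capacity:
        raise ValueError(f"block needs {needed} DST registers, only {capacity} available")
```

With inputs `a` and `b` live together and `c` live only after both are dead, the inputs need 2 slots (`c` reuses slot 0), so the four output tiles of a 2x2 block land in DST[2] through DST[5]. Those 6 registers fit a capacity of 8 but not a capacity of 4.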

zoecarver · 2026-01-13T17:39:35Z

include/ttlang/Dialect/TTL/TTLElementwiseOps.def

-// Special binary ops with non-standard lowering
-// Max uses 2-arg in-place form (TTLTileMaxToTTKernel template)
-TTL_BINARY_TILE_OP_SPECIAL(Max, MaxTileOp, BinaryMaxTileInitOp, BinaryMaxTileOp)
+TTL_BINARY_TILE_OP(Max, MaxTileOp, BinaryMaxTileInitOp, BinaryMaxTileOp)


Is this intentional? It seems like it should at least be its own PR, with tests.

Certainly. I think it should have been done back when switching to the binary op, since max was no longer special at that point.

brnorris03 added 12 commits January 12, 2026 18:56

remove pure trait from TTL_TileOp

4f2ceca

initial implementation (mlir tests updated)

f08ceed

add dynamic dst index for multitile

b595948

update tests

ca06589

update doc

c1fe0e3

check dst capacity for multitile compute

4c71a10

update tests

86df5f5

fix DST analysis for unary ops; add tile op unary and binary traits

8e475e4

fix single-tile pack dst registers; correctly reuse dst registers for unary ops

20ab590

update mlir lit tests

58cb829

update lit tests

75fb025

zoecarver reviewed Jan 13, 2026

brnorris03 added 4 commits January 13, 2026 09:42

update doc

7a17c0a

use op instead of adaptor to ensure dst_index attr is available

8c2ccf6

get tensor stride from layout encoding

70913a6

fix multi-block separate pack_tile loop use of dynamic dst indices

21c94f6

brnorris03 changed the title from [TTL] Fix multitile block lowering to [TTL] DST optimization: separate pack_tile loop on Jan 18, 2026

brnorris03 wants to merge 16 commits into main from bnorris/fix-multitile-mlir


2 participants
