[D2M] Implement graph coloring-based DST register allocation #5879

brnorris03 · 2025-11-16T00:47:29Z

Ticket

N/A

Summary

A new pass, d2m-insert-dst-register-gc, implements graph-coloring register allocation for DST (the destination register file). It reduces DST usage by reusing slices for values whose lifetimes do not interfere.
Note that while this patch appears large, more than 80% of it is tests.

Motivation

The current InsertDstRegisterAccess pass allocates sequentially: each value gets a new DST slice. Graph coloring identifies values with non-overlapping lifetimes and assigns them to the same slice, reducing DST pressure.

Example: a diamond pattern (4 values, 3 interference edges) needs only 2 slices instead of 4 (a 50% reduction).

     dst0
    /    \
  dst1   dst2
    \    /
     dst3

Interference edges: dst0 - dst1, dst1 - dst2, dst2 - dst3. No edges between dst0 - dst2 or dst1 - dst3 (early releases enable reuse). Graph coloring assigns: dst0 and dst2 → slice 0, dst1 and dst3 → slice 1.
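The 2-slice figure can be checked directly. Writing d_0..d_3 for dst0..dst3 and G for the interference graph with the three edges listed above, G is a path and therefore bipartite, so two colors suffice; because it has at least one edge, it also needs at least two.

```latex
E(G) = \{(d_0, d_1),\ (d_1, d_2),\ (d_2, d_3)\}
c(d_0) = c(d_2) = 0, \qquad c(d_1) = c(d_3) = 1
\chi(G) = 2, \qquad 1 - \frac{\chi(G)}{|V(G)|} = 1 - \frac{2}{4} = 50\%
```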

Changes

Overview of design

The design defines simple abstract interfaces, DstAnalysis and ColoringStrategy, which allow multiple implementations independent of any specific pass definition.

                          DstAnalysis
                             ^   ^
                   inherits /     \ inherits
                           /       \
               DstAnalysisBasic   DstAnalysisGraphColoring
                                      | composes
                                      v
                              ColoringStrategy
                                 ^       ^
                     implements /         \ implements
                               /           \
               ChaitinBriggsColoring   GreedyColoring
                              |            |
                              | uses       | uses
                              v            v
                           InterferenceGraph utils

The analysis and transformation passes use the above strategies:

D2MDstRequirementAnalysisPass
    ├─ reads option 'strategy'
    ├─ instantiates DstAnalysis implementation
    │     ├─ basic  -> DstAnalysisBasic
    │     ├─ greedy -> DstAnalysisGraphColoring + GreedyColoring
    │     └─ graph-coloring -> DstAnalysisGraphColoring + ChaitinBriggsColoring
    └─ invokes analysis->analyze(funcOp) to report required slices

D2MInsertDstRegisterGCPass
    ├─ invokes DstCapacityAnalysis(funcOp) for capacity limit
    ├─ instantiates runtime strategy based on option
    │     ├─ basic  -> DstAnalysisBasic (equivalent to D2MInsertDstRegisterAccess)
    │     ├─ greedy -> DstAnalysisGraphColoring + GreedyColoring
    │     └─ graph-coloring -> DstAnalysisGraphColoring + ChaitinBriggsColoring
    ├─ performs pre-check via selected DstAnalysis->analyze(funcOp)
    └─ on success, uses DstAnalysisGraphColoring + chosen ColoringStrategy
       to allocate slices and rewrite operations
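The interface layering and the capacity pre-check can be modeled in a few dozen lines. This is a hedged sketch, not the ttmlir code: `FuncModel` (a stand-in for `func::FuncOp` carrying precomputed interference edges), the `color`/`analyze` signatures, and the `fitsInDst` helper are all hypothetical. It shows `DstAnalysisGraphColoring` composing an injected `ColoringStrategy`, `DstAnalysisBasic` reporting one slice per value, and only the greedy strategy implemented; the mapping from the `basic`/`greedy`/`graph-coloring` option strings is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of the design; not the ttmlir API. The real analyses
// take an MLIR func::FuncOp and compute interference themselves.
using Edge = std::pair<int, int>;

// Stand-in for a function: number of DST values and their interference edges.
struct FuncModel {
  int numValues = 0;
  std::vector<Edge> interferences;
};

// Strategy interface: assigns a DST slice (color) to each value.
class ColoringStrategy {
public:
  virtual ~ColoringStrategy() = default;
  virtual std::vector<int> color(const FuncModel &f) = 0;
};

// Greedy strategy: visiting values in order, each takes the lowest slice not
// used by an already-colored neighbor.
class GreedyColoring : public ColoringStrategy {
public:
  std::vector<int> color(const FuncModel &f) override {
    std::vector<std::set<int>> adj(f.numValues);
    for (const auto &[a, b] : f.interferences) {
      adj[a].insert(b);
      adj[b].insert(a);
    }
    std::vector<int> colors(f.numValues, -1);
    for (int v = 0; v < f.numValues; ++v) {
      int c = 0;
      while (std::any_of(adj[v].begin(), adj[v].end(),
                         [&](int n) { return colors[n] == c; }))
        ++c;
      colors[v] = c;
    }
    return colors;
  }
};

// Analysis interface: reports how many DST slices a function needs.
class DstAnalysis {
public:
  virtual ~DstAnalysis() = default;
  virtual int analyze(const FuncModel &f) = 0;
};

// Sequential allocation: every value gets its own slice.
class DstAnalysisBasic : public DstAnalysis {
public:
  int analyze(const FuncModel &f) override { return f.numValues; }
};

// Composes a ColoringStrategy; slices needed = highest color + 1.
class DstAnalysisGraphColoring : public DstAnalysis {
public:
  explicit DstAnalysisGraphColoring(std::unique_ptr<ColoringStrategy> s)
      : strategy(std::move(s)) {}

  int analyze(const FuncModel &f) override {
    std::vector<int> colors = strategy->color(f);
    if (colors.empty())
      return 0;
    return *std::max_element(colors.begin(), colors.end()) + 1;
  }

private:
  std::unique_ptr<ColoringStrategy> strategy;
};

// Hypothetical pre-check: proceed only if the requirement fits the capacity
// reported by a capacity analysis.
bool fitsInDst(DstAnalysis &analysis, const FuncModel &f, int capacity) {
  return analysis.analyze(f) <= capacity;
}
```

With the diamond from the Motivation section (4 values, edges 0-1, 1-2, 2-3), `DstAnalysisBasic` reports 4 slices and `DstAnalysisGraphColoring` with `GreedyColoring` reports 2, so only the latter passes a capacity check of 2. Injecting the strategy through a `std::unique_ptr` keeps the analysis independent of any particular coloring algorithm, which mirrors the "composes" arrow in the diagram.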

New Operations

d2m.release_dst - Marks end of DST value lifetime:

%dst = d2m.acquire_dst() : memref<1x!ttcore.tile<32x32, f32>, #dst>
// ... use dst ...
d2m.release_dst %dst : memref<1x!ttcore.tile<32x32, f32>, #dst>

This enables precise liveness analysis and early slice reuse. Verifiers ensure that each release_dst is paired with an acquire_dst and prevent use-after-release.
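The two verifier rules can be illustrated on a simplified straight-line model. In this hypothetical sketch each op is reduced to a kind and an integer DST id; `OpKind`, `Op`, and `verifyDstLifetimes` are illustrative names, not the D2M verifier API, and the real verifiers operate on MLIR ops and SSA values. An empty string means the sequence is valid.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line model of a block: each op is reduced to its kind
// and the integer id of the DST value it acquires, uses, or releases.
enum class OpKind { AcquireDst, Use, ReleaseDst };

struct Op {
  OpKind kind;
  int dst;
};

// Returns an empty string if the sequence is valid, otherwise a diagnostic.
std::string verifyDstLifetimes(const std::vector<Op> &ops) {
  std::set<int> live;     // acquired and not yet released
  std::set<int> released; // already released
  for (const Op &op : ops) {
    switch (op.kind) {
    case OpKind::AcquireDst:
      live.insert(op.dst);
      break;
    case OpKind::Use:
      if (released.count(op.dst))
        return "use after release";
      if (!live.count(op.dst))
        return "use of a value that was never acquired";
      break;
    case OpKind::ReleaseDst:
      // Also catches a second release of the same value.
      if (!live.count(op.dst))
        return "release without a matching acquire";
      live.erase(op.dst);
      released.insert(op.dst);
      break;
    }
  }
  return "";
}
```

Tracking released values separately from never-acquired ones lets the model report use-after-release distinctly from an unpaired use.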

Pass Implementation

D2MInsertDstRegisterGCPass (lib/Dialect/D2M/Transforms/InsertDstRegisterGC.cpp):

Collects affine.load operations for DST-consuming ops (via OperandLoadStoreRegisterOpInterface)
Builds interference graph using SSA liveness analysis
Applies graph coloring to assign DST slices
Generates L1↔DST data copy loops
Inserts acquire_dst/release_dst pairs
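The interference-graph step can be sketched for straight-line code under a simplifying assumption: each DST value has a half-open live range from its definition to its last use, and two values interfere when those ranges overlap. `LiveRange`, `interferes`, and `buildInterferenceEdges` are hypothetical names; the pass itself relies on MLIR SSA liveness and also has to handle loops and in-place operations, which this model ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplified live range of one DST value in straight-line code:
// defined at program point `def`, last used at `lastUse`, half-open
// [def, lastUse). Loops and in-place updates are not modeled.
struct LiveRange {
  int def;
  int lastUse;
};

// Half-open ranges overlap only if each starts before the other ends. A value
// whose last use is where another value is defined therefore does not
// interfere with it, so the new value may take over its slice.
bool interferes(const LiveRange &a, const LiveRange &b) {
  return a.def < b.lastUse && b.def < a.lastUse;
}

// Builds the undirected interference edge list (i < j) over all value pairs.
std::vector<std::pair<int, int>>
buildInterferenceEdges(const std::vector<LiveRange> &ranges) {
  std::vector<std::pair<int, int>> edges;
  for (std::size_t i = 0; i < ranges.size(); ++i)
    for (std::size_t j = i + 1; j < ranges.size(); ++j)
      if (interferes(ranges[i], ranges[j]))
        edges.emplace_back(static_cast<int>(i), static_cast<int>(j));
  return edges;
}
```

For example, with `a = load` at point 0, `b = load` at 1, `c = add(a, b)` at 2, `d = exp(c)` at 3, and a store of `d` at 4, the ranges are a [0,2), b [1,2), c [2,3), d [3,4). Only `a` and `b` interfere, so two slices suffice where sequential allocation would use four.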

Coloring Strategies

Two algorithms are implemented in GraphColoringStrategy.{h,cpp}:

Chaitin-Briggs (default): Simplification-based coloring
Greedy: Sequential first-available-color assignment
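A compact sketch of Chaitin-Briggs simplify/select coloring with `k` available slices follows. It illustrates the textbook algorithm and is not the code in GraphColoringStrategy.cpp; the function name and the `-1` convention for a node that would need spilling are assumptions. Briggs' refinement shows up in the simplify phase: when no node has degree below `k`, a node is pushed optimistically instead of being spilled immediately, and it is left uncolored only if no color remains for it during select.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Chaitin-Briggs style coloring with k colors (DST slices). Returns one color
// per node; -1 marks a node left uncolored, i.e. a spill candidate.
std::vector<int> chaitinBriggsColor(int numNodes,
                                    const std::vector<std::pair<int, int>> &edges,
                                    int k) {
  std::vector<std::set<int>> adj(numNodes);
  for (const auto &[a, b] : edges) {
    adj[a].insert(b);
    adj[b].insert(a);
  }

  // Simplify: repeatedly remove a node with fewer than k remaining neighbors.
  // If none exists, optimistically remove a highest-degree node (Briggs)
  // rather than spilling it right away.
  std::vector<int> degree(numNodes);
  for (int v = 0; v < numNodes; ++v)
    degree[v] = static_cast<int>(adj[v].size());
  std::vector<bool> removed(numNodes, false);
  std::vector<int> stack;
  for (int step = 0; step < numNodes; ++step) {
    int pick = -1;
    for (int v = 0; v < numNodes && pick < 0; ++v)
      if (!removed[v] && degree[v] < k)
        pick = v;
    if (pick < 0) {
      for (int v = 0; v < numNodes; ++v)
        if (!removed[v] && (pick < 0 || degree[v] > degree[pick]))
          pick = v;
    }
    removed[pick] = true;
    stack.push_back(pick);
    for (int n : adj[pick])
      if (!removed[n])
        --degree[n];
  }

  // Select: pop nodes in reverse removal order and give each the lowest
  // color not used by an already-colored neighbor.
  std::vector<int> color(numNodes, -1);
  while (!stack.empty()) {
    int v = stack.back();
    stack.pop_back();
    std::set<int> used;
    for (int n : adj[v])
      if (color[n] >= 0)
        used.insert(color[n]);
    for (int c = 0; c < k; ++c)
      if (!used.count(c)) {
        color[v] = c;
        break;
      }
  }
  return color;
}
```

On the diamond's interference edges (0-1, 1-2, 2-3) with `k = 2`, every node gets a slice, with dst0/dst2 and dst1/dst3 sharing. On a triangle with `k = 2`, exactly one node is left uncolored, which is where spill code (listed under Future Work) would be needed.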

Strategy selection via pass option:

ttmlir-opt --d2m-insert-dst-register-gc="coloring-strategy=greedy" input.mlir

Pipeline Integration

Not yet integrated into the main pipeline. The current d2m-insert-dst-register-access pass performs multiple unrelated transformations (most notably linalg-to-affine conversion). These need to be refactored into separate passes that run before this pass.

Future Work

Keep accumulators in DST during reduction loops: for operations like matmul that accumulate results (e.g., C += A * B in a loop), the current pass loads the accumulator from L1 and writes it back on every iteration, which is inefficient. A better approach (used by the existing D2MInsertDstRegisterAccess pass) keeps the accumulator in DST throughout the reduction loop and writes it back to L1 only once at the end. This requires splitting the loop into three phases: (1) load initial values into DST, (2) compute with the accumulator resident in DST, and (3) write the final results back to L1. This optimization could be implemented as a separate pass that runs before register allocation. (high priority)
Precise loop dependence analysis using MLIR affine utilities (medium priority)
Spill code generation for insufficient DST capacity (high priority)
PBQP-based allocation for cost modeling (may be overkill)
Move coalescing to eliminate redundant copies (medium priority)
Performance benchmarking on real workloads (medium priority)
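The three-phase split proposed in the first item can be illustrated with a scalar stand-in for tiles. In this hypothetical sketch, `float &l1` plays the role of the accumulator in L1 and a local `dst` variable plays the role of a DST slice; a real implementation would transform affine loops over tiles. Each function returns how many L1 accumulator accesses it performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model: `l1` stands in for the accumulator in L1 and the
// local `dst` stands in for a DST slice. Both functions compute
// l1 += sum(a[k] * b[k]) and return the number of L1 accumulator accesses.

// Current behavior: load and store the accumulator on every iteration.
int reducePerIteration(const std::vector<float> &a,
                       const std::vector<float> &b, float &l1) {
  assert(a.size() == b.size());
  int l1Accesses = 0;
  for (std::size_t k = 0; k < a.size(); ++k) {
    float dst = l1; // L1 -> DST
    ++l1Accesses;
    dst += a[k] * b[k];
    l1 = dst; // DST -> L1
    ++l1Accesses;
  }
  return l1Accesses;
}

// Proposed three-phase form.
int reduceThreePhase(const std::vector<float> &a,
                     const std::vector<float> &b, float &l1) {
  assert(a.size() == b.size());
  int l1Accesses = 0;
  float dst = l1; // (1) load the initial value into DST once
  ++l1Accesses;
  for (std::size_t k = 0; k < a.size(); ++k)
    dst += a[k] * b[k]; // (2) the accumulator stays in DST
  l1 = dst; // (3) write the final result back to L1 once
  ++l1Accesses;
  return l1Accesses;
}
```

For a reduction of length 3, the per-iteration form touches L1 six times and the three-phase form twice; both produce the same result because the additions happen in the same order.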

vmilosevic

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

lib/Dialect/D2M/Transforms/GraphColoringStrategy.cpp

lib/Dialect/D2M/IR/D2MOps.cpp

codecov-commenter · 2025-11-16T01:02:48Z

Codecov Report

❌ Patch coverage is 80.78406% with 299 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.77%. Comparing base (becbe0a) to head (8a9827c).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
lib/Dialect/D2M/Utils/TileMatmulUtils.cpp	18.77%	173 Missing ⚠️
lib/Dialect/D2M/Transforms/InsertDstRegisterGC.cpp	79.57%	106 Missing ⚠️
lib/Dialect/D2M/Analysis/DstAnalysisPass.cpp	78.12%	7 Missing ⚠️
lib/Dialect/D2M/IR/D2MOps.cpp	87.87%	4 Missing ⚠️
...b/Dialect/D2M/Transforms/GraphColoringStrategy.cpp	98.17%	3 Missing ⚠️
lib/Conversion/D2MToTTKernel/D2MToTTKernel.cpp	94.59%	2 Missing ⚠️
lib/Dialect/D2M/Analysis/DstAnalysisBasic.cpp	94.59%	2 Missing ⚠️
lib/Dialect/D2M/Analysis/DstCapacityAnalysis.cpp	95.65%	1 Missing ⚠️
...Dialect/D2M/Transforms/InsertDstRegisterAccess.cpp	97.36%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5879      +/-   ##
==========================================
+ Coverage   69.34%   69.77%   +0.42%     
==========================================
  Files         334      347      +13     
  Lines       50999    52483    +1484     
==========================================
+ Hits        35367    36621    +1254     
- Misses      15632    15862     +230


vmilosevic

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

test/unittests/D2M/TestDstCapacityAnalysis.cpp


vmilosevic

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

include/ttmlir/Dialect/D2M/Analysis/DstAnalysis.h


…oring.
- InsertDstRegisterAccess.cpp: Emit diagnostic when encountering unconverted linalg.generic operations, per LLVM guidelines on error reporting.
- LinalgToAffine.cpp: Make d2m.linalg_root attribute conditional to avoid leaking internal pass state into IR.
- Passes.td: Add mark-root-loops option to control attribute emission.
- TTMetalPipelines.cpp: Configure pipeline to emit markers for pass coordination.
- Tests: Add diagnostic verification test and comprehensive option combination coverage.

Issues Addressed
- Silent pass failures violate error reporting guidelines in LLVM Programmer's Manual § "Writing an LLVM Pass".
- Internal marker attributes in IR violate canonical form requirements in LLVM Programmer's Manual § "The PassManager".

Root Cause: The D2MAllocate pass performed liveness analysis BEFORE inserting stream operations, so the custom liveness extension couldn't account for newly inserted streams that reference existing buffers.
Solution Implemented: Split D2MAllocate into two phases:
Phase 1: Allocation and stream insertion (no deallocs)
Phase 2: Re-run liveness analysis on complete IR, then insert deallocs


### Ticket
#5057 #5931
### Problem description
As described in the issue.
### What's changed
- Use `get_current_system_desc`, passing the input tensor's device if available.
- On invocation, the `JitFunction` checks the `SYSTEM_DESC_PATH` env var; if it is not set, it runs `_query_and_save_system_desc` to get its own system desc instead of just erroring out.
- Helpers `_get_dispatch_core_type` and `_get_cluster_type` get the `DispatchCoreType` based on the cluster type.
- In `test/ttnn-jit/conftest.py`, set `DispatchCoreType` to `WORKER` for p150; otherwise, set `ETH` for device init.
### Checklist
- [ ] New/Existing tests provide coverage for changes

@mvasiljevicTT

#3915 and #5899 introduce specific `permute . reshape . permute -> reshape` patterns with hard-coded permutation values. This PR generalizes the approach such that:

```
This pattern fuses the sequence:
  PermuteOp -> ReshapeOp -> PermuteOp
into a single ReshapeOp when the following conditions are met:
  Original shape: [A_1, A_2,.., A_k]
  permute(p_1, p_2,..., p_k) -> [A_1', A_2',.., A_k']
  reshape([A_1', A_2',.., A_k']) -> [B_1, B_2,.., B_k]
  permute(p_1', p_2',..., p_k') -> [B_1', B_2',.., B_k']
where:
  - k is the rank of the input tensor;
  - (p_1, p_2,..., p_k) and (p_1', p_2',..., p_k') are permutations of {0, 1, ..., k-1};
  - B_i = (A_r', A_r+1',..., A_r+l') where 1 <= r <= k, l >= 0 and r + l <= k, for each 1 <= i <= k;
  - flatten([B_1', B_2',.., B_k']) = [A_1, A_2,.., A_k].
The result of this sequence is identical to the following reshape:
  reshape([A_1, A_2,.., A_k]) -> [B_1', B_2',.., B_k']
```

Special credits to @mvasiljevicTT for scrutinizing the algorithm during initial development, which revealed some edge cases that had not previously been covered.

vmilosevic

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

lib/Dialect/D2M/Transforms/InsertDstRegisterGC.cpp

vmilosevic · 2025-11-25T22:28:22Z

lib/Dialect/D2M/Transforms/InsertDstRegisterGC.cpp

+  llvm::SmallVector<int64_t, 4> tripCounts; // Trip count for each loop.
+  int64_t totalIterations;                  // Product of all trip counts.
+
+  LoopContext() : totalIterations(1) {}


⚠️ cppcoreguidelines-use-default-member-init ⚠️
use default member initializer for totalIterations

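Suggested change

-  int64_t totalIterations;                  // Product of all trip counts.
-
-  LoopContext() : totalIterations(1) {}
+  int64_t totalIterations = 1;              // Product of all trip counts.
+
+  LoopContext() = default;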


vmilosevic reviewed Nov 16, 2025

lib/Dialect/D2M/Transforms/GraphColoringStrategy.cpp (outdated, resolved)

lib/Dialect/D2M/IR/D2MOps.cpp (outdated, resolved)

brnorris03 force-pushed the bnorris/dst-gc-allocator branch 7 times, most recently from 47ae5d9 to 4fd7dc6, November 16, 2025 07:32

vmilosevic reviewed Nov 16, 2025

test/unittests/D2M/TestDstCapacityAnalysis.cpp (outdated, resolved)

extract the linalg-to-affine transformations from the InsertDstRegisterAccess pass so that it can be used prior to other DST allocation passes

b4b6cdd

vmilosevic reviewed Nov 17, 2025

include/ttmlir/Dialect/D2M/Analysis/DstAnalysis.h (outdated, resolved)

include/ttmlir/Dialect/D2M/Analysis/DstAnalysis.h (outdated, resolved)

vmilosevic reviewed Nov 17, 2025

include/ttmlir/Dialect/D2M/Analysis/DstAnalysis.h (outdated, resolved)

brnorris03 force-pushed the bnorris/dst-gc-allocator branch from 9f22936 to 62fbc68, November 18, 2025 06:11

vmilosevic reviewed Nov 18, 2025

include/ttmlir/Dialect/D2M/Analysis/DstAnalysis.h (outdated, resolved)

brnorris03 force-pushed the bnorris/dst-gc-allocator branch from 32e8d2b to 16d3134, November 18, 2025 07:05

brnorris03 added 14 commits November 18, 2025 06:18

Merge remote-tracking branch 'origin/main' into bnorris/extract-linalg-to-affine-from-dst-pass

72c37c7

rebase and update tests

ceca6ea

add temporary marker attr to converted linalg loop; update D2MInsertDstRegister tests (all pass)

a07d255

do not convert linalg.generics containing tile matmul ops when useTileMatmul is false

10deebb

add d2m-linalg-to-affine to pipeline

24c4047

clean up LinalgToAffine

c6bee1a

remove explicit datamovement check, not relevant in this pass

139105b

extend the input buffer lifetime in stream_layout op

a1d3140

undo .github changes (unintended)

f733ba8

Merge remote-tracking branch 'origin/main' into bnorris/extract-linalg-to-affine-from-dst-pass

e41112b

eliminate infinite recursion

a04fe61

ensure correct order of deallocs

048e858

brnorris03 and others added 16 commits November 23, 2025 16:27

add dst_index attribute to ops

1f14077

wip: correct load dst loops (before compute)

8f89cf1

hopefully correct pre- and post-compute load/store loops

715f487

use affine maps

c563846

roll back to match legacy dst functionality, will do more complex copy loops in a future pr

e0dd8fb

clean up test

4ce44ae

simplify dst index logic

71af3b4

Merge remote-tracking branch 'origin/main' into bnorris/dst-gc-allocator

0b5dc48

fix tests

d3d421e

use convertTileMatmulLinalgToBlock from both DST passes

464bb46

fix graph coloring pass; add unmarked loop handling

245a129

fix tests

5f39359

Merge remote-tracking branch 'origin/main' into bnorris/dst-gc-allocator

5ffcd03

add dst gc test for unmarked loops

b614e56

brnorris03 force-pushed the bnorris/dst-gc-allocator branch from 5d0f7fa to 571c936, November 25, 2025 22:18

vmilosevic reviewed Nov 25, 2025

brnorris03 added 5 commits November 25, 2025 16:46

add reduction op guards

70a8033

add reduction test

fc04977

Merge branch 'bnorris/dst-gc-allocator' of github.com:tenstorrent/tt-mlir into bnorris/dst-gc-allocator

fa3bd74

use getDstRegInPlace() consistently; clean up implementation; add unary op test

db490ee

use getDstRegInPlace() consistently; clean up implementation; add unary op test

6ab65ad

brnorris03 force-pushed the bnorris/dst-gc-allocator branch from db490ee to 6ab65ad, November 26, 2025 02:26

brnorris03 added 6 commits November 25, 2025 18:26

Merge branch 'bnorris/dst-gc-allocator' of github.com:tenstorrent/tt-mlir into bnorris/dst-gc-allocator

200a0c0

fix linter errors

db94fdf

Merge remote-tracking branch 'origin/main' into bnorris/dst-gc-allocator

a250e88

remove unmarked loop handling in dst pass; update tests

9c7489b

remove duplicate conversion of d2m.release_dst to ttkernel.tile_regs_release

8a9827c

remove d2m.release_dst; not really needed

f1b4731

14 participants
