A balanced traffic pattern for AG minimal. by llongTT · Pull Request #37878 · tenstorrent/tt-metal

llongTT · 2026-02-13T19:23:26Z

Ticket

Problem description

AG (all-gather) minimal has low fabric utilization, especially on 4-device rings.
The WAN 2.2 model perf target requires AG to reach 92% fabric utilization.

What's changed

Balance the packet traffic between the forward worker and the backward worker so that they split the last slice. On a 4-device ring this changes the pattern from "2 slices forward, 1 slice backward" to "1.5 slices in each direction", which saves 25% of fabric latency.
When the balanced traffic feature is enabled, also split the local writes half/half to reduce NoC traffic on the backward worker.
Extend the feature to all even ring sizes greater than 2.
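
The 25% figure can be checked with a small model. The following is a minimal sketch, not the kernel code: it assumes fabric latency is proportional to the number of slices carried by the busier direction, and the function names (`balanced_traffic_enabled`, `slices_per_direction`, `latency_saving`) are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction


def balanced_traffic_enabled(ring_size: int) -> bool:
    # Guard described in this PR: ring size > 2 and ring size even.
    return ring_size > 2 and ring_size % 2 == 0


def slices_per_direction(ring_size: int, balanced: bool):
    """Return (forward, backward) slice counts for one device.

    Assumed model: ring_size - 1 remote slices are split between the two
    directions; without balancing the forward direction takes the extra one.
    """
    remote = ring_size - 1
    forward = Fraction((remote + 1) // 2)
    backward = Fraction(remote // 2)
    if balanced and balanced_traffic_enabled(ring_size):
        # Split the last slice in half so both directions carry the same load.
        forward = backward = Fraction(remote, 2)
    return forward, backward


def latency_saving(ring_size: int) -> Fraction:
    # Assumption: latency scales with the busier direction's slice count.
    before = max(slices_per_direction(ring_size, balanced=False))
    after = max(slices_per_direction(ring_size, balanced=True))
    return 1 - after / before


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        fwd, bwd = slices_per_direction(n, balanced=True)
        print(n, fwd, bwd, latency_saving(n))
```

Under this model a 4-device ring goes from (2, 1) to (3/2, 3/2), a saving of 1/4, and the saving for an even ring of size N is 1/N. Odd ring sizes are already symmetric, which is consistent with the feature being limited to even ring sizes.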

Checklist

New/Existing tests provide coverage for changes

Model tests

If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers models-mandatory and models-extended presets.
The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.

llongTT and others added 21 commits January 28, 2026 23:36

balanced traffic for AG minimal on 4-device ring, opus 4.5, attempt 1.

477dac6

opus 4.5 attempt 2.

9fbcc3d

opus 4.5 attempt 3

93ecdb5

Local claude fixed the hang issue. Great.

c885302

Merge branch 'main' into llong/dit_ag_min

55be2ab

introduce max_payload_size to boost fabric bandwidth.

4f60d5e

put zone scopes in reader/writer kernels for Tracy GUI.

ca45c88

remove the zone scope from kernels.

dc032c1

Merge branch 'main' into llong/dit_ag_min

276d47f

# Conflicts: # ttnn/cpp/ttnn/operations/experimental/ccl/all_gather_async/device/kernels/minimal_default_writer.cpp

Add back some statements removed by Agent. It's safer to keep it.

8eea24d

split local writes half/half between forward/backward worker.

465de17

extend the split forward feature to ring size >2 and ring size even. …

cc17165

…Also guard the split local writes with the same condition.

address the copilot suggestion to pass pipeline.

283ced2

Merge branch 'main' into llong/dit_ag_min

6b0163b

Merge branch 'main' into llong/dit_ag_min

c4d3ed5

some update from pipeline feedback/sheran.

862e68b

Merge branch 'main' into llong/dit_ag_min

5998e2c

update the fabric max payload size to precisely 8K

340ccf5

revert the local write split feature.

a1fb5fe

revert the whisper model perf change.

971b544

skip the new unit test on wormhole due to memory limit.

62d1d85

llongTT changed the title ~~Llong/dit ag min~~ A balanced traffic pattern for AG minimal. Feb 13, 2026

llongTT self-assigned this Feb 13, 2026

llongTT and others added 3 commits February 13, 2026 11:25

Merge branch 'main' into llong/dit_ag_min

9e3a7f3

Merge branch 'main' into llong/dit_ag_min

5c5ade9

some treatment on edge case when number of tile to read = 1.

b3f7d27

llongTT wants to merge 24 commits into main from llong/dit_ag_min
