Skip to content

Revert "A balanced traffic pattern for AG minimal. (#36607)"#37832

Merged
rdraskicTT merged 1 commit intomainfrom
rdraskic/revert-039d07a
Feb 13, 2026
Merged

Revert "A balanced traffic pattern for AG minimal. (#36607)"#37832
rdraskicTT merged 1 commit intomainfrom
rdraskic/revert-039d07a

Conversation

@rdraskicTT
Copy link
Contributor

@rdraskicTT rdraskicTT commented Feb 13, 2026

@rdraskicTT rdraskicTT marked this pull request as ready for review February 13, 2026 09:11
@rdraskicTT rdraskicTT requested review from a team as code owners February 13, 2026 09:11
Copilot AI review requested due to automatic review settings February 13, 2026 09:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reverts the previously introduced “balanced traffic pattern” behavior in the minimal all-gather-async implementation, motivated by incorrect outputs and hangs observed in Llama-model workloads.

Changes:

  • Removes split-forwarding (“balanced traffic”) logic from minimal all-gather async reader/writer device kernels.
  • Removes a fabric packet-size vs page-size TT_FATAL guard in the minimal default program factory.
  • Deletes a WAN-focused ring perf/trace test block and related helper/imports from the nightly test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
ttnn/cpp/ttnn/operations/experimental/ccl/all_gather_async/device/kernels/minimal_default_writer.cpp Reverts split-forwarding logic and restores simpler per-slice forwarding behavior.
ttnn/cpp/ttnn/operations/experimental/ccl/all_gather_async/device/kernels/minimal_default_reader.cpp Reverts split-forwarding logic and simplifies forwarding/wait behavior accordingly.
ttnn/cpp/ttnn/operations/experimental/ccl/all_gather_async/device/all_gather_async_default_program_factory.cpp Removes validation tying fabric packet payload size to tensor page size.
tests/nightly/tg/ccl/test_minimal_all_gather_async.py Removes WAN ring perf/trace test and related helper/imports; leaves some imports now unused.
Comments suppressed due to low confidence (1)

ttnn/cpp/ttnn/operations/experimental/ccl/all_gather_async/device/all_gather_async_default_program_factory.cpp:393

  • Removing the packet_size_bytes >= page_size validation can lead to num_pages_per_packet == 0, which makes num_tiles_to_write_per_packet == 0. That will result in zero-sized CBs and/or infinite loops in the reader/writer kernels (tiles_read never advances), i.e. hangs. Please restore the guard (TT_FATAL) or otherwise enforce num_tiles_to_write_per_packet >= 1 with a clear error when the fabric packet payload is smaller than a tensor page size.
    // L1 Scratch CB Creation
    const size_t packet_size_bytes = tt::tt_fabric::get_tt_fabric_channel_buffer_size_bytes();
    uint32_t l1_scratch_cb_page_size_bytes = page_size;

    // scatter-write currently supports 4 distinct noc addresses
    uint32_t max_target_noc_addresses_per_packet = 4;

    // for bfloat8_b, tile_num_per_link=6, we would need to send 2 packages, but they can be of size 3 instead of 4
    uint32_t num_pages_per_packet = packet_size_bytes / l1_scratch_cb_page_size_bytes;
    uint32_t num_tiles_to_write_per_packet = std::min(max_target_noc_addresses_per_packet, num_pages_per_packet);
    uint32_t cb_num_pages = 3 * num_tiles_to_write_per_packet;  // triple buffering

from tests.nightly.t3000.ccl.test_minimal_all_gather_async import run_all_gather_impl
from tests.ttnn.multidevice_perf_tests.sweep_all_gather_hyperparameters_t3000 import get_max_chunks_per_sync
from models.common.utility_functions import skip_for_blackhole, skip_for_wormhole_b0
from tests.tt_eager.python_api_testing.sweep_tests.comparison_funcs import comp_equal, comp_pcc
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comp_equal and comp_pcc are imported but only referenced in commented-out code, so they are currently unused. If the repo runs ruff/flake8 in CI, this will fail with an unused-import error; please remove the unused imports or re-enable the assertions that use them.

Suggested change
from tests.tt_eager.python_api_testing.sweep_tests.comparison_funcs import comp_equal, comp_pcc

Copilot uses AI. Check for mistakes.
@rdraskicTT rdraskicTT enabled auto-merge February 13, 2026 09:25
@rdraskicTT rdraskicTT requested a review from llongTT February 13, 2026 09:26
@rdraskicTT
Copy link
Contributor Author

@llongTT fyi

@dpopovTT
Copy link
Contributor

/codeowners bypass

Copy link

@tenstorrent-github-bot tenstorrent-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ CodeOwners bypass approval granted by @dpopovTT (metalium-developers-infra team)

@tenstorrent-github-bot
Copy link

CodeOwners Bypass Approval Granted\n\nThis PR has been approved by @dpopovTT (metalium-developers-infra team) using the bypass mechanism.\n\n⚠️ Note: This bypass should only be used for emergency fixes or when standard approval process is blocked.

@rdraskicTT rdraskicTT added this pull request to the merge queue Feb 13, 2026
Merged via the queue into main with commit f250fa3 Feb 13, 2026
158 of 164 checks passed
@rdraskicTT rdraskicTT deleted the rdraskic/revert-039d07a branch February 13, 2026 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants