
Commit c7bf987

DeepSeek Blitz moe fusion (#37757)
### Ticket

[Link to GitHub Issue](#35538)

### Problem description

Fuse the routed expert and the shared expert into one kernel.

### What's changed

- Add internal looping to the DRAM mm to test sharing the same CB.
- Add a MoE gather that swaps the NC/BR logic, since sending over NoC1 does not interfere with DRAM reads.
- Fuse the MoE routed and shared experts.
- Folding the final reduce into a single fusion is left for a follow-up PR, as it depends on #37705 and #37411 being merged.

### Checklist

- [ ] [![All post-commit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml/badge.svg?branch=yugao/moe4)](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:yugao/moe4)
- [ ] [![Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml/badge.svg?branch=yugao/moe4)](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:yugao/moe4)
- [ ] [![cpp-unit-tests](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml/badge.svg?branch=yugao/moe4)](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:yugao/moe4)
- [ ] New/Existing tests provide coverage for changes
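For context on what is being fused, here is a minimal pure-Python reference sketch of the per-token math of a MoE layer with routed and shared experts. It assumes DeepSeek-V3-style gated (SwiGLU) experts and assumes the router's top-k expert ids and weights are already computed. All function names are hypothetical and do not reflect the structure of `moe_kernel.cpp`. The final addition loosely corresponds to the "final reduce" step this PR defers.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_expert(x, w_gate, w_up, w_down):
    # Gated FFN expert (assumed SwiGLU form): down(silu(gate(x)) * up(x))
    g = matvec(w_gate, x)
    u = matvec(w_up, x)
    h = [silu(gi) * ui for gi, ui in zip(g, u)]
    return matvec(w_down, h)


def moe_reference(x, experts, shared, topk_ids, topk_weights):
    # experts: list of (w_gate, w_up, w_down) tuples; shared: one such tuple
    # Routed part: weighted sum over the selected experts' outputs
    routed = [0.0] * len(x)
    for eid, w in zip(topk_ids, topk_weights):
        out = swiglu_expert(x, *experts[eid])
        routed = [r + w * o for r, o in zip(routed, out)]
    # Shared part: the always-active expert applied to every token
    shared_out = swiglu_expert(x, *shared)
    # Final reduce: combine routed and shared contributions
    return [r + s for r, s in zip(routed, shared_out)]
```

The sketch captures only the arithmetic. It does not model the data movement this PR changes, such as the DRAM mm looping or the NoC usage in the gather.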
1 parent db459ba commit c7bf987


9 files changed (+5335, -28 lines)

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+# SPDX-FileCopyrightText: © 2026 Tenstorrent AI ULC
+
+# SPDX-License-Identifier: Apache-2.0

models/demos/deepseek_v3_b1/fused_ops/moe/moe_kernel.cpp

Lines changed: 1037 additions & 0 deletions
