Commit c7bf987: DeepSeek Blitz MoE fusion (#37757)
### Ticket
[Link to GitHub Issue](#35538)
### Problem description
Fuse the routed expert and the shared expert into one kernel.
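For context, a minimal host-side reference of the math being fused (the function names, the shared-expert id, and the top-k routing shape are illustrative assumptions, not the kernel implementation): the shared expert runs on every token, the routed experts are scaled by the router's top-k weights, and fusing means one pass accumulates both into the same output.

```cpp
// Reference semantics of the fused op (illustrative only; the real
// implementation is a tt-metal kernel, not this host code).
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one expert FFN; a real expert is a gated MLP.
std::vector<float> expert_ffn(size_t expert_id, const std::vector<float>& x) {
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * static_cast<float>(expert_id + 1);  // placeholder math
    }
    return y;
}

// Unfused: shared expert and routed experts run as separate kernels.
// Fused: one pass accumulates both into the same output buffer.
std::vector<float> fused_expert(
    const std::vector<float>& x,
    const std::vector<size_t>& topk_ids,      // router-selected expert ids
    const std::vector<float>& topk_weights) { // matching router weights
    // Shared expert contributes unconditionally (id 0 assumed here).
    std::vector<float> out = expert_ffn(/*shared_expert_id=*/0, x);
    // Routed experts contribute scaled by their router weight.
    for (size_t k = 0; k < topk_ids.size(); ++k) {
        std::vector<float> e = expert_ffn(topk_ids[k], x);
        for (size_t i = 0; i < out.size(); ++i) {
            out[i] += topk_weights[k] * e[i];
        }
    }
    return out;
}
```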
### What's changed
Add an internal loop to the DRAM matmul, to test reuse of the same CBs (circular buffers) across iterations; see the sketch below.
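A schematic of what the internal loop looks like on the reader side, hedged: the CB index, runtime-arg layout, and block sizes below are assumptions for illustration. The point is that every iteration of the new inner loop reserves and pushes through the same circular buffer, instead of one kernel launch per block.

```cpp
// Sketch of a tt-metal reader kernel with an internal block loop that
// funnels every block through one shared circular buffer (CB). The CB
// index and runtime-arg layout here are illustrative assumptions.
#include "dataflow_api.h"

void kernel_main() {
    uint32_t src_addr_hi     = get_arg_val<uint32_t>(0);  // precomputed NOC address, high bits
    uint32_t src_addr_lo     = get_arg_val<uint32_t>(1);  // precomputed NOC address, low bits
    uint32_t num_blocks      = get_arg_val<uint32_t>(2);  // internal loop count (was 1 per launch)
    uint32_t block_bytes     = get_arg_val<uint32_t>(3);  // bytes per block
    uint32_t tiles_per_block = get_arg_val<uint32_t>(4);

    constexpr uint32_t cb_in = 0;  // assumed CB index, shared across iterations

    uint64_t src = (static_cast<uint64_t>(src_addr_hi) << 32) | src_addr_lo;
    for (uint32_t b = 0; b < num_blocks; ++b) {
        // Same CB every iteration: reserve space, stream a block in from
        // DRAM, hand it to compute. Back-pressure from the compute kernel
        // consuming the CB throttles this loop.
        cb_reserve_back(cb_in, tiles_per_block);
        uint32_t dst = get_write_ptr(cb_in);
        noc_async_read(src, dst, block_bytes);
        noc_async_read_barrier();
        cb_push_back(cb_in, tiles_per_block);
        src += block_bytes;
    }
}
```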
Add a MoE gather that swaps the NCRISC/BRISC (NC/BR) roles, since sending over NOC1 does not interfere with the DRAM reads; see the host-side sketch below.
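To illustrate the NC/BR split (kernel paths, the core range, and the include path are placeholders and may differ across tt-metal versions): the gather's outgoing traffic is pinned to NOC1 so it never contends with the DRAM-read traffic on NOC0.

```cpp
// Host-side sketch of splitting traffic across the two NOCs: DRAM reads on
// one data-movement RISC-V (BRISC) using NOC0, gather sends on the other
// (NCRISC) using NOC1. Kernel paths below are placeholders, not real files.
#include "tt_metal/host_api.hpp"  // include path varies by tt-metal version

using namespace tt::tt_metal;

void create_moe_gather_kernels(Program& program, const CoreRange& cores) {
    // Reader: streams expert weights/activations from DRAM over NOC0.
    CreateKernel(
        program,
        "kernels/dataflow/moe_reader.cpp",  // placeholder path
        cores,
        DataMovementConfig{
            .processor = DataMovementProcessor::RISCV_0,
            .noc = NOC::NOC_0});

    // Gather/sender: pushes results to peer cores over NOC1, so its writes
    // do not interfere with the DRAM reads in flight on NOC0.
    CreateKernel(
        program,
        "kernels/dataflow/moe_gather.cpp",  // placeholder path
        cores,
        DataMovementConfig{
            .processor = DataMovementProcessor::RISCV_1,
            .noc = NOC::NOC_1});
}
```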
Fuse the MoE routed and shared experts into one kernel.
Leave fusing the final reduce for a follow-up PR, since that depends on #37705 and #37411 being merged.
### Checklist
- [ ] [All post-commit workflows](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:yugao/moe4)
- [ ] [Blackhole post-commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:yugao/moe4)
- [ ] [tt-metal L2 nightly](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:yugao/moe4)
- [ ] New/Existing tests provide coverage for changes
### File tree
9 files changed: +5335 −28
- models/demos/deepseek_v3_b1
- fused_ops/moe
- micro_ops/dram_streaming_matmul
- kernels
- tests/unit_tests
- unified_kernels