Commit 719fbb9
authored
Optimize decode for Llama3-70B for TG for stable branch (#37360)
### Problem description
Improve decode throughput (tokens per second, TPS) for Llama3-70B on Galaxy on the stable branch.
### What's changed
Increased the core count used for paged SDPA (scaled dot-product attention).
### Checklist
- [ ] [All post-commit workflows](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] [Blackhole post-commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] [tt-metal L2 nightly](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] New/Existing tests provide coverage for changes
#### Model tests
If your changes cover model-related code, you should run tests
corresponding to affected models and platforms (Single card, T3K,
Galaxy). "Choose your pipeline" workflows facilitate running multiple
kinds of tests in a single run. Each offers `models-mandatory` and
`models-extended` presets.
The former includes a minimal set of tests, to be run always. The latter
extends that with additional ones - use your best judgement in deciding
which is the most appropriate for your PR.
- [ ] [Choose your pipeline (single card)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [Choose your pipeline (T3K)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [Choose your pipeline (Galaxy)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-perf-tests.yaml) tests)
  - [ ] other selection - specify runs

1 file changed
+2 -2 lines changed: lines 1176 and 1178 were each replaced in place; context lines 1173-1175, 1177, and 1179-1181 are unchanged. (The content of the diff lines is not preserved here.)