Commit 719fbb9

Optimize decode for Llama3-70B for TG for stable branch (#37360)
### Problem description
Improve the decode TPS for Llama3-70B on Galaxy for stable branch

### What's changed
Increased the core count for paged SDPA

### Checklist
- [ ] [![All post-commit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] [![Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] [![cpp-unit-tests](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:atupe/llama-70b-tg-optimizations)
- [ ] New/Existing tests provide coverage for changes

#### Model tests
If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones - use your best judgement in deciding which is the most appropriate for your PR.

- [ ] [![(Single) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [![(T3K) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [![(Galaxy) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml/badge.svg?branch=atupe/llama-70b-tg-optimizations)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:atupe/llama-70b-tg-optimizations)
  - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-perf-tests.yaml) tests)
  - [ ] other selection - specify runs

2 parents 3be3486 + 7353f8c commit 719fbb9

1 file changed: +2 -2 lines changed

models/demos/llama3_70b_galaxy/tt/model_config.py

Lines changed: 2 additions & 2 deletions
    @@ -1173,9 +1173,9 @@ def prefill_xqkv_minimal_matmul_config(seq_len):
             )
     
             self.model_config["PAGED_SDPA_DECODE_PROGCFG"] = ttnn.SDPAProgramConfig(
    -            compute_with_storage_grid_size=(8, 4),
    +            compute_with_storage_grid_size=(8, 6),
                 sub_core_grids=ttnn.num_cores_to_corerangeset_in_subcoregrids(
    -                self.start_core, 32, self.sub_core_grids, row_wise=True
    +                self.start_core, 48, self.sub_core_grids, row_wise=True
                 ),
                 exp_approx_mode=False,
                 q_chunk_size=0,
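
The two changed values move together: the grid goes from `(8, 4)` to `(8, 6)` and the requested core count from 32 to 48, which in both versions equals the grid's area. The sketch below checks that arithmetic in plain Python. It does not call `ttnn`; `grid_core_count` is a hypothetical helper, and treating the grid tuple as a rectangle whose area should match the core count is an assumption drawn from the numbers in this diff, not a documented requirement of `ttnn.SDPAProgramConfig`.

```python
from math import prod


def grid_core_count(grid_size):
    """Number of cores in a rectangular grid given as a tuple of dimensions.

    Hypothetical helper for illustration; not part of ttnn.
    """
    return prod(grid_size)


# Values taken from the diff above.
old_grid, old_cores = (8, 4), 32
new_grid, new_cores = (8, 6), 48

# In both versions the grid area equals the requested core count.
assert grid_core_count(old_grid) == old_cores
assert grid_core_count(new_grid) == new_cores

# Relative increase in cores assigned to paged SDPA decode.
scale = new_cores / old_cores
print(f"paged SDPA decode cores: {old_cores} -> {new_cores} ({scale:.2f}x)")
# prints "paged SDPA decode cores: 32 -> 48 (1.50x)"
```

A check like this could help when editing the config by hand, since the grid size and the core count are set on separate lines and are easy to change out of step.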
