Commit 49ab40b
[Merge stable to main] Llama3.3-70b and 3.1-8b - Fix sampling parameters (#36476)
#36325
This PR fixes several issues for Llama3.3-70b:
- Non-uniform seeding
- Penalty trap bug
- Penalty bugs for Llama3.1-8b
- Batched prefill determinism
- Differences between batched and non-batched prefill
- Missing logprobs support for Llama3.3-70b
- The same sampling-parameter fixes for Llama3.1-8b
- Bring over the log-probs support for Galaxy (optional log-softmaxed
logits output), matching the behavior already validated on stable in
TT-Metal, vLLM nightly, and Models CI.
- Integrate the deterministic seeding flow (host-side RNG +
SamplingSeedManager + `ttnn.manual_seed` usage before `ttnn.sampling`)
so prefill + decode produce deterministic sequences across repeats when
seeds are fixed.
- Ensure the penalties path matches the shared implementation, fixing
the earlier divergence across users.
- Update matmul configs so that batched and non-batched prefill behave
the same, along with a couple of additional fixes for divergence.
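As a rough illustration of the deterministic seeding flow described above, the host-side part can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in this PR: `SeedManagerSketch`, `sample_token`, and `generate` are hypothetical names, the real `SamplingSeedManager` interface is not shown here, and the host-side softmax sampler only stands in for seeding the device with `ttnn.manual_seed` before calling `ttnn.sampling`.

```python
import math
import random


class SeedManagerSketch:
    """Hypothetical host-side seed manager (not the real SamplingSeedManager API)."""

    def __init__(self, user_seeds):
        # One independent host-side RNG per user slot; None means "unseeded".
        self._rngs = [
            random.Random(seed) if seed is not None else random.Random()
            for seed in user_seeds
        ]

    def next_seeds(self):
        # Draw a fresh 32-bit seed per user for the upcoming sampling step.
        return [rng.getrandbits(32) for rng in self._rngs]


def sample_token(logits, seed):
    """Stand-in for a seeded device sampling op: softmax sampling on the host."""
    rng = random.Random(seed)
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(user_seeds, logits_per_user, steps):
    """Sample `steps` tokens per user, reseeding before every sampling call."""
    manager = SeedManagerSketch(user_seeds)
    outputs = [[] for _ in user_seeds]
    for _ in range(steps):
        seeds = manager.next_seeds()
        for user, (logits, seed) in enumerate(zip(logits_per_user, seeds)):
            outputs[user].append(sample_token(logits, seed))
    return outputs
```

With fixed seeds, two calls to `generate` with the same arguments return the same token sequences, and because each user has its own RNG, changing one user's seed leaves the other users' sequences untouched.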
Performance numbers on text_demo in t/s/u (tokens per second per user):
| branch | without penalties | with penalties |
|-------|-------|-------|
| branch | 71.88 t/s/u | 42.36 t/s/u |
| main | 72.05 t/s/u | - |
**TTFT**:
**68.5** ms -> **73.9** ms. This regression is expected and is due to
disabling use_2d_grid in RMS norm.
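To put the numbers above in relative terms, the short calculation below derives percentage changes from the reported figures. It is only arithmetic on the values quoted in this description; the helper name `percent_change` is an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Decode throughput on this branch: enabling penalties (t/s/u).
penalty_cost = percent_change(71.88, 42.36)

# Decode throughput without penalties: main vs this branch (t/s/u).
branch_vs_main = percent_change(72.05, 71.88)

# Time to first token: before vs after disabling use_2d_grid (ms).
ttft_change = percent_change(68.5, 73.9)

print(f"penalties: {penalty_cost:.1f}%")
print(f"branch vs main: {branch_vs_main:.2f}%")
print(f"TTFT: {ttft_change:+.1f}%")
```

On these figures, enabling penalties costs roughly 41% of decode throughput, the branch is about 0.2% slower than main without penalties, and TTFT rises by about 7.9%.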
- [ ] [All post-commit
tests](https://github.com/tenstorrent/tt-metal/actions/runs/21355526046)
- [x] [Galaxy
Demo](https://github.com/tenstorrent/tt-metal/actions/runs/21361481284)
- [x] [vllm
nightly](https://github.com/tenstorrent/tt-metal/actions/runs/21361542050)
- [x] [Models
CI](https://github.com/tenstorrent/tt-shield/actions/runs/21435406349/job/61728802475)
Latest pipeline runs (6th Feb):
- [ ] [vllm
Nightly](https://github.com/tenstorrent/tt-metal/actions/runs/21754091798)
- [ ] [Shield
CI](https://github.com/tenstorrent/tt-shield/actions/runs/21753926206/job/62758873631)
- [ ] [Galaxy
Demo](https://github.com/tenstorrent/tt-metal/actions/runs/21754402409)
---------
Co-authored-by: Stuti Raizada <159130512+sraizada-tt@users.noreply.github.com>
Co-authored-by: Tomasz Cheda <tcheda@tenstorrent.com>
Co-authored-by: Jonathan Su <jonathansu@tenstorrent.com>
Co-authored-by: alnah005 <salnahari@tenstorrent.com>
Co-authored-by: Alberto Perez Vicente <aperezvicente@tenstorrent.com>
Co-authored-by: handrewsTT <handrews@tenstorrent.com>
Co-authored-by: Mohamed Bahnas <mbahnas@tenstorrent.com>
Co-authored-by: Radoica Draskic <rdraskic@tenstorrent.com>
Co-authored-by: kpaigwar <kpaigwar@tenstorrent.com>
Co-authored-by: Stuti Raizada <sraizada@tenstorrent.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f87c34a commit 49ab40b
File tree
38 files changed (+2008, -600 lines)
- .github/workflows
- models
- common
- sampling
- tests
- demos
- llama3_70b_galaxy
- demo
- tests
- tt
- multimodal/gemma3/tt
- experimental/gemma3_4b/tt
- tt_transformers
- demo
- tt
- tests/nightly/tg/ccl
- ttnn/cpp/ttnn/operations/experimental/ccl
- all_gather_async/device
- kernels
- reduce_scatter_minimal_async/device
- kernels