Commit 7c686ba
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill (NVIDIA#6774)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
1 parent b4fcd5f, commit 7c686ba
File tree
2 files changed: +13 -2 lines changed
- tensorrt_llm/_torch/pyexecutor
- tests/integration/defs/accuracy
File 1 of 2 (+7 -2); changed line contents were not preserved.

| Hunk | Original file lines | Diff lines | Removed (original line) | Added (diff lines) |
|---|---|---|---|---|
| 1 | 83-89 | 83-93 | 86 | 86-90 |
| 2 | 189-195 | 193-200 | 192 | 196-197 |
File 2 of 2 (+6 -0); changed line contents were not preserved.

| Hunk | Original file lines | Diff lines | Removed (original line) | Added (diff lines) |
|---|---|---|---|---|
| 1 | 325-330 | 325-331 | none | 328 |
| 2 | 333-338 | 334-341 | none | 337-338 |
| 3 | 344-354 | 347-360 | none | 350, 356-357 |