Commit febb350
perf: use pure decode distribution for decode-only batches (#934)
In decode-only batches, set distribution to [num_seqs, num_seqs, num_seqs]
instead of [0, 0, num_seqs] so the FA kernel dispatches all sequences
through the dedicated decode path rather than the mixed path.
Co-authored-by: leos <leos@primatrix.ai>
1 parent 32c8784 commit febb350
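The change described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit: `build_distribution` and `count_by_path` are hypothetical helpers, and reading the three entries as cumulative end indices (decode-only end, prefill-only end, total) is an assumption inferred from the two vectors quoted above.

```python
def build_distribution(num_seqs: int, decode_only: bool) -> list[int]:
    """Hypothetical helper: build the 3-entry distribution vector.

    Assumption: entries are cumulative end indices
    [decode_end, prefill_end, total], so sequences [0, decode_end) take the
    decode path and sequences [prefill_end, total) take the mixed path.
    """
    if num_seqs < 0:
        raise ValueError("num_seqs must be non-negative")
    if decode_only:
        # New behaviour: every sequence is marked as pure decode.
        return [num_seqs, num_seqs, num_seqs]
    # Old behaviour for all batches: every sequence lands in the mixed range.
    return [0, 0, num_seqs]


def count_by_path(distribution: list[int]) -> dict[str, int]:
    """Count how many sequences fall into each range under that assumption."""
    decode_end, prefill_end, total = distribution
    return {
        "decode": decode_end,
        "prefill": prefill_end - decode_end,
        "mixed": total - prefill_end,
    }


before = count_by_path(build_distribution(4, decode_only=False))
after = count_by_path(build_distribution(4, decode_only=True))
print(before)
print(after)
```

Under this reading, a decode-only batch of four sequences moves from four mixed-path sequences to four decode-path sequences, which matches the dispatch change the commit message describes.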
3 files changed
Lines changed: 8 additions & 4 deletions
File tree
- python/sgl_jax/srt
- kernels/ragged_paged_attention
- layers/attention
- test/srt
Lines changed: 6 additions & 2 deletions
@@ -1496,7 +1496,9 @@ (1 line removed, 3 lines added; line content not captured)
@@ -1507,7 +1509,9 @@ (1 line removed, 3 lines added; line content not captured)
Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ (1 line removed, 1 line added; line content not captured)
Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ (1 line removed, 1 line added; line content not captured)