Commit de80df2
Fix MLA KV cache sizing to use latent-only factor (#233)
This PR:
- Applies MLA-specific KV sizing (latent-only, not K+V) in the cache
sizing paths
- Keeps `_one_sequence_kv_bytes` consistent with paged KV block sizing
- Adds a focused MLA sizing test and documents the latent dimension
context
Notes
- `get_cache_block_size_bytes()` and `_one_sequence_kv_bytes()` now use
`kv_factor = 1` for MLA, `2` otherwise.
- Tests cover MLA sizing and document why `head_dim=576`
(`kv_lora_rank + qk_rope_head_dim`).
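
The sizing rule in the notes above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names (`kv_bytes_per_token`, `block_bytes`, `sequence_bytes`), their signatures, and the layer count, head count, and dtype width are all hypothetical. The split `512 + 64` for `kv_lora_rank` and `qk_rope_head_dim` is an assumed example that sums to the `head_dim=576` mentioned above.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes, is_mla):
    """Bytes of KV cache that one token occupies across all layers.

    Hypothetical helper: MLA stores a single latent vector per token,
    so kv_factor is 1; standard attention stores K and V, so it is 2.
    """
    kv_factor = 1 if is_mla else 2
    return kv_factor * num_layers * num_kv_heads * head_dim * dtype_bytes


def block_bytes(block_size, **model):
    """Bytes for one paged KV block holding `block_size` tokens."""
    return block_size * kv_bytes_per_token(**model)


def sequence_bytes(num_tokens, **model):
    """Bytes for one sequence of `num_tokens` tokens, same factor as blocks."""
    return num_tokens * kv_bytes_per_token(**model)


# Assumed example values; only their sum (576) comes from the commit message.
KV_LORA_RANK = 512
QK_ROPE_HEAD_DIM = 64

mla_model = dict(
    num_layers=4,        # illustrative, not a real model config
    num_kv_heads=1,      # illustrative: one latent "head" per token
    head_dim=KV_LORA_RANK + QK_ROPE_HEAD_DIM,
    dtype_bytes=2,       # e.g. a 16-bit dtype
    is_mla=True,
)
dense_model = dict(mla_model, is_mla=False)

print("MLA bytes/token:  ", kv_bytes_per_token(**mla_model))
print("K+V bytes/token:  ", kv_bytes_per_token(**dense_model))
print("MLA 16-token block:", block_bytes(16, **mla_model))
```

Because both the block path and the per-sequence path go through the same per-token helper in this sketch, a sequence of exactly one block's worth of tokens costs the same as one block, which is the consistency property the PR description refers to.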
---------
Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>

1 parent f518143 commit de80df2
File tree
3 files changed, +36 -13 lines changed
- tests
- vllm_metal/v1