Commit 9392ffd
feat(mem_cache): page-major (layer-major within a page) KV/state layout
Opt-in physical layout (--enable-page-major-kv-layout) that makes the page the
outermost axis: each page is laid out layer-major in one contiguous byte buffer
for the Mamba state, full-KV, and SWA-KV caches, instead of as per-layer tensors.
At page_size=1 this reduces to a token-granularity envelope. The layout is
independent of any shared/virtual-slot allocator.
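The byte geometry of this layout can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the code in mem_cache/layout/page_major.py: the class name PageMajorGeometry and the single uniform per-token, per-layer payload size token_bytes are hypothetical, and real K/V or conv/temporal state may be split into more sub-fields. It shows where a (token slot, layer) pair lands in a page-major buffer and which base offset and strides a per-layer strided view over that buffer would use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PageMajorGeometry:
    """Hypothetical byte geometry: page outermost, layer-major inside each page."""

    num_layers: int
    page_size: int    # token slots per page
    token_bytes: int  # bytes one token slot occupies in ONE layer (assumed uniform)

    @property
    def layer_stride(self) -> int:
        # Distance between the same slot in consecutive layers of one page.
        return self.page_size * self.token_bytes

    @property
    def page_stride(self) -> int:
        # Size of one whole page: every layer's slots for that page.
        return self.num_layers * self.layer_stride

    def offset(self, loc: int, layer: int) -> int:
        """Byte offset of global token slot `loc` in `layer`."""
        page, slot = divmod(loc, self.page_size)
        return page * self.page_stride + layer * self.layer_stride + slot * self.token_bytes

    def layer_view(self, layer: int) -> tuple[int, int, int]:
        """(base_offset, page_stride, slot_stride) of a strided per-layer view."""
        return layer * self.layer_stride, self.page_stride, self.token_bytes


if __name__ == "__main__":
    g = PageMajorGeometry(num_layers=4, page_size=16, token_bytes=256)
    print(g.offset(loc=17, layer=2))  # page 1, slot 1 -> 24832
    # A per-layer view addresses the same bytes via (base, page_stride, slot_stride).
    base, page_stride, slot_stride = g.layer_view(2)
    page, slot = divmod(17, g.page_size)
    assert base + page * page_stride + slot * slot_stride == g.offset(17, 2)
```

With page_size=1 the offset collapses to loc * num_layers * token_bytes + layer * token_bytes, so each token's data for all layers is contiguous, which is the token-granularity envelope mentioned above.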
- mem_cache/layout/page_major.py: strided-view builders + byte geometry.
- PageMajorMHATokenToKVPool subclass (kv_cache_layout=page_major_layer_major)
via _store_kv_layer / _move_kv_impl template hooks on MHATokenToKVPool;
layout-incompatible methods raise instead of silently mis-indexing.
MambaPool envelope branch for the conv/temporal state.
- Triton decode/extend + store_cache_4d kernels: page-aware strides, a
byte-identical no-op at page_size=1 (PAGE_SIZE constexpr).
- GDN prefill gather/scatter in gdn_backend.forward_extend so the strided
envelope state is persisted correctly to the pool (the prefill conv /
chunk_gated_delta_rule kernels assume a contiguous slot layout).
- server_args flag + Triton-backend validator; model_runner_kv_cache_mixin
routes the layout into the plain-MHA, SWA-hybrid, and Mamba-hybrid pools.
- Removed the dead enable_kvcache_transpose param.
- Tests: store_cache_4d / decode+extend parity, CPU view/move, and e2e
page-major accuracy (gpt-oss, qwen) in the label-gated extra suite. Docs.
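The template-hook split described for PageMajorMHATokenToKVPool can be illustrated with a toy model. This sketch is not sglang's MHATokenToKVPool: the classes ToyMHAPool and ToyPageMajorPool and the methods set_kv, move_kv, get_kv and layer_buffer are hypothetical; only the hook names _store_kv_layer and _move_kv_impl come from the commit message. The base class keeps one buffer per layer and routes writes and moves through the hooks; the page-major subclass overrides the hooks to index one flat buffer and raises on a method that assumes a contiguous per-layer buffer, rather than returning a wrongly indexed one.

```python
from typing import Any, List, Optional, Sequence


class ToyMHAPool:
    """Hypothetical per-layer KV pool (a stand-in, not sglang's real class)."""

    def __init__(self, num_layers: int, num_tokens: int) -> None:
        self.num_layers = num_layers
        self.num_tokens = num_tokens
        # One separate buffer per layer: buf[layer][loc] -> (k, v)
        self.buf: List[List[Optional[Any]]] = [[None] * num_tokens for _ in range(num_layers)]

    # Public, layout-agnostic entry points delegate to the template hooks.
    def set_kv(self, layer_id: int, locs: Sequence[int], kvs: Sequence[Any]) -> None:
        self._store_kv_layer(layer_id, locs, kvs)

    def move_kv(self, src_locs: Sequence[int], dst_locs: Sequence[int]) -> None:
        self._move_kv_impl(src_locs, dst_locs)

    def get_kv(self, layer_id: int, loc: int) -> Optional[Any]:
        return self.buf[layer_id][loc]

    def layer_buffer(self, layer_id: int) -> List[Optional[Any]]:
        # Only valid when each layer owns one contiguous buffer.
        return self.buf[layer_id]

    # Template hooks: default per-layer implementation.
    def _store_kv_layer(self, layer_id, locs, kvs) -> None:
        for loc, kv in zip(locs, kvs):
            self.buf[layer_id][loc] = kv

    def _move_kv_impl(self, src_locs, dst_locs) -> None:
        for layer in self.buf:
            vals = [layer[s] for s in src_locs]  # read all sources before writing
            for d, v in zip(dst_locs, vals):
                layer[d] = v


class ToyPageMajorPool(ToyMHAPool):
    """Hypothetical page-major variant: one flat buffer, page outermost."""

    def __init__(self, num_layers: int, num_tokens: int, page_size: int) -> None:
        if num_tokens % page_size:
            raise ValueError("num_tokens must be a multiple of page_size")
        # Deliberately skip the base __init__: no per-layer buffers are allocated.
        self.num_layers = num_layers
        self.num_tokens = num_tokens
        self.page_size = page_size
        self.flat: List[Optional[Any]] = [None] * (num_layers * num_tokens)

    def _index(self, layer_id: int, loc: int) -> int:
        # Page outermost, then layer, then slot within the page.
        page, slot = divmod(loc, self.page_size)
        return (page * self.num_layers + layer_id) * self.page_size + slot

    def get_kv(self, layer_id: int, loc: int) -> Optional[Any]:
        return self.flat[self._index(layer_id, loc)]

    def layer_buffer(self, layer_id: int) -> List[Optional[Any]]:
        # Fail loudly instead of handing back a mis-indexed slice.
        raise NotImplementedError("page-major layout has no contiguous per-layer buffer")

    def _store_kv_layer(self, layer_id, locs, kvs) -> None:
        for loc, kv in zip(locs, kvs):
            self.flat[self._index(layer_id, loc)] = kv

    def _move_kv_impl(self, src_locs, dst_locs) -> None:
        for layer_id in range(self.num_layers):
            vals = [self.flat[self._index(layer_id, s)] for s in src_locs]
            for d, v in zip(dst_locs, vals):
                self.flat[self._index(layer_id, d)] = v
```

Callers use set_kv, move_kv and get_kv identically on either class; only the hook bodies know the physical layout, which is what keeps the public pool interface unchanged.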
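The GDN prefill gather/scatter step can be sketched in plain Python. This is a hypothetical illustration, not gdn_backend.forward_extend: the function names gather_states, scatter_states, run_prefill_with_envelope and envelope_offset, the flat-list pool, and the assumption that one request's state for one layer is a contiguous run of state_len elements at a strided offset are all illustrative. The point is the ordering: copy each request's strided envelope state into a dense batch, run a kernel that expects contiguous slots, then write the updated rows back to the same offsets so the pool, not just the scratch copy, holds the result.

```python
from typing import Callable, List, Sequence

Dense = List[List[float]]


def envelope_offset(slot: int, layer: int, num_layers: int, state_len: int) -> int:
    # Token-granularity envelope (page_size=1): slot outermost, layer-major inside.
    return (slot * num_layers + layer) * state_len


def gather_states(pool: List[float], offsets: Sequence[int], state_len: int) -> Dense:
    # Copy each request's strided state into one dense row per request.
    return [pool[off:off + state_len] for off in offsets]


def scatter_states(pool: List[float], offsets: Sequence[int], dense: Dense) -> None:
    # Write rows back to the same strided locations; rows must keep state_len elements.
    for off, row in zip(offsets, dense):
        pool[off:off + len(row)] = row


def run_prefill_with_envelope(
    pool: List[float],
    offsets: Sequence[int],
    state_len: int,
    kernel: Callable[[Dense], Dense],
) -> None:
    dense = gather_states(pool, offsets, state_len)  # strided -> contiguous
    dense = kernel(dense)                            # kernel assumes contiguous slots
    scatter_states(pool, offsets, dense)             # persist back into the pool


if __name__ == "__main__":
    num_slots, num_layers, state_len = 3, 2, 4
    pool = [0.0] * (num_slots * num_layers * state_len)
    offs = [envelope_offset(s, 1, num_layers, state_len) for s in (2, 0)]
    run_prefill_with_envelope(pool, offs, state_len,
                              lambda d: [[x + 1.0 for x in row] for row in d])
    print(pool)
```

Skipping the scatter would leave the kernel's updates only in the dense scratch copy, which is the failure mode the commit message describes for the strided envelope state.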
Co-Authored-By: lch1475369 <lch1475369@gmail.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent da802dd
25 files changed, 2136 additions, 124 deletions
File tree
- docs_new/docs/advanced_features
- python/sglang
- srt
- layers/attention
- linear
- mamba
- triton_ops
- mem_cache
- layout
- triton_ops
- model_executor
- test
- test/registered
- page_major
- unit/mem_cache
| |||