
feat(mem_cache): Hybrid Memory Pool system (Step 1: RecurrentStatePool)#1031

Closed
JamesBrianD wants to merge 5 commits into sgl-project:main from primatrix:feat/hybrid-memory-pool

Conversation

@JamesBrianD (Collaborator)

Summary

  • Add RecurrentStatePool with DP sharding support, migrated from epic/support_kimi_linear
  • Pure buffer pool for linear recurrent layers (KDA/Mamba/GDN), slot allocator lives in HybridReqToTokenPool (upcoming)
  • Key changes vs epic: max_num_reqs → size (aligns with upstream MambaPool), dp_size param, slot dim sharded on P("data", ...)
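The P("data", ...) sharding of the slot dimension can be illustrated with a small JAX sketch. The buffer shape, mesh setup, and variable names below are illustrative assumptions, not the pool's actual layout:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative only: shard the slot (first) dimension of a recurrent-state
# buffer along a 'data' mesh axis, as described for dp_size > 1.
devices = jax.devices()
mesh = Mesh(devices, axis_names=("data",))

total_slots = 3 * len(devices)        # slot count must divide evenly by dp_size
buf = jnp.zeros((total_slots, 4, 8))  # (slots, heads, head_dim) -- assumed dims
sharded = jax.device_put(buf, NamedSharding(mesh, P("data", None, None)))
# Each DP rank now physically holds a distinct contiguous slot range.
```

With this placement, a slot index only makes sense relative to the rank that owns it, which is what motivates the per-DP allocator discussed later in the thread.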

Test plan

  • Unit test: buffer creation, clear_slot, replace_buffer, pytree round-trip
  • DP test: dp_size > 1 buffer sharding correctness
  • Integration test after full hybrid memory pool stack is complete

🤖 Generated with Claude Code

Migrated from epic/support_kimi_linear with DP support added.
Pure buffer pool for linear recurrent layers (KDA/Mamba/GDN).

Key changes vs epic:
- max_num_reqs → size (align with upstream sglang MambaPool)
- dp_size param with slot dim sharded on P("data", ...)
- total_slots = ceil_to(size+1, dp_size) for DP divisibility
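The sizing arithmetic in the last bullet can be sketched as follows; `ceil_to` and `pool_slot_counts` here are hypothetical helpers mirroring the names in the description, not the actual sglang-jax code:

```python
def ceil_to(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple

def pool_slot_counts(size: int, dp_size: int) -> tuple[int, int]:
    """Return (total_slots, slots_per_rank).

    The +1 is taken from the PR description
    (total_slots = ceil_to(size + 1, dp_size)); padding up to a
    multiple of dp_size keeps the sharded slot dim evenly divisible.
    """
    total_slots = ceil_to(size + 1, dp_size)
    return total_slots, total_slots // dp_size

# e.g. size=10, dp_size=4 -> total_slots=12, 3 slots per rank
```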
@JamesBrianD JamesBrianD closed this May 7, 2026
Rodrian7 added a commit to Rodrian7/sglang-jax that referenced this pull request May 7, 2026
The first cut paired the DP-sharded RecurrentStatePool from sgl-project#1031 with a
single-list slot allocator copied from epic. With dp_size > 1 the buffer's
first dim is sharded along the 'data' axis (each rank physically holds a
distinct slot range), so a single global free list would hand out slots
that cross DP rank boundaries — read/write at those slots would land in
the wrong rank's local buffer view.

Switch the allocator to per-DP: maintain one free list per rank with
LOCAL indices [1..slots_per_rank], and route alloc/free by req.dp_rank.
Callers (prepare_for_extend / decode) iterate per-DP, so all reqs in a
single alloc() call share the same dp_rank.

Tests updated: dp_size=1 cases unchanged in semantics but now index into
recurrent_free_slots[0]. DP test class rewritten with four per-rank cases
(init local indexing, alloc routing, capacity miss isolation, free
routing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
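The per-rank routing described in the commit message above can be sketched as below. The class and method names are assumptions for illustration; only the scheme (one free list per rank, LOCAL indices, routing by dp_rank) comes from the commit:

```python
# Minimal sketch of a per-DP slot allocator: one free list per DP rank
# holding LOCAL indices [1..slots_per_rank], with alloc/free routed by
# the request's dp_rank so no slot crosses a rank boundary.

class PerDPSlotAllocator:
    def __init__(self, dp_size: int, slots_per_rank: int):
        # Local indices start at 1, matching [1..slots_per_rank] above
        # (index 0 is assumed reserved, e.g. for a padding slot).
        self.free_slots = [
            list(range(1, slots_per_rank + 1)) for _ in range(dp_size)
        ]

    def alloc(self, dp_rank: int, n: int):
        """Pop n local slot indices from dp_rank's list; None on miss."""
        free = self.free_slots[dp_rank]
        if len(free) < n:
            return None  # capacity miss is isolated to this rank's list
        out = free[-n:]
        del free[-n:]
        return out

    def free(self, dp_rank: int, slots) -> None:
        """Return local slot indices to the owning rank's free list."""
        self.free_slots[dp_rank].extend(slots)
```

Because callers iterate per-DP, every request in a single alloc() call shares one dp_rank, so the allocator never hands out a slot owned by another rank's buffer shard.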