
feat(mem_cache): Hybrid Memory Pool system (Step 1: RecurrentStatePool)#1031

Closed
JamesBrianD wants to merge 5 commits into sgl-project:main from primatrix:feat/hybrid-memory-pool

Conversation

@JamesBrianD (Collaborator)

Summary

  • Add RecurrentStatePool with DP sharding support, migrated from epic/support_kimi_linear
  • Pure buffer pool for linear recurrent layers (KDA/Mamba/GDN), slot allocator lives in HybridReqToTokenPool (upcoming)
  • Key changes vs epic: max_num_reqs → size (aligns with upstream MambaPool), dp_size param, slot dim sharded on P("data", ...)
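The P("data", ...) sharding of the slot dimension can be illustrated with a small JAX sketch. The buffer shape, mesh setup, and variable names below are illustrative assumptions, not the pool's actual layout:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative only: shard the slot (first) dimension of a recurrent-state
# buffer along a 'data' mesh axis, as described for dp_size > 1.
devices = jax.devices()
mesh = Mesh(devices, axis_names=("data",))

total_slots = 3 * len(devices)        # slot count must divide evenly by dp_size
buf = jnp.zeros((total_slots, 4, 8))  # (slots, heads, head_dim) -- assumed dims
sharded = jax.device_put(buf, NamedSharding(mesh, P("data", None, None)))
# Each DP rank now physically holds a distinct contiguous slot range.
```

With this placement, a slot index only makes sense relative to the rank that owns it, which is what motivates the per-DP allocator discussed later in the thread.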

Test plan

  • Unit test: buffer creation, clear_slot, replace_buffer, pytree round-trip
  • DP test: dp_size > 1 buffer sharding correctness
  • Integration test after full hybrid memory pool stack is complete

🤖 Generated with Claude Code

Migrated from epic/support_kimi_linear with DP support added.
Pure buffer pool for linear recurrent layers (KDA/Mamba/GDN).

Key changes vs epic:
- max_num_reqs → size (align with upstream sglang MambaPool)
- dp_size param with slot dim sharded on P("data", ...)
- total_slots = ceil_to(size+1, dp_size) for DP divisibility
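The sizing arithmetic in the last bullet can be sketched as follows; `ceil_to` and `pool_slot_counts` here are hypothetical helpers mirroring the names in the description, not the actual sglang-jax code:

```python
def ceil_to(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple

def pool_slot_counts(size: int, dp_size: int) -> tuple[int, int]:
    """Return (total_slots, slots_per_rank).

    The +1 is taken from the PR description
    (total_slots = ceil_to(size + 1, dp_size)); padding up to a
    multiple of dp_size keeps the sharded slot dim evenly divisible.
    """
    total_slots = ceil_to(size + 1, dp_size)
    return total_slots, total_slots // dp_size

# e.g. size=10, dp_size=4 -> total_slots=12, 3 slots per rank
```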
@JamesBrianD JamesBrianD closed this May 7, 2026
Rodrian7 added a commit to Rodrian7/sglang-jax that referenced this pull request May 7, 2026
The first cut paired the DP-sharded RecurrentStatePool from sgl-project#1031 with a
single-list slot allocator copied from epic. With dp_size > 1 the buffer's
first dim is sharded along the 'data' axis (each rank physically holds a
distinct slot range), so a single global free list would hand out slots
that cross DP rank boundaries — read/write at those slots would land in
the wrong rank's local buffer view.

Switch the allocator to per-DP: maintain one free list per rank with
LOCAL indices [1..slots_per_rank], and route alloc/free by req.dp_rank.
Callers (prepare_for_extend / decode) iterate per-DP, so all reqs in a
single alloc() call share the same dp_rank.

Tests updated: dp_size=1 cases unchanged in semantics but now index into
recurrent_free_slots[0]. DP test class rewritten with four per-rank cases
(init local indexing, alloc routing, capacity miss isolation, free
routing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
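The per-rank routing described in the commit message above can be sketched as below. The class and method names are assumptions for illustration; only the scheme (one free list per rank, LOCAL indices, routing by dp_rank) comes from the commit:

```python
# Minimal sketch of a per-DP slot allocator: one free list per DP rank
# holding LOCAL indices [1..slots_per_rank], with alloc/free routed by
# the request's dp_rank so no slot crosses a rank boundary.

class PerDPSlotAllocator:
    def __init__(self, dp_size: int, slots_per_rank: int):
        # Local indices start at 1, matching [1..slots_per_rank] above
        # (index 0 is assumed reserved, e.g. for a padding slot).
        self.free_slots = [
            list(range(1, slots_per_rank + 1)) for _ in range(dp_size)
        ]

    def alloc(self, dp_rank: int, n: int):
        """Pop n local slot indices from dp_rank's list; None on miss."""
        free = self.free_slots[dp_rank]
        if len(free) < n:
            return None  # capacity miss is isolated to this rank's list
        out = free[-n:]
        del free[-n:]
        return out

    def free(self, dp_rank: int, slots) -> None:
        """Return local slot indices to the owning rank's free list."""
        self.free_slots[dp_rank].extend(slots)
```

Because callers iterate per-DP, every request in a single alloc() call shares one dp_rank, so the allocator never hands out a slot owned by another rank's buffer shard.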