
Handling Attention Group id in KV events #510

Open

kapiljain1989 wants to merge 1 commit into llm-d:main from kapiljain1989:groupid

Conversation

@kapiljain1989

Add attention group tracking to the KV-Cache indexer for Hybrid Multi-head Attention (HMA) support. This enables per-group cache-hit scoring for models with multiple attention groups (e.g., full attention + sliding-window attention).

Changes

  • Core data model: Add StoredGroups []int field to PodEntry to track which attention groups have cached a block
  • Event schema: Add GroupIdx field to BlockStoredEvent and BlockRemovedEvent for per-group cache updates
  • vLLM adapter: Parse group_idx from vLLM KV events (msgpack fields [9] and [3])
  • Index implementations: Update all index backends (InMemory, CostAwareMemory, Redis) to:
    • Use string-based cache keys ("podID@tier") instead of struct keys for efficient in-place updates
    • Merge StoredGroups when adding duplicate entries
    • Remove specific groups on eviction (delete entry only when no groups remain)
    • Store JSON-serialized entries in Redis for group list persistence
  • Event processing: Convert single GroupIdx from events to StoredGroups list in index operations
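The merge-on-store and delete-when-empty behavior described above can be sketched in Go. The field names `PodEntry`, `StoredGroups`, and the `"podID@tier"` key format come from the PR text; the helper functions and their signatures are hypothetical illustrations, not the actual implementation.

```go
package main

import (
	"fmt"
	"slices"
)

// PodEntry sketches the data model described above: which pod/tier holds a
// cached block and which attention groups have stored it.
type PodEntry struct {
	PodID        string
	Tier         string
	StoredGroups []int
}

// cacheKey builds the string-based "podID@tier" key the PR switches to,
// replacing struct keys so entries can be updated in place.
func cacheKey(podID, tier string) string {
	return podID + "@" + tier
}

// mergeGroups merges StoredGroups when a duplicate entry is added,
// deduplicating so repeated BlockStored events are idempotent.
func mergeGroups(entry *PodEntry, groups []int) {
	for _, g := range groups {
		if !slices.Contains(entry.StoredGroups, g) {
			entry.StoredGroups = append(entry.StoredGroups, g)
		}
	}
}

// removeGroup drops one group on eviction and reports whether the entry
// is now empty, i.e. should be deleted from the index entirely.
func removeGroup(entry *PodEntry, group int) bool {
	entry.StoredGroups = slices.DeleteFunc(entry.StoredGroups, func(g int) bool {
		return g == group
	})
	return len(entry.StoredGroups) == 0
}

func main() {
	e := &PodEntry{PodID: "pod-a", Tier: "gpu", StoredGroups: []int{0}}
	mergeGroups(e, []int{0, 1}) // duplicate group 0 is ignored
	fmt.Println(cacheKey(e.PodID, e.Tier), e.StoredGroups)
	// prints: pod-a@gpu [0 1]
	fmt.Println(removeGroup(e, 0), removeGroup(e, 1))
	// prints: false true — deletable only after the last group is evicted
}
```

Keeping eviction per-group rather than per-entry is what prevents a sliding-window group's eviction from invalidating a block that the full-attention group still holds.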

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@github-actions github-actions bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 10, 2026
Signed-off-by: Kapil Jain <kapiljain1989@gmail.com>
