KV-events abstraction #356

Draft
NaomiEisen wants to merge 6 commits into llm-d:main from NaomiEisen:kvevents-abstraction

@NaomiEisen

Overview

This PR introduces abstraction layers for KV-cache events. The refactoring separates transport protocols, serialization, and engine-specific event structure into distinct layers.

See design docs for full review.

Key Changes

New Abstraction Layers

  • Transport Layer (pkg/kvevents/transport/): Abstracts communication protocols.
  • Decoder Layer (pkg/kvevents/decoder/): Abstracts serialization formats.
  • Engine Adapter Layer (pkg/kvevents/engineadapter/): Converts engine-specific events to generic events.
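
A minimal sketch of how the three layers could fit together, for readers who want a concrete picture. Every name here (`Message`, `Transport`, `Decoder`, `EngineAdapter`, `GenericEvent`, and the demo implementations) is hypothetical and not taken from the PR. JSON stands in for the real wire format so the example stays within the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Message is a raw payload plus the topic it arrived on (hypothetical type).
type Message struct {
	Topic   string
	Payload []byte
}

// Transport abstracts the communication protocol.
type Transport interface {
	Receive() (Message, error)
}

// Decoder abstracts the serialization format.
type Decoder interface {
	Decode(data []byte, out any) error
}

// GenericEvent is an engine-agnostic event (hypothetical shape).
type GenericEvent struct {
	Kind   string
	Hashes []uint64
}

// EngineAdapter converts engine-specific payloads into generic events.
type EngineAdapter interface {
	Adapt(msg Message) ([]GenericEvent, error)
}

// sliceTransport is an in-memory stand-in for a real transport.
type sliceTransport struct{ queue []Message }

func (t *sliceTransport) Receive() (Message, error) {
	if len(t.queue) == 0 {
		return Message{}, errors.New("no more messages")
	}
	msg := t.queue[0]
	t.queue = t.queue[1:]
	return msg, nil
}

// jsonDecoder stands in for the real wire format (assumption: stdlib only).
type jsonDecoder struct{}

func (jsonDecoder) Decode(data []byte, out any) error { return json.Unmarshal(data, out) }

// demoAdapter decodes a made-up engine payload and maps it to generic events.
type demoAdapter struct{ dec Decoder }

func (a demoAdapter) Adapt(msg Message) ([]GenericEvent, error) {
	var wire []struct {
		Type   string   `json:"type"`
		Hashes []uint64 `json:"hashes"`
	}
	if err := a.dec.Decode(msg.Payload, &wire); err != nil {
		return nil, fmt.Errorf("decode payload on topic %q: %w", msg.Topic, err)
	}
	out := make([]GenericEvent, 0, len(wire))
	for _, w := range wire {
		out = append(out, GenericEvent{Kind: w.Type, Hashes: w.Hashes})
	}
	return out, nil
}

func main() {
	var tr Transport = &sliceTransport{queue: []Message{{
		Topic:   "demo-topic",
		Payload: []byte(`[{"type":"stored","hashes":[1,2]},{"type":"cleared"}]`),
	}}}
	var ad EngineAdapter = demoAdapter{dec: jsonDecoder{}}

	msg, err := tr.Receive()
	if err != nil {
		panic(err)
	}
	evs, err := ad.Adapt(msg)
	if err != nil {
		panic(err)
	}
	for _, ev := range evs {
		fmt.Println(ev.Kind, len(ev.Hashes))
	}
}
```

Because the adapter depends only on the `Decoder` interface, swapping the serialization format or the transport does not touch the event-mapping code.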

Event Processing Refactor

  • Moved event processing logic into event structures: Each event type (BlockStoredEvent, BlockRemovedEvent, AllBlocksClearedEvent) now implements its own Process() method.
  • Removed double marshal/unmarshal: Events are decoded once by the adapter and passed as structured data to the pool.
  • Added ExtraKeys field to support vLLM's new event format (currently unused).
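
To illustrate the "each event processes itself" idea, here is a rough sketch. The event type names come from this PR, but the `Process` signature, the `Index` interface, the `BlockHashes` field, and `memIndex` are assumptions made for the example and may differ from the actual code.

```go
package main

import "fmt"

// Index is a hypothetical stand-in for the structure the pool updates.
type Index interface {
	Add(pod string, hashes []uint64)
	Remove(pod string, hashes []uint64)
	Clear(pod string)
}

// Event is implemented by every event type; the signature is an assumption.
type Event interface {
	Process(idx Index, pod string)
}

type BlockStoredEvent struct{ BlockHashes []uint64 }
type BlockRemovedEvent struct{ BlockHashes []uint64 }
type AllBlocksClearedEvent struct{}

func (e *BlockStoredEvent) Process(idx Index, pod string)      { idx.Add(pod, e.BlockHashes) }
func (e *BlockRemovedEvent) Process(idx Index, pod string)     { idx.Remove(pod, e.BlockHashes) }
func (e *AllBlocksClearedEvent) Process(idx Index, pod string) { idx.Clear(pod) }

// memIndex is a toy in-memory Index: pod -> set of block hashes.
type memIndex struct{ blocks map[string]map[uint64]struct{} }

func newMemIndex() *memIndex { return &memIndex{blocks: map[string]map[uint64]struct{}{}} }

func (m *memIndex) Add(pod string, hashes []uint64) {
	if m.blocks[pod] == nil {
		m.blocks[pod] = map[uint64]struct{}{}
	}
	for _, h := range hashes {
		m.blocks[pod][h] = struct{}{}
	}
}

func (m *memIndex) Remove(pod string, hashes []uint64) {
	for _, h := range hashes {
		delete(m.blocks[pod], h) // deleting from a nil map is a no-op
	}
}

func (m *memIndex) Clear(pod string) { delete(m.blocks, pod) }

func (m *memIndex) Count(pod string) int { return len(m.blocks[pod]) }

func main() {
	idx := newMemIndex()
	batch := []Event{
		&BlockStoredEvent{BlockHashes: []uint64{1, 2, 3}},
		&BlockRemovedEvent{BlockHashes: []uint64{2}},
	}
	// The caller no longer switches on event type; it just calls Process.
	for _, ev := range batch {
		ev.Process(idx, "pod-a")
	}
	fmt.Println(idx.Count("pod-a")) // blocks 1 and 3 remain
}
```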

Testing

Tested on:

  • Unit tests: pkg/kvevents/engineadapter/vllm_adapter_test.go
  • tests/integration/kv_events_test.go
  • pkg/kvevents/subscriber_manager_test.go
  • examples/kv_events/online

In progress: Performance tests (benchmarking with llm-d stack)

@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Medium: &medium,
LoraName: nil,

// Create event in vLLM msgpack array format: [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName]

Previously, test events were created using specific event structures and then converted to a tagged union format via ToTaggedUnion(). This tagged union matched the exact format vLLM sends to llm-d. The tagged union structure was necessary because of the double unmarshaling: a first pass to extract the event type tag, and a second pass for the actual event data. Since this PR avoids the double pass, I removed ToTaggedUnion() entirely.
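
For illustration only, a single-pass conversion over an already-decoded positional array might look roughly like this. The field order follows the code comment above; the tag strings, the struct shapes, and the concrete Go types a msgpack decoder would produce (`uint64` hashes, `int64` block size) are assumptions for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical typed events produced by the adapter.
type BlockStored struct {
	Hashes    []uint64
	BlockSize int64
}
type BlockRemoved struct{ Hashes []uint64 }
type AllBlocksCleared struct{}

// toHashes converts a decoded array of hashes (assumed uint64 here).
func toHashes(v any) ([]uint64, error) {
	items, ok := v.([]any)
	if !ok {
		return nil, fmt.Errorf("hashes: want array, got %T", v)
	}
	out := make([]uint64, 0, len(items))
	for _, it := range items {
		h, ok := it.(uint64)
		if !ok {
			return nil, fmt.Errorf("hash: want uint64, got %T", it)
		}
		out = append(out, h)
	}
	return out, nil
}

// adapt reads the tag at index 0 and builds the typed event from the same
// decoded array, so the payload is never re-marshaled.
func adapt(raw []any) (any, error) {
	if len(raw) == 0 {
		return nil, errors.New("empty event array")
	}
	tag, ok := raw[0].(string)
	if !ok {
		return nil, fmt.Errorf("tag: want string, got %T", raw[0])
	}
	switch tag {
	case "BlockStored":
		if len(raw) < 5 {
			return nil, errors.New("BlockStored: too few fields")
		}
		hashes, err := toHashes(raw[1])
		if err != nil {
			return nil, err
		}
		size, ok := raw[4].(int64)
		if !ok {
			return nil, fmt.Errorf("blockSize: want int64, got %T", raw[4])
		}
		return &BlockStored{Hashes: hashes, BlockSize: size}, nil
	case "BlockRemoved":
		if len(raw) < 2 {
			return nil, errors.New("BlockRemoved: too few fields")
		}
		hashes, err := toHashes(raw[1])
		if err != nil {
			return nil, err
		}
		return &BlockRemoved{Hashes: hashes}, nil
	case "AllBlocksCleared":
		return &AllBlocksCleared{}, nil
	default:
		return nil, fmt.Errorf("unknown event tag %q", tag)
	}
}

func main() {
	// [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName]
	raw := []any{"BlockStored", []any{uint64(11), uint64(12)}, nil, []any{}, int64(16), nil, nil, nil}
	ev, err := adapt(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", ev)
}
```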

kv_events_config=kv_events_config,
block_size=16,
- prefix_caching_hash_algo="sha256_cbor",
+ prefix_caching_hash_algo="sha256_cbor_64bit",

Had this error when running the test:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)

// getHashAsUint64 converts vLLM hash formats (uint64 or []byte) to uint64.
// This handles both legacy uint64 hashes and new []byte hashes by taking
// the last 8 bytes and interpreting them as a big-endian integer.
func (v *VLLMAdapter) getHashAsUint64(raw any) (uint64, error) {

Maybe this should be a general utility function rather than a vLLM-specific method.
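
As a standalone utility, the conversion described in the doc comment could be sketched as follows. The name `hashToUint64` and the error on byte slices shorter than 8 bytes are assumptions; the PR's actual function may handle that case differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// hashToUint64 accepts either a legacy uint64 hash or a []byte hash and
// returns a uint64. For []byte it interprets the last 8 bytes as a
// big-endian integer. Rejecting inputs shorter than 8 bytes is an
// assumption of this sketch.
func hashToUint64(raw any) (uint64, error) {
	switch v := raw.(type) {
	case uint64:
		return v, nil
	case []byte:
		if len(v) < 8 {
			return 0, fmt.Errorf("hash too short: %d bytes, need at least 8", len(v))
		}
		return binary.BigEndian.Uint64(v[len(v)-8:]), nil
	default:
		return 0, fmt.Errorf("unsupported hash type %T", raw)
	}
}

func main() {
	// 9-byte input: the leading 0xff is dropped, the last 8 bytes are used.
	h, err := hashToUint64([]byte{0xff, 0, 0, 0, 0, 0, 0, 0x01, 0x02})
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // 258 (0x0102)
}
```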

// parseVLLMTopic extracts pod ID and model name from vLLM topic format.
// Expected format: "pod_id@model_name"
// TODO: Find a way to avoid it
func parseVLLMTopic(topic string) (podID, modelName string) {

I kept the same logic as before
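
For readers without the diff at hand, parsing the documented "pod_id@model_name" format could look roughly like the sketch below. The function name `parseTopic`, the extra `ok` return value, and splitting on the first "@" are assumptions and not the PR's code. The Helm template in this PR publishes topics with a leading `kv@` segment, so this sketch assumes any such prefix has already been stripped.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTopic splits "pod_id@model_name" on the first "@".
// It reports ok=false when the separator or either part is missing.
func parseTopic(topic string) (podID, modelName string, ok bool) {
	podID, modelName, ok = strings.Cut(topic, "@")
	if !ok || podID == "" || modelName == "" {
		return "", "", false
	}
	return podID, modelName, true
}

func main() {
	pod, model, ok := parseTopic("10.0.0.7@my-org/my-model")
	fmt.Println(pod, model, ok)
}
```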

return &events.AllBlocksClearedEvent{}, nil
}

// TODO: not sure if it best to keep or remove these

I'm not sure whether it's better to hide the inner structures from the subscriber (so that it only uses the adapter) or to have the subscriber call those methods directly on the transport.

}

// Check if pod matches our label selector
if !r.Config.PodLabelSelector.Matches(labels.Set(pod.Labels)) {

We might need to introduce the inference engine as one of the pod's identifiers.

{{- if .Values.kvCacheManager.enabled }}
--kv-events-config "{\"enable_kv_cache_events\":{{ .Values.kvCacheManager.enabled }},\"publisher\":\"zmq\",\"endpoint\":\"{{ include "chart.kvCacheManagerServiceUrl" . }}\",\"topic\":\"kv@${POD_IP}@{{ .Values.vllm.model.name }}\"}" \
- --prefix-caching-hash-algo sha256_cbor \
+ --prefix-caching-hash-algo sha256_cbor_64bit \

Had this error:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)

@@ -0,0 +1,145 @@
// Copyright 2025 The llm-d Authors.

This file is very similar to the previous zmq_subscriber.go. I'm not sure why the diff doesn't show it as a rename plus the changed lines. If it's difficult to compare, I can try to fix it.

@NaomiEisen NaomiEisen marked this pull request as draft February 25, 2026 00:50