…stractions for multi-engine support. Modify Pool and Subscribers to use new layers.
|
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation. |
| Medium: &medium, | ||
| LoraName: nil, | ||
|
|
||
| // Create event in vLLM msgpack array format: [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName] |
Previously, test events were created as specific event structures and then converted to a tagged-union format via ToTaggedUnion(). This tagged union matched the exact format vLLM sends to llm-d. The tagged-union structure was necessary because of double unmarshaling: a first pass to extract the event type tag, and a second pass for the actual event data. The new code avoids the double pass, so I removed ToTaggedUnion() completely.
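To make the single-pass idea concrete, here is a minimal sketch of dispatching on the tag of an array that has already been unmarshaled once. It is not the PR's code: the type names, the helper `decodeTagged`, and the tag strings are assumptions for illustration, and only the BlockStored field order is taken from the comment in the diff above. The position of the hashes in a BlockRemoved array is also assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical generic event types; field names are illustrative only.
type blockStored struct {
	Hashes    []any // raw hashes, still in the engine's wire representation
	Parent    any   // parent block hash, or nil for a root block
	Tokens    []any
	BlockSize any
}
type blockRemoved struct{ Hashes []any }
type allBlocksCleared struct{}

// decodeTagged turns one already-unmarshaled event array into a typed event.
// Element 0 is the tag; the remaining elements are positional fields, so no
// second unmarshal of the payload is needed.
func decodeTagged(fields []any) (any, error) {
	if len(fields) == 0 {
		return nil, errors.New("empty event array")
	}
	tag, ok := fields[0].(string)
	if !ok {
		return nil, fmt.Errorf("event tag has type %T, want string", fields[0])
	}
	switch tag {
	case "BlockStored": // assumed tag value
		// Order from the diff: [tag, hashes, parent, tokens, blockSize, ...]
		if len(fields) < 5 {
			return nil, fmt.Errorf("BlockStored: got %d fields, want at least 5", len(fields))
		}
		hashes, ok := fields[1].([]any)
		if !ok {
			return nil, fmt.Errorf("BlockStored: hashes have type %T", fields[1])
		}
		tokens, ok := fields[3].([]any)
		if !ok {
			return nil, fmt.Errorf("BlockStored: tokens have type %T", fields[3])
		}
		return &blockStored{Hashes: hashes, Parent: fields[2], Tokens: tokens, BlockSize: fields[4]}, nil
	case "BlockRemoved": // assumed tag value and field position
		if len(fields) < 2 {
			return nil, errors.New("BlockRemoved: missing hashes")
		}
		hashes, ok := fields[1].([]any)
		if !ok {
			return nil, fmt.Errorf("BlockRemoved: hashes have type %T", fields[1])
		}
		return &blockRemoved{Hashes: hashes}, nil
	case "AllBlocksCleared": // assumed tag value
		return &allBlocksCleared{}, nil
	default:
		return nil, fmt.Errorf("unknown event tag %q", tag)
	}
}

func main() {
	ev, err := decodeTagged([]any{"BlockStored", []any{uint64(1)}, nil, []any{int64(7)}, int64(16)})
	fmt.Printf("%T %v\n", ev, err)
}
```

A test can then build the array literal directly, which is what makes a ToTaggedUnion()-style conversion step unnecessary.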
| kv_events_config=kv_events_config, | ||
| block_size=16, | ||
| prefix_caching_hash_algo="sha256_cbor", | ||
| prefix_caching_hash_algo="sha256_cbor_64bit", |
Had this error when running the test:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)
| // getHashAsUint64 converts vLLM hash formats (uint64 or []byte) to uint64. | ||
| // This handles both legacy uint64 hashes and new []byte hashes by taking | ||
| // the last 8 bytes and interpreting them as a big-endian integer. | ||
| func (v *VLLMAdapter) getHashAsUint64(raw any) (uint64, error) { |
Maybe this should be a general utility function rather than a vLLM-specific one.
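As a rough illustration of what such a shared utility could look like, the sketch below follows the doc comment in the diff: a uint64 passes through unchanged, and a []byte is reduced to its last 8 bytes read as a big-endian integer. The function name `hashToUint64` is hypothetical, and the handling of empty or shorter-than-8-byte inputs is an assumption, not something the PR specifies.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// hashToUint64 normalizes an engine-provided block hash to uint64.
// It accepts a legacy uint64 hash or a []byte digest.
func hashToUint64(raw any) (uint64, error) {
	switch v := raw.(type) {
	case uint64:
		return v, nil
	case []byte:
		if len(v) == 0 {
			return 0, errors.New("empty hash") // assumption: reject empty digests
		}
		if len(v) >= 8 {
			// Take the last 8 bytes and read them as big-endian.
			return binary.BigEndian.Uint64(v[len(v)-8:]), nil
		}
		// Assumption: left-pad short inputs with zeros so the numeric
		// value of the bytes is preserved.
		var buf [8]byte
		copy(buf[8-len(v):], v)
		return binary.BigEndian.Uint64(buf[:]), nil
	default:
		return 0, fmt.Errorf("unsupported hash type %T", raw)
	}
}

func main() {
	digest := []byte{0xAA, 0xBB, 0, 0, 0, 0, 0, 0, 0x01, 0x02}
	h, err := hashToUint64(digest)
	fmt.Println(h, err)
}
```

Because nothing in the function depends on vLLM types, it could live in a shared package and be called from any engine adapter.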
| // parseVLLMTopic extracts pod ID and model name from vLLM topic format. | ||
| // Expected format: "pod_id@model_name" | ||
| // TODO: Find a way to avoid it | ||
| func parseVLLMTopic(topic string) (podID, modelName string) { |
I kept the same logic as before
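For reference, a topic parser of this shape could be sketched as follows. This is not the PR's implementation: the name `parseTopic`, the stripping of an optional `kv@` prefix (which appears in the Helm template's topic string), and the fallback for a topic without a separator are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTopic splits a topic of the form "pod_id@model_name" into its parts.
// A leading "kv@" prefix, if present, is dropped first (assumption).
func parseTopic(topic string) (string, string) {
	topic = strings.TrimPrefix(topic, "kv@")
	podID, modelName, found := strings.Cut(topic, "@")
	if !found {
		// Assumption: with no separator, treat the whole topic as the pod ID.
		return topic, ""
	}
	return podID, modelName
}

func main() {
	pod, model := parseTopic("kv@10.0.0.5@my-org/my-model")
	fmt.Println(pod, model)
}
```

`strings.Cut` splits on the first separator only, so a model name that itself contains `@` would stay intact in the second return value.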
| return &events.AllBlocksClearedEvent{}, nil | ||
| } | ||
|
|
||
| // TODO: not sure if it best to keep or remove these |
I'm not sure whether it's better to hide the inner structures from the subscriber (so that it only uses the adapter) or to let the subscriber call those methods directly on the transport.
| } | ||
|
|
||
| // Check if pod matches our label selector | ||
| if !r.Config.PodLabelSelector.Matches(labels.Set(pod.Labels)) { |
We might need to introduce the inference engine as one of the pod's identifiers.
| {{- if .Values.kvCacheManager.enabled }} | ||
| --kv-events-config "{\"enable_kv_cache_events\":{{ .Values.kvCacheManager.enabled }},\"publisher\":\"zmq\",\"endpoint\":\"{{ include "chart.kvCacheManagerServiceUrl" . }}\",\"topic\":\"kv@${POD_IP}@{{ .Values.vllm.model.name }}\"}" \ | ||
| --prefix-caching-hash-algo sha256_cbor \ | ||
| --prefix-caching-hash-algo sha256_cbor_64bit \ |
Had this error:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]
vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)
| @@ -0,0 +1,145 @@ | |||
| // Copyright 2025 The llm-d Authors. | |||
This file is very similar to the previous zmq_subscriber.go. I'm not sure why the diff doesn't show it as a rename plus the changed lines. If it's difficult to compare, I can try to fix it.
Overview
This PR introduces abstraction layers for KV-cache events. The refactoring separates transport protocols, serialization, and engine-specific event structure into distinct layers.
See design docs for full review.
Key Changes
New Abstraction Layers
- pkg/kvevents/transport/: Abstracts communication protocols.
- pkg/kvevents/decoder/: Abstracts serialization formats.
- pkg/kvevents/engineadapter/: Converts engine-specific events to generic events.
Event Processing Refactor
- Each event type (BlockStoredEvent, BlockRemovedEvent, AllBlocksClearedEvent) now implements its own Process() method.
- ExtraKeys field to support vLLM's new event format (currently unused).
Testing
Tested on:
- pkg/kvevents/engineadapter/vllm_adapter_test.go
- tests/integration/kv_events_test.go
- pkg/kvevents/subscriber_manager_test.go
- examples/kv_events/online
In progress: Performance tests (benchmarking with llm-d stack)
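The Event Processing Refactor listed under Key Changes, where each event type implements its own Process() method, can be sketched roughly as below. The `index` type, the method signature, and the effect of each event on the index are all assumptions chosen to keep the example small; they are not taken from the PR.

```go
package main

import "fmt"

// index is a toy stand-in for the KV-block index: pod -> set of block hashes.
type index map[string]map[uint64]bool

// event is the generic interface; each concrete event applies itself.
type event interface {
	Process(idx index, pod string)
}

type blockStored struct{ Hashes []uint64 }

func (e blockStored) Process(idx index, pod string) {
	if idx[pod] == nil {
		idx[pod] = map[uint64]bool{}
	}
	for _, h := range e.Hashes {
		idx[pod][h] = true
	}
}

type blockRemoved struct{ Hashes []uint64 }

func (e blockRemoved) Process(idx index, pod string) {
	for _, h := range e.Hashes {
		delete(idx[pod], h) // deleting from a nil map is a no-op
	}
}

type allBlocksCleared struct{}

// Assumption: clearing drops every block recorded for the pod.
func (allBlocksCleared) Process(idx index, pod string) {
	delete(idx, pod)
}

func main() {
	idx := index{}
	stream := []event{
		blockStored{Hashes: []uint64{1, 2, 3}},
		blockRemoved{Hashes: []uint64{2}},
	}
	// The caller no longer needs a type switch; it just calls Process.
	for _, ev := range stream {
		ev.Process(idx, "pod-a")
	}
	fmt.Println(len(idx["pod-a"]))
}
```

The design benefit is that adding a new event type means adding one type with one method, without touching the loop that consumes events.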