
Add SHA256-CBOR hashing algorithm for token processor with extra keys…#587

Open
leipanhz wants to merge 2 commits into
llm-d:mainfrom
leipanhz:feat/sha256-cbor-hashing


@leipanhz leipanhz commented May 15, 2026

Add KV-cache file prefetch plugin for inference requests (experimental feature)
Part 1: changes in KV-Cache (current PR)
Part 2: changes in llm-d-router

PR Description:
Introduces an experimental feature that prefetches KV-cache blocks across storage tiers before the GPU pod processes an inference request. The plugin extends the precise prefix cache scorer with engine key calculation to determine the storage locations (file names) of the KV-cache blocks a request will need, and arranges for them to be promoted to a closer storage tier to reduce inference latency. The current implementation targets a shared file system with transparent access to a remote storage tier, such as IBM Storage Scale configured to offload cold data to remote object storage. The plugin uses a concurrent worker pool to prefetch a configurable number of files in parallel from remote storage to the shared file system. A future version could extend this, for example, to prefetch KV-cache blocks from the file system into CPU memory on the worker node that the request is routed to.

For this to work correctly, the plugin must generate engine keys with the same hash algorithm that vLLM uses when offloading KV-cache blocks to storage. To that end, this work adds SHA256-CBOR as a configurable hashing algorithm in the token processor, as a vLLM-compatible alternative to the default FNV64a. The SHA256-CBOR implementation supports extra keys (multimodal features) in block hash computation. The feature also relies on logic derived from the llm-d-fs-connector to generate KV file names, so it currently works only with the llm-d-fs-connector.
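
To illustrate the idea of a chained SHA256-CBOR block hash, here is a minimal Go sketch using only the standard library. It is not the code in this PR: the helper names (`cborHead`, `encodeBlock`, `chainHashes`), the `[parent, tokens, extra]` array layout, the `seed` parameter, and the encoding of extra keys as text strings are all assumptions made for illustration, and the exact input layout must match vLLM for the resulting keys to be compatible. The sketch also assumes only full blocks are hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// cborHead appends a CBOR head (major type plus argument) in the
// shortest form defined by RFC 8949.
func cborHead(buf []byte, major byte, n uint64) []byte {
	m := major << 5
	switch {
	case n < 24:
		return append(buf, m|byte(n))
	case n <= 0xff:
		return append(buf, m|24, byte(n))
	case n <= 0xffff:
		return binary.BigEndian.AppendUint16(append(buf, m|25), uint16(n))
	case n <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(buf, m|26), uint32(n))
	default:
		return binary.BigEndian.AppendUint64(append(buf, m|27), n)
	}
}

// encodeBlock serializes the assumed layout [parent, [tokens...], extra]
// as a CBOR array. A nil parent or empty extra list is encoded as null.
func encodeBlock(parent []byte, tokens []uint32, extra []string) []byte {
	buf := cborHead(nil, 4, 3) // array with 3 items
	if parent == nil {
		buf = append(buf, 0xf6) // null
	} else {
		buf = append(cborHead(buf, 2, uint64(len(parent))), parent...) // byte string
	}
	buf = cborHead(buf, 4, uint64(len(tokens)))
	for _, t := range tokens {
		buf = cborHead(buf, 0, uint64(t)) // unsigned int
	}
	if len(extra) == 0 {
		buf = append(buf, 0xf6)
	} else {
		buf = cborHead(buf, 4, uint64(len(extra)))
		for _, e := range extra {
			buf = append(cborHead(buf, 3, uint64(len(e))), e...) // text string
		}
	}
	return buf
}

// chainHashes hashes each full block of tokens, feeding the previous
// block's digest in as the parent. extraFor is a hypothetical hook that
// returns the extra keys (e.g. multimodal identifiers) for a block index.
func chainHashes(seed []byte, tokens []uint32, blockSize int, extraFor func(block int) []string) [][]byte {
	if blockSize <= 0 {
		return nil
	}
	var out [][]byte
	parent := seed
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		var extra []string
		if extraFor != nil {
			extra = extraFor(i / blockSize)
		}
		sum := sha256.Sum256(encodeBlock(parent, tokens[i:i+blockSize], extra))
		parent = sum[:]
		out = append(out, parent)
	}
	return out
}
```

Because each digest depends on its parent, two requests share engine keys only for their common prefix of full blocks, and any difference in extra keys changes the key of that block and every block after it.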

Changes include:

New Prefetch Plugin (prefetch_prerequest_experimental.go):

  • Implements PreRequest interface for pre-inference file prefetching
  • Converts engine keys to filesystem paths using llm-d-fs-connector format
  • Manages worker thread pool for concurrent file prefetching (configurable workers)
  • Each worker reads a configurable number of blocks (BlockSize × BlockCount bytes) from each KV-cache file to trigger prefetch of the rest of the file from remote storage
  • Supports configurable prefetch parameters (block size, concurrency, queue size)

Precise Prefix Cache Scorer Enhancement (precise_prefix_cache.go):

  • Add GetEngineKeysForRequest() method to extract engine keys from requests
  • Support multimodal features in engine key computation

Add SHA256-CBOR hashing algorithm for token processor with extra keys support

  • Add a “hashAlgorithm” configuration field to select the hashing function: FNV64a (default) or SHA256-CBOR (for vLLM compatibility)
  • Implement SHA256-CBOR hashing matching vLLM engine-key computation
  • Extend BlockExtraFeatures for multimodal content support
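
As a rough sketch of how a “hashAlgorithm” field with an FNV64a default might be parsed and validated, consider the following. Only the field name comes from this PR description: the accepted value strings (`fnv64a`, `sha256_cbor`), the JSON format, and the `TokenProcessorConfig` and `parseConfig` names are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HashAlgorithm names a block-hashing scheme. The string values below
// are assumed spellings, not confirmed by the PR.
type HashAlgorithm string

const (
	HashFNV64a     HashAlgorithm = "fnv64a"
	HashSHA256CBOR HashAlgorithm = "sha256_cbor"
)

// TokenProcessorConfig is a hypothetical config struct carrying only
// the field discussed here.
type TokenProcessorConfig struct {
	HashAlgorithm HashAlgorithm `json:"hashAlgorithm"`
}

// parseConfig decodes the config, applies the FNV64a default when the
// field is absent, and rejects unknown algorithm names.
func parseConfig(raw []byte) (TokenProcessorConfig, error) {
	var cfg TokenProcessorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, err
	}
	switch cfg.HashAlgorithm {
	case "":
		cfg.HashAlgorithm = HashFNV64a
	case HashFNV64a, HashSHA256CBOR:
		// supported, nothing to do
	default:
		return cfg, fmt.Errorf("unsupported hashAlgorithm %q", cfg.HashAlgorithm)
	}
	return cfg, nil
}
```

Rejecting unknown values at load time matters here because a silent fallback to the default would produce engine keys that do not match the files vLLM wrote, and the prefetch would quietly miss.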

@github-actions github-actions Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 15, 2026
@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch 3 times, most recently from 850b062 to 6d44c16 Compare May 18, 2026 16:52
… support

Add configurable hashing algorithm in token processor with FNV64a
as the default and SHA256-CBOR as an alternative for vLLM compatibility.
Add support for extra keys (multimodal features) in block hash computation.
Include comprehensive unit tests for SHA256 hashing and extra keys functionality.

Changes:
- Add HashAlgorithm configuration (FNV64a default, SHA256-CBOR for vLLM)
- Implement SHA256-CBOR hashing matching vLLM engine-key computation
- Extend BlockExtraFeatures for multimodal content support

Signed-off-by: Lei Pan <leipan@ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch 2 times, most recently from 0bfdfca to 93bed83 Compare May 19, 2026 23:15
@leipanhz leipanhz force-pushed the feat/sha256-cbor-hashing branch from 93bed83 to 461c512 Compare May 19, 2026 23:17
