Add SHA256-CBOR hashing algorithm for token processor with extra keys… by leipanhz · Pull Request #587 · llm-d/llm-d-kv-cache

leipanhz · 2026-05-15T18:29:36Z

Add KV-cache file prefetch plugin for inference requests (experimental feature)
Part 1: changes in KV-Cache (current PR)
Part 2: changes in llm-d-router

PR Description:
Introduces a new experimental feature that proactively prefetches KV-cache blocks across storage tiers before the GPU pod processes an inference request. The plugin extends the precise prefix cache scorer with engine key calculation to determine the storage location (file names) of the KV-cache blocks that will be needed, and arranges for them to be promoted to a closer storage tier to improve inference latency. The current implementation is intended for a shared file system that provides transparent access to a remote storage tier, such as IBM Storage Scale configured to offload cold data to remote object storage. The prefetch plugin uses a concurrent worker thread pool to prefetch a configurable number of files in parallel from remote storage to the shared file system. A future version of the plugin could extend this, for example, to prefetch KV-cache blocks from the file system to CPU memory on the worker node that the request is being routed to.

For this to work correctly, the plugin must be configured to use a hash algorithm for generating engine keys that matches the algorithm used by vLLM when offloading KV-cache blocks to storage. For this purpose, this work adds a configurable hashing algorithm SHA256-CBOR to the token processor as an alternative for vLLM compatibility. The SHA256-CBOR implementation supports extra keys (multimodal features) in block hash computation. In addition, this feature relies on logic derived from the llm-d-fs-connector to generate KV file names, so it currently only works with the llm-d-fs-connector.

Changes include:

New Prefetch Plugin (prefetch_prerequest_experimental.go):

Implements PreRequest interface for pre-inference file prefetching
Converts engine keys to filesystem paths using llm-d-fs-connector format
Manages worker thread pool for concurrent file prefetching (configurable workers)
Each worker reads a configurable number of blocks (BlockSize x BlockCount bytes) from the start of each KV-cache file to trigger prefetch of the rest of the file from remote storage
Supports configurable prefetch parameters (block size, concurrency, queue size)

Precise Prefix Cache Scorer Enhancement (precise_prefix_cache.go):

Add GetEngineKeysForRequest() method to extract engine keys from requests
Support multimodal features in engine key computation

Add SHA256-CBOR hashing algorithm for token processor with extra keys support:

Add a “hashAlgorithm” configuration field to select the hashing function: FNV64a (default) or SHA256-CBOR (for vLLM compatibility)
Implement SHA256-CBOR hashing matching vLLM engine-key computation
Extend BlockExtraFeatures for multimodal content support
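
The hashing changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the token processor code in this PR and not a byte-for-byte reproduction of vLLM. The payload layout (parent hash, token array, extra keys or null), the empty parent for the first block, the use of the same payload for the FNV64a path, and all function names are assumptions made for this sketch. The CBOR head encoding itself follows RFC 8949.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"hash/fnv"
)

// cborHead appends a CBOR item head (RFC 8949) for the given major type
// and argument, using the shortest possible encoding.
func cborHead(dst []byte, major byte, n uint64) []byte {
	m := major << 5
	switch {
	case n < 24:
		return append(dst, m|byte(n))
	case n <= 0xff:
		return append(dst, m|24, byte(n))
	case n <= 0xffff:
		return binary.BigEndian.AppendUint16(append(dst, m|25), uint16(n))
	case n <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(dst, m|26), uint32(n))
	default:
		return binary.BigEndian.AppendUint64(append(dst, m|27), n)
	}
}

// encodeBlock serializes [parent, [tokens...], extra] as a CBOR array.
// ASSUMPTION: this three-element layout is illustrative only.
func encodeBlock(parent []byte, tokens []uint32, extra []string) []byte {
	b := cborHead(nil, 4, 3)                // array of 3 items
	b = cborHead(b, 2, uint64(len(parent))) // byte string: parent hash
	b = append(b, parent...)
	b = cborHead(b, 4, uint64(len(tokens))) // array of unsigned ints
	for _, t := range tokens {
		b = cborHead(b, 0, uint64(t))
	}
	if len(extra) == 0 {
		return append(b, 0xf6) // CBOR null: no extra keys
	}
	b = cborHead(b, 4, uint64(len(extra))) // array of text strings
	for _, s := range extra {
		b = cborHead(b, 3, uint64(len(s)))
		b = append(b, s...)
	}
	return b
}

// hashBlock hashes one block with the algorithm named by algo.
// "SHA256-CBOR" selects SHA-256; any other value falls back to FNV64a.
func hashBlock(algo string, parent []byte, tokens []uint32, extra []string) []byte {
	payload := encodeBlock(parent, tokens, extra)
	if algo == "SHA256-CBOR" {
		sum := sha256.Sum256(payload)
		return sum[:]
	}
	h := fnv.New64a()
	h.Write(payload)
	return h.Sum(nil)
}

// chainHashes splits tokens into full blocks and hashes each block together
// with its predecessor's hash, so every key identifies the whole prefix.
func chainHashes(algo string, tokens []uint32, blockSize int) [][]byte {
	if blockSize <= 0 {
		return nil
	}
	var keys [][]byte
	var parent []byte // ASSUMPTION: empty parent for the first block
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		parent = hashBlock(algo, parent, tokens[i:i+blockSize], nil)
		keys = append(keys, parent)
	}
	return keys
}
```

Because each key depends on its parent, two requests that share a prefix produce identical keys for the shared blocks and different keys from the first differing block onward. Passing a non-empty extra list (for example an identifier for multimodal content) changes the key even when the tokens are identical.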

github-actions · 2026-05-15T18:29:45Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

… support

Add configurable hashing algorithm in token processor with FNV64a as the default and SHA256-CBOR as an alternative for vLLM compatibility. Add support for extra keys (multimodal features) in block hash computation. Include comprehensive unit tests for SHA256 hashing and extra keys functionality.

Changes:
- Add HashAlgorithm configuration (FNV64a default, SHA256-CBOR for vLLM)
- Implement SHA256-CBOR hashing matching vLLM engine-key computation
- Extend BlockExtraFeatures for multimodal content support

Signed-off-by: Lei Pan <leipan@ibm.com>
Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Lei Pan <leipan@ibm.com>

leipanhz requested review from dannyharnik, kfirtoledo, liu-cong and vMaroon as code owners May 15, 2026 18:29

github-actions Bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) May 15, 2026

github-actions Bot requested review from hyeongyun0916, sagearc and yankay May 15, 2026 18:29

leipanhz mentioned this pull request May 15, 2026

Add KV-cache file prefetch plugin llm-d/llm-d-router#1156

Open

leipanhz force-pushed the feat/sha256-cbor-hashing branch 3 times, most recently from 850b062 to 6d44c16 Compare May 18, 2026 16:52

leipanhz force-pushed the feat/sha256-cbor-hashing branch 2 times, most recently from 0bfdfca to 93bed83 Compare May 19, 2026 23:15

Set FNV as the default hashing algorithm to convert tokens to block keys

461c512

Signed-off-by: Lei Pan <leipan@ibm.com>

leipanhz force-pushed the feat/sha256-cbor-hashing branch from 93bed83 to 461c512 Compare May 19, 2026 23:17

leipanhz wants to merge 2 commits into llm-d:main from leipanhz:feat/sha256-cbor-hashing
