Open
Description
Summary
Details are in the Google doc.
TL;DR: Feed routing decisions into the KV-cache index as speculative block entries alongside engine-confirmed blocks. A side TTL cache expires unconfirmed entries via Index.Evict() callbacks.
Block annotations distinguish speculative blocks from confirmed ones, and the same mechanism generalizes to future use cases (e.g. HMA full-attention vs SWA). The PrecisePrefixCacheScorer gains PrepareDataPlugin and PreRequest hooks, aligning it with the gateway plugin framework. This functionally unifies the IGW approximate scorer and the llm-d precise scorer.
Labels
help wanted (Extra attention is needed)