Speculative Indexing for the KV-Cache Aware Routing

## Summary

Details are in the [Google doc](https://docs.google.com/document/d/1Ao-XYDFBXe_kOgVWFzNR38P5LqRtF7lShPMlC9x53AQ/edit?tab=t.0).

TL;DR: Feed routing decisions into the KV-cache index as speculative block entries, alongside engine-confirmed blocks. A side TTL cache expires unconfirmed entries via `Index.Evict()` callbacks. Block annotations distinguish speculative blocks from confirmed ones, and the same mechanism generalizes to future use cases (e.g. HMA full-attention vs. SWA). The `PrecisePrefixCacheScorer` gains `PrepareDataPlugin` and `PreRequest` hooks, aligning it with the gateway plugin framework. This functionally unifies the IGW approximate scorer and the llm-d precise scorer.

Speculative Indexing for the KV-Cache Aware Routing #353
