6 lines (4 loc) · 258 Bytes

KVCache Aware Scorer

This example shows a reference implementation of integrating the kvcache.Indexer module in llm-d-inference-scheduler, which is a Gateway API inference extension implementation.