@@ -364,6 +364,57 @@ used for the same session.
364 364
365 365 ---
366 366
367+ #### NoHitLRUScorer
368+
369+ Scores pods based on least recently used (LRU) ordering for cold requests (requests with no KV cache hits).
370+ This helps evenly distribute cache growth across pods, since cold requests result in new KV blocks being created.
371+
372+ The scorer integrates with a prefix cache plugin to determine if a request has cache hits:
373+ - For cold requests (no cache hits): Ranks pods by LRU order, with never-used or least recently used pods
374+   receiving higher scores (up to 1.0) and most recently used pods receiving lower scores (approaching 0.0)
375+ - For warm requests (cache hits): Returns neutral scores (0.5) for all pods to avoid interfering with
376+   cache locality optimization
377+
378+ The LRU tracking applies to cold requests only: a pod is added to the LRU cache when it serves
379+ a cold request, not when it serves a request with cache hits.
380+
381+ - **Type**: `no-hit-lru-scorer`
382+ - **Parameters**:
383+   - `prefixPluginName` (optional): The name of the prefix cache plugin to read state from. Defaults to `prefix-cache-scorer`.
384+   - `lruSize` (optional): The maximum number of pods to track in the LRU cache. Defaults to 1024.
385+
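The cold/warm behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the plugin's actual implementation: the class name `NoHitLRUSketch`, its methods, and the linear `1 - rank/n` interpolation are assumptions made for the example. The text only states that scores range from 1.0 down toward 0.0 for cold requests and are 0.5 for warm requests.

```python
from collections import OrderedDict


class NoHitLRUSketch:
    """Hypothetical model of the documented scoring rules (not the real plugin)."""

    def __init__(self, lru_size=1024):
        # Mirrors the documented `lruSize` parameter (default 1024).
        self.lru_size = lru_size
        # Keys are pod names; oldest entry first, most recently used last.
        self._lru = OrderedDict()

    def score(self, pods, has_cache_hit):
        # Warm request: neutral score so cache-locality scorers decide.
        if has_cache_hit:
            return {pod: 0.5 for pod in pods}
        # Cold request: never-used pods rank first, then tracked pods
        # from least to most recently used.
        never_used = [p for p in pods if p not in self._lru]
        used = [p for p in self._lru if p in pods]
        ranked = never_used + used
        n = len(ranked)
        # Assumed interpolation: best rank gets 1.0, later ranks approach 0.0.
        return {pod: 1.0 - i / n for i, pod in enumerate(ranked)}

    def record_cold_request(self, pod):
        # Only cold requests update the LRU order.
        self._lru[pod] = True
        self._lru.move_to_end(pod)
        # Evict the oldest entries once the capacity is exceeded.
        while len(self._lru) > self.lru_size:
            self._lru.popitem(last=False)
```

In this sketch, a caller would invoke `record_cold_request` for the chosen pod only after a request with no cache hits is scheduled, which matches the rule that warm requests leave the LRU order untouched.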
386+ Example configuration:
387+
388+ ```yaml
389+ plugins:
390+   - type: prefix-cache-scorer
391+     parameters:
392+       hashBlockSize: 5
393+       maxPrefixBlocksToMatch: 256
394+       lruCapacityPerServer: 31250
395+   - type: no-hit-lru-scorer
396+     parameters:
397+       lruSize: 2048
398+   - type: decode-filter
399+   - type: max-score-picker
400+   - type: single-profile-handler
401+ schedulingProfiles:
402+   - name: default
403+     plugins:
404+       - pluginRef: decode-filter
405+       - pluginRef: max-score-picker
406+       - pluginRef: prefix-cache-scorer
407+         weight: 2
408+       - pluginRef: no-hit-lru-scorer
409+         weight: 1
410+ ```
411+
412+ **Note:** This scorer is designed to work alongside a prefix cache scorer (such as `prefix-cache-scorer` or
413+ `precise-prefix-cache-scorer`). If no prefix cache state is available, all requests are treated as cold.
414+ When integrating with a prefix cache scorer, list the prefix cache scorer before `no-hit-lru-scorer` in the scheduling profile.
415+
416+ ---
417+
367 418 ### Sample Disaggregated Prefill/Decode Configuration
368 419
369 420 The following is an example of what a configuration for disaggregated Prefill/Decode might look like: