Commit 673d592

Update architecture documentation to include NoHitLRU scoring.
Signed-off-by: usize <mofoster@redhat.com>
1 parent 968571c commit 673d592

1 file changed: +51 -0 lines changed

docs/architecture.md

Lines changed: 51 additions & 0 deletions
@@ -364,6 +364,57 @@ used for the same session.

---

#### NoHitLRUScorer

Scores pods based on least recently used (LRU) ordering for cold requests (requests with no KV cache hits).
This helps evenly distribute cache growth across pods, since cold requests result in new KV blocks being created.

The scorer integrates with a prefix cache plugin to determine whether a request has cache hits:
- For cold requests (no cache hits): Ranks pods by LRU order, with never-used or least recently used pods
  receiving higher scores (up to 1.0) and most recently used pods receiving lower scores (approaching 0.0)
- For warm requests (cache hits): Returns neutral scores (0.5) for all pods to avoid interfering with
  cache locality optimization

LRU tracking applies to cold requests only: pods are added to the LRU cache when they serve
a cold request, not when they serve requests with cache hits.
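
The scoring rules above can be sketched roughly as follows. This is an illustrative Python model, not the plugin's implementation: the names `NoHitLRUSketch`, `score`, and `record_cold_request` are hypothetical, and both the linear spacing of scores between 1.0 and 0.0 and the ordering among never-used pods are assumptions. The description above only specifies the ordering and the approximate endpoints.

```python
from collections import OrderedDict

NEUTRAL_SCORE = 0.5  # returned for every pod on warm requests


class NoHitLRUSketch:
    """Illustrative model of the scoring rules described above."""

    def __init__(self, lru_size=1024):
        self.lru_size = lru_size
        # Keys are pod names; the most recently used pod sits at the end.
        self._lru = OrderedDict()

    def score(self, pods, has_cache_hit):
        """Return {pod: score} for the candidate pods."""
        if has_cache_hit:
            # Warm request: stay neutral so prefix-cache scoring decides.
            return {pod: NEUTRAL_SCORE for pod in pods}
        # Cold request: never-used pods rank first, then least recently used.
        never_used = [p for p in pods if p not in self._lru]
        seen = [p for p in self._lru if p in pods]  # oldest -> newest
        ranked = never_used + seen
        if len(ranked) == 1:
            return {ranked[0]: 1.0}
        # Assumed linear spread from 1.0 (best rank) down to 0.0.
        return {p: 1.0 - i / (len(ranked) - 1) for i, p in enumerate(ranked)}

    def record_cold_request(self, pod):
        """Mark `pod` as most recently used; call only for cold requests."""
        self._lru.pop(pod, None)
        self._lru[pod] = True
        while len(self._lru) > self.lru_size:
            self._lru.popitem(last=False)  # evict the least recently used pod
```

Because only cold requests call `record_cold_request`, a pod that serves many warm requests keeps its position in the LRU order. A pod evicted once the `lru_size` bound is exceeded is treated as never used again in this sketch.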

- **Type**: `no-hit-lru-scorer`
- **Parameters**:
  - `prefixPluginName` (optional): The name of the prefix cache plugin to read state from. Defaults to `prefix-cache-scorer`.
  - `lruSize` (optional): The maximum number of pods to track in the LRU cache. Defaults to 1024.

Example configuration:

```yaml
plugins:
  - type: prefix-cache-scorer
    parameters:
      hashBlockSize: 5
      maxPrefixBlocksToMatch: 256
      lruCapacityPerServer: 31250
  - type: no-hit-lru-scorer
    parameters:
      lruSize: 2048
  - type: decode-filter
  - type: max-score-picker
  - type: single-profile-handler
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: decode-filter
      - pluginRef: max-score-picker
      - pluginRef: prefix-cache-scorer
        weight: 2
      - pluginRef: no-hit-lru-scorer
        weight: 1
```
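
To see how the weights in this configuration might interact, the sketch below assumes the scheduler sums `weight * score` per pod across scorers and that the picker selects the pod with the highest total. That combination rule is an assumption made for illustration, and the pod names and score values are made up.

```python
def combine(weighted_scores):
    """Sum weight * score per pod; input is a list of (weight, {pod: score})."""
    totals = {}
    for weight, scores in weighted_scores:
        for pod, score in scores.items():
            totals[pod] = totals.get(pod, 0.0) + weight * score
    return totals


def pick_max(totals):
    """Return the pod with the highest combined score."""
    return max(totals, key=totals.get)


# Warm request: the prefix scorer (weight 2) differentiates pods, while the
# no-hit LRU scorer (weight 1) returns 0.5 for every pod.
warm_totals = combine([
    (2, {"pod-a": 0.9, "pod-b": 0.2}),
    (1, {"pod-a": 0.5, "pod-b": 0.5}),
])

# Cold request: no prefix hits anywhere, so the LRU ordering decides.
cold_totals = combine([
    (2, {"pod-a": 0.0, "pod-b": 0.0}),
    (1, {"pod-a": 0.0, "pod-b": 1.0}),
])
```

Under these assumptions, the neutral 0.5 shifts every pod's total by the same amount on warm requests, so the prefix cache ranking is unchanged. On cold requests the LRU score is the only term that differs between pods.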

**Note:** This scorer is designed to work alongside a prefix cache scorer (such as `prefix-cache-scorer` or
`precise-prefix-cache-scorer`). If no prefix cache state is available, all requests are treated as cold.
When integrating with a prefix cache scorer, the prefix cache scorer should be defined before `no-hit-lru-scorer` in the scheduling profile, as in the example above.
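
The fallback described in this note can be pictured as below. It is a minimal sketch: `is_cold_request` and the shape of `prefix_state` (a mapping from pod name to the number of matched prefix blocks, or `None` when no state is available) are hypothetical and chosen only for illustration.

```python
def is_cold_request(prefix_state):
    """Return True when the request should be treated as cold.

    `prefix_state` is a hypothetical {pod: matched_block_count} mapping,
    or None when the prefix cache plugin has published no state.
    """
    if prefix_state is None:
        # No prefix cache state available: treat the request as cold.
        return True
    # Cold means no pod reports any matched prefix blocks.
    return not any(blocks > 0 for blocks in prefix_state.values())
```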

---

### Sample Disaggregated Prefill/Decode Configuration

The following is an example of what a configuration for disaggregated Prefill/Decode might look like:
