Commit c4da54d

fix: config should use new precise-prefix-cache-scorer

prefix-cache-scorer was renamed to precise-prefix-cache-scorer in 0.3.0, so configs need to migrate from the old plugin to the new one and its spec:
- rename the plugin
- remove parameters.autoTune, parameters.mode: cache_tracking, and lruCapacityPerServer
- move hashBlockSize and maxPrefixBlocksToMatch under indexerConfig
- configs using food-review keep the old prefix-cache-scorer
- pd-epp-config and sim-pd-epp-config keep prefix-cache-scorer, because KV and PD both need to be enabled together, which is not done yet

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
1 parent 98a3296 commit c4da54d
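
The parameter migration described in the commit message can be sketched as a small helper. This is only an illustration under stated assumptions: `migrate_scorer_plugin`, `REMOVED_KEYS`, and `MOVED_KEYS` are hypothetical names that do not exist in the repository, and the target paths (`indexerConfig.tokenProcessorConfig.blockSize`, `indexerConfig.kvBlockIndexConfig.maxPrefixBlocksToMatch`) are inferred from the docs/architecture.md changes in this commit. The config is assumed to be already parsed into plain dictionaries.

```python
from copy import deepcopy

# Hypothetical helper data, inferred from this commit's message and diff.
REMOVED_KEYS = ("autoTune", "mode", "lruCapacityPerServer")
MOVED_KEYS = {
    # old key -> (section under indexerConfig, new key)
    "hashBlockSize": ("tokenProcessorConfig", "blockSize"),
    "maxPrefixBlocksToMatch": ("kvBlockIndexConfig", "maxPrefixBlocksToMatch"),
}


def migrate_scorer_plugin(plugin: dict) -> dict:
    """Return a migrated copy of one `plugins` entry.

    Entries whose type is not `prefix-cache-scorer` are returned unchanged
    (as a copy); the input is never mutated.
    """
    migrated = deepcopy(plugin)
    if migrated.get("type") != "prefix-cache-scorer":
        return migrated

    migrated["type"] = "precise-prefix-cache-scorer"
    params = migrated.setdefault("parameters", {})

    # Drop parameters the commit message lists as removed.
    for key in REMOVED_KEYS:
        params.pop(key, None)

    # Relocate the remaining tuning knobs under indexerConfig.
    for old_key, (section, new_key) in MOVED_KEYS.items():
        if old_key in params:
            value = params.pop(old_key)
            indexer = params.setdefault("indexerConfig", {})
            indexer.setdefault(section, {})[new_key] = value

    return migrated
```

Because the commit deliberately leaves some configs on the old scorer (the food-review, pd-epp-config, and sim-pd-epp-config cases), a helper like this would be applied per file rather than blindly across the repository. Any `pluginRef: prefix-cache-scorer` entries under `schedulingProfiles` need the same rename, which this sketch does not cover.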

2 files changed

+25
-22
lines changed

deploy/config/sim-epp-kvcache-config.yaml

Lines changed: 3 additions & 6 deletions
@@ -5,15 +5,12 @@ kind: EndpointPickerConfig
 plugins:
 - type: precise-prefix-cache-scorer
   parameters:
-    mode: cache_tracking
-    tokenProcessorConfig:
-      blockSize: 16 # must match vLLM block size if not default (16)
-      hashSeed: "42" # must match PYTHONHASHSEED in vLLM pods
     kvEventsConfig:
       zmqEndpoint: tcp://0.0.0.0:5557
     indexerConfig:
-      prefixStoreConfig:
-        blockSize: 16
+      tokenProcessorConfig:
+        blockSize: 16 # must match vLLM block size if not default (16)
+        hashSeed: "42" # must match PYTHONHASHSEED in vLLM pods
       tokenizersPoolConfig:
         modelName: TinyLlama/TinyLlama-1.1B-Chat-v1.0 # replace value to use different model for tokenizer loading
         hf:

docs/architecture.md

Lines changed: 22 additions & 16 deletions
@@ -161,11 +161,13 @@ A complete configuration might look like this:
 apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: EndpointPickerConfig
 plugins:
-- type: prefix-cache-scorer
+- type: precise-prefix-cache-scorer
   parameters:
-    hashBlockSize: 5
-    maxPrefixBlocksToMatch: 256
-    lruCapacityPerServer: 31250
+    indexerConfig:
+      tokenProcessorConfig:
+        blockSize: 5
+      kvBlockIndexConfig:
+        maxPrefixBlocksToMatch: 256
 - type: decode-filter
 - type: max-score-picker
 - type: single-profile-handler
@@ -174,7 +176,7 @@ schedulingProfiles:
   plugins:
   - pluginRef: decode-filter
   - pluginRef: max-score-picker
-  - pluginRef: prefix-cache-scorer
+  - pluginRef: precise-prefix-cache-scorer
     weight: 50
 ```
 
@@ -465,11 +467,13 @@ Example configuration:
 
 ```yaml
 plugins:
-- type: prefix-cache-scorer
+- type: precise-prefix-cache-scorer
   parameters:
-    hashBlockSize: 5
-    maxPrefixBlocksToMatch: 256
-    lruCapacityPerServer: 31250
+    indexerConfig:
+      tokenProcessorConfig:
+        blockSize: 5
+      kvBlockIndexConfig:
+        maxPrefixBlocksToMatch: 256
 - type: no-hit-lru-scorer
   parameters:
     lruSize: 2048
@@ -481,7 +485,7 @@ schedulingProfiles:
   plugins:
   - pluginRef: decode-filter
   - pluginRef: max-score-picker
-  - pluginRef: prefix-cache-scorer
+  - pluginRef: precise-prefix-cache-scorer
     weight: 2
   - pluginRef: no-hit-lru-scorer
     weight: 1
@@ -502,11 +506,13 @@ apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: EndpointPickerConfig
 plugins:
 - type: prefill-header-handler
-- type: prefix-cache-scorer
+- type: precise-prefix-cache-scorer
   parameters:
-    hashBlockSize: 5
-    maxPrefixBlocksToMatch: 256
-    lruCapacityPerServer: 31250
+    indexerConfig:
+      tokenProcessorConfig:
+        blockSize: 5
+      kvBlockIndexConfig:
+        maxPrefixBlocksToMatch: 256
 - type: prefill-filter
 - type: decode-filter
 - type: max-score-picker
@@ -519,13 +525,13 @@ schedulingProfiles:
   plugins:
   - pluginRef: prefill-filter
   - pluginRef: max-score-picker
-  - pluginRef: prefix-cache-scorer
+  - pluginRef: precise-prefix-cache-scorer
     weight: 50
 - name: decode
   plugins:
   - pluginRef: decode-filter
   - pluginRef: max-score-picker
-  - pluginRef: prefix-cache-scorer
+  - pluginRef: precise-prefix-cache-scorer
     weight: 50
 ```
 