
Commit d537be9

kfirtoledo and vMaroon authored and committed
docs: clarify scorer and filter configuration reference
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
1 parent 5441111 commit d537be9


2 files changed (+4, -5 lines)

README.md

Lines changed: 1 addition & 2 deletions
@@ -7,8 +7,7 @@ the llm-d inference framework.
 This provides an "Endpoint Picker (EPP)" component to the llm-d inference
 framework which schedules incoming inference requests to the platform via a
-[Kubernetes] Gateway according to scheduler plugins (for more
-details, see the [Architecture Documentation]).
+[Kubernetes] Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and the available plugins (filters and scorers), including plugin configuration, see the [Architecture Documentation].

 The EPP extends the [Gateway API Inference Extension (GIE)] project,
 which provides the API resources and machinery for scheduling. We add some

docs/architecture.md

Lines changed: 3 additions & 3 deletions
@@ -81,13 +81,13 @@ These components are maintained in the `llm-d-inference-scheduler` repository an
 | Scorer | Description | Env Vars |
 |------------------|--------------------------------------------|----------|
 | Session-aware | Prefers pods from the same session | `ENABLE_SESSION_AWARE_SCORER`, `SESSION_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_SESSION_AWARE_SCORER`, `PREFILL_SESSION_AWARE_SCORER_WEIGHT` |
-| Prefix-aware | Matches prompt prefix | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_BLOCK_SIZE`|
-| KVCache-aware | Optimizes for KV reuse | `ENABLE_KVCACHE_AWARE_SCORER`, `KVCACHE_INDEXER_REDIS_ADDR`, `PREFILL_ENABLE_KVCACHE_AWARE_SCORER`, `PREFILL_KVCACHE_INDEXER_REDIS_ADDR`, `HF_TOKEN` |
+| Prefix-aware | Scores based on prompt prefix history;<br>lightweight but may not reflect actual KV-cache state | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_BLOCK_SIZE`|
+| KVCache-aware | Scores based on actual KV-cache state in vLLM;<br>more accurate but requires extra computation to track the current cache state | `ENABLE_KVCACHE_AWARE_SCORER`, `KVCACHE_INDEXER_REDIS_ADDR`, `PREFILL_ENABLE_KVCACHE_AWARE_SCORER`, `PREFILL_KVCACHE_INDEXER_REDIS_ADDR`, `HF_TOKEN` |
 | Load-aware | Avoids busy pods | `ENABLE_LOAD_AWARE_SCORER`, `LOAD_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_LOAD_AWARE_SCORER`, `PREFILL_LOAD_AWARE_SCORER_WEIGHT` |

 ### Prefill / Decode Configuration

-If Disaggregated Prefill is enabled, you should also define the following environment variables:
+If Disaggregated Prefill is enabled, you should also define the following environment variables:

 - Toggle P/D mode: `PD_ENABLED=true`
 - Threshold: `PD_PROMPT_LEN_THRESHOLD=<value>`
