llm-d · vMaroon · Jun 16, 2025 · Jun 16, 2025
diff --git a/README.md b/README.md
@@ -7,8 +7,7 @@ the llm-d inference framework.
 
 This provides an "Endpoint Picker (EPP)" component to the llm-d inference
 framework which schedules incoming inference requests to the platform via a
-[Kubernetes] Gateway according to scheduler plugins (for more
-details, see the [Architecture Documentation]).
+[Kubernetes] Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and different plugins (filters and scorers), including plugin configuration, see the [Architecture Documentation]).
 
 The EPP extends the [Gateway API Inference Extension (GIE)] project,
 which provides the API resources and machinery for scheduling. We add some

diff --git a/docs/architecture.md b/docs/architecture.md
@@ -81,13 +81,13 @@ These components are maintained in the `llm-d-inference-scheduler` repository an
 | Scorer           | Description                                | Env Vars |
 |------------------|--------------------------------------------|----------|
 | Session-aware    | Prefers pods from same session             | `ENABLE_SESSION_AWARE_SCORER`, `SESSION_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_SESSION_AWARE_SCORER`, `PREFILL_SESSION_AWARE_SCORER_WEIGHT` |
-| Prefix-aware     | Matches prompt prefix                      | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_BLOCK_SIZE`|
-| KVCache-aware    | Optimizes for KV reuse                     | `ENABLE_KVCACHE_AWARE_SCORER`, `KVCACHE_INDEXER_REDIS_ADDR`, `PREFILL_ENABLE_KVCACHE_AWARE_SCORER`, `PREFILL_KVCACHE_INDEXER_REDIS_ADDR`, `HF_TOKEN`, `KVCACHE_INDEXER_REDIS_ADDR` |
+| Prefix-aware     | 	Scores based on prompt prefix history;<br>lightweight but may not reflect actual KV-cache state                     | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_BLOCK_SIZE`|
+| KVCache-aware    | Scores based on real KV-cache state on vLLM;<br>more accurate but requires extra computation and cycles to track the current cache state                   | `ENABLE_KVCACHE_AWARE_SCORER`, `KVCACHE_INDEXER_REDIS_ADDR`, `PREFILL_ENABLE_KVCACHE_AWARE_SCORER`, `PREFILL_KVCACHE_INDEXER_REDIS_ADDR`, `HF_TOKEN`, `KVCACHE_INDEXER_REDIS_ADDR` |
 | Load-aware       | Avoids busy pods                           | `ENABLE_LOAD_AWARE_SCORER`, `LOAD_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_LOAD_AWARE_SCORER`, `PREFILL_LOAD_AWARE_SCORER_WEIGHT` |
 
 ### Prefill / Decode Configuration
 
-In case Disaggrigated Prefill is enabled, you should also define the following environment variables. 
+In case Disaggrigated Prefill is enabled, you should also define the following environment variables.
 
 - Toggle P/D mode: `PD_ENABLED=true`
 - Threshold: `PD_PROMPT_LEN_THRESHOLD=<value>`