Commit c29f1cd

update prefix store environment variables info in documentation
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
1 parent 33b3063 commit c29f1cd
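
The two variables this commit documents, `PREFIX_SCORER_CACHE_CAPACITY` and `PREFIX_SCORER_CACHE_BLOCK_SIZE`, describe an LRU cache whose blocks map a prompt chunk to the pods estimated to hold the prefix ending at that chunk. The following is a minimal Python sketch of that idea, not the scorer's actual code, which lives in the `llm-d-inference-scheduler` repository and may differ. The class name `PrefixStore`, its methods, the cumulative SHA-256 keying, and measuring the block size in characters are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class PrefixStore:
    """Toy LRU store (hypothetical): chunk key -> set of pod names."""

    def __init__(self, cache_capacity, block_size):
        self.cache_capacity = cache_capacity  # cf. PREFIX_SCORER_CACHE_CAPACITY
        self.block_size = block_size          # cf. PREFIX_SCORER_CACHE_BLOCK_SIZE
        self._blocks = OrderedDict()          # oldest entry first

    def _keys(self, prompt):
        """Yield one key per full chunk; each key covers the prefix ending at that chunk."""
        running = hashlib.sha256()
        last_start = len(prompt) - self.block_size
        for start in range(0, last_start + 1, self.block_size):
            chunk = prompt[start:start + self.block_size]
            running.update(chunk.encode("utf-8"))
            yield running.hexdigest()  # assumption: cumulative hash as the block key

    def add(self, prompt, pod):
        """Record that `pod` is estimated to hold every full-chunk prefix of `prompt`."""
        for key in self._keys(prompt):
            self._blocks.setdefault(key, set()).add(pod)
            self._blocks.move_to_end(key)  # mark as most recently used
            while len(self._blocks) > self.cache_capacity:
                self._blocks.popitem(last=False)  # evict the least recently used block

    def match(self, prompt):
        """Return, per pod, how many leading chunks of `prompt` are cached for it."""
        scores = {}
        for key in self._keys(prompt):
            pods = self._blocks.get(key)
            if not pods:
                break  # prefixes are cumulative, so stop at the first miss
            self._blocks.move_to_end(key)
            for pod in pods:
                scores[pod] = scores.get(pod, 0) + 1
        return scores


store = PrefixStore(cache_capacity=4, block_size=4)
store.add("abcdefghij", "pod-a")  # two full chunks; the trailing "ij" is ignored
store.add("abcdXXXX", "pod-b")    # shares only the first chunk with pod-a
print(store.match("abcdefgh-more"))
```

In this sketch a smaller block size gives finer-grained prefix matches but consumes more of the capacity per prompt, since every full chunk occupies one block.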

1 file changed: +6 -1 lines changed

docs/architecture.md

Lines changed: 6 additions & 1 deletion
@@ -81,7 +81,7 @@ These components are maintained in the `llm-d-inference-scheduler` repository an
 | Scorer | Description | Env Vars |
 |------------------|--------------------------------------------|----------|
 | Session-aware | Prefers pods from same session | `ENABLE_SESSION_AWARE_SCORER`, `SESSION_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_SESSION_AWARE_SCORER`, `PREFILL_SESSION_AWARE_SCORER_WEIGHT` |
-| Prefix-aware | Matches prompt prefix | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_BLOCK_SIZE`, `PREFIX_SCORER_CACHE_CAPACITY`, `PREFIX_SCORER_MAX_BLOCK_CACHE_SIZE`|
+| Prefix-aware | Matches prompt prefix | `ENABLE_PREFIX_AWARE_SCORER`, `PREFIX_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_PREFIX_AWARE_SCORER`, `PREFILL_PREFIX_AWARE_SCORER_WEIGHT`, `PREFIX_SCORER_CACHE_CAPACITY`, `PREFIX_SCORER_CACHE_BLOCK_SIZE`|
 | KVCache-aware | Optimizes for KV reuse | `ENABLE_KVCACHE_AWARE_SCORER`, `KVCACHE_INDEXER_REDIS_ADDR`, `PREFILL_ENABLE_KVCACHE_AWARE_SCORER`, `PREFILL_KVCACHE_INDEXER_REDIS_ADDR`, `HF_TOKEN`, `KVCACHE_INDEXER_REDIS_ADDR` |
 | Load-aware | Avoids busy pods | `ENABLE_LOAD_AWARE_SCORER`, `LOAD_AWARE_SCORER_WEIGHT`, `PREFILL_ENABLE_LOAD_AWARE_SCORER`, `PREFILL_LOAD_AWARE_SCORER_WEIGHT` |

@@ -92,6 +92,11 @@ In case Disaggrigated Prefill is enabled, you should also define the following e
 - Toggle P/D mode: `PD_ENABLED=true`
 - Threshold: `PD_PROMPT_LEN_THRESHOLD=<value>`
 
+### Prefix Aware Scorer Configuration
+
+- `PREFIX_SCORER_CACHE_CAPACITY` - the cache capacity sets the maximum number of blocks the LRU cache can store. A block maps from a chunk of a prompt to a set of pods that are estimated to have the prefix of the prompt that ends at the keyed chunk.
+- `PREFIX_SCORER_CACHE_BLOCK_SIZE` - the cache block size defines the length of the prompt chunk that a block is keyed by.
+
 #### Prefill Scorers:
 ```bash
 export PREFILL_ENABLE_SESSION_AWARE_SCORER=true
