This document describes all configuration options available in the llm-d KV Cache Manager. All configurations are JSON-serializable.
This package consists of two components:
- KV Cache Indexer: Manages the KV cache index, allowing efficient retrieval of cached blocks.
- KV Event Processing: Handles events from vLLM to update the cache index.
See the Architecture Overview for a high-level view of how these components work and interact.
The two components are configured separately but share the index backend for storing KV block localities. This shared backend is configured via the `kvBlockIndexConfig` field in the KV Cache Indexer configuration.
### KV Cache Indexer Configuration

The main configuration structure for the KV Cache Indexer module:

```json
{
  "prefixStoreConfig": { ... },
  "tokenProcessorConfig": { ... },
  "kvBlockIndexConfig": { ... },
  "tokenizersPoolConfig": { ... }
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `prefixStoreConfig` | `LRUStoreConfig` | Configuration for the prefix store | See defaults |
| `tokenProcessorConfig` | `TokenProcessorConfig` | Configuration for token processing | See defaults |
| `kvBlockIndexConfig` | `IndexConfig` | Configuration for KV block indexing | See defaults |
| `tokenizersPoolConfig` | `Config` | Configuration for the tokenization pool | See defaults |
Here's a complete configuration example with all options:

```json
{
  "prefixStoreConfig": {
    "cacheSize": 500000,
    "blockSize": 256
  },
  "tokenProcessorConfig": {
    "blockSize": 16,
    "hashSeed": "12345"
  },
  "kvBlockIndexConfig": {
    "inMemoryConfig": {
      "size": 100000000,
      "podCacheSize": 10
    },
    "enableMetrics": true,
    "metricsLoggingInterval": "1m0s"
  },
  "tokenizersPoolConfig": {
    "workersCount": 8,
    "minPrefixOverlapRatio": 0.85,
    "huggingFaceToken": "your_hf_token_here",
    "tokenizersCacheDir": "/tmp/tokenizers"
  }
}
```

### `IndexConfig`

Configures the KV-block index backend. Multiple backends can be configured, but only the first available one will be used.

```json
{
  "inMemoryConfig": { ... },
  "costAwareMemoryConfig": { ... },
  "redisConfig": { ... },
  "enableMetrics": false
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `inMemoryConfig` | `InMemoryIndexConfig` | In-memory index configuration | See defaults |
| `costAwareMemoryConfig` | `CostAwareMemoryIndexConfig` | Cost-aware memory index configuration | `null` |
| `redisConfig` | `RedisIndexConfig` | Redis index configuration | `null` |
| `enableMetrics` | boolean | Enable admissions/evictions/hits/misses recording | `false` |
| `metricsLoggingInterval` | string (duration) | Interval at which metrics are logged (e.g., `"1m0s"`). If zero or omitted, metrics logging is disabled. Requires `enableMetrics` to be `true`. | `"0s"` |
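The "only the first available backend is used" rule behaves like a simple fallback chain. The sketch below illustrates that selection logic in Python; the priority order is assumed from the field order shown above, and the real package implements this in Go:

```python
# Illustrative sketch: pick the first configured (non-null) index backend,
# mirroring the "only the first available one will be used" rule.
# The priority order here is assumed from the JSON field order above.
BACKEND_PRIORITY = ["inMemoryConfig", "costAwareMemoryConfig", "redisConfig"]

def select_backend(kv_block_index_config: dict) -> str:
    for name in BACKEND_PRIORITY:
        if kv_block_index_config.get(name) is not None:
            return name
    return "inMemoryConfig"  # assumed fallback to the in-memory default

print(select_backend({"redisConfig": {"address": "redis://127.0.0.1:6379"}}))
# -> redisConfig
```

If both `inMemoryConfig` and `redisConfig` are present, the in-memory backend wins and the Redis settings are ignored.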
### `InMemoryIndexConfig`

Configures the in-memory KV block index implementation.

```json
{
  "size": 100000000,
  "podCacheSize": 10
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `size` | integer | Maximum number of keys that can be stored | `100000000` |
| `podCacheSize` | integer | Maximum number of pod entries per key | `10` |
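The two limits interact: `size` caps the number of block keys, while `podCacheSize` caps the pod entries stored under each key. A minimal sketch of those semantics (a simplified illustration, not the package's actual Go implementation):

```python
from collections import OrderedDict

# Simplified sketch of the in-memory index limits: at most `size` keys,
# each holding at most `podCacheSize` pod entries, with LRU-style eviction.
class InMemoryIndex:
    def __init__(self, size: int, pod_cache_size: int):
        self.size = size
        self.pod_cache_size = pod_cache_size
        self.index: "OrderedDict[str, OrderedDict[str, None]]" = OrderedDict()

    def add(self, block_key: str, pod: str) -> None:
        pods = self.index.setdefault(block_key, OrderedDict())
        pods[pod] = None
        pods.move_to_end(pod)
        if len(pods) > self.pod_cache_size:  # evict oldest pod entry for this key
            pods.popitem(last=False)
        self.index.move_to_end(block_key)
        if len(self.index) > self.size:      # evict least-recently-used key
            self.index.popitem(last=False)

    def lookup(self, block_key: str) -> list:
        return list(self.index.get(block_key, ()))

idx = InMemoryIndex(size=2, pod_cache_size=2)
idx.add("blk1", "pod-a")
idx.add("blk1", "pod-b")
idx.add("blk1", "pod-c")   # pod-a evicted: per-key cap is 2
print(idx.lookup("blk1"))  # ['pod-b', 'pod-c']
```

This is why memory usage scales with both parameters: the worst case holds `size × podCacheSize` pod entries.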
### `CostAwareMemoryIndexConfig`

Configures the cost-aware memory-based KV block index implementation using the Ristretto cache.

```json
{
  "size": "2GiB"
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `size` | string | Maximum memory size for the cache. Supports human-readable formats like `"2GiB"`, `"500MiB"`, `"1GB"`, etc. | `"2GiB"` |
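Note that binary units (`GiB`, `MiB`) and decimal units (`GB`, `MB`) differ: `2GiB` is 2 × 2³⁰ bytes, not 2 × 10⁹. A sketch of a parser covering the documented examples (the exact grammar the package accepts is assumed):

```python
# Illustrative parser for human-readable size strings ("2GiB", "500MiB",
# "1GB", ...). The exact grammar accepted by the package is assumed; this
# sketch only covers the documented examples.
UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}

def parse_size(s: str) -> int:
    s = s.strip().upper()
    for unit in sorted(UNITS, key=len, reverse=True):  # try "GIB" before "B"
        if s.endswith(unit):
            return int(float(s[: -len(unit)]) * UNITS[unit])
    return int(s)  # bare number of bytes

print(parse_size("2GiB"))    # 2147483648
print(parse_size("500MiB"))  # 524288000
```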
### `RedisIndexConfig`

Configures the Redis-backed KV block index implementation.

```json
{
  "address": "redis://127.0.0.1:6379"
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `address` | string | Redis server address (can include auth: `redis://user:pass@host:port/db`) | `"redis://127.0.0.1:6379"` |
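A Redis URL is a standard URL, so each part of the `redis://user:pass@host:port/db` form maps to a well-defined component. The hostname below is a hypothetical example:

```python
from urllib.parse import urlsplit

# The `address` field accepts a full Redis URL; standard URL parsing shows
# which parts carry authentication and database selection.
# "cache.internal" is a hypothetical hostname for illustration.
url = urlsplit("redis://user:pass@cache.internal:6379/2")
print(url.username, url.password)  # user pass
print(url.hostname, url.port)      # cache.internal 6379
print(url.path.lstrip("/"))        # 2  (database number)
```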
### `TokenProcessorConfig`

Configures how tokens are converted to KV-block keys.

```json
{
  "blockSize": 16,
  "hashSeed": ""
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `blockSize` | integer | Number of tokens per block | `16` |
| `hashSeed` | string | Seed for hash generation (should align with vLLM's `PYTHONHASHSEED`) | `""` |
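To see why the seed must match vLLM's, consider chained block hashing: each block's key depends on the previous block's key and the seed, so a seed mismatch makes every key diverge. The scheme below is a simplified sketch for illustration, not vLLM's exact algorithm:

```python
import hashlib

def block_keys(tokens: list, block_size: int = 16, seed: str = "") -> list:
    """Chunk tokens into full blocks and chain-hash them (simplified sketch)."""
    keys, parent = [], seed
    # Only complete blocks get keys; the trailing partial block is dropped.
    for i in range(0, len(tokens) - len(tokens) % block_size, block_size):
        chunk = tokens[i : i + block_size]
        digest = hashlib.sha256(f"{parent}|{chunk}".encode()).hexdigest()[:16]
        keys.append(digest)
        parent = digest  # chain: next key depends on this one
    return keys

a = block_keys(list(range(40)), block_size=16, seed="12345")
b = block_keys(list(range(40)), block_size=16, seed="54321")
print(len(a))        # 2: 40 tokens yield two full 16-token blocks
print(a[0] != b[0])  # True: a different seed changes every block key
```

Because the keys are chained, two deployments with different seeds produce entirely disjoint key spaces even for identical prompts, defeating cache lookups.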
### `LRUStoreConfig`

Configures the LRU-based prefix token store.

```json
{
  "cacheSize": 500000,
  "blockSize": 256
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `cacheSize` | integer | Maximum number of blocks the LRU cache can store | `500000` |
| `blockSize` | integer | Number of characters per block in the tokenization prefix-cache | `256` |
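Note that this `blockSize` counts characters of the prompt text, not tokens. A sketch of how a prompt might be chunked into fixed-size character blocks for prefix lookups (an assumed simplification for illustration):

```python
def char_blocks(prompt: str, block_size: int = 256) -> list:
    """Split a prompt into full, fixed-size character blocks (sketch).

    Keeping only complete blocks means lookups hit the same block
    boundaries regardless of where a given prompt happens to end.
    """
    n_full = len(prompt) // block_size
    return [prompt[i * block_size : (i + 1) * block_size] for i in range(n_full)]

blocks = char_blocks("x" * 600, block_size=256)
print(len(blocks))  # 2: 600 chars -> two full 256-char blocks, 88 chars left over
```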
### `TokenizersPoolConfig`

Configures the tokenization worker pool and cache utilization strategy.

```json
{
  "workersCount": 5,
  "minPrefixOverlapRatio": 0.8,
  "huggingFaceToken": "",
  "tokenizersCacheDir": ""
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `workersCount` | integer | Number of tokenization worker goroutines | `5` |
| `minPrefixOverlapRatio` | float64 | Minimum overlap ratio required to reuse cached prefix tokens (0.0–1.0) | `0.8` |
| `huggingFaceToken` | string | HuggingFace authentication token | `""` |
| `tokenizersCacheDir` | string | Directory for caching tokenizers | `""` |
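The `minPrefixOverlapRatio` check can be pictured as follows: if the longest cached prefix covers at least that fraction of the incoming prompt, the cached tokens are reused and only the remainder is tokenized. This is an illustrative sketch of the documented behavior, not the package's code:

```python
def use_cached_prefix(prompt_len: int, cached_prefix_len: int,
                      min_overlap_ratio: float = 0.8) -> bool:
    """Decide whether a cached prefix is long enough to reuse (sketch)."""
    if prompt_len == 0:
        return False
    return cached_prefix_len / prompt_len >= min_overlap_ratio

print(use_cached_prefix(1000, 850))       # True  (0.85 >= 0.8)
print(use_cached_prefix(1000, 700))       # False (0.70 <  0.8)
print(use_cached_prefix(1000, 700, 0.6))  # True: a lower ratio accepts shorter prefixes
```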
### HuggingFace Tokenizer Configuration

Configures the HuggingFace tokenizer backend.

```json
{
  "huggingFaceToken": "",
  "tokenizersCacheDir": ""
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `huggingFaceToken` | string | HuggingFace API token for accessing models | `""` |
| `tokenizersCacheDir` | string | Local directory for caching downloaded tokenizers | `"./bin"` |
### KV Event Processing Configuration

Configures the ZMQ event processing pool for handling KV cache events.

```json
{
  "zmqEndpoint": "tcp://*:5557",
  "topicFilter": "kv@",
  "concurrency": 4
}
```

An example configuration for the ZMQ event processing pool:

```json
{
  "zmqEndpoint": "tcp://indexer:5557",
  "topicFilter": "kv@",
  "concurrency": 8
}
```

| Field | Type | Description | Default |
|---|---|---|---|
| `zmqEndpoint` | string | ZMQ address to connect to | `"tcp://*:5557"` |
| `topicFilter` | string | ZMQ subscription filter | `"kv@"` |
| `concurrency` | integer | Number of parallel workers | `4` |
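ZMQ SUB sockets match subscription filters against message topics by prefix, so `topicFilter: "kv@"` receives every message whose topic begins with `kv@`. The matching rule itself is simple; the topic names below are hypothetical examples, not the package's actual topic scheme:

```python
# ZeroMQ subscription filters are prefix matches: a SUB socket subscribed
# to "kv@" receives every message whose topic starts with "kv@".
def matches(topic_filter: str, topic: str) -> bool:
    return topic.startswith(topic_filter)

print(matches("kv@", "kv@pod-1"))     # True  (hypothetical per-pod topic)
print(matches("kv@", "metrics@pod"))  # False (different prefix, filtered out)
```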
- **Hash Seed Alignment**: The `hashSeed` in `TokenProcessorConfig` should be aligned with vLLM's `PYTHONHASHSEED` environment variable to ensure consistent hashing across the system.
- **Memory Considerations**:
  - The `size` parameter in `InMemoryIndexConfig` directly affects memory usage. Each key-value pair consumes memory proportional to the number of associated pods.
  - The `size` parameter in `CostAwareMemoryIndexConfig` controls the maximum memory footprint and supports human-readable formats (e.g., `"2GiB"`, `"500MiB"`, `"1GB"`).
- **Performance Tuning**:
  - Increase `workersCount` in the tokenization config for higher tokenization throughput.
  - Adjust `minPrefixOverlapRatio`: lower values accept shorter cached prefixes, reducing full tokenization overhead.
  - Adjust `concurrency` in event processing for better event handling performance.
  - Tune cache sizes based on available memory and expected workload.
- **Cache Directories**: If used, ensure the `tokenizersCacheDir` has sufficient disk space and appropriate permissions for the application to read/write tokenizer files.
- **Redis Configuration**: When using the Redis backend, ensure the Redis server is accessible and has sufficient memory. The `address` field supports full Redis URLs including authentication: `redis://user:pass@host:port/db`.