Releases · llm-d/llm-d-kv-cache

21 Jan 20:14

vMaroon

v0.5.0-RC1

b3499df

v0.5.0-RC1 Pre-release


High-Level Summary

vLLM-native tokenization & chat-completions: Tokenization and chat-template rendering are now fully aligned on vLLM, improving performance, correctness, and observability.
LoRA-aware KV-cache indexing / multi-model setups: KV-block hashing and events were updated to properly isolate and reuse KV blocks across LoRA adapters.
KVEvents maturation: Automatic vLLM pod discovery and improved ZMQ subscription management enable more reliable event-driven KV workflows and active-active multi-replica scheduler HA.
Storage offloading connector landed: CPU and storage-based KV offloading support is now in-tree, forming the foundation for tiered KV-cache architectures.
Production hardening: Redis, indexing, CI, and build reliability were significantly improved to support stable large-scale deployments.
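
To make the LoRA-aware indexing item above concrete, here is a minimal sketch of salted, chained KV-block hashing. It is not the project's actual hashing scheme: the function name `blockHashes`, the SHA-256 choice, the seed layout, and the little-endian token encoding are all assumptions made for illustration. The point it shows is that seeding the hash chain with the adapter name gives the same prompt different block keys under different adapters, while identical inputs still produce identical keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockHashes splits tokens into fixed-size blocks and returns one chained
// hash per full block. The chain is seeded with the model and adapter names
// (a hypothetical scheme), so the same prompt served through different LoRA
// adapters yields different block keys. A trailing partial block is ignored.
func blockHashes(model, adapter string, tokens []uint32, blockSize int) [][32]byte {
	if blockSize <= 0 {
		return nil
	}
	// Seed: hash of model name and adapter name, separated by a NUL byte.
	parent := sha256.Sum256([]byte(model + "\x00" + adapter))
	var out [][32]byte
	buf := make([]byte, 4)
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := sha256.New()
		h.Write(parent[:]) // chain on the previous block's hash
		for _, t := range tokens[start : start+blockSize] {
			binary.LittleEndian.PutUint32(buf, t)
			h.Write(buf)
		}
		copy(parent[:], h.Sum(nil))
		out = append(out, parent)
	}
	return out
}

func main() {
	tokens := []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	base := blockHashes("model-a", "", tokens, 4)
	lora := blockHashes("model-a", "adapter-x", tokens, 4)
	// Two full blocks each; the keys differ because the seeds differ.
	fmt.Println(len(base), len(lora), base[0] == lora[0])
}
```

Because each block hash includes its parent, a difference in the seed propagates to every block in the chain, which is what keeps adapters isolated from one another.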

What's Changed

add metrics for tokenization and rendering chat template by @delavet in #185
refactor: Redis memory optimization and test reliability improvements by @natoscott in #196
feat: Simplify tokenization prefix store to single-model architecture by @sagearc in #182
[KV-Events] Add vLLM KV cache events demo with ZMQ listener by @sagearc in #162
build: Enable clang-format for CUDA and C++ sources by @kfirtoledo in #197
Remove redundant documentation by @vMaroon in #198
fix: Enable local tokenizer chat template rendering by @pierDipi in #186
fix: The call to the tokenizer service may be canceled in advance. by @delavet in #201
feat(kvblock): decouple key handling in index by @hhk7734 in #202
rename package name by @setsunakute in #209
fix: restore log output for examples and unit tests by @yankay in #214
cleanup: remove redundant warnings from make build by @yankay in #216
fix: Single tokenizer initialization at startup by @sagearc in #192
remove manager from docs and readme by @setsunakute in #205
feat: include model name in KVBlock key hash by @sagearc in #220
[fs_connector][feat]: Add NUMA management module by @kfirtoledo in #177
[CI] Add CI workflow to run and verify examples by @yankay in #217
fix(kvcache): move in-memory indexer to fallback to respect external configs by @hyeongyun0916 in #227
Limit task requeue retries on failures by @alpe in #224
fix(redis): atomic engine key eviction to prevent race condition by @sagearc in #233
docs: update examples configuration by @sagearc in #231
docs: update tokenization pool configuration by @sagearc in #230
Async tokenize exec by @alpe in #225
[fs_connector][feat]: Add file management module (file_io) for fs connector by @kfirtoledo in #176
[build]: Add pre-commit config by @kfirtoledo in #239
refactor: Update BlockStored KVEvent by @sagearc in #237
Fix Example kv_cache_index pod listing returns empty by @yankay in #229
fix: incorrect yaml indentation by @kyanokashi in #238
refactor: relocate tokenProcessorConfig to enable injection into kvevents.NewPool by @hyeongyun0916 in #236
add @hyeongyun0916 as reviewer by @vMaroon in #241
Fix python version inconsistency in DockerFile and use Makefile for build by @kaushik-rohit in #242
feat: salt KV block hashes with LoRA adapter name by @sagearc in #243
fix(cgo): resolve CStrings memory leaks by @sagearc in #249
[fs_connector][feat]: Add tensor copy module (tensor_copy) for GPU and CPU block transfers by @kfirtoledo in #179
[fs_connector][feat]: Add multithreaded worker pool (thread_pool) by @kfirtoledo in #178
add @sagearc as reviewer by @vMaroon in #252
[fs_connector][feat]: Add storage_offload CUDA kernel and setup config by @kfirtoledo in #180
[fs_connector][feat]: Add Python backend implementation for fs backend of offloading connector (fs_connector) by @kfirtoledo in #181
Refactor: Replace transformers with vLLM by @hyeongyun0916 in #234
[fs_connector] [fix]: Return early finish after copy to CPU by @kfirtoledo in #253
Optimize uds tokenizer performance using grpc & thread pool by @delavet in #222
fix: remove model name from kvblock Key by @sagearc in #246
docs : update llmd_fs_backend README.md by @kfirtoledo in #259
Automatic vLLM pod discovery + ZMQ subscription mgmt for KVEvents by @vMaroon in #212
refactor: convert profiling script to standard go benchmark by @kaushik-rohit in #251
Align the interface of the uds tokenizer service with Tokenizer by @delavet in #257
Improve CI workflow for faster environment setup by @yankay in #258
fix profiling test by @vMaroon in #261
edit install-python-deps by @hyeongyun0916 in #262
Fix: override block_size in storage backend by @kfirtoledo in #260
Remove elevran from owners by @elevran in #267
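
One of the changes listed above caps how often a failed task is requeued (#224). The sketch below illustrates that general pattern only; the `task` type, the `drain` function, and the `maxAttempts` parameter are hypothetical names and do not reflect the project's worker-pool code.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlwaysFails is a stand-in error used by the demo below.
var errAlwaysFails = errors.New("always fails")

// task is a hypothetical unit of work that remembers how often it was tried.
type task struct {
	id       string
	attempts int
}

// drain processes queued tasks in FIFO order. A failed task is requeued at
// the back until it has been attempted maxAttempts times; after that it is
// dropped instead of being retried forever.
func drain(queue []task, maxAttempts int, process func(task) error) (done, dropped []string) {
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		t.attempts++
		if err := process(t); err == nil {
			done = append(done, t.id)
			continue
		}
		if t.attempts >= maxAttempts {
			dropped = append(dropped, t.id)
			continue
		}
		queue = append(queue, t)
	}
	return done, dropped
}

func main() {
	process := func(t task) error {
		if t.id == "bad" {
			return errAlwaysFails
		}
		return nil
	}
	done, dropped := drain([]task{{id: "ok"}, {id: "bad"}}, 3, process)
	fmt.Println(done, dropped)
}
```

Without the attempt cap, a task that can never succeed would circulate in the queue indefinitely and starve healthy work.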

New Contributors

@natoscott made their first contribution in #196
@hhk7734 made their first contribution in #202
@setsunakute made their first contribution in #209
@alpe made their first contribution in #224
@kaushik-rohit made their first contribution in #242
@elevran made their first contribution in #267

Full Changelog: v0.4.0...v0.5.0-RC1

Contributors

alpe, natoscott, and 12 other contributors


25 Nov 22:01

vMaroon

v0.4.0

b34736d

v0.4.0 Latest


Highlights

Unified chat-template rendering interface across the different tokenizers (fixes a BOS token duplication bug), plus performance and code improvements on the templating path by @hyeongyun0916
Logger fix by @hyeongyun0916
Tiered prefix-cache scoring by @Jay-Pd
UDS-tokenizer service by @delavet and @osswangxining
Tokenizer from local disk enablement by @pierDipi
Valkey kvblock backend by @rishi-jat
General enhancements by: @zhengkezhou1 @Frapschen @my-git9 @samzong @samber @kyanokashi
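
As a rough illustration of the tiered prefix-cache scoring highlight above, the following sketch scores pods by the longest consecutive run of prompt blocks they hold, weighting each block by the tier it sits in. The tier names, the weights, the `scorePods` function, and the index layout are assumptions for this example; the project's scorer may work differently.

```go
package main

import "fmt"

// tierWeights are illustrative assumptions: a block held in GPU memory
// counts more than one that must first be loaded from CPU memory.
var tierWeights = map[string]float64{"gpu": 1.0, "cpu": 0.5}

// scorePods walks the block keys in prompt order. For each pod it sums the
// weight of the tier holding each block and stops at the first block the pod
// does not hold, because only a consecutive prefix can be reused.
// index maps block key -> pod name -> tier name.
func scorePods(keys []string, index map[string]map[string]string) map[string]float64 {
	scores := map[string]float64{}
	if len(keys) == 0 {
		return scores
	}
	// Only pods holding the first block can have a non-empty prefix match.
	active := map[string]bool{}
	for pod := range index[keys[0]] {
		active[pod] = true
	}
	for _, key := range keys {
		holders := index[key]
		for pod := range active {
			tier, ok := holders[pod]
			if !ok {
				delete(active, pod) // prefix broken for this pod
				continue
			}
			scores[pod] += tierWeights[tier]
		}
		if len(active) == 0 {
			break
		}
	}
	return scores
}

func main() {
	index := map[string]map[string]string{
		"b1": {"pod-a": "gpu", "pod-b": "cpu"},
		"b2": {"pod-a": "gpu", "pod-b": "cpu"},
		"b3": {"pod-b": "cpu"},
	}
	fmt.Println(scorePods([]string{"b1", "b2", "b3"}, index))
}
```

In this example pod-a holds fewer blocks than pod-b but still scores higher, since its two blocks sit in the faster tier.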

What's Changed

Fix examples precalculated hashes by @vMaroon in #141
feat: make kvcache.Indexer as gRPC Service by @zhengkezhou1 in #109
Fix: division by zero in metrics logging by @yankay in #145
feat: Add Valkey and RDMA support for KV-cache indexing by @rishi-jat in #139
feat: Add UDS-based external tokenizer service by @delavet in #137
Fix LookupHits metrics not work by @Frapschen in #146
[feat] Add tolerations for chart by @my-git9 in #149
[misc] valkey doc repositioning by @vMaroon in #154
Update README.md to enhance clarity of flowchart labels and descriptions by @samzong in #157
perf(prefixstore): sync.Map are much faster in read-intensive applications by @samber in #156
Add support for local tokenizer files by @pierDipi in #142
Add @delavet as /services/uds_tokenizer owner by @vMaroon in #164
Add @osswangxining as /services/uds_tokenizer owner by @vMaroon in #168
Implementation for Tiering in KV-Cache-Manager by @Jay-Pd in #150
fix: rename LookupHits metric to MaxHitsPerPod to better reflect what's tracked by @kyanokashi in #160
fix online example chart format by @delavet in #171
Minor fix for KV Device Tier by @Jay-Pd in #172
[Fix] Ensure Correct Logger Usage by Replacing klog.FromContext with log.FromContext by @hyeongyun0916 in #169
refactor(tokenizer): Unify interface for RenderChatTemplate and eliminate object creation overhead by @hyeongyun0916 in #163
Minor logger fix by @vMaroon in #173
General refactoring for v0.4.0 by @vMaroon in #174
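
The prefix-store change in this list (#156) relies on `sync.Map` suiting read-heavy workloads. Below is a small sketch of that idea, assuming a cache keyed by prompt text; the `tokenCache` type and its methods are hypothetical and are not the project's prefix store.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenCache is a hypothetical read-mostly cache from prompt text to tokens.
// sync.Map suits this access pattern: keys are written once and read often.
type tokenCache struct {
	m sync.Map // prompt (string) -> tokens ([]uint32)
}

// get returns the cached tokens for prompt, if any.
func (c *tokenCache) get(prompt string) ([]uint32, bool) {
	v, ok := c.m.Load(prompt)
	if !ok {
		return nil, false
	}
	return v.([]uint32), true
}

// getOrCompute returns the cached tokens or tokenizes and stores them.
// Under a race, LoadOrStore keeps whichever value was stored first.
func (c *tokenCache) getOrCompute(prompt string, tokenize func(string) []uint32) []uint32 {
	if toks, ok := c.get(prompt); ok {
		return toks
	}
	v, _ := c.m.LoadOrStore(prompt, tokenize(prompt))
	return v.([]uint32)
}

func main() {
	var c tokenCache
	// Toy tokenizer: a single token equal to the prompt length.
	tokenize := func(s string) []uint32 { return []uint32{uint32(len(s))} }
	c.getOrCompute("hello", tokenize)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if toks, ok := c.get("hello"); !ok || toks[0] != 5 {
				panic("unexpected cache miss")
			}
		}()
	}
	wg.Wait()
	fmt.Println("all readers hit the cache")
}
```

A plain map guarded by a mutex is often the better choice when writes are frequent, so the benefit here depends on the workload staying read-dominated.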

New Contributors

@rishi-jat made their first contribution in #139
@delavet made their first contribution in #137
@Frapschen made their first contribution in #146
@samzong made their first contribution in #157
@samber made their first contribution in #156
@pierDipi made their first contribution in #142
@Jay-Pd made their first contribution in #150
@kyanokashi made their first contribution in #160
@hyeongyun0916 made their first contribution in #169

Full Changelog: v0.3.2...v0.4.0

Contributors

yankay, samber, and 12 other contributors


02 Oct 11:41

vMaroon

v0.3.2

994abf2

v0.3.2

What's Changed

Bump helm-chart Image by @vMaroon in #66
Doc Enhancements by @vMaroon in #73
Update LICENSE by @vMaroon in #74
Fix README Diagram by @vMaroon in #75
Enhance README Diagram Clarity by @vMaroon in #78
Fix kv_events offline example by @irar2 in #82
fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
fix: correct shell command substitution syntax in Makefile by @yankay in #81
Optimized chat completions library, build support and testing infrastructure by @guygir in #79
Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
KVEvents/others minor refactoring by @vMaroon in #88
Add InMemoryIndex unit tests by @sagiahrac in #86
Add instrumentedIndex basic unit tests by @sagiahrac in #87
docs: fix mermaid chart arrow syntax by @Zerohertz in #93
Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
Tokenization unit tests by @sagiahrac in #90
feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
[CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
[docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106
Add Prow GitHub Actions by @Jooho in #117
fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
Implement auto-assign for reviewers without write permissions by @vMaroon in #123
feat: Add a SliceMapE function for handle errors and add unit tests by @WillardHu in #119
add benchmark data by @vMaroon in #129
[feat]support specifying imagePullSecrets for chart by @my-git9 in #130
Support new KVEvents format by @vMaroon in #132
Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118
add liu-cong as reviewer by @vMaroon in #135
chore: Fix outdated golangci-lint installation URL by @zhengkezhou1 in #136
Align with recent vLLM kv-block hashing changes by @vMaroon in #138
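
The list above mentions a `SliceMapE` helper for mapping over a slice with error handling (#119). A generic version of that idea could look like the sketch below. The lowercase name `sliceMapE`, its signature, and the fail-fast behaviour are assumptions, not the project's actual helper.

```go
package main

import (
	"errors"
	"fmt"
)

// errOdd is a stand-in error used by the demo mapping function.
var errOdd = errors.New("odd value")

// sliceMapE applies fn to every element of in and collects the results.
// It stops at the first error and reports the index of the failing element.
func sliceMapE[T, U any](in []T, fn func(T) (U, error)) ([]U, error) {
	out := make([]U, 0, len(in))
	for i, v := range in {
		u, err := fn(v)
		if err != nil {
			return nil, fmt.Errorf("element %d: %w", i, err)
		}
		out = append(out, u)
	}
	return out, nil
}

// half is a demo mapping function that fails on odd input.
func half(n int) (int, error) {
	if n%2 != 0 {
		return 0, errOdd
	}
	return n / 2, nil
}

func main() {
	halves, err := sliceMapE([]int{2, 4, 6}, half)
	fmt.Println(halves, err)

	_, err = sliceMapE([]int{2, 3}, half)
	fmt.Println(errors.Is(err, errOdd))
}
```

Wrapping the error with `%w` keeps the original error available to `errors.Is` while adding the position of the element that failed.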

New Contributors

@irar2 made their first contribution in #82
@yankay made their first contribution in #80
@guygir made their first contribution in #79
@sagiahrac made their first contribution in #84
@Zerohertz made their first contribution in #93
@Jooho made their first contribution in #117
@WillardHu made their first contribution in #110
@my-git9 made their first contribution in #130

Full Changelog: v0.2.1...v0.3.2-rc1

Contributors

yankay, Jooho, and 8 other contributors


25 Sep 23:05

vMaroon

v0.3.1

598637a

v0.3.1

What's Changed

Add Prow GitHub Actions by @Jooho in #117
fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
Implement auto-assign for reviewers without write permissions by @vMaroon in #123
feat: Add a SliceMapE function for handle errors and add unit tests by @WillardHu in #119
add benchmark data by @vMaroon in #129
[feat]support specifying imagePullSecrets for chart by @my-git9 in #130
Support new KVEvents format by @vMaroon in #132
Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118

New Contributors

@Jooho made their first contribution in #117
@WillardHu made their first contribution in #110
@my-git9 made their first contribution in #130

Full Changelog: v0.3.0...v0.3.1

Contributors

yankay, Jooho, and 3 other contributors


04 Sep 17:05

vMaroon

v0.3.0

68d56a3

v0.3.0

Summary

Production-ready OpenAI Chat-Completions preprocessing library
Synchronous tokenization with caching
Expanded benchmarking and stronger test coverage
General code and documentation improvements

What's Changed

Bump helm-chart Image by @vMaroon in #66
Doc Enhancements by @vMaroon in #73
Update LICENSE by @vMaroon in #74
Fix README Diagram by @vMaroon in #75
Enhance README Diagram Clarity by @vMaroon in #78
Fix kv_events offline example by @irar2 in #82
fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
fix: correct shell command substitution syntax in Makefile by @yankay in #81
Optimized chat completions library, build support and testing infrastructure by @guygir in #79
Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
KVEvents/others minor refactoring by @vMaroon in #88
Add InMemoryIndex unit tests by @sagiahrac in #86
Add instrumentedIndex basic unit tests by @sagiahrac in #87
docs: fix mermaid chart arrow syntax by @Zerohertz in #93
Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
Tokenization unit tests by @sagiahrac in #90
feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
[CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
[docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106

New Contributors

@irar2 made their first contribution in #82
@yankay made their first contribution in #80
@guygir made their first contribution in #79
@sagiahrac made their first contribution in #84
@Zerohertz made their first contribution in #93

Full Changelog: v0.2.1...v0.3.0-rc1

Contributors

yankay, irar2, and 4 other contributors


24 Jul 13:00

vMaroon

v0.2.1

56b4bd5

v0.2.1

What's Changed

kvevents Package Build Data Exportation by @vMaroon in #61
Update Tokenizer Release Version by @vMaroon in #63
Remove Default StorageClass: "ocs-storagecluster-cephfs" by @dumb0002 in #64

New Contributors

@dumb0002 made their first contribution in #64

Full Changelog: v0.2.0...v0.2.1

Contributors

dumb0002 and vMaroon


19 Jul 21:54

vMaroon

v0.2.0

8a60b22

v0.2.0

What's Changed

Introduced vLLM-Native KV-Events processing and new indexing backends
- In-Memory index (default): KV-Events are digested and stored in memory
- Redis index
Added observability and real-time Prometheus metrics
- Tracks KV-Block admissions, evictions, lookups and hit-rates
Enhanced configurability
Updated integration in llm-d-inference-scheduler (accurate prefix-cache aware scorer)
Initial support for OpenAI-compatible Chat Completions templating (library)
Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
General documentation improvements
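
The in-memory index and the metrics described above can be pictured with a small sketch: an index that digests "stored" and "removed" events per pod, answers lookups, and tracks a hit rate. All names here (`index`, `add`, `evict`, `lookup`, `hitRate`) are hypothetical, and plain counters stand in for Prometheus metrics.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// index is a hypothetical in-memory KV-block index: block key -> set of pods.
type index struct {
	mu      sync.Mutex
	blocks  map[string]map[string]struct{}
	lookups int
	hits    int
}

func newIndex() *index {
	return &index{blocks: make(map[string]map[string]struct{})}
}

// add records that pod now holds the given blocks (a "stored" event).
func (ix *index) add(pod string, keys ...string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, k := range keys {
		if ix.blocks[k] == nil {
			ix.blocks[k] = make(map[string]struct{})
		}
		ix.blocks[k][pod] = struct{}{}
	}
}

// evict records that pod dropped the given blocks (a "removed" event).
func (ix *index) evict(pod string, keys ...string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for _, k := range keys {
		delete(ix.blocks[k], pod)
		if len(ix.blocks[k]) == 0 {
			delete(ix.blocks, k)
		}
	}
}

// lookup returns the pods holding key, sorted, and updates the counters.
func (ix *index) lookup(key string) []string {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.lookups++
	pods := make([]string, 0, len(ix.blocks[key]))
	for p := range ix.blocks[key] {
		pods = append(pods, p)
	}
	sort.Strings(pods)
	if len(pods) > 0 {
		ix.hits++
	}
	return pods
}

// hitRate returns hits/lookups, or 0 before any lookup to avoid dividing by zero.
func (ix *index) hitRate() float64 {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if ix.lookups == 0 {
		return 0
	}
	return float64(ix.hits) / float64(ix.lookups)
}

func main() {
	ix := newIndex()
	ix.add("pod-a", "b1", "b2")
	ix.add("pod-b", "b1")
	fmt.Println(ix.lookup("b1"))
	ix.evict("pod-a", "b1", "b2")
	fmt.Println(ix.lookup("b2"))
	fmt.Println(ix.hitRate())
}
```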

PRs

(chore): typo in tokenizer file by @buraksekili in #39
[KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
[KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
[KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
Enhance Configurability by @vMaroon in #55
Update configuration.md by @vMaroon in #56
Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
Completions-Support (#50) Extension by @guygir in #58

New Contributors

@buraksekili made their first contribution in #39
@guygir made their first contribution in #58

Full Changelog: v0.1.1...v0.2.0-RC1

Contributors

kfirtoledo, buraksekili, and 2 other contributors


03 Jun 07:29

vMaroon

v0.1.1

c7f0332

v0.1.1 Pre-release


What's Changed

Update OWNERS by @vMaroon in #25
Update CONTRIBUTING.md by @clubanderson in #27
Update README.md by @clubanderson in #28
Update CONTRIBUTING.md by @clubanderson in #35
Refactor Redis config to use redis.Options struct by @relyt0925 in #37

New Contributors

@clubanderson made their first contribution in #27
@relyt0925 made their first contribution in #37

Full Changelog: v0.1.0...v0.1.1

Contributors

clubanderson, relyt0925, and vMaroon


20 May 09:11

vMaroon

v0.1.0

fec714f

v0.1.0 Pre-release


What's Changed

Merge Dev by @vMaroon in #18
[build]: Add pre-commit target by @kfirtoledo in #20
Fix link by @oglok in #21
Misc by @vMaroon in #22
Fix Dockerfile by @vMaroon in #24

New Contributors

@oglok made their first contribution in #21

Full Changelog: 0.0.3...v0.1.0

Contributors

oglok, kfirtoledo, and vMaroon

