Releases: llm-d/llm-d-kv-cache
v0.5.0-RC1
High-Level Summary
- vLLM-native tokenization & chat-completions: Tokenization and chat-template rendering are now fully aligned on vLLM, improving performance, correctness, and observability.
- LoRA-aware KV-Cache indexing / multi-model setups: KV-block hashing and events were updated to properly isolate and reuse KV across LoRA adapters.
- KVEvents maturation: Automatic vLLM pod discovery and improved ZMQ subscription management enable more reliable event-driven KV workflows and active-active multi-replica scheduler HA.
- Storage offloading connector landed: CPU and storage-based KV offloading support is now in-tree, forming the foundation for tiered KV-cache architectures.
- Production hardening: Redis, indexing, CI, and build reliability were significantly improved to support stable large-scale deployments.
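As a rough illustration of LoRA-aware KV-block indexing, the sketch below chains a hash over fixed-size token blocks and seeds the chain with the adapter name, so identical prompts under different adapters map to different keys. Everything here is an assumption for illustration: the `blockHashes` name, the FNV-1a hash, and the block size of 4 are hypothetical and do not reflect the hashing actually used by this library or by vLLM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// blockHashes is a hypothetical helper. It splits tokens into full blocks of
// blockSize and chains a hash over (parent hash, block tokens). The salt
// (for example a LoRA adapter name) seeds the chain, so the same tokens under
// different adapters yield different keys. A trailing partial block is skipped.
func blockHashes(tokens []uint32, blockSize int, salt string) []uint64 {
	if blockSize <= 0 {
		return nil
	}
	// Seed the chain with the salt; an empty salt stands for the base model.
	seed := fnv.New64a()
	seed.Write([]byte(salt))
	parent := seed.Sum64()

	var out []uint64
	buf := make([]byte, 8)
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		h := fnv.New64a()
		// Mix in the parent hash so each key depends on the whole prefix.
		binary.LittleEndian.PutUint64(buf, parent)
		h.Write(buf)
		for _, t := range tokens[i : i+blockSize] {
			binary.LittleEndian.PutUint32(buf[:4], t)
			h.Write(buf[:4])
		}
		parent = h.Sum64()
		out = append(out, parent)
	}
	return out
}

func main() {
	tokens := []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	base := blockHashes(tokens, 4, "")
	lora := blockHashes(tokens, 4, "my-adapter")
	fmt.Println("blocks:", len(base), "keys differ:", base[0] != lora[0])
}
```

Seeding the chain, rather than salting each block separately, means a single adapter name isolates every downstream block key while still allowing reuse within the same adapter.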
What's Changed
- add metrics for tokenization and rendering chat template by @delavet in #185
- refactor: Redis memory optimization and test reliability improvements by @natoscott in #196
- feat: Simplify tokenization prefix store to single-model architecture by @sagearc in #182
- [KV-Events] Add vLLM KV cache events demo with ZMQ listener by @sagearc in #162
- build: Enable clang-format for CUDA and C++ sources by @kfirtoledo in #197
- Remove redundant documentation by @vMaroon in #198
- fix: Enable local tokenizer chat template rendering by @pierDipi in #186
- fix: The call to the tokenizer service may be canceled in advance. by @delavet in #201
- feat(kvblock): decouple key handling in index by @hhk7734 in #202
- rename package name by @setsunakute in #209
- fix: restore log output for examples and unit tests by @yankay in #214
- cleanup: remove redundant warnings from make build by @yankay in #216
- fix: Single tokenizer initialization at startup by @sagearc in #192
- remove manager from docs and readme by @setsunakute in #205
- feat: include model name in KVBlock key hash by @sagearc in #220
- [fs_connector][feat]: Add NUMA management module by @kfirtoledo in #177
- [CI] Add CI workflow to run and verify examples by @yankay in #217
- fix(kvcache): move in-memory indexer to fallback to respect external configs by @hyeongyun0916 in #227
- Limit task requeue retries on failures by @alpe in #224
- fix(redis): atomic engine key eviction to prevent race condition by @sagearc in #233
- docs: update examples configuration by @sagearc in #231
- docs: update tokenization pool configuration by @sagearc in #230
- Async tokenize exec by @alpe in #225
- [fs_connector][feat]: Add file management module (file_io) for fs connector by @kfirtoledo in #176
- [build]: Add pre-commit config by @kfirtoledo in #239
- refactor: Update BlockStored KVEvent by @sagearc in #237
- Fix Example kv_cache_index pod listing returns empty by @yankay in #229
- fix: incorrect yaml indentation by @kyanokashi in #238
- refactor: relocate tokenProcessorConfig to enable injection into kvevents.NewPool by @hyeongyun0916 in #236
- add @hyeongyun0916 as reviewer by @vMaroon in #241
- Fix python version inconsistency in DockerFile and use Makefile for build by @kaushik-rohit in #242
- feat: salt KV block hashes with LoRA adapter name by @sagearc in #243
- fix(cgo): resolve CStrings memory leaks by @sagearc in #249
- [fs_connector][feat]: Add tensor copy module (tensor_copy) for GPU and CPU block transfers by @kfirtoledo in #179
- [fs_connector][feat]: Add multithreaded worker pool (thread_pool) by @kfirtoledo in #178
- add @sagearc as reviewer by @vMaroon in #252
- [fs_connector][feat]: Add storage_offload CUDA kernel and setup config by @kfirtoledo in #180
- [fs_connector][feat]: Add Python backend implementation for fs backend of offloading connector (fs_connector) by @kfirtoledo in #181
- Refactor: Replace transformers with vLLM by @hyeongyun0916 in #234
- [fs_connector] [fix]: Return early finish after copy to CPU by @kfirtoledo in #253
- Optimize uds tokenizer performance using grpc & thread pool by @delavet in #222
- fix: remove model name from kvblock Key by @sagearc in #246
- docs : update llmd_fs_backend README.md by @kfirtoledo in #259
- Automatic vLLM pod discovery + ZMQ subscription mgmt for KVEvents by @vMaroon in #212
- refactor: convert profiling script to standard go benchmark by @kaushik-rohit in #251
- Align the interface of the uds tokenizer service with Tokenizer by @delavet in #257
- Improve CI workflow for faster environment setup by @yankay in #258
- fix profiling test by @vMaroon in #261
- edit install-python-deps by @hyeongyun0916 in #262
- Fix: override block_size in storage backend by @kfirtoledo in #260
- Remove elevran from owners by @elevran in #267
New Contributors
- @natoscott made their first contribution in #196
- @hhk7734 made their first contribution in #202
- @setsunakute made their first contribution in #209
- @alpe made their first contribution in #224
- @kaushik-rohit made their first contribution in #242
- @elevran made their first contribution in #267
Full Changelog: v0.4.0...v0.5.0-RC1
v0.4.0
Highlights
- Unified chat-template rendering interface across the different tokenizers (fixes a BOS token duplication bug), plus performance and code improvements on the templating path, by @hyeongyun0916
- Logger fix by @hyeongyun0916
- Tiered prefix-cache scoring by @Jay-Pd
- UDS-tokenizer service by @delavet and @osswangxining
- Tokenizer from local disk enablement by @pierDipi
- Valkey kvblock backend by @rishi-jat
- General enhancements by: @zhengkezhou1 @Frapschen @my-git9 @samzong @samber @kyanokashi
What's Changed
- Fix examples precalculated hashes by @vMaroon in #141
- feat: make kvcache.Indexer as gRPC Service by @zhengkezhou1 in #109
- Fix: division by zero in metrics logging by @yankay in #145
- feat: Add Valkey and RDMA support for KV-cache indexing by @rishi-jat in #139
- feat: Add UDS-based external tokenizer service by @delavet in #137
- Fix LookupHits metrics not work by @Frapschen in #146
- [feat] Add tolerations for chart by @my-git9 in #149
- [misc] valkey doc repositioning by @vMaroon in #154
- Update README.md to enhance clarity of flowchart labels and descriptions by @samzong in #157
- perf(prefixstore): sync.Map are much faster in read-intensive applications by @samber in #156
- Add support for local tokenizer files by @pierDipi in #142
- Add @delavet as /services/uds_tokenizer owner by @vMaroon in #164
- Add @osswangxining as /services/uds_tokenizer owner by @vMaroon in #168
- Implementation for Tiering in KV-Cache-Manager by @Jay-Pd in #150
- fix: rename LookupHits metric to MaxHitsPerPod to better reflect what's tracked by @kyanokashi in #160
- fix online example chart format by @delavet in #171
- Minor fix for KV Device Tier by @Jay-Pd in #172
- [Fix] Ensure Correct Logger Usage by Replacing klog.FromContext with log.FromContext by @hyeongyun0916 in #169
- refactor(tokenizer): Unify interface for RenderChatTemplate and eliminate object creation overhead by @hyeongyun0916 in #163
- Minor logger fix by @vMaroon in #173
- General refactoring for v0.4.0 by @vMaroon in #174
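The prefix-store change in #156 rests on the documented behaviour of Go's `sync.Map`, which is optimized for keys that are written once and read many times. A minimal sketch of that pattern follows; the `prefixStore` type and its methods are hypothetical and are not the project's actual prefix store.

```go
package main

import (
	"fmt"
	"sync"
)

// prefixStore is a hypothetical read-mostly cache from a prompt prefix to its
// token IDs. sync.Map avoids a global lock on the read path.
type prefixStore struct {
	m sync.Map // effectively map[string][]uint32
}

// Put stores the tokens for a prefix, replacing any earlier entry.
func (s *prefixStore) Put(prefix string, tokens []uint32) {
	s.m.Store(prefix, tokens)
}

// Get returns the cached tokens for a prefix, if present.
func (s *prefixStore) Get(prefix string) ([]uint32, bool) {
	v, ok := s.m.Load(prefix)
	if !ok {
		return nil, false
	}
	tokens, ok := v.([]uint32)
	return tokens, ok
}

func main() {
	var s prefixStore
	s.Put("hello", []uint32{1, 2})

	// Many concurrent readers, one earlier writer: the read-heavy case.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, ok := s.Get("hello"); !ok {
				panic("expected cached prefix")
			}
		}()
	}
	wg.Wait()

	tokens, ok := s.Get("hello")
	fmt.Println(tokens, ok)
}
```

For write-heavy workloads a plain map guarded by `sync.RWMutex` is often the better fit, so the choice depends on the access pattern.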
New Contributors
- @rishi-jat made their first contribution in #139
- @delavet made their first contribution in #137
- @Frapschen made their first contribution in #146
- @samzong made their first contribution in #157
- @samber made their first contribution in #156
- @pierDipi made their first contribution in #142
- @Jay-Pd made their first contribution in #150
- @kyanokashi made their first contribution in #160
- @hyeongyun0916 made their first contribution in #169
Full Changelog: v0.3.2...v0.4.0
v0.3.2
What's Changed
- Bump helm-chart Image by @vMaroon in #66
- Doc Enhancements by @vMaroon in #73
- Update LICENSE by @vMaroon in #74
- Fix README Diagram by @vMaroon in #75
- Enhance README Diagram Clarity by @vMaroon in #78
- Fix kv_events offline example by @irar2 in #82
- fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
- fix: correct shell command substitution syntax in Makefile by @yankay in #81
- Optimized chat completions library, build support and testing infrastructure by @guygir in #79
- Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
- KVEvents/others minor refactoring by @vMaroon in #88
- Add InMemoryIndex unit tests by @sagiahrac in #86
- Add instrumentedIndex basic unit tests by @sagiahrac in #87
- docs: fix mermaid chart arrow syntax by @Zerohertz in #93
- Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
- Tokenization unit tests by @sagiahrac in #90
- feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
- [CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
- [docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106
- Add Prow GitHub Actions by @Jooho in #117
- fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
- Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
- CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
- fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
- Implement auto-assign for reviewers without write permissions by @vMaroon in #123
- feat: Add a SliceMapE function for handling errors and add unit tests by @WillardHu in #119
- add benchmark data by @vMaroon in #129
- [feat]support specifying imagePullSecrets for chart by @my-git9 in #130
- Support new KVEvents format by @vMaroon in #132
- Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118
- add liu-cong as reviewer by @vMaroon in #135
- chore: Fix outdated golangci-lint installation URL by @zhengkezhou1 in #136
- Align with recent vLLM kv-block hashing changes by @vMaroon in #138
New Contributors
- @irar2 made their first contribution in #82
- @yankay made their first contribution in #80
- @guygir made their first contribution in #79
- @sagiahrac made their first contribution in #84
- @Zerohertz made their first contribution in #93
- @Jooho made their first contribution in #117
- @WillardHu made their first contribution in #110
- @my-git9 made their first contribution in #130
Full Changelog: v0.2.1...v0.3.2-rc1
v0.3.1
What's Changed
- Add Prow GitHub Actions by @Jooho in #117
- fix: Modified the download url of libtokenizers.darwin-x86_64.tar.gz by @WillardHu in #110
- Update code-ownership files to best utilize PROW + auto assign by @vMaroon in #121
- CI: Expand LRUStore Unit Tests for Partial and Prefix Matches by @yankay in #120
- fix: remove OWNERS_ALIASES and update OWNERS by @Jooho in #122
- Implement auto-assign for reviewers without write permissions by @vMaroon in #123
- feat: Add a SliceMapE function for handling errors and add unit tests by @WillardHu in #119
- add benchmark data by @vMaroon in #129
- [feat]support specifying imagePullSecrets for chart by @my-git9 in #130
- Support new KVEvents format by @vMaroon in #132
- Fix indexer behavior when no kvblock-keys are generated by @vMaroon in #118
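The `SliceMapE` helper from #119 is only named in these notes, not shown. One plausible shape for an error-aware slice map is sketched below. The generic signature, the stop-at-first-error behaviour, and the `normalize` example function are assumptions, not the project's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SliceMapE applies fn to every element of in and collects the results.
// It stops at the first error and reports the failing index.
// Hypothetical signature: the real helper may differ.
func SliceMapE[T, U any](in []T, fn func(T) (U, error)) ([]U, error) {
	out := make([]U, 0, len(in))
	for i, v := range in {
		u, err := fn(v)
		if err != nil {
			return nil, fmt.Errorf("element %d: %w", i, err)
		}
		out = append(out, u)
	}
	return out, nil
}

// errEmpty is an illustrative sentinel error.
var errEmpty = errors.New("empty name")

// normalize trims whitespace and rejects empty names (example callback).
func normalize(name string) (string, error) {
	trimmed := strings.TrimSpace(name)
	if trimmed == "" {
		return "", errEmpty
	}
	return trimmed, nil
}

func main() {
	names, err := SliceMapE([]string{" pod-a ", "pod-b"}, normalize)
	fmt.Println(names, err)

	_, err = SliceMapE([]string{"pod-a", "  "}, normalize)
	fmt.Println(errors.Is(err, errEmpty))
}
```

Wrapping with `%w` keeps the original error reachable through `errors.Is`, so callers can still match on sentinel errors.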
New Contributors
- @Jooho made their first contribution in #117
- @WillardHu made their first contribution in #110
- @my-git9 made their first contribution in #130
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Summary
- Production-ready OpenAI Chat-Completions preprocessing library
- Synchronous tokenization with caching
- Expanded benchmarking and stronger test coverage
- General code and documentation improvements
What's Changed
- Bump helm-chart Image by @vMaroon in #66
- Doc Enhancements by @vMaroon in #73
- Update LICENSE by @vMaroon in #74
- Fix README Diagram by @vMaroon in #75
- Enhance README Diagram Clarity by @vMaroon in #78
- Fix kv_events offline example by @irar2 in #82
- fix: Redis kvblock parsing bugs and add basic unit tests by @yankay in #80
- fix: correct shell command substitution syntax in Makefile by @yankay in #81
- Optimized chat completions library, build support and testing infrastructure by @guygir in #79
- Remove redundant keys return from Index.Lookup interface by @sagiahrac in #84
- KVEvents/others minor refactoring by @vMaroon in #88
- Add InMemoryIndex unit tests by @sagiahrac in #86
- Add instrumentedIndex basic unit tests by @sagiahrac in #87
- docs: fix mermaid chart arrow syntax by @Zerohertz in #93
- Chat-Completions Enhancements: Updated Examples + Code Improvements by @guygir in #92
- Tokenization unit tests by @sagiahrac in #90
- feat: Add Synchronous Tokenization Support to Tokenization Pool by @sagiahrac in #95
- [CI]: added some index-related test cases while refactoring the test code to be more concise. by @yankay in #102
- [docs] Update KV-Events and KV-Cache examples with correct paths and commands by @yankay in #106
New Contributors
- @irar2 made their first contribution in #82
- @yankay made their first contribution in #80
- @guygir made their first contribution in #79
- @sagiahrac made their first contribution in #84
- @Zerohertz made their first contribution in #93
Full Changelog: v0.2.1...v0.3.0-rc1
v0.2.1
v0.2.0
What's Changed
- Introduced vLLM-Native KV-Events processing and new indexing backends
  - In-Memory index (default): KV-Events are digested and stored in memory
  - Redis index
- Added observability and real-time Prometheus metrics
  - Tracks KV-Block admissions, evictions, lookups and hit-rates
- Enhanced configurability
- Updated integration in llm-d-inference-scheduler (accurate prefix-cache aware scorer)
- Initial support for OpenAI-compatible Chat Completions templating (library)
- Enhanced user examples and end-to-end (vLLM <-> indexer) deployment setup
- General documentation improvements
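To make the lookup and hit-rate metrics above concrete, here is a minimal sketch of how a hit rate can be derived from two counters. It uses only the standard library, not the Prometheus client the release refers to, and the `lookupStats` type is hypothetical. The zero-lookup guard avoids the kind of division by zero that a later release fixed in metrics logging (#145).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lookupStats is a hypothetical pair of counters for index lookups.
type lookupStats struct {
	lookups atomic.Uint64
	hits    atomic.Uint64
}

// Record counts one lookup and, if it found a cached block, one hit.
func (s *lookupStats) Record(hit bool) {
	s.lookups.Add(1)
	if hit {
		s.hits.Add(1)
	}
}

// HitRate returns hits/lookups, or 0 when nothing has been recorded yet.
func (s *lookupStats) HitRate() float64 {
	n := s.lookups.Load()
	if n == 0 {
		return 0
	}
	return float64(s.hits.Load()) / float64(n)
}

func main() {
	var s lookupStats
	fmt.Println(s.HitRate()) // no lookups yet, guarded against 0/0

	for _, hit := range []bool{true, true, false, true} {
		s.Record(hit)
	}
	fmt.Println(s.HitRate())
}
```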
PRs
- (chore): typo in tokenizer file by @buraksekili in #39
- [KV-Events] Introduce KV-Block Indexing Backends - Part 1 of 3 by @vMaroon in #40
- fix: replace llm-d tag to 0.0.8 by @kfirtoledo in #42
- docs: Add a setup documentation about examples/kv-cache-index by @buraksekili in #38
- [KV-Events] KV-Events Processing - Part 3 of 3 by @vMaroon in #44
- Matched Default TokenProcessorConfig.BlockSize with vLLM's by @vMaroon in #52
- [KVBlock.Index] Prometheus Metrics & Logging by @vMaroon in #53
- Enhance Configurability by @vMaroon in #55
- Update configuration.md by @vMaroon in #56
- Implement Metrics Logging Configuration in Indexer by @vMaroon in #57
- Completions-Support (#50) Extension by @guygir in #58
New Contributors
- @buraksekili made their first contribution in #39
- @guygir made their first contribution in #58
Full Changelog: v0.1.1...v0.2.0-RC1
v0.1.1
What's Changed
- Update OWNERS by @vMaroon in #25
- Update CONTRIBUTING.md by @clubanderson in #27
- Update README.md by @clubanderson in #28
- Update CONTRIBUTING.md by @clubanderson in #35
- Refactor Redis config to use redis.Options struct by @relyt0925 in #37
New Contributors
- @clubanderson made their first contribution in #27
- @relyt0925 made their first contribution in #37
Full Changelog: v0.1.0...v0.1.1