Releases · ModelEngine-Group/unified-cache-management

13 May 09:35

qyh111

v0.5.0

52b0f54

v0.5.0 Latest


Highlights

1. DeepSeek V4 Flash-oriented integration: HMA + FAWA

HMA with FAWA: Strengthens HMA for large-model deployments (including DeepSeek V4 Flash–class setups). #953

2. Performance: Layerwise & Cache Store

Layerwise KV load patch (vLLM-Ascend v0.18.0 / DSA-oriented paths). #911
Pipeline-friendly shard task submission in CacheStore. #888
Buffer reservation to mitigate load starvation for high-priority traffic. #895
Full-graph replay for GSA sparse attention scenarios. #907
Backend-only KV cache load mode. #951
KV cache storage threshold and related storage controls. #925
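
The buffer reservation idea from #895 can be illustrated with a toy pool. This is only a minimal sketch, not UCM's TransBuffer code: the class name `ReservedBufferPool`, its methods, and the treatment of loads as the high-priority traffic are assumptions made for illustration.

```python
class ReservedBufferPool:
    """Toy pool: low-priority callers may not dip into the reserved buffers."""

    def __init__(self, total: int, reserved: int) -> None:
        if not 0 <= reserved <= total:
            raise ValueError("reserved must be between 0 and total")
        self.total = total
        self.reserved = reserved
        self.free = total

    def acquire(self, high_priority: bool) -> bool:
        # Low-priority requests must leave `reserved` buffers untouched,
        # so a burst of dumps can never consume the last buffers a load needs.
        floor = 0 if high_priority else self.reserved
        if self.free > floor:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        if self.free >= self.total:
            raise RuntimeError("release without a matching acquire")
        self.free += 1


# Hypothetical usage: dumps (low priority) exhaust their share,
# yet a load (high priority) can still obtain the reserved buffer.
pool = ReservedBufferPool(total=4, reserved=1)
dump_grants = [pool.acquire(high_priority=False) for _ in range(4)]
load_grant = pool.acquire(high_priority=True)
```

With four buffers and one reserved, the fourth low-priority request is refused while the high-priority request still succeeds.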

3. Compress Store

UCM store compression module & compression config. #940
Compress Store logging & UX improvements. #949

4. SGLang

SGLang + UCM integration via dynamic backend loading. #886
SGLang quickstart docs & Dockerfile updates. #891

5. Other (stability / observability / tooling / engineering)

Profiling switch for performance testing. #875
DP fix: non-DP0 ranks updating file hotness correctly. #884
Connector metadata assert fix. #897
Duplicate metrics registration fix. #901
Hit-rate recording and follow-ups (1TP, partial-hit dump, etc.). #917 · #926 · #927
Trace generation support. #912
KV cache calculator: new model support. #899
TransBuffer / dump edge cases (reservation & ring bounds). #924
KVCacheLayout row-count fix. #920
MTP layer duplicate wait_for_layer_load fix. #932
Stricter save/load exception handling & cleanup. #929
vLLM 0.18.x–related patches (e.g. finalize_kv_cache, local_cache_hit). #934 · #941
Multi-node DP / multi-process GC fix. #937
Docs: distributed PD disaggregation on Ascend. #921
CI / packaging: Dockerfile refactor, metrics config in package, long e2e removed from PR gate. #902 · #916 · #947
Logging rate-limit switch, TPOT test + config updates. #931 · #944
Docs: quickstart & supported compute platforms. #890 · #896
Release housekeeping: 0.5.0rc2 & bump to 0.5.0. #915 · #956
Packaging (__init__.py, etc.). #935 · #943

Full Changelog: v0.5.0rc1...v0.5.0

What's Changed

[Feat]: add profiling switch for performance test by @harrisonyhq in #875
[BugFix] Fix the bug where non-DP0 processes fail to update file hotness in DP scenarios. by @UESTC-AHao in #884
[Perf] Pipeline-friendly shard task submission in CacheStore by @mag1c-h in #888
[doc] Update quickstart and calculator config by @yuanzhg078 in #890
[doc] Update supported compute platforms by @yuanzhg078 in #896
[Bugfix] Fix the bug of connector matadata assert failure by @sumingZero in #897
[Fix] Duplicate metrics registration by @flesher0813 in #901
[opt] Prevent Load starvation by reserving buffers for high-priority operations by @mag1c-h in #895
[CI] refactor dockerfile by @dante159753 in #902
[opt] Support full-graph replay for GSA-enabled sparse attention by @wangwenxin0312 in #907
feat(kv_cache_calculator): Add New Model Supports by @Potterluo in #899
[doc] Update SGLang quickstart docs & Update SGLang Dockerfile by @pyxyzc in #891
[Feature] Add SGLang UCM integration via dynamic backend loading by @pyxyzc in #886
[Opt] Support generate traces by @flesher0813 in #912
release 0.5.0rc2 by @qyh111 in #915
[CI] add metrics config into package by @dante159753 in #916
[Feat] add layerwise KV load patch for vllm-ascend v0.18.0 on DSA model by @sumingZero in #911
[Feat] Support recording hit rate by @flesher0813 in #917
[Fix]TransBuffer: shared freeHead let Dump bypass the reserved limit and ring bounds by @qyh111 in #924
[Opt] Support setting a threshold for kv_cache storage by @flesher0813 in #925
[Fix] Not print hit rate when using 1tp by @flesher0813 in #926
[Fix] Fix dump err when 0 < len(load_blocks) < fully hit by @flesher0813 in #927
[Fix] Update KVCacheLayout to set row counts. by @hmy98213 in #920
[bugfix]fix mtp_layer calls wait_for_layer_load multiple times by @qyh111 in #932
[Fix] Strengthen save/load exception check and remove unused codes by @flesher0813 in #929
[fix]Add patch for finalize_kv_cache in 0.18.0rc1 by @qyh111 in #934
Add __init__.py by @qyh111 in #935
[BugFix] Fix the issue of multiple processes enabling GC in multi-node DP scen… by @UESTC-AHao in #937
[feat] Add switch to log rate limit by @dante159753 in #931
[BugFix] Add patch for local_cache_hit calculation in vllm 0.18.0 met… by @sumingZero in #941
[Doc] Distributed PD Disaggregation on Ascend by @sumingZero in #921
[Bugfix] Add __init__.py by @sumingZero in #943
[CI] Remove long-running e2e tests from PR gate by @mag1c-h in #947
feat(test): Add TPOT metric calculation and update test configuration by @Potterluo in #944
[Feat] Add UCM store compression module and compression config parametersF...

Contributors

dante159753, mag1c-h, and 13 other contributors


28 Apr 08:05

qyh111

v0.5.0rc1

82467ce

v0.5.0rc1 Pre-release


What's Changed

[Feat] Adding an environment variable input by @Menglths in #837
[opt] Add parallel size check in CI by @flesher0813 in #844
[fix]Add timeout in PosixStore working thread by @qyh111 in #849
[feat] Ascend: add mmap-based Host memory for O_DIRECT support by @mag1c-h in #850
[opt] Remove redundant step(Install GTest) from the workflow. by @mag1c-h in #851
[opt] Cancel the Load/Dump task proactively after it times out. by @mag1c-h in #853
[Feat] Add mindie-llm support by @nrj868 in #848
[opt] Cancel all timed-out tasks at once. by @mag1c-h in #859
[bugfix] Update seq_lens_list only on NPU path by @wangwenxin0312 in #860
[feat] Store provides the ability to configure CPU affinity. by @mag1c-h in #852
[opt] Optimize GSA by fusing operators by @Fengli5355 in #861
[feat] Support kvcsstore by @ayaka836 in #855
[feat] Add NUMA-aware CPU core split for vllm worker and store threads by @wangwenxin0312 in #854
[Bugfix] fix offline patch by @Infinite666 in #864
[fix] Update patch to track mindie changes by @nrj868 in #863
[CI] Add gsa online test by @dante159753 in #857
[opt] stack-protector on EmptyStore by @mag1c-h in #866
[opt] Enhance NPU CPU affinity resolution with NUMA fallback by @wangwenxin0312 in #865
[bugfix] Get all CPUs for the device's local socket by @wangwenxin0312 in #869
add timeout for PR gate pipeline by @dante159753 in #862
[refactor] Remove NVML-based CPU affinity setting by @wangwenxin0312 in #870
[feat] Add tensor parallel size support and update GPU memory utilization for online inference tests by @hmy98213 in #874
[fix]Submit layerwise KV load tasks one layer at a time by @qyh111 in #783
[feat] log with rate limit by @Lijiachen1018 in #821
[Feat] adapt for DSA model on CUDA platform by @sumingZero in #871
[Build] Update UCM Dockerfiles for vLLM/vLLM-Ascend v0.17.0 by @yuanzhg078 in #876
[Usage] Move use layerwise and hit ratio into config file by @harrisonyhq in #784
[test] multi-processor test on AIO and Shm by @mag1c-h in #873
[Feature] Garbage Collection by @UESTC-AHao in #777
[CI] add test set config by @dante159753 in #877
[bugfix] Fix MLA block_table row mapping by @wangwenxin0312 in #882
[doc] Update support matrix. by @yuanzhg078 in #880
release 0.5.0rc1 by @yuanzhg078 in #881
[doc] Update framework version compatibility notes by @yuanzhg078 in #885

New Contributors

@nrj868 made their first contribution in #848

Full Changelog: v0.4.0...v0.5.0rc1

Contributors

dante159753, Infinite666, and 14 other contributors


20 Mar 10:13

sumingZero

v0.4.0

bff8e9b

v0.4.0

Highlights

Support SGLang

UCM is now integrated with SGLang, enabling prefix cache offloading to Posix Store to reduce redundant computation and lower TTFT (#757).
See Quickstart-SGLang: https://ucm.readthedocs.io/en/latest/getting-started/quickstart_sglang.html

Refactor PipelineStore for Scalability and Performance

Refactor PipelineStore into a modular, plugin-based architecture with automatic registration and runtime loading (#689)
Improve overall performance through optimized Store implementations (e.g., cache store, posix store) and execution flow (#722, #744, #787)

UCM Connector

UCM now additionally supports advanced parallel paradigms, including PCP / DCP and PP, enabling more flexible and scalable distributed execution (#750)
Improve UCM connector performance by introducing optional event synchronization control (#768)

Inference Enhancement Features

GSAOnDevice sparse attention algorithm has been upgraded with improved performance and accuracy, now fully supporting vLLM / vLLM-Ascend 0.11.0 (#659, #746, #729)
Add support for Rerope in vLLM version 0.11.0 (#686)
Enhance UCM logger compatibility (#760)

Documentation

Add feature and model support matrix
Extend UCM Store with scalable storage, persistence, and efficient data handling
(see: https://ucm.readthedocs.io/en/latest/developer-guide/extending_store.html)

What's Changed

[fix] Adapt ESA to the LayerWiseConnector by @wangwenxin0312 in #681
[doc] Add Code of Conduct by @yuanzhg078 in #684
[Opt] GsaOnDevice cuda bugfix & optimization by @wangwenxin0312 in #659
[CI] Modify pull request template by @yuanzhg078 in #687
[Feature] rewrite logger module by @Lijiachen1018 in #608
[refactor] Rename global rank and remove broadcast function by @harrisonyhq in #685
[fix] clean code by @Lijiachen1018 in #688
[opt] Refactor PipelineStore for Enhanced Scalability by @mag1c-h in #689
[fix] Fix logger by @Lijiachen1018 in #690
[doc] How to extend UCM Store by @mag1c-h in #692
[CI] logger use zlibstatic by @Lijiachen1018 in #698
[Bugfix] Cherry-pick modify worker_id to distinguish diff workers(#691) by @flesher0813 in #701
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #699
[feat] rerope feature for vllm0.11.0 by @xinSky00 in #686
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #707
[bugfix] fix debug log printing by @Lijiachen1018 in #706
[bugfix] Fixed the issue of invalid LocalBuffer pointers in PCStore by @mag1c-h in #715
[bugfix] rerope feature for vllm0.9.2 and git apply merging by @xinSky00 in #708
[CICD] run e2e test in docker by @dante159753 in #712
[Feature] Add readme and dataset in performance and evaluation test by @zzycode1005 in #721
[bugfix] Adaptive modification of llmperf by @Menglths in #719
[Feat] sparse patch for vllm-ascend v0.11.0 by @Infinite666 in #718
[Bugfix]Fix garbled output when tp > 1 by @qyh111 in #716
[perf] Copy Bandwidth Optimize: Multi-Stream parallelism supported in CacheStore by @mag1c-h in #722
[Feat] sparse patch for gsa on device(GQA) va0.11.0rc1 by @Infinite666 in #726
[Feat]Add layerwise and log_path config in run.sh by @qyh111 in #724
[opt] Default depth of the waiting queue needs to be increased by nShard times for layer-wise by @mag1c-h in #731
[Feat] Reuse-aware layer skipping under dynamic KV sparsification by @tedi20 in #725
[opt] Increase the default running queue depth to support greater concurrent requests. by @qyh111 in #733
[Feat]: Monkey patch framework for vllm 0.11.0, fix graph mode + UCM bugs by @NaganooMei in #735
[Feat] Add csrc/ascend NPU custom ops for GSA by @leideng in #729
[feat] Variable length IO supported in CacheStore by @mag1c-h in #734
[Opt] Enable concurrent prefix lookup for posixstore by @sumingZero in #739
[CI] refine docker file to use in yellow field by @dante159753 in #741
[Feat]: Implement load failure recovery via monkey patch by @NaganooMei in #738
[Opt]Split the thread pool into separate load and dump pools to prevent them from interfering with each other. by @qyh111 in #744
[opt] Print TaskId in the CacheStore Error Log by @mag1c-h in #742
[opt]Adapt variable io size by @qyh111 in #745
[Opt] Add log timestamp in run_vllm.sh by @qyh111 in #747
[bugfix & opt] gsaOnDevice for CUDA Graph mode by @wangwenxin0312 in #732
[test & bugfix] fix low dump performance in posixstore e2e test by @NaganooMei in #751
[Fix] Modify the config files of gsaondevice. by @AooooooA-C in #749
[Test] Remove memory manager abstraction in PosixStore e2e test by @NaganooMei in #753
[opt] CUDA Hamming Distance Kernel Optimization for GQA by @wangwenxin0312 in #755
[fix] fix zlib gitcode url by @Lijiachen1018 in #758
[Feature] Integrate UnifiedCache (UCM) into SGLang for Multi-Level Caching System by @pyxyzc in #757
chore(test): Ensure that unnecessary import failures do not affect test execution by @Potterluo in #754
[feat] GSAOnDevice for MLA Models Like DeepSeek V2/V3 in Ascend NPU by @leideng in #746
[Feat] sparse patch for gsa on device(MLA) va0.11.0 by @Infinite666 in #761
[Fix] fix save_speed core dump and loaded blocks num when task failed by @flesher0813 in #763
Fix batch_size_for_hamming bug when slice is disabled (vllm-ascend 0.11.0) by @leideng in #765
[Feat] adapt dcp&pcp by @flesher0813 in #750
[Fix] Add init.py for rerope. by @AooooooA-C in #769
[Refactor]monkey patch sparse feature in v0.11.0 by @ayaka836 in #743
[Opt] update deepseek r1 config by @leideng in #770
[feat] Introduce platform-specific sparse trigger thresholds for GPU and NPU by @wangwenxin0312 in #762
[opt] Define UCM_ROOT_DIR to ensure safety when used UCM as a sub-repository by @mag1c-h in #772
[opt] enable Ascend register pin optimization by @mag1c-h in #775
[fix] remove imports that specific to platform by @dante159753 in #771
[opt] supports lo...

Contributors

leideng, dante159753, and 23 other contributors


30 Jan 08:47

flesher0813

v0.3.0

8dd98d1

v0.3.0

Highlights

Refinement of PipelineStore Architecture and Enhancement of Core Capabilities #653 #711
Now supports 3FS for scalable and efficient storage backends #622
Features the new GSAOnDevice sparse attention algorithm, enabling high-performance HBM utilization across both CUDA and Ascend platforms. #647 #638
Aligned CacheBlend with the new UCM storage and sparse engine updates to support vLLM 0.9.2. #664

Known Issues

Layerwise is not supported with vLLM 0.11.0.
- Currently, an installation made with pip install uc-manager does not support vLLM 0.11.0.
- To use UCM layerwise with vLLM 0.11.0+, apply the modifications described in vllm-project/vllm#26675.

What's Changed

[bugfix]cherry-pick from 0.2.0release Fix KeyError by @qyh111 in #573
[bugfix] cherry-pick from 0.2.0release patch update by @wangwenxin0312 in #574
[fix]cherry pick from 0.2.0-release fix monitor issue (#572) by @qyh111 in #575
[bugfix] build hamming dist by @wangwenxin0312 in #577
[feat]Update data file layout to adapt to garbage collection by @qyh111 in #579
[bugfix]cherry pick from 0.2.0-release sparse patch & cmake by @wangwenxin0312 in #581
[bugfix] kvcomp config by @wangwenxin0312 in #584
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #589
feature for triton rerope by @xinSky00 in #497
[bugfix] kvcomp for qwen by @wangwenxin0312 in #594
[bugfix] share buffer used out (cherry-picked from #592) by @mag1c-h in #598
[fix]cherry-pick clean code and set local_rank_size to tp_size (#596) by @qyh111 in #600
[misc] split dependency preparation logic into individual dependency files for enhanced configuration flexibility by @mag1c-h in #597
[fix]fix clean code (#601) by @qyh111 in #602
Modify blend and rerope docs by @xinSky00 in #593
[docs] Modify blend introduction by @wuhuxiao in #605
add qiongwu as codeownner by @Infinite666 in #610
KVComp in NPU -- HBM version by @leideng in #599
[bugfix] bugfix in PCStore, cherry-pick from release by @mag1c-h in #609
[docs]Add doc for pipeline store by @qyh111 in #607
[fix] remove request_succeed_dumped_blocks() in monkey patch by @xinSky00 in #613
[fix]Sync changes from the release branch to develop. including docs、version and dockerfile by @qyh111 in #621
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) by @wangwenxin0312 in #623
[bugfix] ] Cherry-pick updates from 0.2.0-release (hamming compile) by @wangwenxin0312 in #625
[doc]rename pipline_store to pipeline_store by @qyh111 in #626
[bugfix] fix register_kv_caches patch by @Clarence-1103 in #629
Unify xSA name as GSA by @leideng in #631
[Feature] 3FS Store by @UESTC-AHao in #622
[optimize]Optimized LLMPerf Test Cases by @Potterluo in #634
[Doc] 3FS Document by @UESTC-AHao in #637
[Feat] Basic scripts for deployment best practices by @sumingZero in #556
[feature]Add LLM connection base components and OpenAI connector by @Potterluo in #636
[Bugfix] Fix 3FS by @UESTC-AHao in #650
[feat] PipelineStore Architecture Refresh and Capability Enhancement by @mag1c-h in #653
[doc] Add contributing guide by @yuanzhg078 in #648
[doc]Implement the function of a kv cache calculator html in User Guide by @Potterluo in #652
[Opt] New gsa config by @leideng in #646
[Feat] Support C++/Python to use same metrics singleton within a process by @flesher0813 in #654
[feat]Add Layerwise Connector by @qyh111 in #656
[Fix] Modify ucm_connector to adapt metrics by @flesher0813 in #658
[doc] Update quickstart section in README_zh by @yuanzhg078 in #663
[Feat] Update sparse method patches for vllm 0.11.0 by @AooooooA-C in #638
[CI] add pr gate workflow by @dante159753 in #662
[Opt] Gsa npu performance optimize by @leideng in #647
[misc] Reduce gpu utilization to 6GB in test for 1.5B model by @dante159753 in #665
[feat] add monkey patch for gsa on device v0.9.2 by @Clarence-1103 in #618
[Fix] coredump if add new c++ metrics by @flesher0813 in #666
[opt] adapt cache blend for store and sparse's new version by @wuhuxiao in #664
[Doc] Update documents related to sparse. by @AooooooA-C in #672
[CI] use requirements file to prepare test env by @dante159753 in #673
[test]Evaluate model performance and accuracy with UCM by @ayaka836 in #642
[Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism by @sumingZero in #670
[CI] remove logger, check branch up-to-date, fast fail e2e test by @dante159753 in #674
release 0.3.0 by @flesher0813 in #677
[bugfix] Fix compilation error due to missing atomic include by @harrisonyhq in #693
[Bugfix] Modify worker_id set to separate different worker by @flesher0813 in #691
[bugfix] rm unavailable lib and fix doc and update patch by @wuhuxiao in #700
[perf] Reduce directory lock conflicts during batch dumps in PosixStore by @mag1c-h in #711

New Contributors

@Infinite666 made their first contribution in #610
@dante159753 made their first contribution in #662
@ayaka836 made their first contribution in #642

Full Changelog: v0.2.0...v0.3.0

Contributors

leideng, dante159753, and 15 other contributors


05 Jan 12:28

qyh111

v0.2.0

39d46c7

v0.2.0

Highlights

Support model window extrapolation: Rectified Rotary Position Embeddings (ReRoPE) (#497).
Support sparse attention algorithms in HBM on both CUDA GPUs and Ascend NPUs. Attention is sparsified by hashing KV states and selecting the Top-K entries by Hamming distance (#559).
Add Pipeline Store, composed of Cache Store and POSIX Store (#553).
Improved KV cache transfer performance for NfsStore (#393).
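
The hashing plus Hamming-distance Top-K selection described for #559 can be sketched in a few lines. This is a rough illustration under stated assumptions, not the kernel shipped in UCM: the random-hyperplane sign hash, the helper names (`sign_hash`, `hamming`, `topk_by_hamming`), and all sizes are hypothetical choices.

```python
import random


def sign_hash(vec, planes):
    """Pack the signs of vec's projections onto each hyperplane into an int code."""
    code = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def topk_by_hamming(query_code, key_codes, k):
    """Indices of the k keys whose codes are closest to the query (ties by index)."""
    order = sorted(
        range(len(key_codes)),
        key=lambda i: (hamming(query_code, key_codes[i]), i),
    )
    return order[:k]


# Hypothetical sizes: 8-dim vectors, 16-bit codes, 32 cached keys.
rng = random.Random(0)
dim, n_bits = 8, 16
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
key_codes = [sign_hash(key, planes) for key in keys]

query = keys[5]  # a query identical to one cached key
query_code = sign_hash(query, planes)
selected = topk_by_hamming(query_code, key_codes, k=4)
```

Only the `selected` keys would then take part in attention, which is where the savings in HBM traffic come from in this kind of scheme.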

Known Issues

Sparse is not supported when installing via pip
- Currently, installing with pip install uc-manager does not support Sparse.
- Before installing via pip, please make sure to set the platform explicitly:
```
export PLATFORM=xxx
```
- To use Sparse, please install via the Docker image or build from source.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520
[bugfix]fix setup.py (#520) by @qyh111 in #521
feat(test): Add PostgreSQL support and optimize database write logic by @Potterluo in #507
[fix] move init to intergration/vllm directory by @Lijiachen1018 in #535
[Fix]Add PLATFORM reminder by @zhou-haitao in #526
cherry-pick from 0.1.0-release by @Lijiachen1018 in #552
[Feat] New Store Impl: CacheStore - PosixStore - PipelineStore by @mag1c-h in #553
[Perf] parallel block-existence checks + timeout exception by @mag1c-h in #550
[feat] Shard block files into subdirs by hash prefix, with opt-out switch by @mag1c-h in #561
[feat]use numpy to calculate addrs by @qyh111 in #564
[Bugfix] use-after-free in LookupBatch by @mag1c-h in #565
[Bugfix] skip fresh shm files to avoid race between multiple instances by @mag1c-h in #566
[Bugfix] Fix incorrect fallback in GetHostBuffer: use MakeHostBuffer instead of MakeDeviceBuffer by @mag1c-h in #568
[feat] kvcomp on device by @wangwenxin0312 in #559
[fix]Add exception handling by @qyh111 in #569
[bugfix]Fix KeyError when VLLM_HASH_ATTENTION environment variable is not set by @qyh111 in #570
[bugfix] patch update by @wangwenxin0312 in #571
[fix]fix monitor issue by @qyh111 in #572
[bugfix] build hamming dist by @wangwenxin0312 in #578
[feat] Update data file layout to adapt to garbage collection by @qyh111 in #576
[bugfix] sparse patch & cmake by @wangwenxin0312 in #580
[build]fix spdlog use ext fmt by @Lijiachen1018 in #585
[bugfix] kvcomp fix by @wangwenxin0312 in #586
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #588
[bugfix] share buffer used out by @mag1c-h in #592
[bugfix] kvcomp for qwen by @wangwenxin0312 in #595
[fix]clean code and set local_rank_size to tp_size by @qyh111 in #596
[fix]fix clean code by @qyh111 in #601
[Bugfix] update block dir permission & double-free fix by @mag1c-h in #603
[bugfix] double-release shared-block while make reader failed by @mag1c-h in #604
[docs]add doc for pipeline store by @qyh111 in #612
[feat] cherry-pick to 0.2.0-release to add rerope by @xinSky00 in #614
fix ascend patch and change version by @qyh111 in #615
add patch in dokerfile-npu by @qyh111 in #617
[feat] cherry-pick KVComp in NPU -- HBM version into the 0.2.0-release branch by @wangwenxin0312 in #619
[feat] update all patch and docs by @wangwenxin0312 in #620
[bugfix] hamming compile by @wangwenxin0312 in #624
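
The hash-prefix sharding named in #561 above can be pictured as follows. This is a sketch under assumptions: these notes do not state the real directory layout, prefix length, or switch name, so `block_path`, `shard`, and `prefix_len` are hypothetical.

```python
from pathlib import PurePosixPath


def block_path(root: str, block_id: str, shard: bool = True,
               prefix_len: int = 2) -> PurePosixPath:
    """Map a block's hex hash to a file path, optionally sharded by hash prefix."""
    base = PurePosixPath(root)
    if not shard:
        # Opt-out: every block file sits directly under the root directory.
        return base / block_id
    # Sharded: the first hex characters of the hash pick the subdirectory,
    # which keeps any single directory from accumulating every block file.
    return base / block_id[:prefix_len] / block_id


sharded = block_path("/data/kv", "a1b2c3d4")
flat = block_path("/data/kv", "a1b2c3d4", shard=False)
```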

New Contributors

@zzycode1005 made their first contribution in #462

Full Changelog: v0.1.2...v0.2.0

Contributors

mag1c-h, Lijiachen1018, and 9 other contributors


13 Dec 13:43

qyh111

v0.2.0rc1

bad9354

v0.2.0rc1 Pre-release


Highlights

Improved Prefix Cache offload/load performance.
Support Cache Blend.

Core:

Support Cache Blend (#467)
Add V1 Store Interface (#510, #518)

Known Issues

When using the Ascend platform:
- Broadcasting is not supported.
- load_only_first_rank must be set to false in the configuration.
When compiling from source, make sure to set the PLATFORM environment variable.

What's Changed

[Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
[Feature] Added environment pre-check by @Menglths in #498
[docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
[build] rewrite setup.py by @ygwpz in #501
[bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
[bugfix] fix pip install -e no so by @ygwpz in #508
[Feature] Cache Blend by @wuhuxiao in #467
merge Feature_store_next to develop by @qyh111 in #518
[bugfix]fix setup.py by @qyh111 in #520

New Contributors

@zzycode1005 made their first contribution in #462
@wuhuxiao made their first contribution in #467

Full Changelog: v0.1.2...v0.2.0rc1

Contributors

Lijiachen1018, ygwpz, and 5 other contributors


10 Dec 07:56

Lijiachen1018

v0.1.2

aa31619

v0.1.2

Some small fixes in this release.

[Docs] Documents are now easier to read.
[Docs] PD disaggregation documentation update: the --enforce-eager argument is removed from the vLLM service startup command, so graph mode is enabled by default at startup.
[Feat] UCconnector is completely removed; please use UCMConnector from now on.
[Feat] UCM supports recovery from load failure: it implements the get_block_ids_with_load_errors interface of the KVConnectorBase_V1 class, enabling vLLM to re-execute inference for requests whose KV cache failed to load from UCM.
[Build] Use pip install uc-manager==0.1.2; the install builds from source for both vllm and vllm-ascend.
[Build] The Sparse module is now built and used only if the environment variable is set: export ENABLE_SPARSE=TRUE.
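
The load-failure recovery flow can be pictured with a toy connector. This is a sketch only: `ToyConnector` and `load_blocks` are hypothetical, the class does not import or subclass vLLM's KVConnectorBase_V1, and clearing the failure set on read is an assumption. Only the method name `get_block_ids_with_load_errors` comes from the notes above.

```python
from typing import Dict, List, Set


class ToyConnector:
    """Hypothetical stand-in for a KV connector; it does not depend on vLLM."""

    def __init__(self, store: Dict[int, bytes]) -> None:
        self._store = store
        self._failed: Set[int] = set()

    def load_blocks(self, block_ids: List[int]) -> Dict[int, bytes]:
        """Load what is available and remember every block that could not be read."""
        loaded = {}
        for block_id in block_ids:
            data = self._store.get(block_id)
            if data is None:
                self._failed.add(block_id)
            else:
                loaded[block_id] = data
        return loaded

    def get_block_ids_with_load_errors(self) -> Set[int]:
        """Report failed block IDs so the engine can recompute them, then reset."""
        failed, self._failed = self._failed, set()
        return failed


connector = ToyConnector({1: b"kv-1", 3: b"kv-3"})
loaded = connector.load_blocks([1, 2, 3, 4])
failed = connector.get_block_ids_with_load_errors()
```

In this picture the engine treats the reported blocks as cache misses and recomputes them instead of failing the request.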

What's Changed

[cleancode]rm video by @Lijiachen1018 in #459
[fix] pick fixes from Release to develop by @Lijiachen1018 in #465
[cleancode]remove uc connector by @Lijiachen1018 in #460
[build] project docs for pypi by @Lijiachen1018 in #466
[build]build sparse only if enabled by @Lijiachen1018 in #470
[Misc] fetch dependence from gitcode as backup by @mag1c-h in #469
[docs] renew docs by @Lijiachen1018 in #476
release v0.1.1 by @Lijiachen1018 in #478
feat: add MetaX MACA device support for PC by @simshi in #387
[Docs] PD disaggregation documentation update by @sumingZero in #479
[Feat] UCM supports recovery form load failure by @sumingZero in #477
[feat]Add configurable scattergatter by @qyh111 in #483
[bugfix]add synchronize on ascend platform by @qyh111 in #485
[build] fix build by source distribution by @Lijiachen1018 in #484
release v0.1.2 by @Lijiachen1018 in #491
develop merge into main by @ygwpz in #492
[docs] fix links in docs and add clarifications by @Lijiachen1018 in #499

New Contributors

@simshi made their first contribution in #387

Full Changelog: v0.1.0...v0.1.2

Contributors

simshi, mag1c-h, and 4 other contributors


02 Dec 08:42

Lijiachen1018

v0.1.0

5ba2684

v0.1.0

We are excited to announce the first official release of Unified Cache Manager.

Highlights

Offload Prefix Cache to storage.
Homogeneous/heterogeneous PD disaggregation.
Training-free sparsity for accelerating inference (vllm==0.9.2, vllm-ascend==0.9.2rc1) in #199, #335, #190, #451

Core:

Garbage collection for store in #315 and #312
Adapt to vllm and vllm-ascend in #13, #292, #415 and #362
UCM supports online metrics display via Grafana and Prometheus in #414, with docs in #416

Known Issues

When using the Ascend platform, please note:

Broadcast is not supported.
Set load_only_first_rank: false in the config.

Others

Update documents
Tools for performance tuning, hyperparameter optimization in #418

What's Changed

[opt] Share Infra implementation and unify status codes by @mag1c-h in #399
[bugfix] Fix ESA to be compatible with the latest NFSStore. by @wangwenxin0312 in #401
release v0.1.0rc4 by @Lijiachen1018 in #402
[opt] Remove unused cc impl of dramstore by @mag1c-h in #406
[Fix]remove dram docs and modify quick-start doc by @hero0307 in #411
[Feature] Added performance testing tool based on the PyTest testing framework by @Menglths in #295
[Misc] Add cpp-linter.yml by @mag1c-h in #422
[docs]add metrics doc by @hero0307 in #416
[perf] Modify CUDA SIMD and add Triton hash encoder by @Clarence-1103 in #408
[bugfix] batch trans on cuda with SM return 700 error by @mag1c-h in #434
[Misc] set default logger backend to spdlog by @mag1c-h in #440
[rebase]Dev-ucm-v1 rebase to develop by @Lijiachen1018 in #453
[cleancode] remove dramstore by @Lijiachen1018 in #455
Fix metrics by @Lijiachen1018 in #456

New Contributors

@Menglths made their first contribution in #295

Full Changelog: v0.1.0rc4...v0.1.0

Contributors

mag1c-h, Lijiachen1018, and 4 other contributors


22 Nov 10:16

Lijiachen1018

v0.1.0rc4

5779ce9

v0.1.0rc4 Pre-release


What's Changed

[feat] ucmtrans: Unify API for Device-Host Memory Transfers by @mag1c-h in #379
[feat] Add support for Ascend device memory transfers by @mag1c-h in #382
[Fix] fix build, fix no save kv layer by @Lijiachen1018 in #390
[feat] Add pcstore for enhanced PrefixCache performance by @FangRun2 in #393
[fix] fix ascend attention by @Lijiachen1018 in #394
release v0.1.0rc3 by @Lijiachen1018 in #395
[fix] fix sparse attention by @Lijiachen1018 in #397

New Contributors

@FangRun2 made their first contribution in #393

Full Changelog: v0.1.0rc2...v0.1.0rc4

Contributors

mag1c-h, Lijiachen1018, and FangRun2


19 Nov 08:01

Lijiachen1018

v0.1.0rc2

16ed5da

v0.1.0rc2 Pre-release


What's Changed

[docs] update docs for v0.1.0rc1 by @Lijiachen1018 in #365
[bug fix] Dev patch fix for sparse by @Lijiachen1018 in #371
[build] auto patch for ascend by @Lijiachen1018 in #372
feat: add Mthreads MUSA device support -stage 1 by @superleo in #370
release v0.1.0rc2 by @Lijiachen1018 in #373
prefetch bug by @zbb200819 in #360
[Feat]Adapt to vllm-ascend0.9.1 and vllm-ascend0.11.0 by @hero0307 in #362
[bugfix] add cmake option to bypass NUMA binding by @Clarence-1103 in #368
[Feat] Update the data items saved by trace replay by @sumingZero in #366

New Contributors

@superleo made their first contribution in #370

Full Changelog: v0.1.0rc1...v0.1.0rc2

Contributors

superleo, zbb200819, and 4 other contributors

