v0.2.0
Highlights
- Support model window extrapolation with Rectified Rotary Position Embeddings (ReRoPE). (#497)
- Support sparse attention algorithms in HBM on both CUDA GPUs and Ascend NPUs. Attention is sparsified by hashing KV states and selecting the Top-K entries by Hamming distance. (#559)
- Add Pipeline Store, composed of Cache Store and POSIX Store. (#553)
- Improve KV cache transfer performance for NfsStore. (#393)
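To illustrate the idea behind the sparse attention highlight, here is a minimal NumPy sketch of hashing vectors to bit codes and picking the Top-K keys by Hamming distance. It is not the code shipped in this release: the sign-random-projection hash, the function names `hash_states` and `hamming_topk`, and all sizes are assumptions chosen for illustration only.

```python
import numpy as np


def hash_states(x, planes):
    """Hash vectors to bit codes (assumed scheme: sign of random projections)."""
    return (x @ planes) > 0  # boolean array, one bit per hyperplane


def hamming_topk(query, keys, planes, k):
    """Return indices and distances of the k keys closest to the query in Hamming distance."""
    q_bits = hash_states(query, planes)                 # shape (n_bits,)
    k_bits = hash_states(keys, planes)                  # shape (n_keys, n_bits)
    dist = np.count_nonzero(k_bits != q_bits, axis=1)   # differing bits per key
    k = min(k, dist.shape[0])
    idx = np.argsort(dist, kind="stable")[:k]           # smallest distances first
    return idx, dist[idx]


# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
head_dim, n_bits, n_keys = 64, 128, 256
planes = rng.standard_normal((head_dim, n_bits))
keys = rng.standard_normal((n_keys, head_dim))

# A query that is a slightly perturbed copy of key 42.
query = keys[42] + 0.01 * rng.standard_normal(head_dim)
idx, dist = hamming_topk(query, keys, planes, k=8)
```

Only the keys (and matching values) at the returned indices would then take part in the attention computation, which is where the savings come from. Comparing bit codes is much cheaper than computing full dot products against every cached key.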
Known Issues
- Sparse is not supported when installing via pip
  - Currently, installing with `pip install uc-manager` does not support Sparse.
  - Before installing via pip, please make sure to set the platform explicitly: `export PLATFORM=xxx`
  - To use Sparse, please install via the Docker image or build from source.
What's Changed
- [Feature] Add performance and evaluation testing tools using the pytest framework by @zzycode1005 in #462
- [Feature] Added environment pre-check by @Menglths in #498
- [docs] fix links in docs and add clarifications (#499) by @Lijiachen1018 in #502
- [build] rewrite setup.py by @ygwpz in #501
- [bugfix] Adapt the patch to support YAML sections. by @wangwenxin0312 in #480
- [bugfix] fix pip install -e no so by @ygwpz in #508
- [Feature] Cache Blend by @wuhuxiao in #467
- merge Feature_store_next to develop by @qyh111 in #518
- [bugfix]fix setup.py by @qyh111 in #520
- [bugfix]fix setup.py (#520) by @qyh111 in #521
- feat(test): Add PostgreSQL support and optimize database write logic by @Potterluo in #507
- [fix] move init to intergration/vllm directory by @Lijiachen1018 in #535
- [Fix]Add PLATFORM reminder by @zhou-haitao in #526
- cherry-pick from 0.1.0-release by @Lijiachen1018 in #552
- [Feat] New Store Impl: CacheStore - PosixStore - PipelineStore by @mag1c-h in #553
- [Perf] parallel block-existence checks + timeout exception by @mag1c-h in #550
- [feat] Shard block files into subdirs by hash prefix, with opt-out switch by @mag1c-h in #561
- [feat]use numpy to calculate addrs by @qyh111 in #564
- [Bugfix] use-after-free in LookupBatch by @mag1c-h in #565
- [Bugfix] skip fresh shm files to avoid race between multiple instances by @mag1c-h in #566
- [Bugfix] Fix incorrect fallback in GetHostBuffer: use MakeHostBuffer instead of MakeDeviceBuffer by @mag1c-h in #568
- [feat] kvcomp on device by @wangwenxin0312 in #559
- [fix]Add exception handling by @qyh111 in #569
- [bugfix]Fix KeyError when VLLM_HASH_ATTENTION environment variable is not set by @qyh111 in #570
- [bugfix] patch update by @wangwenxin0312 in #571
- [fix]fix monitor issue by @qyh111 in #572
- [bugfix] build hamming dist by @wangwenxin0312 in #578
- [feat] Update data file layout to adapt to garbage collection by @qyh111 in #576
- [bugfix] sparse patch & cmake by @wangwenxin0312 in #580
- [build]fix spdlog use ext fmt by @Lijiachen1018 in #585
- [bugfix] kvcomp fix by @wangwenxin0312 in #586
- [feat] KvCompOnDevice: per-KV-head Top-K for Qwen by @wangwenxin0312 in #588
- [bugfix] share buffer used out by @mag1c-h in #592
- [bugfix] kvcomp for qwen by @wangwenxin0312 in #595
- [fix]clean code and set local_rank_size to tp_size by @qyh111 in #596
- [fix]fix clean code by @qyh111 in #601
- [Bugfix] update block dir permission & double-free fix by @mag1c-h in #603
- [bugfix] double-release shared-block while make reader failed by @mag1c-h in #604
- [docs]add doc for pipeline store by @qyh111 in #612
- [feat] cherry-pick to 0.2.0-release to add rerope by @xinSky00 in #614
- fix ascend patch and change version by @qyh111 in #615
- add patch in dokerfile-npu by @qyh111 in #617
- [feat] cherry-pick KVComp in NPU -- HBM version into the 0.2.0-release branch by @wangwenxin0312 in #619
- [feat] update all patch and docs by @wangwenxin0312 in #620
- [bugfix] hamming compile by @wangwenxin0312 in #624
New Contributors
- @zzycode1005 made their first contribution in #462
Full Changelog: v0.1.2...v0.2.0