
v0.2.0


qyh111 released this on 05 Jan 12:28 (commit 39d46c7); 258 commits to develop since this release.

Highlights

  • Support model window extrapolation with Rectified Rotary Position Embeddings (ReRoPE) (#497).
  • Support sparse attention algorithms in HBM on both CUDA GPUs and Ascend NPUs. Attention is sparsified by hashing KV states and selecting the Top-K entries by Hamming distance (#559).
  • Add a Pipeline Store composed of a Cache Store and a POSIX Store (#553).
  • Improve KV cache transfer performance for NfsStore (#393).
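The ReRoPE highlight above concerns how relative positions are fed to rotary attention. The release notes do not describe the uc-manager implementation, so the following is only a minimal sketch of the position mapping, assuming the commonly described form of ReRoPE: inside a window `w` the true distance `i - j` is kept, and beyond it the distance is clamped to `w`. The function name `rerope_positions` is hypothetical and is not a uc-manager API.

```python
def rerope_positions(seq_len, window):
    # Hypothetical helper (not a uc-manager API).
    # Returns the relative position used when query i attends to key j.
    # Causal masking: keys after the query get None.
    # Inside the window the true distance i - j is kept; beyond it the distance
    # is clamped to `window`, so no distance exceeds what the model was trained on.
    return [
        [min(i - j, window) if j <= i else None for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in rerope_positions(seq_len=6, window=3):
        print(row)
```

Because the clamped distance cannot be produced by rotating each query and key once by its absolute position, implementations typically compute two attention score matrices (one with standard RoPE, one with the fixed distance `w`) and select between them per query/key pair.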
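The sparse attention highlight (#559) selects KV entries by hashing them and ranking by Hamming distance. The release notes do not specify the hash function or kernel, so this is a minimal pure-Python sketch assuming sign-of-random-projection (SimHash-style) hashing, followed by exact softmax attention over only the Top-K selected keys. The names `sign_hash`, `topk_by_hamming`, and `sparse_attention` are hypothetical and are not uc-manager APIs.

```python
import math
import random


def sign_hash(vec, planes):
    # Assumed hash: bit i is 1 when vec lies on the non-negative side of planes[i].
    code = 0
    for bit, plane in enumerate(planes):
        if sum(v * p for v, p in zip(vec, plane)) >= 0:
            code |= 1 << bit
    return code


def hamming(a, b):
    # Number of differing bits between two integer hash codes.
    return bin(a ^ b).count("1")


def topk_by_hamming(query, keys, planes, k):
    # Rank keys by Hamming distance between their codes and the query's code;
    # ties are broken by position so the result is deterministic.
    # (A real system would hash each key once, when it is written to the cache.)
    q_code = sign_hash(query, planes)
    k_codes = [sign_hash(key, planes) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: (hamming(q_code, k_codes[i]), i))
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, planes, k):
    # Exact scaled dot-product softmax attention restricted to the Top-K keys.
    idx = topk_by_hamming(query, keys, planes, k)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in idx]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total for d in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    dim, n_bits, n_tokens = 8, 16, 32
    planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    query = keys[5]
    print(topk_by_hamming(query, keys, planes, k=4))
    print(sparse_attention(query, keys, values, planes, k=4))
```

Comparing short bit codes with XOR and popcount is much cheaper than computing full dot products against every cached key, which is the motivation for using Hamming distance as the selection step.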

Known Issues

  • Sparse is not supported when installing via pip
    • Currently, installing with pip install uc-manager does not support Sparse.
    • Before installing via pip, please make sure to set the platform explicitly:
      export PLATFORM=xxx
    • To use Sparse, please install via the Docker image or build from source.



Full Changelog: v0.1.2...v0.2.0