microsoft/G-KV

G-KV

Introduction

This library supports several KV cache compression algorithms: H2O, SnapKV, R-KV, StreamingLLM, and the proposed G-KV. It is compatible with the Qwen 2 series, Qwen3 (inference only), and the Llama series (versions 1 to 3).

The library also supports post-training of models under KV cache compression. It includes a complete GRPO reinforcement learning pipeline that performs generation with KV cache compression and constructs sparse attention masks for training. It also provides pipelines for supervised fine-tuning (SFT) and distillation training, so that models can be adapted to and optimized for KV cache compression settings.
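To make the idea of a sparse attention mask for training more concrete, the sketch below builds a causal mask in which a key token becomes invisible to later queries once it has been evicted from the cache. This is only an illustration under stated assumptions, not the library's implementation: the function name `build_sparse_causal_mask` and the `evicted_at` representation (the query position at which each token was dropped, or `None` if it was never dropped) are hypothetical.

```python
from typing import List, Optional


def build_sparse_causal_mask(evicted_at: List[Optional[int]]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key position k.

    evicted_at[k] is a hypothetical record of the query position at which
    token k was evicted from the KV cache (None means it was never evicted).
    """
    n = len(evicted_at)
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            step = evicted_at[k]
            if step is not None and step <= k:
                raise ValueError("a token cannot be evicted before it is generated")
            causal = k <= q                           # standard causal constraint
            still_cached = step is None or q < step   # key not yet evicted at position q
            row.append(causal and still_cached)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Token 1 is evicted when position 2 is processed, so positions 2 and 3 cannot see it.
    for row in build_sparse_causal_mask([None, 2, None, None]):
        print(["x" if visible else "." for visible in row])
```

Replaying eviction decisions in a mask like this lets a training forward pass see the same context that compressed generation saw, which appears to be the purpose of the sparse masks mentioned above.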

Environment

Python >= 3.10

pip install -r requirement.txt
pip install flash-attn==2.7.4.post1 --no-cache-dir
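After installation, a quick sanity check can confirm that the interpreter meets the version requirement and that `flash_attn` (the import name of the `flash-attn` package) is importable. The helper below is a minimal sketch; `check_environment` is a hypothetical name and is not part of this repository.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10), packages=("flash_attn",)):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {required} is required, found {sys.version.split()[0]}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks OK" if not issues else "\n".join(issues))
```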

Quick Start

Each script contains detailed descriptions of its parameter settings.

Inference

bash scripts/inference.sh

Train (SFT or Distillation)

bash scripts/sft.sh

Train (RL)

bash scripts/rl.sh

Evaluate on LiveCodeBench

python datasets/lcb_precess.py

bash scripts/lcb_eval.sh

Citation

@misc{liao2025gkvdecodingtimekvcache,
      title={G-KV: Decoding-Time KV Cache Eviction with Global Attention}, 
      author={Mengqi Liao and Lu Wang and Chaoyun Zhang and Zekai Shen and Xiaowei Mao and Si Qin and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Huaiyu Wan},
      year={2025},
      eprint={2512.00504},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.00504}, 
}

About

G-KV is a KV cache eviction method that employs a global scoring mechanism, combining local and historical attention scores to assess token importance more accurately.
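The following sketch illustrates the general idea of combining local and historical attention scores before choosing which tokens to keep. It is not the scoring rule from the paper or the code in this repository: the combination used here (the maximum of the local score and a decayed historical score), the `decay` parameter, and the function names are all assumptions made for illustration.

```python
from typing import List


def combine_scores(local: List[float], history: List[float], decay: float = 0.5) -> List[float]:
    """Blend per-token local scores with decayed historical scores (assumed rule)."""
    if len(local) != len(history):
        raise ValueError("local and history must have the same length")
    return [max(l, decay * h) for l, h in zip(local, history)]


def select_tokens(scores: List[float], budget: int) -> List[int]:
    """Return the positions of the `budget` highest-scoring tokens, in original order."""
    if budget >= len(scores):
        return list(range(len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    local = [0.05, 0.60, 0.10, 0.25]    # attention from the most recent queries
    history = [0.90, 0.10, 0.05, 0.30]  # accumulated scores from earlier steps
    print("local only:", select_tokens(local, budget=2))
    print("combined:  ", select_tokens(combine_scores(local, history), budget=2))
```

In this toy example, token 0 receives little attention from the most recent queries but was important earlier, so a purely local criterion evicts it while the combined score retains it.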
