Evolving Sparsity: Leveraging Token Importance Dynamics for Efficient LLM Decoding with Sparse Attention
Dynamic sparse attention that evolves across steps and layers to deliver high-performance, low-latency long-context LLM inference.
Ruizi Han1, 2, Miao Zhang1*, Ziyue Qiao2*, Liqiang Nie1
1 Harbin Institute of Technology (Shenzhen)
2 Great Bay University
* Co-corresponding Authors
- Code Repository: GitHub
- [04/2026] Initial release
- [TODO] Complete the code repository
This project is released under the Apache License 2.0.
