l-bat commented Oct 10, 2025

Changes

Integrates the KV Cache Token Eviction algorithm into the GenAI Optimizations Module in OpenVINO Contrib.
Token Eviction reduces KV cache memory usage during autoregressive generation in LLMs. It selectively removes less important cached tokens while preserving those crucial for contextual understanding, which enables long-sequence inference under constrained memory.

Supported modes:

  • H2O Mode – Evicts tokens using the Heavy-Hitter Oracle strategy, which accumulates attention scores to identify and retain high-impact tokens. Paper: https://arxiv.org/pdf/2306.14048
  • SnapKV Mode – Modifies the H2O approach by computing token importance within a small sliding window of the most recent queries during the prefill stage, then reverting to the H2O strategy during decoding. Paper: https://arxiv.org/pdf/2404.14469
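
To make the difference between the two modes concrete, here is a minimal NumPy sketch of the scoring idea. It is not the code added in this PR: the function names (`h2o_scores`, `snapkv_scores`, `select_keep`), the single-head 2-D attention matrix, and the choice to always keep a few of the most recent tokens are assumptions made for illustration. Per-head handling and any score pooling used in the real implementation are omitted.

```python
import numpy as np


def h2o_scores(attn: np.ndarray) -> np.ndarray:
    """H2O-style score: attention each key token received, summed over all queries.

    attn has shape (num_queries, num_keys) and holds attention probabilities.
    """
    return attn.sum(axis=0)


def snapkv_scores(attn: np.ndarray, window: int) -> np.ndarray:
    """SnapKV-style prefill score: sum attention over the last `window` queries only."""
    return attn[-window:].sum(axis=0)


def select_keep(scores: np.ndarray, budget: int, recent: int) -> np.ndarray:
    """Return sorted indices of tokens to keep in the cache.

    Hypothetical policy: always keep the `recent` newest tokens, then fill the
    remaining budget with the highest-scoring older tokens.
    """
    assert 0 <= recent <= budget
    n = scores.shape[0]
    if n <= budget:
        return np.arange(n)  # nothing to evict
    recent_idx = np.arange(n - recent, n)
    older = scores[: n - recent]
    # Stable sort, then reverse, so the highest scores come first.
    heavy_idx = np.argsort(older, kind="stable")[::-1][: budget - recent]
    return np.sort(np.concatenate([heavy_idx, recent_idx]))


# Toy causal attention matrix for a 6-token prompt (each row sums to 1).
attn = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.1, 0.3, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.6, 0.2, 0.0, 0.0],
    [0.1, 0.0, 0.7, 0.1, 0.1, 0.0],
    [0.0, 0.1, 0.6, 0.1, 0.1, 0.1],
])

h2o_keep = select_keep(h2o_scores(attn), budget=4, recent=2)
snap_keep = select_keep(snapkv_scores(attn, window=2), budget=4, recent=2)
print("H2O keeps:   ", h2o_keep.tolist())
print("SnapKV keeps:", snap_keep.tolist())
```

In this toy input, token 0 attracts a lot of attention from early queries but almost none from the last two. Accumulating over all queries therefore retains it, while scoring over the recent-query window retains token 3 instead.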

Related tickets

169957

l-bat requested a review from a team as a code owner October 10, 2025 11:04
apaniukov enabled auto-merge (squash) October 10, 2025 11:40
apaniukov merged commit fda6fb5 into openvinotoolkit:master Oct 10, 2025
10 checks passed