[GenAI] Support Token Eviction #1010

l-bat · 2025-10-10T11:04:19Z

Changes

Integrates the KV cache Token Eviction algorithm into the GenAI Optimizations module in OpenVINO Contrib.
Token Eviction is designed to optimize KV cache memory usage during autoregressive generation in LLMs. It selectively removes less important cached tokens while preserving those crucial for contextual understanding, enabling efficient long-sequence inference under constrained memory.

Supported modes:

H2O Mode – Evicts tokens using the Heavy-Hitter Oracle strategy, which accumulates attention scores to identify and retain high-impact tokens. Paper: https://arxiv.org/pdf/2306.14048
SnapKV Mode – Modifies the H2O approach by computing token importance within a small sliding window of the most recent queries during the prefill stage, then reverting to the H2O strategy during decoding. Paper: https://arxiv.org/pdf/2404.14469
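
The two modes differ mainly in which query rows contribute to a token's importance score. The sketch below illustrates that difference on a single attention matrix. It is a minimal, pure-Python illustration and not the code added by this PR: the function names (`accumulate_scores`, `select_kept`, `h2o_keep`, `snapkv_keep`) and the `budget`, `recent`, and `window` parameters are hypothetical, and it assumes one head, one layer, and a causal attention matrix given as a list of rows. The SnapKV paper additionally pools scores over neighboring positions, which is omitted here.

```python
from typing import List, Sequence


def accumulate_scores(attn_rows: Sequence[Sequence[float]], num_tokens: int) -> List[float]:
    """Sum the attention each cached token receives over the given query rows."""
    scores = [0.0] * num_tokens
    for row in attn_rows:
        for j, weight in enumerate(row):
            scores[j] += weight
    return scores


def select_kept(scores: Sequence[float], budget: int, recent: int) -> List[int]:
    """Keep the `recent` newest tokens plus the highest-scoring older ones."""
    n = len(scores)
    if n <= budget:
        return list(range(n))  # cache fits in the budget, nothing to evict
    recent = min(recent, budget)
    recent_ids = list(range(n - recent, n))
    older_ids = range(n - recent)
    heavy = sorted(older_ids, key=lambda j: scores[j], reverse=True)[: budget - recent]
    return sorted(heavy) + recent_ids


def h2o_keep(attn: Sequence[Sequence[float]], budget: int, recent: int) -> List[int]:
    """H2O-style: scores accumulate over every query row seen so far."""
    return select_kept(accumulate_scores(attn, len(attn[-1])), budget, recent)


def snapkv_keep(attn: Sequence[Sequence[float]], budget: int, window: int) -> List[int]:
    """SnapKV-style prefill: only the last `window` queries vote; the window is kept."""
    window_rows = attn[len(attn) - window:]
    return select_kept(accumulate_scores(window_rows, len(attn[-1])), budget, window)


# Toy causal attention matrix: row i is query i, column j is cached token j.
attn = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0, 0.0],
    [0.1, 0.6, 0.1, 0.2, 0.0],
    [0.1, 0.1, 0.5, 0.1, 0.2],
]
print(h2o_keep(attn, budget=3, recent=1))     # [0, 1, 4]
print(snapkv_keep(attn, budget=3, window=2))  # [1, 3, 4]
```

In this toy example token 0 survives under the H2O-style scoring because early queries attended to it heavily, while the SnapKV-style scoring drops it because the last two queries barely attend to it.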

Related tickets

169957

l-bat added 2 commits October 9, 2025 16:33

[GenAI] Support Token Eviction

d5469a8

Update score aggregation

1d284fb

l-bat requested a review from a team as a code owner October 10, 2025 11:04

apaniukov approved these changes Oct 10, 2025

apaniukov enabled auto-merge (squash) October 10, 2025 11:40

apaniukov merged commit fda6fb5 into openvinotoolkit:master Oct 10, 2025
10 checks passed
