New feature about KV cache truncate and sliding window attention for all models #12388

ai-easy-cpu · 2025-10-30T08:45:24Z


Truncating the KV cache at runtime (sliding window attention) seems like an easy workaround that trades some quality for efficiency in long-context workloads, where RAM is limited and caches are all evicted.

Currently I do not see any server config option for this. Would this be possible as a new feature?
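
To make the idea concrete, here is a minimal sketch of what I mean by truncating at runtime. It is purely illustrative and not tied to any existing server option: the class name `SlidingWindowKVCache` and the `window` parameter are hypothetical, and plain strings stand in for the per-token key/value tensors.

```python
from collections import deque


class SlidingWindowKVCache:
    """Hypothetical per-layer KV cache that keeps only the last `window` tokens."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # which is exactly the truncation behaviour being proposed.
        self._keys = deque(maxlen=window)
        self._values = deque(maxlen=window)

    def append(self, key, value):
        """Store the key/value pair for one newly processed token."""
        self._keys.append(key)
        self._values.append(value)

    def keys(self):
        return list(self._keys)

    def values(self):
        return list(self._values)

    def __len__(self):
        return len(self._keys)


# Feed five tokens into a cache limited to three entries.
cache = SlidingWindowKVCache(window=3)
for token_id in range(5):
    cache.append(f"k{token_id}", f"v{token_id}")
```

With this sketch the cache never holds more than `window` entries, so memory stays bounded regardless of context length. Tokens older than the window can no longer be attended to, which is the quality cost mentioned above.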

Replies: 0 comments

Select a reply

Uh oh!

New feature about KV cache truncate and sliding window attention for all models #12388

Uh oh!

ai-easy-cpu Oct 30, 2025

Replies: 0 comments

ai-easy-cpu
Oct 30, 2025