The fixed ~4 KB prefix doubled the size of a 1K-token input. The prefix
now adapts: ~5% of prompt length, clamped to [128, 4096] chars
(~32–1024 tokens at ~4 chars/token). Short prompts get a small
prefix, while long prompts still get one spanning enough KV-cache
blocks to reliably miss the cache.
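A minimal sketch of the sizing rule described above, assuming prompt length is measured in characters and using integer math for "5%". The commit does not say how the prefix content is generated; `make_unique_prefix` is a hypothetical helper that assumes a random alphanumeric string, so each request starts with token blocks no earlier request has populated. `prefix_length` and `CHARS_PER_TOKEN` are likewise illustrative names, not the project's code.

```python
import secrets
import string

CHARS_PER_TOKEN = 4  # assumption: rough chars-per-token heuristic, only for reasoning about sizes


def prefix_length(prompt_chars: int, percent: int = 5,
                  min_chars: int = 128, max_chars: int = 4096) -> int:
    """Hypothetical helper: ~percent% of the prompt length, clamped to [min_chars, max_chars]."""
    target = prompt_chars * percent // 100
    return max(min_chars, min(max_chars, target))


def make_unique_prefix(prompt: str) -> str:
    """Hypothetical helper: a random prefix sized by prefix_length().

    Assumption: a prefix-matching KV cache only reuses blocks whose leading
    tokens match exactly, so a fresh random prefix forces a miss.
    """
    n = prefix_length(len(prompt))
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(n))


if __name__ == "__main__":
    for tokens in (100, 1_000, 20_000, 200_000):
        n = prefix_length(tokens * CHARS_PER_TOKEN)
        print(f"{tokens} tokens -> {n}-char prefix (~{n // CHARS_PER_TOKEN} tokens)")
```

Under the 4 chars/token heuristic, a 1K-token prompt (~4,000 chars) gets a 200-char (~50-token) prefix instead of ~4 KB. A 200K-token prompt hits the 4096-char ceiling, and a 100-token prompt is raised to the 128-char floor.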
CHANGELOG.md (5 additions, 0 deletions):

@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).

+## [Unreleased]
+
+### Fixed
+
+- Cache hit rate: prefix size now adapts to prompt length (~5%, clamped 128–4096 chars) to avoid inflating short prompts — previously a fixed ~4 KB prefix would double a 1K-token input