Commit 9081de3
Fix prefix cache restore to set KV offset explicitly (#144)
This PR:
- Makes prefix-cache restore robust by explicitly restoring
`KVCache.offset` from the sequence length of the cached KV tensors.
- Stops relying on side effects of the `KVCache.state` setter for
position state.
- Keeps RoPE positions continuous after prefix-cache hits.
- Adds a focused regression test that fails if the offset is not
explicitly restored.
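A minimal sketch of the restore pattern described above, assuming KV tensors are laid out as `(batch, n_kv_heads, seq_len, head_dim)` like the `(1, 2, 7, 8)` tensors in this PR's reproduction. `StubKVCache`, `FakeArray` and `restore_layer` are hypothetical stand-ins (no MLX needed), not the actual `vllm_metal` code:

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class FakeArray:
    """Hypothetical stand-in for an mx.array; only .shape is used here."""
    shape: Tuple[int, ...]


class StubKVCache:
    """Hypothetical cache layer whose state setter does NOT touch offset."""

    def __init__(self) -> None:
        self._state: Any = None
        self.offset = 0

    @property
    def state(self) -> Any:
        return self._state

    @state.setter
    def state(self, value: Any) -> None:
        self._state = value  # no offset side effect


def restore_layer(layer: StubKVCache, keys: FakeArray, values: FakeArray) -> StubKVCache:
    """Restore KV tensors, then set offset explicitly from the key length."""
    layer.state = (keys, values)
    # Assumed layout (B, H, L, D): the cached sequence length is axis 2.
    layer.offset = keys.shape[2]
    return layer


layer = restore_layer(StubKVCache(), FakeArray((1, 2, 7, 8)), FakeArray((1, 2, 7, 8)))
print(layer.offset)  # 7
```

Setting `offset` after assigning `state` makes the result independent of whether a given cache class updates `offset` inside its setter.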
### Additional note
Restoring only `state` is not sufficient if the cache implementation
does not update `offset` as a side effect of the setter. If `offset`
remains `0` after restore, subsequent decode steps can be assigned
incorrect RoPE positions after a prefix-cache hit.
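To make the failure concrete: in mlx-lm style attention layers, RoPE is typically applied with the cache offset as the starting position (e.g. `offset=cache.offset`), so new tokens are numbered from `offset` onward. The toy `rope_positions` helper below is hypothetical and pure Python; it only computes the positions that would be assigned to three uncached prompt tokens following a 7-token cached prefix:

```python
def rope_positions(offset: int, num_new_tokens: int) -> list:
    """Absolute positions for new tokens when RoPE starts at `offset`."""
    return list(range(offset, offset + num_new_tokens))


cached_prefix_len = 7  # tokens already held in the restored KV cache

# Offset restored correctly: new tokens continue after the cached prefix.
print(rope_positions(cached_prefix_len, 3))  # [7, 8, 9]

# Offset left at 0: new tokens reuse positions already taken by the prefix.
print(rope_positions(0, 3))  # [0, 1, 2]
```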
### Reproduce code
```python
from unittest.mock import MagicMock

import mlx.core as mx
import vllm_metal.v1.model_runner as mr


class KVNoOffsetSideEffect:
    # Simulate a cache object where assigning .state does NOT update .offset.
    def __init__(self):
        self._state = [None, None]
        self.offset = 0

    @property
    def state(self):
        return self._state

    @state.setter
    def state(self, value):
        self._state = value


def fake_make_prompt_cache(_):
    # Restore will create fresh cache layers from this factory.
    return [KVNoOffsetSideEffect()]


orig_kv, orig_make = mr.KVCache, mr.make_prompt_cache
mr.KVCache, mr.make_prompt_cache = KVNoOffsetSideEffect, fake_make_prompt_cache
try:
    # Note: token_ids length (3) is intentionally different from KV seq_len (7).
    # This shows offset restore comes from KV shape, not token_ids metadata.
    k = mx.zeros((1, 2, 7, 8), dtype=mx.float32)
    v = mx.zeros((1, 2, 7, 8), dtype=mx.float32)
    cached = mr.CachedPrefix(token_ids=[1, 2, 3], cache_state=[(k, v)])
    restored = mr.PrefixCacheManager(max_bytes=1024 * 1024).restore_cache(
        cached, model=MagicMock(), is_vlm=False
    )
    # Expected output after fix: restored_offset=7
    print(f"restored_offset={restored[0].offset}")
finally:
    mr.KVCache, mr.make_prompt_cache = orig_kv, orig_make
```
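The reproduction above needs MLX and `vllm_metal`. As a dependency-free illustration of the kind of regression check this PR describes, here is a pytest-style sketch; `NoSideEffectCache` and `restore_all` are hypothetical stand-ins for the real cache class and restore loop, and this is not the test added in this commit:

```python
from types import SimpleNamespace


class NoSideEffectCache:
    """Hypothetical cache: assigning .state leaves .offset untouched."""

    def __init__(self):
        self._state = None
        self.offset = 0

    @property
    def state(self):
        return self._state

    @state.setter
    def state(self, value):
        self._state = value


def restore_all(layers, cache_state):
    """Hypothetical restore loop: one (keys, values) pair per layer."""
    for layer, (keys, values) in zip(layers, cache_state):
        layer.state = (keys, values)
        layer.offset = keys.shape[2]  # seq_len axis, assuming (B, H, L, D)
    return layers


def test_offset_restored_from_kv_length():
    token_ids = [1, 2, 3]  # deliberately shorter than the KV length (7)
    kv = SimpleNamespace(shape=(1, 2, 7, 8))
    layers = [NoSideEffectCache(), NoSideEffectCache()]
    restored = restore_all(layers, [(kv, kv), (kv, kv)])
    # Fails if offset stays 0 or is derived from token_ids instead of KV shape.
    assert all(layer.offset == 7 for layer in restored)
    assert all(layer.offset != len(token_ids) for layer in restored)


test_offset_restored_from_kv_length()
print("ok")
```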
Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>

1 parent 97a2844, commit 9081de3
2 files changed: 38 additions, 0 deletions
(Diff content not preserved. First file: 35 lines added after original line 95, new lines 96-130. Second file: 3 lines added after original line 280, new lines 281-283.)