Is your feature request related to a problem? Please describe.
I'd like to be able to roll back a small number of KV cache updates without having to checkpoint the full cache. This comes up in several situations:
- Evaluation: computing different metrics for the same prompt, or sampling different responses
- Speculative decoding
Describe the solution you'd like
This is essentially a special case of the annotations used in autograd_hooks.py. We need to store `index` and `delta` for every update, so that the cache buffers can be restored. Ideally, this would be supported by methods in the `KVCache` base class.
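
To make the idea concrete, here is a minimal sketch of what such an undo log could look like. It is not based on the existing `KVCache` code: the class `RollbackKVCache`, the methods `update`, `mark` and `rollback`, and the `UndoRecord` type are all hypothetical names chosen for illustration. It also assumes that "delta" means the slot contents that an update overwrites, and it uses plain Python lists in place of tensors to stay dependency-free.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UndoRecord:
    """One logged update: which slot changed and what it held before."""
    index: int
    old_key: List[float]
    old_value: List[float]


class RollbackKVCache:
    """Hypothetical cache that logs every update so it can be undone."""

    def __init__(self, num_slots: int, dim: int) -> None:
        self.keys = [[0.0] * dim for _ in range(num_slots)]
        self.values = [[0.0] * dim for _ in range(num_slots)]
        self._log: List[UndoRecord] = []

    def update(self, index: int, key: List[float], value: List[float]) -> None:
        # Save the overwritten rows before replacing them. The old row
        # objects are never mutated afterwards, so storing references is safe.
        self._log.append(UndoRecord(index, self.keys[index], self.values[index]))
        self.keys[index] = list(key)
        self.values[index] = list(value)

    def mark(self) -> int:
        """Return a position in the log that rollback() can return to."""
        return len(self._log)

    def rollback(self, mark: int) -> None:
        """Undo all updates made after `mark`, most recent first."""
        while len(self._log) > mark:
            rec = self._log.pop()
            self.keys[rec.index] = rec.old_key
            self.values[rec.index] = rec.old_value


# Usage: speculate two updates, then discard them.
cache = RollbackKVCache(num_slots=4, dim=2)
cache.update(0, [1.0, 1.0], [2.0, 2.0])
checkpoint = cache.mark()
cache.update(1, [3.0, 3.0], [4.0, 4.0])
cache.update(0, [5.0, 5.0], [6.0, 6.0])
cache.rollback(checkpoint)
```

The memory cost of this approach scales with the number of updates since the last mark rather than with the cache size, which is the point of the request. A real implementation would also need to restore any other state an update touches (for example the current sequence length) and could drop log entries once a speculative branch is accepted.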