Conversation

@ivarflakstad (Member) commented Nov 14, 2025

This should let us (and users of candle) easily switch out KvCache implementations in models.

I explored using dyn KvCache instead, but it quickly became a pain, so generics it is.

ivarflakstad marked this pull request as ready for review on November 20, 2025, at 11:33
pub trait KvCache {
    type Mask;
    fn new(dim: usize, max_seq_len: usize) -> Self;
    fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)>;

Member:
Does append insert into the cache?

Member:
Probably worth adding some doc strings here

Member Author:
I guess it can depend on the cache implementation, but typically you append to the KV cache. It's not like a typical cache à la HashMap, where you insert by key.
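
To make the distinction concrete, here is a rough usage sketch (not code from this PR; the trait is trimmed down to the two methods quoted above):

use candle::{Result, Tensor};

// Trimmed-down copy of the trait from the diff above, just for illustration.
pub trait KvCache {
    fn new(dim: usize, max_seq_len: usize) -> Self;
    fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)>;
}

// One decode step: append this step's k/v and get back everything cached so far,
// ready to feed into attention. There is no keyed insert/lookup as with a HashMap.
fn decode_step<C: KvCache>(cache: &mut C, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
    cache.append(k, v)
}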

    dim: usize,
    current_seq_len: usize,
    grow_by: usize,
    increment: usize,

Member:
why the change?

Member Author:
Aligns better with the new name

pub fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
    self.k.append(k)?;
    self.v.append(v)?;
    self.k.append(&k.contiguous()?)?;

Member:
Curious to know what the rationale for adding contiguous here is 👀

Member Author:
It's the default behaviour in the model code I used as reference. Is there any reason it shouldn't call contiguous?
I could omit it, but it improves performance for the cache impls I tested, and calling contiguous multiple times has almost zero cost (if the tensor is already contiguous it just clones an Arc).
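
A quick way to see the cost difference (a standalone sketch, not code from this PR):

use candle::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::zeros((4usize, 8usize), DType::F32, &Device::Cpu)?;
    // Already contiguous: contiguous() just clones the Arc'd storage.
    let cheap = t.contiguous()?;
    // Not contiguous (e.g. after a transpose): this one actually copies into a new layout.
    let copied = t.t()?.contiguous()?;
    println!("{} {}", cheap.is_contiguous(), copied.is_contiguous());
    Ok(())
}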

Comment on lines +11 to +16
fn append_with_mask(
    &mut self,
    k: &Tensor,
    v: &Tensor,
    _mask: Option<&Self::Mask>,
) -> Result<(Tensor, Tensor)> {

Member:
What's the point of having mask be an Option here when we already have append[_without_mask]?

Member Author:
True.
I initially added an optional mask to append and later realized it is better expressed as a separate fn.

}

fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
    self.append_with_mask(k, v, None)

Member:
See, here we could just call candle::bail! instead of having to do that in append_with_mask. Feels a bit clunky.


#[derive(Debug, Clone)]
pub struct Cache {
pub struct InnerCache {

Member:
I think Cache was ok as a name, although I get the sentiment

Member Author:
Yeah, it's only used in one KV cache variant, and not even the default one, so it feels wrong that it should be the Cache.

}

impl LayerWeights {
impl<C: KvCache> LayerWeights<C> {

Member:
Does it need to be generic? It feels like we're expecting a specific behaviour from the cache

Member Author:
Sometimes you want the KvCache variant that strictly has the highest throughput. Sometimes you want one that is more careful about how it consumes memory. Etc


#[derive(Debug, Clone)]
pub struct ModelWeights {
pub struct ModelWeights<C: KvCache = DefaultKvCache> {

Member:
I assume this means "if not specified, use DefaultKvCache"?

Member Author:
Yes!
By default you get ConcatKvCache, which has the highest throughput but grows indefinitely (until you reset() in the model impl).
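
For anyone less familiar with default type parameters, a tiny self-contained illustration of the pattern (the stub types below are made up, not the ones in this PR):

// Stand-ins for the real cache types.
struct ConcatCacheStub;
struct BoundedCacheStub;

// Same shape as `ModelWeights<C: KvCache = DefaultKvCache>`: a default parameter
// means existing code that writes plain `ModelWeightsDemo` keeps compiling.
struct ModelWeightsDemo<C = ConcatCacheStub> {
    cache: C,
}

fn main() {
    // No type parameter given: the default is used.
    let _default = ModelWeightsDemo { cache: ConcatCacheStub };
    // Opting into a different cache is just a type annotation.
    let _bounded: ModelWeightsDemo<BoundedCacheStub> = ModelWeightsDemo { cache: BoundedCacheStub };
}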

@sempervictus commented Nov 24, 2025

Don't know how visible attention.rs might be to the larger ecosystem, but the KVCache implementation therein appears to work quite well on both older and modern (NV at least) hardware, and it already has variable data type and offloaded scaling support. Given the tight coupling between attention and the KV cache, is it worth considering the entire library as a candle ecosystem candidate (it would need some "genericization" for ease of adoption by other projects)?

@ivarflakstad (Member Author) commented Nov 24, 2025

Yep! I actually mentioned this PR to @guoqingbao just the other day for this very reason ☺️

@guoqingbao (Contributor):

Yep! I actually mentioned this PR to @guoqingbao just the other day for this very reason ☺️

Thanks for including me.

I'm wondering whether we can pre-allocate a large KV-cache tensor and copy each new KV-cache slice into the existing tensor. This would avoid repeatedly concatenating and destroying tensors, especially as the context length grows significantly.

A skeleton:

use candle::{Result, Tensor, Device, DType};

pub struct KvCache {
    pub k: Tensor,              // [max_seq_len, num_heads, head_dim]
    pub v: Tensor,              // [max_seq_len, num_heads, head_dim]
    pub max_seq_len: usize,
    pub cur_len: usize,
}

impl KvCache {
    /// Preallocate empty KV cache for one layer
    pub fn new(
        max_seq_len: usize,
        num_heads: usize,
        head_dim: usize,
        device: &Device,
        dtype: DType,
    ) -> Result<Self> {
        let shape = (max_seq_len, num_heads, head_dim);

        Ok(Self {
            k: Tensor::zeros(shape, dtype, device)?,
            v: Tensor::zeros(shape, dtype, device)?,
            max_seq_len,
            cur_len: 0,
        })
    }

    /// Insert a new KV slice for the next token.
    /// new_k/new_v shape: [num_heads, head_dim]
    pub fn insert(&mut self, new_k: &Tensor, new_v: &Tensor) -> Result<()> {
        let pos = self.cur_len;

        if pos >= self.max_seq_len {
            candle::bail!(
                "KV cache full — need resize or rotary window",
            );
        }

        // k[pos] = new_k, written in place via slice_set; new_k is unsqueezed to
        // [1, num_heads, head_dim] so it matches the slice being overwritten.
        self.k.slice_set(&new_k.unsqueeze(0)?, 0, pos)?;

        // v[pos] = new_v
        self.v.slice_set(&new_v.unsqueeze(0)?, 0, pos)?;

        self.cur_len += 1;
        Ok(())
    }
}

I think we may need to update the current SDPA attention to support this layout, or use the existing FlashAttention implementation, which accepts tensors shaped as [num_blocks, block_size, num_kv_heads, head_size]. Since num_blocks * block_size == max_seq_len, the [max_seq_len, num_heads, head_dim] layout can be viewed as [num_blocks, block_size, num_kv_heads, head_size].
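
Roughly like this (a standalone sketch with made-up shapes; the zero-copy view holds because the tensor is contiguous):

use candle::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let (num_blocks, block_size, num_kv_heads, head_size) = (8usize, 16usize, 4usize, 64usize);
    let max_seq_len = num_blocks * block_size;

    // KV buffer in the pre-allocated layout: [max_seq_len, num_kv_heads, head_size].
    let k = Tensor::zeros((max_seq_len, num_kv_heads, head_size), DType::F32, &Device::Cpu)?;

    // View the same buffer in the paged layout [num_blocks, block_size, num_kv_heads, head_size].
    let k_paged = k.reshape((num_blocks, block_size, num_kv_heads, head_size))?;
    println!("{:?}", k_paged.shape());
    Ok(())
}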

@DrJesseGlass (Contributor):

Just to add some context here — some of the impetus for this generic KvCache trait came from my benchmarks showing ConcatKvCache/Existing custom cache for Llama3 (no pre-allocation) was 2-5x faster on CUDA compared to the previous pre-allocated + slice approach, with the gap growing as sequence length increased (tested 300-2000 tokens). On CPU and WASM32, concat matched pre-allocation performance.
I was also surprised by this — pre-allocation should be better in theory. Just wanted to flag this before we move toward a pre-allocation design that might regress CUDA performance.
Separately, I'm working on a CPU-optimized interleaved attention implementation and want to make sure the trait stays flexible enough to support different layouts/strategies per backend.

@sempervictus commented Nov 28, 2025

Just to add some context here — some of the impetus for this generic KvCache trait came from my benchmarks showing ConcatKvCache/Existing custom cache for Llama3 (no pre-allocation) was 2-5x faster on CUDA compared to the previous pre-allocated + slice approach, with the gap growing as sequence length increased (tested 300-2000 tokens). On CPU and WASM32, concat matched pre-allocation performance. I was also surprised by this — pre-allocation should be better in theory. Just wanted to flag this before we move toward a pre-allocation design that might regress CUDA performance. Separately, I'm working on a CPU-optimized interleaved attention implementation and want to make sure the trait stays flexible enough to support different layouts/strategies per backend.

Interesting - I wonder how that concat performance result pans out on older HW like V100s, where random/async access isn't the same.

At a high level (with my software architect hat on sideways while I grok the machinery of this ecosystem), my hope is that attention implementations would reside within attention.rs, and that the various KV cache implementations, being nearly integral to attention mechanics, would live there or in a tightly-coupled sister crate, providing a common abstraction to consumers in the ecosystem. CUDA and BlackHole RISC-V accelerators don't necessarily handle data in the same layouts or blocks, so having an interoperable KV construct that maps to attention implementations suited to the layout of different hardware classes (and their compiler or input abstractions) should allow the ecosystem to matrix-test/benchmark/qualify accuracy across those as new hardware classes become supported and attention implementations and cache layouts are added. Since @guoqingbao works with all sorts of hardware, that library is less likely (IMO) to suffer from the "new and shiny causes performance regressions on actually available" problem we often see with CUDA itself, while wider adoption would provide the FOSS focus needed to help keep it in sync with upstream for cudarc and the like (@guoqingbao has graph working, and working well enough to matter for at least our own internal use - upstream seems to have some issues bearing the integration effort).

The reasoning behind that train of thought is that attention efficacy is absolutely critical to output quality, and no amount of performance benefit adds up to the time lost (or, in the worst case, the lives lost - critical paths exist and they are adopting) as a result of "workslop." We just did a panel with NV and one of the CSPs in Boston about what it will take for adoption in healthcare, and everyone at the event with industry background concluded that scientific-grade reproducibility and accuracy are prerequisites to any formal adoption of language models and attention-based architectures in the decision-making paths of research or healthcare in lifesci. Some use cases effectively don't care about performance when the criticality of the results mandates getting it right the first time.

@guoqingbao (Contributor):

Just to add some context here — some of the impetus for this generic KvCache trait came from my benchmarks showing ConcatKvCache/Existing custom cache for Llama3 (no pre-allocation) was 2-5x faster on CUDA compared to the previous pre-allocated + slice approach, with the gap growing as sequence length increased (tested 300-2000 tokens

Thanks for the additional context — that’s really helpful.
My earlier comment was a bit off-topic for the current PR. I agree that the generic KvCache trait is a good step toward supporting multiple backends and layouts cleanly.
The pre-allocation idea I mentioned was more of a long-term thought about potential strategies we might revisit later, not something I’m suggesting we shift to right now — especially given your CUDA benchmarks showing clear advantages for the concat-based approach as sequence length grows. It’s good to know the empirical results don’t match the theoretical expectation there.
For the purposes of this PR, keeping the trait flexible enough to accommodate future backend-specific optimizations (like your interleaved CPU attention work) seems like the right priority. Happy to move forward with the current design.

@ivarflakstad (Member Author):

ConcatKvCache/Existing custom cache for Llama3 (no pre-allocation) was 2-5x faster on CUDA compared to the previous pre-allocated + slice approach

Just for full transparency: we have already had a concat-based KV cache in several models for a while (ref); it just wasn't moved into its own reusable struct, which was very much needed! 👍

Most of the latency of the current KvCache is due to repeatedly allocating new buffers. If we pre-allocate a large enough buffer, then an in-place approach is going to outpace concat. The bottleneck will move from memory allocation to (for example) the slice_set kernel.
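
For concreteness, the two append strategies side by side (a minimal sketch, not the PR's actual implementations):

use candle::{Result, Tensor};

// Concat strategy: every step allocates a new, slightly larger buffer.
fn concat_append(cache: &mut Option<Tensor>, k_step: &Tensor, seq_dim: usize) -> Result<Tensor> {
    let all = match cache.as_ref() {
        Some(prev) => Tensor::cat(&[prev, k_step], seq_dim)?,
        None => k_step.clone(),
    };
    *cache = Some(all.clone());
    Ok(all)
}

// Pre-allocated strategy: write in place with slice_set, so no per-step allocation;
// the cost moves from the allocator to the copy kernel itself.
fn preallocated_append(buf: &Tensor, k_step: &Tensor, seq_dim: usize, offset: usize) -> Result<()> {
    buf.slice_set(k_step, seq_dim, offset)
}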
