Gemma 4: checkpoint revert on reasoning / thinking content #21836

emansom · 2026-04-13T02:15:10Z

Following up on the discussion in #21760.

I'm reading through Google's LiteRT-LM source to see how it handles Gemma 4, so we can check that llama.cpp handles the model correctly.

LiteRT-LM is "filtering" reasoning / thinking content out of the KV cache. Is llama.cpp doing the same?
By filtering I mean reverting to a checkpoint, afaict.
And should llama.cpp be doing that, or is this just a hack / awful workaround in LiteRT-LM?

https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.cc#L437-L448
https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.cc#L495-L506
https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.cc#L548-L558

https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.h#L169-L171

https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation_test.cc#L312
https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation_test.cc#L940
https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation_test.cc#L1552

Source code comments that explain how they checkpoint, revert, and filter thinking / reasoning content:

https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.cc#L495C10-L499C48

// If the assistant message contains channel content, set the
// checkpoint message index. This indicates the session should be
// rewound to this message and prefilled again when another user
// message is sent to the model. The session checkpoint itself was
// already saved right before decode.
if (config_.filter_channel_content_from_kv_cache() &&
    session_checkpoint_supported_ &&
    !checkpoint_message_index_.has_value() &&
    complete_message.contains(kChannelsKey)) {
  checkpoint_message_index_ = history_.size() - 1;
}

https://github.com/google-ai-edge/LiteRT-LM/blob/176953bf882e25f67f2f7e089e9326f8ddd262f9/runtime/conversation/conversation.cc#L548C17-L549C30

// Before running decode, save a checkpoint for channel content
// filtering.
if (config_.filter_channel_content_from_kv_cache() &&
    session_checkpoint_supported_ &&
    !checkpoint_message_index_.has_value()) {
  // Save checkpoint in case we need to rewind later.
  if (!session_->SaveCheckpoint(kChannelContentCheckpoint)
           .ok()) {
    session_checkpoint_supported_ = false;
  }
}
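
To make the flow in the two snippets above easier to follow, here is a minimal toy sketch of how I read it: save a checkpoint before decode, remember the first assistant message that carried thoughts, and on the next user message rewind and re-prefill from that message with the thoughts left out. Everything here (`ToySession`, `ToyMessage`, `ToyConversation`, the `role:content` pieces standing in for tokens) is hypothetical and only mirrors the comments quoted above. It is not LiteRT-LM's or llama.cpp's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a runtime session: the "KV cache" is just the
// list of text pieces that have been prefilled or decoded so far.
struct ToySession {
    std::vector<std::string> kv;
    std::optional<std::size_t> checkpoint;  // KV length at checkpoint time

    void Prefill(const std::string& piece) { kv.push_back(piece); }
    void SaveCheckpoint() { checkpoint = kv.size(); }
    void Rewind() {
        assert(checkpoint.has_value());
        kv.resize(*checkpoint);
    }
};

struct ToyMessage {
    std::string role;
    std::string thinking;  // empty when the turn produced no thoughts
    std::string content;
};

struct ToyConversation {
    ToySession session;
    std::vector<ToyMessage> history;
    std::optional<std::size_t> checkpoint_message_index;

    // A user turn: if an earlier assistant turn left thoughts in the cache,
    // rewind and re-prefill everything from that turn on, thoughts stripped.
    void SendUser(const std::string& text) {
        if (checkpoint_message_index) {
            session.Rewind();
            for (std::size_t i = *checkpoint_message_index; i < history.size(); ++i) {
                session.Prefill(history[i].role + ":" + history[i].content);
            }
            checkpoint_message_index.reset();
        }
        history.push_back({"user", "", text});
        session.Prefill("user:" + text);
    }

    // An assistant turn: checkpoint before decode, then record which message
    // to rewind to if the decoded turn contained thoughts.
    void DecodeAssistant(const std::string& thinking, const std::string& content) {
        if (!checkpoint_message_index) session.SaveCheckpoint();
        if (!thinking.empty()) session.Prefill("thought:" + thinking);
        session.Prefill("model:" + content);
        history.push_back({"model", thinking, content});
        if (!thinking.empty() && !checkpoint_message_index) {
            checkpoint_message_index = history.size() - 1;
        }
    }
};
```

In this toy, a user turn followed by an assistant turn with thoughts leaves three pieces in the cache. After the next user turn the cache again holds three pieces, but the thought piece is gone and the assistant reply has been re-prefilled without it.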

emansom · 2026-04-13T02:55:39Z

Another idea, from @aldehir here #21760 (comment), was to mask logits according to a state machine specific to the model's format.

Could that be a more model-agnostic alternative to what LiteRT-LM does?
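
For readers who haven't seen the technique, here is a rough sketch of what masking logits with a format state machine can look like. The token ids, states, and transitions are all made up for illustration. This is not Gemma 4's real token set, not @aldehir's proposal in detail, and not an existing llama.cpp API.

```cpp
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

// Hypothetical token ids for a made-up format: a turn may open with one
// thought block, then answer text, then an end-of-turn token.
enum Token : int { kThoughtOpen = 0, kThoughtClose = 1, kText = 2, kEndTurn = 3 };
enum class State { kStart, kInThought, kInAnswer, kDone };

// Which tokens the made-up format allows in each state.
std::set<int> AllowedTokens(State s) {
    switch (s) {
        case State::kStart:     return {kThoughtOpen, kText};
        case State::kInThought: return {kThoughtClose, kText};
        case State::kInAnswer:  return {kText, kEndTurn};
        case State::kDone:      return {};
    }
    return {};
}

// Advance the state machine after a token has been sampled.
State Step(State s, int token) {
    if (s == State::kStart && token == kThoughtOpen) return State::kInThought;
    if (s == State::kStart && token == kText) return State::kInAnswer;
    if (s == State::kInThought && token == kThoughtClose) return State::kInAnswer;
    if (s == State::kInAnswer && token == kEndTurn) return State::kDone;
    return s;
}

// Set every disallowed token's logit to -infinity so it cannot be sampled.
void MaskLogits(State s, std::vector<float>& logits) {
    const std::set<int> allowed = AllowedTokens(s);
    for (std::size_t i = 0; i < logits.size(); ++i) {
        if (allowed.count(static_cast<int>(i)) == 0) {
            logits[i] = -std::numeric_limits<float>::infinity();
        }
    }
}
```

A sampler would call `MaskLogits` before picking each token and `Step` after. Note that this only constrains what gets generated. On its own it does not remove anything that is already in the KV cache.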

emansom · 2026-04-13T09:13:45Z

@osanseviero ping

matthewchan-g · 2026-04-21T17:30:15Z

The KV cache "filtering" is how LiteRT-LM handles this part of the Gemma 4 documentation:

Managing Thought Context Between Turns

Standard Multi-Turn Conversations: You must remove (strip) the model's generated thoughts from the previous turn before passing the conversation history back to the model for the next turn. If you want to disable thinking mode mid-conversation, you can remove the <|think|> token when you strip the previous thoughts.

The Gemma 4 chat template assumes the entire conversation history will be re-run on every turn. But LiteRT-LM keeps the KV cache from turn to turn, so the cache still contains the tokens the model generated, thoughts included. To go back and remove the thinking content from previous turns, we rewind the KV cache and re-prefill the model and tool turns with the thinking content stripped out.
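
As a small illustration of why the rewind is needed, the sketch below strips thoughts from a history and compares the result with what a persistent cache would still hold. The `Message` struct, the `Render` helper, and the `role:content` pieces are hypothetical simplifications. A real runtime would render the chat template and compare token ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical message shape: thoughts are kept apart from the visible reply.
struct Message {
    std::string role;
    std::string thinking;
    std::string content;
};

// Copy of the history with all thoughts removed, as the quoted guidance asks
// for before the history is passed back to the model.
std::vector<Message> StripThoughts(std::vector<Message> history) {
    for (Message& m : history) m.thinking.clear();
    return history;
}

// Flatten a history into pieces; a stand-in for templating and tokenizing.
std::vector<std::string> Render(const std::vector<Message>& history) {
    std::vector<std::string> pieces;
    for (const Message& m : history) {
        if (!m.thinking.empty()) pieces.push_back("thought:" + m.thinking);
        pieces.push_back(m.role + ":" + m.content);
    }
    return pieces;
}

// Number of leading pieces two sequences share: the part of a cache that
// could be kept when the second sequence replaces the first.
std::size_t CommonPrefix(const std::vector<std::string>& a,
                         const std::vector<std::string>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}
```

With one user turn followed by one model turn that has thoughts, the cached sequence and the stripped sequence share only the first piece. Everything from the model turn onward diverges, which is the part that has to be rewound and re-prefilled.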
