🚨 [generate] Never use cache_position anymore in generation #44816
Cyrilvallez wants to merge 11 commits into main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp left a comment
While I still remember it, let's kick it out of the docs as well please, and if needed add correct examples with the cache.
run-slow: dia

This comment contains models: ["models/dia"]
[For maintainers] Suggested jobs to run (before merge): run-slow: csm, dia, ernie4_5_vl_moe, glm46v, glm4v, glm4v_moe, glm_image, glm_ocr, janus, paddleocr_vl, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_5, qwen3_5_moe, qwen3_vl
vasqu left a comment
Not approving yet because I want to discuss the deprecation a bit more:
- I still found references where, IMO, they shouldn't be there.
- Do we keep cache positions in the generate preparation (as an alias for position ids)? I think we will get a remote-model apocalypse otherwise, and vLLM already showed how brittle it is.
```python
# build `cache_position` on the fly
seq_length = inputs["input_ids"].shape[1]
inputs = self.model._get_initial_cache_position(seq_length, self.model.device, inputs)
```
Just for my peace of mind, can we run-slow with whisper?
```python
# Cache position (always 1D)
if (cache_position := model_kwargs.get("cache_position")) is not None:
    next_cache_position = (
        torch.arange(num_new_tokens, dtype=cache_position.dtype, device=cache_position.device)
        + cache_position[-1]
        + 1
    )
    next_cache_position = torch.cat((cache_position, next_cache_position))
    model_kwargs["cache_position"] = next_cache_position
```
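The arithmetic in this snippet can be illustrated with plain Python lists (an illustrative sketch of the extension logic only, not the actual torch code):

```python
def extend_cache_position(cache_position, num_new_tokens):
    # Continue the 1D position sequence from the last known position,
    # mirroring torch.arange(num_new_tokens) + cache_position[-1] + 1
    # followed by torch.cat.
    last = cache_position[-1]
    return cache_position + [last + 1 + i for i in range(num_new_tokens)]

# A cache holding positions 0..4, extended by 3 newly generated tokens:
print(extend_cache_position([0, 1, 2, 3, 4], 3))  # [0, 1, 2, 3, 4, 5, 6, 7]
```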
I think complete removal of cache position will be too breaking for remote code, and there is still quite a lot of it, just looking at all the vLLM stuff we had to fix 😭
Shouldn't position ids be the same as cache positions now? What do you think about passing this as an alias kwarg as well? We really need to check with a remote model, e.g. DeepSeek V3 remote code maybe?
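The alias-kwarg idea could look roughly like this (a hypothetical sketch with assumed names, not the actual generate preparation code):

```python
def prepare_model_kwargs(position_ids, model_kwargs):
    # Hypothetical: keep `cache_position` as an alias of the 1D position ids,
    # so remote models that read model_kwargs["cache_position"] keep working.
    # setdefault leaves any explicitly provided value untouched.
    model_kwargs.setdefault("cache_position", position_ids)
    return model_kwargs

kwargs = prepare_model_kwargs([0, 1, 2], {})
print(kwargs["cache_position"])  # [0, 1, 2]
```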
General question: are we deprecating it everywhere? I think I still see a few occurrences:
- Mask creation
- Within the ExecuTorch integration
- Models:
  - Lfm2
  - Ministral3
  - Mistral4
- Tests

IMO, only the mask might be critical and might be kept a bit longer. Wdyt?
What does this PR do?

As per the title. This is the last of many PRs to remove `cache_position`. At this point, all the models have already been updated to not use it, and it is fully ignored in all the modelings. So this removes its creation and usage in `generate`, meaning it is not passed as a `kwarg` anywhere anymore. This is fully safe, as all the models already ignore it.

Note: the 🚨 marker is ONLY FOR REMOTE CODE. On the main repo, all models were previously adapted as explained, so there are no BC issues. For remote code however, as with most things, it can break if the code uses `cache_position` in a weird way and does not provide a creation fallback inside the model.
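A creation fallback of the kind mentioned could look roughly like this inside a remote model's forward pass (a plain-Python sketch under assumed names, not the actual Transformers API):

```python
def prepare_cache_position(input_len, cache_position=None, past_seen_tokens=0):
    # Hypothetical fallback: if generate() no longer passes cache_position,
    # rebuild it from how many tokens the cache already holds.
    if cache_position is None:
        cache_position = list(range(past_seen_tokens, past_seen_tokens + input_len))
    return cache_position

# Prefill: 4 prompt tokens, empty cache -> positions [0, 1, 2, 3]
print(prepare_cache_position(4))
# Decoding: 1 new token after 4 cached tokens -> position [4]
print(prepare_cache_position(1, past_seen_tokens=4))
```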