scheduler: preserve prompt checkpoints in chunked prefill resume path #221
krystophny wants to merge 2 commits into waybarrios:main from
Critical review found a blocking scheduler issue: the chunked-prefill path still does not invoke
This is the right approach. We had #194 open for the same crash (one-line unpack fix) but just closed it -- your PR is the only one of the four submissions that actually preserves checkpoint semantics through the chunked prefill lifecycle instead of silently discarding them. The checkpoint-aware finalization step (feeding remaining checkpoint tokens through the model before generation) is what the others all miss. Happy to test once the callback gap is resolved. We run chunked prefill on Qwen3.5-122B (M2 Ultra 128GB) with 2048-token prefill steps.
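For readers comparing the one-line unpack fix with this PR, here is a rough sketch of what "preserving" the field across a pause and resume could look like. It is not the code in vllm_mlx/scheduler.py: `PartialPrefill`, `pause`, `resume`, and `split_prompt_tuple` are hypothetical names, and the six leading tuple fields are treated as opaque because their layout is not described in this thread.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class PartialPrefill:
    """Hypothetical record of a prompt whose prefill was paused part-way."""
    fields: Tuple[Any, ...]        # the six leading fields, kept opaque here
    prompt_checkpoints: Any        # seventh field, or None if it was absent
    has_checkpoint_field: bool     # remembers whether the tuple had 7 fields
    tokens_done: int = 0           # prompt tokens already prefilled


def split_prompt_tuple(prompt_tuple):
    """Accept 6- or 7-field tuples without dropping the seventh field."""
    if len(prompt_tuple) == 7:
        return tuple(prompt_tuple[:6]), prompt_tuple[6], True
    if len(prompt_tuple) == 6:
        return tuple(prompt_tuple), None, False
    raise ValueError(f"unexpected prompt tuple length: {len(prompt_tuple)}")


def pause(prompt_tuple, tokens_done):
    """Store partial progress together with the checkpoint field."""
    head, checkpoints, has_field = split_prompt_tuple(prompt_tuple)
    return PartialPrefill(head, checkpoints, has_field, tokens_done)


def resume(partial):
    """Rebuild the tuple in the same shape it arrived in."""
    if partial.has_checkpoint_field:
        return partial.fields + (partial.prompt_checkpoints,)
    return partial.fields
```

A fix that only widens the unpack would stop the crash but would hand back a six-field tuple from `resume`, which is the silent discard described above.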
The upstream BatchGenerator contract requires prompt_checkpoint_callback to fire after cache finalization, before the checkpoint tail model call. The chunked-prefill monkeypatch preserved the checkpoint field but never invoked the callback, breaking the upstream checkpoint contract. Wire _lazy_extract_cache from mlx-lm and invoke the callback at the correct semantic boundary. Add regression test verifying the callback fires with the correct uid and checkpoint offset.
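The ordering described in this commit message can be illustrated with a small stand-in. This is only a sketch of the boundary, not the mlx-lm or vllm_mlx implementation: `FakeCache`, `finish_chunked_prefill`, `run_model`, and the callback signature `(uid, checkpoint_offset, caches)` are all assumptions made for the example, and the real code goes through `_lazy_extract_cache`, which is not modeled here.

```python
class FakeCache:
    """Stand-in for a per-layer cache; only records that finalize() ran."""

    def __init__(self, events):
        self.events = events

    def finalize(self):
        self.events.append("finalize")


def finish_chunked_prefill(uid, caches, checkpoint_offset, tail_tokens,
                           run_model, prompt_checkpoint_callback=None):
    """Finalize caches, fire the callback, then run the checkpoint tail."""
    for c in caches:
        c.finalize()
    # Assumed signature: the callback sees the finalized caches before the
    # tail tokens are fed through the model.
    if prompt_checkpoint_callback is not None:
        prompt_checkpoint_callback(uid, checkpoint_offset, caches)
    return run_model(tail_tokens)


def record_callback_order():
    """Regression-style check: returns the sequence of recorded events."""
    events = []
    caches = [FakeCache(events), FakeCache(events)]

    def callback(uid, checkpoint_offset, caches):
        events.append(("callback", uid, checkpoint_offset))

    def run_model(tokens):
        events.append(("model", tuple(tokens)))
        return len(tokens)

    finish_chunked_prefill(7, caches, 3, [10, 11, 12], run_model, callback)
    return events
```

A test in this style asserts both the payload (uid and checkpoint offset) and the position of the callback relative to `finalize()` and the tail model call, which is the part the earlier monkeypatch skipped.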
Summary
Preserve prompt_checkpoints across chunked-prefill partial resume and finalization for mlx-lm prompt tuples, and invoke the upstream prompt_checkpoint_callback contract.
Why
Recent mlx-lm prompt tuples carry a seventh prompt_checkpoints field. Upstream one-line fixes cover the immediate unpack crash in _chunked_next, but the chunked-prefill monkeypatch also needs to retain that field when it stores partial progress and resumes generation later.
Additionally, the upstream BatchGenerator invokes prompt_checkpoint_callback after cache finalization. The chunked-prefill monkeypatch was missing this callback invocation, breaking the checkpoint contract.
What changed
Wire _lazy_extract_cache from mlx-lm and invoke prompt_checkpoint_callback at the correct semantic boundary (after c.finalize(), before the checkpoint tail model call)
Files to review
vllm_mlx/scheduler.py
tests/test_batching.py
Related PRs
#194 and #156, which address only the immediate unpack failure in _chunked_next
Validation