scheduler: preserve prompt checkpoints in chunked prefill resume path#221

Open
krystophny wants to merge 2 commits into waybarrios:main from computor-org:fix/chunked-prefill-prompt-checkpoints-upstream
Conversation


@krystophny krystophny commented Mar 24, 2026

Summary

Preserve prompt_checkpoints across chunked-prefill partial resume and finalization for mlx-lm prompt tuples, and invoke the upstream prompt_checkpoint_callback contract.

Why

Recent mlx-lm prompt tuples carry a seventh prompt_checkpoints field. Upstream one-line fixes cover the immediate unpack crash in _chunked_next, but the chunked-prefill monkeypatch must also retain that field when it stores partial progress and resumes generation later; otherwise the checkpoints are silently discarded.

Additionally, the upstream BatchGenerator invokes prompt_checkpoint_callback after cache finalization. The chunked-prefill monkeypatch was missing this callback invocation, breaking the checkpoint contract.
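A minimal sketch of the backward-compatible unpacking this implies. Only the prompt_checkpoints field is named in the PR; the other field names and the tuple layout here are illustrative assumptions, not the actual mlx-lm structure:

```python
# Hypothetical sketch: accept both legacy 6-field and newer 7-field
# mlx-lm prompt tuples. All names except prompt_checkpoints are
# illustrative placeholders.
def unpack_prompt_entry(entry):
    if len(entry) == 7:
        *head, prompt_checkpoints = entry
    else:
        # Legacy tuple: no checkpoints recorded.
        head = list(entry)
        prompt_checkpoints = None
    return head, prompt_checkpoints
```

When the monkeypatch stores partial-prefill progress, it would repack the entry with the same prompt_checkpoints value so the resume path sees the field unchanged.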

What changed

  • update the chunked-prefill scheduler monkeypatch to accept 7-field prompt tuples
  • preserve prompt-checkpoint state through partial-prefill resume and finalization
  • import _lazy_extract_cache from mlx-lm and invoke prompt_checkpoint_callback at the correct semantic boundary (after c.finalize(), before the checkpoint tail model call)
  • add regression tests: one for partial chunked-prefill with prompt checkpoints, one verifying the callback fires with correct uid and checkpoint offset
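The callback ordering described above can be sketched as follows. Only prompt_checkpoint_callback, the finalize() step, and the uid/offset arguments come from the PR text; the function and parameter names here are assumptions for illustration:

```python
# Hypothetical sketch of the semantic boundary described in the PR:
# callback fires after cache finalization, before the checkpoint
# tail model call.
def finalize_with_checkpoint_callback(cache, uid, checkpoint_offset,
                                      prompt_checkpoint_callback,
                                      run_checkpoint_tail):
    # 1. Finalize the prefill cache (upstream c.finalize()).
    cache.finalize()
    # 2. Honor the upstream contract: report uid + checkpoint offset.
    if prompt_checkpoint_callback is not None:
        prompt_checkpoint_callback(uid, checkpoint_offset)
    # 3. Only then feed the remaining checkpoint tokens through the model.
    return run_checkpoint_tail()
```

The regression tests would then assert exactly this ordering and the uid/offset values the callback receives.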

Files to review

  • vllm_mlx/scheduler.py
  • tests/test_batching.py

Related PRs

  • broader than #194 and #156, which address only the immediate unpack failure in _chunked_next

Validation

$ python -m pytest tests/test_batching.py::TestSchedulerBasic -v
9 passed in 2.33s

@krystophny krystophny marked this pull request as draft March 24, 2026 19:08
@krystophny (Author)

Critical review found a blocking scheduler issue: the chunked-prefill path still does not invoke prompt_checkpoint_callback, so it does not preserve the upstream checkpoint contract yet. I am leaving this PR in draft until that callback path is fixed.

@Thump604 (Contributor)

This is the right approach. We had #194 open for the same crash (a one-line unpack fix) but just closed it; your PR is the only one of the four submissions that actually preserves checkpoint semantics through the chunked-prefill lifecycle instead of silently discarding them.

The checkpoint-aware finalization step (feeding remaining checkpoint tokens through the model before generation) is what the others all miss.

Happy to test once the callback gap is resolved. We run chunked prefill on Qwen3.5-122B (M2 Ultra 128GB) with 2048-token prefill steps.

The upstream BatchGenerator contract requires prompt_checkpoint_callback
to fire after cache finalization, before the checkpoint tail model call.
The chunked-prefill monkeypatch preserved the checkpoint field but never
invoked the callback, breaking the upstream checkpoint contract.

Wire _lazy_extract_cache from mlx-lm and invoke the callback at the
correct semantic boundary. Add regression test verifying the callback
fires with the correct uid and checkpoint offset.
@krystophny krystophny marked this pull request as ready for review March 25, 2026 23:53