
feat: add mlx chunked prefill support #469

Merged: gufengc merged 2 commits into main from codex/mac-chunked-prefill on Jun 1, 2026.


gufengc (Collaborator) commented on Jun 1, 2026

Summary

  • Adds MLX (macOS) chunked prefill support, with chunk-aware request length tracking and chunk progression backed by the prefix cache.
  • Enables the prefix cache by default and adds --chunked-prefill-size (default 1024; 0 disables chunking).
  • Materializes MLX linear caches after each prefill chunk so that lazy update graphs do not accumulate across chunks.
  • Preserves chunk ordering in downstream pipeline stages when several chunks for the same request id arrive before the previous chunk has finished.
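To make the --chunked-prefill-size semantics concrete, here is a minimal Python sketch of how a prompt might be split into prefill chunks once the prefix cache already covers a leading span of tokens. The names PrefillChunk and plan_prefill_chunks are hypothetical and do not come from the PR; only the flag semantics (a positive chunk size, with 0 disabling chunking) are taken from the summary above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrefillChunk:
    """One slice of a prompt to prefill (hypothetical structure, not the PR's)."""
    start: int     # index of the first token in this chunk
    end: int       # exclusive end index
    is_last: bool  # True when this chunk completes the prompt


def plan_prefill_chunks(prompt_len: int, cached_len: int, chunk_size: int) -> List[PrefillChunk]:
    """Split the uncached tail of a prompt into prefill chunks.

    chunk_size mirrors --chunked-prefill-size: 0 disables chunking, so the
    remaining tokens are prefilled in a single pass. cached_len is the number
    of leading tokens already covered by the prefix cache (for example, by
    earlier chunks of the same request).
    """
    if chunk_size < 0:
        raise ValueError("chunk_size must be >= 0")
    if not 0 <= cached_len <= prompt_len:
        raise ValueError("cached_len must be within [0, prompt_len]")
    # With chunking disabled, one step covers everything that is not cached.
    step = chunk_size if chunk_size > 0 else max(prompt_len - cached_len, 1)
    chunks: List[PrefillChunk] = []
    start = cached_len
    while start < prompt_len:
        end = min(start + step, prompt_len)
        chunks.append(PrefillChunk(start, end, end == prompt_len))
        start = end  # the next chunk resumes where the cache now ends
    return chunks
```

For example, plan_prefill_chunks(1675, 0, 128) yields 14 chunks, the last covering tokens 1664 to 1675. In this sketch a fully cached prompt produces no chunks; a real scheduler presumably still needs logits for the final position, which is not modeled here.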

Why

Requests with long prompts could exhaust memory on macOS MLX because the prefill path processed the entire prompt in a single pass. During chunked-prefill testing, downstream peers could also drop queued chunks that shared a request id, which caused two-node pipeline requests to hang. This PR splits MLX prefill into chunks, reuses prefix cache state across chunks, and keeps distinct prefill chunks with the same request id queued until their turn.
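The hang described above comes from treating the request id alone as a chunk's identity. As a minimal sketch (assuming each chunk carries a per-request chunk index, which the PR text does not state), a downstream queue can key chunks by (rid, chunk_index), reject only true repeats, and release the next chunk for a request only after the previous one finishes. The class SameRidChunkQueue and its methods are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Set, Tuple


class SameRidChunkQueue:
    """Hypothetical per-request queue that releases prefill chunks in order.

    Chunks are identified by (rid, chunk_index) instead of rid alone, so a
    second chunk for a request that is still running is kept, not dropped.
    """

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, Any]] = defaultdict(dict)
        self._next_index: Dict[str, int] = defaultdict(int)  # next chunk to release
        self._in_flight: Set[str] = set()                    # rids with a running chunk

    def enqueue(self, rid: str, chunk_index: int, payload: Any) -> bool:
        """Queue a chunk; return False only if this exact chunk was seen before."""
        if chunk_index < self._next_index[rid] or chunk_index in self._pending[rid]:
            return False
        self._pending[rid][chunk_index] = payload
        return True

    def next_ready(self, rid: str) -> Optional[Tuple[int, Any]]:
        """Pop the next in-order chunk unless an earlier chunk is still running."""
        if rid in self._in_flight:
            return None
        idx = self._next_index[rid]
        if idx not in self._pending[rid]:
            return None  # the next chunk has not arrived yet
        payload = self._pending[rid].pop(idx)
        self._next_index[rid] = idx + 1
        self._in_flight.add(rid)
        return idx, payload

    def finish(self, rid: str) -> None:
        """Mark the running chunk for rid as done so the next one can start."""
        self._in_flight.discard(rid)
```

Keying by chunk index also tolerates chunks that arrive out of order, since a later chunk simply waits until its predecessor has been released and finished.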

Validation

  • pre-commit run --all-files
  • pytest: 143 passed, 5 skipped
  • Manual macOS two-node validation with mlx-community/Qwen3.5-0.8B-MLX-bf16: node0 hosted layers [0, 12), node1 hosted layers [12, 24), --chunked-prefill-size 128; a 1675-token prompt completed successfully with 8 generated tokens.
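The manual run above can be checked with simple arithmetic: a 1675-token prompt with --chunked-prefill-size 128 splits into 13 full chunks plus a final 11-token chunk (13 × 128 = 1664), i.e. 14 prefill chunks passing through both nodes. The sketch below reproduces that count and checks that the half-open layer ranges [0, 12) and [12, 24) tile the layers without gaps or overlaps. The helper names are hypothetical, and treating 24 as the model's total layer count is an inference from the range endpoints, not something the PR states.

```python
from typing import List, Sequence, Tuple


def chunk_lengths(prompt_len: int, chunk_size: int) -> List[int]:
    """Lengths of the prefill chunks for a prompt (hypothetical helper)."""
    full, rest = divmod(prompt_len, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def covers_all_layers(ranges: Sequence[Tuple[int, int]], num_layers: int) -> bool:
    """True if half-open [start, end) ranges tile 0..num_layers in order."""
    expected_start = 0
    for start, end in ranges:
        if start != expected_start or end <= start:
            return False  # gap, overlap, or empty range
        expected_start = end
    return expected_start == num_layers


lengths = chunk_lengths(1675, 128)
print(len(lengths), lengths[-1])                     # 14 11
print(covers_all_layers([(0, 12), (12, 24)], 24))    # True
```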

gufengc changed the title from "[codex] Add MLX chunked prefill support" to "feat: add MLX chunked prefill support" on Jun 1, 2026
gufengc changed the title from "feat: add MLX chunked prefill support" to "feat: add mlx chunked prefill support" on Jun 1, 2026
gufengc marked this pull request as ready for review on June 1, 2026 14:29
gufengc requested a review from a team on June 1, 2026 14:29
gufengc merged commit 3888eed into main on Jun 1, 2026 (19 of 21 checks passed)
gufengc deleted the codex/mac-chunked-prefill branch on June 1, 2026 14:31