feat: add sglang chunked prefill support#470

Merged
gufengc merged 1 commit into main from codex/sglang-chunked-prefill on Jun 1, 2026

gufengc (Collaborator) commented Jun 1, 2026

Summary

  • Add minimal SGLang chunked-prefill glue that reuses SGLang's native Req, PrefillAdder, and ScheduleBatch, along with its unfinished-chunk cache lifecycle.
  • Align MLX middle-chunk behavior with SGLang semantics through a shared backend-neutral request-state helper.
  • Pass chunked_prefill_size explicitly into the scheduler and stop reading chunk size indirectly from the cache manager.
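To make the chunk semantics concrete, here is a minimal Python sketch of what a backend-neutral request-state helper could compute. All names in it (`ChunkState`, `next_chunk`, `prefill_plan`) are hypothetical; it does not reproduce the helper added in this PR or any SGLang API. It assumes only that a prompt is prefilled in pieces of at most `chunked_prefill_size` tokens and that a "middle" chunk is one that does not reach the end of the prompt, so (as assumed here) no output token is sampled from it. `chunked_prefill_size` is passed as an explicit argument rather than read from another object, in the spirit of the third bullet.

```python
from dataclasses import dataclass


@dataclass
class ChunkState:
    """Hypothetical backend-neutral view of one request's prefill progress."""

    prompt_len: int
    prefilled: int = 0  # prompt tokens already written to the KV cache


def next_chunk(state: ChunkState, chunked_prefill_size: int) -> tuple[int, int, bool]:
    """Return (start, end, is_middle_chunk) for the next prefill step.

    A middle chunk stops before the end of the prompt; in this sketch it is
    assumed to only fill the KV cache, with no output token sampled from it.
    """
    if chunked_prefill_size <= 0:
        raise ValueError("chunked_prefill_size must be positive")
    if state.prefilled >= state.prompt_len:
        raise ValueError("prefill already complete")
    start = state.prefilled
    end = min(start + chunked_prefill_size, state.prompt_len)
    return start, end, end < state.prompt_len


def prefill_plan(prompt_len: int, chunked_prefill_size: int) -> list[tuple[int, int, bool]]:
    """List every half-open chunk [start, end) needed to prefill one prompt."""
    state = ChunkState(prompt_len)
    plan = []
    while state.prefilled < state.prompt_len:
        start, end, middle = next_chunk(state, chunked_prefill_size)
        plan.append((start, end, middle))
        state.prefilled = end  # the cache now holds tokens [0, end)
    return plan
```

For example, `prefill_plan(300, 128)` yields `[0, 128)` and `[128, 256)` as middle chunks and `[256, 300)` as the final chunk, the only one from which a token would be sampled.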

Why

Parallax constructs ScheduleBatch directly instead of going through the upstream SGLang scheduler, so GPU chunked prefill needed a thin adapter around SGLang's native chunk-selection and lifecycle primitives. This keeps the SGLang-side changes small and avoids divergent chunk semantics between the MLX and SGLang backends.
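The adapter's job can be illustrated with a toy, self-contained scheduler step. It is loosely modeled on the general pattern of a per-step prefill token budget plus at most one carried-over unfinished request; `Request`, `schedule_step`, and `run` are hypothetical names, and this is not how SGLang's `PrefillAdder` or `ScheduleBatch` are implemented or invoked.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Request:
    """Hypothetical request record (not SGLang's Req)."""

    rid: str
    prompt_len: int
    prefilled: int = 0


Chunk = Tuple[str, int, int]  # (request id, start token, end token), half-open


def schedule_step(
    waiting: Deque[Request],
    inflight: Optional[Request],
    chunked_prefill_size: int,
) -> Tuple[List[Chunk], Optional[Request]]:
    """Build one prefill batch within a token budget of chunked_prefill_size."""
    budget = chunked_prefill_size
    batch: List[Chunk] = []
    while budget > 0 and (inflight is not None or waiting):
        # Resume the previously unfinished request before admitting new ones.
        if inflight is not None:
            req, inflight = inflight, None
        else:
            req = waiting.popleft()
        take = min(req.prompt_len - req.prefilled, budget)
        batch.append((req.rid, req.prefilled, req.prefilled + take))
        req.prefilled += take
        budget -= take
        if req.prefilled < req.prompt_len:
            # Budget exhausted mid-prompt: carry this request into the next step.
            return batch, req
    return batch, None


def run(requests: List[Request], chunked_prefill_size: int) -> List[List[Chunk]]:
    """Drive schedule_step until every prompt is fully prefilled."""
    if chunked_prefill_size <= 0:
        raise ValueError("chunked_prefill_size must be positive")
    waiting: Deque[Request] = deque(requests)
    inflight: Optional[Request] = None
    steps: List[List[Chunk]] = []
    while waiting or inflight is not None:
        batch, inflight = schedule_step(waiting, inflight, chunked_prefill_size)
        steps.append(batch)
    return steps
```

With a 300-token and a 50-token prompt and a budget of 128, `run` produces three steps: the first two each advance the long prompt by 128 tokens, and the third finishes it (44 tokens) and admits the short prompt into the remaining budget. Resuming the unfinished request first is a choice this sketch assumes; it bounds how long a partially prefilled prompt waits.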

Validation

  • uv run --no-sync pre-commit run --all-files
  • uv run --no-sync pytest
  • Manual SGLang single-node streaming test on my-windows with --chunked-prefill-size 128
  • Manual Mac + Windows/H200 two-node pipeline test with layers split [0,14) and [14,28), validating long-prompt semantic output
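The half-open layer ranges in the two-node test, [0,14) and [14,28), are what an even contiguous split of a 28-layer model across two pipeline stages produces. The hypothetical `split_layers` helper below computes such a split purely as an illustration; Parallax's actual layer allocation across heterogeneous nodes may be decided differently.

```python
def split_layers(num_layers: int, num_stages: int) -> list[tuple[int, int]]:
    """Split [0, num_layers) into contiguous half-open ranges, one per stage.

    Hypothetical helper: earlier stages get one extra layer when the split is uneven.
    """
    if not 0 < num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    ranges, start = [], 0
    for stage in range(num_stages):
        end = start + base + (1 if stage < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

`split_layers(28, 2)` returns `[(0, 14), (14, 28)]`, matching the split used in the two-node test.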

gufengc changed the title from "feat: add SGLang chunked prefill support" to "feat: add sglang chunked prefill support" on Jun 1, 2026
gufengc marked this pull request as ready for review on June 1, 2026 16:02
gufengc requested a review from a team on June 1, 2026 16:02
gufengc merged commit ce6ef50 into main on Jun 1, 2026
11 of 12 checks passed
gufengc deleted the codex/sglang-chunked-prefill branch on June 1, 2026 16:17