Feat/data parallelism #832
Conversation
Summary of Changes

Hello @JamesBrianD, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the SGLang JAX inference engine by introducing comprehensive Data Parallelism (DP) support. This architectural change allows the system to distribute requests and the KV cache across multiple devices, improving scalability and resource utilization for large-scale inference workloads. The modifications span core components, from memory management and request scheduling to model execution and benchmarking utilities, ensuring a robust and efficient DP implementation.
Activity
ce35b74 to 24ffa11 (Compare)
* feat: allocator with dp (#591)
* refactor allocator (#608)
* change schedule batch (#613)
* change logitprocessor and fix scheduler bugs
* feat: rpa dp (#638)
  * add debug logs
* fix overlap dp (#663)
* fix: merge cache loc (#665)
* fix: overlap schedule req over max batch size of per dp rank (#669)
* fix: precompile dp (#671)
* fix: communicator fan-out always be 1 if dp_size > 1 (#672)
* fix: memory leak (#673)
* feat: multi host support dp (#678)
  * mhdp
* fix: sharding not match when enable dp (#680)
  Co-Authored-By: Prayer <prayer@primatrix.ai>
  Co-authored-by: leos <leos@primatrix.ai>
* benchmark fused moe and change decode block size
* min_running_queue schedule for dp
* change bench fused scripts
* dp schedule
* chore: ignore .worktrees/ directory
* Add DP-safe mixin scheduling and FA metadata checks
78d04e5 to e3b40ff (Compare)
bff50e8 to 34c9671 (Compare)
b6d7606 to 19cec01 (Compare)
19cec01 to 464d10f (Compare)
Refactor get_top_logprobs and get_token_ids_logprobs to return flat tensors instead of per-request nested structures. In DP mode, padding slots caused shape mismatches when jnp.array() tried to concatenate arrays with different dimensions. Now logits_processor returns flat tensors and consumers slice by logprob_pt offset on the CPU side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
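A minimal sketch of that flat-tensor pattern for the prefill path, assuming hypothetical helper names (`get_top_logprobs_flat`, `slice_by_logprob_pt`, `num_tokens`, `ks`); only `logprob_pt` appears in the commit message, and the real SGLang JAX signatures may differ:

```python
import jax
import jax.numpy as jnp

def get_top_logprobs_flat(logprobs: jnp.ndarray, max_k: int):
    # Return one flat [total_tokens, max_k] (values, indices) pair instead
    # of a ragged per-request list, so DP padding slots can never produce
    # arrays with mismatched shapes.
    return jax.lax.top_k(logprobs, max_k)

def slice_by_logprob_pt(top_vals, top_idx, logprob_pts, num_tokens, ks):
    # Host side: pull the flat tensors off-device once, then slice each
    # request's rows out by its logprob_pt offset and truncate to its k.
    top_vals, top_idx = jax.device_get((top_vals, top_idx))
    return [
        (top_vals[pt:pt + n, :k], top_idx[pt:pt + n, :k])
        for pt, n, k in zip(logprob_pts, num_tokens, ks)
    ]
```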
Apply the same flat tensor pattern to sampler.py's get_top_logprobs and get_token_ids_logprobs (decode stage). Padding slots with k=0 caused ragged shape mismatches. Now return [batch_size, max_k] flat tensors and truncate to per-request k on CPU side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
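The decode-stage variant might look roughly like the sketch below; `sampler_top_logprobs`, `truncate_per_request`, and `ks` (the per-request k values, 0 for padding slots) are illustrative names, not the actual sampler.py API:

```python
import jax
import jax.numpy as jnp

def sampler_top_logprobs(logprobs: jnp.ndarray, max_k: int):
    # Decode stage: always compute [batch_size, max_k] tensors for the
    # padded batch; padding rows are discarded on the host, never stacked.
    return jax.lax.top_k(logprobs, max_k)

def truncate_per_request(top_vals, top_idx, ks):
    top_vals, top_idx = jax.device_get((top_vals, top_idx))
    # A padding slot with k == 0 simply yields empty lists here, instead of
    # the ragged arrays that jnp.array() previously failed to concatenate.
    return [
        (top_vals[i, :k].tolist(), top_idx[i, :k].tolist())
        for i, k in enumerate(ks)
    ]
```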
d5058d9 to 2ef0833 (Compare)
eec2620 introduced _gather_next_token_ids, which gathers sharded JAX arrays to a replicated sharding but did not transfer the result to the CPU. This left next_token_ids as an on-device JAX array, causing downstream unhashable-type errors when token ids were used in set lookups (check_finished). Add device_get + tolist at the source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
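The actual _gather_next_token_ids signature is not shown in this PR excerpt, so the helper below is only a sketch of the described fix, assuming next_token_ids is a sharded jax.Array on a device mesh:

```python
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec

def gather_next_token_ids_to_host(next_token_ids: jax.Array,
                                  mesh: Mesh) -> list[int]:
    # Re-layout the per-rank shards into one fully replicated array ...
    replicated = jax.device_put(
        next_token_ids, NamedSharding(mesh, PartitionSpec()))
    # ... then pull it to the host and convert to plain Python ints.
    # Plain ints are hashable, so set lookups such as the ones in
    # check_finished() work again.
    return jax.device_get(replicated).tolist()
```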
e56c621 to 1e7e066 (Compare)
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist