[Grug] Add assigned-token DeepEP MoE dispatch by dlwh · Pull Request #6251 · marin-community/marin

dlwh · 2026-06-07T07:55:38Z

Add an assigned-token expert-parallel (EP) backend for Grug MoE, plus a DeepEP-backed CUDA path that avoids the ring backend's global activation buffers. Include focused correctness tests and a benchmark harness at the shapes from the linked issue. A 4-GPU accelerator run shows a median of 1.94 ms for DeepEP versus 2.32 ms for ring, with matching dropped-token counts between the two.
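To make the assigned-token idea concrete, here is a minimal pure-Python sketch; it is not the PR's implementation. Each (token, expert) assignment from the router is sent only to the rank that owns that expert, rather than every rank materializing a global activation buffer as in the ring backend. The function names, the contiguous expert-to-rank layout, the per-expert `capacity`, and the first-come dropping policy are all assumptions for illustration. The dropped count it returns is the kind of quantity the parity check above compares.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # (token index, expert index)


def assigned_token_dispatch(
    topk_experts: Sequence[Sequence[int]],
    num_experts: int,
    num_ranks: int,
    capacity: int,
) -> Tuple[List[List[Row]], int]:
    """Hypothetical sketch: route each (token, expert) pair to the rank owning that expert.

    Assumes experts are laid out contiguously (rank r owns experts
    [r * E/R, (r + 1) * E/R)) and that each expert accepts at most `capacity`
    rows, dropping overflow in arrival order.
    """
    assert num_experts % num_ranks == 0, "sketch assumes experts divide evenly across ranks"
    experts_per_rank = num_experts // num_ranks
    load = [0] * num_experts
    send: List[List[Row]] = [[] for _ in range(num_ranks)]
    dropped = 0
    for token, experts in enumerate(topk_experts):
        for expert in experts:
            if load[expert] >= capacity:
                dropped += 1  # expert is full: this assignment is dropped
                continue
            load[expert] += 1
            send[expert // experts_per_rank].append((token, expert))
    return send, dropped


def combine(send: List[List[Row]], expert_outputs: List[List[float]], num_tokens: int) -> List[float]:
    """Scatter-add each rank's expert outputs back to the tokens they came from."""
    out = [0.0] * num_tokens
    for rank_rows, rank_outputs in zip(send, expert_outputs):
        for (token, _expert), y in zip(rank_rows, rank_outputs):
            out[token] += y
    return out


# Toy run: 4 tokens, top-2 routing, 4 experts split over 2 ranks, capacity 2.
topk = [[0, 1], [0, 2], [0, 3], [1, 3]]
send, dropped = assigned_token_dispatch(topk, num_experts=4, num_ranks=2, capacity=2)
# dropped == 1: token 2's assignment to expert 0 exceeds that expert's capacity.
x = [1.0, 2.0, 3.0, 4.0]
# Stand-in "expert": scale the token's activation by (expert + 1).
expert_outputs = [[x[t] * (e + 1) for t, e in rows] for rows in send]
out = combine(send, expert_outputs, num_tokens=len(x))
```

Each rank receives only the rows routed to its own experts, so its receive buffer scales with its assigned load rather than with the global token count. A real backend would perform this exchange with a collective, such as DeepEP's dispatch/combine kernels, rather than with Python lists.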

Fixes #6215

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c6fc409f4


chatgpt-codex-connector · 2026-06-07T07:59:22Z

    "ring",  # Expert-parallel all-gather + psum-scatter backend.
-    "ragged_all_to_all",  # Expert-parallel ragged all-to-all backend.
-    "deepep",  # Expert-parallel DeepEP intranode dispatch/combine backend.
+    "assigned_token",  # Expert-parallel plain-XLA assigned-token backend.


Update stale ragged_all_to_all contract test

When the Grug variant contract tests run, tests/test_grug_variant_contracts.py still calls dataclasses.replace(cfg, moe_implementation="ragged_all_to_all") and later asserts that this string appears in the jaxpr. This diff removes ragged_all_to_all from MoeImplementation, so GrugModelConfig.__post_init__ now calls resolve_moe_implementation(), which raises for that test input before the new backend is ever exercised. Please update the test to use the new assigned_token implementation (and the corresponding expected jaxpr string), or keep the old alias valid.
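The failure mode described above comes from `dataclasses.replace` building a new instance through `__init__`, which in turn runs `__post_init__`. A minimal sketch, in which `GrugModelConfigSketch`, the two-entry `MoeImplementation` subset, and this `resolve_moe_implementation` are hypothetical stand-ins for the real levanter definitions (whose exact signatures are assumptions here):

```python
import dataclasses
from typing import Literal, get_args

# Hypothetical subset of the backend names; the real Literal has more entries.
MoeImplementation = Literal["ring", "assigned_token"]


def resolve_moe_implementation(name: str) -> str:
    """Hypothetical validator: reject names that are not in MoeImplementation."""
    allowed = get_args(MoeImplementation)
    if name not in allowed:
        raise ValueError(f"Unknown moe_implementation {name!r}; expected one of {allowed}")
    return name


@dataclasses.dataclass(frozen=True)
class GrugModelConfigSketch:
    """Hypothetical stand-in for GrugModelConfig: validates its backend on construction."""

    moe_implementation: str = "ring"

    def __post_init__(self) -> None:
        # dataclasses.replace() goes through __init__, so this check runs again.
        resolve_moe_implementation(self.moe_implementation)


cfg = GrugModelConfigSketch()
ok = dataclasses.replace(cfg, moe_implementation="assigned_token")  # valid backend
# dataclasses.replace(cfg, moe_implementation="ragged_all_to_all") raises ValueError
```

Because validation happens at construction time, a removed backend name fails as soon as the test builds its config, before any jaxpr is traced.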



dlwh added 3 commits June 3, 2026 22:21

Fix Grug MoE optimizer clipping

750bce7

Split DeepEP scopes from optimizer PR

e0b6a60

Implement assigned-token DeepEP MoE path

7c6fc40

chatgpt-codex-connector Bot reviewed Jun 7, 2026


dlwh added the agent-generated label ("Created by automation/agent") on Jun 8, 2026, via ChatGPT Codex Connector

dlwh mentioned this pull request Jun 8, 2026

[grug] Replace ring EP global buffers with assigned-token dispatch #6215

Open

dlwh added 5 commits June 7, 2026 22:19

Merge remote-tracking branch 'origin/main' into agent/20260606-fix-6215

31695df

# Conflicts:
#   lib/levanter/src/levanter/grug/_moe/ep_deepep.py
#   lib/levanter/src/levanter/grug/_moe/ep_ragged_all_to_all.py

Fix Grug MoE variant contract test

337f580

Merge remote-tracking branch 'origin/main' into agent/20260606-fix-6215

d7ab6e4

Fix pyrefly errors after main merge

3b5730d

Fix assigned-token backward TPU test shape

dae2b73

rjpower mentioned this pull request Jun 10, 2026

[grug] Drive 90B-5.3BA MoE MFU on cw-us-east-02a from 3% to ~25% #6304

Open

dlwh wants to merge 8 commits into main from agent/20260606-fix-6215
