Correctly trigger node limited routing in DeepSeek-V3 JAX path.#1891
gpolovets1 wants to merge 6 commits into main from
Conversation
Force-pushed from 0babcc6 to d160a78
…Also instead of applying sigmoids on the router outputs, passing the router assignments to the MoE layer and applying the activations on the router within the MoE layer. Signed-off-by: George Polovets <gpolovets@gmail.com>
…sn't being triggered. Signed-off-by: George Polovets <gpolovets@gmail.com>
…ring requant, correctly applying yarn scale, and using FP32 and correct order of operations in get_topk_indices (since it also applies activations and bias term). Signed-off-by: George Polovets <gpolovets@gmail.com>
…rge negative rather than 0. Signed-off-by: George Polovets <gpolovets@gmail.com>
…e factor to be applied on the expert weights instead of hidden_states. This improved perf by about 3% and now perf is within 1% of mainline but with much improved MMLU. Signed-off-by: George Polovets <gpolovets@gmail.com>
Signed-off-by: George Polovets <gpolovets@gmail.com>
Force-pushed from d160a78 to 2515a0d
e_sharding=P(None, ),
activation_ffw_td=(ShardingAxisName.MLP_DATA, None),
ed_sharding=(None, None),
e_sharding=(None, ),
I think this line is unnecessary; can you revert line 1078?
x_TD = jax.lax.with_sharding_constraint(x_TD,
                                        P(*self.activation_ffw_td))
logits_TE = super().__call__(x_TD).astype(jnp.float32)
Could you add a comment here referencing https://github.com/vllm-project/vllm/blob/e89a91d9275cd8ac086fe04476b41675a9ebbd5c/vllm/model_executor/layers/fused_moe/cpu_fused_moe.py#L59?
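For context on the diff line above: the pattern under discussion is computing router logits in FP32 before sigmoid scoring, with the bias term applied only when selecting experts. A minimal numpy sketch of that idea (hypothetical names and a simplification of the actual JAX/vLLM code, not this PR's implementation):

```python
import numpy as np

def score_for_selection(logits, e_score_correction_bias):
    # Cast to float32 before the sigmoid to avoid bf16 precision loss
    # in the router scores (the motivation for .astype(jnp.float32) above).
    scores = 1.0 / (1.0 + np.exp(-logits.astype(np.float32)))
    # The bias-corrected scores are used only to pick the top-k experts;
    # the unbiased scores are what get normalized into routing weights.
    return scores, scores + e_score_correction_bias
```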
use_ep: bool,
activation: str,
scoring_fn: str,
topk_weights: jax.Array | None = None,
Not sure this change is necessary? Wouldn't it be possible to change DeepSeekV3Router to return a value that fused_moe_func expects?
fused_moe_func applies global top-k routing whereas DeepSeek needs to use the custom grouped top-k routing.
Would you prefer if I passed get_topk_func as an argument? It could replace jax.lax.top_k if passed and I think retain the same logical flow.
vLLM has a concept of monolithic vs. non-monolithic MoE, and I was wondering if we can leverage this API: https://github.com/vllm-project/vllm/blob/bdd8981dab8d8c6ae88a3f605d04ec5243088e5a/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py#L505-L525
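To make the distinction in this thread concrete: global top-k picks the k highest-scoring experts anywhere, while DeepSeek-V3's node-limited routing first keeps only the best groups and then takes top-k inside them. A numpy sketch of grouped top-k, assuming the top-2-sum group metric from the DeepSeek-V3 design (function and argument names are illustrative, not this PR's code):

```python
import numpy as np

def grouped_topk(scores, n_groups, topk_groups, topk):
    # scores: (T, E) per-token expert scores (post-sigmoid, bias-corrected)
    T, E = scores.shape
    group_scores = scores.reshape(T, n_groups, E // n_groups)
    # Rank groups by the sum of their top-2 expert scores.
    top2 = np.sort(group_scores, axis=-1)[..., -2:].sum(-1)   # (T, n_groups)
    keep = np.argsort(-top2, axis=-1)[:, :topk_groups]        # (T, topk_groups)
    # Mask experts in unselected groups with a large negative value,
    # not 0, so a bias-corrected score near 0 can never beat the mask.
    mask = np.full((T, n_groups), -1e9, dtype=scores.dtype)
    np.put_along_axis(mask, keep, 0.0, axis=-1)
    masked = (group_scores + mask[..., None]).reshape(T, E)
    return np.argsort(-masked, axis=-1)[:, :topk]
```

With 4 groups of 2 experts and topk_groups=2, experts from the two losing groups are excluded from the final top-k even if one of them individually outscores a winner.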
Description
Previously, node limited routing was being skipped when fused MoE backends were called.
Also fixed the following issues:
- Router activations and the bias term are now applied inside the MoE layer, in FP32 and with the correct order of operations in get_topk_indices.
- Correctly applying the yarn scale.
- Masking experts in unselected groups with a large negative value rather than 0.
- Applying the e_score_correction factor on the expert weights instead of hidden_states (improved perf by about 3%).
Tests
Locally verified that MMLU went up from 67 to 80 while keeping performance within 1% of mainline.
Checklist
Before submitting this PR, please make sure: