
fix(moe): align EP expert weight dtype with activation dtype#1913

Open
jQizhang wants to merge 2 commits into NVIDIA-NeMo:main from jQizhang:fix/moe-expert-dtype-1863

Conversation

@jQizhang
Contributor

What does this PR do ?

Fixes issue #1863: EP-sharded MoE expert weights stay in fp32 while the surrounding block's activations are cast to bf16 by FSDP2's MixedPrecisionPolicy, causing grouped_mm to crash with a dtype mismatch. This PR aligns the expert weight dtype with the input activation dtype at the call site.

Changelog

  • nemo_automodel/components/moe/experts.py:
    • GroupedExperts.forward (around L315): cast gate_and_up_projs and down_projs to x.dtype before the EP all-gather, so they match whatever dtype the surrounding block passed in.
    • GroupedExpertsDeepEP.forward (around L696): cast gate_and_up_projs and down_projs to permuted_local_hidden_states.dtype right after .to_local(), so torch._grouped_mm / ops.gmm receive same-dtype operands.
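The cast-at-call-site pattern described above can be sketched as follows. This is an illustrative simplification, not the actual `GroupedExperts.forward` from `nemo_automodel/components/moe/experts.py`; the shapes and the expert MLP math (a SwiGLU-style block) are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def expert_forward(x, gate_and_up_projs, down_projs):
    # FSDP2's MixedPrecisionPolicy may have cast x to bf16 while the
    # EP-sharded weights stayed fp32; align the weights to x.dtype so the
    # grouped_mm-style matmuls receive same-dtype operands.
    gate_and_up = gate_and_up_projs.to(x.dtype)
    down = down_projs.to(x.dtype)
    # Simplified SwiGLU expert MLP: split fused gate/up projection, then
    # project back down. Real code routes tokens through grouped matmuls.
    gate, up = (x @ gate_and_up).chunk(2, dim=-1)
    return (F.silu(gate) * up) @ down
```

With fp32 weights and a bf16 `x`, the uncast version would fail with a dtype-mismatch error; after the cast, both operands of each matmul share `x.dtype`.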

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

…NeMo#1863)

FSDP2's MixedPrecisionPolicy only casts params on its wrap mesh, so
cross-mesh DTensors (EP-sharded MoE experts) stay in fp32 while the
surrounding block's activations are cast to bf16. grouped_mm then
raises `Expected b.scalar_type() == torch::kBFloat16 to be true, but
got false` (see NVIDIA-NeMo#1863).

Fix: in `GroupedExperts.forward` and `GroupedExpertsDeepEP.forward`,
cast the local expert weights to the input activation dtype right
before grouped_mm. This matches grouped_mm's own requirement that
both operands share a dtype. The .data is not mutated, so fp32
master weights remain available to the optimizer.
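The point about `.data` not being mutated can be checked directly: `Tensor.to()` is an out-of-place cast, so the fp32 parameter the optimizer holds is left untouched. A minimal sketch (hypothetical shapes, not the PR's actual code):

```python
import torch

# fp32 parameter standing in for an EP-sharded expert master weight.
w = torch.nn.Parameter(torch.randn(4, 4, dtype=torch.float32))
# Activation already cast to bf16 by the mixed-precision policy.
x = torch.randn(2, 4, dtype=torch.bfloat16)

w_cast = w.to(x.dtype)   # new bf16 tensor; w.data is not mutated
y = x @ w_cast.t()       # same-dtype operands, as grouped_mm requires

assert w.dtype == torch.float32   # fp32 master weight still intact
assert y.dtype == torch.bfloat16
```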

Validation: smoke tests on NeMo-RL GRPO 1n8g + Gemma4 26B-A4B MoE
against Automodel main @ bd942f2.
  * Without this patch: DTensorPolicyWorkerV2.get_logprobs() crashes
    at grouped_gemm.backend.gmm with the exact NVIDIA-NeMo#1863 error string.
  * With this patch: 3 GRPO steps complete, loss -0.007/0.008/-0.014,
    reward 0.70/0.60/0.84.

Signed-off-by: larkzhang-nv <larkz@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ZhiyuLi-Nvidia
Contributor

/ok to test 31f4157

Contributor

@ZhiyuLi-Nvidia left a comment


Thank you @jQizhang. LGTM!

@ZhiyuLi-Nvidia
Contributor

@hemildesai could you also take a review?
