Draft
Conversation
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Contributor
Author
/ok to test 66a2804
The megatron_fsdp_strategy_parallelize function was missing the call to _update_attention_head_counts_for_tp after applying tensor parallelism via parallelize_module. The FSDP2 path (DefaultParallelizationStrategy.parallelize) already performs this update, but the MegatronFSDP path did not. Without this update, attention modules retain their global head counts after TP sharding, which can cause incorrect GQA behavior and DTensor shape mismatches that manifest as NCCL collective hangs.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
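A minimal sketch of the fix described above, assuming `parallelize_module` refers to `torch.distributed.tensor.parallel.parallelize_module` and that the real `megatron_fsdp_strategy_parallelize` and `_update_attention_head_counts_for_tp` in nemo_automodel take roughly these arguments; the actual signatures may differ:

```python
# Sketch only: function names come from the PR description, but their exact
# signatures in nemo_automodel are assumptions made for illustration.
from torch.distributed.tensor.parallel import parallelize_module


def megatron_fsdp_strategy_parallelize(model, tp_mesh, tp_plan, update_head_counts):
    """Apply tensor parallelism, then fix up per-rank attention head counts.

    `update_head_counts` stands in for the real
    `_update_attention_head_counts_for_tp` helper.
    """
    if tp_mesh.size() > 1:
        # Shard the model's parameters across the TP mesh.
        parallelize_module(model, tp_mesh, tp_plan)

        # Previously missing on the MegatronFSDP path: rescale each attention
        # module's num_attention_heads / num_key_value_heads by the TP degree
        # so GQA grouping and DTensor shapes stay consistent with the shards.
        update_head_counts(model, tp_mesh.size())

    return model
```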
Contributor
Author
/ok to test db1f660
Contributor
Author
/ok to test ce0d6ab
What does this PR do?
What it does: benchmarks Qwen3-Next-80B-A3B-Instruct with TE + DeepEP on 8 GPUs; this currently crashes on the very first forward pass.
Traceback:
TypeError: Qwen3NextGatedDeltaNet.forward() got an unexpected keyword argument 'position_ids'
Root cause: In nemo_automodel/components/models/qwen3_next/model.py:77, the decoder layer passes position_ids to self.linear_attn(). But Qwen3NextGatedDeltaNet.forward() doesn't accept that parameter — gated delta-nets encode position implicitly and have a different signature than standard attention.
So Qwen3NextGatedDeltaNet.forward() accepts a narrower set of arguments than standard attention and, in particular, no position_ids; the decoder layer must not pass that argument to it, as sketched below.
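One plausible shape of the fix in the decoder layer, sketched under assumptions: `self.linear_attn` is taken from the root-cause note above, while `self.self_attn`, `self.is_linear_attention`, and the keyword arguments are hypothetical names for illustration, not the exact code in nemo_automodel/components/models/qwen3_next/model.py:

```python
# Sketch only: illustrates routing position_ids away from the gated delta-net;
# attribute and argument names are assumptions, not the actual model code.
def forward(self, hidden_states, attention_mask=None, position_ids=None, **kwargs):
    if self.is_linear_attention:
        # Qwen3NextGatedDeltaNet encodes position implicitly, so position_ids
        # must not be forwarded to it (doing so raises the TypeError above).
        attn_out = self.linear_attn(hidden_states, attention_mask=attention_mask)
    else:
        # Standard attention layers still take explicit position_ids (e.g. for RoPE).
        attn_out = self.self_attn(
            hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
        )
    return attn_out
```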
Changelog
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information