fix cg acess issue by using dict instead of list to iteratively acces…#3867

Merged
yaox12 merged 2 commits into NVIDIA:dev from ilml:fix-moe-cudagraph-nested-attrs on Mar 18, 2026

Conversation

@ilml
Contributor

@ilml ilml commented Mar 13, 2026

Fix nested CUDA-graph attribute access for Flex MoE token dispatchers.

Summary

  • Use dotted-path helpers for token_dispatcher.cudagraph_attrs, so entries like _comm_manager.token_probs, _comm_manager.token_indices, and _comm_manager.routing_map are handled correctly.
  • Replace flat getattr / setattr usage in MoE CUDA-graph capture and replay paths with shared nested-attribute helpers.
  • Add unit tests covering nested cudagraph attribute reads and writes.

This keeps existing flat attribute handling unchanged while fixing Flex dispatcher backends such as deepep and hybridep.
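The nested-attribute helpers described above can be sketched roughly as follows. This is an illustrative sketch only; the helper names (`get_nested_attr`, `set_nested_attr`) and the small dispatcher classes are hypothetical stand-ins, not the exact code in this PR:

```python
import functools


def get_nested_attr(obj, path):
    # Walk a dotted attribute path, e.g. "_comm_manager.token_probs".
    # A flat path like "probs" degenerates to a plain getattr.
    return functools.reduce(getattr, path.split("."), obj)


def set_nested_attr(obj, path, value):
    # Resolve the parent object of the dotted path, then set the leaf
    # attribute on it. Flat paths (no dot) fall back to setattr(obj, ...).
    parent_path, _, leaf = path.rpartition(".")
    parent = get_nested_attr(obj, parent_path) if parent_path else obj
    setattr(parent, leaf, value)


# Hypothetical stand-ins for a Flex MoE token dispatcher and its comm manager.
class _CommManager:
    def __init__(self):
        self.token_probs = None
        self.token_indices = None


class _Dispatcher:
    def __init__(self):
        self._comm_manager = _CommManager()


dispatcher = _Dispatcher()
set_nested_attr(dispatcher, "_comm_manager.token_probs", [0.5, 0.5])
print(get_nested_attr(dispatcher, "_comm_manager.token_probs"))
```

With flat `getattr`/`setattr`, an entry like `"_comm_manager.token_probs"` in `cudagraph_attrs` would be treated as a single (nonexistent) attribute name; resolving one path segment at a time is what makes the capture and replay paths handle nested entries correctly.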

What does this PR do?

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or tag @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch: the proposed review process for `dev` is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

@ilml ilml requested review from a team as code owners March 13, 2026 21:28
@copy-pr-bot

copy-pr-bot bot commented Mar 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ilml ilml requested a review from jiemingz March 13, 2026 21:29
@jiemingz
Contributor

jiemingz commented Mar 16, 2026

https://github.com/NVIDIA/Megatron-LM/pull/3625/changes

There is an equivalent change on main; why isn't it reflected in this PR?

update: main has the exact code now, except it is missing the test and doesn't refactor the setter and getter into their own functions. When we go to merge dev into main I can take an AI to make sure these get into main as well

@ilml ilml enabled auto-merge March 16, 2026 15:33
@yaox12
Member

yaox12 commented Mar 16, 2026

/ok to test bddcdb3

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 16, 2026
@ilml ilml added this pull request to the merge queue Mar 16, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23161562822

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23165894341

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 16, 2026
@ilml ilml added this pull request to the merge queue Mar 16, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23167652445

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 16, 2026
@ilml ilml added this pull request to the merge queue Mar 16, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23170581013

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 17, 2026
@yaox12 yaox12 enabled auto-merge March 17, 2026 21:16
@yaox12
Member

yaox12 commented Mar 17, 2026

/ok to test 00d2b4b

@yaox12 yaox12 added this pull request to the merge queue Mar 17, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23220473186

Merged via the queue into NVIDIA:dev with commit 74124ba Mar 18, 2026
47 of 48 checks passed