Track Qwen3.5-related issues #2281
Opened by zpqiu on Apr 17, 2026
- MCore Path
  - Add / track CP support: [main] feat(moe): Support packed sequence for gated delta net (GDN) NVIDIA/Megatron-LM#2645
- AutoModel Path
  - Move the FLA dependency from the dev group (`[dependency-groups]`) to optional extras (`[project.optional-dependencies]`) so that NeMo-RL can install it downstream via `pkg[extra]`. If FLA is not installed:
    - No CP support
    - Worse performance
    - build: move flash-linear-attention back to optional-dependencies Automodel#1894
  - Fix the default config path where Torch Adam is used without FP32 master weights, as this can slow down convergence.
    - TE FusedAdam can be used as a workaround.
    - AutoModel should correctly support / apply the FP32 master-weight setting: fix: fp32 master weights for custom MoE models under FSDP2 Automodel#1896
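The packaging change in the FLA bullet can be sketched roughly as below. The extra name (`fla`) and the unpinned package spec are illustrative assumptions, not the actual Automodel `pyproject.toml`:

```toml
# Before (assumed layout): FLA sits in a PEP 735 dependency group,
# which downstream projects cannot request when installing the package.
[dependency-groups]
dev = [
    "flash-linear-attention",
]

# After: expose it as a PEP 621 optional extra, so a downstream project
# such as NeMo-RL can declare a dependency like `automodel[fla]`.
# The extra name "fla" is a hypothetical choice.
[project.optional-dependencies]
fla = [
    "flash-linear-attention",
]
```

Since the extra is optional, code paths that need FLA would still have to guard the import at runtime and fall back to the slower, no-CP path the bullet describes.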