[trainer] feat: Add Nemo-Automodel as alternative training engine#5407

Open
HuiyingLi wants to merge 19 commits into verl-project:main from HuiyingLi:add_automodel_sft_backend

Conversation

@HuiyingLi commented Feb 26, 2026

What does this PR do?

Add NeMo-Automodel as a training engine. The SFT trainer is tested with Qwen2.5-0.5B.

  • The automodel engine matches the FSDP engine exactly for the SFT trainer (TP1/TP2, rmpad=True/False); a parity check in the spirit of these tests is sketched below.
  • use_remove_padding=True matches use_remove_padding=False.
  • EP (expert parallel) support is tested with Kimi Moonlight 16B.

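A parity check in the spirit of the first bullet could look like the following minimal sketch (assert_engines_match is a hypothetical helper, not code from this PR): collect per-step training losses from each engine on identical data and require them to agree within tolerance.

import torch

def assert_engines_match(losses_a, losses_b, atol=1e-5):
    # Per-step SFT losses from two engines (e.g. automodel vs. FSDP)
    # must agree within tolerance for the runs to count as matching.
    assert len(losses_a) == len(losses_b), "engines ran a different number of steps"
    torch.testing.assert_close(
        torch.tensor(losses_a), torch.tensor(losses_b), atol=atol, rtol=0.0
    )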
Relevant PRs:

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Automodel backend on 1 GPU and on 4 GPUs (TP1/TP2), compared against the FSDP backend on 1 GPU, with rmpad true and false.
[loss-curve comparison chart]

Automodel backend fine-tuning Moonlight 16B with EP8 on 8×H100.
[training-curve chart]

Automodel backend fine-tuning Qwen3 30B with EP8 on 8×H100.
[training-curve chart]

Automodel backend fine-tuning Qwen2.5-7B on 4×H100 with FSDP2.
[training-curve chart]

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
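
The PR leaves this section as a template, so the following is only a hypothetical sketch of how engine selection might surface in the trainer config; the key names (engine.name, engine.strategy) and values are illustrative assumptions, not the interface this PR necessarily exposes.

from omegaconf import OmegaConf

# Illustrative config only: key names are assumed, not taken from this PR.
cfg = OmegaConf.create(
    {
        "engine": {
            "name": "automodel",  # select the NeMo-Automodel engine
            "strategy": "fsdp2",  # matches the fsdp2 run reported above
        },
        "model": {"partial_pretrain": "Qwen/Qwen2.5-0.5B"},
    }
)
print(OmegaConf.to_yaml(cfg))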

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces a new automodel SFT backend, which leverages nemo_automodel for distributed training. The changes include adding the engine implementation, configuration files, and test scripts. I've identified a configuration issue in the test script and a maintainability concern in the engine implementation. Overall, this is a significant feature addition.

Comment on lines +549 to +552
if isinstance(output, torch.Tensor):
    from types import SimpleNamespace

    output = SimpleNamespace(logits=output)
Severity: high

The model's output is conditionally wrapped in a SimpleNamespace if it's a raw tensor. This suggests an inconsistent return type from self.module, which can make the code harder to maintain and reason about. It would be more robust to enforce a consistent, structured return type (like CausalLMOutput) from the model to avoid such conditional handling.
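
One way the suggested normalization could look (a minimal sketch, assuming the engine keeps accepting mixed return types; _normalize_output is a hypothetical helper, and CausalLMOutput comes from transformers):

import torch
from transformers.modeling_outputs import CausalLMOutput

def _normalize_output(output) -> CausalLMOutput:
    # Coerce a raw logits tensor into the structured output type so
    # downstream code can rely on `.logits` unconditionally.
    if isinstance(output, torch.Tensor):
        return CausalLMOutput(logits=output)
    return output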

@HuiyingLi changed the title from "Add automodel sft backend" to "[trainer] feat: Add Nemo-Automodel as alternative training engine" on Feb 27, 2026
@HuiyingLi marked this pull request as ready for review February 27, 2026 10:10
@ISEEKYAN (Collaborator)

Hi @HuiyingLi, thanks for your great contribution.
I found that the MFU of automodel is lower than FSDP on the 0.5B model, and the MFU is less than 1% on the 16B MoE model. Is this expected? Could you provide a fair comparison on popular models such as a 7B dense or 30B MoE?

@HuiyingLi (Author)

> Hi @HuiyingLi, thanks for your great contribution. I found that the MFU of automodel is lower than FSDP on the 0.5B model, and the MFU is less than 1% on the 16B MoE model. Is this expected? Could you provide a fair comparison on popular models such as a 7B dense or 30B MoE?

Hi @ISEEKYAN,
Thank you!

  • For the 0.5B model, the FSDP run used a single GPU while automodel used 4 GPUs. I've updated the chart with a single-GPU automodel run for comparison.
  • The low MFU on the 16B model was due to a very small seqlen and batch size. I've updated the chart with a larger seqlen and gbs, and added charts for Qwen 30B MoE and Qwen 7B dense.

@ISEEKYAN commented Mar 2, 2026

Great. Were your experiments on H100? If so, the MFU looks good, but it would be better to have a fair comparison with FSDP or Megatron. This is not a blocker for merging this PR; it would just be a good reference for users adopting AutoModel. It would also help to add a doc showing the comparison, along with an example so users can easily get hands-on.

@ETOgaosion (Collaborator)

@HuiyingLi Thanks for your great contribution! Could you please sign the CLA?


**Requirements**

- Automodel r0.3.0
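
A quick way to verify this requirement locally (a sketch; the distribution name "nemo-automodel" is an assumption):

from importlib.metadata import PackageNotFoundError, version

try:
    # Assumes the package is published under the name "nemo-automodel".
    print("nemo-automodel:", version("nemo-automodel"))
except PackageNotFoundError:
    print("nemo-automodel is not installed")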
Maybe in another PR we should refactor docs/start/install.rst to cover the install methods for all model engines and rollout engines, and present the options more clearly so users can choose between them.

@ETOgaosion (Collaborator)

We should also prepare some CI tests for Nemo-Automodel; a possible starting point is sketched below.
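
A hypothetical shape for such a test (a real CI job would reuse the parity scripts added in this PR; the test name is an assumption):

import pytest

def test_nemo_automodel_importable():
    # Gate the engine-parity tests on the optional dependency being present;
    # importorskip skips the test cleanly when nemo_automodel is missing.
    nemo_automodel = pytest.importorskip("nemo_automodel")
    assert nemo_automodel is not None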
