[trainer] feat: Add Nemo-Automodel as alternative training engine#5407

Open
HuiyingLi wants to merge 19 commits into verl-project:main from HuiyingLi:add_automodel_sft_backend

Conversation

@HuiyingLi commented Feb 26, 2026

What does this PR do?

Add NeMo-Automodel as a training engine. The SFT trainer is tested with Qwen2.5-0.5B.

  • The automodel engine matches the FSDP engine exactly for the SFT trainer (TP1/TP2, rmpad=True/False); a parity check in the spirit of these tests is sketched below.
  • use_remove_padding=True matches use_remove_padding=False.
  • EP (expert parallel) support is tested with Kimi Moonlight 16B.

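A parity check in the spirit of the first bullet could look like the following minimal sketch (assert_engines_match is a hypothetical helper, not code from this PR): collect per-step training losses from each engine on identical data and require them to agree within tolerance.

import torch

def assert_engines_match(losses_a, losses_b, atol=1e-5):
    # Per-step SFT losses from two engines (e.g. automodel vs. FSDP)
    # must agree within tolerance for the runs to count as matching.
    assert len(losses_a) == len(losses_b), "engines ran a different number of steps"
    torch.testing.assert_close(
        torch.tensor(losses_a), torch.tensor(losses_b), atol=atol, rtol=0.0
    )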
Relevant PRs:

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Automodel backend on 1 GPU and on 4 GPUs (TP1/TP2), compared against the FSDP backend on 1 GPU, with rmpad true and false.
[loss-curve comparison chart]

Automodel backend fine-tuning Moonlight 16B with EP8 on 8×H100.
[training-curve chart]

Automodel backend fine-tuning Qwen3 30B with EP8 on 8×H100.
[training-curve chart]

Automodel backend fine-tuning Qwen2.5-7B on 4×H100 with FSDP2.
[training-curve chart]

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
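
The PR leaves this section as a template, so the following is only a hypothetical sketch of how engine selection might surface in the trainer config; the key names (engine.name, engine.strategy) and values are illustrative assumptions, not the interface this PR necessarily exposes.

from omegaconf import OmegaConf

# Illustrative config only: key names are assumed, not taken from this PR.
cfg = OmegaConf.create(
    {
        "engine": {
            "name": "automodel",  # select the NeMo-Automodel engine
            "strategy": "fsdp2",  # matches the fsdp2 run reported above
        },
        "model": {"partial_pretrain": "Qwen/Qwen2.5-0.5B"},
    }
)
print(OmegaConf.to_yaml(cfg))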

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces a new automodel SFT backend, which leverages nemo_automodel for distributed training. The changes include adding the engine implementation, configuration files, and test scripts. I've identified a configuration issue in the test script and a maintainability concern in the engine implementation. Overall, this is a significant feature addition.

Comment on lines +549 to +552
if isinstance(output, torch.Tensor):
    from types import SimpleNamespace

    output = SimpleNamespace(logits=output)
Severity: high

The model's output is conditionally wrapped in a SimpleNamespace if it's a raw tensor. This suggests an inconsistent return type from self.module, which can make the code harder to maintain and reason about. It would be more robust to enforce a consistent, structured return type (like CausalLMOutput) from the model to avoid such conditional handling.
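
One way the suggested normalization could look (a minimal sketch, assuming the engine keeps accepting mixed return types; _normalize_output is a hypothetical helper, and CausalLMOutput comes from transformers):

import torch
from transformers.modeling_outputs import CausalLMOutput

def _normalize_output(output) -> CausalLMOutput:
    # Coerce a raw logits tensor into the structured output type so
    # downstream code can rely on `.logits` unconditionally.
    if isinstance(output, torch.Tensor):
        return CausalLMOutput(logits=output)
    return output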

@HuiyingLi changed the title from "Add automodel sft backend" to "[trainer] feat: Add Nemo-Automodel as alternative training engine" on Feb 27, 2026
@HuiyingLi marked this pull request as ready for review February 27, 2026 10:10
@ISEEKYAN (Collaborator)

Hi @HuiyingLi, thanks for your great contribution.
I found that the MFU of automodel is lower than FSDP on the 0.5B model, and the MFU is less than 1% on the 16B MoE model. Is this expected? Could you provide a fair comparison on popular models such as a 7B dense or 30B MoE?

@HuiyingLi (Author)

> Hi @HuiyingLi, thanks for your great contribution. I found that the MFU of automodel is lower than FSDP on the 0.5B model, and the MFU is less than 1% on the 16B MoE model. Is this expected? Could you provide a fair comparison on popular models such as a 7B dense or 30B MoE?

Hi @ISEEKYAN,
Thank you!

  • For the 0.5B model, the FSDP run used a single GPU while automodel used 4 GPUs. I've updated the chart with a single-GPU automodel run for comparison.
  • The low MFU on the 16B model was due to a very small seqlen and batch size. I've updated the chart with a larger seqlen and gbs, and added charts for Qwen 30B MoE and Qwen 7B dense.

@ISEEKYAN commented Mar 2, 2026

Great. Were your experiments on H100? If so, the MFU looks good, but it would be better to have a fair comparison with FSDP or Megatron. This is not a blocker for merging this PR; it would just be a good reference for users adopting AutoModel. It would also help to add a doc showing the comparison, along with an example so users can easily get hands-on.

@ETOgaosion (Collaborator)

@HuiyingLi Thanks for your great contribution! Could you please sign the CLA?


**Requirements**

- Automodel r0.3.0
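
A quick way to verify this requirement locally (a sketch; the distribution name "nemo-automodel" is an assumption):

from importlib.metadata import PackageNotFoundError, version

try:
    # Assumes the package is published under the name "nemo-automodel".
    print("nemo-automodel:", version("nemo-automodel"))
except PackageNotFoundError:
    print("nemo-automodel is not installed")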
Maybe in another PR we should refactor docs/start/install.rst to cover the install methods for all model engines and rollout engines, and present the options more clearly so users can choose between them.

@ETOgaosion (Collaborator)

We should also prepare some CI tests for Nemo-Automodel; a possible starting point is sketched below.
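
A hypothetical shape for such a test (a real CI job would reuse the parity scripts added in this PR; the test name is an assumption):

import pytest

def test_nemo_automodel_importable():
    # Gate the engine-parity tests on the optional dependency being present;
    # importorskip skips the test cleanly when nemo_automodel is missing.
    nemo_automodel = pytest.importorskip("nemo_automodel")
    assert nemo_automodel is not None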
