
Conversation

h-guo18 (Contributor) commented Oct 25, 2025

What does this PR do?

Type of change: New feature

Overview:

  • Support nano and nano-VL in Eagle3 online mode:
    • Added submodule path detection for the base model, lm_head, and embeddings to adapt to different base-model naming structures (see the sketch after this list);
    • Refactored data loading/preprocessing to support VLMs.
  • Attention backend improvements:
    • Added an SDPA option as a fallback for cases where flex_attn does not work.
    • Added a unified TTT mask function that produces either a BlockMask for flex_attn or tensor masks for regular attention.
  • Logging improvements:
    • Added estimated AR validation during training, available for both online and offline modes.
    • Plot estimated AR and training accuracy to wandb for better training visualization.
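
A minimal Python sketch of what the submodule path detection mentioned above could look like. The helper name _find_base_model_parts appears in the diff below, but the candidate attribute paths probed here are illustrative assumptions, not the PR's exact implementation.

def find_base_model_parts(model):
    """Locate the decoder stack, lm_head, and input embeddings across
    plain-LLM ("model.layers") and VLM ("language_model.model.layers") layouts."""

    def resolve(root, dotted):
        obj = root
        for attr in dotted.split("."):
            obj = getattr(obj, attr, None)
            if obj is None:
                return None
        return obj

    base = None
    for path in ("model", "language_model.model", "language_model"):  # assumed candidates
        candidate = resolve(model, path)
        if candidate is not None and hasattr(candidate, "layers"):
            base = candidate
            break
    lm_head = resolve(model, "lm_head") or resolve(model, "language_model.lm_head")
    embeddings = (
        model.get_input_embeddings()
        if hasattr(model, "get_input_embeddings")
        else getattr(base, "embed_tokens", None)
    )
    return base, lm_head, embeddings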

Usage

When using a VLM as the base model, pass the extra arguments --vlm_processor <hf_model_path> --vlm_img_dir <path to images> on top of the original launch command; other usage is unchanged.
E.g.

./launch_train.sh --model $MODEL \
            --output_dir $OUTPUT_DIR \
            --data $DATA \
            --num_gpu 1 \
            --num_epochs 2 \
            --train_bs 2 \
            --lr 3e-5 \
            --eagle_config eagle_config.json \
            --training_seq_len 4096 \
            --vlm_processor $MODEL \
            --vlm_img_dir  <path to images>

Testing

Tested short training runs with HF Online training on the following models:

  • llama-3.2-1b - data: daring-anteater
  • The new nano (Hybrid LLM) - data: daring-anteater
  • The nano-VL - data: Llama-Nemotron-VLM-Dataset-v1/ocr_1

Observed decreasing loss and AR > 1.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@h-guo18 h-guo18 self-assigned this Oct 25, 2025
copy-pr-bot bot commented Oct 25, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


# Note: the leading "if" branch is implied by the "elif" in this diff excerpt.
if hasattr(self.model.layers[-1].self_attn, "q_proj"):
    device = self.model.layers[-1].self_attn.q_proj.weight.device
elif hasattr(self.model.layers[-1].self_attn, "qkv_proj"):
    device = self.model.layers[-1].self_attn.qkv_proj.weight.device
self.eagle_module.to(self.dtype).to(device)
h-guo18 (Contributor, Author):

TODO: confirm this device detection with @yeyu-nvidia

@h-guo18 h-guo18 force-pushed the haoguo/support-nano branch from 8eb6abf to a85d473 on October 27, 2025 23:40
@h-guo18 h-guo18 changed the title from "Feat: eagle3 support for nanov3" to "Feat: eagle3 support for nano2-vlm and nano3" Oct 27, 2025
codecov bot commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.38%. Comparing base (41de55f) to head (9c791d9).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #463   +/-   ##
=======================================
  Coverage   73.38%   73.38%           
=======================================
  Files         180      180           
  Lines       18110    18110           
=======================================
  Hits        13290    13290           
  Misses       4820     4820           

☔ View full report in Codecov by Sentry.

@h-guo18 h-guo18 changed the title from "Feat: eagle3 support for nano2-vlm and nano3" to "Feat: Eagle3 HF Online - support nano2-vlm and nano3" Oct 27, 2025
@h-guo18 h-guo18 marked this pull request as ready for review October 27, 2025 23:56
@h-guo18 h-guo18 requested a review from a team as a code owner October 27, 2025 23:56
@h-guo18 h-guo18 requested a review from yeyu-nvidia October 27, 2025 23:56
@h-guo18 h-guo18 changed the title from "Feat: Eagle3 HF Online - support nano2-vlm and nano3" to "Feat: Eagle3 HF Online - support nemotron models" Oct 28, 2025
input_ids = output.input_ids[0]
attention_mask = output.attention_mask[0]
loss_mask = torch.ones_like(input_ids)
labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
Contributor:

So all labels are IGNORE_TOKEN_ID?
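
For context, a common SFT preprocessing pattern (an assumption about intent, not necessarily what this code does downstream) is to initialize labels to IGNORE_TOKEN_ID and later unmask only the assistant tokens via the loss mask:

import torch

IGNORE_TOKEN_ID = -100  # assumption: the standard HF ignore index

input_ids = torch.tensor([1, 2, 3, 4, 5])
loss_mask = torch.tensor([0, 0, 1, 1, 1])  # 1 = tokens that should contribute to the loss

labels = torch.full_like(input_ids, IGNORE_TOKEN_ID)
labels[loss_mask.bool()] = input_ids[loss_mask.bool()]
# labels -> tensor([-100, -100, 3, 4, 5])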

return ret


class OfflineSupervisedDataset(Dataset):
Contributor:

Does this support VLM data?

if wandb and is_master():
    wandb.init()

def on_log(self, args, state, control, **kwargs):
Contributor:

Can you explain how you estimate AR? I'm not sure it's a good idea to expose "estimated AR" as it may mislead users.
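
For reference, one common way to turn per-step draft accuracies into an estimated acceptance rate, assuming acceptances at successive speculation steps are independent; this is an illustrative sketch, not necessarily the formula this PR uses:

def estimate_ar(step_accs):
    """Expected tokens emitted per base-model forward: 1 (the verified token)
    plus the probability of accepting 1, 2, ... draft tokens in a row."""
    ar, chain = 1.0, 1.0
    for acc in step_accs:
        chain *= acc
        ar += chain
    return ar

print(estimate_ar([0.8, 0.7, 0.6]))  # 1 + 0.8 + 0.56 + 0.336 = 2.696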

metadata={"help": "Path to the d2t cache directory."},
)
vlm_img_dir: str = field(default=None, metadata={"help": "Path to the VLM image directory."})
vlm_processor: str = field(default=None, metadata={"help": "Path to the VLM processor."})
Contributor:

what is VLM processor?
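
For context: in Hugging Face terms, a processor bundles the tokenizer with the image preprocessor for multimodal models. Presumably --vlm_processor is the model path handed to AutoProcessor.from_pretrained; the snippet below is an assumed usage based on the flag name, and the model/image paths are placeholders:

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("<hf_vlm_model_path>")   # placeholder path
image = Image.open("<path to images>/example.jpg")                 # placeholder path
inputs = processor(text="Describe the image.", images=image, return_tensors="pt")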

for param in self.model.embed_tokens.parameters():
    ...  # (loop body elided in this diff excerpt)

# find base model, lm head, and embeddings paths
self._find_base_model_parts()
self.eagle_module.to(self._base_model.dtype).to(self._base_model_lm_head.weight.device)
Contributor:

Need to check whether PTQ/inference fails. We want to make sure eagle_module.device matches the device of the last base-model decoder layer, but that is not necessarily the same as lm_head.device.
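
A minimal sketch of the placement this comment suggests, assuming the base decoder stack exposes .layers; illustrative only, not the PR's code:

# Pin the eagle module to the device of the last base decoder layer, which
# may differ from lm_head's device under device_map / pipeline sharding.
last_layer = self._base_model.layers[-1]
last_layer_device = next(last_layer.parameters()).device
self.eagle_module.to(self._base_model.dtype).to(last_layer_device)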


dtypemin = torch.finfo(self._base_llm_config.dtype).min
q_len = seq_length
kv_len = seq_length * (2 + ttt_step)
Contributor:

Why 2 + ttt_step?
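
One plausible reading of the factor, stated here as an assumption rather than a confirmed explanation: the draft queries attend to the base model's KV (1 x seq_length), the eagle module's own prefill pass (1 x seq_length), and one additional seq_length-wide segment per completed TTT step, which gives kv_len = seq_length * (2 + ttt_step). For example:

seq_length = 4096
for ttt_step in range(3):
    kv_len = seq_length * (2 + ttt_step)
    print(ttt_step, kv_len)  # 0 -> 8192, 1 -> 12288, 2 -> 16384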
