Low Precision Recipes for Llama3-8B #1178
Conversation
/ok to test 0c67d7b
Signed-off-by: Aditya Vavre <[email protected]>
```python
def llama3_8b_low_precision_pretrain_config(
    mixed_precision_recipe: str, **user_kwargs: Unpack[Llama3CommonKwargs]
) -> ConfigContainer:
```
n00b question: what is the advantage of having a dedicated function for this vs. users passing in or overriding the default precision setting to use the ones listed below?
I wanted to make it clear to users that these recipes are well tested for convergence, and to specify the hyperparams used in long convergence testing. The default params specified for bf16 precision don't seem to work well with low precision; for example, we found that AdamW epsilon has a significant effect on convergence.
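A minimal sketch of why a dedicated function can help here, assuming a dataclass-style optimizer config; every name and value below is illustrative (the PR does not state which epsilon value it uses), not the PR's actual code:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OptimizerConfig:
    # Stand-in for the repo's optimizer config; fields/defaults are assumptions.
    lr: float = 3e-4
    adam_eps: float = 1e-8  # typical bf16 default

def with_low_precision_overrides(cfg: OptimizerConfig) -> OptimizerConfig:
    # Placeholder value: the PR only states that epsilon matters for
    # low-precision convergence, not which value was used.
    return replace(cfg, adam_eps=1e-16)

print(with_low_precision_overrides(OptimizerConfig()))
```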
What do you think about exposing these as individual recipe configs? e.g. llama3_8b_bf16_mxfp8_mixed_pretrain_config, llama3_8b_bf16_fp8_cs_mixed_pretrain_config, llama3_8b_bf16_nvfp4_mixed_pretrain_config.
This might make it clearer which low-precision recipes have been tested for long convergence, and it would be easier to follow when hyperparams vary across recipes.
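A sketch of this alternative: thin per-recipe wrappers delegating to one shared implementation. The recipe-string values are assumptions, and the shared function (the one from the diff above) is stubbed here for illustration:

```python
def llama3_8b_low_precision_pretrain_config(mixed_precision_recipe: str, **user_kwargs):
    ...  # shared implementation from this PR

def llama3_8b_bf16_mxfp8_mixed_pretrain_config(**user_kwargs):
    return llama3_8b_low_precision_pretrain_config("mxfp8", **user_kwargs)

def llama3_8b_bf16_fp8_cs_mixed_pretrain_config(**user_kwargs):
    return llama3_8b_low_precision_pretrain_config("fp8_cs", **user_kwargs)

def llama3_8b_bf16_nvfp4_mixed_pretrain_config(**user_kwargs):
    return llama3_8b_low_precision_pretrain_config("nvfp4", **user_kwargs)
```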
Apart from the precision configs, there are no other differences in the recipes, hence I decided to define one function for all. To be very clear about which recipes have been tested, I have an assert statement inside the function which checks whether the recipe is one of FP8CS, MXFP8, and NVFP4. @cuichenx / @yaoyu-33 any thoughts?
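A sketch of the guard described above; the recipe identifiers and message text are assumptions:

```python
# Recipes verified for long convergence (identifiers are assumptions).
SUPPORTED_RECIPES = {"fp8_cs", "mxfp8", "nvfp4"}

def llama3_8b_low_precision_pretrain_config(mixed_precision_recipe: str, **user_kwargs):
    assert mixed_precision_recipe in SUPPORTED_RECIPES, (
        f"Invalid recipe: {mixed_precision_recipe!r}. "
        f"Supported recipes: {sorted(SUPPORTED_RECIPES)}"
    )
    ...  # build and return the ConfigContainer
```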
Maybe in the assert statement, instead of saying "Invalid Recipe", I can just say the recipe has not been verified for long convergence?
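Continuing the sketch above, the reworded guard might read as follows (message text is an assumption):

```python
assert mixed_precision_recipe in SUPPORTED_RECIPES, (
    f"Recipe {mixed_precision_recipe!r} has not been verified for long "
    f"convergence; verified recipes: {sorted(SUPPORTED_RECIPES)}"
)
```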
Signed-off-by: Aditya Vavre <[email protected]>
/ok to test 8ab27c8
Signed-off-by: Aditya Vavre <[email protected]>
/ok to test 402563b
This PR adds recommended args for Llama3-8B low-precision recipes. The following recipes are verified (long convergence, 1T tokens): FP8CS, MXFP8, and NVFP4.
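Hypothetical usage of the new entry point; the recipe string and any kwargs below are assumptions:

```python
cfg = llama3_8b_low_precision_pretrain_config(
    mixed_precision_recipe="mxfp8",
    # plus any Llama3CommonKwargs overrides, e.g. paths or batch sizes
)
```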
Attaching a short convergence loss curve for reference:
