Skip to content

Update omr receipe#973

Merged
Kipok merged 9 commits into
mainfrom
wedu/omr-receipe
Oct 25, 2025
Merged

Update omr receipe#973
Kipok merged 9 commits into
mainfrom
wedu/omr-receipe

Conversation

@wedu-nvidia

@wedu-nvidia wedu-nvidia commented Oct 18, 2025

Copy link
Copy Markdown
Collaborator

#963

Summary by CodeRabbit

  • New Features

    • Added a --backend CLI option (default: megatron) to run training and evaluation on one or multiple backends.
    • Pipeline now runs per-backend training and per-backend initial/final evaluations, producing backend-specific models and result outputs.
  • Tests

    • Test and verification flows updated to iterate backends, pass the backend option through, and validate backend-specific outputs.

Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
@coderabbitai

coderabbitai Bot commented Oct 18, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

A new multi-backend parameter was added and propagated through the OpenMath reasoning recipe: CLI --backend (default ["megatron"]) is threaded into prepare, initial_eval, run_sdg, run_training, and final_eval; training and evaluation now iterate per-backend. Test runner and result checker were updated to accept and use backend-specific paths.

Changes

Cohort / File(s) Summary
Recipe script: backend plumbing & per-backend execution
recipes/openmathreasoning/scripts/simplified_recipe.py
Added --backend CLI (default ["megatron"]); updated signatures for prepare, initial_eval, run_sdg, run_training, and final_eval to accept backend; threaded backend through the main flow; changed run_training, initial_eval, and final_eval to iterate per-backend and construct backend-specific model/output paths (e.g., training-{backend}, evals/after-training-{backend}); backend-aware wandb handling and conditional tweaks (e.g., sequence packing adjustments for fsdp) applied where relevant.
Test runner: forward backend to recipe
tests/slurm-tests/omr_simple_recipe/run_test.py
Added --backend CLI arg (default ["megatron"]); propagate args.backend into the command that invokes the simplified_recipe script and into the check_results invocation; added reuse_code=False to run_cmd call.
Result checker: backend-aware evaluation
tests/slurm-tests/omr_simple_recipe/check_results.py
Updated check_results signature to accept backend: str; CLI now accepts multi --backend and iterates each backend and benchmark, loading backend-specific after-training result paths (after-training-<backend>/eval-results/...) and calling check_results with the backend identifier; assertion messages and path construction updated to include backend.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI
  participant Recipe as simplified_recipe
  participant Prepare
  participant InitialEval
  participant SDG as run_sdg
  participant Train as run_training
  participant FinalEval

  CLI->>Recipe: invoke with --backend [b1,b2,...]
  loop for each backend b
    Recipe->>Prepare: prepare(..., backend=b)
    Recipe->>InitialEval: initial_eval(..., backend=b)
    Recipe->>SDG: run_sdg(..., backend=b)
    Recipe->>Train: run_training(..., backend=b)
    Train-->>Recipe: model_path(b)
    Recipe->>FinalEval: final_eval(..., backend=b, model=model_path(b))
  end
  Note right of Recipe #DDDDFF: per-backend artifacts and outputs created
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through code with a twitch of my nose,

Backends in a basket, in neat little rows,
Megatron or fsdp, each gets its own run,
Prepared, trained, checked — one by one, then we're done! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title "Update omr receipe" uses a non-descriptive term ("Update") that lacks specificity about the changeset. While the title does refer to the omr recipe component being modified, it fails to convey the main purpose of the changes, which center on adding multi-backend support and backend parameter threading throughout the codebase. A teammate scanning the history would not understand that this PR introduces backend-specific training iterations, evaluation flows, and CLI arguments. Additionally, the title contains a typo ("receipe" instead of "recipe"). Consider revising the title to be more descriptive and specific, such as "Add multi-backend support to omr recipe" or "Introduce backend parameter and per-backend training iterations". This would clearly communicate the main change to reviewers. Also correct the typo to "recipe".
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch wedu/omr-receipe

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@wedu-nvidia

Copy link
Copy Markdown
Collaborator Author

@Kipok I tested the full loop and the results are still in the range, so no need to update the range.

@wedu-nvidia wedu-nvidia requested a review from Kipok October 18, 2025 03:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
recipes/openmathreasoning/scripts/simplified_recipe.py (7)

27-47: Silence ARG001 by prefixing unused params in prepare.

num_gpus, backend, wandb_params aren’t used here. Keep the uniform signature but underscore them.

-def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def prepare(workspace, cluster, _num_gpus, expname_prefix, _backend, _wandb_params):

32-40: Pin or parameterize download ref to avoid HEAD drift.

Fetching from refs/heads/main is brittle in CI. Allow overriding ref or pin to a commit/tag.

-        f"export DOWNLOAD_PREFIX=https://raw.githubusercontent.com/NVIDIA-NeMo/Skills/refs/heads/main/recipes/openmathreasoning && "
+        f"export DOWNLOAD_REF=${{DOWNLOAD_REF:-refs/heads/main}} && "
+        f"export DOWNLOAD_PREFIX=https://raw.githubusercontent.com/NVIDIA-NeMo/Skills/${{DOWNLOAD_REF}}/recipes/openmathreasoning && "

49-90: Underscore unused backend in run_sdg to appease Ruff.

-def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def run_sdg(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):

143-159: Underscore unused backend in final_eval.

-def final_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def final_eval(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):

161-176: Underscore unused backend in initial_eval.

-def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def initial_eval(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):

211-216: Validate backend via argparse choices.

-    parser.add_argument(
-        "--backend",
-        type=str,
-        default="megatron",
-        help="Can either be megatron or fsdp",
-    )
+    parser.add_argument(
+        "--backend",
+        type=str,
+        choices=("megatron", "fsdp"),
+        default="megatron",
+        help="Training backend",
+    )

63-63: Wrap run_after values in lists to match type hints.

The run_after parameter in generate() and sft_nemo_rl() is typed as List[str], but the current code passes bare strings. While the runtime includes defensive isinstance checks to convert strings to lists, following the type hints improves consistency and type-checking compatibility.

-        run_after=f"{expname_prefix}-download-assets",
+        run_after=[f"{expname_prefix}-download-assets"],
@@
-        run_after=f"{expname_prefix}-problem-extraction",
+        run_after=[f"{expname_prefix}-problem-extraction"],
@@
-        run_after=f"{expname_prefix}-solution-generation",
+        run_after=[f"{expname_prefix}-solution-generation"],
@@
-        run_after=f"{expname_prefix}-prepare-training-data",
+        run_after=[f"{expname_prefix}-prepare-training-data"],
@@
-        run_after=f"{expname_prefix}-training",
+        run_after=[f"{expname_prefix}-training"],
@@
-        run_after=f"{expname_prefix}-download-assets",
+        run_after=[f"{expname_prefix}-download-assets"],

Applies to lines: 63, 81, 108, 138, 155, 173

tests/slurm-tests/omr_simple_recipe/run_test.py (2)

27-27: Validate backend choices in test CLI for early failure.

-    ap.add_argument("--backend", type=str, default="megatron", help="Can either be megatron or fsdp")
+    ap.add_argument("--backend", type=str, choices=("megatron", "fsdp"), default="megatron", help="Training backend")

30-44: Avoid shell=True; pass argv list to subprocess.run.

Safer and avoids quoting bugs.

-    cmd = (
-        f"python -m recipes.openmathreasoning.scripts.simplified_recipe "
-        f" --cluster {args.cluster} "
-        f" --workspace {args.workspace} "
-        f" --expname_prefix {args.expname_prefix} "
-        f" --backend {args.backend} "
-    )
-
-    if args.disable_wandb:
-        cmd += " --disable_wandb "
-    elif args.wandb_project:
-        cmd += f" --wandb_project {args.wandb_project} "
-
-    subprocess.run(cmd, shell=True, check=True)
+    cmd = [
+        "python", "-m", "recipes.openmathreasoning.scripts.simplified_recipe",
+        "--cluster", args.cluster,
+        "--workspace", args.workspace,
+        "--expname_prefix", args.expname_prefix,
+        "--backend", args.backend,
+    ]
+    if args.disable_wandb:
+        cmd.append("--disable_wandb")
+    elif args.wandb_project:
+        cmd += ["--wandb_project", args.wandb_project]
+
+    subprocess.run(cmd, check=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b18f39 and d694d41.

📒 Files selected for processing (2)
  • recipes/openmathreasoning/scripts/simplified_recipe.py (8 hunks)
  • tests/slurm-tests/omr_simple_recipe/run_test.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)
nemo_skills/pipeline/nemo_rl/sft.py (1)
  • sft_nemo_rl (214-498)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (42-51)
🪛 Ruff (0.14.0)
recipes/openmathreasoning/scripts/simplified_recipe.py

27-27: Unused function argument: num_gpus

(ARG001)


27-27: Unused function argument: backend

(ARG001)


27-27: Unused function argument: wandb_params

(ARG001)


49-49: Unused function argument: backend

(ARG001)


143-143: Unused function argument: backend

(ARG001)


161-161: Unused function argument: backend

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (3)
recipes/openmathreasoning/scripts/simplified_recipe.py (3)

126-140: Good: backend plumbed into sft_nemo_rl and ctx assembled cleanly.


114-125: This review comment is based on a misunderstanding and should be disregarded.

NeMo RL explicitly supports "PyTorch FSDP2, TP, CP, and SP for efficient training", and supports "performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP)". The FSDP2Manager flattens "data and context dimensions if context parallelism is enabled", confirming that FSDP and CP > 1 are compatible.

The actual incompatibility is between context parallelism and sequence packing (due to conflicting attention implementations), not between context parallelism and FSDP. Since the code already disables sequence_packing for FSDP (line 124), keeping context_parallel_size=2 for all backends is correct and does not cause issues.

No changes are needed.


126-140: Verify that cluster config includes HF_HOME in env_vars before running this recipe.

The review comment is valid. The sft_nemo_rl function validates HF_HOME when backend is "megatron", and this recipe defaults to the megatron backend (line 213: default="megatron"). If HF_HOME is missing from env_variables, sft_nemo_rl raises a RuntimeError directing users to add HF_HOME=/your/path/to/hf_home to the cluster config's env_vars. The validation also checks that the HF_HOME path is mounted.

Developers must ensure their cluster config (passed to run_training() via the cluster parameter) includes HF_HOME properly configured to avoid runtime failure when sft_nemo_rl() is called. This is a configuration requirement, not a code defect.

@Kipok Kipok left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the current version repeats the whole script for different backend. This is how it was before, but it's inefficient because we don't need to repeat baseline evals / generation two times as they are identical. Could you please change it such that instead it uses backend as a list, so by default it's running only megatron, but we can configure it to run both megatron and fsdp inside the same job in parallel and then do two final evals for both of them. But the initial steps will be shared

@wedu-nvidia

Copy link
Copy Markdown
Collaborator Author

OK, that's a better idea and I will make changes on this.

Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/slurm-tests/omr_simple_recipe/check_results.py (1)

56-75: Avoid reloading baseline per-backend; hoist and use f-strings.

Load baseline_results once per benchmark and switch to clearer f-strings.

-    for training_backend in args.backend:
-        for benchmark in ("aime24", "aime25"):
-            common_path = Path(args.workspace) / "evals"
-            baseline_results = load_json(common_path / "baseline" / "eval-results" / benchmark / "metrics.json")
-            after_training_results = load_json(
-                common_path
-                / "after-training-{}".format(training_backend)
-                / "eval-results"
-                / benchmark
-                / "metrics.json"
-            )
-            check_results(benchmark, baseline_results, after_training_results, training_backend)
+    common_path = Path(args.workspace) / "evals"
+    for benchmark in ("aime24", "aime25"):
+        baseline_results = load_json(common_path / "baseline" / "eval-results" / benchmark / "metrics.json")
+        for training_backend in args.backend:
+            after_training_results = load_json(
+                common_path
+                / f"after-training-{training_backend}"
+                / "eval-results"
+                / benchmark
+                / "metrics.json"
+            )
+            check_results(benchmark, baseline_results, after_training_results, training_backend)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)

27-31: Silence Ruff ARG001 for intentionally unused params.

These functions accept backend/num_gpus/wandb_params to keep a uniform call signature but don’t use them. Add # noqa: ARG001 on the def lines to keep the API without linter noise.

-def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):  # noqa: ARG001
@@
-def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):  # noqa: ARG001
@@
-def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):
+def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):  # noqa: ARG001

Also applies to: 49-55, 162-177


27-41: Stale comment: this step is wrapped in run_cmd.

Line 28 says “not wrapping with run_cmd”, but below you do call run_cmd. Update the comment to avoid confusion.

-    # data preparation needs to run locally without container, so not wrapping with run_cmd
+    # Data preparation assets are fetched via run_cmd (non-containerized script execution).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d694d41 and c540577.

📒 Files selected for processing (3)
  • recipes/openmathreasoning/scripts/simplified_recipe.py (6 hunks)
  • tests/slurm-tests/omr_simple_recipe/check_results.py (2 hunks)
  • tests/slurm-tests/omr_simple_recipe/run_test.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/slurm-tests/omr_simple_recipe/run_test.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/slurm-tests/omr_simple_recipe/check_results.py (2)
tests/slurm-tests/omr_simple_recipe/run_test.py (1)
  • main (20-63)
tests/slurm-tests/utils.py (1)
  • load_json (18-21)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)
nemo_skills/pipeline/nemo_rl/sft.py (1)
  • sft_nemo_rl (214-498)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (42-51)
🪛 Ruff (0.14.1)
recipes/openmathreasoning/scripts/simplified_recipe.py

27-27: Unused function argument: num_gpus

(ARG001)


27-27: Unused function argument: backend

(ARG001)


27-27: Unused function argument: wandb_params

(ARG001)


49-49: Unused function argument: backend

(ARG001)


162-162: Unused function argument: backend

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (1)
tests/slurm-tests/omr_simple_recipe/check_results.py (1)

35-51: Backend-aware assertion context looks good.

Including the backend in the after_training assertion message improves debuggability. No functional concerns.

Comment thread recipes/openmathreasoning/scripts/simplified_recipe.py Outdated
Comment on lines +146 to +159
eval(
ctx=wrap_arguments("++inference.tokens_to_generate=16384 "),
cluster=cluster,
model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}",
server_type="vllm",
server_gpus=num_gpus,
benchmarks="aime24:8,aime25:8",
output_dir=f"{workspace}/evals/after-training-{training_backend}",
num_jobs=1,
expname=f"{expname_prefix}-final-eval",
run_after=f"{expname_prefix}-training-{training_backend}",
wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None,
wandb_project=wandb_params["wandb_project"],
)

@coderabbitai coderabbitai Bot Oct 20, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential duplicate expname across backends in final_eval. Please verify.

Multiple eval(...) calls use the same expname=f"{expname_prefix}-final-eval". If the scheduler requires unique experiment names, this can collide or break run_after dependencies.

Option if uniqueness is required:

-            expname=f"{expname_prefix}-final-eval",
+            expname=f"{expname_prefix}-final-eval-{training_backend}",

Run this check to see how eval constructs task names and whether expname must be unique:


🏁 Script executed:

#!/bin/bash
# Inspect eval entrypoint/task naming to assess expname uniqueness requirements
rg -nP -C3 '\bdef\s+eval\b|\badd_task\(|task_name=|expname=' nemo_skills/pipeline -g '!**/tests/**'

Length of output: 22975



🏁 Script executed:

#!/bin/bash
# Inspect get_exp function to understand expname uniqueness requirements
rg -nP -A20 'def get_exp' nemo_skills/pipeline/utils/exp.py

Length of output: 1989


🏁 Script executed:

#!/bin/bash
# Check how _reuse_exp parameter is used in the pipeline
rg -nP '_reuse_exp|reuse_code_exp' nemo_skills/pipeline -g '!**/tests/**' | head -50

Length of output: 5232


🏁 Script executed:

#!/bin/bash
# Search for documentation or comments about experiment naming uniqueness
rg -nP -B2 -A5 'experiment.*name|expname.*unique' nemo_skills/pipeline -g '!**/tests/**'

Length of output: 14431



🏁 Script executed:

#!/bin/bash
# Inspect the simplified_recipe.py context to see how eval() is called in a loop and what variables control it
cat recipes/openmathreasoning/scripts/simplified_recipe.py | sed -n '100,180p'

Length of output: 3228


Include {training_backend} suffix in final_eval expname to avoid collision across backend iterations.

The final_eval() function loops over each training_backend but uses the same expname=f"{expname_prefix}-final-eval" for all backends. Since run_after correctly targets different training experiments per backend (...training-{training_backend}), each eval invocation should have a distinct experiment name:

-            expname=f"{expname_prefix}-final-eval",
+            expname=f"{expname_prefix}-final-eval-{training_backend}",

This matches the pattern used in sft_nemo_rl() where each training backend gets its own experiment name. Without this, multiple eval() calls with identical names will conflict within the scheduler.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
eval(
ctx=wrap_arguments("++inference.tokens_to_generate=16384 "),
cluster=cluster,
model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}",
server_type="vllm",
server_gpus=num_gpus,
benchmarks="aime24:8,aime25:8",
output_dir=f"{workspace}/evals/after-training-{training_backend}",
num_jobs=1,
expname=f"{expname_prefix}-final-eval",
run_after=f"{expname_prefix}-training-{training_backend}",
wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None,
wandb_project=wandb_params["wandb_project"],
)
eval(
ctx=wrap_arguments("++inference.tokens_to_generate=16384 "),
cluster=cluster,
model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}",
server_type="vllm",
server_gpus=num_gpus,
benchmarks="aime24:8,aime25:8",
output_dir=f"{workspace}/evals/after-training-{training_backend}",
num_jobs=1,
expname=f"{expname_prefix}-final-eval-{training_backend}",
run_after=f"{expname_prefix}-training-{training_backend}",
wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None,
wandb_project=wandb_params["wandb_project"],
)
🤖 Prompt for AI Agents
In recipes/openmathreasoning/scripts/simplified_recipe.py around lines 146 to
159, the eval() calls use a fixed expname f"{expname_prefix}-final-eval" (and
the same wandb_name) for every training_backend, causing name collisions across
backend iterations; modify the expname to include the training_backend suffix
(e.g., f"{expname_prefix}-final-eval-{training_backend}") and do the same for
wandb_name when wandb is enabled so each backend’s final eval has a unique
experiment name.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wedu-nvidia let's do this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Signed-off-by: Wei Du <wedu@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (1)

146-160: CRITICAL: Duplicate expname collision across backends remains unresolved.

Lines 156 and 158 still use the same expname and wandb_name for all backend iterations, causing scheduler name collisions. This issue was previously flagged but not addressed.

Apply this diff to fix the collision:

         eval(
             ctx=wrap_arguments("++inference.tokens_to_generate=16384 "),
             cluster=cluster,
             model=f"{workspace}/training-{training_backend}/qwen2.5-14b-improved-hf",
             server_type="vllm",
             server_gpus=num_gpus,
             benchmarks="aime24:8,aime25:8",
             output_dir=f"{workspace}/evals/after-training-{training_backend}",
             num_jobs=1,
-            expname=f"{expname_prefix}-final-eval",
+            expname=f"{expname_prefix}-final-eval-{training_backend}",
             run_after=f"{expname_prefix}-training-{training_backend}",
-            wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None,
+            wandb_name=f"{expname_prefix}-final-eval-{training_backend}" if not wandb_params["disable_wandb"] else None,
             wandb_project=wandb_params["wandb_project"],
         )
🧹 Nitpick comments (2)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)

27-27: Unused parameters in function signature.

The backend, num_gpus, and wandb_params parameters are not used in this function. If they are included for signature consistency across recipe functions, consider documenting this intent or removing them if unnecessary.


49-49: Unused backend parameter.

The backend parameter is not used in this function. If included for signature consistency, consider documenting this or removing if unnecessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c540577 and 72dfacf.

📒 Files selected for processing (1)
  • recipes/openmathreasoning/scripts/simplified_recipe.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (3)
nemo_skills/pipeline/nemo_rl/sft.py (1)
  • sft_nemo_rl (214-498)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (42-51)
nemo_skills/pipeline/eval.py (1)
  • eval (42-545)
🪛 Ruff (0.14.1)
recipes/openmathreasoning/scripts/simplified_recipe.py

27-27: Unused function argument: num_gpus

(ARG001)


27-27: Unused function argument: backend

(ARG001)


27-27: Unused function argument: wandb_params

(ARG001)


49-49: Unused function argument: backend

(ARG001)


163-163: Unused function argument: backend

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (4)
recipes/openmathreasoning/scripts/simplified_recipe.py (4)

113-141: LGTM! Previous issues resolved.

The training loop correctly addresses the previously flagged concerns:

  • base_args is copied for each backend iteration (line 123), preventing FSDP-specific flags from affecting subsequent backends
  • Backend-specific output_dir (line 130) prevents checkpoint/log collisions
  • Backend-specific expname (line 138) ensures unique experiment names

163-163: Unused backend parameter is acceptable here.

The backend parameter is unused in initial_eval, which is appropriate since this function evaluates the baseline model before any backend-specific training occurs. The parameter is likely included for signature consistency with other recipe functions.


213-219: LGTM! CLI backend argument properly configured.

The --backend argument correctly supports multiple backends with appropriate validation and defaults.


227-239: LGTM! Backend parameter correctly propagated.

The backend argument is properly passed through all recipe function calls via the args tuple.

Comment on lines +146 to +159
eval(
ctx=wrap_arguments("++inference.tokens_to_generate=16384 "),
cluster=cluster,
model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}",
server_type="vllm",
server_gpus=num_gpus,
benchmarks="aime24:8,aime25:8",
output_dir=f"{workspace}/evals/after-training-{training_backend}",
num_jobs=1,
expname=f"{expname_prefix}-final-eval",
run_after=f"{expname_prefix}-training-{training_backend}",
wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None,
wandb_project=wandb_params["wandb_project"],
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wedu-nvidia let's do this

f"{args.expname_prefix}-final-eval",
f"{args.expname_prefix}-baseline-eval",
],
reuse_code=False,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not change, I am not sure it got changed, I will revise back.



def prepare(workspace, cluster, num_gpus, expname_prefix, wandb_params):
def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need backend in this and run_sdg and initial_eval?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought original used it for convenience so that it can pass *args. Now I reivsed it.

Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
@Kipok Kipok merged commit 30b4527 into main Oct 25, 2025
5 checks passed
@Kipok Kipok deleted the wedu/omr-receipe branch October 25, 2025 16:46
dgtm777 pushed a commit that referenced this pull request Oct 29, 2025
Signed-off-by: Wei Du <wedu@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants