Update omr receipe#973
Conversation
WalkthroughA new multi-backend parameter was added and propagated through the OpenMath reasoning recipe: CLI Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI
participant Recipe as simplified_recipe
participant Prepare
participant InitialEval
participant SDG as run_sdg
participant Train as run_training
participant FinalEval
CLI->>Recipe: invoke with --backend [b1,b2,...]
loop for each backend b
Recipe->>Prepare: prepare(..., backend=b)
Recipe->>InitialEval: initial_eval(..., backend=b)
Recipe->>SDG: run_sdg(..., backend=b)
Recipe->>Train: run_training(..., backend=b)
Train-->>Recipe: model_path(b)
Recipe->>FinalEval: final_eval(..., backend=b, model=model_path(b))
end
Note right of Recipe #DDDDFF: per-backend artifacts and outputs created
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@Kipok I tested the full loop and the results are still in the range, so no need to update the range. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (9)
recipes/openmathreasoning/scripts/simplified_recipe.py (7)
27-47: Silence ARG001 by prefixing unused params in prepare.num_gpus, backend, wandb_params aren’t used here. Keep the uniform signature but underscore them.
-def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def prepare(workspace, cluster, _num_gpus, expname_prefix, _backend, _wandb_params):
32-40: Pin or parameterize download ref to avoid HEAD drift.Fetching from refs/heads/main is brittle in CI. Allow overriding ref or pin to a commit/tag.
- f"export DOWNLOAD_PREFIX=https://raw.githubusercontent.com/NVIDIA-NeMo/Skills/refs/heads/main/recipes/openmathreasoning && " + f"export DOWNLOAD_REF=${{DOWNLOAD_REF:-refs/heads/main}} && " + f"export DOWNLOAD_PREFIX=https://raw.githubusercontent.com/NVIDIA-NeMo/Skills/${{DOWNLOAD_REF}}/recipes/openmathreasoning && "
49-90: Underscore unused backend in run_sdg to appease Ruff.-def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def run_sdg(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):
143-159: Underscore unused backend in final_eval.-def final_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def final_eval(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):
161-176: Underscore unused backend in initial_eval.-def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def initial_eval(workspace, cluster, num_gpus, expname_prefix, _backend, wandb_params):
211-216: Validate backend via argparse choices.- parser.add_argument( - "--backend", - type=str, - default="megatron", - help="Can either be megatron or fsdp", - ) + parser.add_argument( + "--backend", + type=str, + choices=("megatron", "fsdp"), + default="megatron", + help="Training backend", + )
63-63: Wraprun_aftervalues in lists to match type hints.The
run_afterparameter ingenerate()andsft_nemo_rl()is typed asList[str], but the current code passes bare strings. While the runtime includes defensiveisinstancechecks to convert strings to lists, following the type hints improves consistency and type-checking compatibility.- run_after=f"{expname_prefix}-download-assets", + run_after=[f"{expname_prefix}-download-assets"], @@ - run_after=f"{expname_prefix}-problem-extraction", + run_after=[f"{expname_prefix}-problem-extraction"], @@ - run_after=f"{expname_prefix}-solution-generation", + run_after=[f"{expname_prefix}-solution-generation"], @@ - run_after=f"{expname_prefix}-prepare-training-data", + run_after=[f"{expname_prefix}-prepare-training-data"], @@ - run_after=f"{expname_prefix}-training", + run_after=[f"{expname_prefix}-training"], @@ - run_after=f"{expname_prefix}-download-assets", + run_after=[f"{expname_prefix}-download-assets"],Applies to lines: 63, 81, 108, 138, 155, 173
tests/slurm-tests/omr_simple_recipe/run_test.py (2)
27-27: Validate backend choices in test CLI for early failure.- ap.add_argument("--backend", type=str, default="megatron", help="Can either be megatron or fsdp") + ap.add_argument("--backend", type=str, choices=("megatron", "fsdp"), default="megatron", help="Training backend")
30-44: Avoid shell=True; pass argv list to subprocess.run.Safer and avoids quoting bugs.
- cmd = ( - f"python -m recipes.openmathreasoning.scripts.simplified_recipe " - f" --cluster {args.cluster} " - f" --workspace {args.workspace} " - f" --expname_prefix {args.expname_prefix} " - f" --backend {args.backend} " - ) - - if args.disable_wandb: - cmd += " --disable_wandb " - elif args.wandb_project: - cmd += f" --wandb_project {args.wandb_project} " - - subprocess.run(cmd, shell=True, check=True) + cmd = [ + "python", "-m", "recipes.openmathreasoning.scripts.simplified_recipe", + "--cluster", args.cluster, + "--workspace", args.workspace, + "--expname_prefix", args.expname_prefix, + "--backend", args.backend, + ] + if args.disable_wandb: + cmd.append("--disable_wandb") + elif args.wandb_project: + cmd += ["--wandb_project", args.wandb_project] + + subprocess.run(cmd, check=True)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
recipes/openmathreasoning/scripts/simplified_recipe.py(8 hunks)tests/slurm-tests/omr_simple_recipe/run_test.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)
nemo_skills/pipeline/nemo_rl/sft.py (1)
sft_nemo_rl(214-498)nemo_skills/pipeline/cli.py (1)
wrap_arguments(42-51)
🪛 Ruff (0.14.0)
recipes/openmathreasoning/scripts/simplified_recipe.py
27-27: Unused function argument: num_gpus
(ARG001)
27-27: Unused function argument: backend
(ARG001)
27-27: Unused function argument: wandb_params
(ARG001)
49-49: Unused function argument: backend
(ARG001)
143-143: Unused function argument: backend
(ARG001)
161-161: Unused function argument: backend
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (3)
recipes/openmathreasoning/scripts/simplified_recipe.py (3)
126-140: Good: backend plumbed into sft_nemo_rl and ctx assembled cleanly.
114-125: This review comment is based on a misunderstanding and should be disregarded.NeMo RL explicitly supports "PyTorch FSDP2, TP, CP, and SP for efficient training", and supports "performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP)". The FSDP2Manager flattens "data and context dimensions if context parallelism is enabled", confirming that FSDP and CP > 1 are compatible.
The actual incompatibility is between context parallelism and sequence packing (due to conflicting attention implementations), not between context parallelism and FSDP. Since the code already disables sequence_packing for FSDP (line 124), keeping context_parallel_size=2 for all backends is correct and does not cause issues.
No changes are needed.
126-140: Verify that cluster config includes HF_HOME in env_vars before running this recipe.The review comment is valid. The sft_nemo_rl function validates HF_HOME when backend is "megatron", and this recipe defaults to the megatron backend (line 213:
default="megatron"). If HF_HOME is missing from env_variables, sft_nemo_rl raises a RuntimeError directing users to addHF_HOME=/your/path/to/hf_hometo the cluster config's env_vars. The validation also checks that the HF_HOME path is mounted.Developers must ensure their cluster config (passed to
run_training()via theclusterparameter) includes HF_HOME properly configured to avoid runtime failure when sft_nemo_rl() is called. This is a configuration requirement, not a code defect.
Kipok
left a comment
There was a problem hiding this comment.
so the current version repeats the whole script for different backend. This is how it was before, but it's inefficient because we don't need to repeat baseline evals / generation two times as they are identical. Could you please change it such that instead it uses backend as a list, so by default it's running only megatron, but we can configure it to run both megatron and fsdp inside the same job in parallel and then do two final evals for both of them. But the initial steps will be shared
|
OK, that's a better idea and I will make changes on this. |
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/slurm-tests/omr_simple_recipe/check_results.py (1)
56-75: Avoid reloading baseline per-backend; hoist and use f-strings.Load
baseline_resultsonce per benchmark and switch to clearer f-strings.- for training_backend in args.backend: - for benchmark in ("aime24", "aime25"): - common_path = Path(args.workspace) / "evals" - baseline_results = load_json(common_path / "baseline" / "eval-results" / benchmark / "metrics.json") - after_training_results = load_json( - common_path - / "after-training-{}".format(training_backend) - / "eval-results" - / benchmark - / "metrics.json" - ) - check_results(benchmark, baseline_results, after_training_results, training_backend) + common_path = Path(args.workspace) / "evals" + for benchmark in ("aime24", "aime25"): + baseline_results = load_json(common_path / "baseline" / "eval-results" / benchmark / "metrics.json") + for training_backend in args.backend: + after_training_results = load_json( + common_path + / f"after-training-{training_backend}" + / "eval-results" + / benchmark + / "metrics.json" + ) + check_results(benchmark, baseline_results, after_training_results, training_backend)recipes/openmathreasoning/scripts/simplified_recipe.py (2)
27-31: Silence Ruff ARG001 for intentionally unused params.These functions accept
backend/num_gpus/wandb_paramsto keep a uniform call signature but don’t use them. Add# noqa: ARG001on the def lines to keep the API without linter noise.-def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): # noqa: ARG001 @@ -def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def run_sdg(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): # noqa: ARG001 @@ -def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): +def initial_eval(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): # noqa: ARG001Also applies to: 49-55, 162-177
27-41: Stale comment: this step is wrapped in run_cmd.Line 28 says “not wrapping with run_cmd”, but below you do call
run_cmd. Update the comment to avoid confusion.- # data preparation needs to run locally without container, so not wrapping with run_cmd + # Data preparation assets are fetched via run_cmd (non-containerized script execution).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
recipes/openmathreasoning/scripts/simplified_recipe.py(6 hunks)tests/slurm-tests/omr_simple_recipe/check_results.py(2 hunks)tests/slurm-tests/omr_simple_recipe/run_test.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/slurm-tests/omr_simple_recipe/run_test.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/slurm-tests/omr_simple_recipe/check_results.py (2)
tests/slurm-tests/omr_simple_recipe/run_test.py (1)
main(20-63)tests/slurm-tests/utils.py (1)
load_json(18-21)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)
nemo_skills/pipeline/nemo_rl/sft.py (1)
sft_nemo_rl(214-498)nemo_skills/pipeline/cli.py (1)
wrap_arguments(42-51)
🪛 Ruff (0.14.1)
recipes/openmathreasoning/scripts/simplified_recipe.py
27-27: Unused function argument: num_gpus
(ARG001)
27-27: Unused function argument: backend
(ARG001)
27-27: Unused function argument: wandb_params
(ARG001)
49-49: Unused function argument: backend
(ARG001)
162-162: Unused function argument: backend
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (1)
tests/slurm-tests/omr_simple_recipe/check_results.py (1)
35-51: Backend-aware assertion context looks good.Including the backend in the after_training assertion message improves debuggability. No functional concerns.
| eval( | ||
| ctx=wrap_arguments("++inference.tokens_to_generate=16384 "), | ||
| cluster=cluster, | ||
| model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}", | ||
| server_type="vllm", | ||
| server_gpus=num_gpus, | ||
| benchmarks="aime24:8,aime25:8", | ||
| output_dir=f"{workspace}/evals/after-training-{training_backend}", | ||
| num_jobs=1, | ||
| expname=f"{expname_prefix}-final-eval", | ||
| run_after=f"{expname_prefix}-training-{training_backend}", | ||
| wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None, | ||
| wandb_project=wandb_params["wandb_project"], | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
Potential duplicate expname across backends in final_eval. Please verify.
Multiple eval(...) calls use the same expname=f"{expname_prefix}-final-eval". If the scheduler requires unique experiment names, this can collide or break run_after dependencies.
Option if uniqueness is required:
- expname=f"{expname_prefix}-final-eval",
+ expname=f"{expname_prefix}-final-eval-{training_backend}",Run this check to see how eval constructs task names and whether expname must be unique:
🏁 Script executed:
#!/bin/bash
# Inspect eval entrypoint/task naming to assess expname uniqueness requirements
rg -nP -C3 '\bdef\s+eval\b|\badd_task\(|task_name=|expname=' nemo_skills/pipeline -g '!**/tests/**'Length of output: 22975
🏁 Script executed:
#!/bin/bash
# Inspect get_exp function to understand expname uniqueness requirements
rg -nP -A20 'def get_exp' nemo_skills/pipeline/utils/exp.pyLength of output: 1989
🏁 Script executed:
#!/bin/bash
# Check how _reuse_exp parameter is used in the pipeline
rg -nP '_reuse_exp|reuse_code_exp' nemo_skills/pipeline -g '!**/tests/**' | head -50Length of output: 5232
🏁 Script executed:
#!/bin/bash
# Search for documentation or comments about experiment naming uniqueness
rg -nP -B2 -A5 'experiment.*name|expname.*unique' nemo_skills/pipeline -g '!**/tests/**'Length of output: 14431
🏁 Script executed:
#!/bin/bash
# Inspect the simplified_recipe.py context to see how eval() is called in a loop and what variables control it
cat recipes/openmathreasoning/scripts/simplified_recipe.py | sed -n '100,180p'Length of output: 3228
Include {training_backend} suffix in final_eval expname to avoid collision across backend iterations.
The final_eval() function loops over each training_backend but uses the same expname=f"{expname_prefix}-final-eval" for all backends. Since run_after correctly targets different training experiments per backend (...training-{training_backend}), each eval invocation should have a distinct experiment name:
- expname=f"{expname_prefix}-final-eval",
+ expname=f"{expname_prefix}-final-eval-{training_backend}",This matches the pattern used in sft_nemo_rl() where each training backend gets its own experiment name. Without this, multiple eval() calls with identical names will conflict within the scheduler.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| eval( | |
| ctx=wrap_arguments("++inference.tokens_to_generate=16384 "), | |
| cluster=cluster, | |
| model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}", | |
| server_type="vllm", | |
| server_gpus=num_gpus, | |
| benchmarks="aime24:8,aime25:8", | |
| output_dir=f"{workspace}/evals/after-training-{training_backend}", | |
| num_jobs=1, | |
| expname=f"{expname_prefix}-final-eval", | |
| run_after=f"{expname_prefix}-training-{training_backend}", | |
| wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None, | |
| wandb_project=wandb_params["wandb_project"], | |
| ) | |
| eval( | |
| ctx=wrap_arguments("++inference.tokens_to_generate=16384 "), | |
| cluster=cluster, | |
| model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}", | |
| server_type="vllm", | |
| server_gpus=num_gpus, | |
| benchmarks="aime24:8,aime25:8", | |
| output_dir=f"{workspace}/evals/after-training-{training_backend}", | |
| num_jobs=1, | |
| expname=f"{expname_prefix}-final-eval-{training_backend}", | |
| run_after=f"{expname_prefix}-training-{training_backend}", | |
| wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None, | |
| wandb_project=wandb_params["wandb_project"], | |
| ) |
🤖 Prompt for AI Agents
In recipes/openmathreasoning/scripts/simplified_recipe.py around lines 146 to
159, the eval() calls use a fixed expname f"{expname_prefix}-final-eval" (and
the same wandb_name) for every training_backend, causing name collisions across
backend iterations; modify the expname to include the training_backend suffix
(e.g., f"{expname_prefix}-final-eval-{training_backend}") and do the same for
wandb_name when wandb is enabled so each backend’s final eval has a unique
experiment name.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (1)
146-160: CRITICAL: Duplicate expname collision across backends remains unresolved.Lines 156 and 158 still use the same
expnameandwandb_namefor all backend iterations, causing scheduler name collisions. This issue was previously flagged but not addressed.Apply this diff to fix the collision:
eval( ctx=wrap_arguments("++inference.tokens_to_generate=16384 "), cluster=cluster, model=f"{workspace}/training-{training_backend}/qwen2.5-14b-improved-hf", server_type="vllm", server_gpus=num_gpus, benchmarks="aime24:8,aime25:8", output_dir=f"{workspace}/evals/after-training-{training_backend}", num_jobs=1, - expname=f"{expname_prefix}-final-eval", + expname=f"{expname_prefix}-final-eval-{training_backend}", run_after=f"{expname_prefix}-training-{training_backend}", - wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None, + wandb_name=f"{expname_prefix}-final-eval-{training_backend}" if not wandb_params["disable_wandb"] else None, wandb_project=wandb_params["wandb_project"], )
🧹 Nitpick comments (2)
recipes/openmathreasoning/scripts/simplified_recipe.py (2)
27-27: Unused parameters in function signature.The
backend,num_gpus, andwandb_paramsparameters are not used in this function. If they are included for signature consistency across recipe functions, consider documenting this intent or removing them if unnecessary.
49-49: Unused backend parameter.The
backendparameter is not used in this function. If included for signature consistency, consider documenting this or removing if unnecessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
recipes/openmathreasoning/scripts/simplified_recipe.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/openmathreasoning/scripts/simplified_recipe.py (3)
nemo_skills/pipeline/nemo_rl/sft.py (1)
sft_nemo_rl(214-498)nemo_skills/pipeline/cli.py (1)
wrap_arguments(42-51)nemo_skills/pipeline/eval.py (1)
eval(42-545)
🪛 Ruff (0.14.1)
recipes/openmathreasoning/scripts/simplified_recipe.py
27-27: Unused function argument: num_gpus
(ARG001)
27-27: Unused function argument: backend
(ARG001)
27-27: Unused function argument: wandb_params
(ARG001)
49-49: Unused function argument: backend
(ARG001)
163-163: Unused function argument: backend
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (4)
recipes/openmathreasoning/scripts/simplified_recipe.py (4)
113-141: LGTM! Previous issues resolved.The training loop correctly addresses the previously flagged concerns:
base_argsis copied for each backend iteration (line 123), preventing FSDP-specific flags from affecting subsequent backends- Backend-specific
output_dir(line 130) prevents checkpoint/log collisions- Backend-specific
expname(line 138) ensures unique experiment names
163-163: Unused backend parameter is acceptable here.The
backendparameter is unused ininitial_eval, which is appropriate since this function evaluates the baseline model before any backend-specific training occurs. The parameter is likely included for signature consistency with other recipe functions.
213-219: LGTM! CLI backend argument properly configured.The
--backendargument correctly supports multiple backends with appropriate validation and defaults.
227-239: LGTM! Backend parameter correctly propagated.The backend argument is properly passed through all recipe function calls via the args tuple.
| eval( | ||
| ctx=wrap_arguments("++inference.tokens_to_generate=16384 "), | ||
| cluster=cluster, | ||
| model=f"{workspace}/training/qwen2.5-14b-improved-hf-{training_backend}", | ||
| server_type="vllm", | ||
| server_gpus=num_gpus, | ||
| benchmarks="aime24:8,aime25:8", | ||
| output_dir=f"{workspace}/evals/after-training-{training_backend}", | ||
| num_jobs=1, | ||
| expname=f"{expname_prefix}-final-eval", | ||
| run_after=f"{expname_prefix}-training-{training_backend}", | ||
| wandb_name=f"{expname_prefix}-final-eval" if not wandb_params["disable_wandb"] else None, | ||
| wandb_project=wandb_params["wandb_project"], | ||
| ) |
| f"{args.expname_prefix}-final-eval", | ||
| f"{args.expname_prefix}-baseline-eval", | ||
| ], | ||
| reuse_code=False, |
There was a problem hiding this comment.
It should not change, I am not sure it got changed, I will revise back.
|
|
||
|
|
||
| def prepare(workspace, cluster, num_gpus, expname_prefix, wandb_params): | ||
| def prepare(workspace, cluster, num_gpus, expname_prefix, backend, wandb_params): |
There was a problem hiding this comment.
why do we need backend in this and run_sdg and initial_eval?
There was a problem hiding this comment.
I thought original used it for convenience so that it can pass *args. Now I reivsed it.
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
#963
Summary by CodeRabbit
New Features
Tests