Customize num_tasks per node in run_cmd#965
Conversation
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
WalkthroughThe PR adds a configurable Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_skills/pipeline/run_cmd.py (1)
59-59: Consider adding validation for positive values.The new
num_tasksparameter looks good and preserves backward compatibility with its default value of 1. However, consider adding validation to ensurenum_tasks > 0to prevent invalid configurations.Apply this diff to add validation:
+ if num_tasks <= 0: + raise ValueError("num_tasks must be a positive integer") + model: str = typer.Option(None, help="Path to the model to evaluate"),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/pipeline/run_cmd.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (2)
nemo_skills/pipeline/run_cmd.py (2)
57-57: LGTM: Clarified help text.The updated help text correctly specifies that
num_gpusapplies per node, improving consistency with the newnum_tasksparameter.
202-202: No issues found - implementation is correct.The implementation properly handles the
num_tasksparameter. Theadd_taskfunction signature showsnum_tasks: int | list[int], and line 511-515 inexp.pyconfirms it handles both scalar and list inputs. When multiple commands are passed torun_cmd, converting the scalarnum_tasksto a list matching the command count (line 202) is the intended design—all commands in a single invocation use the same resource constraints. This is not a limitation but proper API design.
| qos: str = typer.Option(None, help="Specify Slurm QoS, e.g. to request interactive nodes"), | ||
| time_min: str = typer.Option(None, help="If specified, will use as a time-min slurm parameter"), | ||
| num_gpus: int | None = typer.Option(None, help="Number of GPUs to use"), | ||
| num_gpus: int | None = typer.Option(None, help="Number of GPUs per node to use"), |
There was a problem hiding this comment.
might be a good idea to update message in all pipelines, not just here
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Allow customizing
num_tasks(num tasks per node) inrun_cmd.Default current value is preserved.
Summary by CodeRabbit
Release Notes
New Features
Documentation