
Customize num_tasks per node in run_cmd #965

Merged: artbataev merged 2 commits into main from customize_num_tasks on Oct 17, 2025
Conversation

@artbataev artbataev commented Oct 17, 2025

Collaborator

Allow customizing num_tasks (the number of tasks per node) in run_cmd.
The current default value is preserved.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable parameter to specify the number of tasks per node (default: 1).
  • Documentation
    • Clarified GPU allocation option to reflect per-node configuration rather than total allocation.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@artbataev artbataev requested a review from Kipok October 17, 2025 18:49

coderabbitai Bot commented Oct 17, 2025

Contributor

Walkthrough

The PR adds a configurable num_tasks parameter to the run_cmd function with a default value of 1, allowing users to specify tasks per node. Help text for num_gpus is clarified to indicate it applies per-node. Task allocation logic is updated to use the new parameter.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Task and GPU per-node parameterization: nemo_skills/pipeline/run_cmd.py | Added new num_tasks option (default 1) for tasks per node; clarified num_gpus help text to specify "per node"; updated add_task invocation to propagate user-provided num_tasks across all commands instead of hardcoding to 1 |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Per-node tasks now dance with flair,
Configuration flows through the air—
Tasks multiply, GPUs too,
Each one labeled, bright and new,
Clarity blooms in every share! 🌟

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "Customize num_tasks per node in run_cmd" directly and accurately reflects the main changes in the changeset. The title clearly identifies what is being modified (the num_tasks parameter), the scope (per node), and the function affected (run_cmd). It is concise, specific, and uses proper code formatting with backticks, avoiding vague language or noise. The title aligns well with the PR objectives, which confirm that the change adds the ability to customize the number of tasks per node in the run_cmd function. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/pipeline/run_cmd.py (1)

59-59: Consider adding validation for positive values.

The new num_tasks parameter looks good and preserves backward compatibility with its default value of 1. However, consider adding validation to ensure num_tasks > 0 to prevent invalid configurations.

Apply this diff to add validation:

+    if num_tasks <= 0:
+        raise ValueError("num_tasks must be a positive integer")
+
     model: str = typer.Option(None, help="Path to the model to evaluate"),
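
As a hedged illustration of the validation suggested above: the context line in the diff is a parameter declaration, so a check like this would presumably sit in the function body rather than among the parameters. The helper name `validate_num_tasks` is hypothetical and not part of nemo_skills; this is only a plain-Python sketch of the idea.

```python
def validate_num_tasks(num_tasks: int) -> int:
    """Return num_tasks unchanged if it is a positive integer, else raise.

    Hypothetical helper for illustration; not part of nemo_skills.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_tasks, bool) or not isinstance(num_tasks, int):
        raise TypeError("num_tasks must be an integer")
    if num_tasks <= 0:
        raise ValueError("num_tasks must be a positive integer")
    return num_tasks
```

Calling such a helper at the top of the command body would reject values like 0 or -1 before any job is scheduled, while leaving the default of 1 untouched.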
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e49453f and 45fb504.

📒 Files selected for processing (1)
  • nemo_skills/pipeline/run_cmd.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (2)
nemo_skills/pipeline/run_cmd.py (2)

57-57: LGTM: Clarified help text.

The updated help text correctly specifies that num_gpus applies per node, improving consistency with the new num_tasks parameter.


202-202: No issues found - implementation is correct.

The implementation properly handles the num_tasks parameter. The add_task function signature shows num_tasks: int | list[int], and lines 511-515 in exp.py confirm that it handles both scalar and list inputs. When multiple commands are passed to run_cmd, converting the scalar num_tasks to a list matching the command count (line 202) is the intended design: all commands in a single invocation use the same resource constraints. This is not a limitation but proper API design.
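
A minimal sketch of the scalar-to-list behavior described above, assuming a scalar num_tasks is simply repeated once per command. The name `broadcast_num_tasks` is hypothetical; this is not the actual code in run_cmd.py or exp.py.

```python
from typing import List, Union


def broadcast_num_tasks(num_tasks: Union[int, List[int]], num_commands: int) -> List[int]:
    """Expand a scalar num_tasks into one entry per command.

    Hypothetical helper for illustration; not the actual nemo_skills code.
    """
    if isinstance(num_tasks, int):
        # Every command in the invocation shares the same tasks-per-node value.
        return [num_tasks] * num_commands
    if len(num_tasks) != num_commands:
        raise ValueError("num_tasks list must have one entry per command")
    return list(num_tasks)


print(broadcast_num_tasks(4, 3))  # [4, 4, 4]
```

With the default of 1, this reduces to the previously hardcoded behavior of one task per node for every command.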

@Kipok Kipok left a comment

Collaborator

thanks!

     qos: str = typer.Option(None, help="Specify Slurm QoS, e.g. to request interactive nodes"),
     time_min: str = typer.Option(None, help="If specified, will use as a time-min slurm parameter"),
-    num_gpus: int | None = typer.Option(None, help="Number of GPUs to use"),
+    num_gpus: int | None = typer.Option(None, help="Number of GPUs per node to use"),

Collaborator

It might be a good idea to update this help message in all pipelines, not just here.

@artbataev artbataev merged commit 7019ff8 into main Oct 17, 2025
7 checks passed
@artbataev artbataev deleted the customize_num_tasks branch October 17, 2025 19:13
SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 22, 2025
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Oct 29, 2025
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>