[Jobs] Support --num-jobs without a pool#9997

Open
Michaelvll wants to merge 4 commits into master from num-jobs-without-pool

@Michaelvll

Summary

sky jobs launch --num-jobs N was restricted to pool submissions: the CLI, client SDK, and server each raised an error if --num-jobs was given without --pool. This PR lifts that restriction so --num-jobs N works without a pool, submitting N independent managed jobs (each on its own cluster).

The submission machinery was already pool-agnostic:

  • The local (consolidation) and remote submission paths both plumb num_jobs through with pool_hash=None.
  • SKYPILOT_NUM_JOBS and the per-job SKYPILOT_JOB_RANK env vars are set for all jobs regardless of pool.
  • Concurrency is already throttled by the jobs scheduler's default path (get_number_of_jobs_controllers()), so without a pool the jobs queue as PENDING and start as controller capacity frees — the same behavior as launching N jobs individually.
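Because every job sees SKYPILOT_NUM_JOBS and SKYPILOT_JOB_RANK, a task script can use them to split work across the N jobs. The following is only a sketch of that idea: the helper name `shard_for_this_job` and the strided-slice scheme are assumptions for illustration, not part of this PR.

```python
import os


def shard_for_this_job(items, env=None):
    """Return the subset of `items` that this job should process.

    Hypothetical helper: reads the two env vars named in this PR and
    takes every num_jobs-th item starting at this job's rank.
    """
    env = os.environ if env is None else env
    # Fall back to a single-job layout when the variables are absent.
    num_jobs = int(env.get('SKYPILOT_NUM_JOBS', '1'))
    rank = int(env.get('SKYPILOT_JOB_RANK', '0'))
    if not 0 <= rank < num_jobs:
        raise ValueError(f'rank {rank} is outside [0, {num_jobs})')
    return items[rank::num_jobs]
```

With `--num-jobs 3`, the job with rank 1 would process items 1, 4, 7, and so on, and the three jobs together cover every item exactly once.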

So the only change needed was removing the three guards, plus two output fixes the guards were masking and one new notice:

  • The launch confirmation prompt now reports the job count regardless of pool (previously only set in the pool branch).
  • The "Jobs submitted" summary no longer emits pool-only hints when there is no pool — previously it showed a dashboard link filtered by value=None and sky jobs cancel --pool None. Without a pool it now links to the unfiltered jobs page and suggests sky jobs cancel <job-ids>.
  • A short notice is printed for the non-pool multi-job case.
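The summary branching described above can be sketched roughly as follows. This is not the code in sky/client/cli/command.py: the function name `multi_job_hints`, the hint wording, and the dashboard URL shapes are all assumptions made for illustration. Only the two cancel commands come from the description.

```python
from typing import List, Optional


def multi_job_hints(pool: Optional[str], dashboard_url: str) -> List[str]:
    """Build the post-submission hints for a multi-job launch.

    Hypothetical sketch: the URL paths and query parameter are invented.
    """
    if pool is not None:
        # Pool submission: filter the dashboard and cancel by pool name.
        return [
            f'Show jobs in pool: {dashboard_url}/jobs?pool={pool}',
            f'Cancel all jobs in pool: sky jobs cancel --pool {pool}',
        ]
    # No pool: never format None into a link or a command.
    return [
        f'Show all jobs: {dashboard_url}/jobs',
        'Cancel the jobs: sky jobs cancel <job-ids>',
    ]
```

Branching on `pool is not None` before any formatting is what keeps the literal string `None` out of the output.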

Changes

  • sky/client/cli/command.py: remove the CLI guard; branch the multi-job summary hints on pool presence; add a non-pool multi-job notice.
  • sky/jobs/client/sdk.py: remove the client-side guard; report job count in the confirmation prompt regardless of pool.
  • sky/jobs/server/core.py: remove the server-side guard.
  • tests/unit_tests/test_sky/test_cli_launch_validation.py: tests that --num-jobs without --pool is accepted and reaches launch with pool=None, and that the multi-job output contains no pool-specific hints when there is no pool.

Test plan

  • pytest tests/unit_tests/test_sky/test_cli_launch_validation.py — passes.
  • Manual: sky jobs launch --num-jobs 3 'echo hi' (no --pool) against a remote API server creates 3 managed jobs, each on its own cluster, with distinct SKYPILOT_JOB_RANK (0–2) and SKYPILOT_NUM_JOBS=3; the summary shows the unfiltered "Show all jobs" link and a sky jobs cancel <ids> hint.

Michaelvll and others added 2 commits June 30, 2026 19:08
Previously --num-jobs was gated to pool submissions by explicit guards in
the CLI, client SDK, and server core. Everything downstream already handles
the no-pool case (pool_hash=None, SKYPILOT_NUM_JOBS / SKYPILOT_JOB_RANK are
set for all jobs, and the jobs scheduler's default path throttles concurrent
launches via get_number_of_jobs_controllers()).

Remove the three guards so --num-jobs N without a pool submits N independent
managed jobs, each on its own cluster. Also fix the confirmation prompt to
report the job count regardless of pool, and add a CLI notice for the
non-pool multi-job case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY

When --num-jobs is used without a pool, the "Jobs submitted" summary showed
pool-only hints that referenced a non-existent pool: a dashboard link
filtered by `value=None` and `sky jobs cancel --pool None`.

Branch the multi-job hint output on whether a pool was given: link to the
unfiltered jobs page and suggest `sky jobs cancel <job-ids>` when there is
no pool.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request allows specifying --num-jobs without --pool when launching managed jobs, enabling users to submit multiple independent jobs on their own clusters. It updates the CLI messages, dashboard hints, and cancel hints to correctly handle cases where a pool is not specified, and adds corresponding unit tests. The feedback suggests handling the singular form correctly when num_jobs is explicitly set to 1 to avoid grammatically incorrect output like "1 managed jobs".

Comment thread sky/jobs/client/sdk.py
Comment on lines +96 to +97
job_identity = ('a managed job'
                if num_jobs is None else f'{num_jobs} managed jobs')

medium

When --num-jobs 1 is explicitly passed, the confirmation prompt will display '1 managed jobs', which is grammatically incorrect. Checking if num_jobs is None or 1 ensures the prompt correctly uses the singular form 'a managed job'.

Suggested change
- job_identity = ('a managed job'
-                 if num_jobs is None else f'{num_jobs} managed jobs')
+ job_identity = ('a managed job'
+                 if num_jobs is None or num_jobs == 1 else f'{num_jobs} managed jobs')
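For reference, the suggested condition can be exercised in isolation. The wrapper function below is hypothetical (in the diff the expression is an inline assignment in sky/jobs/client/sdk.py); it only shows what the prompt text would be for each input.

```python
from typing import Optional


def describe_job_identity(num_jobs: Optional[int]) -> str:
    """Return the noun phrase used in the launch confirmation prompt.

    Hypothetical wrapper around the suggested expression.
    """
    if num_jobs is None or num_jobs == 1:
        return 'a managed job'
    return f'{num_jobs} managed jobs'
```

An explicit `--num-jobs 1` then reads the same as omitting the flag, while any larger count uses the plural.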

Michaelvll and others added 2 commits June 30, 2026 22:12
With a large --num-jobs, the "Jobs submitted with IDs" line and the cancel
hint dumped every id (e.g. 800 ids), making the output unreadable.

Collapse contiguous ids into ranges (e.g. 2711-3510) for the submitted-ids
line, reusing a shared helper (promoted from a private core.py function to
sky.jobs.utils.format_job_ids_as_ranges). Replace the explicit id list in
the non-pool cancel hint with a `<job-ids>` placeholder, matching the
existing `<job-id>` log hints.
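
A minimal sketch of the range-collapsing idea, for illustration only. It is not the body of sky.jobs.utils.format_job_ids_as_ranges, whose signature and output format are not shown here; the name `collapse_ids_to_ranges` and the comma separator are assumptions.

```python
from typing import Iterable, List


def collapse_ids_to_ranges(job_ids: Iterable[int]) -> str:
    """Collapse runs of consecutive ids into 'start-end' ranges."""
    ids = sorted(set(job_ids))
    parts: List[str] = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current run while the next id is exactly end + 1.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f'{start}-{end}')
        i += 1
    return ', '.join(parts)
```

For 800 contiguous ids starting at 2711 this yields the single range 2711-3510, matching the example in the commit message.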

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY

Adds an end-to-end smoke test covering `sky jobs launch --num-jobs N`
without `--pool`: it asserts N independent managed jobs are submitted
(each on its own cluster), that the CLI prints the non-pool variants of
the launch hints (no `--pool None` / `value=None`), and that every job
sees SKYPILOT_NUM_JOBS=N with a distinct SKYPILOT_JOB_RANK in [0, N).

This guards against regressing the no-pool path now that the pool-only
guards on --num-jobs have been removed.
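
The environment-variable assertion could look roughly like the check below. This is a sketch, not the smoke test added in this commit: `job_envs_are_consistent` is a hypothetical helper, and it assumes each job's environment has already been collected into a dict.

```python
from typing import Dict, List


def job_envs_are_consistent(envs: List[Dict[str, str]], n: int) -> bool:
    """Check that N jobs each see NUM_JOBS=N and a distinct rank in [0, N)."""
    if len(envs) != n:
        return False
    ranks = []
    for env in envs:
        if int(env['SKYPILOT_NUM_JOBS']) != n:
            return False
        ranks.append(int(env['SKYPILOT_JOB_RANK']))
    # N distinct ranks inside [0, N) are exactly 0, 1, ..., N-1.
    return sorted(ranks) == list(range(n))
```

Comparing the sorted ranks against `range(n)` covers both the distinctness and the bounds requirement in one step.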

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015k54nUcmsbhVummaG9piZu