[Jobs] Support --num-jobs without a pool #9997
Previously --num-jobs was gated to pool submissions by explicit guards in the CLI, client SDK, and server core. Everything downstream already handles the no-pool case (pool_hash=None, SKYPILOT_NUM_JOBS / SKYPILOT_JOB_RANK are set for all jobs, and the jobs scheduler's default path throttles concurrent launches via get_number_of_jobs_controllers()). Remove the three guards so --num-jobs N without a pool submits N independent managed jobs, each on its own cluster.

Also fix the confirmation prompt to report the job count regardless of pool, and add a CLI notice for the non-pool multi-job case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY
When --num-jobs is used without a pool, the "Jobs submitted" summary showed pool-only hints that referenced a non-existent pool: a dashboard link filtered by `value=None` and `sky jobs cancel --pool None`.

Branch the multi-job hint output on whether a pool was given: link to the unfiltered jobs page and suggest `sky jobs cancel <job-ids>` when there is no pool.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY
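The branching described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `sky/client/cli/command.py`: the function name `multi_job_hints`, the hint wording, and the dashboard paths are all hypothetical placeholders. Only the two cancel commands come from the commit message.

```python
from typing import List, Optional


def multi_job_hints(pool: Optional[str]) -> List[str]:
    """Return follow-up hint lines for a multi-job submission.

    Hypothetical helper: the wording and the dashboard paths below are
    placeholders, not SkyPilot's actual output.
    """
    if pool is not None:
        # Pool submission: filter the dashboard and cancel by pool name.
        return [
            f'Dashboard: /dashboard/jobs?pool={pool}',
            f'To cancel: sky jobs cancel --pool {pool}',
        ]
    # No pool: never interpolate None into a link or a command.
    return [
        'Dashboard: /dashboard/jobs',
        'To cancel: sky jobs cancel <job-ids>',
    ]
```

The point of the early branch is that `pool` is never formatted into a string unless it is known to be set, which is what produced `--pool None` before the fix.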
Code Review
This pull request allows specifying --num-jobs without --pool when launching managed jobs, enabling users to submit multiple independent jobs on their own clusters. It updates the CLI messages, dashboard hints, and cancel hints to correctly handle cases where a pool is not specified, and adds corresponding unit tests. The feedback suggests handling the singular form correctly when num_jobs is explicitly set to 1 to avoid grammatically incorrect output like "1 managed jobs".
    job_identity = ('a managed job'
                    if num_jobs is None else f'{num_jobs} managed jobs')
When --num-jobs 1 is explicitly passed, the confirmation prompt will display '1 managed jobs', which is grammatically incorrect. Checking if num_jobs is None or 1 ensures the prompt correctly uses the singular form 'a managed job'.
Suggested change:

    -job_identity = ('a managed job'
    -                if num_jobs is None else f'{num_jobs} managed jobs')
    +job_identity = ('a managed job'
    +                if num_jobs is None or num_jobs == 1 else f'{num_jobs} managed jobs')
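To make the reviewer's point concrete, the expression can be wrapped in a small function and exercised on the three interesting inputs. The function wrapper `job_identity` is only for illustration. In the PR the value is computed inline.

```python
from typing import Optional


def job_identity(num_jobs: Optional[int]) -> str:
    """Describe what is being submitted, using the singular for one job."""
    # Both "flag omitted" (None) and an explicit --num-jobs 1 mean one job.
    if num_jobs is None or num_jobs == 1:
        return 'a managed job'
    return f'{num_jobs} managed jobs'


print(job_identity(None))  # a managed job
print(job_identity(1))     # a managed job
print(job_identity(3))     # 3 managed jobs
```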
With a large --num-jobs, the "Jobs submitted with IDs" line and the cancel hint dumped every id (e.g. 800 ids), making the output unreadable.

Collapse contiguous ids into ranges (e.g. 2711-3510) for the submitted-ids line, reusing a shared helper (promoted from a private core.py function to sky.jobs.utils.format_job_ids_as_ranges). Replace the explicit id list in the non-pool cancel hint with a `<job-ids>` placeholder, matching the existing `<job-id>` log hints.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Asz8JDpagyCDhkcRwThYcY
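For readers unfamiliar with the technique, collapsing contiguous ids into ranges might look like the sketch below. It is a stand-in written for this illustration, not the body of `sky.jobs.utils.format_job_ids_as_ranges`: the name `collapse_ids`, the `', '` separator, and the choice to sort and deduplicate the input are assumptions.

```python
from typing import Iterable, List


def collapse_ids(job_ids: Iterable[int]) -> str:
    """Collapse runs of consecutive ids into 'start-end' ranges."""
    ids = sorted(set(job_ids))  # assumption: order and duplicates don't matter
    parts: List[str] = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current run while the next id is exactly end + 1.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f'{start}-{end}')
        i += 1
    return ', '.join(parts)


print(collapse_ids(range(2711, 3511)))  # 2711-3510
print(collapse_ids([1, 2, 3, 5, 7, 8]))  # 1-3, 5, 7-8
```

With 800 consecutive ids the output is a single short token instead of 800 numbers, which is the readability gain the commit is after.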
Adds an end-to-end smoke test covering `sky jobs launch --num-jobs N` without `--pool`: it asserts N independent managed jobs are submitted (each on its own cluster), that the CLI prints the non-pool variants of the launch hints (no `--pool None` / `value=None`), and that every job sees SKYPILOT_NUM_JOBS=N with a distinct SKYPILOT_JOB_RANK in [0, N).

This guards against regressing the no-pool path now that the pool-only guards on --num-jobs have been removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015k54nUcmsbhVummaG9piZu
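The environment-variable invariant this test asserts can be stated as a small check. The sketch below is not the smoke test itself. It assumes the per-job environments have already been collected into dictionaries, and `ranks_are_valid` is a hypothetical name.

```python
from typing import Dict, List


def ranks_are_valid(envs: List[Dict[str, str]], n: int) -> bool:
    """Check that n jobs each see SKYPILOT_NUM_JOBS=n and a unique rank."""
    if len(envs) != n:
        return False
    ranks = []
    for env in envs:
        # Every job must agree on the total number of jobs.
        if int(env['SKYPILOT_NUM_JOBS']) != n:
            return False
        ranks.append(int(env['SKYPILOT_JOB_RANK']))
    # Distinct ranks in [0, n) means the sorted ranks are exactly 0..n-1.
    return sorted(ranks) == list(range(n))
```

Comparing the sorted ranks against `range(n)` covers both conditions at once: a duplicate or an out-of-range rank makes the lists differ.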
Summary
`sky jobs launch --num-jobs N` was restricted to pool submissions: the CLI, client SDK, and server each raised an error if `--num-jobs` was given without `--pool`. This PR lifts that restriction so `--num-jobs N` works without a pool, submitting N independent managed jobs (each on its own cluster).

The submission machinery was already pool-agnostic:
- … `num_jobs` through with `pool_hash=None`.
- `SKYPILOT_NUM_JOBS` and the per-job `SKYPILOT_JOB_RANK` env vars are set for all jobs regardless of pool.
- The jobs scheduler's default path throttles concurrent launches (`get_number_of_jobs_controllers()`), so without a pool the jobs queue as `PENDING` and start as controller capacity frees, the same behavior as launching N jobs individually.

So the only change needed was removing the three guards, plus two output fixes the guards were masking:
- The confirmation prompt now reports the job count regardless of pool.
- The "Jobs submitted" summary showed pool-only hints that referenced a non-existent pool: a dashboard link filtered by `value=None` and `sky jobs cancel --pool None`. Without a pool it now links to the unfiltered jobs page and suggests `sky jobs cancel <job-ids>`.

Changes
- `sky/client/cli/command.py`: remove the CLI guard; branch the multi-job summary hints on pool presence; add a non-pool multi-job notice.
- `sky/jobs/client/sdk.py`: remove the client-side guard; report job count in the confirmation prompt regardless of pool.
- `sky/jobs/server/core.py`: remove the server-side guard.
- `tests/unit_tests/test_sky/test_cli_launch_validation.py`: tests that `--num-jobs` without `--pool` is accepted and reaches launch with `pool=None`, and that the multi-job output contains no pool-specific hints when there is no pool.

Test plan
- `pytest tests/unit_tests/test_sky/test_cli_launch_validation.py`: passes.
- `sky jobs launch --num-jobs 3 'echo hi'` (no `--pool`) against a remote API server creates 3 managed jobs, each on its own cluster, with distinct `SKYPILOT_JOB_RANK` (0–2) and `SKYPILOT_NUM_JOBS=3`; the summary shows the unfiltered "Show all jobs" link and a `sky jobs cancel <ids>` hint.
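As an illustration of why the per-job environment variables matter, a task launched with `--num-jobs N` could use them to pick its own slice of a shared work list. This is a hedged sketch of user-side code, not part of the PR: `my_shard` is a hypothetical name, and falling back to a single job with rank 0 when the variables are unset is an assumption made so the script also runs outside a managed job.

```python
import os
from typing import List, Mapping, Sequence


def my_shard(items: Sequence[str],
             env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the items this job should process, striding by rank."""
    # Assumption: default to one job / rank 0 when the variables are absent.
    num_jobs = int(env.get('SKYPILOT_NUM_JOBS', '1'))
    rank = int(env.get('SKYPILOT_JOB_RANK', '0'))
    # Job r takes items r, r + N, r + 2N, ... so shards never overlap.
    return list(items[rank::num_jobs])


files = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
fake_env = {'SKYPILOT_NUM_JOBS': '3', 'SKYPILOT_JOB_RANK': '1'}
print(my_shard(files, fake_env))  # ['b', 'e']
```

Because the ranks are distinct and cover 0 to N-1, the strided slices partition the list: every item is processed by exactly one job.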