Add cluv submit command for SLURM job submission #10

Merged: lebrice merged 33 commits into master from feat/submit-command on Apr 2, 2026

@lebrice lebrice commented Mar 28, 2026

Summary

  • Adds `cluv submit <cluster> <job.sh> [sbatch-flags...] [-- program-args...]`, a clean replacement for the manual `safe_sbatch` pattern.
  • Enforces a clean git state, syncs the project, injects `GIT_COMMIT`, then runs `sbatch` on the remote with configurable `SBATCH_*` env vars.
  • Job script is a required positional argument — not configurable via `pyproject.toml`.
  • Arguments before `--` are forwarded as sbatch flags; arguments after `--` are passed to the job script.
  • Extends `[tool.cluv]` config with global SLURM defaults (`[tool.cluv.slurm]`) and per-cluster overrides (`[tool.cluv.clusters.<name>]`).
  • Migrates the `clusters = [...]` list to `[tool.cluv.clusters.*]` sub-tables; the old list format still works for backward compatibility.

Config shape:

[tool.cluv.slurm]
SBATCH_TIME = "3:00:00"

[tool.cluv.clusters.mila]
SBATCH_PARTITION = "long"

[tool.cluv.clusters.rorqual]
SBATCH_ACCOUNT = "def-bengioy"
SBATCH_PARTITION = "main"
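
The precedence this config shape implies (global `[tool.cluv.slurm]` defaults, overridden by the per-cluster table) can be sketched as a plain dictionary merge. This is only an illustration: `sbatch_env_for` and `CLUV_CONFIG` are hypothetical names, the dictionary is a hand-written stand-in for what a TOML parser would return for `[tool.cluv]`, and cluv's actual merging code may differ.

```python
# Hand-written stand-in for the parsed [tool.cluv] table shown above.
CLUV_CONFIG = {
    "slurm": {"SBATCH_TIME": "3:00:00"},
    "clusters": {
        "mila": {"SBATCH_PARTITION": "long"},
        "rorqual": {"SBATCH_ACCOUNT": "def-bengioy", "SBATCH_PARTITION": "main"},
    },
}


def sbatch_env_for(cluster: str, cluv_config: dict) -> dict[str, str]:
    """Hypothetical helper: global defaults first, then the cluster's overrides."""
    env = dict(cluv_config.get("slurm", {}))  # copy so the config is not mutated
    env.update(cluv_config.get("clusters", {}).get(cluster, {}))  # cluster values win
    return env


print(sbatch_env_for("rorqual", CLUV_CONFIG))
```

Under these assumptions, `rorqual` ends up with the global `SBATCH_TIME` plus its own account and partition, while a cluster without a sub-table would receive only the global defaults.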

Usage:

cluv submit rorqual scripts/job.sh
cluv submit rorqual scripts/job.sh --partition=gpu --mem=40G
cluv submit rorqual scripts/job.sh --partition=gpu -- python train.py --lr=0.001
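
To make the argument ordering in these examples concrete, here is a minimal sketch of how the remote command line could be assembled. `build_sbatch_command` is a hypothetical helper, and the exact shape of the command that cluv runs over SSH is an assumption, not something this PR description confirms.

```python
import shlex


def build_sbatch_command(
    job_script: str,
    sbatch_args: list[str],
    program_args: list[str],
    env: dict[str, str],
) -> str:
    """Hypothetical helper: render one shell line to run on the remote host."""
    # Environment assignments come first so they apply to the sbatch process.
    assignments = [f"{name}={shlex.quote(value)}" for name, value in env.items()]
    # sbatch flags precede the script; everything after the script is passed to it.
    words = ["sbatch", *sbatch_args, job_script, *program_args]
    return " ".join(assignments + [shlex.join(words)])


print(
    build_sbatch_command(
        "scripts/job.sh",
        ["--partition=gpu"],
        ["python", "train.py", "--lr=0.001"],
        {"SBATCH_ACCOUNT": "def-bengioy", "GIT_COMMIT": "abc123"},
    )
)
```

Quoting each word with `shlex` keeps arguments that contain spaces intact when the line is interpreted by the remote shell. The commit hash `abc123` is a placeholder.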

Test plan

  • `uv run pytest tests/test_config.py tests/test_submit.py -v` — all unit tests pass
  • `uv run pytest -m integration -v` — `test_submit_rorqual_builds_correct_command` passes with an active rorqual connection
  • `uv run cluv submit --help` — verify help output shows new positional syntax
  • `uv run cluv status` — verify cluster list still loads correctly from new config format

🤖 Generated with Claude Code

lebrice and others added 8 commits March 31, 2026 14:44
- New `cluv submit <cluster> <command>` command that enforces a clean git
  state, injects `GIT_COMMIT`, applies `SBATCH_*` env vars from config,
  syncs the project, and runs sbatch on the remote cluster.
- Extend `CluvConfig` with `SubmitConfig`, global `slurm` dict, and
  per-cluster `cluster_configs` dict.
- Migrate `[tool.cluv]` clusters from a flat list to `[tool.cluv.clusters.*]`
  sub-tables (backward compat with old list format preserved).
- Global SBATCH_* defaults go in `[tool.cluv.slurm]`; per-cluster overrides
  go in `[tool.cluv.clusters.<name>]`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
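
The backward compatibility described in the commit above (flat list or sub-tables) amounts to normalizing both formats into one mapping. A minimal sketch follows; `normalize_clusters` is a hypothetical name, and the real `CluvConfig` parsing may be structured differently.

```python
def normalize_clusters(raw: object) -> dict[str, dict[str, str]]:
    """Hypothetical helper: accept either format, return name -> overrides."""
    if isinstance(raw, list):
        # Old format: clusters = ["mila", "rorqual"], with no per-cluster settings.
        return {name: {} for name in raw}
    if isinstance(raw, dict):
        # New format: one [tool.cluv.clusters.<name>] sub-table per cluster.
        return {name: dict(overrides) for name, overrides in raw.items()}
    raise TypeError(f"unsupported clusters value: {type(raw).__name__}")


print(normalize_clusters(["mila", "rorqual"]))
print(normalize_clusters({"mila": {"SBATCH_PARTITION": "long"}}))
```

With a single normalized shape, code that only needs cluster names (such as a status listing) can iterate over the keys regardless of which format the project uses.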
- test_config.py: 14 unit tests for new/old cluster format, global slurm
  vars, submit config, and edge cases.
- test_submit.py: 9 async unit tests (fully mocked) covering dirty-tree
  abort, missing job script, env var merging, --no-sync, --job-script
  override, and GIT_COMMIT injection.
- test_integration.py: smoke test for submit reaching rorqual over SSH.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add install_scripts() to sync_task_function(): after cloning the project,
all executable files in scripts/ are symlinked into ~/.local/bin/ on each
cluster (with .sh extension stripped). Uses ln -sf so re-runs are safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
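
The symlinking behaviour described in the commit above can be emulated locally with `pathlib`. This is a sketch under assumptions: the real `install_scripts()` runs on the remote cluster and uses `ln -sf`, whereas the function below is a local stand-in with hypothetical parameters `scripts_dir` and `bin_dir`.

```python
import os
from pathlib import Path


def install_scripts(scripts_dir: Path, bin_dir: Path) -> list[Path]:
    """Local stand-in: link every executable file in scripts_dir into bin_dir."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for script in sorted(scripts_dir.iterdir()):
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue  # skip directories and non-executable files
        # Strip only a trailing .sh, so "job.sh" is exposed as "job".
        name = script.stem if script.suffix == ".sh" else script.name
        link = bin_dir / name
        # Emulate `ln -sf`: remove any existing entry so re-runs are safe.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(script.resolve())
        links.append(link)
    return links
```

Calling the function twice on the same directories yields the same set of links, which mirrors the idempotence that `ln -sf` provides.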
- Fix test_submit_rorqual_builds_correct_command: with no_sync=False,
  submit() uses the mocked sync result rather than calling
  RemoteV2.connect() internally (which would return a different object).
  Also switch to direct attribute assignment to avoid instance patching
  issues with RemoteV2.
- Remove partiton-stats_output.txt (stale sample output file).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Untracked files don't affect job reproducibility. Only changes to
tracked files (modified, staged, deleted) should block submission.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
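
The rule in the commit above (only tracked-file changes block submission) can be expressed as a filter over `git status --porcelain` output. The function name `has_tracked_changes` is hypothetical, and whether cluv parses porcelain output or passes `--untracked-files=no` to git instead is not stated here.

```python
def has_tracked_changes(porcelain: str) -> bool:
    """Return True if `git status --porcelain` output lists tracked-file changes."""
    for line in porcelain.splitlines():
        # Porcelain v1 marks untracked paths with "??" and ignored paths with "!!".
        if line and not line.startswith(("??", "!!")):
            return True
    return False


print(has_tracked_changes(" M cluv/cli/submit.py\n?? notes.txt\n"))  # True
print(has_tracked_changes("?? notes.txt\n"))  # False
```

Modified, staged, and deleted entries all carry status codes other than `??`, so any of them makes the check return `True`.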
…sbatch vs program args

- `cluv submit <cluster> <job.sh> [--no-sync] [sbatch-flags...] [-- program-args...]`
- Remove SubmitConfig and job_script from pyproject.toml config
- Replace REMAINDER `command` + `--job-script` with required `job_script` positional
  and `rest` REMAINDER split on `--` inside submit()
- Update all tests accordingly; replace obsolete test_no_job_script_aborts and
  test_cli_job_script_overrides_config with sbatch flag forwarding tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iption

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lebrice lebrice force-pushed the feat/submit-command branch from 95f1e99 to 87e2b86 March 31, 2026 18:44
lebrice and others added 3 commits March 31, 2026 14:47
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_submit_rorqual_real wraps run_async to capture sbatch output and
asserts "Submitted batch job" — requires active rorqual connection and
clean git tree, mocks sync so the project needn't be freshly pushed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov-commenter commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 50.71770% with 103 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.78%. Comparing base (73fa4c4) to head (4412a59).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `cluv/cli/submit.py` | 28.57% | 30 Missing ⚠️ |
| `cluv/remote.py` | 67.77% | 29 Missing ⚠️ |
| `cluv/cli/sync.py` | 32.25% | 21 Missing ⚠️ |
| `cluv/__main__.py` | 0.00% | 16 Missing ⚠️ |
| `cluv/cli/login.py` | 57.14% | 3 Missing ⚠️ |
| `cluv/cli/status.py` | 40.00% | 3 Missing ⚠️ |
| `cluv/cli/run.py` | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #10      +/-   ##
==========================================
+ Coverage   40.00%   40.78%   +0.78%     
==========================================
  Files          11       13       +2     
  Lines         725      868     +143     
==========================================
+ Hits          290      354      +64     
- Misses        435      514      +79     

lebrice and others added 17 commits March 31, 2026 15:27
argparse consumes '--' before REMAINDER sees it, so program args were
landing in sbatch_args. Fix by splitting argv on '--' in main() before
parse_args(), injecting program_args into args_dict post-parse.

Also renames rest → sbatch_args throughout for clarity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
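
The fix described in the commit above, splitting argv on `--` before `parse_args()`, can be sketched as follows. The helper names `split_on_double_dash` and `parse_cli` are hypothetical and the parser is reduced to the arguments named in this PR; the actual `main()` in cluv may look different.

```python
import argparse


def split_on_double_dash(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first '--'; the separator itself is dropped."""
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Hypothetical reduced parser for `submit <cluster> <job.sh> [flags...]`."""
    cli_args, program_args = split_on_double_dash(argv)
    parser = argparse.ArgumentParser(prog="cluv submit")
    parser.add_argument("cluster")
    parser.add_argument("job_script")
    parser.add_argument("sbatch_args", nargs=argparse.REMAINDER)
    args = parser.parse_args(cli_args)
    # Injected after parsing, so argparse never sees the program arguments.
    args.program_args = program_args
    return args


print(parse_cli(
    ["rorqual", "scripts/job.sh", "--partition=gpu", "--", "python", "train.py", "--lr=0.001"]
))
```

Because only the first `--` is treated as the separator, any later `--` stays inside the program arguments and reaches the job script unchanged.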
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
lebrice added 5 commits April 1, 2026 18:07
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
@lebrice lebrice merged commit de05e5f into master Apr 2, 2026
5 checks passed
@lebrice lebrice deleted the feat/submit-command branch April 2, 2026 17:12