cluv — sync UV-based Python projects across HPC clusters.
In early development. Commands are functional, but expect bugs or missing features.
- Python >= 3.13
- UV
- SSH access configured for each cluster in ~/.ssh/config (run cluv login to open ControlMaster sessions)
- A GitHub repository with your project
Install as a UV tool:
uv tool install git+https://github.com/mila-iqia/cluv
Then you can run cluv directly as a command:
cluv init
cluv login mila
cluv sync mila
cluv submit mila job.sh

- Initialize your project with:
cluv init
- Establish SSH connections to all configured clusters:
cluv login
- Sync your project to all clusters and run uv sync on each:
cluv sync
See the examples folder for sample projects using cluv. Each example includes a README with instructions specific to that project.
Add a [tool.cluv] section to the pyproject.toml of your project. cluv init generates a default config, or you can write it by hand.
See the config at the project root for an example, or refer to the schema below.
| Field | Type | Description |
|---|---|---|
| clusters | table | Per-cluster settings, keyed by SSH hostname from ~/.ssh/config. |
| env | table | Global environment variables applied to all clusters. |
| results_path | string | Path relative to the project root for storing results. cluv sync rsyncs that directory back from each remote cluster. |
[tool.cluv.clusters.<name>.env] holds environment variables for a specific cluster. Values here are merged on top of [tool.cluv.env] when submitting.
Environment variables can be set at multiple levels when submitting jobs, with the following precedence (highest to lowest):
- Command-line arguments to cluv submit
- Cluster-specific variables in [tool.cluv.clusters.<name>.env]
- Global variables in [tool.cluv.env]
- SBATCH directives inside the job script (e.g. #SBATCH --export=VAR=value)
- Default values from the cluster (e.g. SBATCH_PARTITION)
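
The precedence order above can be pictured as a layered dictionary merge, applied from lowest to highest priority so that later layers override earlier ones. This is a minimal sketch, not cluv's actual implementation: resolve_env is a hypothetical helper, the values are made up, and it assumes every layer has already been normalized into a plain name-to-value mapping.

```python
def resolve_env(
    cluster_defaults: dict[str, str],
    script_directives: dict[str, str],
    global_env: dict[str, str],
    cluster_env: dict[str, str],
    cli_args: dict[str, str],
) -> dict[str, str]:
    """Merge the five layers; later (higher-priority) layers win."""
    merged: dict[str, str] = {}
    # Ordered from lowest to highest precedence.
    for layer in (cluster_defaults, script_directives, global_env, cluster_env, cli_args):
        merged.update(layer)
    return merged


# Illustrative values only.
effective = resolve_env(
    cluster_defaults={"SBATCH_PARTITION": "main"},
    script_directives={"VAR": "value"},
    global_env={"SBATCH_TIME": "3:00:00", "WANDB_MODE": "offline"},
    cluster_env={"WANDB_MODE": "online", "SBATCH_PARTITION": "long"},
    cli_args={"SBATCH_TIME": "00:10:00"},
)
print(effective)
```

In this example the command-line time limit replaces the global one, and the cluster-specific WANDB_MODE and SBATCH_PARTITION replace the global value and the cluster default respectively.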
Here's an example pyproject.toml with cluv configuration for three clusters, and some global and cluster-specific environment variables:
[tool.cluv]
results_path = "logs"
[tool.cluv.env]
SBATCH_TIME = "3:00:00"
WANDB_MODE = "offline"
[tool.cluv.clusters.mila]
env = { WANDB_MODE="online", SBATCH_PARTITION="long" }
[tool.cluv.clusters.narval]
[tool.cluv.clusters.tamia]

Initialize the current directory as a cluv project. Must be run from inside your $HOME directory.
cluv init
Default project structure after cluv init:
my_project/
├── README.md
├── logs -> $SCRATCH/logs/my_project # symlink to $SCRATCH
├── pyproject.toml # includes [tool.cluv] config
├── scripts/
│ ├── job.sh # Slurm job script template
│ └── safe_job.sh # Slurm job script template (copies .venv and prior results)
└── src/
└── my_project/
└── __init__.py
Open SSH ControlMaster connections to all configured clusters. Run this before any command that requires a live connection.
cluv login [<cluster> ...]
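
To check by hand whether a ControlMaster session is still alive, OpenSSH offers ssh -O check <host>, which exits with status 0 when a master process is running for that host. The sketch below wraps that call; the helper names are hypothetical, it assumes ControlMaster and ControlPath are configured for the host in ~/.ssh/config, and it is not necessarily how cluv itself detects connections.

```python
import subprocess


def control_check_command(host: str) -> list[str]:
    """Build the OpenSSH command that queries an existing master connection."""
    return ["ssh", "-O", "check", host]


def has_live_master(host: str, timeout: float = 5.0) -> bool:
    """Return True if ssh reports a running ControlMaster for this host."""
    try:
        result = subprocess.run(
            control_check_command(host),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing or did not answer in time: treat as not connected.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("mila", "narval"):  # hostnames taken from the examples above
        print(name, has_live_master(name))
```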
Push local git changes, then on each cluster: clone or fetch the repo, check out the current branch, and run uv sync. If results_path is set in the config, also rsync results back from each cluster.
cluv sync [<cluster> ...]
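
The sequence described above can be sketched as a list of commands for a single cluster. This is only an illustration under stated assumptions: plan_sync is a hypothetical helper, the repository URL and directory are placeholders, and the exact git and uv invocations cluv uses may differ. The results rsync step is left out.

```python
import shlex


def plan_sync(
    cluster: str,
    repo_url: str,
    branch: str,
    project_dir: str,
    already_cloned: bool,
) -> list[list[str]]:
    """Return the local and remote commands for syncing one cluster."""
    directory = shlex.quote(project_dir)
    if already_cloned:
        first = f"git -C {directory} fetch origin"
    else:
        first = f"git clone {shlex.quote(repo_url)} {directory}"
    remote_steps = [
        first,
        # A real tool would also need to move the branch to the fetched commit.
        f"git -C {directory} checkout {shlex.quote(branch)}",
        f"cd {directory} && uv sync",
    ]
    commands = [["git", "push"]]  # push local changes first
    commands += [["ssh", cluster, step] for step in remote_steps]
    return commands


plan = plan_sync(
    "mila",
    "https://github.com/user/my_project.git",  # placeholder URL
    "main",
    "my_project",
    already_cloned=False,
)
for command in plan:
    print(command)
```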
Display an overview of each cluster: GPU availability, running/queued jobs, estimated queue wait, GPU utilisation, and disk usage. Falls back to mock data if no active connections exist.
cluv status [<cluster> ...]
Submit a Slurm job on a remote cluster.
cluv submit <cluster> <job.sh> [<sbatch-flags> ...] [-- <program-args> ...]
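
The argument layout above separates sbatch flags from program arguments with a bare -- token. The following sketch shows one way such a split could work; split_submit_args is a hypothetical helper written for illustration, not cluv's parser.

```python
def split_submit_args(argv: list[str]) -> tuple[str, str, list[str], list[str]]:
    """Split [cluster, job script, sbatch flags..., --, program args...]."""
    if len(argv) < 2:
        raise ValueError("expected at least <cluster> and <job.sh>")
    cluster, job_script, *rest = argv
    if "--" in rest:
        # Everything before the first "--" goes to sbatch, the rest to the program.
        index = rest.index("--")
        return cluster, job_script, rest[:index], rest[index + 1:]
    return cluster, job_script, rest, []


parsed = split_submit_args(
    ["narval", "job.sh", "--time=00:10:00", "--", "python", "main.py"]
)
print(parsed)
```

Without a -- token, everything after the job script is treated as sbatch flags and the program argument list is empty.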
For example:
cluv submit rorqual script/job.sh --time=00:10:00 -- python main.py

Sync the project to a cluster, then run a command there with uv run.
cluv run <cluster> <command> [<args> ...]