Skip to content

YodaEmbedding/easy-slurm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Easy Slurm

License: MIT PyPI

Easily manage and submit robust jobs to Slurm using Python and Bash.

Features

  • Freezes source code by copying to separate $JOB_DIR.
  • Auto-submits another job if current job times out.
  • Exposes hooks for custom bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
  • Format job names using parameters from config files.
  • Interactive jobs supported for easy debugging.

Installation

pip install easy-slurm

Usage

Easy Slurm provides a CLI / YAML interface, as well as a Python interface.

Python API

To submit a job, simply fill in the various parameters shown in the example below.

import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src=["./src", "./assets"],
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="cd src && python main.py",
    on_run_resume="cd src && python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)

All job files will be kept in the job_dir directory. Provide directory paths to src -- these will be archived and copied to the job_dir directory. Also provide Bash code in the hooks, which will be run in the following order:

First run: Subsequent runs:
setup setup_resume
on_run on_run_resume
teardown teardown

Full examples can be found here, including a simple example to run "training epochs" on a cluster.

CLI / YAML Interface

Jobs can also be fully configured using YAML files. See examples/simple_yaml.

job.yaml
job_dir: "$HOME/jobs/{date}-{job_name}"
src: ["./src", "./assets"]
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "cd src && python main.py"
on_run_resume: "cd src && python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.

Then submit the job using:

easy-slurm --job="job.yaml"

One can override the parameters in the YAML file using command-line arguments. For example:

easy-slurm --job="job.yaml" --src='["./src", "./assets", "./extra"]'

Formatting

One useful feature is formatting paths using custom template strings:

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S_%3f}-{job_name}",
)

The job names can be formatted using a config dictionary:

easy_slurm.submit_job(
    sbatch_options={
        "job-name": "bs={hp.batch_size:04},lr={hp.lr:.1e}",
        # Equivalent to:
        # "job-name": "bs=0032,lr=1.0e-02"
    },
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

This helps in automatically creating descriptive, human-readable job names.

For the CLI / YAML interface, the same can be achieved using the --config argument:

easy-slurm --job="job.yaml" --config="config.yaml"

See the documentation for more information and examples.

About

Easily manage and submit robust jobs to Slurm using Python and Bash.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published