# SLURM Launcher

Use `cardio-tensor-slurm` to submit chunked orientation jobs as SLURM array tasks.

## What It Does

- Splits a dataset into chunks.
- Generates a `.slurm` script.
- Submits one or more SLURM arrays with `sbatch`.
- Monitors output progress.

The launcher calls `cardio-tensor` on each task with:

```bash
cardio-tensor <conf_file> --start_index <start> --end_index <end>
```

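The chunking step can be pictured with a small sketch. It assumes half-open slice ranges (`--end_index` exclusive) and a final chunk clipped to the end of the range; `print_chunk_commands` is a hypothetical helper written for illustration, not part of `cardio-tensor`, and the launcher's actual splitting logic may differ.

```bash
# Sketch only: print the per-task commands for a half-open slice range.
# print_chunk_commands is a hypothetical helper, not a cardio-tensor command.
print_chunk_commands() (
  conf=$1 start=$2 end=$3 chunk=$4
  s=$start
  while [ "$s" -lt "$end" ]; do
    e=$(( s + chunk ))
    # Clip the last chunk so it never runs past the end of the range (assumption).
    if [ "$e" -gt "$end" ]; then e=$end; fi
    echo "cardio-tensor $conf --start_index $s --end_index $e"
    s=$e
  done
)

# Example: 250 slices in chunks of 100 gives three tasks (0-100, 100-200, 200-250).
print_chunk_commands parameters.conf 0 250 100
```

Each printed line corresponds to one array task under these assumptions.
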
## Basic Usage

```bash
cardio-tensor-slurm path/to/parameters.conf
```

## Useful Options

```bash
# A comment cannot follow a line-continuation backslash in bash,
# so the options are collected in an array where comments are allowed.
opts=(
  --start_index 0                      # First slice index (inclusive)
  --end_index 5000                     # End slice index (exclusive)
  --chunk_size 100                     # Slices processed per SLURM task
  --time_limit 4:00:00                 # Wall-time limit for each task
  --cpus_per_task 8                    # CPU cores requested per task
  --mem_gb 64                          # Memory requested per task (GB)
  --array_parallel 50                  # Max concurrent tasks in the SLURM array
  --partition nice                     # Optional partition/queue name
  --log_dir /path/to/logs              # Where SLURM stdout/stderr logs are written
  --submit_dir /path/to/slurm_scripts  # Where generated .slurm scripts are saved
)
cardio-tensor-slurm path/to/parameters.conf "${opts[@]}"
```

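To choose `--chunk_size` and `--array_parallel` together, it helps to estimate how many array tasks a run produces. The sketch below assumes the task count is the ceiling of `(end_index - start_index) / chunk_size`; `count_tasks` is a hypothetical helper, and the `#SBATCH --array` line only illustrates standard SLURM syntax (the `%` suffix caps concurrency), not necessarily the exact directive the launcher writes.

```bash
# Hypothetical helper: number of array tasks for a half-open slice range.
count_tasks() (
  start=$1 end=$2 chunk=$3
  echo $(( (end - start + chunk - 1) / chunk ))   # ceiling division
)

n_tasks=$(count_tasks 0 5000 100)
echo "tasks: $n_tasks"

# Standard sbatch array syntax with a concurrency cap of 50 tasks.
# Whether the generated .slurm script contains exactly this line is an assumption.
echo "#SBATCH --array=0-$(( n_tasks - 1 ))%50"
```

With the values shown in the options example, the range splits into 50 tasks, so an `--array_parallel` of 50 would allow all of them to run at once if the cluster has room.
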
Other useful flags:

- `--no_monitor`: submit the jobs and return immediately, without watching output progress.
- `--dry_run`: generate the `.slurm` script and print the `sbatch` command without submitting anything.

Run help for the full list:

```bash
cardio-tensor-slurm -h
```

## Troubleshooting

- **`sbatch` fails**
  Verify that the requested partition, account, time limit, and memory comply with your cluster's policies.

- **Job fails before running `cardio-tensor`**
  Check the logs in `OUTPUT_PATH/slurm/log/`, or in the directory passed to `--log_dir` if you set one.

- **No progress in monitor**
  Confirm the output folder permissions and that the tasks can write their outputs there.

- **Want to debug without submitting jobs**
  Use `--dry_run`.
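
When many array tasks have run, scanning the log directory by hand gets tedious. The following is a minimal sketch for finding logs that mention an error or a Python traceback; `list_failed_logs` is a hypothetical helper, the search pattern is a guess at what failures look like, and the log file naming is not assumed.

```bash
# Hypothetical helper: list log files that mention an error or a traceback.
list_failed_logs() (
  log_dir=$1
  # Nothing to report if the directory does not exist.
  [ -d "$log_dir" ] || exit 0
  # grep exits non-zero when nothing matches; treat that as an empty result.
  { grep -r -l -i -E 'error|traceback' "$log_dir" 2>/dev/null || true; } | sort
)

# OUTPUT_PATH stands for the output folder used in your run (placeholder).
list_failed_logs "${OUTPUT_PATH:-.}/slurm/log"
```

An empty result only means the pattern was not found; a task can still have failed for reasons that leave no such message in its log.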