Commit 2c31075

Add missing explanation for safe_sbatch script
Signed-off-by: Fabrice Normandin <[email protected]>
1 parent 360a1b7 commit 2c31075

2 files changed, with 101 additions and 52 deletions.

docs/examples/LLMs/accelerate_example/README.rst

Lines changed: 82 additions & 48 deletions
@@ -66,6 +66,61 @@ Click here to see `the code for this example
    --num_machines $SLURM_NNODES --num_processes $SLURM_NTASKS \
    main.py $@"

+**pyproject.toml**
+
+.. code:: toml
+
+   [project]
+   name = "accelerate_example"
+   version = "0.1.0"
+   description = "Add your description here"
+   readme = "README.md"
+   requires-python = ">=3.12"
+   dependencies = [
+       "accelerate>=1.7.0",
+       "datasets>=3.6.0",
+       "evaluate>=0.4.4",
+       "scikit-learn>=1.7.0",
+       "simple-parsing>=0.1.7",
+       "transformers>=4.52.4",
+   ]
+
+
+   [tool.uv]
+   python-preference = "system"
+
+   ## From https://docs.astral.sh/uv/reference/settings/#index-strategy:
+   ## "Only use results from the first index that returns a match for a given package name."
+   ## In other words: only get the package from PyPI if there isn't a version of it in the DRAC wheelhouse.
+   # index-strategy = "first-index"
+
+   ## "Search for every package name across all indexes, exhausting the versions from the first index before
+   ## moving on to the next"
+   ## In other words: Only get the package from PyPI if the requested version is higher than the version
+   ## in the DRAC wheelhouse.
+   # index-strategy = "unsafe-first-match"
+
+   ## "Search for every package name across all indexes, preferring the "best" version found.
+   ## If a package version is in multiple indexes, only look at the entry for the first index."
+   ## In other words: Consider all versions of the package DRAC + PyPI, and use the version that best matches
+   ## the requested version. In a tie, choose the DRAC wheel.
+   index-strategy = "unsafe-best-match"
+
+   [[tool.uv.index]]
+   name = "drac-gentoo2023-x86-64-v3"
+   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/x86-64-v3"
+   format = "flat"
+
+   [[tool.uv.index]]
+   name = "drac-gentoo2023-generic"
+   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic"
+   format = "flat"
+
+   [[tool.uv.index]]
+   name = "drac-generic"
+   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic"
+   format = "flat"
+
 **main.py**

 .. code:: python
@@ -644,60 +699,39 @@ Click here to see `the code for this example
    if __name__ == "__main__":
        main()

-**pyproject.toml**
-
-.. code:: toml
-
-   [project]
-   name = "accelerate_example"
-   version = "0.1.0"
-   description = "Add your description here"
-   readme = "README.md"
-   requires-python = ">=3.12"
-   dependencies = [
-       "accelerate>=1.7.0",
-       "datasets>=3.6.0",
-       "evaluate>=0.4.4",
-       "scikit-learn>=1.7.0",
-       "simple-parsing>=0.1.7",
-       "transformers>=4.52.4",
-   ]
-

-   [tool.uv]
-   python-preference = "system"
+**safe_sbatch**

-   ## From https://docs.astral.sh/uv/reference/settings/#index-strategy:
-   ## "Only use results from the first index that returns a match for a given package name."
-   ## In other words: only get the package from PyPI if there isn't a version of it in the DRAC wheelhouse.
-   # index-strategy = "first-index"
+This example implements code checkpointing, so that jobs are executed with the code
+as it was at the time of job submission, even if the code is updated later.
+This is done by submitting the job with the `safe_sbatch` script instead of `sbatch`.
+Compared to `sbatch`, `safe_sbatch` will prevent submitting jobs if there are uncommitted
+changes in the code repository.

-   ## "Search for every package name across all indexes, exhausting the versions from the first index before
-   ## moving on to the next"
-   ## In other words: Only get the package from PyPI if the requested version is higher than the version
-   ## in the DRAC wheelhouse.
-   # index-strategy = "unsafe-first-match"

-   ## "Search for every package name across all indexes, preferring the "best" version found.
-   ## If a package version is in multiple indexes, only look at the entry for the first index."
-   ## In other words: Consider all versions of the package DRAC + PyPI, and use the version that best matches
-   ## the requested version. In a tie, choose the DRAC wheel.
-   index-strategy = "unsafe-best-match"
+.. code:: bash

-   [[tool.uv.index]]
-   name = "drac-gentoo2023-x86-64-v3"
-   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/x86-64-v3"
-   format = "flat"
+   #!/bin/bash
+   set -eof pipefail
+   git_status=`git status --porcelain`
+   # idea: Could add command-line arguments to control whether to add all changes and commit before sbatch.
+   if [[ ! -z $git_status ]]; then
+       echo "Your working directory is dirty! Please add and commit changes before continuing."
+       exit 1
+   fi;
+   # This environment variable will be available in the job script.
+   # It should be used to checkout the repo at this commit (in a different directory than the original).
+   # For example:
+   # ```
+   # git clone "$repo" "$dest"
+   # echo "Checking out commit $GIT_COMMIT"
+   # cd "$dest"
+   # git checkout $GIT_COMMIT
+   # ```
+   export GIT_COMMIT=`git rev-parse HEAD`
+   exec sbatch "$@"

-   [[tool.uv.index]]
-   name = "drac-gentoo2023-generic"
-   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic"
-   format = "flat"

-   [[tool.uv.index]]
-   name = "drac-generic"
-   url = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic"
-   format = "flat"

 **Running this example**

@@ -712,7 +746,7 @@ Click here to see `the code for this example
       $ uv sync


-3. Launch the job:
+3. Launch the job, either with `sbatch` or the provided `safe_sbatch` script:

    .. code-block:: bash

docs/examples/LLMs/accelerate_example/index.rst

Lines changed: 19 additions & 4 deletions
@@ -23,15 +23,30 @@ Click here to see `the code for this example
 .. literalinclude:: job.sh
     :language: bash

+**pyproject.toml**
+
+.. literalinclude:: pyproject.toml
+    :language: toml
+
 **main.py**

 .. literalinclude:: main.py
     :language: python

-**pyproject.toml**

-.. literalinclude:: pyproject.toml
-    :language: toml
+**safe_sbatch**
+
+This example implements code checkpointing, so that jobs are executed with the code
+as it was at the time of job submission, even if the code is updated later.
+This is done by submitting the job with the `safe_sbatch` script instead of `sbatch`.
+Compared to `sbatch`, `safe_sbatch` will prevent submitting jobs if there are uncommitted
+changes in the code repository.
+
+
+.. literalinclude:: safe_sbatch
+    :language: bash
+
+

 **Running this example**

@@ -46,7 +61,7 @@ Click here to see `the code for this example
       $ uv sync


-3. Launch the job:
+3. Launch the job, either with `sbatch` or the provided `safe_sbatch` script:

    .. code-block:: bash

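
Note on using ``GIT_COMMIT`` from the job script: the comments in ``safe_sbatch`` say the job script should check out the repository at ``$GIT_COMMIT`` in a directory separate from the original, but the diff above does not show that side. The following is a minimal sketch of how it might be done, not the repository's actual job script. The function name ``checkout_snapshot`` and the destination under ``$SLURM_TMPDIR`` are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper for a job script; not part of this commit.
set -euo pipefail

# Clone `repo` into `dest` and check out `commit` there in detached-HEAD mode.
# The original working tree is left untouched, so later edits to it do not
# affect a job that is still queued or running.
checkout_snapshot() {
    local repo="$1" dest="$2" commit="$3"
    git clone --quiet "$repo" "$dest"
    git -C "$dest" checkout --quiet --detach "$commit"
}

# Possible use inside the job script. GIT_COMMIT is exported by safe_sbatch;
# SLURM_SUBMIT_DIR is set by Slurm; the SLURM_TMPDIR destination is an assumption.
#
#   checkout_snapshot "$SLURM_SUBMIT_DIR" "$SLURM_TMPDIR/code" "$GIT_COMMIT"
#   cd "$SLURM_TMPDIR/code"
```

Because ``git clone`` copies only committed history, the snapshot can match the submitted code only if the working tree was clean at submission time, which is the condition ``safe_sbatch`` enforces before calling ``sbatch``.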
