Commit 48328f6

Merge pull request #539 from LLNL/docsrun
doc: alphabetize resource managers in run section
2 parents 5b42e05 + f749d71 commit 48328f6

1 file changed, with 5 additions and 6 deletions

doc/rst/users/run.rst

Lines changed: 5 additions & 6 deletions
@@ -53,8 +53,8 @@ For instance, the name :code:`1234.5` refers to step id 5 of job id 1234.
 On ALPS, each job step within an allocation has a unique id that can be obtained
 through :code:`apstat`.
 
-Ignoring node failures
-----------------------
+Tolerating node failures
+------------------------
 
 Before running an SCR job, it is recommended to configure the job allocation to withstand node failures.
 By default, most resource managers terminate the job allocation if a node fails,
@@ -65,12 +65,12 @@ one must specify the appropriate flags from the table below.
 SCR job allocation flags
 
 ================== ================================================================
+LSF batch script   :code:`#BSUB -env "all, LSB_DJOB_COMMFAIL_ACTION=KILL_TASKS"`
+LSF interactive    :code:`bsub -env "all, LSB_DJOB_COMMFAIL_ACTION=KILL_TASKS" ...`
 MOAB batch script  :code:`#MSUB -l resfailpolicy=ignore`
 MOAB interactive   :code:`qsub -I ... -l resfailpolicy=ignore`
 SLURM batch script :code:`#SBATCH --no-kill`
 SLURM interactive  :code:`salloc --no-kill ...`
-LSF batch script   :code:`#BSUB -env "all, LSB_DJOB_COMMFAIL_ACTION=KILL_TASKS"`
-LSF interactive    :code:`bsub -env "all, LSB_DJOB_COMMFAIL_ACTION=KILL_TASKS" ...`
 ================== ================================================================
 
 The SCR wrapper script
@@ -120,9 +120,8 @@ An example SLURM batch script with :code:`scr_srun` is shown below
 .. code-block:: bash
 
   #!/bin/bash
-  #SBATCH --partition pbatch
-  #SBATCH --nodes 66
   #SBATCH --no-kill
+  #SBATCH --nodes 66
 
   # above, tell SLURM to not kill the job allocation upon a node failure
   # also note that the job requested 2 spares -- it uses 64 nodes but allocated 66
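
Reviewer note: the alphabetized flags table in this diff maps each resource manager to the option that keeps an allocation alive after a node failure. A minimal sketch of that mapping as a shell lookup follows. The function name `scr_alloc_flag` and its lowercase keys are hypothetical and are not part of SCR; only the flag strings are taken from the table.

```shell
#!/bin/bash
# Hypothetical helper (not part of SCR): print the allocation flag that the
# "SCR job allocation flags" table lists for a given resource manager.
scr_alloc_flag() {
  case "$1" in
    lsf)   printf '%s\n' '-env "all, LSB_DJOB_COMMFAIL_ACTION=KILL_TASKS"' ;;
    moab)  printf '%s\n' '-l resfailpolicy=ignore' ;;
    slurm) printf '%s\n' '--no-kill' ;;
    *)     printf 'unknown resource manager: %s\n' "$1" >&2
           return 1 ;;
  esac
}
```

For example, `scr_alloc_flag slurm` prints the option that the example batch script passes through its `#SBATCH` line, and an unrecognized name returns a nonzero status instead of printing a guess.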
