Commit 43f7d84

Flux: Shut Down 2min to Walltime
1 parent b2da874 commit 43f7d84

4 files changed: 20 additions, 5 deletions


Docs/source/install/hpc/tuolumne.rst

Lines changed: 3 additions & 0 deletions
@@ -271,6 +271,9 @@ The batch script below can be used to run a WarpX simulation on 1 node with 4 AP
 Replace descriptions between chevrons ``<>`` by relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
 WarpX runs with one MPI rank per GPU and uses 21 (of 24) CPU cores (3 are reserved for the system).
 
+The batch script below also :ref:`sends WarpX a signal <running-cpp-parameters-signal>` when the simulation gets close to the walltime of the job, to shut down cleanly.
+Adjust the ``FLUX_WT_SIG`` and ``WARPX_WT`` variables to modify or disable this behavior as needed.
+
 .. tab-set::
 
    .. tab-item:: GPU

Docs/source/usage/parameters.rst

Lines changed: 2 additions & 0 deletions
@@ -354,6 +354,8 @@ Overall simulation parameters
 If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.
 
 
+.. _running-cpp-parameters-signal:
+
 Signal Handling
 ^^^^^^^^^^^^^^^
 
Tools/machines/tuolumne-llnl/tuolumne_cpu.flux

Lines changed: 8 additions & 3 deletions
@@ -26,6 +26,11 @@
 EXE="./warpx.2d"
 INPUTS="./inputs_hist_10.input"
 
+# clean shutdown close to walltime (or checkpoint)
+# https://warpx.readthedocs.io/en/latest/usage/parameters.html#signal-handling
+FLUX_WT_SIG="--signal=SIGUSR1@120s"
+WARPX_WT="warpx.break_signals=USR1"
+
 # environment setup
 if [[ -z "${MY_PROFILE}" ]]; then
     echo "WARNING: FORGOT TO"
@@ -49,7 +54,7 @@ export OMP_NUM_THREADS=21
 
 # start MPI parallel processes
 NNODES=$(flux resource list -s up -no {nnodes})
-flux run --exclusive --nodes=${NNODES} \
-         --tasks-per-node=4 \
-         ${EXE} ${INPUTS} \
+flux run ${FLUX_WT_SIG} --exclusive --nodes=${NNODES} \
+         --tasks-per-node=4 \
+         ${EXE} ${INPUTS} ${WARPX_WT} \
          > output.txt

Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux

Lines changed: 7 additions & 2 deletions
@@ -26,6 +26,11 @@
 EXE="./warpx.2d"
 INPUTS="./inputs_hist_10.input"
 
+# clean shutdown close to walltime (or checkpoint)
+# https://warpx.readthedocs.io/en/latest/usage/parameters.html#signal-handling
+FLUX_WT_SIG="--signal=SIGUSR1@120s"
+WARPX_WT="warpx.break_signals=USR1"
+
 # environment setup
 if [[ -z "${MY_PROFILE}" ]]; then
     echo "WARNING: FORGOT TO"
@@ -52,8 +57,8 @@ GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
 
 # start MPI parallel processes
 NNODES=$(flux resource list -s up -no {nnodes})
-flux run --exclusive --nodes=${NNODES} \
+flux run ${FLUX_WT_SIG} --exclusive --nodes=${NNODES} \
          --tasks-per-node=4 \
          ${EXE} ${INPUTS} \
-         ${GPU_AWARE_MPI} \
+         ${GPU_AWARE_MPI} ${WARPX_WT} \
          > output.txt
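
Note on adjusting the two new variables: the following is a minimal sketch, not part of the commit, of how ``FLUX_WT_SIG`` and ``WARPX_WT`` might be changed. ``build_cmd`` is a hypothetical helper that only prints the resulting ``flux run`` command line, so it can be tried without a Flux allocation. The 300s lead time is an arbitrary illustration, and the ``warpx.checkpoint_signals`` variant assumes the parameter described in the WarpX signal-handling documentation, which also expects a checkpoint diagnostic in the inputs file.

```shell
#!/usr/bin/env bash
# Sketch only: print the `flux run` command line produced by different
# settings of the two variables introduced in this commit.

EXE="./warpx.2d"
INPUTS="./inputs_hist_10.input"

# build_cmd is a hypothetical helper, not part of the committed scripts.
#   $1: Flux signal option (may be empty)
#   $2: WarpX signal parameter (may be empty)
build_cmd() {
    local flux_wt_sig="$1"
    local warpx_wt="$2"
    # Unquoted on purpose: an empty variable then expands to no argument at all.
    echo flux run ${flux_wt_sig} --exclusive --tasks-per-node=4 \
         ${EXE} ${INPUTS} ${warpx_wt}
}

# As committed: SIGUSR1 is sent 120 s before the time limit, WarpX stops cleanly.
CMD_DEFAULT=$(build_cmd "--signal=SIGUSR1@120s" "warpx.break_signals=USR1")

# More lead time, e.g. when writing final output is slow (value is illustrative).
CMD_EARLY=$(build_cmd "--signal=SIGUSR1@300s" "warpx.break_signals=USR1")

# Assumed variant: write a checkpoint on the signal instead of stopping.
CMD_CKPT=$(build_cmd "--signal=SIGUSR1@120s" "warpx.checkpoint_signals=USR1")

# Disabled: clear both variables together.
CMD_OFF=$(build_cmd "" "")

printf '%s\n' "$CMD_DEFAULT" "$CMD_EARLY" "$CMD_CKPT" "$CMD_OFF"
```

When disabling, it is probably safest to clear both variables at once: if the signal is still sent but WarpX is not told to handle it, the default action of ``SIGUSR1`` is to terminate the process.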
