From 9353bee469e98e6f72cdc4121a82ba44667f9e01 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Wed, 15 Oct 2025 11:10:06 -0700 Subject: [PATCH 1/7] Tuolumne (LLNL): CPU-Only Add CPU-only instructions for Tuolumne at LLNL. This is mostly for development, because this is not using the GPU-part of the APU. --- Docs/source/install/hpc/tuolumne.rst | 186 +++++++++++++----- .../tuolumne-llnl/install_cpu_dependencies.sh | 176 +++++++++++++++++ .../tuolumne_cpu_warpx.profile.example | 63 ++++++ 3 files changed, 372 insertions(+), 53 deletions(-) create mode 100644 Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh create mode 100644 Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example diff --git a/Docs/source/install/hpc/tuolumne.rst b/Docs/source/install/hpc/tuolumne.rst index 55df6287024..3cf68a0c42b 100644 --- a/Docs/source/install/hpc/tuolumne.rst +++ b/Docs/source/install/hpc/tuolumne.rst @@ -44,73 +44,132 @@ Use the following commands to download the WarpX source code: git clone https://github.com/BLAST-WarpX/warpx.git /p/lustre5/${USER}/tuolumne/src/warpx -We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_mi300a_warpx.profile``. -Create it now: +On Tuolumne, we usually accelerate all computations with the GPU cores of the MI300A APU. +For development purposes, you can also limit yourself to the CPU cores of the MI300A. -.. code-block:: bash +.. tab-set:: - cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example $HOME/tuolumne_mi300a_warpx.profile + .. tab-item:: GPU -.. dropdown:: Script Details - :color: light - :icon: info - :animate: fade-in-slide-down + We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_mi300a_warpx.profile``. + Create it now: - .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example - :language: bash + .. code-block:: bash -Edit the 2nd line of this script, which sets the ``export proj=""`` variable. -**Currently, this is unused and can be kept empty.** -Once project allocation becomes required, e.g., if you are member of the project ``abcde``, then run ``vi $HOME/tuolumne_mi300a_warpx.profile``. -Enter the edit mode by typing ``i`` and edit line 2 to read: + cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example $HOME/tuolumne_mi300a_warpx.profile -.. code-block:: bash + .. dropdown:: Script Details + :color: light + :icon: info + :animate: fade-in-slide-down + + .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example + :language: bash + + Edit the 2nd line of this script, which sets the ``export proj=""`` variable. + **Currently, this is unused and can be kept empty.** + Once project allocation becomes required, e.g., if you are member of the project ``abcde``, then run ``vi $HOME/tuolumne_mi300a_warpx.profile``. + Enter the edit mode by typing ``i`` and edit line 2 to read: + + .. code-block:: bash + + export proj="abcde" + + Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit). + + .. important:: + + Now, and as the first step on future logins to Tuolumne, activate these environment settings: + + .. code-block:: bash + + source $HOME/tuolumne_mi300a_warpx.profile + + Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once: + + + .. 
code-block:: bash + + bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh + source /p/lustre5/${USER}/tuolumne/warpx/mi300a/venvs/warpx-tuolumne-mi300a/bin/activate + + .. dropdown:: Script Details + :color: light + :icon: info + :animate: fade-in-slide-down + + .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh + :language: bash + + .. dropdown:: AI/ML Dependencies (Optional) + :animate: fade-in-slide-down + + If you plan to run AI/ML workflows depending on PyTorch et al., run the next step as well. + This will take a while and should be skipped if not needed. - export proj="abcde" + .. code-block:: bash -Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit). + bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_ml.sh -.. important:: + .. dropdown:: Script Details + :color: light + :icon: info + :animate: fade-in-slide-down - Now, and as the first step on future logins to Tuolumne, activate these environment settings: + .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_ml.sh + :language: bash - .. code-block:: bash + .. tab-item:: CPU - source $HOME/tuolumne_mi300a_warpx.profile + We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_cpu_warpx.profile``. + Create it now: -Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once: + .. code-block:: bash + cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example $HOME/tuolumne_cpu_warpx.profile - .. code-block:: bash + .. dropdown:: Script Details + :color: light + :icon: info + :animate: fade-in-slide-down - bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh - source /p/lustre5/${USER}/tuolumne/warpx/mi300a/venvs/warpx-tuolumne-mi300a/bin/activate + .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example + :language: bash - .. dropdown:: Script Details - :color: light - :icon: info - :animate: fade-in-slide-down + Edit the 2nd line of this script, which sets the ``export proj=""`` variable. + **Currently, this is unused and can be kept empty.** + Once project allocation becomes required, e.g., if you are member of the project ``abcde``, then run ``vi $HOME/tuolumne_cpu_warpx.profile``. + Enter the edit mode by typing ``i`` and edit line 2 to read: - .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh - :language: bash + .. code-block:: bash - .. dropdown:: AI/ML Dependencies (Optional) - :animate: fade-in-slide-down + export proj="abcde" - If you plan to run AI/ML workflows depending on PyTorch et al., run the next step as well. - This will take a while and should be skipped if not needed. + Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit). - .. code-block:: bash + .. important:: - bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_ml.sh + Now, and as the first step on future logins to Tuolumne, activate these environment settings: - .. dropdown:: Script Details - :color: light - :icon: info - :animate: fade-in-slide-down + .. code-block:: bash - .. 
literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_ml.sh - :language: bash + source $HOME/tuolumne_cpu_warpx.profile + + Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once: + + + .. code-block:: bash + + bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh + source /p/lustre5/${USER}/tuolumne/warpx/cpu/venvs/warpx-tuolumne-cpu/bin/activate + + .. dropdown:: Script Details + :color: light + :icon: info + :animate: fade-in-slide-down + + .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh + :language: bash .. _building-tuolumne-compilation: @@ -120,20 +179,41 @@ Compilation Use the following :ref:`cmake commands ` to compile the application executable: -.. code-block:: bash +.. tab-set:: - cd /p/lustre5/${USER}/tuolumne/src/warpx + .. tab-item:: GPU - cmake --fresh -S . -B build_tuolumne -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3" - cmake --build build_tuolumne -j 24 + .. code-block:: bash -The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne/bin/``. -Additionally, the following commands will install WarpX as a Python module: + cd /p/lustre5/${USER}/tuolumne/src/warpx -.. code-block:: bash + cmake --fresh -S . -B build_tuolumne -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3" + cmake --build build_tuolumne -j 24 + + The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne/bin/``. + Additionally, the following commands will install WarpX as a Python module: + + .. code-block:: bash + + cmake --fresh -S . -B build_tuolumne_py -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3" + cmake --build build_tuolumne_py -j 24 --target pip_install + + .. tab-item:: CPU + + .. code-block:: bash + + cd /p/lustre5/${USER}/tuolumne/src/warpx + + cmake --fresh -S . -B build_tuolumne_cpu -DWarpX_COMPUTE=OMP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3" + cmake --build build_tuolumne_cpu -j 24 + + The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne_cpu/bin/``. + Additionally, the following commands will install WarpX as a Python module: + + .. code-block:: bash - cmake --fresh -S . -B build_tuolumne_py -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3" - cmake --build build_tuolumne_py -j 24 --target pip_install + cmake --fresh -S . -B build_tuolumne_cpu_py -DWarpX_COMPUTE=OMP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3" + cmake --build build_tuolumne_cpu_py -j 24 --target pip_install Now, you can :ref:`submit tuolumne compute jobs ` for WarpX :ref:`Python (PICMI) scripts ` (:ref:`example scripts `). Or, you can use the WarpX executables to submit tuolumne jobs (:ref:`example inputs `). @@ -183,7 +263,7 @@ MI300A APUs (128GB) `Each compute node `__ is divided into 4 sockets, each with: -* 1 MI300A GPU, +* 1 MI300A APU (incl. 
1 GPU), * 21 available user CPU cores, with 3 cores reserved for the OS (2 hardware threads per core) * 128GB HBM3 memory (a single NUMA domain) diff --git a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh new file mode 100644 index 00000000000..2a7a68c2be6 --- /dev/null +++ b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh @@ -0,0 +1,176 @@ +#!/bin/bash +# +# Copyright 2024 The WarpX Community +# +# This file is part of WarpX. +# +# Author: Axel Huebl +# License: BSD-3-Clause-LBNL + +# Exit on first error encountered ############################################# +# +set -eu -o pipefail + + +# Check: ###################################################################### +# +# Was tuolumne_cpu_warpx.profile sourced and configured correctly? +# early access: not yet used! +#if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your tuolumne_cpu_warpx.profile file! Please edit its line 2 to continue!"; exit 1; fi + + +# Remove old dependencies ##################################################### +# +SRC_DIR="/p/lustre5/${USER}/tuolumne/src" +SW_DIR="/p/lustre5/${USER}/tuolumne/warpx/cpu" +rm -rf ${SW_DIR} +mkdir -p ${SW_DIR} + +# remove common user mistakes in python, located in .local instead of a venv +python3 -m pip uninstall -qq -y pywarpx +python3 -m pip uninstall -qq -y warpx +python3 -m pip uninstall -qqq -y mpi4py 2>/dev/null || true + + +# General extra dependencies ################################################## +# + +# tmpfs build directory: avoids issues often seen with $HOME and is faster +build_dir=$(mktemp -d) +build_procs=24 + +# C-Blosc2 (I/O compression) +if [ -d ${SRC_DIR}/c-blosc2 ] +then + cd ${SRC_DIR}/c-blosc2 + git fetch --prune + git checkout v2.15.1 + cd - +else + git clone -b v2.15.1 https://github.com/Blosc/c-blosc2.git ${SRC_DIR}/c-blosc2 +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/c-blosc2 \ + -B ${build_dir}/c-blosc2-build \ + -DBUILD_TESTS=OFF \ + -DBUILD_BENCHMARKS=OFF \ + -DBUILD_EXAMPLES=OFF \ + -DBUILD_FUZZERS=OFF \ + -DBUILD_STATIC=OFF \ + -DDEACTIVATE_AVX2=OFF \ + -DDEACTIVATE_AVX512=OFF \ + -DWITH_SANITIZER=OFF \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/c-blosc-2.15.1 +cmake \ + --build ${build_dir}/c-blosc2-build \ + --target install \ + --parallel ${build_procs} +rm -rf ${build_dir}/c-blosc2-build + +# ADIOS2 +if [ -d ${SRC_DIR}/adios2 ] +then + cd ${SRC_DIR}/adios2 + git fetch --prune + git checkout v2.10.2 + cd - +else + git clone -b v2.10.2 https://github.com/ornladios/ADIOS2.git ${SRC_DIR}/adios2 +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/adios2 \ + -B ${build_dir}/adios2-build \ + -DADIOS2_USE_Blosc2=ON \ + -DADIOS2_USE_Campaign=OFF \ + -DADIOS2_USE_Fortran=OFF \ + -DADIOS2_USE_Python=OFF \ + -DADIOS2_USE_ZeroMQ=OFF \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.10.2 +cmake \ + --build ${build_dir}/adios2-build \ + --target install \ + --parallel ${build_procs} +rm -rf ${build_dir}/adios2-build + +# BLAS++ (for PSATD+RZ) +if [ -d ${SRC_DIR}/blaspp ] +then + cd ${SRC_DIR}/blaspp + git fetch --prune + git checkout v2024.05.31 + cd - +else + git clone -b v2024.05.31 https://github.com/icl-utk-edu/blaspp.git ${SRC_DIR}/blaspp +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/blaspp \ + -B ${build_dir}/blaspp-tuolumne-cpu-build \ + -Duse_openmp=ON \ + -Dgpu_backend=OFF \ + -DCMAKE_CXX_STANDARD=17 \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-2024.05.31 +cmake \ + --build ${build_dir}/blaspp-tuolumne-cpu-build \ + --target install \ + --parallel ${build_procs} +rm -rf 
${build_dir}/blaspp-tuolumne-cpu-build + +# LAPACK++ (for PSATD+RZ) +if [ -d ${SRC_DIR}/lapackpp ] +then + cd ${SRC_DIR}/lapackpp + git fetch --prune + git checkout v2024.05.31 + cd - +else + git clone -b v2024.05.31 https://github.com/icl-utk-edu/lapackpp.git ${SRC_DIR}/lapackpp +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/lapackpp \ + -B ${build_dir}/lapackpp-tuolumne-cpu-build \ + -DCMAKE_CXX_STANDARD=17 \ + -Dgpu_backend=OFF \ + -Dbuild_tests=OFF \ + -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-2024.05.31 +cmake \ + --build ${build_dir}/lapackpp-tuolumne-cpu-build \ + --target install \ + --parallel ${build_procs} +rm -rf ${build_dir}/lapackpp-tuolumne-cpu-build + +# Python ###################################################################### +# +# sometimes, the Tuolumne PIP Index is down +export PIP_EXTRA_INDEX_URL="https://pypi.org/simple" + +python3 -m pip install --upgrade pip +# python3 -m pip cache purge || true # Cache disabled on system +rm -rf ${SW_DIR}/venvs/warpx-tuolumne-cpu +python3 -m venv ${SW_DIR}/venvs/warpx-tuolumne-cpu +source ${SW_DIR}/venvs/warpx-tuolumne-cpu/bin/activate +python3 -m pip install --upgrade pip +python3 -m pip install --upgrade build +python3 -m pip install --upgrade packaging +python3 -m pip install --upgrade wheel +python3 -m pip install --upgrade setuptools[core] +python3 -m pip install --upgrade "cython>=3.0" +python3 -m pip install --upgrade numpy +python3 -m pip install --upgrade pandas +python3 -m pip install --upgrade scipy +python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py +python3 -m pip install --upgrade openpmd-api +python3 -m pip install --upgrade openpmd-viewer +python3 -m pip install --upgrade matplotlib +python3 -m pip install --upgrade yt +# install or update WarpX dependencies such as picmistandard +python3 -m pip install --upgrade -r ${SRC_DIR}/warpx/requirements.txt + +# for ML dependencies, see install_cpu_ml.sh (TODO) + +# remove build temporary directory +rm -rf ${build_dir} diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example new file mode 100644 index 00000000000..bf3ff4d8dd6 --- /dev/null +++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example @@ -0,0 +1,63 @@ +# please set your project account +export proj="" # change me! + +# remembers the location of this script +export MY_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) +# early access: not yet used +# if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your $MY_PROFILE file! 
Please edit its line 2 to continue!"; return; fi
+
+# required dependencies
+module load cmake/3.29.2
+module load cray-fftw/3.3.10.11
+
+# optional: faster builds
+# ccache is system provided
+module load ninja/1.10.2
+
+# optional: for QED support with detailed tables
+# TODO: no Boost module found
+
+# optional: for openPMD and PSATD+RZ support
+SW_DIR="/p/lustre5/${USER}/tuolumne/warpx/cpu"
+# module load cray-hdf5-parallel/1.14.3.5 # missing module for cce/20.0.0
+export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-2.15.1:$CMAKE_PREFIX_PATH
+export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.2:$CMAKE_PREFIX_PATH
+export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH
+export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH
+
+export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-2.15.1/lib64:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.2/lib64:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH
+
+export PATH=${SW_DIR}/adios2-2.10.2/bin:${PATH}
+
+# python
+module load cray-python/3.11.7
+
+if [ -d "${SW_DIR}/venvs/warpx-tuolumne-cpu" ]
+then
+    source ${SW_DIR}/venvs/warpx-tuolumne-cpu/bin/activate
+fi
+
+# an alias to request an interactive batch node for one hour
+# for parallel execution, start on the batch node: srun <command>
+alias getNode="salloc -N 1 -t 1:00:00"
+# an alias to run a command on a batch node for up to 30min
+# usage: runNode <command>
+alias runNode="srun -N 1 --ntasks-per-node=4 -t 0:30:00"
+
+# Transparent huge pages on CPU
+# https://hpc.llnl.gov/documentation/user-guides/using-el-capitan-systems/introduction-and-quickstart/pro-tips
+#export LDFLAGS="${LDFLAGS} -lhugetlbfs"
+#export HSA_XNACK=1
+#export HUGETLB_MORECORE=yes
+
+# optimize compilation for MI300A CPU part (Zen4)
+export CXXFLAGS="-march=znver4"
+export CFLAGS="-march=znver4"
+
+# compiler environment hints
+export CC=$(which cc)
+export CXX=$(which CC)
+export FC=$(which ftn)

From bb4872c09b6058a4d5a5761312f7922f2f294cb1 Mon Sep 17 00:00:00 2001
From: Axel Huebl
Date: Wed, 15 Oct 2025 12:08:26 -0700
Subject: [PATCH 2/7] Job Script

---
 Docs/source/install/hpc/tuolumne.rst         | 30 +++++++---
 .../machines/tuolumne-llnl/tuolumne_cpu.flux | 55 +++++++++++++++++++
 .../tuolumne-llnl/tuolumne_mi300a.flux       |  4 +-
 3 files changed, 78 insertions(+), 11 deletions(-)
 create mode 100644 Tools/machines/tuolumne-llnl/tuolumne_cpu.flux

diff --git a/Docs/source/install/hpc/tuolumne.rst b/Docs/source/install/hpc/tuolumne.rst
index 3cf68a0c42b..d4d0366326b 100644
--- a/Docs/source/install/hpc/tuolumne.rst
+++ b/Docs/source/install/hpc/tuolumne.rst
@@ -269,20 +269,32 @@ MI300A APUs (128GB)
 
 The batch script below can be used to run a WarpX simulation on 1 node with 4 APUs on the supercomputer Tuolumne at LLNL.
 Replace descriptions between chevrons ``<>`` by relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
-WarpX runs with one MPI rank per GPU.
+WarpX runs with one MPI rank per GPU and uses 21 (of 24) CPU cores (3 are reserved for the system).
 
-Note that we append these non-default runtime options:
+.. tab-set::
 
-* ``amrex.use_gpu_aware_mpi=1``: make use of fast APU to APU MPI communications
+   .. tab-item:: GPU
 
-.. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
-   :language: bash
-   :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux``.
+      .. 
literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
+         :language: bash
+         :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux``.
 
-To run a simulation, copy the lines above to a file ``tuolumne_mi300a.flux`` and run
+      To run a simulation, copy the lines above to a file ``tuolumne_mi300a.flux`` and run
 
-.. code-block:: bash
+      .. code-block:: bash
+
+         flux batch tuolumne_mi300a.flux
+
+   .. tab-item:: CPU
+
+      .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
+         :language: bash
+         :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_cpu.flux``.
+
+      To run a simulation, copy the lines above to a file ``tuolumne_cpu.flux`` and run
+
+      .. code-block:: bash
 
-    flux batch tuolumne_mi300a.flux
+         flux batch tuolumne_cpu.flux
 
 to submit the job.
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux b/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
new file mode 100644
index 00000000000..e23dab00145
--- /dev/null
+++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
@@ -0,0 +1,55 @@
+#!/bin/bash
+
+# Copyright 2025 The WarpX Community
+#
+# This file is part of WarpX.
+#
+# Authors: Axel Huebl, Andreas Kemp
+# License: BSD-3-Clause-LBNL
+
+### Flux directives ###
+#flux: --setattr=bank=mstargt
+#flux: --job-name=hemi
+#flux: --nodes=16
+#flux: --time-limit=360s
+#flux: --queue=pbatch
+# pdebug
+#flux: --exclusive
+#flux: --error=WarpX.e{{id}}
+#flux: --output=WarpX.o{{id}}
+
+# Not yet tested: Transparent huge pages on CPU
+# https://hpc.llnl.gov/documentation/user-guides/using-el-capitan-systems/introduction-and-quickstart/pro-tips
+# --setattr=thp=always
+
+# executable & inputs file or python interpreter & PICMI script here
+EXE="./warpx.2d"
+INPUTS="./inputs_hist_10.input"
+
+# environment setup
+if [[ -z "${MY_PROFILE}" ]]; then
+    echo "WARNING: FORGOT TO"
+    echo "  source $HOME/tuolumne_cpu_warpx.profile"
+    echo "before submission. Doing that now."
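+    # fall back: source the profile from here, so that this job still runs
+    # with the expected modules, paths, and Python environment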
+
+    source $HOME/tuolumne_cpu_warpx.profile
+fi
+
+# pin to closest NIC to APU
+#export MPICH_OFI_NIC_POLICY=APU
+
+# Not yet tested: Transparent huge pages on CPU
+# https://hpc.llnl.gov/documentation/user-guides/using-el-capitan-systems/introduction-and-quickstart/pro-tips
+#export HSA_XNACK=1
+#export HUGETLB_MORECORE=yes
+
+# threads for OpenMP and threaded compressors per MPI rank
+# note: 21 physical cores per socket maximum (system reserves 3)
+export OMP_NUM_THREADS=21
+
+# start MPI parallel processes
+NNODES=$(flux resource list -s up -no {nnodes})
+flux run --exclusive --nodes=${NNODES} \
+    --tasks-per-node=4 \
+    ${EXE} ${INPUTS} \
+    > output.txt
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux b/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
index 934cca390a5..1c1b43212a1 100644
--- a/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
+++ b/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
@@ -44,8 +44,8 @@ export MPICH_OFI_NIC_POLICY=GPU
 #export HUGETLB_MORECORE=yes
 
 # threads for OpenMP and threaded compressors per MPI rank
-# note: 16 avoids hyperthreading (32 virtual cores, 16 physical)
-export OMP_NUM_THREADS=16
+# note: 21 physical cores per socket maximum (system reserves 3)
+export OMP_NUM_THREADS=21
 
 # GPU-aware MPI optimizations
 GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"

From 6262c1ab3608156a0eedcdb605c3ad1deb9e6521 Mon Sep 17 00:00:00 2001
From: Axel Huebl
Date: Fri, 17 Oct 2025 14:13:00 -0700
Subject: [PATCH 3/7] Dependencies: Build More Static Libs

Try to delay the onset of the error:

```
./warpx.3d: error while loading shared libraries: /opt/cray/pe/lib64/libsci_cray_mp.so.6: cannot allocate memory in static TLS block
```
---
 Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh    | 4 ++++
 Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
index 2a7a68c2be6..6dfc982926f 100644
--- a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
+++ b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
@@ -53,6 +53,7 @@ cmake \
     --fresh \
     -S ${SRC_DIR}/c-blosc2 \
     -B ${build_dir}/c-blosc2-build \
+    -DBUILD_SHARED_LIBS=OFF \
     -DBUILD_TESTS=OFF \
     -DBUILD_BENCHMARKS=OFF \
     -DBUILD_EXAMPLES=OFF \
@@ -87,6 +88,7 @@ cmake \
     -DADIOS2_USE_Fortran=OFF \
     -DADIOS2_USE_Python=OFF \
     -DADIOS2_USE_ZeroMQ=OFF \
+    -DBUILD_SHARED_LIBS=OFF \
     -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.10.2
 cmake \
     --build ${build_dir}/adios2-build \
@@ -110,6 +112,7 @@ cmake \
     -B ${build_dir}/blaspp-tuolumne-cpu-build \
     -Duse_openmp=ON \
     -Dgpu_backend=OFF \
+    -DBUILD_SHARED_LIBS=OFF \
     -DCMAKE_CXX_STANDARD=17 \
     -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-2024.05.31
 cmake \
@@ -135,6 +138,7 @@ cmake \
     -DCMAKE_CXX_STANDARD=17 \
     -Dgpu_backend=OFF \
     -Dbuild_tests=OFF \
+    -DBUILD_SHARED_LIBS=OFF \
     -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON \
     -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-2024.05.31
 cmake \
diff --git a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
index c9ee80fd991..85a54304f1c 100644
--- a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
+++ b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
@@ -53,6 +53,7 @@ cmake \
     --fresh \
     -S ${SRC_DIR}/c-blosc2 \
     -B ${build_dir}/c-blosc2-build \
+    -DBUILD_SHARED_LIBS=OFF \
     -DBUILD_TESTS=OFF \
     -DBUILD_BENCHMARKS=OFF \
     -DBUILD_EXAMPLES=OFF \
@@ -87,6 +88,7 @@ cmake \
-DADIOS2_USE_Fortran=OFF \ -DADIOS2_USE_Python=OFF \ -DADIOS2_USE_ZeroMQ=OFF \ + -DBUILD_SHARED_LIBS=OFF \ -DCMAKE_INSTALL_PREFIX=${SW_DIR}/adios2-2.10.2 cmake \ --build ${build_dir}/adios2-build \ @@ -110,6 +112,7 @@ cmake \ -B ${build_dir}/blaspp-tuolumne-mi300a-build \ -Duse_openmp=OFF \ -Dgpu_backend=hip \ + -DBUILD_SHARED_LIBS=OFF \ -DCMAKE_CXX_STANDARD=17 \ -DCMAKE_INSTALL_PREFIX=${SW_DIR}/blaspp-2024.05.31 cmake \ @@ -135,6 +138,7 @@ cmake \ -DCMAKE_CXX_STANDARD=17 \ -Dgpu_backend=hip \ -Dbuild_tests=OFF \ + -DBUILD_SHARED_LIBS=OFF \ -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON \ -DCMAKE_INSTALL_PREFIX=${SW_DIR}/lapackpp-2024.05.31 cmake \ From 5fb85a8181ea6767b6ac4f5a21abf5d66f77d26a Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 17 Oct 2025 21:21:00 -0700 Subject: [PATCH 4/7] Unload libsci --- Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example | 1 + 1 file changed, 1 insertion(+) diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example index bf3ff4d8dd6..80d3995db6c 100644 --- a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example +++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example @@ -7,6 +7,7 @@ export MY_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE # if [ -z ${proj-} ]; then echo "WARNING: The 'proj' variable is not yet set in your $MY_PROFILE file! Please edit its line 2 to continue!"; return; fi # required dependencies +module unload cray-libsci module load cmake/3.29.2 module load cray-fftw/3.3.10.11 From 6f33b0d3015e00645d030653df11fb98d71313d5 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 17 Oct 2025 21:43:58 -0700 Subject: [PATCH 5/7] Tuo: BYO HDF5 Bring Your Own HDF5 --- .../tuolumne-llnl/install_cpu_dependencies.sh | 24 +++++++++++++++++++ .../install_mi300a_dependencies.sh | 24 +++++++++++++++++++ .../tuolumne_cpu_warpx.profile.example | 4 +++- .../tuolumne_mi300a_warpx.profile.example | 4 +++- 4 files changed, 54 insertions(+), 2 deletions(-) diff --git a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh index 6dfc982926f..49fa519feac 100644 --- a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh +++ b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh @@ -69,6 +69,30 @@ cmake \ --parallel ${build_procs} rm -rf ${build_dir}/c-blosc2-build +# HDF5 +if [ -d ${SRC_DIR}/hdf5 ] +then + cd ${SRC_DIR}/hdf5 + git fetch --prune + git checkout hdf5-1_14_1-2 + cd - +else + git clone -b hdf5-1_14_1-2 https://github.com/HDFGroup/hdf5.git ${SRC_DIR}/hdf5 +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/hdf5 \ + -B ${build_dir}/hdf5-build \ + -DBUILD_SHARED_LIBS=OFF \ + -DBUILD_TESTING=OFF \ + -DHDF5_ENABLE_PARALLEL=ON \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/hdf5-1.14.1.2 +cmake \ + --build ${build_dir}/hdf5-build \ + --target install \ + --parallel ${build_procs} +rm -rf ${build_dir}/hdf5-build + # ADIOS2 if [ -d ${SRC_DIR}/adios2 ] then diff --git a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh index 85a54304f1c..cce000a7d50 100644 --- a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh +++ b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh @@ -69,6 +69,30 @@ cmake \ --parallel ${build_procs} rm -rf ${build_dir}/c-blosc2-build +# HDF5 +if [ -d ${SRC_DIR}/hdf5 ] +then + cd ${SRC_DIR}/hdf5 + git fetch --prune + git checkout hdf5-1_14_1-2 + cd - +else + git clone 
-b hdf5-1_14_1-2 https://github.com/HDFGroup/hdf5.git ${SRC_DIR}/hdf5 +fi +cmake \ + --fresh \ + -S ${SRC_DIR}/hdf5 \ + -B ${build_dir}/hdf5-build \ + -DBUILD_SHARED_LIBS=OFF \ + -DBUILD_TESTING=OFF \ + -DHDF5_ENABLE_PARALLEL=ON \ + -DCMAKE_INSTALL_PREFIX=${SW_DIR}/hdf5-1.14.1.2 +cmake \ + --build ${build_dir}/hdf5-build \ + --target install \ + --parallel ${build_procs} +rm -rf ${build_dir}/hdf5-build + # ADIOS2 if [ -d ${SRC_DIR}/adios2 ] then diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example index 80d3995db6c..b42e170e2b0 100644 --- a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example +++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example @@ -20,18 +20,20 @@ module load ninja/1.10.2 # optional: for openPMD and PSATD+RZ support SW_DIR="/p/lustre5/${USER}/tuolumne/warpx/cpu" -# module load cray-hdf5-parallel/1.14.3.5 # missing module for cce/20.0.0 +export CMAKE_PREFIX_PATH=${SW_DIR}/hdf5-1.14.1.2:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-2.15.1:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.2:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-2.15.1/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.2/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH export PATH=${SW_DIR}/adios2-2.10.2/bin:${PATH} +export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH} # python module load cray-python/3.11.7 diff --git a/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example index 5226eb6c291..03a9df910f4 100644 --- a/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example +++ b/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example @@ -19,18 +19,20 @@ module load ninja/1.10.2 # optional: for openPMD and PSATD+RZ support SW_DIR="/p/lustre5/${USER}/tuolumne/warpx/mi300a" -# module load cray-hdf5-parallel/1.14.3.5 # missing module for cce/20.0.0 +export CMAKE_PREFIX_PATH=${SW_DIR}/hdf5-1.14.1.2:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-2.15.1:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.2:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-2.15.1/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.2/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH export PATH=${SW_DIR}/adios2-2.10.2/bin:${PATH} +export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH} # python module load cray-python/3.11.7 From b2da87471bc8c8bf77a45207043dc47a6f91f2d5 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 17 Oct 2025 23:07:49 -0700 Subject: [PATCH 6/7] Tuo: PETSC --- .../tuolumne-llnl/install_cpu_dependencies.sh | 36 ++++++++++++++++++ .../install_mi300a_dependencies.sh | 37 +++++++++++++++++++ 
.../tuolumne_cpu_warpx.profile.example       |  2 +
 .../tuolumne_mi300a_warpx.profile.example    |  2 +
 4 files changed, 77 insertions(+)

diff --git a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
index 49fa519feac..8b7f448b0f9 100644
--- a/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
+++ b/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
@@ -171,6 +171,42 @@ cmake \
     --parallel ${build_procs}
 rm -rf ${build_dir}/lapackpp-tuolumne-cpu-build
 
+# PETSC
+if [ -d ${SRC_DIR}/petsc ]
+then
+    cd ${SRC_DIR}/petsc
+    git fetch --prune
+    git checkout v3.24.0
+    cd -
+else
+    git clone -b v3.24.0 https://gitlab.com/petsc/petsc.git ${SRC_DIR}/petsc
+fi
+cd ${SRC_DIR}/petsc
+./configure \
+    CC=${CC} \
+    CXX=${CXX} \
+    FC=${FC} \
+    COPTFLAGS="-g -O3" \
+    FOPTFLAGS="-g -O3" \
+    CXXOPTFLAGS="-g -O2" \
+    --prefix=${SW_DIR}/petsc-3.24.0 \
+    --with-batch \
+    --with-cmake=1 \
+    --with-cuda=0 \
+    --with-hip=0 \
+    --with-fortran-bindings=0 \
+    --with-fftw=1 \
+    --with-fftw-dir=${FFTW_ROOT} \
+    --with-make-np=${build_procs} \
+    --with-openmp-kernels=1 \
+    --with-clean=1 \
+    --with-debugging=0 \
+    --with-x=0 \
+    --with-zlib=1
+make all
+make install
+cd -
+
 # Python ######################################################################
 #
 # sometimes, the Tuolumne PIP Index is down
diff --git a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
index cce000a7d50..75366f6eda3 100644
--- a/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
+++ b/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
@@ -171,6 +171,43 @@ cmake \
     --parallel ${build_procs}
 rm -rf ${build_dir}/lapackpp-tuolumne-mi300a-build
 
+# PETSC
+if [ -d ${SRC_DIR}/petsc ]
+then
+    cd ${SRC_DIR}/petsc
+    git fetch --prune
+    git checkout v3.24.0
+    cd -
+else
+    git clone -b v3.24.0 https://gitlab.com/petsc/petsc.git ${SRC_DIR}/petsc
+fi
+cd ${SRC_DIR}/petsc
+./configure \
+    COPTFLAGS="-g -O3" \
+    FOPTFLAGS="-g -O3" \
+    CXXOPTFLAGS="-g -O2" \
+    HIPOPTFLAGS="-g -O3" \
+    LDFLAGS="${LDFLAGS}" \
+    --prefix=${SW_DIR}/petsc-3.24.0 \
+    --with-batch \
+    --with-cmake=1 \
+    --with-cuda=0 \
+    --with-hip=1 \
+    --with-hip-dir=${ROCM_PATH} \
+    --with-fortran-bindings=0 \
+    --with-fftw=0 \
+    --download-kokkos \
+    --download-kokkos-kernels \
+    --with-make-np=${build_procs} \
+    --with-mpi-dir=${MPICH_DIR} \
+    --with-clean=1 \
+    --with-debugging=0 \
+    --with-x=0 \
+    --with-zlib=1
+make all
+make install
+cd -
+
 # Python ######################################################################
 #
 # sometimes, the Tuolumne PIP Index is down
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example
index b42e170e2b0..a83d6e47d2b 100644
--- a/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example
+++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example
@@ -25,12 +25,14 @@ export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-2.15.1:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.2:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH
+export CMAKE_PREFIX_PATH=${SW_DIR}/petsc-3.24.0:$CMAKE_PREFIX_PATH
 
 export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-2.15.1/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.2/lib64:$LD_LIBRARY_PATH
 export 
LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=${SW_DIR}/petsc-3.24.0/lib:$LD_LIBRARY_PATH
 
 export PATH=${SW_DIR}/adios2-2.10.2/bin:${PATH}
 export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH}
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example b/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example
index 03a9df910f4..d9503eca454 100644
--- a/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example
+++ b/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example
@@ -24,12 +24,14 @@ export CMAKE_PREFIX_PATH=${SW_DIR}/c-blosc-2.15.1:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/adios2-2.10.2:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/blaspp-2024.05.31:$CMAKE_PREFIX_PATH
 export CMAKE_PREFIX_PATH=${SW_DIR}/lapackpp-2024.05.31:$CMAKE_PREFIX_PATH
+export CMAKE_PREFIX_PATH=${SW_DIR}/petsc-3.24.0:$CMAKE_PREFIX_PATH
 
 export LD_LIBRARY_PATH=${SW_DIR}/hdf5-1.14.1.2/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/c-blosc-2.15.1/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/adios2-2.10.2/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/blaspp-2024.05.31/lib64:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=${SW_DIR}/lapackpp-2024.05.31/lib64:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=${SW_DIR}/petsc-3.24.0/lib:$LD_LIBRARY_PATH
 
 export PATH=${SW_DIR}/adios2-2.10.2/bin:${PATH}
 export PATH=${SW_DIR}/hdf5-1.14.1.2/bin:${PATH}

From 43f7d848b2d9db158a322eb971cc48c3b3632104 Mon Sep 17 00:00:00 2001
From: Axel Huebl
Date: Fri, 17 Oct 2025 23:41:48 -0700
Subject: [PATCH 7/7] Flux: Shut Down 2min to Walltime

---
 Docs/source/install/hpc/tuolumne.rst              |  3 +++
 Docs/source/usage/parameters.rst                  |  2 ++
 Tools/machines/tuolumne-llnl/tuolumne_cpu.flux    | 11 ++++++++---
 Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux |  9 +++++++--
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/Docs/source/install/hpc/tuolumne.rst b/Docs/source/install/hpc/tuolumne.rst
index d4d0366326b..78e2a49b39a 100644
--- a/Docs/source/install/hpc/tuolumne.rst
+++ b/Docs/source/install/hpc/tuolumne.rst
@@ -271,6 +271,9 @@ The batch script below can be used to run a WarpX simulation on 1 node with 4 AP
 Replace descriptions between chevrons ``<>`` by relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
 WarpX runs with one MPI rank per GPU and uses 21 (of 24) CPU cores (3 are reserved for the system).
 
+The batch script below also :ref:`sends WarpX a signal <running-cpp-parameters-signal>` when the simulation gets close to the walltime of the job, to shut down cleanly.
+Adjust the ``FLUX_WT_SIG`` and ``WARPX_WT`` variables to modify or disable this behavior as needed.
+
 .. tab-set::
 
    .. tab-item:: GPU
diff --git a/Docs/source/usage/parameters.rst b/Docs/source/usage/parameters.rst
index 65e8ffa418b..6710da45aa1 100644
--- a/Docs/source/usage/parameters.rst
+++ b/Docs/source/usage/parameters.rst
@@ -354,6 +354,8 @@ Overall simulation parameters
    If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.
 
+.. 
_running-cpp-parameters-signal:
+
 
 Signal Handling
 ^^^^^^^^^^^^^^^
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux b/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
index e23dab00145..cb34b93dd71 100644
--- a/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
+++ b/Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
@@ -26,6 +26,11 @@ EXE="./warpx.2d"
 INPUTS="./inputs_hist_10.input"
 
+# clean shutdown close to walltime (or checkpoint)
+# https://warpx.readthedocs.io/en/latest/usage/parameters.html#signal-handling
+FLUX_WT_SIG="--signal=SIGUSR1@120s"
+WARPX_WT="warpx.break_signals=USR1"
+
 # environment setup
 if [[ -z "${MY_PROFILE}" ]]; then
     echo "WARNING: FORGOT TO"
@@ -49,7 +54,7 @@ export OMP_NUM_THREADS=21
 
 # start MPI parallel processes
 NNODES=$(flux resource list -s up -no {nnodes})
-flux run --exclusive --nodes=${NNODES} \
-    --tasks-per-node=4 \
-    ${EXE} ${INPUTS} \
+flux run ${FLUX_WT_SIG} --exclusive --nodes=${NNODES} \
+    --tasks-per-node=4 \
+    ${EXE} ${INPUTS} ${WARPX_WT} \
     > output.txt
diff --git a/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux b/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
index 1c1b43212a1..5c228222866 100644
--- a/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
+++ b/Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
@@ -26,6 +26,11 @@ EXE="./warpx.2d"
 INPUTS="./inputs_hist_10.input"
 
+# clean shutdown close to walltime (or checkpoint)
+# https://warpx.readthedocs.io/en/latest/usage/parameters.html#signal-handling
+FLUX_WT_SIG="--signal=SIGUSR1@120s"
+WARPX_WT="warpx.break_signals=USR1"
+
 # environment setup
 if [[ -z "${MY_PROFILE}" ]]; then
     echo "WARNING: FORGOT TO"
@@ -52,8 +57,8 @@ GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
 
 # start MPI parallel processes
 NNODES=$(flux resource list -s up -no {nnodes})
-flux run --exclusive --nodes=${NNODES} \
+flux run ${FLUX_WT_SIG} --exclusive --nodes=${NNODES} \
     --tasks-per-node=4 \
     ${EXE} ${INPUTS} \
-    ${GPU_AWARE_MPI} \
+    ${GPU_AWARE_MPI} ${WARPX_WT} \
     > output.txt