Commit edbc4da

Tuolumne (LLNL): CPU-Only, HDF5, PETSC, Signal Handling (#6270)
Add CPU-only instructions for Tuolumne at LLNL. This is mostly for development, because this is not using the GPU-part of the APU.

For both GPU and CPU builds: Add support for a self-built HDF5 (system module went missing), self-built PETSC, and document signal handling.

[Preview.](https://warpx--6270.org.readthedocs.build/en/6270/install/hpc/tuolumne.html)

- [x] draft
- [x] install: tested
- [x] runtime: tested
- [x] fix many-thread issue with TLS
- [x] add HDF5
- [x] add PETSC
- [x] document signal usage `flux run --signal=SIGUSR1@120s --exclusive ...` (only works if in `flux run ...` command, not the script comments)
1 parent 2362c33 commit edbc4da

8 files changed: +606 -67 lines changed

Docs/source/install/hpc/tuolumne.rst

Lines changed: 157 additions & 62 deletions
@@ -44,73 +44,132 @@ Use the following commands to download the WarpX source code:

    git clone https://github.com/BLAST-WarpX/warpx.git /p/lustre5/${USER}/tuolumne/src/warpx

-We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_mi300a_warpx.profile``.
-Create it now:
+On Tuolumne, we usually accelerate all computations with the GPU cores of the MI300A APU.
+For development purposes, you can also limit yourself to the CPU cores of the MI300A.

-.. code-block:: bash
+.. tab-set::

-   cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example $HOME/tuolumne_mi300a_warpx.profile
+   .. tab-item:: GPU

-.. dropdown:: Script Details
-   :color: light
-   :icon: info
-   :animate: fade-in-slide-down
+      We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_mi300a_warpx.profile``.
+      Create it now:

-   .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example
-      :language: bash
+      .. code-block:: bash

-Edit the 2nd line of this script, which sets the ``export proj=""`` variable.
-**Currently, this is unused and can be kept empty.**
-Once project allocation becomes required, e.g., if you are a member of the project ``abcde``, then run ``vi $HOME/tuolumne_mi300a_warpx.profile``.
-Enter the edit mode by typing ``i`` and edit line 2 to read:
+         cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example $HOME/tuolumne_mi300a_warpx.profile

-.. code-block:: bash
+      .. dropdown:: Script Details
+         :color: light
+         :icon: info
+         :animate: fade-in-slide-down
+
+         .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a_warpx.profile.example
+            :language: bash
+
+      Edit the 2nd line of this script, which sets the ``export proj=""`` variable.
+      **Currently, this is unused and can be kept empty.**
+      Once project allocation becomes required, e.g., if you are a member of the project ``abcde``, then run ``vi $HOME/tuolumne_mi300a_warpx.profile``.
+      Enter the edit mode by typing ``i`` and edit line 2 to read:
+
+      .. code-block:: bash
+
+         export proj="abcde"
+
+      Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit).
+
+      .. important::
+
+         Now, and as the first step on future logins to Tuolumne, activate these environment settings:
+
+         .. code-block:: bash
+
+            source $HOME/tuolumne_mi300a_warpx.profile
+
+      Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once:
+
+
+      .. code-block:: bash
+
+         bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
+         source /p/lustre5/${USER}/tuolumne/warpx/mi300a/venvs/warpx-tuolumne-mi300a/bin/activate
+
+      .. dropdown:: Script Details
+         :color: light
+         :icon: info
+         :animate: fade-in-slide-down
+
+         .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
+            :language: bash
+
+      .. dropdown:: AI/ML Dependencies (Optional)
+         :animate: fade-in-slide-down
+
+         If you plan to run AI/ML workflows depending on PyTorch et al., run the next step as well.
+         This will take a while and should be skipped if not needed.
+
+         .. code-block:: bash
+
+            bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_ml.sh
+
+         .. dropdown:: Script Details
+            :color: light
+            :icon: info
+            :animate: fade-in-slide-down

-   export proj="abcde"
+            .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_ml.sh
+               :language: bash

-Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit).
+   .. tab-item:: CPU

-.. important::
+      We use system software modules, add environment hints and further dependencies via the file ``$HOME/tuolumne_cpu_warpx.profile``.
+      Create it now:

-   Now, and as the first step on future logins to Tuolumne, activate these environment settings:
+      .. code-block:: bash

-   .. code-block:: bash
+         cp /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example $HOME/tuolumne_cpu_warpx.profile

-      source $HOME/tuolumne_mi300a_warpx.profile
+      .. dropdown:: Script Details
+         :color: light
+         :icon: info
+         :animate: fade-in-slide-down

-Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once:
+         .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_cpu_warpx.profile.example
+            :language: bash

+      Edit the 2nd line of this script, which sets the ``export proj=""`` variable.
+      **Currently, this is unused and can be kept empty.**
+      Once project allocation becomes required, e.g., if you are a member of the project ``abcde``, then run ``vi $HOME/tuolumne_cpu_warpx.profile``.
+      Enter the edit mode by typing ``i`` and edit line 2 to read:

-.. code-block:: bash
+      .. code-block:: bash

-   bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
-   source /p/lustre5/${USER}/tuolumne/warpx/mi300a/venvs/warpx-tuolumne-mi300a/bin/activate
+         export proj="abcde"

-.. dropdown:: Script Details
-   :color: light
-   :icon: info
-   :animate: fade-in-slide-down
+      Exit the ``vi`` editor with ``Esc`` and then type ``:wq`` (write & quit).

-   .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_dependencies.sh
-      :language: bash
+      .. important::

-.. dropdown:: AI/ML Dependencies (Optional)
-   :animate: fade-in-slide-down
+         Now, and as the first step on future logins to Tuolumne, activate these environment settings:

-   If you plan to run AI/ML workflows depending on PyTorch et al., run the next step as well.
-   This will take a while and should be skipped if not needed.
+         .. code-block:: bash

-   .. code-block:: bash
+            source $HOME/tuolumne_cpu_warpx.profile

-      bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_mi300a_ml.sh
+      Finally, since Tuolumne does not yet provide software modules for some of our dependencies, install them once:

-   .. dropdown:: Script Details
-      :color: light
-      :icon: info
-      :animate: fade-in-slide-down

-      .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_mi300a_ml.sh
-         :language: bash
+      .. code-block:: bash
+
+         bash /p/lustre5/${USER}/tuolumne/src/warpx/Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
+         source /p/lustre5/${USER}/tuolumne/warpx/cpu/venvs/warpx-tuolumne-cpu/bin/activate
+
+      .. dropdown:: Script Details
+         :color: light
+         :icon: info
+         :animate: fade-in-slide-down
+
+         .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/install_cpu_dependencies.sh
+            :language: bash


 .. _building-tuolumne-compilation:
@@ -120,20 +179,41 @@ Compilation

 Use the following :ref:`cmake commands <building-cmake>` to compile the application executable:

-.. code-block:: bash
+.. tab-set::

-   cd /p/lustre5/${USER}/tuolumne/src/warpx
+   .. tab-item:: GPU

-   cmake --fresh -S . -B build_tuolumne -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3"
-   cmake --build build_tuolumne -j 24
+      .. code-block:: bash

-The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne/bin/``.
-Additionally, the following commands will install WarpX as a Python module:
+         cd /p/lustre5/${USER}/tuolumne/src/warpx

-.. code-block:: bash
+         cmake --fresh -S . -B build_tuolumne -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3"
+         cmake --build build_tuolumne -j 24
+
+      The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne/bin/``.
+      Additionally, the following commands will install WarpX as a Python module:
+
+      .. code-block:: bash
+
+         cmake --fresh -S . -B build_tuolumne_py -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
+         cmake --build build_tuolumne_py -j 24 --target pip_install
+
+   .. tab-item:: CPU
+
+      .. code-block:: bash

-   cmake --fresh -S . -B build_tuolumne_py -DWarpX_COMPUTE=HIP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
-   cmake --build build_tuolumne_py -j 24 --target pip_install
+         cd /p/lustre5/${USER}/tuolumne/src/warpx
+
+         cmake --fresh -S . -B build_tuolumne_cpu -DWarpX_COMPUTE=OMP -DWarpX_FFT=ON -DWarpX_DIMS="1;2;RZ;3"
+         cmake --build build_tuolumne_cpu -j 24
+
+      The WarpX application executables are now in ``/p/lustre5/${USER}/tuolumne/src/warpx/build_tuolumne_cpu/bin/``.
+      Additionally, the following commands will install WarpX as a Python module:
+
+      .. code-block:: bash
+
+         cmake --fresh -S . -B build_tuolumne_cpu_py -DWarpX_COMPUTE=OMP -DWarpX_FFT=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
+         cmake --build build_tuolumne_cpu_py -j 24 --target pip_install

 Now, you can :ref:`submit tuolumne compute jobs <running-cpp-tuolumne>` for WarpX :ref:`Python (PICMI) scripts <usage-picmi>` (:ref:`example scripts <usage-examples>`).
 Or, you can use the WarpX executables to submit tuolumne jobs (:ref:`example inputs <usage-examples>`).
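
In the compilation hunk above, the GPU and CPU tabs differ only in the compute backend (``HIP`` vs. ``OMP``) and the build directory name. A minimal shell sketch of that mapping follows. It uses only flags that appear in the diff; the helper name `warpx_configure_cmd` is hypothetical and not part of the commit. It prints the configure line instead of running it, so it can be tried without a WarpX checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print the cmake configure
# line for the requested target, using the flags shown in the diff.
warpx_configure_cmd() {
    case "$1" in
        gpu) compute=HIP; build=build_tuolumne ;;
        cpu) compute=OMP; build=build_tuolumne_cpu ;;
        *)   echo "usage: warpx_configure_cmd gpu|cpu" >&2; return 1 ;;
    esac
    # Only the backend and the build directory change between the two tabs.
    printf '%s\n' "cmake --fresh -S . -B ${build} -DWarpX_COMPUTE=${compute} -DWarpX_FFT=ON -DWarpX_DIMS=\"1;2;RZ;3\""
}

# Dry run: show the CPU-only configure line.
warpx_configure_cmd cpu
```

Separate build directories for the two backends mean a CPU development build does not overwrite an existing GPU build.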
@@ -183,26 +263,41 @@ MI300A APUs (128GB)

 `Each compute node <https://hpc.llnl.gov/documentation/user-guides/using-el-capitan-systems/introduction-and-quickstart/pro-tips>`__ is divided into 4 sockets, each with:

-* 1 MI300A GPU,
+* 1 MI300A APU (incl. 1 GPU),
 * 21 available user CPU cores, with 3 cores reserved for the OS (2 hardware threads per core)
 * 128GB HBM3 memory (a single NUMA domain)

 The batch script below can be used to run a WarpX simulation on 1 node with 4 APUs on the supercomputer Tuolumne at LLNL.
 Replace descriptions between chevrons ``<>`` by relevant values, for instance ``<input file>`` could be ``plasma_mirror_inputs``.
-WarpX runs with one MPI rank per GPU.
+WarpX runs with one MPI rank per GPU and uses 21 (of 24) CPU cores (3 are reserved for the system).

-Note that we append these non-default runtime options:
+The batch script below also :ref:`sends WarpX a signal <running-cpp-parameters-signal>` when the simulation gets close to the walltime of the job, to shut down cleanly.
+Adjust the ``FLUX_WT_SIG`` and ``WARPX_WT`` to modify or disable this behavior as needed.

-* ``amrex.use_gpu_aware_mpi=1``: make use of fast APU to APU MPI communications
+.. tab-set::

-.. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
-   :language: bash
-   :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux``.
+   .. tab-item:: GPU

-To run a simulation, copy the lines above to a file ``tuolumne_mi300a.flux`` and run
+      .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux
+         :language: bash
+         :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_mi300a.flux``.

-.. code-block:: bash
+      To run a simulation, copy the lines above to a file ``tuolumne_mi300a.flux`` and run
+
+      .. code-block:: bash
+
+         flux batch tuolumne_mi300a.flux
+
+   .. tab-item:: CPU
+
+      .. literalinclude:: ../../../../Tools/machines/tuolumne-llnl/tuolumne_cpu.flux
+         :language: bash
+         :caption: You can copy this file from ``Tools/machines/tuolumne-llnl/tuolumne_cpu.flux``.
+
+      To run a simulation, copy the lines above to a file ``tuolumne_cpu.flux`` and run
+
+      .. code-block:: bash

-   flux batch tuolumne_mi300a.flux
+         flux batch tuolumne_cpu.flux

 to submit the job.
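
The batch scripts themselves are not shown in this view, so here is a hedged sketch of how the two variables named in the diff, ``FLUX_WT_SIG`` and ``WARPX_WT``, might be combined on the launch line. The variable names come from the docs text and the ``--signal=SIGUSR1@120s`` form from the commit message. The values, the use of ``warpx.break_signals``, and the helper `launch_line` are assumptions for illustration, not the contents of ``tuolumne_mi300a.flux``. The helper prints the command rather than executing it.

```shell
#!/bin/sh
# Sketch only: variable names follow the docs text in the diff; the values
# and the helper below are assumptions, not copied from the batch script.
FLUX_WT_SIG="--signal=SIGUSR1@120s"   # flag form quoted in the commit message
WARPX_WT="warpx.break_signals=USR1"   # assumed WarpX runtime option for a clean stop

# Hypothetical helper: print (do not execute) the launch line.
# $1: executable, $2: inputs file
launch_line() {
    printf '%s\n' "flux run ${FLUX_WT_SIG} --exclusive $1 $2 ${WARPX_WT}"
}

# Dry run with placeholder names for the executable and inputs file.
launch_line ./warpx inputs
```

As the commit message notes, the signal request only takes effect when it is passed on the `flux run ...` command itself, not as a directive in the script comments, which is why the sketch places it on the launch line.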

Docs/source/usage/parameters.rst

Lines changed: 2 additions & 0 deletions
@@ -354,6 +354,8 @@ Overall simulation parameters
     If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.


+.. _running-cpp-parameters-signal:
+
 Signal Handling
 ^^^^^^^^^^^^^^^
