
Commit 080d4bd

ax3l and lucafedeli88 authored
Docs: Ookami (Stony Brook) (#1991)
* Docs: Ookami (Stony Brook) Add Ookami build instructions.
* Add Suggestions from Luca
* Add 4x12 MPI/OMP ups, forgot to add
* Ookami: Finalize Batch & Storage

Co-authored-by: Luca Fedeli <[email protected]>
1 parent 2cfc24f commit 080d4bd

File tree

2 files changed

+149
-0
lines changed


Docs/source/install/hpc.rst

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ HPC Systems
    hpc/lassen
    hpc/quartz
    hpc/lawrencium
+   hpc/ookami

 .. tip::

Docs/source/install/hpc/ookami.rst

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
.. _building-ookami:

Ookami (Stony Brook)
====================

The `Ookami cluster <https://www.stonybrook.edu/ookami/>`__ is located at Stony Brook University.

If you are new to this system, please see the following resources:

* `Ookami documentation <https://www.stonybrook.edu/commcms/ookami/support/index_links_and_docs.php>`__
* Batch system: `Slurm <https://www.stonybrook.edu/commcms/ookami/support/faq/example-slurm-script>`__ (see `available queues <https://www.stonybrook.edu/commcms/ookami/support/faq/queues_on_ookami>`__)
* `Filesystem locations <https://www.stonybrook.edu/commcms/ookami/support/faq/ookami_storage_options.php>`__:

  * ``/lustre/home/<netid>`` (30 GByte, backed up)
  * ``/lustre/scratch/<netid>`` (purged after 14 days)
  * ``/lustre/projects/<your_group>*`` (1 TByte by default, up to 8 TByte possible; shared within your group/project, backed up; prefer this location)

We use Ookami as a development cluster for `A64FX <https://www.arm.com/blogs/blueprint/fujitsu-a64fx-arm>`__.
The cluster also provides a few extra nodes, e.g. two ``Thunder X2`` (ARM) nodes.
20+
21+
22+
23+
Installation
24+
------------
25+
26+
Use the following commands to download the WarpX source code and switch to the correct branch:
27+
28+
.. code-block:: bash
29+
30+
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
31+
32+
We use the following modules and environments on the system (``$HOME/warpx_gcc10.profile``):

.. code-block:: bash

   # please set your project account (not relevant yet)
   #export proj=<yourProject>

   # required dependencies
   module load cmake/3.19.0
   module load gcc/10.3.0
   module load openmpi/gcc10/4.1.0

   # optional: faster builds (not available yet)
   #module load ccache
   #module load ninja

   # optional: for PSATD support (not available yet)
   #module load fftw

   # optional: for QED lookup table generation support (not available yet)
   #module load boost

   # optional: for openPMD support
   #module load adios2  # not available yet
   #module load hdf5    # only serial

   # compiler environment hints
   export CC=$(which gcc)
   export CXX=$(which g++)
   export FC=$(which gfortran)
   export CXXFLAGS="-mcpu=a64fx"
We recommend storing the above lines in a file, such as ``$HOME/warpx_gcc10.profile``, and loading it into your shell after each login:

.. code-block:: bash

   source $HOME/warpx_gcc10.profile
Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile:

.. code-block:: bash

   cd $HOME/src/warpx
   rm -rf build

   cmake -S . -B build -DWarpX_COMPUTE=OMP -DWarpX_OPENPMD=ON
   cmake --build build -j 10

   # or (currently better performance)
   cmake -S . -B build -DWarpX_COMPUTE=NOACC -DWarpX_OPENPMD=ON
   cmake --build build -j 10

The general :ref:`cmake compile-time options <building-cmake>` apply as usual.
87+
88+
89+
.. _running-cpp-ookami:
90+
91+
Running
92+
-------
93+
94+
For running on 48 cores of a single node:
95+
96+
.. code-block:: bash
97+
98+
srun -p short -N 1 -n 48 --pty bash
99+
OMP_NUM_THREADS=1 mpiexec -n 48 --map-by ppr:12:numa:pe=1 --report-bindings ./warpx inputs
100+
101+
# alternatively, using 4 MPI ranks with each 12 threads on a single node:
102+
OMP_NUM_THREADS=12 mpiexec -n 4 --map-by ppr:4:numa:pe=12 --report-bindings ./warpx inputs
103+
104+
The Ookami HPE Apollo 80 system has 174 A64FX compute nodes each with 32GB of high-bandwidth memory.
Additional Compilers
--------------------

This section is just a note for developers.
We compiled with the Fujitsu compiler (Clang) using the following build string:

.. code-block:: bash

   cmake -S . -B build                            \
     -DCMAKE_C_COMPILER=$(which mpifcc)           \
     -DCMAKE_C_COMPILER_ID="Clang"                \
     -DCMAKE_C_COMPILER_VERSION=12.0              \
     -DCMAKE_C_STANDARD_COMPUTED_DEFAULT="11"     \
     -DCMAKE_CXX_COMPILER=$(which mpiFCC)         \
     -DCMAKE_CXX_COMPILER_ID="Clang"              \
     -DCMAKE_CXX_COMPILER_VERSION=12.0            \
     -DCMAKE_CXX_STANDARD_COMPUTED_DEFAULT="14"   \
     -DCMAKE_CXX_FLAGS="-Nclang"                  \
     -DAMReX_DIFFERENT_COMPILER=ON                \
     -DAMReX_MPI_THREAD_MULTIPLE=FALSE            \
     -DWarpX_COMPUTE=OMP
   cmake --build build -j 10
An internal compiler error requires us to change a range-based for loop into a conventional iterator loop around ``WarpX::setLoadBalanceEfficiency``.
We need to rewrite the (at the moment, three) loops that look roughly like this:

.. code-block:: cpp

   for (int i : costs[lev]->IndexArray()) {
       (*costs[lev])[i] = 0.0;
       WarpX::setLoadBalanceEfficiency(lev, -1);
   }

into

.. code-block:: cpp

   const auto idx_arr = costs[lev]->IndexArray();
   for (auto it = idx_arr.begin(); it < idx_arr.end(); ++it) {
       (*costs[lev])[*it] = 0.0;
       WarpX::setLoadBalanceEfficiency(lev, -1);
   }
