Commit 1b75fb7
Merge branch 'dev' into clean_godfrey
2 parents: 5da2c42 + 24147a2

34 files changed: +436 / -170 lines

Docs/source/building/cori.rst

Lines changed: 34 additions & 0 deletions
@@ -51,6 +51,40 @@ In order to compile for the **Knight's Landing (KNL) architecture**:
     module swap PrgEnv-intel PrgEnv-gnu
     make -j 16 COMP=gnu
 
+GPU Build
+---------
+
+To compile on the experimental GPU nodes on Cori, you first need to purge
+your modules, most of which won't work on the GPU nodes.
+
+::
+
+    module purge
+
+Then, you need to load the following modules:
+
+::
+
+    module load esslurm cuda pgi openmpi/3.1.0-ucx
+
+Currently, you need to use OpenMPI; mvapich2 seems not to work.
+
+Then, you need to use slurm to request access to a GPU node:
+
+::
+
+    salloc -C gpu -N 1 -t 30 -c 10 --gres=gpu:1 --mem=30GB -A m1759
+
+This reserves 10 logical cores (5 physical), 1 GPU, and 30 GB of RAM for your job.
+Note that you can't cross-compile for the GPU nodes: you have to log on to one
+and then build your software.
+
+Finally, navigate to the base of the WarpX repository and compile in GPU mode:
+
+::
+
+    make -j 16 COMP=pgi USE_GPU=TRUE
+
 
 Building WarpX with openPMD support
 -----------------------------------
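For reference, a minimal sketch collecting the GPU-build steps above into one interactive sequence. The commands are taken from the instructions above; ``/path/to/WarpX`` is a hypothetical path to your clone, and the account ``-A m1759`` may need to be replaced by your own allocation:

::

    module purge                                     # the default Cori modules do not work on the GPU nodes
    module load esslurm cuda pgi openmpi/3.1.0-ucx   # OpenMPI is required; mvapich2 seems not to work
    salloc -C gpu -N 1 -t 30 -c 10 --gres=gpu:1 --mem=30GB -A m1759
    # ... once logged on to the GPU node:
    cd /path/to/WarpX                                # hypothetical path to your WarpX clone
    make -j 16 COMP=pgi USE_GPU=TRUE                 # build in GPU mode, on the node itself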

Docs/source/building/summit.rst

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ correct branch:
     git clone --branch master https://bitbucket.org/berkeleylab/picsar.git
     git clone --branch development https://github.com/AMReX-Codes/amrex.git
 
-Then, use the following set of commands to compile:
+Then, ``cd`` into the directory ``WarpX`` and use the following set of commands to compile:
 
 ::
 
Docs/source/running_cpp/parallelization.rst (new file)

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+Parallelization in WarpX
+=========================
+
+When running a simulation, the domain is split into independent
+rectangular sub-domains (called **grids**). This is the way AMReX, a core
+component of WarpX, handles parallelization and/or mesh refinement. Furthermore,
+this decomposition makes load balancing possible: each MPI rank typically computes
+a few grids, and a rank with a lot of work can transfer one or several **grids**
+to its neighbors.
+
+A user
+does not specify this decomposition explicitly. Instead, the user gives hints to
+the code, and the actual decomposition is determined at runtime, depending on
+the parallelization. The main user-defined parameters are
+``amr.max_grid_size`` and ``amr.blocking_factor``.
+
+AMReX ``max_grid_size`` and ``blocking_factor``
+-----------------------------------------------
+
+* ``amr.max_grid_size`` is the maximum number of points per **grid** along each
+  direction (default ``amr.max_grid_size=32`` in 3D).
+
+* ``amr.blocking_factor``: the size of each **grid** must be divisible by the
+  ``blocking_factor`` along all dimensions (default ``amr.blocking_factor=8``).
+  Note that ``max_grid_size`` also has to be divisible by ``blocking_factor``.
+
+These parameters can have a dramatic impact on the code performance. Each
+**grid** in the decomposition is surrounded by guard cells, thus increasing the
+amount of data, computation and communication. Hence, a too-small
+``max_grid_size`` may ruin the code performance.
+
+On the other hand, a too-large ``max_grid_size`` is likely to result in a single
+grid per MPI rank, thus preventing load balancing. By setting these two
+parameters, the user gives some flexibility to the code while avoiding
+pathological behaviors.
+
+For more information on this decomposition, see the
+`Gridding and Load Balancing <https://amrex-codes.github.io/amrex/docs_html/ManagingGridHierarchy_Chapter.html>`__
+page of the AMReX documentation.
+
+For specific information on the dynamic load balancer used in WarpX, visit the
+`Load Balancing <https://amrex-codes.github.io/amrex/docs_html/LoadBalancing.html>`__
+page of the AMReX documentation.
+
+The best values for these parameters strongly depend on a number of factors,
+among which are numerical parameters:
+
+* Algorithms used (Maxwell/spectral field solver, filters, order of the
+  particle shape factor)
+
+* Number of guard cells (which depends on the particle shape factor and
+  the type and order of the Maxwell solver, the filters used, `etc.`)
+
+* Number of particles per cell, and the number of species
+
+as well as the MPI decomposition and the computer architecture used for the run:
+
+* GPU or CPU
+
+* Number of OpenMP threads
+
+* Amount of high-bandwidth memory.
+
+Below is a list of experience-based parameters
+that were observed to give good performance on given supercomputers.
+
+Rule of thumb for 3D runs on NERSC Cori KNL
+-------------------------------------------
+
+For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell
+solver on Cori KNL for a well load-balanced problem (in our case, a laser
+wakefield acceleration simulation in a boosted frame in the quasi-linear
+regime), the following set of parameters provided good performance:
+
+* ``amr.max_grid_size=64`` and ``amr.blocking_factor=64`` so that the size of
+  each grid is fixed to ``64**3`` (we are not using load balancing here).
+
+* **8 MPI ranks per KNL node**, with ``OMP_NUM_THREADS=8`` (that is, 64 threads
+  per KNL node, i.e. 1 thread per physical core, and 4 cores left to the
+  system).
+
+* **2 grids per MPI rank**, *i.e.*, 16 grids per KNL node.
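As a concrete illustration of this rule of thumb, a minimal sketch of the corresponding lines in a WarpX inputs file. Only the two grid parameters are shown; the rest of the inputs file and the job script (which would set ``OMP_NUM_THREADS=8`` and request 8 MPI ranks per KNL node) are omitted:

::

    # Fix every grid to 64^3 cells; no load balancing is used in this setup.
    amr.max_grid_size   = 64
    amr.blocking_factor = 64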

Docs/source/running_cpp/parameters.rst

Lines changed: 10 additions & 0 deletions
@@ -531,6 +531,11 @@ Numerics and algorithms
     - ``0``: Vectorized version
     - ``1``: Non-optimized version
 
+    .. warning::
+
+       The vectorized version does not run on GPU. Use
+       ``algo.charge_deposition=1`` when running on GPU.
+
 * ``algo.field_gathering`` (`integer`)
     The algorithm for field gathering:
 
@@ -649,6 +654,11 @@ Diagnostics and output
     perform on-the-fly conversion to the laboratory frame, when running
     boosted-frame simulations)
 
+* ``warpx.lab_data_directory`` (`string`)
+    The directory in which to save the lab-frame data when using the
+    **back-transformed diagnostics**. If not specified, the default
+    is ``lab_frame_data``.
+
 * ``warpx.num_snapshots_lab`` (`integer`)
     Only used when ``warpx.do_boosted_frame_diagnostic`` is ``1``.
     The number of lab-frame snapshots that will be written.
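A minimal inputs-file sketch combining the two parameters documented above. The values are the ones named in this commit; ``lab_frame_data`` is the default directory and is shown here only for illustration:

::

    # Non-vectorized charge deposition, required when running on GPU.
    algo.charge_deposition = 1

    # Directory for the back-transformed (lab-frame) diagnostics
    # (lab_frame_data is the default).
    warpx.lab_data_directory = lab_frame_data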

Docs/source/running_cpp/running_cpp.rst

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ Running WarpX as an executable
    examples
    parameters
    profiling
+   parallelization

Docs/source/visualization/yt.rst

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ or with the `Anaconda distribution <https://anaconda.org/>`__ of python (recomme
 
 ::
 
-    conda install yt
+    conda install -c conda-forge yt
 
 Visualizing the data
 --------------------
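Once yt is installed, a minimal sketch of opening a WarpX plotfile. The plotfile name ``plt00000`` and the field name ``Ez`` are assumptions here and depend on your run and diagnostics settings:

::

    import yt

    ds = yt.load('plt00000')            # an AMReX plotfile written by WarpX
    print(ds.field_list)                # list the fields stored in the plotfile
    slc = yt.SlicePlot(ds, 'z', 'Ez')   # 2D slice of the Ez field, normal to z
    slc.save()                          # writes a .png image to the current directory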

Examples/Modules/gaussian_beam/gaussian_beam_PICMI.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 total_charge = 8.010883097437485e-07
 
 beam_rms_size = 0.25
-electron_beam_divergence = -0.04
+electron_beam_divergence = -0.04*picmi.c
 
 em_order = 3
 

Examples/Tests/Langmuir/langmuir2d_PICMI.py

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,9 @@
 xmax = +20.e-6
 ymax = +20.e-6
 
-uniform_plasma = picmi.UniformDistribution(density=1.e25, upper_bound=[0., None, None], directed_velocity=[0.1, 0., 0.])
+uniform_plasma = picmi.UniformDistribution(density = 1.e25,
+                                            upper_bound = [0., None, None],
+                                            directed_velocity = [0.1*picmi.c, 0., 0.])
 
 electrons = picmi.Species(particle_type='electron', name='electrons', initial_distribution=uniform_plasma)

Examples/Tests/Langmuir/langmuir_PICMI.py

Lines changed: 3 additions & 1 deletion
@@ -14,7 +14,9 @@
 ymax = +20.e-6
 zmax = +20.e-6
 
-uniform_plasma = picmi.UniformDistribution(density=1.e25, upper_bound=[0., None, None], directed_velocity=[0.1, 0., 0.])
+uniform_plasma = picmi.UniformDistribution(density = 1.e25,
+                                            upper_bound = [0., None, None],
+                                            directed_velocity = [0.1*picmi.c, 0., 0.])
 
 electrons = picmi.Species(particle_type='electron', name='electrons', initial_distribution=uniform_plasma)

Examples/Tests/Langmuir/langmuir_PICMI_rt.py

Lines changed: 3 additions & 1 deletion
@@ -14,7 +14,9 @@
 ymax = +20.e-6
 zmax = +20.e-6
 
-uniform_plasma = picmi.UniformDistribution(density=1.e25, upper_bound=[0., None, None], directed_velocity=[0.1, 0., 0.])
+uniform_plasma = picmi.UniformDistribution(density = 1.e25,
+                                            upper_bound = [0., None, None],
+                                            directed_velocity = [0.1*picmi.c, 0., 0.])
 
 electrons = picmi.Species(particle_type='electron', name='electrons', initial_distribution=uniform_plasma)
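The same fix appears in all three Langmuir examples: PICMI expects velocities in m/s, so a 0.1 c drift is written ``0.1*picmi.c`` instead of the bare number ``0.1``. A minimal self-contained sketch of the corrected pattern, assuming the ``from pywarpx import picmi`` import used by the other WarpX PICMI scripts:

::

    from pywarpx import picmi   # assumed import, matching the WarpX PICMI examples

    # Drift velocities are given in m/s: 0.1*picmi.c is a 0.1 c drift along x.
    uniform_plasma = picmi.UniformDistribution(
        density = 1.e25,                             # electron density in m^-3
        upper_bound = [0., None, None],              # plasma fills only x < 0
        directed_velocity = [0.1*picmi.c, 0., 0.])   # previously the bare number 0.1

    electrons = picmi.Species(particle_type='electron', name='electrons',
                              initial_distribution=uniform_plasma)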
