Parallelization in WarpX
========================

When running a simulation, the domain is split into independent
rectangular sub-domains (called **grids**). This is the way AMReX, a core
component of WarpX, handles parallelization and mesh refinement. Furthermore,
this decomposition makes load balancing possible: each MPI rank typically computes
a few grids, and a rank with a lot of work can transfer one or several **grids**
to its neighbors.

A user does not specify this decomposition explicitly. Instead, the user gives hints to
the code, and the actual decomposition is determined at runtime, depending on
the parallelization. The main user-defined parameters are
``amr.max_grid_size`` and ``amr.blocking_factor``.
| 16 | + |
| 17 | +AMReX ``max_grid_size`` and ``blocking_factor`` |
| 18 | +----------------------------------------------- |
| 19 | + |
| 20 | +* ``amr.max_grid_size`` is the maximum number of points per **grid** along each |
| 21 | + direction (default ``amr.max_grid_size=32`` in 3D). |
| 22 | + |
| 23 | +* ``amr.blocking_factor``: The size of each **grid** must be divisible by the |
| 24 | + `blocking_factor` along all dimensions (default ``amr.blocking_factor=8``). |
| 25 | + Note that the ``max_grid_size`` also has to be divisible by ``blocking_factor``. |
| 26 | + |
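For illustration, here is a minimal sketch of how these two parameters appear
in a WarpX input file; the values are placeholders rather than recommendations,
chosen only to satisfy the divisibility constraint above::

   # each grid is at most 64 cells long in each direction
   amr.max_grid_size   = 64
   # every grid dimension is a multiple of 8 (and 64 is divisible by 8)
   amr.blocking_factor = 8
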
These parameters can have a dramatic impact on the code performance. Each
**grid** in the decomposition is surrounded by guard cells, thus increasing the
amount of data, computation and communication. Hence, a too-small
``max_grid_size`` may ruin the code performance.

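As a rough illustration of this overhead (the numbers below are hypothetical,
not WarpX defaults): a cubic grid of :math:`n^3` cells with :math:`g` guard
cells on each side carries data for :math:`(n+2g)^3` cells, i.e. a relative
overhead of

.. math::

   \left(\frac{n+2g}{n}\right)^3 - 1 .

For, say, :math:`n=32` and :math:`g=8` this is :math:`(48/32)^3 - 1 \approx 2.4`
(more than twice the useful data), while doubling the grid size to
:math:`n=64` reduces it to :math:`(80/64)^3 - 1 \approx 1.0`.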

On the other hand, a too-large ``max_grid_size`` is likely to result in a single
grid per MPI rank, thus preventing load balancing. By setting these two
parameters, the user gives the code some flexibility while avoiding
pathological behaviors.

For more information on this decomposition, see the
`Gridding and Load Balancing <https://amrex-codes.github.io/amrex/docs_html/ManagingGridHierarchy_Chapter.html>`__
page in the AMReX documentation.

For specific information on the dynamic load balancer used in WarpX, visit the
`Load Balancing <https://amrex-codes.github.io/amrex/docs_html/LoadBalancing.html>`__
page in the AMReX documentation.

The best values for these parameters strongly depend on a number of factors,
including numerical parameters:

* Algorithms used (Maxwell/spectral field solver, filters, order of the
  particle shape factor)

* Number of guard cells (which depends on the particle shape factor, the
  type and order of the Maxwell solver, the filters used, etc.)

* Number of particles per cell, and the number of species

as well as the MPI decomposition and the computer architecture used for the run:

* GPU or CPU

* Number of OpenMP threads

* Amount of high-bandwidth memory.

Below is a list of experience-based parameters
that were observed to give good performance on specific supercomputers.

Rule of thumb for 3D runs on NERSC Cori KNL
-------------------------------------------

For a 3D simulation with a few (1-4) particles per cell, using the FDTD Maxwell
solver on Cori KNL for a well load-balanced problem (in our case, a laser
wakefield acceleration simulation in a boosted frame in the quasi-linear
regime), the following set of parameters provided good performance:

* ``amr.max_grid_size=64`` and ``amr.blocking_factor=64`` so that the size of
  each grid is fixed to ``64**3`` (we are not using load balancing here).

* **8 MPI ranks per KNL node**, with ``OMP_NUM_THREADS=8`` (that is 64 threads
  per KNL node, i.e. 1 thread per physical core, and 4 cores left to the
  system).

* **2 grids per MPI rank**, *i.e.*, 16 grids per KNL node (a sketch combining
  these settings into input-file lines is given below).
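
Putting these hints together, a minimal sketch of the corresponding input-file
lines follows. The domain size and node count are hypothetical and chosen only
so that the numbers work out: on, say, 4 KNL nodes (32 MPI ranks), the domain
below decomposes into 64 grids of ``64**3`` cells, i.e. 2 grids per rank::

   # hypothetical 256^3 cell domain -> (256/64)^3 = 64 grids of 64^3 cells
   amr.n_cell          = 256 256 256
   # fix the size of every grid to exactly 64^3
   amr.max_grid_size   = 64
   amr.blocking_factor = 64

``OMP_NUM_THREADS=8`` is then exported in the job script, with 8 MPI ranks
requested per KNL node.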