Commit dc9e497

Merge pull request #295 from mwarusz/omega/kokkos-wrappers
Add tests and developer docs for Kokkos wrappers
2 parents d112926 + 1cd4703 commit dc9e497

3 files changed

+592
-68
lines changed

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
(omega-dev-parallel-loops)=

# Parallel loops

Omega adopts the Kokkos programming model to express on-node parallelism. To provide
simplified syntax for the most frequently used computational patterns, Omega provides
wrapper functions that internally handle creating and setting up Kokkos policies.

## Flat multi-dimensional parallelism

### parallelFor

To perform parallel iteration over a multi-dimensional index range, Omega provides the
`parallelFor` wrapper. For example, the following code shows how to set every element of
a 3D array in parallel.
```c++
Array3DReal A("A", N1, N2, N3);
parallelFor(
    {N1, N2, N3},
    KOKKOS_LAMBDA(int J1, int J2, int J3) {
       A(J1, J2, J3) = J1 + J2 + J3;
    });
```
Ranges with up to five dimensions are supported.
Optionally, a label can be provided as the first argument of `parallelFor`.
```c++
parallelFor("Set A",
    {N1, N2, N3},
    KOKKOS_LAMBDA(int J1, int J2, int J3) {
       A(J1, J2, J3) = J1 + J2 + J3;
    });
```
Adding labels can result in more informative messages when
Kokkos debug variables are defined.
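
As a mental model for what the `parallelFor` examples above compute, the sketch below runs the same loop body serially using only the C++ standard library. It assumes that each index runs from 0 up to, but not including, the corresponding extent, which the text above does not state explicitly. `Dense3D`, `serialFor3D`, and `makeFilled` are hypothetical names introduced for illustration; they are not part of Omega or Kokkos.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Array3DReal: a dense 3D array of doubles stored
// in row-major order. This is not Omega's array type.
struct Dense3D {
   int N1, N2, N3;
   std::vector<double> Data;

   Dense3D(int N1In, int N2In, int N3In)
       : N1(N1In), N2(N2In), N3(N3In),
         Data(static_cast<std::size_t>(N1In) * N2In * N3In, 0.0) {}

   double &operator()(int J1, int J2, int J3) {
      return Data[(static_cast<std::size_t>(J1) * N2 + J2) * N3 + J3];
   }
};

// Hypothetical serial analogue of a 3D parallelFor.
// Assumption: every index starts at 0 and stops before its extent.
template <class Functor>
void serialFor3D(int N1, int N2, int N3, Functor Body) {
   assert(N1 >= 0 && N2 >= 0 && N3 >= 0);
   for (int J1 = 0; J1 < N1; ++J1)
      for (int J2 = 0; J2 < N2; ++J2)
         for (int J3 = 0; J3 < N3; ++J3)
            Body(J1, J2, J3);
}

// Fills an array with the same values as the parallelFor example.
Dense3D makeFilled(int N1, int N2, int N3) {
   Dense3D A(N1, N2, N3);
   serialFor3D(N1, N2, N3, [&A](int J1, int J2, int J3) {
      A(J1, J2, J3) = J1 + J2 + J3;
   });
   return A;
}
```

The serial version visits indices in a fixed order, whereas parallel iterations may run concurrently and in any order, so a loop body should not depend on values written by other iterations.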

### parallelReduce

To perform parallel reductions over a multi-dimensional index range the
`parallelReduce` wrapper is available. The following code sums
every element of `A`.
```c++
Real SumA;
parallelReduce(
    {N1, N2, N3},
    KOKKOS_LAMBDA(int J1, int J2, int J3, Real &Accum) {
       Accum += A(J1, J2, J3);
    },
    SumA);
```
Note the presence of an accumulator variable `Accum` in the `KOKKOS_LAMBDA` arguments.
You can use `parallelReduce` to perform other types of reductions.
As an example, the following snippet finds the maximum of `A`.
```c++
Real MaxA;
parallelReduce(
    {N1, N2, N3},
    KOKKOS_LAMBDA(int J1, int J2, int J3, Real &Accum) {
       Accum = Kokkos::max(Accum, A(J1, J2, J3));
    },
    Kokkos::Max<Real>(MaxA));
```
To perform reductions that are not sums, in addition to modifying the lambda body,
the final reduction variable needs to be wrapped in the appropriate Kokkos reducer type.
In the above example, `MaxA` is wrapped in `Kokkos::Max<Real>` to perform a max reduction.
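
The serial sketch below illustrates why the lambda body of a max reduction needs no special handling for the first element. It assumes that the accumulator starts at the identity of the max operation (the lowest finite value of the type), which is how Kokkos reducers such as `Kokkos::Max` are understood to initialize their result. `serialMaxReduce` and `maxOf` are hypothetical helpers written over a flat range for brevity; they are not part of Omega or Kokkos.

```c++
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical serial analogue of a max reduction over the range [0, N).
// The accumulator starts at the identity of max, so taking the max with
// any element returns that element.
template <class Functor>
double serialMaxReduce(int N, Functor Body) {
   assert(N >= 0);
   double Accum = std::numeric_limits<double>::lowest();
   for (int J = 0; J < N; ++J)
      Body(J, Accum);
   return Accum;
}

// Finds the maximum of Values using the same body shape as the
// parallelReduce example: the lambda only updates its accumulator.
double maxOf(const std::vector<double> &Values) {
   return serialMaxReduce(
       static_cast<int>(Values.size()),
       [&Values](int J, double &Accum) {
          Accum = std::max(Accum, Values[J]);
       });
}
```

For an empty range this sketch returns the identity value itself, so callers that need to tell "no elements" apart from a real maximum should check the range size first.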

The `parallelReduce` wrapper supports performing multiple reductions at the same time.
You can compute `SumA` and `MaxA` in one pass over the data:
```c++
parallelReduce(
    {N1, N2, N3},
    KOKKOS_LAMBDA(int J1, int J2, int J3, Real &AccumSum, Real &AccumMax) {
       AccumSum += A(J1, J2, J3);
       AccumMax = Kokkos::max(AccumMax, A(J1, J2, J3));
    },
    SumA, Kokkos::Max<Real>(MaxA));
```
Like `parallelFor`, `parallelReduce` supports labels and up to five dimensions.
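
To make the single-pass behaviour of a fused reduction concrete, the following serial sketch carries a sum accumulator and a max accumulator through one loop. It is only an analogue under stated assumptions: the sum starts at 0, the max starts at the lowest finite value, and the range is flat. `serialSumMaxReduce` and `sumAndMax` are hypothetical names, not part of Omega or Kokkos.

```c++
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical serial analogue of a fused sum and max reduction over [0, N).
// Each accumulator starts at the identity of its own operation.
template <class Functor>
std::pair<double, double> serialSumMaxReduce(int N, Functor Body) {
   assert(N >= 0);
   double AccumSum = 0.0;
   double AccumMax = std::numeric_limits<double>::lowest();
   for (int J = 0; J < N; ++J)
      Body(J, AccumSum, AccumMax);
   return {AccumSum, AccumMax};
}

// Returns {sum, max} of Values, reading each element once per iteration.
std::pair<double, double> sumAndMax(const std::vector<double> &Values) {
   return serialSumMaxReduce(
       static_cast<int>(Values.size()),
       [&Values](int J, double &AccumSum, double &AccumMax) {
          AccumSum += Values[J];
          AccumMax = std::max(AccumMax, Values[J]);
       });
}
```

Fusing the two reductions means the data is traversed once instead of twice, which is the motivation for passing several reduction targets to a single call.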

components/omega/doc/index.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ devGuide/Linting
 devGuide/Docs
 devGuide/BuildDocs
 devGuide/DataTypes
+devGuide/ParallelLoops
 devGuide/MachEnv
 devGuide/Config
 devGuide/Driver
