Commit eef17c1

Merge pull request #32479 from NamjaeChoi/kokkos_thread
Kokkos add surface normal, code cleanup using variadic template
2 parents e7d03d6 + 304af65 commit eef17c1

35 files changed

Lines changed: 1204 additions & 1353 deletions
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# KokkosDirectionalNeumannBC
+
+!if! function=hasCapability('kokkos')
+
+This is the Kokkos version of [DirectionalNeumannBC](DirectionalNeumannBC.md). See the original document for details.
+
+## Example Input Syntax
+
+!listing test/tests/kokkos/bcs/directional_neumann/kokkos_2d_directional_neumann_bc_test.i start=[top] end=[] include-end=true
+
+!syntax parameters /BCs/KokkosDirectionalNeumannBC
+
+!syntax inputs /BCs/KokkosDirectionalNeumannBC
+
+!syntax children /BCs/KokkosDirectionalNeumannBC
+
+!if-end!
+
+!else
+!include kokkos/kokkos_warning.md

framework/doc/content/syntax/Kokkos/index.md

Lines changed: 8 additions & 8 deletions
@@ -20,22 +20,22 @@ Here we provide instructions on some programming practices to write an efficient
 
 ### Separate Memory Space
 
-Except for a very few rare cases that provide physically unified memory space for CPU and GPU (such as AMD MI300A), most GPUs have separate memory spaces from their conuterpart CPUs.
-Therefore, the you need to take a special care to properly identify which data are accessible and not accessible on either CPU or GPU and whether the data on CPU and GPU are properly synchronized.
+Except for a very few rare cases that provide physically unified memory space for CPU and GPU (such as AMD MI300A), most GPUs have separate memory spaces from their counterpart CPUs.
+Therefore, you need to take a special care to properly identify which data are accessible and not accessible on either CPU or GPU and whether the data on CPU and GPU are properly synchronized.
 Standard containers such as `std::vector`, `std::set`, `std::map` and others are not usable on GPU, and managed pointers are also inaccessible on GPU.
 Basically, it is safe to assume that any dynamically-allocated data on CPU cannot be accessed on GPU.
 Therefore, we provide alternative data containers to be used on GPU: `Moose::Kokkos::Array`, `Moose::Kokkos::JaggedArray`, and `Moose::Kokkos::Map`.
 
 `Moose::Kokkos::Array` is a template class designed to hold arbitrary type of data.
 It receives up to four template arguments: data type, dimension, index type, and layout type.
-It supports multi-dimensional indexing, and up to five-dimensional arrays are supported.
+It supports multi-dimensional indexing.
 The dimension can either be specified through the second template argument with the default being one-dimension or using type aliases: for instance, a three-dimensional array of type `double` can be declared either by `Array<double, 3>` or `Array3D<double>`.
 The entries of an array can be accessed with either `operator()` with multi-dimensional indices or `operator[]` with a flattened, dimensionless index, where the flattening follows a layout in which the innermost dimension varies the fastest.
+They automatically return either CPU or GPU data depending on where they are being accessed.
 The index type template argument is set to 8-byte integer by default to accomodate large arrays.
 However, 8-byte integer computation is significantly more expensive than 4-byte integer computation.
 If your array size is small enough, consider using 4-byte indices to optimize index calculations.
 If having the outermost dimension run the fastest is desired for multi-dimensional arrays, the fourth layout template argument can be optionally set to `Moose::Kokkos::LayoutType::RIGHT` (default is `LEFT`).
-They automatically return either CPU or GPU data depending on where they are being accessed.
 Arrays can be allocated through the following APIs: `create()`, `createHost()`, and `createDevice()`.
 `create()` allocates memories on both CPU and GPU, while `createHost()` or `createDevice()` only allocates memory on either CPU or GPU.
 It is important to note that if the creation APIs are called for an initialized array, the original array will be destroyed and a new array will be created.
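The index flattening that the documentation hunk above describes for `operator[]` can be illustrated with a small standalone sketch. This is not the `Moose::Kokkos::Array` implementation: the `Layout` enum and the `flatten` helper are hypothetical names introduced only for illustration. It also assumes that "innermost" means the first index, as in the Kokkos `LayoutLeft` convention, so that the `LEFT`-style layout gives the first index unit stride and the `RIGHT`-style layout gives the last index unit stride. The `Index` template parameter stands in for the choice between 8-byte and 4-byte index types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the layout choice; NOT the MOOSE enum.
enum class Layout
{
  Left, // first index has unit stride (assumed to match the default layout)
  Right // last index has unit stride
};

// Flatten a 3D index (i, j, k) into a 1D offset for an array with extents dims.
// Index is the integer type used for the arithmetic, e.g. std::uint32_t for
// small arrays or std::uint64_t for large ones.
template <typename Index>
Index
flatten(const std::array<Index, 3> & dims, Index i, Index j, Index k, Layout layout)
{
  // Every index must lie inside its extent.
  assert(i < dims[0] && j < dims[1] && k < dims[2]);

  if (layout == Layout::Left)
    return i + dims[0] * (j + dims[1] * k);

  return k + dims[2] * (j + dims[1] * i);
}
```

For extents `{2, 3, 4}`, stepping the first index by one moves the offset by 1 under `Layout::Left` but by 12 (that is, 3 × 4) under `Layout::Right`, which is why the layout should match the loop order of the kernel that reads the array.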
@@ -61,14 +61,15 @@ vector.aliasHost(petsc_ptr);
 vector.copyToDevice();
 ```
 
-If the data type is not default-constructable, `create()` will only allocate a raw block of uninitialized memory using `malloc()`.
-It is your responsibility to loop over the array and perform placement new to properly construct each entry.
+If the data type is not default-constructible or if you do not want to initialize array using the default constructor, you can set an optional template argument to `false` when calling `create()` or `createHost()`.
+It will only allocate a raw chunk of uninitialized memory using `malloc()` instead of `new`.
+Then, it becomes your responsibility to loop over the array and properly construct each entry using placement new.
 For example:
 
 ```cpp
 Array<NotDefaultConstructable> data;
 
-data.create(n);
+data.create<false>(n);
 
 for (auto & datum : data)
   new (&datum) NotDefaultConstructable(...);
@@ -139,7 +140,6 @@ It is divided into inner and outer arrays.
 The outer array is the regular part of a jagged array.
 Each entry of the outer array is the inner array, whose size can vary with each other.
 As a result, it is defined with up to five template arguments: the data type, inner array dimension size, outer array dimension size, outer array index type (defaults to 8-byte integer; inner arrays always use 4-byte integer), and inner array layout type (defaults to `Moose::Kokkos::LayoutType::LEFT`).
-Both inner and outer arrays can be up to three-dimensional.
 However, it is not possible to have inner arrays with different dimensions in a single jagged array.
 
 The accessors of a jagged array, `operator()` (dimensional) or `operator[]` (dimensionless), receive the indices for the outer array.

framework/doc/content/syntax/KokkosMaterials/index.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ Same applies to a class with in-class member initialization, which is equivalent
 Using a class with dynamic allocations will [incur a significant performance hit](syntax/Kokkos/index.md#kokkos_dynamic_allocation) and will break when it is used for stateful material properties.
 
 Instead, the material properties in Kokkos-MOOSE can be multi-dimensional to partially support the needs for dynamically-sized material properties.
-The dimension is provided as the second template argument `dimension`, which has the default value of 0 (scalar) and can be up to 4.
+The dimension is provided as the second template argument `dimension`, which has the default value of 0 (scalar).
 The size of each dimension is provied as a vector as the function argument `dims`.
 When a material property is declared by multiple materials, it should have the same dimension over the entire domain, while the size of each dimension can be different between non-overlapping subdomains.
 However, a material property declared by boundary-restricted materials should have identical dimension sizes over the entire domain, even though the materials do not have overlapping boundaries.

framework/include/interfaces/BlockRestrictable.h

Lines changed: 3 additions & 3 deletions
@@ -277,7 +277,7 @@ class BlockRestrictable
    * @param tid The thread ID
    * @returns The contiguous element ID
    */
-  KOKKOS_FUNCTION ContiguousElementID kokkosBlockElementID(ThreadID tid) const
+  KOKKOS_FUNCTION ContiguousElementID kokkosBlockElementID(Moose::Kokkos::ThreadID tid) const
   {
     return _kokkos_element_ids[tid];
   }
@@ -286,7 +286,7 @@ class BlockRestrictable
    * @param tid The thread ID
    * @returns the contiguous node ID
    */
-  KOKKOS_FUNCTION ContiguousElementID kokkosBlockNodeID(ThreadID tid) const
+  KOKKOS_FUNCTION ContiguousElementID kokkosBlockNodeID(Moose::Kokkos::ThreadID tid) const
   {
     return _kokkos_node_ids[tid];
   }
@@ -295,7 +295,7 @@ class BlockRestrictable
    * @param tid The thread ID
    * @returns The contiguous element ID - side index pair
    */
-  KOKKOS_FUNCTION auto kokkosBlockElementSideID(ThreadID tid) const
+  KOKKOS_FUNCTION auto kokkosBlockElementSideID(Moose::Kokkos::ThreadID tid) const
   {
     return _kokkos_element_side_ids[tid];
   }

framework/include/interfaces/BoundaryRestrictable.h

Lines changed: 2 additions & 2 deletions
@@ -261,7 +261,7 @@ class BoundaryRestrictable
    * @param tid The thread ID
    * @returns The contiguous node ID
    */
-  KOKKOS_FUNCTION ContiguousNodeID kokkosBoundaryNodeID(ThreadID tid) const
+  KOKKOS_FUNCTION ContiguousNodeID kokkosBoundaryNodeID(Moose::Kokkos::ThreadID tid) const
   {
     return _kokkos_node_ids[tid];
   }
@@ -270,7 +270,7 @@ class BoundaryRestrictable
    * @param tid The thread ID
    * @returns The contiguous element ID - side index pair
    */
-  KOKKOS_FUNCTION auto kokkosBoundaryElementSideID(ThreadID tid) const
+  KOKKOS_FUNCTION auto kokkosBoundaryElementSideID(Moose::Kokkos::ThreadID tid) const
   {
     return _kokkos_element_side_ids[tid];
   }
