[GeoMechanicsApplication] Investigate OpenMP implementation in Kratos

As a user I would like to perform computations faster. Kratos uses OpenMP threads to speed up computations.
The number of threads to use is provided by the user in the ProjectParameters.json file.

By default, Kratos is compiled at Deltares without OpenMP. To activate it, the CMake configuration must include `-DKRATOS_SHARED_MEMORY_PARALLELIZATION=OpenMP`.

OpenMP directives are mainly used at the Kratos core level, outside the applications:
1. The `ResidualBasedBlockBuilderAndSolver` class contains a number of
`#pragma omp for schedule(guided, 512) nowait`
directives on the for-loops over elements.
512 is the minimum chunk size; therefore, Kratos is tuned to use threads for large-scale calculations with thousands of elements.
For small-scale calculations, this value should be decreased. This can be done at compile time, for example by introducing a flag such as `KRATOS_LARGE_SCALE` or `KRATOS_LOW_SCALE` that selects an appropriate minimum chunk size.
Removing the 512 gave the following speed-up for the `lysmer_boundary_stiff_column2d_quad` test, which is part of `absorbing_boundary.py`. This test uses a mesh of only 50 elements.

| threads  | 1 | 2   | 4   | 6   | 8   | 10  |
|----------|---|-----|-----|-----|-----|-----|
| speed-up | 1 | 1.7 | 2.0 | 1.9 | 2.4 | 2.3 |
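
The compile-time switch suggested in item 1 could look roughly like the sketch below. It is not Kratos code: `MinChunkSize`, `AssembleSum` and the small-scale value of 1 are assumptions made for illustration, and `KRATOS_LARGE_SCALE` is only the flag name proposed in this issue, not an existing option. The loop body is a placeholder for the real element assembly.

```cpp
#include <vector>

// Hypothetical switch: KRATOS_LARGE_SCALE is the flag proposed in this issue.
#ifdef KRATOS_LARGE_SCALE
constexpr int MinChunkSize = 512; // value currently hard-coded in the builder and solver
#else
constexpr int MinChunkSize = 1;   // assumed value for small meshes
#endif

// Stand-in for the loop over elements: sums one contribution per element.
double AssembleSum(const std::vector<double>& rElementContributions)
{
    double total = 0.0;
    const int number_of_elements = static_cast<int>(rElementContributions.size());

    #pragma omp parallel
    {
        double local = 0.0; // per-thread partial sum

        // The second schedule argument is the minimum chunk size.
        #pragma omp for schedule(guided, MinChunkSize) nowait
        for (int i = 0; i < number_of_elements; ++i) {
            local += rElementContributions[i];
        }

        // Combine the per-thread partial sums.
        #pragma omp atomic
        total += local;
    }
    return total;
}
```

With a 50-element mesh and a minimum chunk size of 512, the guided schedule hands the whole loop to a single thread as one chunk, which is consistent with the speed-up only appearing once the 512 is removed. The sketch also compiles without OpenMP, in which case the pragmas are ignored and the loop runs serially.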

2. `block_for_each` is another place where OpenMP directives are used. It is mainly used for parallel loops over nodes, and it uses the number of threads returned by `int ParallelUtilities::GetNumProcs()`. The default return value is 1. Compiling with `-DCMAKE_CXX_FLAGS="-DKRATOS_SMP_OPENMP"` makes the function return the maximum number of threads available on the machine. The git history shows Riccardo Rossi and Phillip Bucher as the last people to touch this function.
`block_for_each` is also used in GeoMechanicsApplication, where we can replace it with, for example, a `#pragma omp for schedule(guided) nowait` directive.
There are two things to pay attention to: a) the OpenMP directive is applied to an **index** for-loop with an index of integral type; therefore, b) the following accessors need to be added to the `PointerVectorSet` class:
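```
    // Index-based access, returning a reference to the pointed-to object
    reference at(size_type i)
    {
        return *(mData[i]);
    }
    // Const overload of the index-based access
    const_reference at(size_type i) const
    {
        return *(mData[i]);
    }
```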
So far, `block_for_each` has been replaced with an index for-loop only in `GeneralizedNewmarkScheme::UpdateVariablesDerivatives`, and this had almost no effect on the speed-up of the test.
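
The replacement of `block_for_each` by an index for-loop can be sketched as follows. This is a self-contained illustration rather than the actual change: `PointerVector`, `Node` and `UpdateVelocities` are hypothetical stand-ins for `PointerVectorSet`, the Kratos node type and the scheme's update function, and the velocity formula is a placeholder, not the Newmark update.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for PointerVectorSet: stores pointers and exposes at(i).
template <class T>
class PointerVector
{
public:
    using size_type = std::size_t;

    void push_back(std::shared_ptr<T> pItem) { mData.push_back(std::move(pItem)); }
    size_type size() const { return mData.size(); }

    // Same shape as the accessors proposed for PointerVectorSet.
    T& at(size_type i) { return *(mData[i]); }
    const T& at(size_type i) const { return *(mData[i]); }

private:
    std::vector<std::shared_ptr<T>> mData;
};

// Hypothetical node type with two scalar values.
struct Node
{
    double displacement = 0.0;
    double velocity = 0.0;
};

// Index for-loop over nodes in place of a block_for_each call.
void UpdateVelocities(PointerVector<Node>& rNodes, double DeltaTime)
{
    // A signed integral index keeps the loop acceptable to older OpenMP versions.
    const int number_of_nodes = static_cast<int>(rNodes.size());

    #pragma omp parallel
    {
        #pragma omp for schedule(guided) nowait
        for (int i = 0; i < number_of_nodes; ++i) {
            Node& r_node = rNodes.at(static_cast<std::size_t>(i));
            r_node.velocity = r_node.displacement / DeltaTime; // placeholder update
        }
    }
}
```

Each iteration writes to a different node only, so no synchronisation is needed inside the loop. Whether this form is actually faster than `block_for_each` would still have to be measured on a larger mesh.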

**Future Actions**
 1. Add a `KRATOS_LARGE_SCALE` flag for compilation.
 2. Contact Riccardo Rossi and Phillip Bucher to discuss the OpenMP implementation in Kratos.
 3. Verify the use of UDSM/UMAT models in multithreaded calculations and fix it if needed.



[GeoMechanicsApplication] Investigate OpenMP implementation in Kratos #13889
