[GeoMechanicsApplication] Investigate OpenMP implementation in Kratos #13889

@markelov208

Description

As a user, I would like computations to run faster. Kratos uses OpenMP threads to speed up computations.
The user sets the number of threads in the ProjectParameters.json file.

By default, Kratos is compiled at Deltares without OpenMP. To enable it, pass -DKRATOS_SHARED_MEMORY_PARALLELIZATION=OpenMP to cmake.

OpenMP directives are used mainly in the Kratos core, outside the applications:

  1. The ResidualBasedBlockBuilderAndSolver class contains several directives of the form
    # pragma omp for schedule(guided, 512) nowait
    on for-loops over elements.
    512 is the minimum chunk size; Kratos is therefore tuned to use threads for large-scale calculations with thousands of elements.
    For small-scale calculations, this value should be decreased. This could be done at compile time, for example, by introducing a flag such as KRATOS_LARGE_SCALE or KRATOS_LOW_SCALE that selects an appropriate minimum chunk size.
    Removing the 512 gave the following speed-up for the lysmer_boundary_stiff_column2d_quad test, which is part of absorbing_boundary.py. This test uses a mesh of only 50 elements.
    threads  | 1 | 2   | 4   | 6   | 8   | 10
    speed-up | 1 | 1.7 | 2.0 | 1.9 | 2.4 | 2.3

  2. block_for_each is the other place where OpenMP directives are used. It is mainly used for parallel loops over nodes, and it uses the number of threads returned by int ParallelUtilities::GetNumProcs(). The default return value is 1. Compiling with -DCMAKE_CXX_FLAGS="-DKRATOS_SMP_OPENMP" makes the function return the maximum number of threads available on the machine. git shows Riccardo Rossi and Phillip Bucher as the last people to touch this function.
    block_for_each is also used in GeoMechanicsApplication, where we can replace it with, for example, a # pragma omp for schedule(guided) nowait directive.
    Two things need attention: a) OpenMP requires an index-based for-loop with an integral index type; therefore, b) the following accessors need to be added to the PointerVectorSet class:

    reference at(size_type i)
    {
        return *(mData[i]);
    }
    const_reference at(size_type i) const
    {
        return *(mData[i]);
    }

So far, block_for_each has been replaced with an index-based for-loop only in GeneralizedNewmarkScheme::UpdateVariablesDerivatives, and this had almost no effect on the test speed-up.
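
To make the index-based loop concrete, here is a minimal, self-contained sketch. PointerVectorSketch, NodeSketch and UpdateVelocities are hypothetical stand-ins invented for illustration, not Kratos classes; only the two at() accessors mirror the ones proposed above (with a debug-only bounds check added). The sketch uses the combined "parallel for" form so that it runs on its own, whereas the "omp for ... nowait" form quoted above has to sit inside an existing parallel region.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pointer-holding container such as
// PointerVectorSet: it stores pointers and hands out references.
template <class T>
class PointerVectorSketch {
public:
    using size_type = std::size_t;
    using reference = T&;
    using const_reference = const T&;

    void push_back(std::unique_ptr<T> p) { mData.push_back(std::move(p)); }
    size_type size() const { return mData.size(); }

    // Index accessors as proposed in the issue, plus a debug bounds check.
    reference at(size_type i)
    {
        assert(i < mData.size());
        return *(mData[i]);
    }
    const_reference at(size_type i) const
    {
        assert(i < mData.size());
        return *(mData[i]);
    }

private:
    std::vector<std::unique_ptr<T>> mData;
};

// Hypothetical node type carrying two scalar values.
struct NodeSketch {
    double displacement = 0.0;
    double velocity = 0.0;
};

// Index-based loop over nodes. A signed integral index is used because
// older OpenMP implementations only accept signed loop variables.
// Each iteration touches a different node, so no synchronisation is needed.
inline void UpdateVelocities(PointerVectorSketch<NodeSketch>& rNodes, double dt)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(rNodes.size());
#pragma omp parallel for schedule(guided)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        NodeSketch& r_node = rNodes.at(static_cast<std::size_t>(i));
        r_node.velocity = r_node.displacement / dt;
    }
}
```

Without OpenMP enabled at compile time, the pragma is ignored and the loop runs serially with the same result.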

Future Actions

  1. Add a KRATOS_LARGE_SCALE flag for compilation
  2. Contact Riccardo Rossi and Phillip Bucher to discuss the OpenMP implementation in Kratos
  3. Verify the use of UDSM/UMAT models in multithreaded calculations and fix it if needed.
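
Action 1 could look roughly like the sketch below. It is an illustration under assumptions, not existing Kratos code: KRATOS_LARGE_SCALE is the flag name proposed in this issue and does not exist in Kratos today, and kMinChunkSize and ElementForces are hypothetical names. The loop body is a placeholder for the per-element work done in the builder and solver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Select the minimum chunk size at compile time.
// KRATOS_LARGE_SCALE is the (not yet existing) flag proposed in this issue.
#ifdef KRATOS_LARGE_SCALE
constexpr int kMinChunkSize = 512; // current hard-coded value
#else
constexpr int kMinChunkSize = 1;   // same as plain schedule(guided)
#endif

// Placeholder for a loop over elements: each iteration writes only its
// own entry, so the iterations are independent.
inline std::vector<double> ElementForces(const std::vector<double>& rStiffness,
                                         const std::vector<double>& rDisplacement)
{
    assert(rStiffness.size() == rDisplacement.size());
    std::vector<double> forces(rStiffness.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(rStiffness.size());

#pragma omp parallel
    {
        // Same directive shape as quoted in the issue, with the literal 512
        // replaced by the compile-time constant.
#pragma omp for schedule(guided, kMinChunkSize) nowait
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::size_t k = static_cast<std::size_t>(i);
            forces[k] = rStiffness[k] * rDisplacement[k];
        }
    }
    return forces;
}
```

With a minimum chunk of 512, a 50-element mesh like the one in the test above fits in a single chunk, so one thread ends up doing all the work; a minimum of 1 lets the guided schedule split the loop across threads.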
