Description
As a user, I would like computations to run faster. Kratos uses OpenMP threads to speed up computations.
The number of threads to use is set by the user in the ProjectParameters.json file.
By default, Kratos is compiled at Deltares without OpenMP. To activate it, the CMake configuration must include -DKRATOS_SHARED_MEMORY_PARALLELIZATION=OpenMP.
OpenMP directives are mainly used in the Kratos core rather than in the applications:
-
The ResidualBasedBlockBuilderAndSolver class contains several directives of the form
# pragma omp for schedule(guided, 512) nowait
applied to the for-loops over elements.
Here 512 is the minimum chunk size, so Kratos is tuned to use threads for large-scale calculations with thousands of elements.
For small-scale calculations this value should be decreased. This could be done at compile time, for example by introducing a flag such as KRATOS_LARGE_SCALE or KRATOS_LOW_SCALE that selects an appropriate minimum chunk size.
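A minimal sketch of how such a compile-time switch might look is given below. It is not Kratos code: KRATOS_LARGE_SCALE is only the flag name proposed in this issue, while MinChunkSize and ScaleElements are hypothetical names used for illustration. For brevity the sketch uses a combined parallel for instead of the omp for ... nowait form quoted above.

```cpp
#include <vector>

// Hypothetical flag: KRATOS_LARGE_SCALE is the name proposed in this issue,
// not an existing Kratos build option.
#ifdef KRATOS_LARGE_SCALE
constexpr int MinChunkSize = 512; // value currently hard-coded in the builder and solver
#else
constexpr int MinChunkSize = 1;   // OpenMP's default minimum for schedule(guided)
#endif

// Stand-in for an assembly loop: one entry per "element", each entry is doubled.
std::vector<double> ScaleElements(const std::vector<double>& rValues)
{
    std::vector<double> result(rValues.size());
    const int n = static_cast<int>(rValues.size());

    // The minimum chunk size is selected at compile time by the flag above.
    #pragma omp parallel for schedule(guided, MinChunkSize)
    for (int i = 0; i < n; ++i) {
        result[i] = 2.0 * rValues[i];
    }
    return result;
}
```

With a mesh of about 50 elements, a minimum chunk of 512 puts the whole loop into a single chunk, so only one thread does any work. A minimum of 1 lets the guided schedule split the loop across threads.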
Removing the 512 gave the following speed-up for the lysmer_boundary_stiff_column2d_quad test, which is part of absorbing_boundary.py. This test uses a mesh of only 50 elements.
| threads | 1 | 2 | 4 | 6 | 8 | 10 |
| --- | --- | --- | --- | --- | --- | --- |
| speed-up | 1 | 1.7 | 2.0 | 1.9 | 2.4 | 2.3 |
-
block_for_each is another place where OpenMP directives are used. It is mainly used for parallel loops over nodes, and it uses the number of threads returned by int ParallelUtilities::GetNumProcs(). The default return value is 1. Compiling with -DCMAKE_CXX_FLAGS="-DKRATOS_SMP_OPENMP" makes the function return the maximum number of threads of the computer. git shows Riccardo Rossi and Phillip Bucher as the last people to have touched this function.
block_for_each is also used in GeoMechanicsApplication, where we can replace it with, for example, a # pragma omp for schedule(guided) nowait directive.
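The behaviour described for ParallelUtilities::GetNumProcs() can be pictured with the sketch below. It is an assumption about the general pattern, not the actual Kratos implementation: GetNumProcsSketch is a hypothetical name, and only the KRATOS_SMP_OPENMP define comes from the text above.

```cpp
// KRATOS_SMP_OPENMP is the define mentioned above; everything else is hypothetical.
#ifdef KRATOS_SMP_OPENMP
#include <omp.h>
#endif

int GetNumProcsSketch()
{
#ifdef KRATOS_SMP_OPENMP
    // OpenMP runtime call: number of processors available to the program.
    return omp_get_num_procs();
#else
    // Without the define, the code falls back to a single thread.
    return 1;
#endif
}
```

If the define is missing, loops that size their thread count from such a function stay serial even when the rest of the build has OpenMP enabled.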
There are two things to pay attention to: a) OpenMP uses an index-based for-loop with an integral index type; therefore, b) the following accessors need to be added to the PointerVectorSet class:
    reference at(size_type i)
    {
        return *(mData[i]);
    }
    const_reference at(size_type i) const
    {
        return *(mData[i]);
    }
So far, block_for_each has been replaced with an index-based for-loop only in GeneralizedNewmarkScheme::UpdateVariablesDerivatives, and this had almost no effect on the test speed-up.
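The index-based loop described above can be sketched as follows. PointerContainerSketch is a simplified, hypothetical stand-in for PointerVectorSet (the real class has a much larger interface), and ScaleAll is an illustrative function, not the code of UpdateVariablesDerivatives.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for PointerVectorSet: it stores pointers
// and exposes the index-based at() accessors proposed in this issue.
template <class T>
class PointerContainerSketch
{
public:
    using size_type = std::size_t;
    using reference = T&;
    using const_reference = const T&;

    void push_back(T Value) { mData.push_back(std::make_shared<T>(std::move(Value))); }
    size_type size() const { return mData.size(); }

    reference at(size_type i) { return *(mData[i]); }
    const_reference at(size_type i) const { return *(mData[i]); }

private:
    std::vector<std::shared_ptr<T>> mData;
};

// Index-based loop with an integral index, suitable for an OpenMP worksharing loop.
void ScaleAll(PointerContainerSketch<double>& rValues, double Factor)
{
    const int n = static_cast<int>(rValues.size());

    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < n; ++i) {
        // Each iteration touches a different entry, so no synchronisation is needed.
        rValues.at(static_cast<std::size_t>(i)) *= Factor;
    }
}
```

Because every iteration writes to its own entry, the loop body needs no locking. A loop body that accumulates into shared data would need a reduction or atomic updates instead.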
Future Actions
- Add a KRATOS_LARGE_SCALE flag for compilation
- Contact Riccardo Rossi and Phillip Bucher to discuss the OpenMP implementation in Kratos
- Verify the use of UDSM/UMAT models in multithreaded calculations and fix it if needed.