Description
Ahead of planned large-scale studies, we need to evaluate the performance and scalability of CADET-Core’s parallelization. The goal is to identify bottlenecks, memory issues, and opportunities for improvement.
Parallelization Overview
CADET uses two levels of parallelization:
- System-level parallelization: parallelization per unit in a system of unit operations
- Unit-operation-level parallelization:
  - Spatial scheme parallelization (FV only; DG is currently serial)
  - Parallelization of model components (particles, bulk, etc.)
  - Jacobian factorization parallelization
    - Supported: GRM, LRMP + FV
    - Not supported: Eigen/SparseLU-based units
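Because parts of the stack remain serial (DG discretization, Eigen/SparseLU-based units), the achievable speedup is bounded no matter how many threads are used. A rough way to reason about this is Amdahl's law. The sketch below assumes a single fixed serial fraction of the run time; the function name and the 20 % value are hypothetical and would have to be replaced by numbers obtained from profiling.

```python
def amdahl_speedup(serial_fraction: float, nthreads: int) -> float:
    """Upper bound on speedup (Amdahl's law) when part of the run time stays serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if nthreads < 1:
        raise ValueError("nthreads must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial part is unchanged; only the parallel part is divided across threads.
    return 1.0 / (serial_fraction + parallel_fraction / nthreads)


if __name__ == "__main__":
    # Hypothetical serial fraction of 20 % (e.g. time spent in unparallelized code).
    for n in (1, 2, 4, 8, 16):
        print(f"NTHREADS={n:2d}: max speedup {amdahl_speedup(0.2, n):.2f}")
```

With a 20 % serial fraction, the bound at 16 threads is 1 / (0.2 + 0.8/16) = 4, which is why locating the serial parts is as important as measuring raw scaling.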
Parallelization is enabled at compile time via the CMake option -DENABLE_THREADING=ON (default: OFF).
At run time, NTHREADS sets the number of threads (see the CADET docs).
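Since NTHREADS is the run-time knob, a benchmark sweep can time one simulation per thread count and keep the median of several repetitions to damp run-to-run noise. The sketch below assumes a user-supplied callable run_simulation(nthreads) that sets NTHREADS in the simulation input and runs CADET; that hook is hypothetical and not part of any existing CADET tooling.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_thread_sweep(
    run_simulation: Callable[[int], None],
    thread_counts: Iterable[int] = (1, 2, 4, 8, 16),
    repetitions: int = 5,
) -> Dict[int, float]:
    """Return the median wall-clock time in seconds for each thread count."""
    if repetitions < 1:
        raise ValueError("repetitions must be >= 1")
    medians: Dict[int, float] = {}
    for n in thread_counts:
        samples = []
        for _ in range(repetitions):
            start = time.perf_counter()
            # Hypothetical hook: runs one simulation with NTHREADS = n.
            run_simulation(n)
            samples.append(time.perf_counter() - start)
        # Median is less sensitive to outliers (e.g. a cold first run) than the mean.
        medians[n] = statistics.median(samples)
    return medians
```

Running all thread counts on the same machine, with nothing else under load, keeps the numbers comparable.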
Known Issue
A previously observed issue:
The threaded build running with NTHREADS=1 was slower than the non-threaded build.
This is the main reason we currently distribute CADET-Core without parallelization enabled.
I think this was observed by @schmoelder. Do we have an MRE for this behavior?
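An MRE for this would ideally include paired timings from both builds on the same input and machine. One simple way to quantify the slowdown is the relative difference of the median run times; the function below is a hypothetical helper, and the choice of the median is our assumption rather than an established CADET procedure.

```python
import statistics
from typing import Sequence


def single_thread_overhead(
    threaded_times: Sequence[float], serial_times: Sequence[float]
) -> float:
    """Relative slowdown of the threaded build (NTHREADS=1) vs. the non-threaded build.

    Positive values mean the threaded build is slower, e.g. 0.15 means 15 % slower.
    """
    if not threaded_times or not serial_times:
        raise ValueError("need at least one timing per build")
    t_threaded = statistics.median(threaded_times)
    t_serial = statistics.median(serial_times)
    return (t_threaded - t_serial) / t_serial
```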
Specific goals
- Measure performance for NTHREADS = 1, 2, 4, 8, 16 on representative cases
- Identify bottlenecks
- Investigate the slowdown in 1-thread mode
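Once median wall-clock times per thread count are available, speedup and parallel efficiency make scaling comparable across test cases. The sketch below assumes the times come in a dict keyed by NTHREADS and uses the NTHREADS=1 run of the same build as baseline; the helper name is hypothetical.

```python
from typing import Dict, Tuple


def scaling_table(median_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map NTHREADS -> (speedup, parallel efficiency) relative to the 1-thread run."""
    if 1 not in median_times:
        raise ValueError("an NTHREADS=1 baseline is required")
    baseline = median_times[1]
    table: Dict[int, Tuple[float, float]] = {}
    for n, t in sorted(median_times.items()):
        speedup = baseline / t  # > 1 means faster than the single-thread run
        efficiency = speedup / n  # 1.0 would be perfect linear scaling
        table[n] = (speedup, efficiency)
    return table
```

Efficiency dropping well below 1 as NTHREADS grows points to serial sections or synchronization overhead worth profiling.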
Reproduction Inputs
@jbreue16 will configure some test cases for the performance measurements.
@schmoelder, please look for an example of the single-thread slowdown.