Testcases for Strong Scaling Analysis #5931

lnghrdntcr · 2025-06-04T16:04:02Z

lnghrdntcr
Jun 4, 2025

Hi all!
I'm just starting benchmarking WarpX and so far loving the developer experience (don't remember ever taking just 5' to get to a compiled binary - gotta say: other PIC sims need to take a page out of this book 😆).

Wanted to perform some coarse-grained strong scaling analysis, keeping it at 1 MPI rank and scaling the number of OMP threads. However, I'm finding that most of the testcases provided in Examples/Tests and Examples/Physics_applications do not scale much as I vary the number of threads. The documentation notes that Uniform Plasma is the testcase usually employed for these kinds of experiments, but I'm having a tough time achieving good scaling, even after bumping up amr.n_cell.

Here's a snippet of the modification I applied:

amr.n_cell = 256 256 256     # up from 64 32 32
amr.max_grid_size = 64       # up from 32
amr.blocking_factor = 64     # up from 16
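
As a rough sanity check on how much work there is to distribute, the number of boxes implied by these settings can be estimated. This is only a sketch: it assumes the domain is simply chopped into boxes of at most `amr.max_grid_size` cells per direction, ignores any further splitting or tiling done at runtime, and `count_boxes` is a hypothetical helper, not a WarpX or AMReX function.

```python
import math


def count_boxes(n_cell, max_grid_size):
    """Estimate the number of boxes for a single-level domain.

    Assumption: each direction is chopped into pieces of at most
    max_grid_size cells; runtime tiling and extra splitting are ignored.
    """
    return math.prod(math.ceil(n / max_grid_size) for n in n_cell)


# Settings before and after the modification quoted above.
print(count_boxes((64, 32, 32), 32))     # 2 * 1 * 1 boxes
print(count_boxes((256, 256, 256), 64))  # 4 * 4 * 4 boxes
```

Under that assumption the original setup has only a couple of boxes, while the modified one has 64, which is worth keeping in mind when comparing thread counts against the number of independent work units.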

Also, I'm using Avg. [time] per step as the metric to assess the scaling. Do you think that's the right choice?
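
For context, here is a minimal sketch of how the average time per step could be turned into strong-scaling speedup and parallel efficiency. The timings are made-up placeholders, not measurements, and `scaling_table` is a hypothetical helper name.

```python
def scaling_table(avg_step_time):
    """Map thread count -> (speedup, efficiency).

    avg_step_time: dict {n_threads: average wall time per step in seconds}.
    Both quantities are relative to the run with the fewest threads.
    """
    base_threads = min(avg_step_time)
    base_time = avg_step_time[base_threads]
    table = {}
    for n in sorted(avg_step_time):
        speedup = base_time / avg_step_time[n]
        # Ideal strong scaling gives speedup == n / base_threads.
        efficiency = speedup * base_threads / n
        table[n] = (speedup, efficiency)
    return table


# Placeholder timings (seconds per step), NOT real measurements.
timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}

for n, (speedup, efficiency) in scaling_table(timings).items():
    print(f"{n:3d} threads: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

The problem size stays fixed across all runs, so any drop in efficiency reflects overhead or limited parallel work rather than a change in workload.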

Misc info: Testing on the Intel Xeon Max 9460, scaling OMP threads on a single socket, with WarpX compiled with clang-19 and everything more or less default, except precision set to single instead of double.

Thanks a bunch!

ax3l · 2025-07-01T22:57:02Z

ax3l
Jul 1, 2025
Maintainer

> I'm just starting benchmarking WarpX and so far loving the developer experience (don't remember ever taking just 5' to get to a compiled binary - gotta say: other PIC sims need to take a page out of this book 😆).

Wondering if praise or sarcasm :D

But we try our best: #5984

Also, CCache works well with WarpX :)

ax3l · 2025-07-01T23:29:22Z

ax3l
Jul 1, 2025
Maintainer

Hi @lnghrdntcr,

Thank you for the question!

> Wanted to perform some coarse-grained strong scaling analysis, keeping it at 1 MPI rank and scaling the number of OMP threads. However, I'm finding that most of the testcases provided in Examples/Tests and Examples/Physics_applications do not scale much as I vary the number of threads.
> [...]
> Documentation notes that Uniform Plasma is the testcase usually employed for these kinds of experiments
> [...]
> Misc info: Testing on the Intel Xeon Max 9460, scaling OMP threads on a single socket, with WarpX compiled with clang-19 and everything more or less default, except precision set to single instead of double.

That's a good test and a nice CPU!

So yes, we usually run a mix of MPI + OpenMP on modern CPUs, because OpenMP works best when used on a single socket, as you noted, and even more so on the same memory bus of the CPU. This chip seems to use a multi-chip module (MCM) approach based on a tiled architecture.
The 40 physical cores of the Xeon Max 9460 are distributed evenly across four of these tiles, meaning each tile contains 10 physical CPU cores. Each physical core also supports 2x SMT/hyperthreading, but we usually do not see a benefit from that.

I did not find much about the layout of that chip here [and here], so the info above is what I found by searching and asking Google Gemini about your chip. It still needs to be verified, e.g., in an Intel whitepaper.

Assuming what I write above is correct, I would do up to 10 OpenMP threads per process, and then scale further with up to 4 MPI processes on that single chip. Make sure to pin the OpenMP threads to the cores closest to the MPI process that governs them and spread the MPI processes out.
To test the impact of SMT/Hyperthreading, go up to 20 OpenMP threads per process.
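
A small sketch of how such a scan could be generated, assuming the 4 tiles x 10 cores layout above is right. The executable name `./warpx` and the inputs file name `inputs` are placeholders, and binding the MPI ranks themselves to tiles is launcher-specific and deliberately not shown; only the standard OpenMP environment variables `OMP_NUM_THREADS`, `OMP_PLACES`, and `OMP_PROC_BIND` are used.

```python
CORES_PER_TILE = 10  # assumption from the text: 40 cores / 4 tiles, unverified
N_TILES = 4          # assumption, see above
EXE = "./warpx"      # placeholder executable name
INPUTS = "inputs"    # placeholder inputs file name


def launch_line(n_ranks, n_threads):
    """Build a command line string for one (MPI ranks, OpenMP threads) point.

    Rank-to-tile binding depends on the MPI launcher and is not included;
    whether the environment reaches all ranks also depends on the launcher.
    """
    # Beyond one thread per physical core we are testing SMT, so place
    # OpenMP threads on hardware threads instead of whole cores.
    places = "threads" if n_threads > CORES_PER_TILE else "cores"
    env = {
        "OMP_NUM_THREADS": str(n_threads),
        "OMP_PLACES": places,
        "OMP_PROC_BIND": "close",  # keep a rank's threads next to each other
    }
    env_str = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{env_str} mpiexec -n {n_ranks} {EXE} {INPUTS}"


def scan():
    """Thread scaling within one tile first, then add MPI ranks per tile."""
    configs = [(1, t) for t in (1, 2, 5, CORES_PER_TILE)]
    configs += [(r, CORES_PER_TILE) for r in (2, N_TILES)]
    configs.append((1, 2 * CORES_PER_TILE))  # SMT test on a single tile
    return configs


for ranks, threads in scan():
    print(launch_line(ranks, threads))
```

The printed lines are meant to be reviewed and adapted to the local MPI launcher before running, in particular to add its options for spreading and pinning the ranks.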

Does that help?
