Testcases for Strong Scaling Analysis #5931
Replies: 2 comments
Wondering if praise or sarcasm :D But we try our best: #5984. Also, CCache works well with WarpX :)
Hi @lnghrdntcr, Thank you for the question!
That's a good test and a nice CPU! So yes, we usually do a mix of MPI + OpenMP for modern CPUs, because OpenMP works best when used on a single socket, as you noted, and even more so on the same memory bus of the CPU. This chip seems to use a multi-chip module (MCM) approach based on a tiled architecture. I did not find much here about the layout of that chip [and here], so the info above is what I found by searching and by asking Google Gemini about your chip; it still needs to be verified, e.g., in an Intel whitepaper.

Assuming what I write above is correct, I would use up to 10 OpenMP threads per process, and then scale further with up to 4 MPI processes on that single chip. Make sure to pin the OpenMP threads to the cores closest to the MPI process that governs them, and spread the MPI processes out.

Does that help?
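To make the suggested layout concrete, here is a minimal Python sketch of how the cores could be split into one contiguous block per MPI process. It assumes 40 cores in total (4 processes times 10 threads, as suggested above) and that core IDs are numbered contiguously within each tile; both are assumptions to verify on the actual machine, e.g., with `lscpu` or hwloc. The function name `rank_core_blocks` is hypothetical and not part of WarpX.

```python
def rank_core_blocks(n_cores=40, n_ranks=4):
    """Split core IDs 0..n_cores-1 into equal contiguous blocks, one per MPI rank.

    Assumption: contiguous core IDs belong to the same tile. Check the real
    topology before using such a mapping for pinning.
    """
    if n_cores % n_ranks != 0:
        raise ValueError("n_cores must be divisible by n_ranks")
    per_rank = n_cores // n_ranks
    return [list(range(r * per_rank, (r + 1) * per_rank)) for r in range(n_ranks)]


# Print the block each rank's OpenMP threads would be pinned to.
for rank, cores in enumerate(rank_core_blocks()):
    print(f"rank {rank}: {len(cores)} OpenMP threads on cores {cores[0]}-{cores[-1]}")
```

The actual pinning would then be done through the MPI launcher and the OpenMP runtime; the sketch only shows the intended grouping of threads around their MPI process.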
Hi all!
I'm just starting benchmarking WarpX and so far loving the developer experience (don't remember ever taking just 5' to get to a compiled binary - gotta say: other PIC sims need to take a page out of this book 😆).
I wanted to perform a coarse-grained strong scaling analysis, keeping 1 MPI process and scaling the number of OMP threads. However, I'm finding that most of the test cases provided in `Examples/Tests` and `Examples/Physics_applications` do not scale very much when varying the number of threads. The documentation notes that Uniform Plasma is the test case usually employed for these kinds of experiments, but I'm having a tough time achieving good scaling, even when bumping up `amr.n_cells`.

Here's a snippet of the modification I applied:

Also, I'm using `Avg. [time] per step` as the metric to assess the scaling; do you think it's correct?

Misc info: Testing on the Intel Max 9460, scaling OMP threads on a single socket, WarpX compiled with clang-19, and with everything more or less default, except precision set to single instead of double.
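For reference, this is roughly how the per-step timings can be turned into strong scaling numbers. It is a minimal Python sketch, assuming the average time per step has been collected by hand for each thread count; the function name `strong_scaling` and the timings in the usage example are hypothetical placeholders, not measurements.

```python
def strong_scaling(avg_step_time):
    """Return (threads, speedup, efficiency) rows from {threads: avg time per step}.

    Speedup and efficiency are measured relative to the smallest thread count
    present in the input (usually 1).
    """
    base_threads = min(avg_step_time)
    base_time = avg_step_time[base_threads]
    rows = []
    for n in sorted(avg_step_time):
        speedup = base_time / avg_step_time[n]
        efficiency = speedup * base_threads / n
        rows.append((n, speedup, efficiency))
    return rows


# Hypothetical timings in seconds per step, for illustration only.
example = {1: 8.0, 2: 4.0, 4: 4.0}
for n, s, e in strong_scaling(example):
    print(f"{n:3d} threads: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that drops well below 1 as threads are added is what "does not scale very much" looks like in these terms.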
Thanks a bunch!