Skip to content

Clang UB sanitizer CI test: increase coverage #5597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lucafedeli88
Copy link
Member

@lucafedeli88 lucafedeli88 commented Jan 23, 2025

In CI test based on clang UB sanitizers, most of the time (~ 1h30) is spent in compiling the code, while just a few minutes are spent actually running some simulations. This means that we can increase the coverage of the test by adding some more simulations to the tests with a negligible increase of the total runtime.
This PR does just that: now most of the cases in Examples/Physics_applications are tested with the UB sanitizer.

Note that some cases cannot run in double precision (see below). For this reason, the PR also splits the UB sanitizer test into single precision and double precision (in double precision only the cases that cannot run in single precision are tested).

Updates:

1) Issue found while running inputs_test_3d_beam_beam_collision --> We need to run this case in double precision

The tool has found an issue while running mpirun -n 2 ./build/bin/warpx.3d Examples/Physics_applications/beam_beam_collision/inputs_test_3d_beam_beam_collision :

STEP 1 starts ...
/home/runner/work/WarpX/WarpX/build/_deps/fetchedpicsar-src/multi_physics/QED/include/picsar_qed/containers/picsar_tables.hpp:310:17: runtime error: -nan is outside the range of representable values of type 'int'
/home/runner/work/WarpX/WarpX/build/_deps/fetchedpicsar-src/multi_physics/QED/include/picsar_qed/containers/picsar_tables.hpp:310:17: runtime error: -nan is outside the range of representable values of type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/runner/work/WarpX/WarpX/build/_deps/fetchedpicsar-src/multi_physics/QED/include/picsar_qed/containers/picsar_tables.hpp:310:17 in 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/runner/work/WarpX/WarpX/build/_deps/fetchedpicsar-src/multi_physics/QED/include/picsar_qed/containers/picsar_tables.hpp:310:17 in 

I've temporarily commented out this case while I investigate the cause. For the moment, I am not able to reproduce the issue on my local machine. The issue is using single precision for this specific test case! Specifically, momenta end up being NaN and the sanitizer detects the attempt to convert a floating point NaN into an integer.
I have added an ABLASTR_ALWAYS_ASSERT_WITH_MESSAGE in PoissonSolver.H so that a more readable error message is provided to the user.

2) Issue found while running inputs_test_2d_background_mcc --> We need to run this case in double precision

MLMG does not converge in single precision for this simulation case. We need to run it in double precision.

3) Issue found while running free_electron_laser --> We need to run this case in double precision

I have observed this issue:

 STEP 444 starts ...
0::1::Assertion `m_current_z_lab[i_buffer] >= m_buffer_domain_lab[i_buffer].lo(m_moving_window_dir) and m_current_z_lab[i_buffer] <= m_buffer_domain_lab[i_buffer].hi(m_moving_window_dir)' failed, file "/home/runner/work/WarpX/WarpX/Source/Diagnostics/BTDiagnostics.cpp", line 870, Msg: 
 ### ERROR   : z-slice in lab-frame (0.299976) is outside the buffer domain
#            physical extent (0.299976 to 0.299988).
 !!!

which seems to be related to using single precision instead of double precision. Therefore, we need to run this case in double precision.

4) Issue found while running inputs_test_2d_laser_ion_acc --> bugfix in WarpX

inputs_test_2d_laser_ion_acc case has the following issue in single precision:

--- INFO    : Writing openPMD file diags/openPMDbw000000
terminate called after throwing an instance of 'std::runtime_error'
  what():  Datatypes of chunk data (FLOAT) and record component (DOUBLE) do not match.
SIGABRT
See Backtrace.0.0 file for details

This comes from the fact that the datatype of this dataset in ParticleHistogram2D.cpp is hard-coded as double:

    auto dataset = io::Dataset(
            io::determineDatatype<double>(),
            {static_cast<unsigned long>(m_bin_num_ord), static_cast<unsigned long>(m_bin_num_abs)});

this PR modifies these lines as follows:

    auto dataset = io::Dataset(
            io::determineDatatype<amrex::Real>(),
            {static_cast<unsigned long>(m_bin_num_ord), static_cast<unsigned long>(m_bin_num_abs)});

5) Perform all the simulations in double precision

See #5936

6) Use ctest instead of running the tests directly

7) In IntervalsParser prevent the use of an uninitialized object

This was discovered by adding a new test case.
temp_slice.getStart() > m_slices[i_slice].getStart() && i_slice < static_cast<int>(m_slices.size()) must be reversed: i_slice < static_cast<int>(m_slices.size()) && temp_slice.getStart() > m_slices[i_slice].getStart() otherwise we call a method of a non-existent object.

@lucafedeli88 lucafedeli88 added the component: tests Tests and CI label Jan 23, 2025
@lucafedeli88 lucafedeli88 changed the title Clang UB sanitizer CI test: increase coverage [WIP] Clang UB sanitizer CI test: increase coverage Jan 23, 2025
@lucafedeli88 lucafedeli88 changed the title [WIP] Clang UB sanitizer CI test: increase coverage Clang UB sanitizer CI test: increase coverage Jan 24, 2025
@lucafedeli88 lucafedeli88 requested review from EZoni and ax3l January 24, 2025 09:32
@lucafedeli88
Copy link
Member Author

ping, @EZoni !

@lucafedeli88 lucafedeli88 requested a review from dpgrote April 8, 2025 13:52
Copy link
Member

@dpgrote dpgrote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks!

I will comment that NaNs always make me nervous that there is something bad lurking in the code. Do you know why they appear in that case with single precision?

@lucafedeli88
Copy link
Member Author

This looks good, thanks!

I will comment that NaNs always make me nervous that there is something bad lurking in the code. Do you know why they appear in that case with single precision?

Thanks, @dpgrote . The NaN comes from this line in PoissonSolver.H. beta_solver[2] is 1.00059783 here due to numerical errors, so the square root is negative. We could indeed try to fix that.

geom[lev].CellSize(2)/std::sqrt(1._rt-beta_solver[2]*beta_solver[2]))};

@lucafedeli88
Copy link
Member Author

This looks good, thanks!
I will comment that NaNs always make me nervous that there is something bad lurking in the code. Do you know why they appear in that case with single precision?

Thanks, @dpgrote . The NaN comes from this line in PoissonSolver.H. beta_solver[2] is 1.00059783 here due to numerical errors, so the square root is negative. We could indeed try to fix that.

geom[lev].CellSize(2)/std::sqrt(1._rt-beta_solver[2]*beta_solver[2]))};

Actually, in this case, we have a 250 GeV electron beam. So we can't represent beta reliably in single precision for such high energies. I think that the best we can do is to add an assert to ensure that beta < 1 .

@lucafedeli88
Copy link
Member Author

@dpgrote , I've added this check inside PoissonSolver.H :

    ABLASTR_ALWAYS_ASSERT_WITH_MESSAGE (
        std::all_of(beta_solver.begin(), beta_solver.end(),
            [](const auto b){return (b<1.0_rt);}),
        "Components of beta_solver must be < 1.");

so that at least the user has a more readable error message.

@lucafedeli88 lucafedeli88 enabled auto-merge (squash) April 10, 2025 12:16
auto-merge was automatically disabled April 29, 2025 14:47

Pull request was closed

@lucafedeli88 lucafedeli88 reopened this Apr 29, 2025
@lucafedeli88 lucafedeli88 requested a review from EZoni May 13, 2025 15:23
@lucafedeli88 lucafedeli88 changed the title Clang UB sanitizer CI test: increase coverage [WIPClang UB sanitizer CI test: increase coverage Jun 11, 2025
@lucafedeli88 lucafedeli88 changed the title [WIPClang UB sanitizer CI test: increase coverage [WIP] Clang UB sanitizer CI test: increase coverage Jun 11, 2025
@lucafedeli88 lucafedeli88 changed the title [WIP] Clang UB sanitizer CI test: increase coverage Clang UB sanitizer CI test: increase coverage Jun 27, 2025
EZoni
EZoni previously approved these changes Jun 27, 2025
Copy link
Member

@EZoni EZoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Member

@EZoni EZoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last comment, in case it helps make this more compact and robust.

@EZoni EZoni dismissed their stale review June 27, 2025 18:55

I left one more comment to address.

@EZoni EZoni self-requested a review June 27, 2025 19:08
@EZoni EZoni merged commit 468e3f3 into BLAST-WarpX:development Jun 28, 2025
50 checks passed
atmyers pushed a commit to atmyers/WarpX that referenced this pull request Jul 3, 2025
In CI test based on clang UB sanitizers, most of the time (~ 1h30) is spent in compiling the code, while just a few minutes are spent actually running some simulations. This means that we can increase the coverage of the test by adding some more simulations to the tests with a negligible increase of the total runtime. This PR does just that: now all tests in `Examples/Physics_applications` are tested with the UB sanitizer.

Co-authored-by: Axel Huebl <[email protected]>
Co-authored-by: Edoardo Zoni <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: tests Tests and CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants