
Conversation

@dpgrote
Member

@dpgrote dpgrote commented Feb 12, 2025

The values of the physical constants were from CODATA 2018. These should be updated to the current accepted values as specified in CODATA 2022.

This breaks many CI benchmarks, since the checks are done at a precision tighter than the size of the changes in the constants.

This change is also needed since the older version of Ubuntu (which is used for the Azure CI tests) is being deprecated. This PR switches to the newest Ubuntu, which includes a newer version of scipy in which the constants have also been updated (since version 1.15.0, released on January 3).

Note that this PR uses fine-tuned values of mu0 and alpha so that the relation between the constants is exact. This means that these values will differ slightly from the values in scipy.constants.
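
As an illustration of the consistency issue (a minimal sketch, not code from this PR; it assumes the standard relation mu0 = 2*alpha*h / (e^2*c) and uses scipy.constants only for comparison):

```python
from scipy import constants as sc

# Derive mu0 from alpha via mu0 = 2*alpha*h / (e**2 * c).
# When alpha and mu0 are stored independently with a finite number of digits,
# the residual below is generally tiny but not exactly zero, which is why this
# PR fine-tunes mu0 and alpha so that the relation holds exactly in WarpX.
mu0_derived = 2.0 * sc.alpha * sc.h / (sc.e**2 * sc.c)
print((sc.mu_0 - mu0_derived) / sc.mu_0)
```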

Many of the CI tests were using locally defined constants in the input files and analysis scripts. These were all changed to rely on the built-in constants in the parser and scipy.constants.

@EZoni
Member

EZoni commented Feb 12, 2025

Thanks, Dave.

If there is a large number of benchmarks that need to be reset, this could be a good opportunity to test again our tool Tools/DevUtils/update_benchmarks_from_azure_output.py and its instructions in our documentation.

In theory, I updated and tested the tool manually in #5372. However, it is not tested automatically yet.

@dpgrote
Member Author

dpgrote commented Feb 12, 2025

> Thanks, Dave.
>
> If there is a large number of benchmarks that need to be reset, this could be a good opportunity to test again our tool Tools/DevUtils/update_benchmarks_from_azure_output.py and its instructions in our documentation.
>
> In theory, I updated and tested the tool manually in #5372. However, it is not tested automatically yet.

Thanks @EZoni ! It worked and was easy to do. BTW, to download the raw log file, I copied the URL from the location bar and pasted it into a curl command, "curl https://dev.azure.com/ECP-WarpX/... > raw_log", which made downloading it easy.
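
As a side note, the same download can be scripted (a minimal sketch; the URL below is only a placeholder for the raw-log address copied from the browser):

```python
import urllib.request

# Placeholder URL: paste the raw-log address copied from the browser location bar.
raw_log_url = "https://dev.azure.com/ECP-WarpX/..."
urllib.request.urlretrieve(raw_log_url, "raw_log")
```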

Note that almost all of the changes in the benchmarks are small as expected, ~1.e-9 or smaller. One exception is the test_3d_beam_beam_collision test, where errors of order 10% are seen, presumably because it runs for a long time, allowing the differences to grow.

@EZoni
Member

EZoni commented Feb 13, 2025

> Note that almost all of the changes in the benchmarks are small as expected, ~1.e-9 or smaller. One exception is the test_3d_beam_beam_collision test, where errors of order 10% are seen, presumably because it runs for a long time, allowing the differences to grow.

I agree. I think that test has relatively large tolerances anyways, if I remember correctly. @aeriforme, what do you think?

@EZoni
Member

EZoni commented Feb 13, 2025

> Thanks @EZoni ! It worked and was easy to do. BTW, to download the raw log file, I copied the URL from the location bar and pasted it into a curl command, "curl https://dev.azure.com/ECP-WarpX/... > raw_log", which made downloading it easy.

Thanks for pointing this out! I added this hint to our documentation in #5663.

@ax3l ax3l added the component: core Core WarpX functionality label Feb 19, 2025
@ax3l ax3l mentioned this pull request Feb 19, 2025
Member

@lucafedeli88 lucafedeli88 left a comment

Thanks @dpgrote ! I think that updating constants to CODATA 2022 is a good idea.

I've reviewed the logs of the tests that have failed.

I agree that issues with test_3d_beam_beam_collision.checksum (and also with test_2d_collision_xz_picmi.checksum) are likely due to the fact that these tests are relatively long.

We need to investigate a bit better why test_1d_ohm_solver_ion_beam_picmi and test_3d_qed_schwinger_2 show non-negligible discrepancies (I will have a look at the QED-related one).

We need to increase the tolerance of several analysis scripts, since they seem to be a bit too strict:
test_2d_theta_implicit_jfnk_vandb, test_2d_theta_implicit_jfnk_vandb_filtered, test_2d_theta_implicit_jfnk_vandb_picmi, test_2d_theta_implicit_strang_psatd, test_2d_pec_field_insulator_implicit_restart, test_3d_particle_boundaries, test_3d_load_external_field_grid_picmi, test_3d_particle_fields_diags, test_3d_reduced_diags.

@lucafedeli88
Member

> Thanks @dpgrote ! I think that updating constants to CODATA 2022 is a good idea.
>
> I've reviewed the logs of the tests that have failed.
>
> I agree that issues with test_3d_beam_beam_collision.checksum (and also with test_2d_collision_xz_picmi.checksum) are likely due to the fact that these tests are relatively long.
>
> We need to investigate a bit better why test_1d_ohm_solver_ion_beam_picmi and test_3d_qed_schwinger_2 show non-negligible discrepancies (I will have a look at the QED-related one).
>
> We need to increase the tolerance of several analysis scripts, since they seem to be a bit too strict: test_2d_theta_implicit_jfnk_vandb, test_2d_theta_implicit_jfnk_vandb_filtered, test_2d_theta_implicit_jfnk_vandb_picmi, test_2d_theta_implicit_strang_psatd, test_2d_pec_field_insulator_implicit_restart, test_3d_particle_boundaries, test_3d_load_external_field_grid_picmi, test_3d_particle_fields_diags, test_3d_reduced_diags.

I can provide a possible explanation for the QED-related test. Looking at the analysis script I found this comment:

elif test_number == "2":
    # Second Schwinger test with stronger EM field. Many pairs are created and a Gaussian
    # distribution is used to get the weights of the particles. This is the most sensitive test
    # because the relative std is extremely low.
    Ex_test = 1.0e18
    Bx_test = 1679288857.0516706
    By_test = 525665014.1557486
    Bz_test = 1836353079.9561853

Note that the analysis script uses these constants (CODATA 2014, I think):

c = 299792458.
m_e = 9.10938356e-31
e = 1.6021766208e-19
hbar = 1.054571800e-34

while PICSAR-QED uses (CODATA 2018):

template<typename RealType = double>
constexpr auto light_speed = RealType(299792458.);
    
template<typename RealType = double>
constexpr auto electron_mass = RealType(9.1093837015e-31);

template<typename RealType = double>
constexpr auto elementary_charge = RealType(1.602176634e-19);
    
template<typename RealType = double>
constexpr auto reduced_plank = RealType(1.054571817e-34);
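
For reference, a minimal sketch of how an analysis script could pull these values from scipy.constants instead of hard-coding them (a suggestion, not the current script; with SciPy >= 1.15.0 these follow CODATA 2022, older releases use CODATA 2018):

```python
from scipy import constants

c = constants.c        # speed of light in vacuum [m/s]
m_e = constants.m_e    # electron mass [kg]
e = constants.e        # elementary charge [C]
hbar = constants.hbar  # reduced Planck constant [J*s]
```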

@dpgrote
Member Author

dpgrote commented Feb 20, 2025

@lucafedeli88 Thanks for looking over this PR. I think a number of the issues are related to the use of scipy.constants in the analysis scripts, since the constants are inconsistent, i.e. still the CODATA 2018 values. Until this issue is resolved, I don't think this PR should be merged. Unfortunately, there is no easy solution. The simplest is probably to use the most recent version of Ubuntu for the CI tests, which uses the most recent scipy with updated constants.

A more robust, longer-term solution would be to create a new lightweight Python module containing the constants with values consistent with the ones in C++, and then use it everywhere instead of relying on scipy.constants. This would guarantee that the values are always consistent.
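
A minimal sketch of what such a module could look like (the file name, choice of constants, and derived relations are illustrative assumptions, not an existing WarpX module; values are CODATA 2022):

```python
# warpx_constants.py (hypothetical)
import math

# Exact SI defining constants
c = 299792458.0          # speed of light in vacuum [m/s]
e = 1.602176634e-19      # elementary charge [C]
h = 6.62607015e-34       # Planck constant [J*s]
kb = 1.380649e-23        # Boltzmann constant [J/K]

# Measured constants (CODATA 2022)
m_e = 9.1093837139e-31   # electron mass [kg]
m_p = 1.67262192595e-27  # proton mass [kg]
alpha = 7.2973525643e-3  # fine-structure constant

# Derived so that the relations between the constants hold exactly,
# mirroring the fine-tuning of mu0 and alpha described in this PR.
hbar = h / (2.0 * math.pi)
mu0 = 2.0 * alpha * h / (e**2 * c)
ep0 = 1.0 / (mu0 * c**2)
```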

@dpgrote dpgrote changed the title Update constants to use values from CODATA2022 [WIP]Update constants to use values from CODATA2022 Feb 25, 2025
@lucafedeli88
Member

> @lucafedeli88 Thanks for looking over this PR. I think a number of the issues are related to the use of scipy.constants in the analysis scripts, since the constants are inconsistent, i.e. still the CODATA 2018 values. Until this issue is resolved, I don't think this PR should be merged. Unfortunately, there is no easy solution. The simplest is probably to use the most recent version of Ubuntu for the CI tests, which uses the most recent scipy with updated constants.
>
> A more robust, longer-term solution would be to create a new lightweight Python module containing the constants with values consistent with the ones in C++, and then use it everywhere instead of relying on scipy.constants. This would guarantee that the values are always consistent.

We could maybe discuss this in the upcoming developers' meeting. Using Ubuntu 24.04 in CI tests should be rather straightforward.

@EZoni
Member

EZoni commented Mar 3, 2025

> We could maybe discuss this in the upcoming developers' meeting. Using Ubuntu 24.04 in CI tests should be rather straightforward.

Yes, we can do that soon. There had been discussions about the "need" to have less strict tolerances for the checksums to be compatible with version upgrades like this one. The work done in #5456 has set things up so that we can do that easily (see the point "Add logic to reset tolerances based on environment variables" in the follow-up list of that PR description). That said, Ubuntu LTS version upgrades come only once every two years, so I personally think that the tolerance fine-tuning is not a real roadblock for this particular update; we could simply upgrade the Ubuntu version and reset the checksums that need to be reset. I will get to this, one way or another, as soon as possible.
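
A hypothetical sketch of what "reset tolerances based on environment variables" could look like (the variable name and default value are illustrative only, not an existing WarpX interface):

```python
import os

# If CHECKSUM_RTOL is set in the CI environment, use it in place of the
# default relative tolerance for the checksum comparisons.
default_rtol = 1.0e-9
rtol = float(os.environ.get("CHECKSUM_RTOL", default_rtol))
```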

@EZoni

This comment was marked as outdated.

@EZoni
Member

EZoni commented Mar 6, 2025

Ideally, I thought it would be good to have the Ubuntu upgrade run through first without prior checksum changes and record how the tests fail, like I did in #5731 (comment), to make it easier to discuss if/when a tolerance upgrade could be appropriate. But let's see if we can still assess this despite the pre-existing checksum changes.

@EZoni
Member

EZoni commented Mar 6, 2025

As mentioned in one of the last comments in #5731, we can try to merge development to incorporate #5736 in order to address some of the Python errors that we see here. If that does not work, we need to add the workaround I had added in #5731, before the installation of Regression/requirements.txt in .azure-pipelines.yml:

      # (remove system copy of Matplotlib to avoid conflict
      # with version set in the requirements file - see, e.g.,
      # https://github.com/matplotlib/matplotlib/issues/28768)
      sudo apt remove python3-matplotlib

@dpgrote
Member Author

dpgrote commented Apr 2, 2025

> test_3d_photon_pusher.analysis:
> Based on the analysis script, I understand that momentum should be conserved exactly. The relative error that we seem to get here is 1.7217529609352883e-16. This is lower than 2.220446049250313e-16, which is what np.finfo(np.float64).eps returns as the machine limit for the double-precision floating point type. I would consider this to be equivalent to zero in our discrete computations. In fact, I would replace tol_mom = 0.0 with tol_mom = np.finfo(np.float64).eps.
>
> What I find puzzling about this result is that the momentum should not change at all... I think that if it changes, even by something at machine precision, understanding why is still worth investigating.

This test is still failing. I looked into it a bit and don't understand what is happening. The momentum is never modified, but the values somehow change slightly at some point, from 299792458 to 299792458.00000006.
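
For reference, a minimal illustration of the suggested tolerance change, using the numbers quoted above (tol_mom is the variable name from the analysis script, as given in the quoted comment):

```python
import numpy as np

# Observed relative momentum error from the test, compared against machine
# epsilon for double precision instead of a strict zero tolerance.
relative_error = 1.7217529609352883e-16
tol_mom = np.finfo(np.float64).eps  # 2.220446049250313e-16

assert relative_error < tol_mom
```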

@EZoni
Member

EZoni commented Apr 3, 2025

> This test is still failing. I looked into it a bit and don't understand what is happening. The momentum is never modified, but the values somehow change slightly at some point, from 299792458 to 299792458.00000006.

@dpgrote @lucafedeli88

Should we relax that tolerance to the machine limit for double precision floating point to finalize this PR and then open a new, follow-up PR where someone familiar with the test can reset the tolerance to strict zero (if it is indeed true that the momentum variable is never touched by any part of the code at any point in the test) and debug further?

@dpgrote
Member Author

dpgrote commented Apr 3, 2025

> This test is still failing. I looked into it a bit and don't understand what is happening. The momentum is never modified, but the values somehow change slightly at some point, from 299792458 to 299792458.00000006.
>
> @dpgrote @lucafedeli88
>
> Should we relax that tolerance to the machine limit for double precision floating point to finalize this PR and then open a new, follow-up PR where someone familiar with the test can reset the tolerance to strict zero (if it is indeed true that the momentum variable is never touched by any part of the code at any point in the test) and debug further?

To get this finished, I agree that the tolerance should be increased. I'll set it to np.finfo(np.float64).eps. There is something in the code causing that last bit to be jiggled, but I couldn't find it.

Future work is to examine why the change is happening and reduce the tolerance back to zero when it is fixed.

Even though the momentum is not modified, somehow the last bit
is jiggled changing the value. So, instead of a tolerance of zero,
it is set to the smallest difference resolvable for double precision.
@dpgrote
Member Author

dpgrote commented Apr 4, 2025

@EZoni There is still some weirdness. The values for the CI test test_3d_ionization_ion_dsmc keep changing. I updated the values after the small adjustment in the constants and after the PICSAR version was updated, but they changed again. Any thoughts?

@EZoni
Member

EZoni commented Apr 4, 2025

> @EZoni There is still some weirdness. The values for the CI test test_3d_ionization_ion_dsmc keep changing. I updated the values after the small adjustment in the constants and after the PICSAR version was updated, but they changed again. Any thoughts?

@dpgrote

Kind of difficult to know if this was just some kind of Azure runner glitch or not. I think it could be worth trying to reset the checksums again and see if they still fail. @roelof-groenewald might be familiar with that test as well, not sure if he observed anything like this already in the past.

@roelof-groenewald
Member

roelof-groenewald commented Apr 4, 2025

> @EZoni There is still some weirdness. The values for the CI test test_3d_ionization_ion_dsmc keep changing. I updated the values after the small adjustment in the constants and after the PICSAR version was updated, but they changed again. Any thoughts?

I don't know why the ion impact ionization test would be impacted but not the electron impact ionization one. They run the same DSMC routine, just with different colliding species. So it is odd for only one to have its checksum values changed. The ion version of that test is pretty new though, so maybe it is a bit more unstable than the others. We can keep an eye on it with future PRs to see if it fails "randomly".

@dpgrote
Member Author

dpgrote commented Apr 4, 2025

> @EZoni There is still some weirdness. The values for the CI test test_3d_ionization_ion_dsmc keep changing. I updated the values after the small adjustment in the constants and after the PICSAR version was updated, but they changed again. Any thoughts?
>
> @dpgrote
>
> Kind of difficult to know if this was just some kind of Azure runner glitch or not. I think it could be worth trying to reset the checksums again and see if they still fail. @roelof-groenewald might be familiar with that test as well, not sure if he observed anything like this already in the past.

@EZoni @roelof-groenewald I did update the benchmarks again and it failed again.

@EZoni
Member

EZoni commented Apr 4, 2025

> @EZoni @roelof-groenewald I did update the benchmarks again and it failed again.

This is very strange, I don't understand it. Especially, as @roelof-groenewald said, based on the input file, this test seems identical to the electron one, just with an ion species instead. I don't understand why repeating the test on Azure runners changes the checksums of the ion test (and only of that one).

@EZoni
Member

EZoni commented Apr 4, 2025

If I repeat the test multiple times locally on my computer, I get the same checksums. Once I reset them the first time, they seem to be okay each subsequent time.

…_velocity_for_diagnostics

This seems to fix the issue. There was apparently an out-of-bounds
access during the momentum synchronization that was being done before
the particle boundaries were applied at the last step.
Turning on the option fixes this.
@dpgrote
Member Author

dpgrote commented Apr 8, 2025

@EZoni @roelof-groenewald The CI test is now fixed. The issue was a long-suspected bug related to how the synchronization of the velocity is done for the final diagnostic. The fix was to use the new option warpx.synchronize_velocity_for_diagnostics, which does a clean synchronization without the risk of an out-of-bounds memory access. I suspected this issue when I started seeing NaNs appearing in the output and sometimes getting arithmetic exception errors when running the tests.
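
For reference, in a native WarpX input file the fix amounts to enabling the option mentioned above (shown as a sketch; the parameter name and value are the ones discussed in this PR):

```
warpx.synchronize_velocity_for_diagnostics = 1
```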

@roelof-groenewald
Member

roelof-groenewald commented Apr 8, 2025

Thanks for digging into that test failure, @dpgrote!

Member

@roelof-groenewald roelof-groenewald left a comment

Thanks for this really big lift @dpgrote and @EZoni!

Co-authored-by: Roelof Groenewald <[email protected]>
Member

@lucafedeli88 lucafedeli88 left a comment

Thanks a lot for this PR!
One question: if not setting warpx.synchronize_velocity_for_diagnostics = 1 can lead to out-of-bounds accesses, shouldn't we set this to 1 by default and maybe warn the user if they set it to 0?

@lucafedeli88
Member

> Thanks a lot for this PR! One question: if not setting warpx.synchronize_velocity_for_diagnostics = 1 can lead to out-of-bounds accesses, shouldn't we set this to 1 by default and maybe warn the user if they set it to 0?

Ah, yes, we probably should; otherwise @dpgrote would not have opened #5816.

@EZoni
Member

EZoni commented Apr 8, 2025

Thanks a lot for the fix and for all the work, @dpgrote.

And thanks all, @lucafedeli88 and @roelof-groenewald, for your work and feedback as well.

Is it clear why this issue started to appear when we updated the physical constants? Why did it not occur before?

Anyways, I will approve and merge the PR in a minute.

@EZoni EZoni merged commit 9c9f188 into BLAST-WarpX:development Apr 8, 2025
37 checks passed
@dpgrote
Member Author

dpgrote commented Apr 8, 2025

> Thanks a lot for the fix and for all the work, @dpgrote.
>
> And thanks all, @lucafedeli88 and @roelof-groenewald, for your work and feedback as well.
>
> Is it clear why this issue started to appear when we updated the physical constants? Why did it not occur before?
>
> Anyways, I will approve and merge the PR in a minute.

Thanks @EZoni ! I don't know why that CI test started failing here - I can only guess that it was chance. I didn't dig into finding exactly what was going wrong.

@dpgrote dpgrote deleted the update_constants_to_CODATA2022 branch April 8, 2025 16:27
dpgrote pushed a commit to dpgrote/WarpX that referenced this pull request Apr 15, 2025
The Ubuntu 20.04 runner image will be deprecated soon:
actions/runner-images#11101.
Only support CUDA 11.7+ going forward.

To-do:
- [x] GitHub Actions workflows
- [ ] ~~Azure DevOps workflows~~ (transferred to BLAST-WarpX#5661)

### Notes
I renamed the following workflows, with the aim of not having to rename
them again in the future when we upgrade version numbers:
- `NVCC 11.3 SP`, now `NVCC SP`
- `NVCC 11.8.0 GNUmake`, now `NVC GNU Make`
- `[email protected] NVCC/NVC++ Release [tests]`, now `NVHPC`

If you are okay with the renaming, the list of "required" workflows
should be updated to reflect the new names, and I don't have the
necessary access to do so.
dpgrote pushed a commit to dpgrote/WarpX that referenced this pull request Apr 15, 2025
This PR adds a simple hint suggested by @dpgrote in
BLAST-WarpX#5661 (comment)
related to the how-to guide on how to use our
[update_benchmarks_from_azure_output.py](https://github.com/ECP-WarpX/WarpX/blob/development/Tools/DevUtils/update_benchmarks_from_azure_output.py)
to update checksum benchmarks from the Azure raw log output.
dpgrote added a commit to dpgrote/WarpX that referenced this pull request Apr 15, 2025
…AST-WarpX#5661)

The values of the physical constants were from CODATA 2018. These should
be updated to the current accepted values as specified in CODATA 2022.

This breaks many CI benchmarks since the checks are at a higher
precision than the changes in the constants.

This PR change is also needed since the older version of Ubuntu (which
is used for the Azure CI tests) is being deprecated. This PR switches to
the newest Ubuntu which includes a newer version of scipy where the
constants have also been updated (since version 1.15.0 on January 3).

Note that this PR uses fine-tuned values of `mu0` and `alpha` so that the
relation between the constants is exact. This means that these values
will differ slightly from the values in scipy.constants.

Many of the CI tests were using locally defined constants in the input
files and analysis scripts. These were all changed to rely on the
built-in constants in the parser and `scipy.constants`.

---------

Co-authored-by: Edoardo Zoni <[email protected]>
Co-authored-by: Luca Fedeli <[email protected]>
Co-authored-by: Roelof Groenewald <[email protected]>
atmyers pushed a commit to atmyers/WarpX that referenced this pull request Jul 3, 2025
…AST-WarpX#5661)

The values of the physical constants were from CODATA 2018. These should
be updated to the current accepted values as specified in CODATA 2022.

This breaks many CI benchmarks since the checks are at a higher
precision than the changes in the constants.

This PR change is also needed since the older version of Ubuntu (which
is used for the Azure CI tests) is being deprecated. This PR switches to
the newest Ubuntu which includes a newer version of scipy where the
constants have also been updated (since version 1.15.0 on January 3).

Note that this PR uses fine-tuned values of `mu0` and `alpha` so that the
relation between the constants is exact. This means that these values
will differ slightly from the values in scipy.constants.

Many of the CI tests were using locally defined constants in the input
files and analysis scripts. These were all changed to rely on the
built-in constants in the parser and `scipy.constants`.

---------

Co-authored-by: Edoardo Zoni <[email protected]>
Co-authored-by: Luca Fedeli <[email protected]>
Co-authored-by: Roelof Groenewald <[email protected]>