add regression tests for AMR particle checkpoint-restart (issue #1544)#1877
add regression tests for AMR particle checkpoint-restart (issue #1544)#1877chongchonghe wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces regression tests for a crash during checkpoint-restart with particles at the finest AMR level by adding a new TOML configuration and updating the CMake test suite. Feedback points out that the fix in the AMReX dependency is missing from the changes. Further suggestions include reordering command-line arguments to ensure parameter overrides are correctly applied and adding the REQUIRED_FILES property to the restart test for more descriptive error handling.
| add_test(NAME BinaryOrbitCICRefactor2 COMMAND ${JOB_NAME} ../inputs/BinaryOrbit_refactor_splitparticle.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests) | ||
| add_test(NAME ${JOB_NAME} COMMAND ${JOB_NAME} ../inputs/${JOB_NAME}.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests) | ||
| add_test(NAME BinaryOrbitCICAMR COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests) | ||
| # Regression test for issue #1544: crash on checkpoint-restart with particles at finest AMR level. |
There was a problem hiding this comment.
The pull request description states that this PR fixes a crash in extern/amrex/Src/Particle/AMReX_ParticleIO.H. However, the provided diff does not include a submodule update for extern/amrex or any code changes implementing this fix. Without the corresponding fix in the AMReX dependency, these regression tests will likely fail in CI. Please ensure the fix is included in the PR.
| # Note: current CI/unit tests run single-rank, so blocking_factor does not change the box | ||
| # decomposition in those runs. The test is kept so it will cover multi-rank restarts once | ||
| # MPI unit tests are introduced. | ||
| add_test(NAME BinaryOrbitCICAMRCheckpointInit COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml max_timesteps=3 checkpoint_interval=3 ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests) |
There was a problem hiding this comment.
To ensure that the specific test overrides (max_timesteps=3 and checkpoint_interval=3) take precedence over any global parameters defined in ${QuokkaTestParams}, they should be placed at the end of the command line. AMReX's ParmParse uses the last value provided for a given key, so if ${QuokkaTestParams} contains conflicting values (e.g., a smaller max_timesteps), the test might fail to produce the required checkpoint.
add_test(NAME BinaryOrbitCICAMRCheckpointInit COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml ${QuokkaTestParams} max_timesteps=3 checkpoint_interval=3 WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
| set_tests_properties(BinaryOrbitCICRefactorInit2 PROPERTIES DEPENDS "BinaryOrbitCICRefactor") | ||
| set_tests_properties(BinaryOrbitCICRefactor2 PROPERTIES DEPENDS "BinaryOrbitCICRefactorInit2") | ||
| # BinaryOrbitCICAMRCheckpointRestart reads chk0000003 written by BinaryOrbitCICAMRCheckpointInit | ||
| set_tests_properties(BinaryOrbitCICAMRCheckpointRestart PROPERTIES DEPENDS "BinaryOrbitCICAMRCheckpointInit") |
There was a problem hiding this comment.
It is recommended to also set the REQUIRED_FILES property for the restart test. This ensures that CTest skips the test with an informative message if the checkpoint directory chk0000003 was not created by the initialization test, rather than failing with a less descriptive error during execution.
set_tests_properties(BinaryOrbitCICAMRCheckpointRestart PROPERTIES DEPENDS "BinaryOrbitCICAMRCheckpointInit" REQUIRED_FILES "chk0000003")
The crash reported in #1544 (restarting with particles at the finest AMR level using a different number of MPI ranks) was fixed upstream in AMReX by PR #2276 (83de03244). Add two regression tests to validate the fix: - BinaryOrbitCICAMRCheckpointInit: runs BinaryOrbitAMR.toml for 3 steps with checkpoint_interval=3, writing chk0000003 with particles at the finest AMR level (Level_1). - BinaryOrbitCICAMRCheckpointRestart: restarts from that checkpoint with blocking_factor=8 (vs 32 in the init). Under multi-rank execution this produces different level-0 box arrays, exercising the dual-grid restart code path fixed by AMReX PR #2276. Current CI runs single-rank so the blocking_factor difference is a no-op there; the test will cover the multi-rank path once MPI unit tests are added. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
88081f0 to
ea0a6c8
Compare
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
It looks like something went wrong with the CheckpointRestart test: https://github.com/quokka-astro/quokka/actions/runs/25540180148/job/74964969223?pr=1877 |
Description
Adds regression tests for issue #1544: crash on checkpoint-restart with particles at the finest AMR level when using a different number of MPI ranks.
The crash was already fixed upstream in AMReX by PR #2276 (
83de03244), which is included in the current AMReX submodule. This PR validates that the fix holds by adding two tests toBinaryOrbitCIC:BinaryOrbitCICAMRCheckpointInit— runsBinaryOrbitAMR.tomlfor 3 steps withcheckpoint_interval=3, writingchk0000003with particles at the finest AMR level (Level_1).BinaryOrbitCICAMRCheckpointRestart— restarts from that checkpoint withblocking_factor=8(vs 32 in the init). Under multi-rank execution this produces different level-0 box arrays, exercising the dual-grid restart code path fixed by AMReX PR #2276. Current CI runs single-rank soblocking_factordoes not change the actual box decomposition; the test will automatically cover the multi-rank path once MPI unit tests are added.Note: during development a separate, related edge case was found — when restarting with the same number of MPI ranks but a different
blocking_factor, the dual-grid flag is not set for levels without aParticle_Hfile even though the box arrays differ. This needs to be fixed upstream in AMReX and is not addressed here.Related issues
Closes #1544 and #1545
Checklist
Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an
xinside the square brackets[ ]in the Markdown source below:/azp run.