Skip to content

add regression tests for AMR particle checkpoint-restart (issue #1544)#1877

Open
chongchonghe wants to merge 1 commit into
developmentfrom
chong/claude/issue-1544
Open

add regression tests for AMR particle checkpoint-restart (issue #1544)#1877
chongchonghe wants to merge 1 commit into
developmentfrom
chong/claude/issue-1544

Conversation

@chongchonghe
Copy link
Copy Markdown
Contributor

@chongchonghe chongchonghe commented May 8, 2026

Description

Adds regression tests for issue #1544: crash on checkpoint-restart with particles at the finest AMR level when using a different number of MPI ranks.

The crash was already fixed upstream in AMReX by PR #2276 (83de03244), which is included in the current AMReX submodule. This PR validates that the fix holds by adding two tests to BinaryOrbitCIC:

  • BinaryOrbitCICAMRCheckpointInit — runs BinaryOrbitAMR.toml for 3 steps with checkpoint_interval=3, writing chk0000003 with particles at the finest AMR level (Level_1).
  • BinaryOrbitCICAMRCheckpointRestart — restarts from that checkpoint with blocking_factor=8 (vs 32 in the init). Under multi-rank execution this produces different level-0 box arrays, exercising the dual-grid restart code path fixed by AMReX PR #2276. Current CI runs single-rank so blocking_factor does not change the actual box decomposition; the test will automatically cover the multi-rank path once MPI unit tests are added.

Note: during development a separate, related edge case was found — when restarting with the same number of MPI ranks but a different blocking_factor, the dual-grid flag is not set for levels without a Particle_H file even though the box arrays differ. This needs to be fixed upstream in AMReX and is not addressed here.

Related issues

Closes #1544 and #1545

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (if applicable; see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • (For quokka-astro org members) I have manually triggered the GPU tests with the magic comment /azp run.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces regression tests for a crash during checkpoint-restart with particles at the finest AMR level by adding a new TOML configuration and updating the CMake test suite. Feedback points out that the fix in the AMReX dependency is missing from the changes. Further suggestions include reordering command-line arguments to ensure parameter overrides are correctly applied and adding the REQUIRED_FILES property to the restart test for more descriptive error handling.

add_test(NAME BinaryOrbitCICRefactor2 COMMAND ${JOB_NAME} ../inputs/BinaryOrbit_refactor_splitparticle.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
add_test(NAME ${JOB_NAME} COMMAND ${JOB_NAME} ../inputs/${JOB_NAME}.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
add_test(NAME BinaryOrbitCICAMR COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
# Regression test for issue #1544: crash on checkpoint-restart with particles at finest AMR level.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The pull request description states that this PR fixes a crash in extern/amrex/Src/Particle/AMReX_ParticleIO.H. However, the provided diff does not include a submodule update for extern/amrex or any code changes implementing this fix. Without the corresponding fix in the AMReX dependency, these regression tests will likely fail in CI. Please ensure the fix is included in the PR.

# Note: current CI/unit tests run single-rank, so blocking_factor does not change the box
# decomposition in those runs. The test is kept so it will cover multi-rank restarts once
# MPI unit tests are introduced.
add_test(NAME BinaryOrbitCICAMRCheckpointInit COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml max_timesteps=3 checkpoint_interval=3 ${QuokkaTestParams} WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure that the specific test overrides (max_timesteps=3 and checkpoint_interval=3) take precedence over any global parameters defined in ${QuokkaTestParams}, they should be placed at the end of the command line. AMReX's ParmParse uses the last value provided for a given key, so if ${QuokkaTestParams} contains conflicting values (e.g., a smaller max_timesteps), the test might fail to produce the required checkpoint.

    add_test(NAME BinaryOrbitCICAMRCheckpointInit COMMAND ${JOB_NAME} ../inputs/BinaryOrbitAMR.toml ${QuokkaTestParams} max_timesteps=3 checkpoint_interval=3 WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)

set_tests_properties(BinaryOrbitCICRefactorInit2 PROPERTIES DEPENDS "BinaryOrbitCICRefactor")
set_tests_properties(BinaryOrbitCICRefactor2 PROPERTIES DEPENDS "BinaryOrbitCICRefactorInit2")
# BinaryOrbitCICAMRCheckpointRestart reads chk0000003 written by BinaryOrbitCICAMRCheckpointInit
set_tests_properties(BinaryOrbitCICAMRCheckpointRestart PROPERTIES DEPENDS "BinaryOrbitCICAMRCheckpointInit")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is recommended to also set the REQUIRED_FILES property for the restart test. This ensures that CTest skips the test with an informative message if the checkpoint directory chk0000003 was not created by the initialization test, rather than failing with a less descriptive error during execution.

    set_tests_properties(BinaryOrbitCICAMRCheckpointRestart PROPERTIES DEPENDS "BinaryOrbitCICAMRCheckpointInit" REQUIRED_FILES "chk0000003")

@chongchonghe chongchonghe changed the title add regression test for AMR particle checkpoint-restart (issue #1544) add regression tests for AMR particle checkpoint-restart (issue #1544) May 8, 2026
The crash reported in #1544 (restarting with particles at the finest AMR
level using a different number of MPI ranks) was fixed upstream in AMReX
by PR #2276 (83de03244). Add two regression tests to validate the fix:

- BinaryOrbitCICAMRCheckpointInit: runs BinaryOrbitAMR.toml for 3 steps
  with checkpoint_interval=3, writing chk0000003 with particles at the
  finest AMR level (Level_1).
- BinaryOrbitCICAMRCheckpointRestart: restarts from that checkpoint with
  blocking_factor=8 (vs 32 in the init). Under multi-rank execution this
  produces different level-0 box arrays, exercising the dual-grid restart
  code path fixed by AMReX PR #2276. Current CI runs single-rank so the
  blocking_factor difference is a no-op there; the test will cover the
  multi-rank path once MPI unit tests are added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@chongchonghe chongchonghe force-pushed the chong/claude/issue-1544 branch from 88081f0 to ea0a6c8 Compare May 8, 2026 06:13
@chongchonghe
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@chongchonghe chongchonghe marked this pull request as ready for review May 8, 2026 06:25
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. particles labels May 8, 2026
@BenWibking
Copy link
Copy Markdown
Collaborator

It looks like something went wrong with the CheckpointRestart test: https://github.com/quokka-astro/quokka/actions/runs/25540180148/job/74964969223?pr=1877

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

particles size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash in restarting from a checkpoint with different number of MPI ranks

2 participants