
@ax3l (Member) commented Nov 16, 2025

After restarts, the lo/hi bounds of all diagnostics were not properly restored. This was specifically the case if the moving window had a non-zero start step and/or a stop step before the checkpoint step. After such a restart, new checkpoints and diagnostics become corrupted, because the wrong spatial region gets filtered.

This fixes it. Existing checkpoints (from simulations that started from step 0) are still readable with this fix.
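A minimal sketch of why the start/stop steps matter on restart, assuming a hypothetical schedule in which the window moves on every step in [start_step, stop_step), with a negative stop_step meaning "never stops". The function name is illustrative and not part of WarpX, whose own check (WarpX::moving_window_active) may use a different step convention. Bounds reconstructed as if the window had moved on every step since step 0 overcount whenever the window started late or has already stopped:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not WarpX API): number of steps in [0, step) on which
// a window scheduled for [start_step, stop_step) actually moved.
// stop_step < 0 is assumed to mean the window never stops.
int count_active_steps(int step, int start_step, int stop_step)
{
    assert(step >= 0);
    const int begin = std::max(start_step, 0);
    const int end = (stop_step < 0) ? step : std::min(step, stop_step);
    return std::max(end - begin, 0);
}
```

For example, count_active_steps(500, 100, 300) returns 200, whereas assuming the window moved on every step since step 0 would count 500.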

Fix #6392

To Do

  • debug
  • semi-vibe debug (Cursor)
  • semi-vibe fix
  • review & clean up

Cleanup Notice

This bug shows how risky it is to duplicate moving-window logic in multiple places. The moving/shift logic should be factored into functions that are reused everywhere. This PR makes this anti-pattern even worse by doubling down on it.

A follow-up or additional commit should deduplicate the logic to make it safer: #6400
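One possible shape for that deduplication, sketched under simplifying assumptions (constant window speed, a start/stop schedule, a single moving direction): a single hypothetical MovingWindowState object owns the activity rule and the cell-aligned lower bound, and every consumer, including the diagnostics, queries it instead of keeping its own tally. The class and its members are illustrative, not existing WarpX code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical single source of truth for the moving window (not WarpX code).
class MovingWindowState {
public:
    MovingWindowState(double lo0, double dx, double v, double dt,
                      int start_step, int stop_step)
        : m_lo0(lo0), m_dx(dx), m_v(v), m_dt(dt),
          m_start(start_step), m_stop(stop_step)
    {
        assert(dx > 0.0);
    }

    // One activity rule shared by fields, particles and diagnostics.
    // stop_step < 0 is assumed to mean the window never stops.
    bool active(int step) const
    {
        return step >= m_start && (m_stop < 0 || step < m_stop);
    }

    // Real-valued window displacement after all steps in [0, step).
    double displacement(int step) const
    {
        int n = 0;
        for (int i = 0; i < step; ++i) {
            if (active(i)) { ++n; }
        }
        return n * m_v * m_dt;
    }

    // Whole cells shifted so far: the accumulated displacement is rounded down
    // once, so the sub-cell remainder is carried instead of discarded.
    int cells_shifted(int step) const
    {
        return static_cast<int>(std::floor(displacement(step) / m_dx));
    }

    // Cell-aligned lower bound read by every consumer.
    double lo(int step) const { return m_lo0 + cells_shifted(step) * m_dx; }

private:
    double m_lo0, m_dx, m_v, m_dt;
    int m_start, m_stop;
};
```

With lo0 = 0, dx = 1, v*dt = 0.5 cells per step and the window active on steps 10 to 29, lo(40) is 10.0 and lo(13) is 1.0: the half-cell remainder is carried rather than dropped. The step loop is O(step) for clarity; a real implementation would update incrementally.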

@ax3l added the labels bug (Something isn't working), bug: affects latest release (Bug also exists in latest release version), component: diagnostics (all types of outputs), and component: checkpoint/restart (Checkpointing & restarts) on Nov 16, 2025
@ax3l (Member, Author) commented Nov 16, 2025

@titoiride can you potentially test this PR with your data, too? :) Please restart from a checkpoint that was written from a simulation that ran from the beginning (not a checkpoint created by a restarted simulation).

@RemiLehe (Member) left a comment


Thanks a lot for this PR!

In addition to adapting the code to the start/end of the moving window, it seems that this PR is making more changes. Was this intentional?

For instance, the previous code used warpx.getmoving_window_x() to infer the current position of the moving window, while the new code relies instead on current_step to infer the current position.

In principle, warpx.getmoving_window_x() should be able to work with starting/stopping moving window. To avoid duplicating code (as you pointed out), should we remove the function warpx.getmoving_window_x() (since it is not used anymore) or should we try to fix it and introduce it again?

Comment on lines -535 to -538
const amrex::Real displacement =
    warpx.getmoving_window_x() - warpx.Geom(0).ProbLo(moving_dir);
const int shift_num_base = static_cast<int>
    (displacement / warpx.Geom(0).CellSize(moving_dir));
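For reference, a standalone sketch of what the removed lines compute, with plain doubles standing in (an assumption for illustration) for warpx.getmoving_window_x(), warpx.Geom(0).ProbLo(moving_dir) and warpx.Geom(0).CellSize(moving_dir). The whole displacement of the window is converted to cells in a single truncation:

```cpp
#include <cassert>

// Standalone sketch of the removed arithmetic; the arguments stand in for
// warpx.getmoving_window_x(), Geom(0).ProbLo(moving_dir) and Geom(0).CellSize(moving_dir).
int shift_num_base(double moving_window_x, double prob_lo, double cell_size)
{
    assert(cell_size > 0.0);
    const double displacement = moving_window_x - prob_lo;
    // static_cast<int> truncates toward zero, e.g. 7.9 cells -> 7 cells.
    return static_cast<int>(displacement / cell_size);
}
```

Because only the final displacement is truncated, this form does not accumulate per-step rounding loss; it therefore agrees with the diagnostics' bounds only if those were advanced by the same rule.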
@ax3l (Member, Author), Nov 17, 2025


The issue here is subtle.

In FullDiagnostics::MovingWindowAndGalileanDomainShift, this shift is only applied on steps where WarpX::moving_window_active(step+1) is true, and it accumulates m_lo/m_hi on truncated integer (cell) locations.

The implementation here omitted these details and thus introduced a drift relative to the real (physical) domain bounds on restart.
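The drift can be illustrated with a simplified model, which is not a reproduction of FullDiagnostics: assume each update advances the window by a possibly fractional number of cells. Truncating each increment separately drops its fractional part every time, while truncating the accumulated displacement once carries the remainder forward:

```cpp
#include <cassert>
#include <cmath>

// Simplified model (not WarpX code): each update moves the window by
// cells_per_update cells, which need not be an integer.

// Truncating every increment separately drops its fractional part each time.
int shift_truncated_per_update(double cells_per_update, int n_updates)
{
    assert(n_updates >= 0);
    int total = 0;
    for (int i = 0; i < n_updates; ++i) {
        total += static_cast<int>(cells_per_update);
    }
    return total;
}

// Rounding down the accumulated real displacement once carries the remainder.
int shift_from_total(double cells_per_update, int n_updates)
{
    assert(n_updates >= 0);
    return static_cast<int>(std::floor(cells_per_update * n_updates));
}
```

With 1.5 cells per update, ten updates give 10 cells with per-update truncation but 15 cells from the total: a drift of 5 cells, i.e. half a cell lost on every update.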

@EZoni (Member), Nov 17, 2025


Just for context, some of this was changed recently in #5985; its PR description has more information on the bug it was fixing.

@ax3l (Member, Author), Nov 17, 2025


attn @bnara: can you double-check that this PR does not introduce any issues with your existing moving-window workflows? This should improve the logic when moving windows start later than step 0 or stop at a certain step.

@RemiLehe changed the title from Diagnostics: Fix Restart with Start/Stop Moving Window to [WIP] Diagnostics: Fix Restart with Start/Stop Moving Window on Nov 19, 2025
@ax3l (Member, Author) commented Nov 20, 2025

@RemiLehe

> In principle, warpx.getmoving_window_x() should be able to work with starting/stopping moving window. To avoid duplicating code (as you pointed out), should we remove the function warpx.getmoving_window_x() (since it is not used anymore) or should we try to fix it and introduce it again?

I was also puzzled as to why I could not find a solution with warpx.getmoving_window_x(). Looking at the moving-window implementation in WarpX as of today, I think it has a general flaw: the diagnostics' problem domain can drift over time relative to the simulation geometry (stored, among other places, in getmoving_window_x) because of the step-wise (cumulative) integer rounding used to align with cells:
https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Diagnostics/FullDiagnostics.cpp#L1007-L1008

I think this can be fixed by fully removing the double bookkeeping in FullDiagnostics::MovingWindowAndGalileanDomainShift, but that would render existing restart points unusable.

In other words, WarpX::MoveWindow should be the only source of truth, but FullDiagnostics does its own tracking and does not do it well: compared to
https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Utils/WarpXMovingWindow.cpp#L356-L376
it cumulatively loses rounding errors on the order of a cell size every time it updates.

The overall moving window does the same:
https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Utils/WarpXMovingWindow.cpp#L392
However, I think this causes an issue when the moving window is not active for 100% of the simulation steps.



Development: successfully merging this pull request may close the issue "Diagnostics: Geometry Wrong After Restart".
