Contributor

@JamieJQuinn JamieJQuinn commented Jun 27, 2024

This PR adds integration with ADIOS2 to provide parallel IO. This is primarily for saving:

  1. Checkpoints: files with enough information to restart the simulation at a later time, and
  2. Snapshots: files containing data for analysis & visualisation.

I'm writing this draft PR mainly as a handover doc for @CFD-Xing. The low-level pieces of the ADIOS2 integration are mostly worked out. The remaining TODOs are mainly bugfixing and the solver-level work of writing complete checkpoint files and loading from them.

A useful way to inspect the output .bp files is ADIOS2's bpls tool, found in build/_deps/adios2-build/bin.

TODO:

  • Implement field output
  • Optional striding of fields before output (e.g. only saving every 2nd cell)
  • Optional lowering of precision of fields before output
  • Implement scalar output
  • Add unit tests for field output (field on host)
  • Add unit tests for field output (field on GPU)
  • Fix tests after mesh update
  • Investigate possible memory corruption (segfaults every few runs)
  • Output initial conditions at start of run
  • Include simulation parameters in checkpoints
  • Implement snapshot output with settings for striding, precision, steps between snapshots, and which variables to save.
  • Implement loading from checkpoint
  • Implement removing old checkpoints after writing latest
  • Checkpoint main variables, u, v, w and pressure
  • Implement time integrator checkpoint procedure
  • Investigate which other values require saving for full checkpoint
  • Implement sanity checks when loading from checkpoint (e.g. do the saved parameters match the current ones?)
  • Shift/interpolate pressure to velocity grid before saving for simpler analysis (only for snapshots)
  • Add setting for number of steps between checkpoints
  • Implement file naming for both snapshots and checkpoints
  • Test checkpoint files: a simulation run from step 0 to 100 should produce bit-identical results to one run from 0 to 50, checkpointed, then restarted and run from 50 to 100.
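
For the striding and precision items above, here is a rough sketch of the intended data reduction, independent of ADIOS2 and of the solver's actual field types. The function names, the flat row-major layout, and the uniform stride in all three directions are assumptions made for illustration only.

```python
import struct


def stride_field(field, shape, stride):
    """Keep every `stride`-th cell in each direction of a flat, row-major 3D field.

    Layout and uniform stride are assumptions for this sketch.
    """
    nx, ny, nz = shape
    out = [
        field[(i * ny + j) * nz + k]
        for i in range(0, nx, stride)
        for j in range(0, ny, stride)
        for k in range(0, nz, stride)
    ]
    new_shape = tuple(len(range(0, n, stride)) for n in shape)
    return out, new_shape


def to_single_precision(values):
    """Pack doubles as little-endian 4-byte floats (each value rounded to float32)."""
    return struct.pack(f"<{len(values)}f", *values)


# A 4x4x4 field whose values equal their flat index.
shape = (4, 4, 4)
field = [float(n) for n in range(4 * 4 * 4)]

strided, strided_shape = stride_field(field, shape, stride=2)
payload = to_single_precision(strided)

full_size = 8 * len(field)   # bytes if written as doubles without striding
reduced_size = len(payload)  # bytes after striding and precision lowering
```

With a stride of 2 and single precision, the payload shrinks by a factor of 16 relative to the full double-precision field. This kind of reduction suits snapshots only, since checkpoints need the full field at full precision to restart exactly.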
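
For the file-naming and old-checkpoint-removal items, one possible approach is sketched below. The `checkpoint_<step>.bp` pattern, the helper names, and the `keep` parameter are hypothetical and not something this PR defines. The sketch handles both files and directories because, depending on the ADIOS2 engine, a .bp output may be a directory rather than a single file.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint_name(step, prefix="checkpoint", width=6):
    """Zero-padded step number, so lexicographic order matches step order."""
    return f"{prefix}_{step:0{width}d}.bp"


def prune_checkpoints(directory, keep=1, prefix="checkpoint"):
    """Delete all but the `keep` most recent checkpoints; return removed names."""
    paths = sorted(Path(directory).glob(f"{prefix}_*.bp"))
    stale = paths[:-keep] if keep > 0 else paths
    for path in stale:
        if path.is_dir():
            shutil.rmtree(path)  # output written as a directory
        else:
            path.unlink()        # output written as a single file
    return [p.name for p in stale]


# Demo in a temporary directory, using empty directories as stand-ins.
with tempfile.TemporaryDirectory() as tmp:
    for step in (50, 100, 150):
        (Path(tmp) / checkpoint_name(step)).mkdir()
    removed = prune_checkpoints(tmp, keep=1)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Pruning only after the newest checkpoint has been fully written means a crash during a write never leaves the run without a usable checkpoint.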
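
The restart test in the last item could be structured along the following lines. This is a toy sketch rather than the solver's integrator: it uses a two-step Adams-Bashforth scheme on a linear decay problem, chosen only because a multistep scheme carries history, which is why the time integrator needs its own checkpoint procedure. All names here are hypothetical.

```python
import io
import struct


def rhs(u):
    # Toy right-hand side (linear decay) standing in for the real solver.
    return [-x for x in u]


def advance(u, f_prev, n_steps, dt=0.01):
    """Advance n_steps; f_prev is the previous RHS, or None on the first step."""
    for _ in range(n_steps):
        f = rhs(u)
        if f_prev is None:
            # Forward Euler start-up step: no history available yet.
            u = [x + dt * fx for x, fx in zip(u, f)]
        else:
            # Two-step Adams-Bashforth update using the stored history.
            u = [x + dt * (1.5 * fx - 0.5 * fp) for x, fx, fp in zip(u, f, f_prev)]
        f_prev = f
    return u, f_prev


def pack(values):
    # Raw IEEE-754 doubles, so comparisons are bit-for-bit.
    return struct.pack(f"<{len(values)}d", *values)


def save_checkpoint(stream, step, u, f_prev):
    stream.write(struct.pack("<qq", step, len(u)))
    stream.write(pack(u))
    stream.write(pack(f_prev))


def load_checkpoint(stream):
    step, n = struct.unpack("<qq", stream.read(16))
    u = list(struct.unpack(f"<{n}d", stream.read(8 * n)))
    f_prev = list(struct.unpack(f"<{n}d", stream.read(8 * n)))
    return step, u, f_prev


u0 = [0.1, 0.2, 0.3, 0.4]

# Uninterrupted run, steps 0-100.
reference, _ = advance(u0, None, 100)

# Run 0-50, checkpoint, reload, run 50-100.
u_mid, f_mid = advance(u0, None, 50)
stream = io.BytesIO()
save_checkpoint(stream, 50, u_mid, f_mid)
stream.seek(0)
step, u_loaded, f_loaded = load_checkpoint(stream)
restarted, _ = advance(u_loaded, f_loaded, 100 - step)

# Restart that drops the integrator history, for comparison.
incomplete, _ = advance(u_loaded, None, 100 - step)

identical = pack(reference) == pack(restarted)
history_matters = pack(reference) != pack(incomplete)
```

Comparing packed bytes rather than using a floating-point tolerance is deliberate, since the requirement is bit-identical output. The `incomplete` run shows how a checkpoint that omits integrator state would fail this test even though the main variables were saved correctly.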

4 participants