Workarounds for Lustre I/O issues #4426
Conversation
Force-pushed from 403631b to 2b010ce.
@@ -1111,7 +1111,7 @@ VisMF::Write (const FabArray<FArrayBox>& mf,
            nfi.Stream().flush();
            delete [] allFabData;

-        } else { // ---- write fabs individually
+        } else { // ---- write fabs individually
The white space change is unnecessary and I think is incorrect.
        }
        Real const* fabdata = fab.dataPtr();
-#ifdef AMREX_USE_GPU
+#ifdef AMREX_USE_GPU
Seems unnecessary.
Will revert the spurious whitespace changes.
The flush calls were added in the past to avoid I/O issues on Titan. They might still be needed on some systems, so maybe we can make this (and the MPI barrier) a runtime parameter. We could make not flushing the default, or we could make the default different on different machines, for example.
Ok, I'd be happy with making it a runtime parameter. I will update the PR.
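A minimal sketch of what such a runtime switch could look like, assuming hypothetical parameter names (`vismf.flush_after_write`, `vismf.barrier_after_write`); the actual names and defaults are not settled in this discussion:

```cpp
#include <AMReX_ParmParse.H>
#include <AMReX_ParallelDescriptor.H>

namespace {
    // Hypothetical runtime flags; 0 = off by default in this sketch.
    int flush_after_write   = 0;
    int barrier_after_write = 0;

    void initIOWorkaroundFlags ()
    {
        amrex::ParmParse pp("vismf");
        pp.query("flush_after_write",   flush_after_write);
        pp.query("barrier_after_write", barrier_after_write);
    }
}

// Inside a VisMF-style write path, after a level's data has been streamed:
//
//     if (flush_after_write)   { nfi.Stream().flush(); }
//     if (barrier_after_write) { amrex::ParallelDescriptor::Barrier(); }
```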
A WarpX user reported that it may be necessary to manually add a … Without doing this at the analogous location in our code, I also see hangs on Frontier.
Also, I set the striping on Frontier manually, and set the number of files to 1 per node. Sometimes it still hangs in the particle writes every few checkpoints; I don't have a workaround for that.
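For context, Lustre striping like this is typically set with `lfs setstripe`; the command below is only an illustration (the directory path is a placeholder, and the commenter's exact commands were not shown), matching the stripe count of 1 and 16M stripe size mentioned in the summary:

```sh
# Illustration only: stripe count 1, stripe size 16M on the output directory.
# New files created under this directory inherit the striping.
lfs setstripe -c 1 -S 16M /path/to/output_dir
```

The one-file-per-node setting is presumably an application/AMReX-level knob (an nfiles-style parameter) rather than a Lustre setting.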
The change now worked for me after applying … and setting the striping correctly.
That's great to hear. Maybe after the most recent maintenance window it now works without hanging. What is the size of each of your checkpoints? I'm curious what effective write bandwidth you're seeing.
Uhm, I actually wonder if it was just a system fix. I tried the regular code too (the latest release, without changes) with the same output, and checkpoints now work as long as I set the striping. The only caveat is that I did not go very far into the interaction, so I haven't really tested a very "mixed" situation (although the simulation is pretty unbalanced in the beginning).
Summary
This adds workarounds for Lustre I/O write issues at scale (~128 nodes or more):

- Call ParallelDescriptor::Barrier() after writing each level (for both MultiFabs and particles).

This reduces plotfile write time on 1024 Frontier nodes from 30+ minutes to 1 minute. It performs best with a stripe count of 1, a stripe size of 16M, and 1 file per node.
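As a rough illustration of the per-level barrier described above (the function and variable names `WritePlotLevels`, `mf`, and `finest_level` are hypothetical stand-ins for the caller's data; this is a sketch, not the PR's actual code):

```cpp
#include <string>
#include <AMReX_Vector.H>
#include <AMReX_MultiFab.H>
#include <AMReX_VisMF.H>
#include <AMReX_ParallelDescriptor.H>

// Write one MultiFab per level and synchronize all MPI ranks between levels;
// the extra Barrier() is the Lustre workaround this PR adds.
void WritePlotLevels (const amrex::Vector<const amrex::MultiFab*>& mf,
                      int finest_level,
                      const std::string& plotfilename)
{
    for (int lev = 0; lev <= finest_level; ++lev) {
        amrex::VisMF::Write(*mf[lev],
            plotfilename + "/Level_" + std::to_string(lev) + "/Cell");
        amrex::ParallelDescriptor::Barrier(); // avoid write hangs at scale
    }
}
```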