I've opted for the questionably severe "bug" label given how much this behavior impacts runtime.
In particular, it seems like the default way HDF5 writes files from a large number of nodes (~256) on Stampede2 is far slower than it should be. Presumably this problem scales with the number of nodes, which is particularly concerning when trying to run large simulations (e.g., 448x224x224) in a reasonable amount of wall time.
In addition to optimizing the MPI write on Stampede2, writing smaller dump files (i.e., not outputting `gamma` or diagnostics like `extras/fixup`, `extras/divB`, `extras/fail`) should speed things up as well.