
Optimizations for BP5 engine of ADIOS2: Use span-based API and avoid flushing before closing an Iteration #3200


Open
wants to merge 3 commits into base: development

Conversation


@franzpoeschel franzpoeschel commented Jun 24, 2022

Unlike previous engines in ADIOS2, BP5 can avoid copying data to internal buffers if one uses an optimized workflow. The following sequence of operations should be avoided:

engine.Put(dataPtr, …, adios2::Mode::Deferred);
// …
engine.PerformPuts(); // this provokes a copy to internal buffers
// …
engine.EndStep();     // without the PerformPuts(), BP5 could have used the user pointers directly here
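
For contrast, a minimal sketch of the workflow this PR steers towards (assuming the user-side buffers stay valid until the end of the step):

engine.Put(dataPtr, …, adios2::Mode::Deferred); // BP5 only records the pointer here
// … more deferred Puts, but no PerformPuts() …
engine.EndStep(); // data is consumed directly from the user buffers when the step closes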

This PR uses two optimizations to avoid that workflow:

  1. Use the Span API of openPMD-api/ADIOS2. Instead of allocating buffers inside WarpX, this asks the I/O backend to create a buffer that WarpX writes into directly. Data copies are avoided entirely because the data is created in the right place from the start (a sketch follows below this list).
    This optimization benefits all ADIOS2 engines that support the Span API (e.g. BP4); openPMD-api automatically falls back to the conventional path otherwise.
  2. In other places, the data either already exists in WarpX in the right format as static data, or needs to be copied into pinned memory first, which the Span API does not support. There, the "normal" storeChunk()/Put() API must be used, and the workflow shown above must be avoided. This is done by deleting instances of Series::flush() between RecordComponent::storeChunk() and Iteration::endStep().
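
A minimal sketch of the span-based path through openPMD-api (mesh_comp, chunk_offset, chunk_size and sourceData are placeholder names; the storeChunk<T>(offset, extent) overload returning a DynamicMemoryView is the span API available in recent openPMD-api releases):

// Ask the backend to allocate the buffer instead of providing one ourselves.
auto dynamicView = mesh_comp.storeChunk<amrex::Real>(chunk_offset, chunk_size);
auto span = dynamicView.currentBuffer(); // backend-owned memory
for (size_t i = 0; i < span.size(); ++i) {
    span[i] = sourceData[i]; // write directly into the backend buffer, no extra copy
}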

Notes:

  • To create a memory profile and check whether this PR makes a difference, heaptrack can be used: https://github.com/KDE/heaptrack
  • Optimization (2) is hard to maintain: nothing stops someone from adding a Series::flush() somewhere in these routines at a later point, thereby destroying the optimization again.
    Suggestion: add something like Series::promiseNoFlushesUntilEndstep() to openPMD-api that throws an error if a flush does happen anyway.
  • Another idea we had was to add a unique_ptr overload to RecordComponent::storeChunk(). That way the backend knows it is the sole owner of the data and can autonomously delay handing the data to ADIOS2 until directly before EndStep (a hypothetical call is sketched below). Semantically, this only applies to the pinned-memory case here in WarpX; the other places hold static data, not uniquely owned data.
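
A hypothetical sketch of such a storeChunk() call (this overload is only proposed in openPMD/openPMD-api#1294 and not part of the released API; buffer size and names are placeholders):

// Ownership moves into openPMD-api, which may postpone handing the
// buffer to ADIOS2 until directly before EndStep.
auto buffer = std::unique_ptr<amrex::Real[]>(new amrex::Real[local_box.numPts()]);
// … fill buffer, e.g. via an asynchronous device-to-host copy …
mesh_comp.storeChunk(std::move(buffer), chunk_offset, chunk_size);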

Question for Axel: does the following shared_ptr correctly handle freeing the pinned memory again? No custom deleter is given, so I'm wondering.

#ifdef AMREX_USE_GPU
                if (fab.arena()->isManaged() || fab.arena()->isDevice()) {
                    amrex::BaseFab<amrex::Real> foo(local_box, 1, amrex::The_Pinned_Arena());
                    std::shared_ptr<amrex::Real> data_pinned(foo.release());
                    amrex::Gpu::dtoh_memcpy_async(data_pinned.get(), fab.dataPtr(icomp), local_box.numPts()*sizeof(amrex::Real));
                    // intentionally delayed until before we .flush(): amrex::Gpu::streamSynchronize();
                    mesh_comp.storeChunk(data_pinned, chunk_offset, chunk_size);
                } else
#endif
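
For reference, if the pointer handed to the shared_ptr did not already carry an arena-aware deleter, one would have to attach it explicitly. A minimal sketch (assuming amrex::Arena::alloc()/free() are the right calls to obtain and return pinned memory; this is not a statement about what BaseFab::release() actually returns):

// Hypothetical: pinned-memory buffer whose shared_ptr explicitly returns the
// allocation to the pinned arena instead of calling delete on it.
auto *raw = static_cast<amrex::Real*>(
    amrex::The_Pinned_Arena()->alloc(local_box.numPts() * sizeof(amrex::Real)));
std::shared_ptr<amrex::Real> data_pinned(
    raw, [](amrex::Real *p) { amrex::The_Pinned_Arena()->free(p); });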

Closes #3133


franzpoeschel commented Jun 27, 2022

I've reverted the last commit (erasing instances of flush()) to check if that was what made the tests fail.
Also, I've begun working on the unique_ptr thing: openPMD/openPMD-api#1294

EDIT: Yep, looks like those flushes are (currently) needed. Removing them is what would let BP5 avoid those copies, but further restructuring might be needed on WarpX's side?

@ax3l ax3l self-assigned this Jun 27, 2022
@ax3l ax3l requested review from ax3l and guj June 27, 2022 15:00
@franzpoeschel

Also relevant for memory usage in BP5: specifying the right BufferChunkSize, see ComputationalRadiationPhysics/picongpu#4127

If you know that WarpX will run on systems with virtual memory, it's worth using 2 GB as the default. In the upcoming openPMD-api release, openPMD::json::merge() can be used to specify such defaults easily (a sketch follows below).
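
A minimal sketch of what that could look like (assuming openPMD-api's JSON configuration layout for ADIOS2 engine parameters, the json::merge() helper from the upcoming release exposed via openPMD/auxiliary/JSON.hpp, and placeholder names filename / userProvidedConfig; the byte value is a placeholder just below 2 GB):

#include <openPMD/auxiliary/JSON.hpp>
#include <openPMD/openPMD.hpp>

// WarpX-side defaults; a user-provided configuration takes precedence in the merge.
std::string const defaults =
    R"({"adios2": {"engine": {"parameters": {"BufferChunkSize": "2147381248"}}}})";
std::string const options = openPMD::json::merge(defaults, userProvidedConfig);
openPMD::Series series(filename, openPMD::Access::CREATE, options);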

@ax3l ax3l mentioned this pull request Jan 25, 2023