Conversation

@franzpoeschel (Contributor) commented Oct 14, 2025

Seen on #1791, fixes:

.. [4] We usually open iterations delayed on first access. This first access is usually the ``flush()`` call after a ``storeChunk``/``loadChunk`` operation. If the first access is non-collective, an explicit, collective ``Iteration::open()`` can be used to have the files already open.
Alternatively, iterations might be accessed for the first time by immediate operations such as ``::availableChunks()``.
.. [5] The Span-based ``storeChunk`` API calls return a backend-allocated pointer, requiring flush operations. These API calls hence inherit the collective properties of flushing. Use store calls with zero-size extent if a rank does not contribute anything.
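
A minimal sketch of the pattern these footnotes describe, assuming MPI-parallel read access; the file name, record names, and the rank-dependent first access are illustrative only, not taken from the PR:

```cpp
#include <mpi.h>
#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    Series series("data_%T.bp", Access::READ_ONLY, MPI_COMM_WORLD);
    Iteration it = series.iterations.begin()->second;

    // Collective: open the iteration (and its files) on all ranks
    // before any rank performs its first, possibly non-collective, access.
    it.open();

    // From here on, access may differ between ranks, e.g. only rank 0
    // inspects the available chunks of a record component.
    if (rank == 0)
    {
        auto chunks = it.meshes["E"]["x"].availableChunks();
        (void)chunks;
    }

    series.close();
    MPI_Finalize();
    return 0;
}
```
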
@ax3l (Member) commented Oct 14, 2025

cc @guj

Oh, but that has some severe implications for particle flushes and for codes with dynamic load balancing, where ranks do not issue an equal number of calls to storeChunk.

This makes usage very complicated for codes like WarpX.

@franzpoeschel Maybe we can relax the constraint to keep storeChunk independent - instead put more logic in resetDataset or so?

@franzpoeschel (author) commented:

Alternative:

  • Introduce RecordComponent::open() to do all necessary collective preparations
  • Check that SkeletonOnly flush does nothing collective

@ax3l (Member) commented Oct 14, 2025

Discussed:

  • the collective component only comes in for the first storeChunk (that means only particles would have a problem in most cases)
  • there is some confusion in the table between the MPI concepts: collective/independent (described) and synchronizing (not described), see #1796 (Doc: MPI Sync?)

@ax3l (Member) commented Oct 14, 2025

Franz's comment: the collective part might only be the ADIOS engine open -- we probably do not need to delay this at all and could do it collectively and blocking in Series open...?

To double check if Iteration::open() really opens the engine (x-ref)

A contributor commented:

Is collective access a performance-boost requirement?

@franzpoeschel (author) commented:

Okay, we can keep the current API:

  • Iteration::open() must be used in parallel setups to ensure collective file access
  • This was not working properly in random-access mode; now fixed
  • The Span API can then be used non-collectively (see the sketch below)
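
A hedged sketch of that usage for a parallel write; the dataset name, shape, and the condition deciding which ranks contribute are made up for illustration:

```cpp
#include <cstdint>
#include <mpi.h>
#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    Series series("data_%T.bp", Access::CREATE, MPI_COMM_WORLD);
    Iteration it = series.iterations[100];

    // Collective: ensure all ranks open the iteration's files together.
    it.open();

    auto E_x = it.meshes["E"]["x"];
    E_x.resetDataset(Dataset(Datatype::DOUBLE, {static_cast<uint64_t>(size) * 64}));

    // Non-collective from here on: only ranks that actually have data
    // in this step go through the Span-based storeChunk.
    if (rank % 2 == 0) // stand-in for "this rank has data"
    {
        Offset offset = {static_cast<uint64_t>(rank) * 64};
        Extent extent = {64};
        auto view = E_x.storeChunk<double>(offset, extent);
        auto span = view.currentBuffer();
        for (size_t i = 0; i < span.size(); ++i)
            span[i] = static_cast<double>(offset[0] + i);
    }

    it.close(); // collective again
    series.close();
    MPI_Finalize();
    return 0;
}
```

The point of the sketch is that the explicitly collective steps are the Series setup, Iteration::open(), and Iteration::close(); the Span-based storeChunk itself is issued only by contributing ranks.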

@franzpoeschel (author) commented:

Oh and:

  • Due to the bug mentioned above, current releases require a workaround: each rank must contribute at least one storeChunk() operation, which may be empty (see the sketch below).
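
For those current (older) releases, a sketch of what that workaround could look like; the helper name and its parameters are hypothetical, and the record component is assumed to have had its dataset defined beforehand:

```cpp
#include <cstddef>
#include <openPMD/openPMD.hpp>

// Hypothetical helper illustrating the workaround: every rank issues one
// Span-based storeChunk; ranks without data use a zero-size extent, so the
// flush backing the Span allocation is entered by all ranks.
void storePossiblyEmpty(
    openPMD::RecordComponent &rc,
    openPMD::Series &series,
    bool haveData,
    openPMD::Offset offset,
    openPMD::Extent extent)
{
    using namespace openPMD;
    if (!haveData)
    {
        // Degenerate, zero-size chunk: keeps this rank's call pattern
        // aligned with the contributing ranks.
        offset = Offset(offset.size(), 0);
        extent = Extent(extent.size(), 0);
    }
    auto view = rc.storeChunk<double>(offset, extent);
    auto span = view.currentBuffer();
    for (size_t i = 0; i < span.size(); ++i)
    {
        span[i] = 0.0; // fill with this rank's actual data
    }
    series.flush(); // collective; balanced since every rank called storeChunk
}
```
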

A contributor commented:

This fix works for me.

A contributor commented:

> Oh and:
>   • Due to the bug mentioned above, current releases require a workaround: Each rank must contribute at least one storeChunk() operation that may be empty.

So the Span API should then require the next release version?

@franzpoeschel (author) commented:

That's probably for the best, yeah. For older versions, you can alternatively include the workaround described above.

@franzpoeschel force-pushed the fix-collective-span-api branch from dfc8cea to 58c08ea on October 15, 2025
@franzpoeschel changed the title from "Span API is collective" to "Fix Iteration::open(), needed for correct use of Span API" on Oct 15, 2025
IOHandler()->flush(internal::defaultFlushParams);
break;
}
// IOHandler()->flush(internal::defaultFlushParams);

Check notice (Code scanning / CodeQL): Commented-out code -- this comment appears to contain commented-out code.
@franzpoeschel merged commit b62f2e7 into openPMD:dev on Oct 15, 2025
30 checks passed