Skip to content

Conversation

guj
Copy link
Contributor

@guj guj commented Mar 28, 2025

from time to time, I see exception was caught for joined array cases here:

for (unsigned int i = 0; i < actualDim; i++)
{
if (!(joinedDim.has_value() && *joinedDim == i) &&
offset[i] + extent[i] > shape[I])
}
because offset is given as {} for joined array variable.

the shape of the joined var is 18446744073709551615

joined array:

  • from application:
    - particle variable is initialized to openPMD::Dataset::JOINED_DIMENSION
  • definition in openPMD:
    - Dataset.hpp: JOINED_DIMENSION = std::numeric_limitsstd::uint64_t::max()
    - ADIOS2IOHandler.cpp: finding if (dims[i] == adios2::JoinedDim)
  • from ADIOS:
    - ADIOSTypes.h: constexpr uint64_t JoinedDim = MaxU64 - 1;

The reason it reached here is because the joined dimension from ADIOS is 18446744073709551614, one less than the supposed max of size_t, 18446744073709551615 , which is the shape of the variable.

This pull request checked the variable shape ID and acknowledge as joined array if adios2::ShapeID::JoinedArary is returned.

"[ADIOS2] Dataset access out of bounds.");
"[ADIOS2] Offset must be an empty vector in case of "
"joined "
"array.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the condition checked in the if branch from line 492 to 509.
Can't we just change the if condition in line 492 to:

if (joinedDim.has_value() || var.ShapeID() == adios2::ShapeID::JoinedArray)

This would be a single-line change and do basically the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will break the for loop at 500

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That issue only comes up when joinedDim.has_value() is false, but var.ShapeID() == adios2::ShapeID::JoinedArray is true, which we should treat as an error anyway?
I think I'll add a check for that circumstance.

Copy link
Contributor Author

@guj guj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will break the for loop at 500

"[ADIOS2] Dataset access out of bounds.");
"[ADIOS2] Offset must be an empty vector in case of "
"joined "
"array.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will break the for loop at 500

@guj guj changed the title working around an unusal encounter when the joined_dim has actual value "max::size_t - 1" working around an unusual encounter when the joined_dim has actual value "max::size_t - 1" Mar 31, 2025
@franzpoeschel
Copy link
Contributor

I went through the implementation of joined arrays again and added some further safety guards. I think that I might have an idea on what's going on in your crashing simulations: Joined arrays were until now ignored in the extendDataset() task, meaning that openPMD::Dataset::JOINED_DIMENSION will be propagated as-is to the ADIOS2 variable. This would explain why you see ~(size_t)0 and not ~(size_t)0 - 1 (which would be adios2::JoinedDim.
However, I'm not sure ADIOS2 even supports extending joined array datasets. Also, since you are using joined arrays for particle output (which is 1D), there is nothing to be extended since the joined extent is calculated on the fly anyway.

@franzpoeschel franzpoeschel force-pushed the 20250328-joinedarray branch 2 times, most recently from b4be4a4 to 8c9845a Compare April 1, 2025 13:08
@franzpoeschel
Copy link
Contributor

@guj Can you try if you still observe the crashes with the recent commits on this PR? I hope that I might have found and fixed the issue.
I will not merge this yet, since I found another bug on the way and still have to find the cause for that.

@franzpoeschel
Copy link
Contributor

I will not merge this yet, since I found another bug on the way and still have to find the cause for that.

Found and fixed the bug
Please still check if this fixes the issue you had, I think this might have been the problem

@franzpoeschel
Copy link
Contributor

CI failures due to deployment of CMake 4.0 to CI runners, will be fixed with #1742. I'll select the bugfix for the double-write from unique-pointers into a separate PR as it has nothing to do with this PR. Will rebase this onto that then.

@franzpoeschel franzpoeschel force-pushed the 20250328-joinedarray branch 2 times, most recently from de6838b to 9eb620f Compare April 2, 2025 13:54
@guj
Copy link
Contributor Author

guj commented Apr 2, 2025

@guj Can you try if you still observe the crashes with the recent commits on this PR? I hope that I might have found and fixed the issue. I will not merge this yet, since I found another bug on the way and still have to find the cause for that.

The test job was finally scheduled. Yes this version works!

@guj
Copy link
Contributor Author

guj commented Apr 2, 2025

I will not merge this yet, since I found another bug on the way and still have to find the cause for that.

Found and fixed the bug Please still check if this fixes the issue you had, I think this might have been the problem

OK I will submit another test to double check.

@franzpoeschel
Copy link
Contributor

The test job was finally scheduled. Yes this version works!

Nice, thanks for checking!

OK I will submit another test to double check.

I think it should not alter your use case since that's an entirely different issue, but feel free to run a test anyway

@franzpoeschel franzpoeschel force-pushed the 20250328-joinedarray branch from 9eb620f to ef62f62 Compare April 2, 2025 16:43
@guj
Copy link
Contributor Author

guj commented Apr 3, 2025

I will not merge this yet, since I found another bug on the way and still have to find the cause for that.

Found and fixed the bug Please still check if this fixes the issue you had, I think this might have been the problem

OK I will submit another test to double check.

This one passed too

@franzpoeschel franzpoeschel force-pushed the 20250328-joinedarray branch from ef62f62 to 240e25f Compare April 3, 2025 11:44
@franzpoeschel franzpoeschel enabled auto-merge (squash) April 3, 2025 11:47
@franzpoeschel franzpoeschel merged commit 0a7e0a7 into openPMD:dev Apr 3, 2025
30 checks passed
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Jun 3, 2025
…lue "max::size_t - 1" (openPMD#1740)

* working around an unusal encounter when the joined_dim has actual
value "max::size_t - 1"

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add test for redundant resetDataset()

* Merge check into above logic

* Better error messages in verifyDataset

* Add further safety guards to createDataset and extendDataset tasks

* Move joinedDim logic into middle-end for extendDataset

* Update include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Franz Pöschel <[email protected]>
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Jun 3, 2025
…lue "max::size_t - 1" (openPMD#1740)

* working around an unusal encounter when the joined_dim has actual
value "max::size_t - 1"

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add test for redundant resetDataset()

* Merge check into above logic

* Better error messages in verifyDataset

* Add further safety guards to createDataset and extendDataset tasks

* Move joinedDim logic into middle-end for extendDataset

* Update include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Franz Pöschel <[email protected]>
franzpoeschel added a commit that referenced this pull request Jun 4, 2025
* Fix: Late unique_ptr puts without CLOSE_FILE or ADVANCE operations (#1744)

* Add failing test

* Add failing test

* Revert "Add failing test"

This reverts commit 5e04ece.

* Reactivate writing from unique_ptr in finalize()

* BP5+groupbased: allow only up to 100 steps (#1732)

* BP5+groupbased: allow only up to 1000 steps

* Configure this via env variable

OPENPMD_BP5_GROUPENCODING_MAX_STEPS=1000

* Add documentation

* Lower limit to 100

* Add compile-time check for #1720 (#1722)

* WarpX: Repo Moved (#1733)

Update a link to WarpX.

* Fix zero-sized storeChunk for Span API in Python (#1738)

* working around an unusual encounter when the joined_dim has actual value "max::size_t - 1" (#1740)

* working around an unusal encounter when the joined_dim has actual
value "max::size_t - 1"

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add test for redundant resetDataset()

* Merge check into above logic

* Better error messages in verifyDataset

* Add further safety guards to createDataset and extendDataset tasks

* Move joinedDim logic into middle-end for extendDataset

* Update include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Franz Pöschel <[email protected]>

* ADIOS2 bugfix: Always use CurrentStep() in mode::Read (#1749)

* Always use CurrentStep() in mode::Read

* Remove manual step counting

m_currentStep only necessary for SetStepSelection, it seems

* Clean up logic that is no longer needed

* Add test

---------

Co-authored-by: Axel Huebl <[email protected]>
Co-authored-by: Junmin Gu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants