
@ax3l (Member) commented Aug 5, 2024

Description

For MPI-parallel I/O output, we developed a new method in ADIOS2 that does not need an initial metadata gather ("JoinedArrays"). To use this mode, we need to relax the requirement to write a shape attribute for constant records in a species (particle group). Otherwise we still have to do a collective gather, because that attribute holds the global number of particles, which no single rank knows on its own.
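A minimal sketch of where the collectives come from, assuming each MPI rank knows only its own local particle count. With plain ADIOS2 global arrays, each rank declares the global shape and its own offset; JoinedArrays lets a rank declare only its local count. A constant record component, however, has no dataset from which the extent could be derived, so the writer itself must still provide the global particle count in the shape attribute. The function names are hypothetical, and the sums stand in for what would be MPI collectives in real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical illustration, not openPMD-api or ADIOS2 code.
// localCounts[r] is the number of particles held by MPI rank r. In a real
// MPI program no rank holds this whole vector: computing either function
// below requires communication between ranks.

// Classic global-array write: each rank needs its offset, i.e. the sum of
// the counts of all lower ranks (an MPI_Exscan in practice).
// JoinedArrays removes the need for this. Assumes rank < localCounts.size().
std::uint64_t chunkOffset(const std::vector<std::uint64_t>& localCounts,
                          std::size_t rank)
{
    return std::accumulate(localCounts.begin(),
                           localCounts.begin() + static_cast<std::ptrdiff_t>(rank),
                           std::uint64_t{0});
}

// Constant record component: the shape attribute holds the global extent,
// i.e. the sum over all ranks (an MPI_Allreduce in practice). Even with
// JoinedArrays, writing this attribute still needs a collective.
std::uint64_t globalShape(const std::vector<std::uint64_t>& localCounts)
{
    return std::accumulate(localCounts.begin(), localCounts.end(),
                           std::uint64_t{0});
}
```

For example, the 1829775 particles of the mass record shown further below, split arbitrarily as {600000, 600000, 629775} over three ranks, give globalShape == 1829775 and an offset of 1200000 for rank 2.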

This requires a small additional fallback implementation on the reader side.

Affected Components

  • base

Logic Changes

The previously required attribute shape of constant record components is now optional for records in particle groups (species), provided that at least one other record in the same particle species allows the shape to be recovered.

Writer Changes

The previously required attribute shape of constant record components is now optional for records in particle groups (species).

Reader Changes

The previously required attribute shape of constant record components is now optional for records in particle groups (species).
If the attribute is missing, iterate over the other records and components of the same species, pick the first one that provides a shape (e.g., the full extent of a non-constant record component, or a constant record component with a shape attribute), and use that information to recover the missing shape.
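The reader fallback described above could look roughly like this. It is a minimal, self-contained sketch, not openPMD-api code: the Component, Record, and Species types and the recoverShape and shapeOf functions are hypothetical. It assumes the Species map holds only the particle records of one species, i.e. that the reader has already skipped the particlePatches group, whose records have a different extent. In a valid file all remaining components share one extent, so which one is found first should not matter.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Extent = std::vector<std::uint64_t>;

// Hypothetical, simplified view of one parsed record component.
struct Component
{
    bool isConstant = false;
    // Dataset extent (non-constant component) or shape attribute
    // (constant component); empty if a constant component omits shape.
    std::optional<Extent> extent;
};

// Record name -> (component name -> component). A scalar record is
// modeled as a single component with an empty name.
using Record = std::map<std::string, Component>;
using Species = std::map<std::string, Record>;

// First shape found in the species (std::map order, i.e. sorted by name).
std::optional<Extent> recoverShape(const Species& species)
{
    for (const auto& record : species)
        for (const auto& component : record.second)
            if (component.second.extent)
                return component.second.extent;
    return std::nullopt; // no shape anywhere: the species is invalid
}

// Shape of one component, falling back to the rest of the species.
std::optional<Extent> shapeOf(const Species& species,
                              const std::string& recordName,
                              const std::string& componentName)
{
    const Component& c = species.at(recordName).at(componentName);
    if (c.extent)
        return c.extent;
    return recoverShape(species);
}
```

For example, a species with a constant mass record lacking shape and a momentum/x dataset of extent {1829775} yields {1829775} for shapeOf(species, "mass", "").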

What would a reader need to change? Link implementation examples!

Data Converter

No changes needed. Files from 1.X are forward compatible with regard to this change.

@ax3l added the label major change (non-backwards compatible change) on Aug 5, 2024
@ax3l added this to the openPMD 2.X milestone on Aug 5, 2024
@ax3l requested review from RemiLehe, franzpoeschel and guj on August 5, 2024 17:30
@ax3l changed the title from "Const particle shape" to "Const Records: Relax shape for Particles" on Aug 5, 2024
@ax3l changed the base branch from latest to upcoming-2.0.0 on August 5, 2024 17:32

@franzpoeschel

This has practical consequences for parsing a corner case of openPMD data, namely constant scalar particle records.
Take for example this constant scalar particle record:

  int32_t   /data/500/particles/e_all/mass/macroWeighted    attr = 0
  uint64_t  /data/500/particles/e_all/mass/shape            attr = {1829775}
  double    /data/500/particles/e_all/mass/timeOffset       attr = 0
  double    /data/500/particles/e_all/mass/unitDimension    attr = {0, 1, 0, 0, 0, 0, 0}
  double    /data/500/particles/e_all/mass/unitSI           attr = 8.26366e-28
  double    /data/500/particles/e_all/mass/value            attr = 0.00110234
  double    /data/500/particles/e_all/mass/weightingPower   attr = 1

How does a parser distinguish /data/500/particles/e_all/mass from, e.g., /data/500/particles/e_all/momentum?

  float     /data/500/particles/e_all/momentum/x                          {1829775} = 0 / 0
  float     /data/500/particles/e_all/momentum/y                          {1829775} = 0 / 0
  float     /data/500/particles/e_all/momentum/z                          {1829775} = 0 / 0

Currently, openPMD-api handles this corner case in ParticleSpecies.cpp as follows:

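  // att_begin/att_end: range over the attribute names of record group r
  auto value = std::find(att_begin, att_end, "value");
  auto shape = std::find(att_begin, att_end, "shape");
  if (value != att_end && shape != att_end)
  {
      // "value" and "shape" both present: open the group as a
      // constant scalar record component
      RecordComponent &rc = r;
      IOHandler()->enqueue(IOTask(&rc, pOpen));
      IOHandler()->flush(internal::defaultFlushParams);
      rc.get().m_isConstant = true;
  }
  try
  {
      r.read();
  }
  // ... (catch clause not shown in this excerpt)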

I.e., when parsing a group /data/500/particles/e_all/MYSTERY, it checks whether the group contains both attributes value and shape. If so, the group is read as a constant scalar component; otherwise, it is read as a normal record.
With this standard change, I see two options:

  1. We only check for the attribute value. This might lead to problems with datasets that, for some reason, use an attribute named value.
    Follow-up question: For legacy (1.0) datasets, do we keep checking for both attributes? If yes, the relaxed standard definition cannot be applied to old data. If no, some old datasets might be broken.
  2. We check whether the record contains subgroups or datasets, and treat it as a constant component if it does not. This might lead to problems with partially written data, e.g., after a crash.
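The two options can be compared side by side in a minimal sketch. GroupInfo, hasAttribute, and the isConstantOption1/isConstantOption2 functions are hypothetical names, not openPMD-api API; they assume a reader can list a group's attribute names and count its subgroups and datasets.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of what a reader sees for one group below a
// particle species (illustrative names, not openPMD-api types).
struct GroupInfo
{
    std::vector<std::string> attributes; // attribute names on the group
    std::size_t numChildren = 0;         // number of subgroups + datasets
};

bool hasAttribute(const GroupInfo& g, const std::string& name)
{
    return std::find(g.attributes.begin(), g.attributes.end(), name) !=
           g.attributes.end();
}

// Option 1: the reserved "value" attribute alone marks a constant component.
bool isConstantOption1(const GroupInfo& g)
{
    return hasAttribute(g, "value");
}

// Option 2: a record without any subgroups or datasets is constant.
bool isConstantOption2(const GroupInfo& g)
{
    return g.numChildren == 0;
}
```

Both options accept a constant mass record written without shape, which the current value-and-shape check rejects. They disagree on a crashed record whose datasets were never written: it has neither a value attribute nor children, so option 2 misreads it as constant while option 1 does not.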

I suggest that we decide on one clearly defined scheme to detect constant scalar components and standardize it. Option 1 is easier to implement.

Also, do I understand it correctly that the relaxed constant component markup is not applicable to Mesh records?

@ax3l (Member, Author) commented Mar 17, 2025

Thanks, good points!

  1. I think the value discriminator for a constant record component is sufficient, because value is a reserved attribute in the openPMD standard, and using it without its well-defined meaning ignores/violates the standard.
    1.1.: Yes, for 1.* datasets we could still require shape, but technically we could already support the relaxed rule in implementations with a low risk of breakage.
  2. I would go with 1.
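A minimal sketch of the favored rule, with the 1.1 follow-up expressed as a reader setting. The function name and the strictLegacy flag are hypothetical, not an existing openPMD-api option; passing the major version as an integer assumes the reader has already parsed the file's openPMD version string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch, not openPMD-api code: the reserved "value"
// attribute is the discriminator. For 1.x data, a reader may optionally
// keep requiring "shape" as well (strictLegacy).
bool isConstantComponent(const std::vector<std::string>& attributes,
                         int openPMDMajorVersion,
                         bool strictLegacy)
{
    auto has = [&attributes](const std::string& name) {
        return std::find(attributes.begin(), attributes.end(), name) !=
               attributes.end();
    };
    if (!has("value"))
        return false; // no "value": a normal record
    if (openPMDMajorVersion < 2 && strictLegacy)
        return has("shape"); // 1.x rule: "value" and "shape"
    return true;
}
```

Keeping strictLegacy false applies the relaxed rule to old data as well, which the reply above considers low risk.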

Also, do I understand it correctly that the relaxed constant component markup is not applicable to Mesh records?

At least for now, that is how we define it. I do not see a big benefit in allowing mesh records to skip shape, but it would work analogously (at least one other record component with a shape would need to be present).


Labels

major change non-backwards compatible change

Projects

Status: Review

2 participants