HDF5 supports two ways of storing an array of strings: fixed-length and variable-length.
openPMD uses arrays of strings for some attributes, for example axisLabels. When a fixed-length array is used,
// h5dump output
ATTRIBUTE "axisLabels" {
   DATATYPE  H5T_STRING {
      STRSIZE 2;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
   DATA {
   (0): "x", "z"
   }
}
openPMD-validator considers that a valid attribute. However, when a variable-length array is used,
ATTRIBUTE "axisLabels" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
   DATA {
   (0): "x", "z"
   }
}
openPMD-validator fails with the following error message:
Error: Attribute axisLabels in `/data/0/meshes/inv` is not of type ndarray of '<map object at 0x7fe5256acbb0>' (is ndarray of 'object_')!
Variable-length string arrays are a legitimate feature of the HDF5 data format, and the openPMD standard does not explicitly ban them: it only states that axisLabels should be a "1-dimensional array containing N (string) elements", which is satisfied in both cases. I therefore believe that using variable-length strings does not violate the openPMD standard, and that openPMD-validator should not fail in this case.
This probably happens because h5py internally represents variable-length string arrays as np.ndarray with dtype=object instead of a NumPy string type (see https://docs.h5py.org/en/stable/special.html). Instead of checking arr.dtype.type (which gives np.object_ for variable-length arrays), the validator should therefore use the h5py.check_string_dtype(arr.dtype) function, which works correctly with both fixed- and variable-length string arrays.
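A minimal sketch of the difference, assuming h5py 2.10 or newer (for h5py.string_dtype and h5py.check_string_dtype). The helper name is_string_dtype is hypothetical and not part of openPMD-validator. The dtypes are built directly rather than read from a file, so this only illustrates how the two checks behave on dtypes that h5py has tagged as strings:

```python
import numpy as np
import h5py


def is_string_dtype(dtype):
    """Hypothetical helper: True if h5py recognises dtype as an HDF5 string."""
    return h5py.check_string_dtype(dtype) is not None


# Fixed-length ASCII string of 2 bytes (a NumPy 'S2' dtype).
fixed = h5py.string_dtype(encoding="ascii", length=2)
# Variable-length ASCII string (a NumPy object dtype tagged by h5py).
variable = h5py.string_dtype(encoding="ascii")

# Comparing dtype.type treats the two cases differently ...
print(fixed.type, variable.type)

# ... while check_string_dtype recognises both and reports the length
# (None means variable-length).
fixed_info = h5py.check_string_dtype(fixed)
variable_info = h5py.check_string_dtype(variable)
print(fixed_info.length, variable_info.length)

# Non-string dtypes, including a plain object dtype without h5py's
# metadata, are rejected.
print(is_string_dtype(np.dtype("float64")), is_string_dtype(np.dtype(object)))
```

Note that check_string_dtype relies on the metadata h5py attaches to the dtype, so this approach assumes the array handed to the validator still carries it.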
Attached are two example output files, one using fixed-length and one using variable-length strings for axisLabels: examples.zip