Description
Describe the bug
Process gets killed when opening an ADIOS BP5 series generated with PIConGPU in READ_ONLY mode. I generated a fairly large openPMD dataset (107 TiB) with the following PIConGPU IO configuration:
toml
# The following parameters need not be specified
# If a parameter is left unspecified, it falls back to its default value
file = "simData" # replaces --openPMD.file,
# given value is the default
infix = "" # replaces --openPMD.infix,
# default is "%06T"
ext = "bp5" # replaces --openPMD.ext,
# given value is the default
backend_config = "@../input/etc/picongpu/adios_config.json" # replaces --openPMD.json,
# default is "{}"
data_preparation_strategy = "mappedMemory" # replaces --openPMD.dataPreparationStrategy,
# default is "doubleBuffer"
# Periods and data sources are specified independently per reading application
# The application names can be arbitrary and are not interpreted, except
# potentially for logging and other messages.
[sink.disk.period]
# Each entry here denotes a periodicity combined with data sources requested
# by the reading code from PIConGPU at the specified periodicity
"4356:4356,450" = "species_all"
"1:1,252" = "fields_all"
"0:0,4356:9072:2" = ["e_all_density", "e_all_densityOverGammaSquared", "B"]
adios_config.json
{"adios2": {
"engine": {
"parameters": {
"AggregatorRatio" : "1"
,"BufferChunkSize" : "2147381248"
}
}
, "dataset": {
"operators": [ {
"type": "blosc"
, "parameters": {
"clevel": "1"
, "compressor": "zstd"
, "doshuffle": "BLOSC_BITSHUFFLE"
}
} ]
}
}}
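To give a feeling for how many iterations the [sink.disk.period] keys above request, here is a rough sketch that expands such a period string into explicit step numbers. It assumes the usual PIConGPU plugin period syntax (a bare number is a period over the whole run, `start:end[:period]` is an inclusive interval, entries are comma-separated). `expand_period` and `last_step` are hypothetical names for illustration only; the real simulation length is not stated here.

```python
def expand_period(spec, last_step):
    """Expand a period string into a sorted list of step numbers.

    Assumed syntax: comma-separated entries, each either "<period>"
    or "<start>:<end>[:<period>]" with an inclusive end.
    last_step is a hypothetical upper bound (the final simulation step).
    """
    steps = set()
    for entry in spec.split(","):
        fields = entry.strip().split(":")
        if len(fields) == 1:
            # Bare number: periodic over the whole run.
            start, end, period = 0, last_step, int(fields[0])
        else:
            # Interval form; empty fields fall back to the defaults.
            start = int(fields[0]) if fields[0] else 0
            end = int(fields[1]) if fields[1] else last_step
            period = int(fields[2]) if len(fields) > 2 and fields[2] else 1
        if period < 1:
            # Skip entries with a non-positive period in this sketch.
            continue
        steps.update(range(start, min(end, last_step) + 1, period))
    return sorted(steps)


# The third key from the configuration above: step 0 plus every
# second step from 4356 to 9072.
density_steps = expand_period("0:0,4356:9072:2", last_step=9072)
print(len(density_steps))
```

Under these assumptions the third key alone accounts for a few thousand iterations, which is why file based encoding looked unattractive.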
openpmd-ls simData.bp5, bpls simData.bp5, as well as io.Series("<...>/simData.bp5", io.Access.read_only) all run out of memory and get killed.
Using io.Series("<...>/simData.bp5", io.Access.read_only, "{\"defer_iteration_parsing\": true}") doesn't help, but switching to io.Access.read_linear does. It looks like the series was written with variable-based encoding.
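For reference, a minimal sketch of the read_linear workaround. It walks the series one iteration at a time through `read_iterations()` and only collects names, without loading any data. `summarize_iteration` and `scan_series` are hypothetical helper names, not part of openPMD-api, and the sketch has only been thought through against the 0.15 Python bindings.

```python
def summarize_iteration(iteration):
    """Return (index, mesh names, particle species names) for one iteration."""
    return (
        iteration.iteration_index,
        sorted(iteration.meshes),      # iterating a container yields its keys
        sorted(iteration.particles),
    )


def scan_series(path):
    """Walk a series step by step instead of opening it for random access."""
    # Imported lazily so summarize_iteration stays usable without openPMD-api.
    import openpmd_api as io

    series = io.Series(str(path), io.Access.read_linear)
    summaries = []
    # read_iterations() hands out the iterations one after another.
    for iteration in series.read_iterations():
        summaries.append(summarize_iteration(iteration))
    return summaries
```

Calling `scan_series("<...>/simData.bp5")` would then return one tuple per iteration. The trade-off is that iterations can only be visited in order, not picked by index.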
Executing the following script with the memory profiler memray shows that the ADIOS reader is allocating a gigantic buffer somewhere in readGorVBased. The full stack with the allocation sizes can be found in the attached HTML file memray-flamegraph-test_open_series.py.20346 .html.txt.
from pathlib import Path
import openpmd_api as io
base_path = Path(
    "/global/cfs/cdirs/m4251/pordyna/runs/hydrogen_toy_density/3D/2023_06_02_N04_final"
)
path = base_path / 'simOutput/openPMD/simData.bp5'
# Open for random access, deferring the parsing of individual iterations
series = io.Series(str(path), io.Access.read_only, "{\"defer_iteration_parsing\": true}")
# Print the global attributes of the series
for attr in series.attributes:
    print(f"{attr}: {series.get_attribute(attr)}")
Expected behavior and some questions
- openpmd-ls and bpls should work even with variable-based encoding.
- Is this out-of-memory crash already expected due to the known limitations of using io.Access.read_only with variable-based encoding, or is it an extra bug?
- Is it the desired behavior that using .bp5 together with no infix (like %06T) creates a series with variable-based encoding even though it does not support random access? Is this an openpmd-api default or is it set in picongpu @franzpoeschel?
- Can I somehow use BP5 with group based encoding? I was trying to avoid using file based encoding with so many iterations.
Software Environment
- version of openPMD-api: 0.15.1
- installed openPMD-api via: from source (using the python package compiled together with the api for running picongpu)
- operating system: [name and version]
- machine: perlmutter
- name and version of Python implementation: CPython 3.9.16
- version of HDF5: 1.14.0
- version of ADIOS2: 2.9.0
- name and version of MPI: mpich 8.1.25 (3.1 standard version)