Problems with reading BP5 output #1457

Open
@pordyna

Describe the bug
The process gets killed when opening an ADIOS BP5 series generated with PIConGPU in READ_ONLY mode. I generated a fairly large openPMD dataset (107 TiB) with the following PIConGPU IO configuration:

toml

# The following parameters need not be specified
# If a parameter is left unspecified, it falls back to its default value
file = "simData"    # replaces --openPMD.file,
                   # given value is the default

infix = ""         # replaces --openPMD.infix,
                   # default is "%06T"
ext = "bp5"         # replaces --openPMD.ext,
                   # given value is the default
backend_config = "@../input/etc/picongpu/adios_config.json"    # replaces --openPMD.json,
                                           # default is "{}"
data_preparation_strategy = "mappedMemory" # replaces --openPMD.dataPreparationStrategy,
                                           # default is "doubleBuffer"


# Periods and data sources are specified independently per reading application
# The application names can be arbitrary and are not interpreted, except
# potentially for logging and other messages.
[sink.disk.period]
# Each entry here denotes a periodicity combined with data sources requested
# by the reading code from PIConGPU at the specified periodicity
"4356:4356,450" = "species_all"
"1:1,252" = "fields_all"
"0:0,4356:9072:2" = ["e_all_density", "e_all_densityOverGammaSquared", "B"]

adios_config.json

{"adios2": {                                       
    "engine": {                                 
        "parameters": {                         
             "AggregatorRatio" : "1"            
             ,"BufferChunkSize" : "2147381248"  
        }                                           
    }                                         
    , "dataset": {                            
        "operators": [ {                        
            "type": "blosc"                     
            , "parameters": {                     
                "clevel": "1"                     
                , "compressor": "zstd"            
                , "doshuffle": "BLOSC_BITSHUFFLE" 
            }                                       
        } ]                                       
    }                                             
}}    

openpmd-ls simData.bp5, bpls simData.bp5, and io.Series("<...>/simData.bp5", io.Access.read_only) all run out of memory and crash.
Using io.Series("<...>/simData.bp5", io.Access.read_only, "{\"defer_iteration_parsing\": true}") doesn't help, but switching to io.Access.read_linear does. It looks like the series was written with variable-based encoding.
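For reference, the read_linear workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact code I ran: the helper name summarize_linear is made up for illustration, and it assumes openPMD-api 0.15 or newer, where io.Access.read_linear and Series.read_iterations() are available. In this mode the series is consumed one step at a time instead of being parsed for random access.

```python
def summarize_linear(path):
    """Stream through a series step by step and collect a short summary.

    Hypothetical helper for illustration only; assumes openPMD-api >= 0.15.
    """
    # Imported lazily so that defining this helper does not require openPMD-api.
    import openpmd_api as io

    # read_linear processes the series one step at a time (no random access).
    series = io.Series(str(path), io.Access.read_linear)

    summary = []
    # In linear mode, iterations are only reachable through read_iterations().
    for iteration in series.read_iterations():
        summary.append((
            iteration.iteration_index,   # step number of this iteration
            list(iteration.meshes),      # names of the mesh records
            list(iteration.particles),   # names of the particle species
        ))
    return summary


# Example call (path left elided as in the report):
# for index, meshes, species in summarize_linear("<...>/simData.bp5"):
#     print(index, meshes, species)
```

Only names are collected here and no datasets are loaded, so the memory footprint should stay small per step, although I have not measured that on the full dataset.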

Executing the following script with the memory profiler memray shows that the ADIOS reader is allocating a gigantic buffer somewhere in readGorVBased. The full stack with the allocation sizes can be found in the attached html file memray-flamegraph-test_open_series.py.20346.html.txt.

from pathlib import Path
import openpmd_api as io

base_path = Path(
    "/global/cfs/cdirs/m4251/pordyna/runs/hydrogen_toy_density/3D/2023_06_02_N04_final")
path = base_path / 'simOutput/openPMD/simData.bp5'
series = io.Series(str(path), io.Access.read_only,  "{\"defer_iteration_parsing\": true}")
for attr in series.attributes:
    print(f"{attr}: {series.get_attribute(attr)}")

![newplot](https://github.com/openPMD/openPMD-api/assets/41141557/31fc9979-5487-4ee9-8758-aa420428709a)

Expected behavior and some questions

  • openpmd-ls and bpls should work even with variable-based encoding
  • Is this out-of-memory crash already expected due to the known limitations of using io.Access.read_only with variable-based encoding, or is it a separate bug?
  • Is it the desired behavior that using .bp5 without an infix (like %06T) creates a series with variable-based encoding, even though that encoding does not support random access? Is this an openPMD-api default, or is it set in PIConGPU @franzpoeschel?
  • Can I somehow use BP5 with group-based encoding? I was trying to avoid using file-based encoding with so many iterations.

Software Environment

  • version of openPMD-api: 0.15.1
  • installed openPMD-api via: from source (using the python package compiled together with the api for running picongpu)
  • operating system: [name and version]
  • machine: perlmutter
  • name and version of Python implementation: CPython 3.9.16
  • version of HDF5: 1.14.0
  • version of ADIOS2: 2.9.0
  • name and version of MPI: mpich 8.1.25 (3.1 standard version)
