JSON file backend reserializes whole file for each call to storeChunk() #1781

@dantargz

Describe the bug
If the JSON backend is used and the calling code makes N calls to storeChunk() followed by a call to flush(), the produced JSON file is deleted and rewritten N times. Expected behavior is one JSON serialization per flush() call, regardless of the number of storeChunk() calls.

To Reproduce

Build & run WarpX against the laser accelerator example with default build options

  1. Get WarpX source
  2. Compile with default configuration (description here)
  3. Run WarpX against the 3D laser accelerator example: warpx.3d inputs_test_3d_laser_acceleration
  4. Observe that it takes 60 seconds or longer to generate a ~50 MB JSON file containing ~2 million doubles

Expected behavior
Expected behavior is one JSON serialization per flush() call regardless of the number of storeChunk() calls.

Software Environment

  • version of openPMD-api: 0.16.1
  • installed openPMD-api via: WarpX cmake build system dependency
  • operating system: macOS Sequoia 15.6.1
  • machine: MacBook Pro 2024 / M4 Pro
  • name and version of Python implementation: N/A
  • version of HDF5: N/A
  • version of ADIOS2: N/A
  • name and version of MPI: N/A

Additional context
Removing the call to putJsonContents(file); in JSONIOHandlerImpl::writeDataset mostly solves the issue. The JSON file is still re-serialized about 8 times each time WarpX writes its output, but that is a significant improvement over the 1000+ re-serializations that currently happen. Adding a configuration option or flag for "only serialize to JSON on flush" would be acceptable for my use case.
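
A minimal sketch of what "only serialize to JSON on flush" could look like, assuming the backend tracks which files have pending changes. The class `DeferredJsonBackend`, its members, and the helper `serializationsAfter` are hypothetical names made up for illustration. They are not part of openPMD-api, and the real JSONIOHandlerImpl may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for a JSON backend (not openPMD-api code).
class DeferredJsonBackend {
public:
    // Models a dataset write: update the in-memory document and mark the
    // file as dirty instead of serializing it right away.
    void writeDataset(const std::string& file, std::size_t numValues) {
        assert(!file.empty());
        valuesInMemory_ += numValues;
        dirtyFiles_.insert(file);
    }

    // Models flush(): serialize each dirty file exactly once, then reset.
    void flush() {
        for (const std::string& file : dirtyFiles_) {
            serialize(file);
        }
        dirtyFiles_.clear();
    }

    std::size_t serializations() const { return serializations_; }
    std::size_t valuesInMemory() const { return valuesInMemory_; }

private:
    // Placeholder for the expensive "write the whole JSON file" step.
    void serialize(const std::string& /*file*/) { ++serializations_; }

    std::set<std::string> dirtyFiles_;
    std::size_t valuesInMemory_ = 0;
    std::size_t serializations_ = 0;
};

// Performs n chunk writes to one file followed by a single flush and
// returns how many times the file was serialized.
inline std::size_t serializationsAfter(std::size_t n) {
    DeferredJsonBackend backend;
    for (std::size_t i = 0; i < n; ++i) {
        backend.writeDataset("data.json", 1000);
    }
    backend.flush();
    return backend.serializations();
}
```

In this model, `serializationsAfter(1000)` performs one serialization instead of 1000. A real implementation would also have to decide what happens to dirty files when the handler is closed without an explicit flush.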

Code flow:

  • WarpX stores data in openPMD via storeChunk/storeChunkRaw calls (I think the main caller is here: https://github.com/BLAST-WarpX/warpx/blob/development/Source/Diagnostics/WarpXOpenPMD.cpp#L942). In default mode, the particle data for many simulations is not held in a single contiguous buffer, so it is stored through many separate calls
  • Each call to storeChunk creates a WRITE_DATASET IOTask
  • WarpX calls openPMD flush()
  • flush() iterates through the IOTasks and handles them one by one
  • Handling a WRITE_DATASET IOTask eventually invokes JSONIOHandlerImpl::writeDataset
  • Each call to JSONIOHandlerImpl::writeDataset re-serializes the entire JSON file by invoking putJsonContents
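
The flow above can be modeled with a toy task queue to estimate how much redundant work it causes. This is only a sketch under simplifying assumptions: `WriteDatasetTask`, `EagerStats`, and `simulateEagerFlush` are hypothetical names, not openPMD-api types, and the model assumes the file holds nothing but the chunk data and grows with every task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one queued WRITE_DATASET task.
struct WriteDatasetTask {
    std::size_t numDoubles;  // size of the chunk passed to storeChunk()
};

// Totals accumulated while draining the queue.
struct EagerStats {
    std::size_t serializations;     // how often the whole file was rewritten
    std::size_t doublesSerialized;  // total doubles written across all rewrites
};

// Models the reported behavior: every task adds its chunk to the in-memory
// document and then immediately re-serializes the entire file.
inline EagerStats simulateEagerFlush(const std::vector<WriteDatasetTask>& queue) {
    EagerStats stats{0, 0};
    std::size_t doublesInFile = 0;
    for (const WriteDatasetTask& task : queue) {
        doublesInFile += task.numDoubles;         // chunk lands in the document
        ++stats.serializations;                   // full rewrite for this task
        stats.doublesSerialized += doublesInFile; // rewrite covers everything so far
    }
    return stats;
}
```

With N equally sized chunks of c doubles each, this model serializes c * N * (N + 1) / 2 doubles in total, so the work grows quadratically with the number of storeChunk() calls. A single serialization at the end of flush() would write only c * N doubles.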
