Design new HDF5 interface #5033

Open
@jngrad

Description

TL;DR: ESPResSo needs to rewrite its hdf5 bridge to replace the aging h5xx library.

Background

Here are some of the main hdf5 C++ APIs available:

  • h5md/h5xx is no longer actively maintained and does not compile on modern toolchains due to SFINAE bugs
  • HighFive is no longer funded and is currently maintained in a fork by a single person
  • HDFGroup/hdf5 has C++11 serial bindings and C bindings

Requirements:

  • either header-only, or easily installable on HPC via EESSI/Spack, or supports CMake integration via FetchContent
  • preferably C++ API, although C is also an option
  • must support parallel I/O

The hdf5 project is actively maintained, and release 2.0.0 is planned for the end of 2025. That release will use CMake, which is required for integration in ESPResSo since these bindings are not header-only. Its C++ bindings currently do not support parallel I/O, which is a hard requirement for ESPResSo.

Problem statement

ESPResSo needs to interface with efficient parallel I/O libraries to read and write data in portable file formats, such as HDF5 [1] and GSD [2] for particle data and VTK for lattice data. File formats for particle data need to support variable particle numbers for Monte Carlo, and metadata specifications (e.g. H5MD [3]) for applications that need SI units (e.g. VOTCA [4]). The HDF5 file format fulfills these criteria and is already supported by ESPResSo. However, in the current implementation its scaling is much poorer than that of MPI-IO, and it relies on a header-only C++ interface that cannot be compiled on HPC systems due to known portability bugs.
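To make the metadata requirement concrete: H5MD stores each time-dependent particle property as a group containing `step`, `time` and `value` datasets, and its units module attaches a string attribute named `unit` to the data. The following is only a minimal sketch of that naming scheme using plain strings, without any HDF5 calls. The helper names `H5mdElementPaths` and `h5md_particle_element` and the group name `atoms` are hypothetical and not part of ESPResSo or any library.

```cpp
#include <cassert>
#include <string>

// Paths of the three datasets that make up one time-dependent H5MD element.
struct H5mdElementPaths {
  std::string step;  // integer time step of each stored frame
  std::string time;  // simulation time of each stored frame
  std::string value; // the data itself; carries the "unit" attribute
};

// Build the dataset paths for one element of a particles group,
// e.g. group "atoms" (hypothetical name) and element "position".
inline H5mdElementPaths h5md_particle_element(std::string const &group,
                                              std::string const &element) {
  std::string const base = "/particles/" + group + "/" + element;
  return {base + "/step", base + "/time", base + "/value"};
}

// Name of the string attribute that holds the unit in the H5MD units module.
inline constexpr char const *h5md_unit_attribute = "unit";
```

A writer for positions would then create the dataset at `h5md_particle_element("atoms", "position").value` and attach a `unit` attribute such as `nm` to it.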

Roadmap

Investigate suitable hdf5 C++ API replacements for h5xx. If none is found, consider using the hdf5 C API directly. This has the advantage that we don't need CMake integration or an extra library, and that the header files are already packaged on HPC (both EESSI and Spack) and on most Linux distributions.
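If the C API is used directly, every identifier it hands out (files, datasets, dataspaces, property lists) has to be closed explicitly, which is easy to get wrong on error paths. One possible way to contain this is a small move-only RAII wrapper. The sketch below assumes a stand-in handle type and mock open/close functions so that it compiles without the HDF5 headers; `handle_t`, `mock_open`, `mock_close`, `Handle` and `MockFile` are hypothetical names. In a real bridge the handle would be `hid_t` and the closers would be functions such as `H5Fclose` or `H5Dclose`.

```cpp
#include <cassert>
#include <utility>

// Stand-ins for HDF5's hid_t and H5*close functions (hypothetical names),
// so that the sketch compiles without the HDF5 headers.
using handle_t = long long;
inline int open_handles = 0; // number of mock handles currently open
inline handle_t mock_open() { ++open_handles; return 42; }
inline int mock_close(handle_t) { --open_handles; return 0; }

// Move-only owner of a C-style handle; calls Close at most once per handle.
template <int (*Close)(handle_t)>
class Handle {
  handle_t m_id = -1; // negative values denote an invalid handle

public:
  Handle() = default;
  explicit Handle(handle_t id) : m_id(id) {}
  Handle(Handle const &) = delete;
  Handle &operator=(Handle const &) = delete;
  Handle(Handle &&other) noexcept : m_id(std::exchange(other.m_id, -1)) {}
  Handle &operator=(Handle &&other) noexcept {
    if (this != &other) {
      reset();
      m_id = std::exchange(other.m_id, -1);
    }
    return *this;
  }
  ~Handle() { reset(); }

  // Close the handle now if it is valid, and mark it invalid.
  void reset() noexcept {
    if (m_id >= 0) {
      Close(m_id);
      m_id = -1;
    }
  }
  handle_t get() const noexcept { return m_id; }
  bool valid() const noexcept { return m_id >= 0; }
};

using MockFile = Handle<mock_close>;

// Usage: the handle is released when the owner goes out of scope.
inline void handle_usage_example() {
  {
    MockFile file{mock_open()};
    assert(file.valid());
  } // mock_close is called here
  assert(open_handles == 0);
}
```

The closer is a template parameter rather than a stored function pointer, so each handle kind gets its own type and cannot be mixed up with another by accident.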

The new bridge could be designed like this:

  • create a new CMake shared object target that depends on the chosen hdf5 API
  • implement an hdf5 File class that can write basic particle properties to a file, like masses and positions
  • implement the H5MD specification to store the SI units of those properties
  • write a C++ unit test to check the interface (warning: beware of file system latency when reading a file that was just created)
  • implement all remaining particle and box properties and replace the existing espresso_hdf5 target with the new one
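For the file system latency warning in the unit test step, one possible mitigation is to poll for the file before reading it back. This is a rough sketch using only the C++17 standard library; the helper name `wait_for_file` and its default polling interval are assumptions, and on parallel file systems a check like this may not be sufficient on its own.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <thread>

// Poll until `path` is a regular file of at least `min_size` bytes.
// Returns false if this is still not the case once `timeout` has expired.
inline bool wait_for_file(
    std::filesystem::path const &path, std::uintmax_t min_size,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds{10}) {
  auto const deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    std::error_code ec; // use the non-throwing filesystem overloads
    if (std::filesystem::is_regular_file(path, ec)) {
      auto const size = std::filesystem::file_size(path, ec);
      if (!ec && size >= min_size) {
        return true;
      }
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}
```

A test would call this after closing the file and before reopening it, and fail with a clear message if it returns false instead of reporting a confusing read error.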

This project is part of MultiXscale Task 3.4 from Deliverable 3.3 due M30 (June 2025).

Footnotes

  1. HDF Group, API Reference, Introduction to HDF5.

  2. Glotzer Group, GSD documentation, HOOMD Schema.

  3. de Buyl et al. 2014, H5MD: A structured, efficient, and portable file format for molecular data, Computer Physics Communications 185(6):1546.

  4. Mashayak et al. 2015, Relative Entropy and Optimization-Driven Coarse-Graining Methods in VOTCA, PLoS ONE 10(7): e0131754.
