Merged
Changes from 19 commits
23 commits
245e19d
Add JSON schema for openPMD
franzpoeschel Jun 26, 2023
a860269
Add convert-json-toml tool
franzpoeschel Jun 26, 2023
4c957b4
Add script for checking openPMD file against the schema
franzpoeschel Jun 26, 2023
a510957
Don't use spaces in SerialIOTest attribute names
franzpoeschel Jun 26, 2023
facfa15
Fix bugs detected by this verifier
franzpoeschel Jun 26, 2023
3749056
Add GitHub workflow
franzpoeschel Jun 26, 2023
ba3f5b1
Shorthand attributes
franzpoeschel Aug 7, 2023
a719bb6
Add dataset template mode
franzpoeschel Aug 7, 2023
3a5fc19
Fix path
franzpoeschel Jul 16, 2024
334f2d5
Fix reading from stdin
franzpoeschel Jul 16, 2024
90ce201
toml11 4.0 compatibility
franzpoeschel Aug 5, 2024
779cf12
Only check for existing Iterations in writeOnly mode
franzpoeschel Feb 17, 2025
f6b1f24
Some additions to schema
franzpoeschel Feb 17, 2025
f18a88b
Remove deprecated jsonschema.validators.RefResolver
franzpoeschel Feb 17, 2025
5e4a870
Use most recent version of jsonschema
franzpoeschel Mar 3, 2025
252c0d4
Allow empty variable-based series
franzpoeschel Mar 3, 2025
65aa12d
Use if-then-else for better-steered parsing
franzpoeschel Mar 3, 2025
83ed23a
hmm
franzpoeschel Mar 26, 2025
579e7b0
Remove json cfg after test
franzpoeschel Apr 7, 2025
81533b0
Update documentation, rename convert-toml-json tool
franzpoeschel Jul 15, 2025
985d505
Apply suggestions from code review
franzpoeschel Jul 15, 2025
3a2929f
Add reference to openPMD-validator
franzpoeschel Jul 15, 2025
d578167
Update README.md
franzpoeschel Jul 18, 2025
18 changes: 17 additions & 1 deletion .github/workflows/linux.yml
@@ -260,7 +260,7 @@ jobs:
- name: Install
run: |
sudo apt-get update
-sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-h5py-mpi
+sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-h5py-mpi python3-pip
# TODO ADIOS2
- name: Build
env: {CXXFLAGS: -Werror, PKG_CONFIG_PATH: /usr/lib/x86_64-linux-gnu/pkgconfig}
@@ -275,6 +275,22 @@ jobs:
cmake --build build --parallel 4
ctest --test-dir build --output-on-failure

python3 -m pip install jsonschema==4.* referencing
cd share/openPMD/json_schema
PATH="../../../build/bin:$PATH" make -j 2
# We need to exclude the thetaMode example since that has a different
# meshesPath and the JSON schema needs to hardcode that.
Comment on lines +281 to +282
Member

Are we able to patch this in check.py?

Contributor Author

Not very easily. The JSON schema is on the file system and the single .json files refer to each other by their file names. Changing this would require (1) traversing the entire JSON schema and overriding the meshes path, the particles path and the references and (2) somehow setting up python-jsonschema to cross-reference in-memory schemas, which I am not even sure it supports. Both would have to happen at runtime of check.py.
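
Step (1) of this reply can be pictured with a small sketch. It is not part of this PR or of `check.py`: the helper name `override_path` and the schema fragment are hypothetical, and it only renames matching keys and string values in an already-loaded schema. Step (2), cross-referencing the in-memory schemas, is not addressed here.

```python
import json


def override_path(node, old, new):
    """Return a copy of a loaded JSON schema in which every dict key and
    every string value equal to `old` is replaced by `new`.

    Hypothetical helper for illustration only.
    """
    if isinstance(node, dict):
        return {
            (new if key == old else key): override_path(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [override_path(item, old, new) for item in node]
    if node == old:  # only scalars reach this point
        return new
    return node


# Made-up schema fragment, not copied from the schema files of this PR.
schema = {
    "properties": {"meshes": {"$ref": "mesh.json"}},
    "required": ["meshes"],
}
patched = override_path(schema, "meshes", "fields")
print(json.dumps(patched, indent=2))
```

The original `schema` is left unmodified, so the files on disk could stay as they are while a patched copy is validated against.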

find ../../../build/samples/ \
  ! -path '*thetaMode*' \
  ! -path '/*many_iterations/*' \
  ! -name 'profiling.json' \
  ! -name '*config.json' \
  -iname '*.json' \
  | while read i; do
    echo "Checking $i"
    ./check.py "$i"
  done
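
To make the selection rules of the `find` call above easier to follow, here is a rough Python equivalent. It is only an illustration and not part of the workflow: the function name `select_sample_files` is made up, and it considers regular files only. The patterns are copied verbatim, including `/*many_iterations/*`, which starts with `/` and therefore seems unlikely to match anything when the search starts from a relative directory.

```python
import fnmatch
import os


def select_sample_files(root):
    """Yield files below `root` that the filter rules above would keep."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if fnmatch.fnmatchcase(path, "*thetaMode*"):
                continue  # ! -path '*thetaMode*'
            if fnmatch.fnmatchcase(path, "/*many_iterations/*"):
                continue  # ! -path '/*many_iterations/*'
            if name == "profiling.json":
                continue  # ! -name 'profiling.json'
            if fnmatch.fnmatchcase(name, "*config.json"):
                continue  # ! -name '*config.json'
            if fnmatch.fnmatchcase(name.lower(), "*.json"):
                yield path  # -iname '*.json' (case-insensitive)
```

Each yielded path would then be handed to `./check.py`, as in the shell loop.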

musllinux_py10:
runs-on: ubuntu-22.04
if: github.event.pull_request.draft == false
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -685,11 +685,12 @@ set(openPMD_TEST_NAMES
# command line tools
set(openPMD_CLI_TOOL_NAMES
ls
convert-json-toml
Member

Also add this to setup.py -> entry_points -> console_scripts?

Contributor Author

This is not a Python module, but written in C++, so I doubt that will work?

)
set(openPMD_PYTHON_CLI_TOOL_NAMES
pipe
)
-set(openPMD_PYTHON_CLI_MODULE_NAMES ${openPMD_CLI_TOOL_NAMES})
+set(openPMD_PYTHON_CLI_MODULE_NAMES ls)
Member

This line change looks like a hack?

Contributor Author

Not really. Until now, both openPMD_CLI_TOOL_NAMES and openPMD_PYTHON_CLI_MODULE_NAMES were identical since both contained only openpmd-ls.
But they're not identical in general; and now, we additionally have openpmd-convert-json-toml as a CLI tool, but it is not written in Python.

# examples
set(openPMD_EXAMPLE_NAMES
1_structure
@@ -894,6 +895,9 @@ if(openPMD_BUILD_CLI_TOOLS)
endif()

target_link_libraries(openpmd-${toolname} PRIVATE openPMD)
target_include_directories(openpmd-${toolname} SYSTEM PRIVATE
$<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)
endforeach()
endif()

15 changes: 12 additions & 3 deletions include/openPMD/auxiliary/JSON_internal.hpp
@@ -219,16 +219,25 @@ namespace json
* @param options as a parsed JSON object.
* @param considerFiles If yes, check if `options` refers to a file and read
* from there.
* @param convertLowercase If yes, lowercase conversion is applied
* recursively to keys and values, except for some hardcoded places
* that should be left untouched.
*/
-ParsedConfig parseOptions(std::string const &options, bool considerFiles);
+ParsedConfig parseOptions(
+    std::string const &options,
+    bool considerFiles,
+    bool convertLowercase = true);

#if openPMD_HAVE_MPI

/**
* Parallel version of parseOptions(). MPI-collective.
*/
-ParsedConfig
-parseOptions(std::string const &options, MPI_Comm comm, bool considerFiles);
+ParsedConfig parseOptions(
+    std::string const &options,
+    MPI_Comm comm,
+    bool considerFiles,
+    bool convertLowercase = true);

#endif
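
The behavior described for `convertLowercase` (recursive lowercasing of keys and values, with some places left untouched) can be sketched as follows. This is a Python illustration of the idea, not the C++ implementation in openPMD-api. The function name `lowercase_config` and the protected key path used in the example are assumptions chosen for demonstration.

```python
def lowercase_config(node, protected=(), _path=()):
    """Recursively lowercase dict keys and string values of a parsed config.

    `protected` is a collection of key paths (tuples of lowercased keys).
    The subtree found at such a path is returned untouched. Which paths are
    protected in the real implementation is not shown in this diff, so the
    path used below is hypothetical.
    """
    if _path in protected:
        return node
    if isinstance(node, dict):
        return {
            key.lower(): lowercase_config(value, protected, _path + (key.lower(),))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [lowercase_config(item, protected, _path) for item in node]
    if isinstance(node, str):
        return node.lower()
    return node


config = {
    "Backend": "HDF5",
    "ADIOS2": {"Engine": {"Parameters": {"NumThreads": "4"}}},
}
protected = {("adios2", "engine", "parameters")}
result = lowercase_config(config, protected)
print(result)
```

With `convertLowercase = false`, the analogous step would simply be skipped, which matters when a parsed document must be passed through unchanged.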

15 changes: 15 additions & 0 deletions share/openPMD/json_schema/Makefile
@@ -0,0 +1,15 @@
convert := openpmd-convert-json-toml
Member

Naming it openpmd-convert-toml-json seems more intuitive for how we use it here, but you mentioned that it works bidirectionally anyway?

Contributor Author

It is bi-directional, but we can flip the name, since that's the intended use here.


json_files = attribute_defs.json attributes.json dataset_defs.json iteration.json mesh.json mesh_record_component.json particle_patches.json particle_species.json patch_record.json record.json record_component.json series.json

.PHONY: all
all: $(json_files)

# The target file should only be created if the conversion succeeded
$(json_files): %.json: %.toml
	$(convert) @$^ > $@.tmp
	mv $@.tmp $@

.PHONY: clean
clean:
	for file in $(json_files); do rm -f "$$file" "$$file.tmp"; done
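
The pattern used in the conversion rule (write to a `.tmp` file, then rename it) ensures that a target only appears once the conversion has succeeded. A minimal Python sketch of the same idea follows. The function name `convert_atomically` and the `convert` callable are hypothetical stand-ins for the `openpmd-convert-json-toml` call.

```python
import os


def convert_atomically(source, target, convert):
    """Write convert(<text of source>) to `target` via a temporary file.

    `target` is only created or replaced if `convert` did not raise. As in
    the Makefile, a leftover `<target>.tmp` may remain after a failure.
    """
    tmp = target + ".tmp"
    with open(source, encoding="utf-8") as infile:
        text = infile.read()
    with open(tmp, "w", encoding="utf-8") as outfile:
        outfile.write(convert(text))  # may raise; `target` stays untouched
    os.replace(tmp, target)  # counterpart of `mv $@.tmp $@`
```

Without the intermediate file, a failed conversion could leave an empty or truncated `.json` file behind that `make` would then consider up to date.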
47 changes: 47 additions & 0 deletions share/openPMD/json_schema/README.md
@@ -0,0 +1,47 @@
# JSON Validation

This folder contains a JSON schema for validation of openPMD files written as `.json` files.

## Usage

### Generating the JSON schema

For improved readability, maintainability and documentation purposes, the JSON schema is written in `.toml` format and needs to be "compiled" to `.json` files first before usage.
To do this, the openPMD-api installs a tool named `openpmd-convert-json-toml` which can be used to convert between JSON and TOML files in both directions, e.g.:

```bash
openpmd-convert-json-toml @series.toml > series.json
```

A `Makefile` is provided in this folder to simplify the application of this conversion tool.
Member

Makefile? Isn't this just built with CMake as one of the CLI tools?

Member

Oh, I see. The makefile does the conversion of the schema toml files

Member
@ax3l, Jul 14, 2025

Suggested change
- A `Makefile` is provided in this folder to simplify the application of this conversion tool.
+ A `Makefile` is provided in this folder to simplify the application of this conversion tool to the `.toml` files in this folder.

Contributor Author

Maybe a bit clearer: A Makefile is provided in this folder to automate generating the needed JSON files from the TOML files.
?


### Verifying a file against the JSON schema

In theory, the JSON schema should be applicable by any JSON validator. However, this JSON schema is split across multiple files, and most validators require special care to properly set up the links between the single files. A Python script `check.py` is provided in this folder which sets up the [Python jsonschema](https://python-jsonschema.readthedocs.io) library and verifies a file against it, e.g.:

```bash
./check.py path/to/my/dataset.json
```

For further usage notes, check the documentation of the script itself: `./check.py --help`.

## Caveats

The openPMD standard is not entirely expressible in terms of a JSON schema:

* Many semantic dependencies, e.g. that the `position/x` and `position/y` vector of a particle species be of the same size, or that the `axisLabels` have the same dimensionality as the dataset itself, will go unchecked.
* The `meshesPath` is assumed to be `meshes/` and the `particlesPath` is assumed to be `particles/`. This dependency cannot be expressed.
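
A semantic dependency of the first kind has to be checked in code rather than in the schema. The sketch below shows such a check for the sizes of `position/x` and `position/y`. It assumes that each record component stores its values as a flat list under a `data` key next to a `datatype` key. That layout, the datatype name, and the helper names are assumptions for illustration and are not taken from this PR.

```python
def component_lengths(record):
    """Map component name -> number of entries in its flat `data` list."""
    return {
        name: len(component["data"])
        for name, component in record.items()
        if isinstance(component, dict) and "data" in component
    }


def same_size(record):
    """True if all components of a vector record hold equally many values."""
    return len(set(component_lengths(record).values())) <= 1


# Made-up record with mismatching component sizes.
position = {
    "x": {"datatype": "DOUBLE", "data": [0.0, 1.0, 2.0]},
    "y": {"datatype": "DOUBLE", "data": [0.0, 1.0]},
}
print(same_size(position))  # False
```

A JSON schema can constrain each component on its own, but it has no general way to compare the length of one array with that of a sibling.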

While a large part of the openPMD standard can indeed be verified by checking against a JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.
Member

Might clarify a bit:

Suggested change
- While a large part of the openPMD standard can indeed be verified by checking against a JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.
+ While a large part of the openPMD standard can indeed be verified by checking against a static JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.

Disjunctive statements such as "A Record is either a scalar Record Component or a vector of non-scalar Record Components" are a challenge for the JSON validator. If there is even a tiny mistake somewhere down in the hierarchy, the entire disjunctive branch will fail to evaluate.

The layout of attributes is assumed to be that which is created by the JSON backend of the openPMD-api, e.g.:

```json
"meshesPath": {
"datatype": "STRING",
"value": "meshes/"
}
```

Support for an abbreviated notation such as `"meshesPath": "meshes/"` is currently not (yet) available.
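
Code that reads such files may want to accept both notations. The helper below is a hypothetical sketch and not part of `check.py` or of the schema. It returns the payload of an attribute regardless of whether it is given in the long form shown above or as a bare value.

```python
def attribute_value(attribute):
    """Return the payload of an attribute in long or abbreviated notation."""
    if isinstance(attribute, dict) and attribute.keys() >= {"datatype", "value"}:
        return attribute["value"]  # long form: {"datatype": ..., "value": ...}
    return attribute  # abbreviated form: the value itself


long_form = {"datatype": "STRING", "value": "meshes/"}
print(attribute_value(long_form))   # meshes/
print(attribute_value("meshes/"))   # meshes/
```

The reverse direction, producing the long form from a bare value, would additionally require choosing a `datatype`, which this sketch does not attempt.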
Member

In this section, you might want to advertise & link openPMD-validator again.

Contributor Author

Will do. We might additionally advertise the JSON schema in the validator. Adding the JSON validation to our CI brought up a number of bugs after all.
