Commit d4f3e2a

franzpoeschel and ax3l authored
Add JSON schema (#1426)
* Add JSON schema for openPMD
  Written as .toml files for ease of documentation, maintainability and readability.
* Add convert-json-toml tool
  Needed for "compiling" the schema to JSON
  Also add a Makefile to further simplify this
* Add script for checking openPMD file against the schema
  Workflow documented in README.md
* Don't use spaces in SerialIOTest attribute names
  The JSON schema verification package does not like that
* Fix bugs detected by this verifier
  Both of the form "data not found in places where data was expected"
* Add GitHub workflow
  Verify all JSON-openPMD files written by testing against the schema
* Shorthand attributes
* Add dataset template mode
* Fix path
* Fix reading from stdin
* toml11 4.0 compatibility
* Only check for existing Iterations in writeOnly mode
* Some additions to schema
  1. Support UNDEFINED datasets in template mode
  2. gridUnitSI may now be a vector
* Remove deprecated jsonschema.validators.RefResolver
  Apparently it's better to make everything 100 times more complicated
* Use most recent version of jsonschema
* Allow empty variable-based series
* Use if-then-else for better-steered parsing
  anyOf and oneOf now only used for trivial distinctions, this makes schemas much more robust since errors can be caught early and error messages become actually useful.
* hmm
* Remove json cfg after test
  Otherwise CI thinks this is an openPMD file
* Update documentation, rename convert-toml-json tool
* Apply suggestions from code review
  Co-authored-by: Axel Huebl <[email protected]>
* Add reference to openPMD-validator
* Update README.md

---------

Co-authored-by: Axel Huebl <[email protected]>
1 parent 9f42a5f commit d4f3e2a

24 files changed: +1420 -34 lines changed

.github/workflows/linux.yml

Lines changed: 17 additions & 1 deletion
@@ -260,7 +260,7 @@ jobs:
     - name: Install
       run: |
         sudo apt-get update
-        sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-h5py-mpi
+        sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-h5py-mpi python3-pip
         # TODO ADIOS2
     - name: Build
       env: {CXXFLAGS: -Werror, PKG_CONFIG_PATH: /usr/lib/x86_64-linux-gnu/pkgconfig}
@@ -275,6 +275,22 @@ jobs:
         cmake --build build --parallel 4
         ctest --test-dir build --output-on-failure
 
+        python3 -m pip install jsonschema==4.* referencing
+        cd share/openPMD/json_schema
+        PATH="../../../build/bin:$PATH" make -j 2
+        # We need to exclude the thetaMode example since that has a different
+        # meshesPath and the JSON schema needs to hardcode that.
+        find ../../../build/samples/ \
+          ! -path '*thetaMode*' \
+          ! -path '/*many_iterations/*' \
+          ! -name 'profiling.json' \
+          ! -name '*config.json' \
+          -iname '*.json' \
+          | while read i; do
+          echo "Checking $i"
+          ./check.py "$i"
+        done
+
   musllinux_py10:
     runs-on: ubuntu-22.04
     if: github.event.pull_request.draft == false
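
The file selection in the new workflow step can be hard to read in `find` syntax. The following is a rough Python rendering of the same tests, offered only as a sketch: `should_check` is a hypothetical helper that is not part of the commit, POSIX-style paths are assumed, and the patterns are copied verbatim from the step above.

```python
import fnmatch
import os

# Patterns copied verbatim from the workflow step.
# Note: the second pattern starts with "/", so it can only match paths
# that themselves begin with "/".
EXCLUDED_PATH_PATTERNS = ("*thetaMode*", "/*many_iterations/*")
EXCLUDED_NAME_PATTERNS = ("profiling.json", "*config.json")


def should_check(path):
    """Return True if `path` passes the same tests as the `find` call."""
    name = os.path.basename(path)
    # `! -path PATTERN`: the whole path must not match.
    if any(fnmatch.fnmatchcase(path, p) for p in EXCLUDED_PATH_PATTERNS):
        return False
    # `! -name PATTERN`: the file name alone must not match.
    if any(fnmatch.fnmatchcase(name, p) for p in EXCLUDED_NAME_PATTERNS):
        return False
    # `-iname '*.json'`: case-insensitive match on the file name.
    return fnmatch.fnmatchcase(name.lower(), "*.json")


if __name__ == "__main__":
    for candidate in ("samples/data.json", "samples/thetaMode/data.json"):
        print(candidate, should_check(candidate))
```

Each path that passes would then be handed to `./check.py`, as in the shell loop.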

CMakeLists.txt

Lines changed: 5 additions & 1 deletion
@@ -691,11 +691,12 @@ set(openPMD_TEST_NAMES
 # command line tools
 set(openPMD_CLI_TOOL_NAMES
     ls
+    convert-toml-json
 )
 set(openPMD_PYTHON_CLI_TOOL_NAMES
     pipe
 )
-set(openPMD_PYTHON_CLI_MODULE_NAMES ${openPMD_CLI_TOOL_NAMES})
+set(openPMD_PYTHON_CLI_MODULE_NAMES ls)
 # examples
 set(openPMD_EXAMPLE_NAMES
     1_structure
@@ -900,6 +901,9 @@ if(openPMD_BUILD_CLI_TOOLS)
         endif()
 
         target_link_libraries(openpmd-${toolname} PRIVATE openPMD)
+        target_include_directories(openpmd-${toolname} SYSTEM PRIVATE
+            $<TARGET_PROPERTY:openPMD::thirdparty::nlohmann_json,INTERFACE_INCLUDE_DIRECTORIES>
+            $<TARGET_PROPERTY:openPMD::thirdparty::toml11,INTERFACE_INCLUDE_DIRECTORIES>)
     endforeach()
 endif()

include/openPMD/auxiliary/JSON_internal.hpp

Lines changed: 12 additions & 3 deletions
@@ -219,16 +219,25 @@ namespace json
      * @param options as a parsed JSON object.
      * @param considerFiles If yes, check if `options` refers to a file and read
      * from there.
+     * @param convertLowercase If yes, lowercase conversion is applied
+     * recursively to keys and values, except for some hardcoded places
+     * that should be left untouched.
      */
-    ParsedConfig parseOptions(std::string const &options, bool considerFiles);
+    ParsedConfig parseOptions(
+        std::string const &options,
+        bool considerFiles,
+        bool convertLowercase = true);
 
 #if openPMD_HAVE_MPI
 
     /**
      * Parallel version of parseOptions(). MPI-collective.
      */
-    ParsedConfig
-    parseOptions(std::string const &options, MPI_Comm comm, bool considerFiles);
+    ParsedConfig parseOptions(
+        std::string const &options,
+        MPI_Comm comm,
+        bool considerFiles,
+        bool convertLowercase = true);
 
 #endif
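
The doc comment for `convertLowercase` describes a recursive lowercasing of keys and values with a few exempt locations. A minimal Python sketch of that idea follows. It is not the openPMD-api implementation: the diff does not list the hardcoded places, so the exempt key `path` is a hypothetical placeholder, and the choice to lowercase the exempt key itself while keeping its value is likewise an assumption.

```python
import json

# Hypothetical stand-in for the "hardcoded places that should be left
# untouched"; the real list is not shown in this diff.
PROTECTED_KEYS = frozenset({"path"})


def lowercase_recursive(node, protected=PROTECTED_KEYS):
    """Return a copy of `node` with dict keys and string values lowercased.

    Values stored under a protected key (compared case-insensitively) are
    copied verbatim, including everything nested below them.
    """
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            new_key = key.lower()
            if new_key in protected:
                result[new_key] = value  # leave this subtree untouched
            else:
                result[new_key] = lowercase_recursive(value, protected)
        return result
    if isinstance(node, list):
        return [lowercase_recursive(item, protected) for item in node]
    if isinstance(node, str):
        return node.lower()
    return node  # numbers, booleans and null pass through unchanged


if __name__ == "__main__":
    config = json.loads('{"Backend": "JSON", "Path": "Data/Run_1"}')
    print(json.dumps(lowercase_recursive(config)))
```

Passing `convertLowercase = false` would correspond to skipping such a pass entirely, which matters for a tool like the TOML/JSON converter where the content has to survive unchanged.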

share/openPMD/json_schema/Makefile

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+convert := openpmd-convert-toml-json
+
+json_files = attribute_defs.json attributes.json dataset_defs.json iteration.json mesh.json mesh_record_component.json particle_patches.json particle_species.json patch_record.json record.json record_component.json series.json
+
+.PHONY: all
+all: $(json_files)
+
+# The target file should only be created if the conversion succeeded
+$(json_files): %.json: %.toml
+	$(convert) @$^ > $@.tmp
+	mv $@.tmp $@
+
+.PHONY: clean
+clean:
+	for file in $(json_files); do rm -f "$$file" "$$file.tmp"; done
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# JSON Validation
+
+This folder contains a JSON schema for validation of openPMD files written as `.json` files.
+
+## Usage
+
+### Generating the JSON schema
+
+For improved readability, maintainability and documentation purposes, the JSON schema is written in `.toml` format and needs to be "compiled" to `.json` files first before usage.
+To do this, the openPMD-api installs a tool named `openpmd-convert-toml-json` which can be used to convert between JSON and TOML files in both directions, e.g.:
+
+```bash
+openpmd-convert-toml-json @series.toml > series.json
+```
+
+A `Makefile` is provided in this folder to automate generating the needed JSON files from the TOML files.
+
+### Verifying a file against the JSON schema
+
+In theory, the JSON schema should be applicable by any JSON validator. This JSON schema is written in terms of multiple files however, and most validators require special care to properly set up the links between the single files. A Python script `check.py` is provided in this folder which sets up the [Python jsonschema](https://python-jsonschema.readthedocs.io) library and verifies a file against it, e.g.:
+
+```bash
+./check.py path/to/my/dataset.json
+```
+
+For further usage notes check the documentation of the script itself `./check.py --help`.
+
+## Caveats
+
+The openPMD standard is not entirely expressible in terms of a JSON schema:
+
+* Many semantic dependencies, e.g., that the `position/x` and `position/y` vectors of a particle species need to be of the same size, or that the `axisLabels` have the same dimensionality as the dataset itself, will go unchecked.
+* The `meshesPath` is assumed to be `meshes/` and the `particlesPath` is assumed to be `particles/`. This dependency cannot be expressed.
+
+While a large part of the openPMD standard can indeed be verified by checking against a static JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages may become unexpectedly verbose and not very informative, especially when parsing disjunctive statements such as "A Record is either a scalar Record Component or a vector of non-scalar Record Components". We have taken care to decide disjunctive statements early on, e.g. with json-schema's support for `if` statements, but error messages may in general become unwieldy even due to tiny mistakes far down in the parse tree.
+
+The layout of attributes is assumed to be that which is created by the JSON backend of the openPMD-api. Both the longhand and shorthand forms are recognized:
+
+```json
+"meshesPath": {
+  "datatype": "STRING",
+  "value": "meshes/"
+},
+"particlesPath": "particles/"
+```
+
+For a custom-written verification of openPMD datasets, also consider using the [openPMD-validator](https://github.com/openPMD/openPMD-validator).
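
The README's note on longhand and shorthand attributes implies that a reader of such files has to accept both layouts. A minimal sketch of one way to normalize them in Python follows. `attribute_value` is a hypothetical helper, not part of `check.py` or the openPMD-api, and it assumes that any JSON object carrying both a `datatype` and a `value` key is a longhand attribute.

```python
import json


def attribute_value(attribute):
    """Return the payload of an attribute in longhand or shorthand form."""
    is_longhand = (
        isinstance(attribute, dict)
        and "datatype" in attribute
        and "value" in attribute
    )
    # Longhand: {"datatype": ..., "value": ...}; shorthand: the bare value.
    return attribute["value"] if is_longhand else attribute


if __name__ == "__main__":
    # The two attributes from the README example, wrapped in an object.
    attributes = json.loads(
        '{"meshesPath": {"datatype": "STRING", "value": "meshes/"},'
        ' "particlesPath": "particles/"}'
    )
    normalized = {name: attribute_value(a) for name, a in attributes.items()}
    print(normalized)
```

The shorthand form drops the explicit `datatype`, so a consumer that needs the exact type has to infer it from the JSON value.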
