Skip to content

Conversation

@franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Apr 18, 2023

TODO:

  • Define particlePatches
  • Fill in things like "description", "type", "title", ...
  • Maybe add support for abbreviated JSON representations Not now
  • Dataset layout

I have added a CLI tool openpmd-convert_json_toml, exposing internally existing functionality to convert between TOML and JSON. I think it's better to define and maintain the schema in terms of TOML (it can be commented and is more legible), and just compile it to JSON for verification.

x-ref #1436

@ax3l ax3l self-requested a review April 24, 2023 20:36
@ax3l ax3l self-assigned this Apr 24, 2023
@franzpoeschel franzpoeschel force-pushed the json-schema branch 3 times, most recently from 199a223 to 5b09f81 Compare May 31, 2023 09:53
@franzpoeschel franzpoeschel force-pushed the json-schema branch 3 times, most recently from b6b057b to e99b0fb Compare June 26, 2023 14:35
@franzpoeschel franzpoeschel force-pushed the json-schema branch 3 times, most recently from 3fb1ea9 to e866604 Compare August 10, 2023 09:10
Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thank you!

Wondering a bit if validation of an XML backend would be easier.... not sure.

Comment on lines +259 to +282
# We need to exclude the thetaMode example since that has a different
# meshesPath and the JSON schema needs to hardcode that.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to patch this in check.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very easily. The JSON schema is on the file system and the single .json files refer to each other by their file names. Changing this would require (1) traversing the entire JSON schema and overriding the meshes path, the particles path and the references and (2) somehow setting up python-jsonschema to cross-reference in-memory schemas which I don't even know if it supports that, both at runtime of check.py.

@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Aug 17, 2023

Wondering a bit if validation of an XML backend would be easier.... not sure.

The major issue of the JSON schema approach is that it is not a predictive parser (https://en.wikipedia.org/wiki/Recursive_descent_parser), meaning that it deals with {"anyOf": [<option1>, <option2>, ...]}-like statements very naively. If there is an error somewhere deep down the hierarchy, the JSON schema just sees that all options have failed and cannot give any more details where the failure was.

If we want to seriously consider XML as an alternative, my opinion is that it must at least solve this particular issue .. which I doubt it can properly, since this involves a bit of parser theory and grammar transformation.

This job is better done by handwritten parsers, which the openPMD-api and the openPMD-validator are.

However, if we add an XML backend at some point, an XML schema is definitely at least worth a consideration.

Written as .toml files for ease of documentation, maintailability and
readability.
Needed for "compiling" the schema to JSON
Also add a Makefile to further simplify this
The JSON schema verification package does not like that
Both of the form "data not found in places where data was expected"
Verify all JSON-openPMD files written by testing against the schema
1. Support UNDEFINED datasets in template mode
2. gridUnitSI may now be a vector
Apparently it's better to make everything 100 times more complicated
anyOf and oneOf now only used for trivial distinctions, this makes
schemas much more robust since errors can be caught early and error
messages become actually useful.
Otherwise CI thinks this is an openPMD file
openpmd_convert-json-toml @series.toml > series.json
```

A `Makefile` is provided in this folder to simplify the application of this conversion tool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makefile? Isn't this just built with CMake as one of the CLI tools?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. The makefile does the conversion of the schema toml files

Copy link
Member

@ax3l ax3l Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A `Makefile` is provided in this folder to simplify the application of this conversion tool.
A `Makefile` is provided in this folder to simplify the application of this conversion tool to the `.toml` files in this folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a bit clearer: A Makefile is provided in this folder to automate generating the needed JSON files from the TOML files.
?

pipe
)
set(openPMD_PYTHON_CLI_MODULE_NAMES ${openPMD_CLI_TOOL_NAMES})
set(openPMD_PYTHON_CLI_MODULE_NAMES ls)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line change looks like a hack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Until now, both openPMD_CLI_TOOL_NAMES and openPMD_PYTHON_CLI_MODULE_NAMES were identical since both contained only openpmd-ls.
But they're not identical in general; and now, we additionally have openpmd-convert-json-toml as a CLI tool, but it is not written in Python.

CMakeLists.txt Outdated
# command line tools
set(openPMD_CLI_TOOL_NAMES
ls
convert-json-toml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add this to setup.py -> entry_points -> console_scripts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a Python module, but written in C++, so I doubt that will work?

@@ -0,0 +1,15 @@
convert := openpmd-convert-json-toml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming it openpmd-convert-toml-json seems to be more intuitive for what we use it here, but you mentioned anyway that it works bi-directional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bi-directional, but we can flip the name, since that's the intended use here.

* Many semantic dependencies, e.g. that the `position/x` and `position/y` vector of a particle species be of the same size, or that the `axisLabels` have the same dimensionality as the dataset itself, will go unchecked.
* The `meshesPath` is assumed to be `meshes/` and the `particlesPath` is assumed to be `particles/`. This dependency cannot be expressed.

While a large part of the openPMD standard can indeed be verified by checking against a JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might clarify a bit:

Suggested change
While a large part of the openPMD standard can indeed be verified by checking against a JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.
While a large part of the openPMD standard can indeed be verified by checking against a static JSON schema, the standard is generally large enough to make this approach come to its limits. Verification of a JSON schema is similar to the use of a naive recursive-descent parser. Error messages will often be unexpectedly verbose and not very informative.

}
```

Support for an abbreviated notation such as `"meshesPath": "meshes/"` is currently not (yet) available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this section, you might want to advertise & link openPMD-validator again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. We might additionally advertise the JSON schema in the validator. Adding the JSON validation to our CI brought up a number of bugs after all.

Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@ax3l ax3l merged commit d4f3e2a into openPMD:dev Aug 6, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants