Added json schema #347
@@ -0,0 +1,639 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://zarr-specs.readthedocs.io/v3/json-schema/array.json",
We'll need to find a suitable URI for the schema itself. Ideally we can publish the schema on tags to this repository through CI/CD.
STAC uses https://schemas.stacspec.org//item-spec/json-schema/.json, for example https://schemas.stacspec.org/v1.1.0/item-spec/json-schema/item.json.
It is good if the URI is resolvable. If you do not plan to keep your own domain 'forever', you can also use the prefix https://schemas.opengis.net/. As you'll see there, not all JSON schemas have an $id (XML namespaces were more disciplined here), and those that do don't always use the host, e.g. https://schemas.opengis.net/os-geojson/1.0/example-1-eo-collections.json
"name": {
  "type": "string",
  "not": {
    "anyOf": [
This is a common pattern for all the extension objects (data_type, codec, etc.). We're using the oneOf keyword to ensure that the data type, say, matches exactly one data type definition.
To ensure that a data type like "bool" doesn't match against both the core bool data type and an extension data type, we need to prohibit extension types from shadowing a core data type.
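The shadowing problem described above can be sketched in plain Python, without a validator library. This is only an illustration of the oneOf semantics (exactly one branch must match); the set `CORE_DATA_TYPES` and all function names are hypothetical and are not taken from the schema in this PR.

```python
# Hypothetical subset of core data type names; the real list lives in the spec.
CORE_DATA_TYPES = {"bool", "int8", "float32"}


def is_core(name: str) -> bool:
    """Branch 1: the name is one of the core data types."""
    return name in CORE_DATA_TYPES


def is_extension_unguarded(name: str) -> bool:
    """Branch 2 without a guard: any string counts as an extension name."""
    return isinstance(name, str)


def is_extension_guarded(name: str) -> bool:
    """Branch 2 with the 'not' guard: any string except a core name."""
    return isinstance(name, str) and name not in CORE_DATA_TYPES


def one_of(name, branches) -> bool:
    """Mimic oneOf: valid only when exactly one branch matches."""
    return sum(1 for branch in branches if branch(name)) == 1


unguarded = [is_core, is_extension_unguarded]
guarded = [is_core, is_extension_guarded]

# "bool" matches both unguarded branches, so oneOf rejects a valid core type.
bool_unguarded = one_of("bool", unguarded)
# With the guard in place, "bool" matches only the core branch.
bool_guarded = one_of("bool", guarded)
# An unknown name still validates, but only as an extension.
custom_guarded = one_of("my_custom_type", guarded)
```

The last case also shows the trade-off raised elsewhere in this thread: any name that is not in the core list, including a typo, falls through to the extension branch and validates.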
@@ -0,0 +1,58 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://zarr-specs.readthedocs.io/v3/json-schema/group.json",
Also needs a permanent URI.
this is awesome work tom!
This is great work. It does seem rather unfortunate to have to list all of the ids defined in the core spec redundantly in order to exclude them as valid extension names. One idea would be to just pull in all of the schemas from zarr-extensions automatically (e.g. via a program that generates the schema), and disallow in the schema unknown IDs. We could update zarr-extensions to include separate schemas for the core ids also. That way almost everything could be pulled in just from zarr-extensions.
My natural preference is for simple / dumb solutions. In this case I'm probably fine with repeating the names since CI should immediately fail if we add some new core object but forget to update the list of fields. I think it's impossible for these to get out of sync.
I wasn't aware of the But it'd be good to figure out some way to share what's already been done there (CI / tooling maybe?) with what's proposed here. I'll take a closer look when I get a chance.
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://zarr-specs.readthedocs.io/v3/json-schema/group.json",
"title": "Zarr v3 Group Metadata Schema",
"description": "JSON Schema for Zarr v3 Group metadata documents.",
Have you considered factoring out the common part? https://json-schema.org/blog/posts/modelling-inheritance
It will be useful for the further schemas like CF profile
Looks like the array.json schema could use a top-level $ref to this file to avoid replicating definitions. A resolvable URI that can be changed later would indeed help here.
If something is missing from the list of exclusions then it will just also validate as an extension, meaning the configuration doesn't get checked. Additionally, if you make a typo in an identifier it will also just be considered an extension and validate successfully.
Putting the separate schemas in this repo instead would also be fine, or zarr-extensions could even be merged into this repo.
That repo basically has the complement of what you have here represented as a schema.
Mmm here's what I had in mind: With a diff like this that "forgets" to add
❯ git diff
diff --git a/json-schema/array.json b/json-schema/array.json
index c9d2085..f69dd1c 100644
--- a/json-schema/array.json
+++ b/json-schema/array.json
@@ -559,7 +559,6 @@
"type": "string",
"not": {
"enum": [
- "default",
"v2"
]
}
We get an error, thanks to that that matching both the But if I forget to add
What do you think about a tool that checks that we didn't forget any keys, rather than generating the json-schema files? That sounds pretty straightforward to write and run in CI.
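A checker along those lines might look like the sketch below. It assumes a simplified, hypothetical schema layout in which each core object is a oneOf branch with a `const` name and the extension branch carries a `not`/`enum` exclusion list; the real array.json is structured differently, and the key `chunk_key_encoding` and the function names are illustrative only.

```python
import json

# Hypothetical, simplified layout; NOT the actual structure of array.json.
# The exclusion list deliberately "forgets" the core name "default".
SCHEMA_TEXT = """
{
  "$defs": {
    "chunk_key_encoding": {
      "oneOf": [
        {"properties": {"name": {"const": "default"}}},
        {"properties": {"name": {"const": "v2"}}},
        {"properties": {"name": {"type": "string", "not": {"enum": ["v2"]}}}}
      ]
    }
  }
}
"""


def missing_exclusions(definition: dict) -> set[str]:
    """Return core names that the extension branch fails to exclude."""
    core_names = set()
    excluded = set()
    for branch in definition.get("oneOf", []):
        name_schema = branch.get("properties", {}).get("name", {})
        if "const" in name_schema:
            core_names.add(name_schema["const"])
        excluded.update(name_schema.get("not", {}).get("enum", []))
    return core_names - excluded


def check_schema(schema: dict) -> dict[str, set[str]]:
    """Map each definition to the core names missing from its exclusions."""
    problems = {}
    for key, definition in schema.get("$defs", {}).items():
        missing = missing_exclusions(definition)
        if missing:
            problems[key] = missing
    return problems


problems = check_schema(json.loads(SCHEMA_TEXT))
```

In CI, a non-empty `problems` mapping would be turned into a failing exit code. Note that this only catches a core name missing from the exclusion list; it does not catch a core object that was never added to the schema at all.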
Yeah, that seems like a problem... But the
Thanks, @TomAugspurger! Happy to help get the permanent, resolvable URI. (My instinct is to put all of this under a v3/ directory.)
This adds a pair of JSON Schema documents to the repository: one for Array metadata and one for Group metadata.
For those unfamiliar with json-schema, it's a language for validating JSON documents. You write schemas (in JSON) and tools can validate JSON objects ("instances") against that schema. For example, the following Group would be flagged as invalid, because it lacks a zarr_format field:
{ "node_type": "group", "attributes": { "spam": "ham", "eggs": 42 } }
The check-jsonschema tool can be used to validate this, but there are many alternative tools that could be used:
Note that this only validates metadata stored within the zarr.json objects. It has no bearing on the actual data in the chunk files.
In addition to the schemas, I've included the metadata for a few examples, and have validated them against the json schema.
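To make the missing zarr_format failure concrete, here is a minimal stdlib-only sketch of what a validator does for the `required` keyword. `GROUP_SCHEMA` is a hypothetical, heavily reduced stand-in for the group schema (it assumes zarr_format and node_type are the required keys), and `missing_required` is an illustrative helper, not part of any tool mentioned here.

```python
import json

# Hypothetical, heavily reduced stand-in for the group schema; only the
# "required" keyword is modelled, and the key list is an assumption.
GROUP_SCHEMA = {"type": "object", "required": ["zarr_format", "node_type"]}


def missing_required(instance: dict, schema: dict) -> list[str]:
    """List the required keys that are absent from the instance."""
    return [key for key in schema.get("required", []) if key not in instance]


# The Group example from the description, which lacks zarr_format.
group = json.loads(
    '{"node_type": "group", "attributes": {"spam": "ham", "eggs": 42}}'
)
missing = missing_required(group, GROUP_SCHEMA)
```

A real validator checks far more than this (types, nested objects, oneOf branches), so this is only meant to show why the example instance is rejected.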
This is motivated by zarr-developers/geozarr-spec#72. geozarr can define its own json schema for the additional properties it adds.