Description
This originates downstream in python-jsonschema/check-jsonschema#376 . I was just able to look at the offending schema and trace things out enough to find the true cause. I've stuck with using the CLI for my reproducers, but I could work up some python for this if it would be useful.
check-jsonschema
is using jsonschema
+referencing
. It keeps the jsonschema
behavior of checking a schema against its metaschema. Therefore, the following schema is caught as invalid:
invalid-caught-by-metaschema
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"definitions": {
"foo": "invalid"
},
"properties": {
"foo": {
"$ref": "#/definitions/foo"
}
}
}
(Note how the definition for "foo"
is a string, rather than a schema.)
However, if the invalid definition is nested, as follows, the metaschema check is not sufficient:
invalid-not-caught-by-metaschema
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"definitions": {
"sub": {
"foo": "invalid"
}
},
"properties": {
"foo": {
"$ref": "#/definitions/sub/foo"
}
}
}
As a result, it is possible, with jsonschema
+referencing
, to be handling this in a validator. The next step is to try to use it.
With the check-jsonschema
CLI, we get the following trace (trimmed to relevant parts):
full-cli-traceback
$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": "bar"}')
Traceback (most recent call last):
<<<<<check-jsonschema invocation path shows here>>>>>
File "/home/sirosen/projects/jsonschema/check-jsonschema/src/check_jsonschema/checker.py", line 73, in _build_result
for err in validator.iter_errors(data):
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 368, in iter_errors
for error in errors:
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 295, in properties
yield from validator.descend(
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 416, in descend
for error in errors:
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 274, in ref
yield from validator._validate_reference(ref=ref, instance=instance)
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 410, in descend
for k, v in applicable_validators(schema):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_legacy_keywords.py", line 17, in ignore_ref_siblings
ref = schema.get("$ref")
^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
This shows an error in jsonschema
(not quite at referencing
yet!). I had to tinker around with things to find the right trigger conditions, but this seems to do the trick, as a nasty schema:
{
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"definitions": {
"sub": {
"foo": {
"type": "object",
"properties": {
"bar": "invalid"
}
}
}
},
"properties": {
"foo": {
"$ref": "#/definitions/sub/foo"
}
}
}
Now, trying to use it triggers an attempt to call .get()
on a string:
$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": {"bar": "baz"}}')
(trimmed trace)
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 408, in descend
resolver = self._resolver.in_subresource(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 689, in in_subresource
id = subresource.id()
^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 225, in id
id = self._specification.id_of(self.contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/jsonschema.py", line 50, in _legacy_dollar_id
id = contents.get("$id")
^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
Proposed Resolution
Obviously, the best resolution is for such a schema to never get written, since it's invalid. 😁
And I'll be trying to advise the upstream provider of the schema against nesting things under definitions
, since it weakens the kind of checking which is possible.
But should various referencing
internals be more wary that an input may be a dict, a bool, or something unwanted/unexpected? Or should referencing
check for bad inputs more aggressively when loading data?
The goal here is to have a better and clearer error in this case, not to ignore the malformed schema.
I think the best thing here is that one of the levels of document loading in Referencing
checks if the value is dict | boolean
. If we only consider JSON Schema, it could be a validator on Resource.contents
. That makes the most sense to me, conceptually, for JSON Schema, but it ties in non-generic details of that spec (vs more general $ref
resolution). Perhaps JSONSchemaResource(Resource)
is the right thing, which adds said validation?
(NB: I'm reading a lot of the internals pretty quickly and for the first time, so my ideas here may not hold up.)