After the changes introduced in #932, creating a Dataset object fails with a KeyError if the provided @context doesn't define @language:
Minimal example:
{
  "@context": {
    "@vocab": "https://schema.org/",
    "citeAs": "cr:citeAs",
    "column": "cr:column",
    "conformsTo": "dct:conformsTo",
    "cr": "http://mlcommons.org/croissant/",
    "data": {
      "@id": "cr:data",
      "@type": "@json"
    },
    "dataType": {
      "@id": "cr:dataType",
      "@type": "@vocab"
    },
    "dct": "http://purl.org/dc/terms/",
    "examples": {
      "@id": "cr:examples",
      "@type": "@json"
    },
    "excludes": "cr:excludes",
    "extract": "cr:extract",
    "field": "cr:field",
    "fileObject": "cr:fileObject",
    "fileProperty": "cr:fileProperty",
    "fileSet": "cr:fileSet",
    "format": "cr:format",
    "includes": "cr:includes",
    "isLiveDataset": "cr:isLiveDataset",
    "jsonPath": "cr:jsonPath",
    "key": "cr:key",
    "md5": "cr:md5",
    "parentField": "cr:parentField",
    "path": "cr:path",
    "rai": "http://mlcommons.org/croissant/RAI/",
    "recordSet": "cr:recordSet",
    "references": "cr:references",
    "regex": "cr:regex",
    "repeated": "cr:repeated",
    "replace": "cr:replace",
    "sc": "https://schema.org/",
    "separator": "cr:separator",
    "source": "cr:source",
    "subField": "cr:subField",
    "transform": "cr:transform"
  },
  "@type": "sc:Dataset",
  "name": "minimal_example",
  "description": "This is a minimal example."
}
mlcroissant validate --jsonld minimal.jsonld
W0924 14:42:44.972785 131393281629056 rdf.py:87] WARNING: The JSON-LD `@context` is not standard. Refer to the official @context (e.g., from the example datasets in https://github.com/mlcommons/croissant/tree/main/datasets/1.0). The different keys are: {'@language', 'samplingRate'}
Traceback (most recent call last):
  File "/home/adria/.pyenv/versions/ckan-dcat-310/bin/mlcroissant", line 7, in <module>
    sys.exit(main())
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/scripts/cli.py", line 32, in main
    app.run(module.main)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/scripts/validate.py", line 52, in main
    mlc.Dataset(jsonld, debug=debug)
  File "<string>", line 6, in __init__
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/datasets.py", line 89, in __post_init__
    self.metadata = Metadata.from_file(ctx=ctx, file=self.jsonld)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/structure_graph/nodes/metadata.py", line 460, in from_file
    return cls.from_json(ctx=ctx, json_=json_)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/structure_graph/nodes/metadata.py", line 470, in from_json
    jsonld = expand_jsonld(json_, ctx=ctx)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/core/json_ld.py", line 258, in expand_jsonld
    recursively_populate_jsonld(entry_node, id_to_node, context)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/core/json_ld.py", line 187, in recursively_populate_jsonld
    and value[0].get("@language", context["@language"])
KeyError: '@language'
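Until the lookup is relaxed, one possible workaround is to add a default @language to the @context before handing the document to mlcroissant. The sketch below is only an illustration: `with_default_language` is a hypothetical helper (not part of mlcroissant), it assumes @context is a single JSON object as in the example above, and "en" is an assumed default that should be replaced with the dataset's actual language. It has not been checked against mlcroissant itself.

```python
import copy
from typing import Any


def with_default_language(doc: dict[str, Any], language: str = "en") -> dict[str, Any]:
    """Return a copy of a JSON-LD document whose @context declares @language.

    Hypothetical helper for illustration; not part of mlcroissant.
    """
    patched = copy.deepcopy(doc)  # leave the caller's document untouched
    context = patched.setdefault("@context", {})
    if not isinstance(context, dict):
        # Assumption: @context is a JSON object, as in the minimal example.
        raise TypeError("expected @context to be a JSON object")
    # Keep an existing @language; only fill it in when it is missing.
    context.setdefault("@language", language)
    return patched


# Trimmed-down version of the minimal example, for demonstration only.
minimal = {
    "@context": {"@vocab": "https://schema.org/", "sc": "https://schema.org/"},
    "@type": "sc:Dataset",
    "name": "minimal_example",
}
patched = with_default_language(minimal)
print(patched["@context"]["@language"])  # en
```

The patched dictionary can then be written back to a file with `json.dump` and validated as before.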
One could argue that defining a @language fallback in the @context is a best practice, but neither the JSON-LD spec nor the Croissant spec makes it mandatory, so perhaps this particular check could be relaxed.
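A minimal sketch of how the lookup might be relaxed, assuming a missing language can simply be treated as "no language". `resolve_language` is a hypothetical stand-in for the expression on line 187 of json_ld.py, not the actual mlcroissant code. Note that `dict.get` evaluates its default argument eagerly, so `context["@language"]` raises even for values that carry their own @language.

```python
from typing import Any, Optional


def resolve_language(value: dict[str, Any], context: dict[str, Any]) -> Optional[str]:
    """Return the value's own @language, else the context default, else None.

    Hypothetical helper for illustration; not part of mlcroissant.
    """
    # context.get(...) returns None when @language is absent,
    # whereas context["@language"] would raise KeyError.
    return value.get("@language", context.get("@language"))


context_without_language = {"@vocab": "https://schema.org/"}
context_with_language = {"@vocab": "https://schema.org/", "@language": "en"}

plain = {"@value": "This is a minimal example."}
tagged = {"@value": "Ceci est un exemple minimal.", "@language": "fr"}

print(resolve_language(plain, context_without_language))   # None
print(resolve_language(plain, context_with_language))      # en
print(resolve_language(tagged, context_without_language))  # fr
```

Whether `None` is an acceptable result depends on how the surrounding code uses the language, which is not shown in the traceback.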