Latest version of the package expects a @language tag to be in the @context #950

@amercader

After the changes introduced in #932, creating a Dataset object fails with a KeyError if the provided @context doesn't define @language:

Minimal example:

{
    "@context": {
        "@vocab": "https://schema.org/",
        "citeAs": "cr:citeAs",
        "column": "cr:column",
        "conformsTo": "dct:conformsTo",
        "cr": "http://mlcommons.org/croissant/",
        "data": {
            "@id": "cr:data",
            "@type": "@json"
        },
        "dataType": {
            "@id": "cr:dataType",
            "@type": "@vocab"
        },
        "dct": "http://purl.org/dc/terms/",
        "examples": {
            "@id": "cr:examples",
            "@type": "@json"
        },
        "excludes": "cr:excludes",
        "extract": "cr:extract",
        "field": "cr:field",
        "fileObject": "cr:fileObject",
        "fileProperty": "cr:fileProperty",
        "fileSet": "cr:fileSet",
        "format": "cr:format",
        "includes": "cr:includes",
        "isLiveDataset": "cr:isLiveDataset",
        "jsonPath": "cr:jsonPath",
        "key": "cr:key",
        "md5": "cr:md5",
        "parentField": "cr:parentField",
        "path": "cr:path",
        "rai": "http://mlcommons.org/croissant/RAI/",
        "recordSet": "cr:recordSet",
        "references": "cr:references",
        "regex": "cr:regex",
        "repeated": "cr:repeated",
        "replace": "cr:replace",
        "sc": "https://schema.org/",
        "separator": "cr:separator",
        "source": "cr:source",
        "subField": "cr:subField",
        "transform": "cr:transform"
    },
    "@type": "sc:Dataset",
    "name": "minimal_example",
    "description": "This is a minimal example."
}

mlcroissant validate --jsonld minimal.jsonld
W0924 14:42:44.972785 131393281629056 rdf.py:87] WARNING: The JSON-LD `@context` is not standard. Refer to the official @context (e.g., from the example datasets in https://github.com/mlcommons/croissant/tree/main/datasets/1.0). The different keys are: {'@language', 'samplingRate'}
Traceback (most recent call last):
  File "/home/adria/.pyenv/versions/ckan-dcat-310/bin/mlcroissant", line 7, in <module>
    sys.exit(main())
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/scripts/cli.py", line 32, in main
    app.run(module.main)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/scripts/validate.py", line 52, in main
    mlc.Dataset(jsonld, debug=debug)
  File "<string>", line 6, in __init__
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/datasets.py", line 89, in __post_init__
    self.metadata = Metadata.from_file(ctx=ctx, file=self.jsonld)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/structure_graph/nodes/metadata.py", line 460, in from_file
    return cls.from_json(ctx=ctx, json_=json_)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/structure_graph/nodes/metadata.py", line 470, in from_json
    jsonld = expand_jsonld(json_, ctx=ctx)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/core/json_ld.py", line 258, in expand_jsonld
    recursively_populate_jsonld(entry_node, id_to_node, context)
  File "/home/adria/.pyenv/versions/3.10.16/envs/ckan-dcat-310/lib/python3.10/site-packages/mlcroissant/_src/core/json_ld.py", line 187, in recursively_populate_jsonld
    and value[0].get("@language", context["@language"])
KeyError: '@language'
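
Judging from the last frame of the traceback, the lookup `context["@language"]` is what raises. A possible workaround until the check is relaxed, sketched here on the assumption that `"en"` is an acceptable default for the dataset, is to add the key to the `@context` before validating. The helper name `ensure_language` is hypothetical and not part of mlcroissant; whether other parts of the library behave differently once `@language` is set has not been verified here.

```python
def ensure_language(doc: dict, default: str = "en") -> dict:
    """Return a copy of `doc` whose @context defines @language.

    Assumes `@context` is a JSON object (as in the minimal example),
    not a URL string or a list of contexts.
    """
    context = dict(doc.get("@context", {}))
    context.setdefault("@language", default)  # an existing value is left untouched
    return {**doc, "@context": context}


# Abbreviated version of the minimal example above.
doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "sc:Dataset",
    "name": "minimal_example",
}
patched = ensure_language(doc)
print(patched["@context"]["@language"])  # prints "en"
```

The patched dictionary can then be written back to a file with `json.dump` and passed to `mlcroissant validate --jsonld`.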

One could argue that defining a @language tag as a fallback is a best practice, but it is not mandatory under either the JSON-LD or the Croissant spec, so perhaps this particular check could be relaxed.
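
As a rough illustration of what relaxing the check could look like, the sketch below replaces the subscript lookup with `dict.get`. It is not the mlcroissant code: the function name `language_of` is hypothetical, and how a missing language should be handled further down in `recursively_populate_jsonld` is not visible in the traceback. Note that in the original expression the default argument `context["@language"]` is evaluated before `.get` is called, so the KeyError is raised even when the value carries its own `@language`.

```python
from typing import Optional


def language_of(value: dict, context: dict) -> Optional[str]:
    """Return the value's @language, else the context's, else None."""
    # `context.get` never raises, unlike `context["@language"]`, which is
    # evaluated eagerly when used as the default argument.
    return value.get("@language", context.get("@language"))


print(language_of({"@value": "hello"}, {}))  # prints None
```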
