cr:FileObject not recognized by the validation script #975

@ihsaan-ullah

I have a metadata.json file with minimal Croissant metadata for a dataset. The dataset has multiple zip files, each of which I describe with cr:FileObject. When I validate metadata.json with the mlcroissant Python package, I get an error for each FileObject.

Here is how I validate my metadata.json:

mlcroissant validate --jsonld metadata.json

Here are the errors I get for FileObject:

Found the following 9 error(s) during the validation:
  -  "ICLR2024_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "ICLR2025_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "ICML2024_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "ICML2025_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "NeurIPS2021_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "NeurIPS2022_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "NeurIPS2023_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  "NeurIPS2024_latest.zip" should have an attribute "@type": "https://schema.org/FileObject" or "@type": "https://schema.org/FileSet". Got https://mlcommons.org/croissant/1.0/FileObject instead.
  -  [Metadata(Academic Papers Dataset)] The name "Academic Papers Dataset" contains forbidden characters.

The last error, about the name, is also interesting: which forbidden characters does the name contain? Maybe "Dataset"?
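For the FileObject errors, the "Got https://mlcommons.org/croissant/1.0/FileObject" part suggests that the cr: prefix in the file's @context expands to https://mlcommons.org/croissant/1.0/. The sketch below illustrates that expansion with the standard library only. The JSON excerpt and the expand_type helper are hypothetical (the excerpt is an assumption inferred from the error text, not the actual metadata.json), and this is not how mlcroissant itself resolves types.

```python
import json

# Hypothetical excerpt: the "cr" mapping is an assumption inferred from the
# error message, not a copy of the real metadata.json.
METADATA = json.loads("""
{
  "@context": {
    "sc": "https://schema.org/",
    "cr": "https://mlcommons.org/croissant/1.0/"
  },
  "@type": "sc:Dataset",
  "name": "Academic Papers Dataset",
  "distribution": [
    {"@type": "cr:FileObject", "name": "ICLR2024_latest.zip"}
  ]
}
""")


def expand_type(compact, context):
    """Expand a compact IRI such as "cr:FileObject" using string prefixes in @context.

    Hypothetical helper for illustration; values with an unknown prefix
    (including full IRIs) are returned unchanged.
    """
    prefix, sep, suffix = compact.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str):
        return base + suffix
    return compact


# Map each distribution entry's name to its expanded @type.
expanded = {
    item["name"]: expand_type(item["@type"], METADATA["@context"])
    for item in METADATA["distribution"]
}

for name, iri in expanded.items():
    print(name, "->", iri)
```

Under that assumed context, cr:FileObject expands to the IRI quoted after "Got" in the errors, while the validator message names https://schema.org/FileObject as an accepted type.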

Here is my complete metadata.json

Metadata
