"Tutorial for mlcroissant 🥐" is outdated #965

Description

@jonas-hag

Hi, the example in the notebook https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/recipes/introduction.ipynb is outdated.
Validating the metadata now produces 3 warnings:

Found the following 3 warning(s) during the validation:
  -  [Metadata(gpt-3)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(gpt-3)] Property "https://schema.org/license" is recommended, but does not exist.
  -  [Metadata(gpt-3)] Property "https://schema.org/version" is recommended, but does not exist.

When adding the datePublished field with date_published=datetime.datetime.now(), the metadata can no longer be serialized:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[48], line 3
      1 with open("croissant.json", "w") as f:
      2     content = metadata.to_json()
----> 3     content = json.dumps(content, indent=2)
      4     print(content)
      5     f.write(content)

File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
    234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
--> 238     **kw).encode(obj)

File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/encoder.py:200, in JSONEncoder.encode(self, o)
    196         return encode_basestring(o)
    197 # This doesn't pass the iterator directly to ''.join() because the
    198 # exceptions aren't as detailed.  The list call should be roughly
    199 # equivalent to the PySequence_Fast that ''.join() would do.
--> 200 chunks = self.iterencode(o, _one_shot=True)
    201 if not isinstance(chunks, (list, tuple)):
...
    179     """
--> 180     raise TypeError(f'Object of type {o.__class__.__name__} '
    181                     f'is not JSON serializable')

TypeError: Object of type datetime is not JSON serializable

This can be fixed by passing default=str to json.dumps: content = json.dumps(content, indent=2, default=str).
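
For illustration, here is a minimal sketch of that workaround that runs without mlcroissant. The `content` dict is a hypothetical stand-in for whatever `metadata.to_json()` returns, and `encode_dates` is a helper name made up for this example, not part of any library. It compares `default=str` with a fallback that emits ISO 8601 strings, which may be the better fit for a schema.org date field:

```python
import datetime
import json

# Hypothetical stand-in for the dict returned by metadata.to_json();
# the real output contains many more keys.
content = {
    "name": "gpt-3",
    "datePublished": datetime.datetime(2025, 1, 2, 3, 4, 5),
}


def encode_dates(obj):
    """Fallback for json.dumps: convert date/datetime objects to ISO 8601 strings."""
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    # Keep the standard error for any other unsupported type.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# default=str calls str() on anything json cannot encode natively.
with_str = json.dumps(content, indent=2, default=str)

# A dedicated fallback only touches dates and keeps the ISO 8601 "T" separator.
with_iso = json.dumps(content, indent=2, default=encode_dates)

print(with_str)
print(with_iso)
```

Both calls avoid the TypeError. The difference is that `default=str` silently stringifies every unsupported object, while the dedicated fallback still raises for anything that is not a date.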

Therefore, I suggest adapting the example to the new behavior. I'm happy to contribute a PR either tomorrow or in mid-November.
