"Tutorial for mlcroissant 🥐" is outdated #965

@jonas-hag

Hi, the example in the notebook https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/recipes/introduction.ipynb is outdated.
Validating the metadata now produces 3 warnings:

Found the following 3 warning(s) during the validation:
  -  [Metadata(gpt-3)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(gpt-3)] Property "https://schema.org/license" is recommended, but does not exist.
  -  [Metadata(gpt-3)] Property "https://schema.org/version" is recommended, but does not exist.

When adding the datePublished field with date_published=datetime.datetime.now(), the metadata can no longer be serialized:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[48], line 3
      1 with open("croissant.json", "w") as f:
      2     content = metadata.to_json()
----> 3     content = json.dumps(content, indent=2)
      4     print(content)
      5     f.write(content)

File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
    234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
--> 238     **kw).encode(obj)

File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/encoder.py:200, in JSONEncoder.encode(self, o)
    196         return encode_basestring(o)
    197 # This doesn't pass the iterator directly to ''.join() because the
    198 # exceptions aren't as detailed.  The list call should be roughly
    199 # equivalent to the PySequence_Fast that ''.join() would do.
--> 200 chunks = self.iterencode(o, _one_shot=True)
    201 if not isinstance(chunks, (list, tuple)):
...
    179     """
--> 180     raise TypeError(f'Object of type {o.__class__.__name__} '
    181                     f'is not JSON serializable')

TypeError: Object of type datetime is not JSON serializable

This can be fixed by passing default=str to json.dumps: content = json.dumps(content, indent=2, default=str).
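For reference, here is a minimal sketch of the failure and the workaround using only the standard library. The dict below is a hypothetical stand-in for whatever metadata.to_json() returns (its keys and the fixed date are assumptions for illustration, not the actual mlcroissant output); the point is only that json.dumps rejects datetime values unless a default callable is supplied.

```python
import datetime
import json

# Hypothetical stand-in for the dict returned by metadata.to_json();
# the keys and the fixed timestamp are illustrative assumptions.
content = {
    "name": "gpt-3",
    "datePublished": datetime.datetime(2024, 1, 1, 12, 0, 0),
}

# Without a fallback, json.dumps raises TypeError on the datetime value.
try:
    json.dumps(content, indent=2)
    failed = False
except TypeError as err:
    failed = True
    print(f"Serialization failed: {err}")

# default=str is called for every object json cannot encode natively,
# so the datetime is written out as its str() representation.
serialized = json.dumps(content, indent=2, default=str)
print(serialized)
```

Note that str() on a datetime separates date and time with a space rather than the "T" used by ISO 8601. If the ISO form is preferred, passing default=lambda o: o.isoformat() would be one alternative, assuming every non-serializable value in the dict is a date or datetime.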

Therefore, I suggest adapting the example to the new behavior. I'm happy to contribute a PR either tomorrow or in mid-November.
