Hi, the example in the notebook https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/recipes/introduction.ipynb is outdated.
Validating the metadata now produces 3 warnings:
Found the following 3 warning(s) during the validation:
- [Metadata(gpt-3)] Property "https://schema.org/datePublished" is recommended, but does not exist.
- [Metadata(gpt-3)] Property "https://schema.org/license" is recommended, but does not exist.
- [Metadata(gpt-3)] Property "https://schema.org/version" is recommended, but does not exist.
When adding the datePublished field with date_published=datetime.datetime.now(), the metadata can no longer be serialized:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[48], line 3
      1 with open("croissant.json", "w") as f:
      2     content = metadata.to_json()
----> 3     content = json.dumps(content, indent=2)
      4     print(content)
      5     f.write(content)
File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/__init__.py:238, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
232 if cls is None:
233 cls = JSONEncoder
234 return cls(
235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
237 separators=separators, default=default, sort_keys=sort_keys,
--> 238 **kw).encode(obj)
File ~/.local/share/uv/python/cpython-3.13.4-macos-aarch64-none/lib/python3.13/json/encoder.py:200, in JSONEncoder.encode(self, o)
196 return encode_basestring(o)
197 # This doesn't pass the iterator directly to ''.join() because the
198 # exceptions aren't as detailed. The list call should be roughly
199 # equivalent to the PySequence_Fast that ''.join() would do.
--> 200 chunks = self.iterencode(o, _one_shot=True)
201 if not isinstance(chunks, (list, tuple)):
...
179 """
--> 180 raise TypeError(f'Object of type {o.__class__.__name__} '
181 f'is not JSON serializable')
TypeError: Object of type datetime is not JSON serializable
This can be fixed by passing default=str to json.dumps: content = json.dumps(content, indent=2, default=str).
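For reference, here is a minimal standalone sketch of the failure and the workaround. It does not use mlcroissant: the `content` dict is a hypothetical stand-in for whatever `metadata.to_json()` returns, and the fixed date and the `to_iso` helper are made up for illustration. It only assumes that the returned dict contains a raw `datetime` object, which is what the traceback suggests.

```python
import datetime
import json

# Hypothetical stand-in for the dict returned by metadata.to_json();
# the real output contains many more keys.
content = {
    "name": "gpt-3",
    "datePublished": datetime.datetime(2024, 1, 1, 12, 0, 0),
}

# Without a fallback, json.dumps raises TypeError on the datetime value.
try:
    json.dumps(content, indent=2)
except TypeError as err:
    print(f"Serialization failed: {err}")

# Workaround from this issue: default=str is called for every object
# that json cannot encode natively, so the datetime becomes a string.
serialized = json.dumps(content, indent=2, default=str)


def to_iso(obj):
    """Illustrative fallback that only converts dates, to ISO 8601."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Stricter alternative: unknown types still raise instead of being
# silently stringified.
serialized_iso = json.dumps(content, indent=2, default=to_iso)
print(serialized_iso)
```

Note that `default=str` converts every non-serializable object, not just dates, so an explicit fallback like `to_iso` may be the safer choice for the notebook.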
Therefore, I suggest adapting the example to the new behavior. I'm happy to contribute a PR either tomorrow or in mid-November.