Description
Short description
A minimal CroissantBuilder snippet for loading a dataset in the Croissant format only seems to work on Linux.
The code snippet below correctly downloads and prepares the dataset on Colab and on WSL, but results in an error on Windows. Each platform was tested in a clean virtual environment.
Environment information
- Operating System: Windows 11
- Python version: 3.11.1
- tensorflow-datasets/tfds-nightly version: tfds-nightly 4.9.6.dev202408050044
- tensorflow/tf-nightly version: tensorflow 2.17.0
- Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
import mlcroissant as mlc
import tensorflow_datasets as tfds
url = "https://huggingface.co/api/datasets/fashion_mnist/croissant"
builder = tfds.core.dataset_builders.CroissantBuilder(jsonld=url, file_format='array_record')
builder.download_and_prepare()
Link to logs
https://pastebin.com/fRrfn8jj
Expected behavior
A dataset builder is prepared such that I can use .as_data_source() later.
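For context, the follow-up use I have in mind looks roughly like the sketch below. The helper name `first_example` is hypothetical, and the `"train"` split name is an assumption about how the Croissant record sets are exposed; the only TFDS call it relies on is `builder.as_data_source(split=...)`, whose result supports `len()` and indexing.

```python
def first_example(builder, split="train"):
    """Return the first record of `split` using random access.

    `builder` is expected to be an already prepared TFDS builder,
    i.e. `download_and_prepare()` has completed without errors.
    The default split name "train" is an assumption.
    """
    source = builder.as_data_source(split=split)
    if len(source) == 0:
        raise ValueError(f"split {split!r} contains no records")
    return source[0]
```

On Windows this step is never reached, because `download_and_prepare()` fails first.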