
CroissantBuilder does not work on Windows machines #5546

Open
@zwouter

Description

Short description
A minimal CroissantBuilder snippet that loads a dataset in the Croissant format appears to work only on Linux.
The snippet below downloads and prepares the dataset correctly on Colab and on WSL, but raises an error on Windows. Each setup was tested in a clean virtual environment.

Environment information

  • Operating System: Windows 11

  • Python version: 3.11.1

  • tensorflow-datasets/tfds-nightly version: tfds-nightly 4.9.6.dev202408050044

  • tensorflow/tf-nightly version: tensorflow 2.17.0

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?
    Yes
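
For anyone trying to reproduce this on another machine, the environment details listed above can be gathered with a short standard-library script. This is only a sketch: the helper names (installed_version, collect_environment) and the "not installed" placeholder are illustrative choices, and the package list is taken from the names mentioned in this report.

```python
import platform
from importlib import metadata

# Package names mentioned in this report; adjust as needed.
PACKAGES = (
    "tfds-nightly",
    "tensorflow-datasets",
    "tensorflow",
    "tf-nightly",
    "mlcroissant",
)


def installed_version(package):
    """Return the installed version of `package`, or a placeholder string."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment(packages=PACKAGES):
    """Collect OS, Python, and package versions into a dict."""
    info = {
        "Operating System": f"{platform.system()} {platform.release()}",
        "Python version": platform.python_version(),
    }
    for package in packages:
        info[package] = installed_version(package)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running the script once on Windows and once on WSL or Colab gives two listings that can be compared side by side.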

Reproduction instructions

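import mlcroissant as mlc  # unused below; CroissantBuilder depends on mlcroissant
import tensorflow_datasets as tfds

# Croissant (JSON-LD) description of Fashion-MNIST served by the Hugging Face API.
url = "https://huggingface.co/api/datasets/fashion_mnist/croissant"
# Build from the Croissant metadata and store the data as ArrayRecord files.
builder = tfds.core.dataset_builders.CroissantBuilder(jsonld=url, file_format='array_record')
# Works on Colab and WSL; this is the step that fails on Windows.
builder.download_and_prepare()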

Link to logs
https://pastebin.com/fRrfn8jj

Expected behavior
download_and_prepare() completes on Windows as it does on Linux, so that I can call builder.as_data_source() on the prepared builder afterwards.
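
As a rough sketch of that intended follow-up step, the function below prepares the builder and then opens its data sources. The function name open_data_sources is hypothetical, the imports are deferred so the definition loads without TensorFlow Datasets installed, and it assumes that as_data_source() called without a split returns a mapping from split name to a random-access source. It needs network access and is only expected to succeed where download_and_prepare() succeeds.

```python
def open_data_sources(jsonld_url):
    """Prepare a Croissant dataset and return its random-access data sources.

    Hypothetical helper for illustration; mirrors the reproduction snippet.
    """
    # Imported lazily so that defining this function has no heavy dependencies.
    import tensorflow_datasets as tfds

    builder = tfds.core.dataset_builders.CroissantBuilder(
        jsonld=jsonld_url,
        file_format="array_record",
    )
    builder.download_and_prepare()

    # Assumption: with no `split` argument, this returns a dict of all splits.
    sources = builder.as_data_source()
    for split_name, source in sources.items():
        print(f"{split_name}: {len(source)} examples")
    return sources
```

Calling open_data_sources("https://huggingface.co/api/datasets/fashion_mnist/croissant") is the behavior this report expects to work on Windows as well.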


Labels: bug