Cannot load Kaggle datasets #759

@marcenacp

Initially reported by @goeffthomas.

import tensorflow_datasets as tfds
builder = tfds.dataset_builders.CroissantBuilder(
    jsonld="https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/croissant/download",
    file_format='array_record',
)
builder.download_and_prepare()
ds = builder.as_data_source()
print(ds['default'][0])

FWIW, even the demo code here doesn't seem to work: https://www.tensorflow.org/datasets/format_specific_dataset_builders#croissantbuilder_2

Addition by @marcenacp:

For me on the latest version of tfds-nightly, it even fails with another error:

**************************** WARNING *********************************
Warning: The dataset you're trying to generate is using Apache Beam,
yet no `beam_runner` nor `beam_options` was explicitly provided.

Some Beam datasets take weeks to generate, so are usually not suited
for single machine generation. Please have a look at the instructions
to setup distributed generation:

https://www.tensorflow.org/datasets/beam_datasets#generating_a_beam_dataset
**********************************************************************
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-445fac78df9a> in <cell line: 6>()
      4     file_format='array_record',
      5 )
----> 6 builder.download_and_prepare()
      7 ds = builder.as_data_source()
      8 print(ds['default'][0])

12 frames
/usr/lib/python3.10/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'apache_beam'
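
The traceback above suggests that the Beam-based preparation path tries to import `apache_beam`, which is not installed in this environment. A minimal sketch for checking this before calling `download_and_prepare()`, assuming `apache_beam` is the only missing optional dependency; `missing_modules` is a hypothetical helper written for this illustration, not part of TFDS:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without importing the module itself.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: apache_beam is the only optional dependency at issue here,
# based solely on the traceback above.
missing = missing_modules(["apache_beam"])
if missing:
    print("Missing optional modules:", ", ".join(missing))
else:
    print("apache_beam is importable")
```

If the check reports `apache_beam` as missing, installing the `apache-beam` package from PyPI is the obvious thing to try. Whether that alone makes the Kaggle Croissant load succeed is not confirmed in this report.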

Metadata

Labels: bug (Something isn't working)
Status: Done