Fails to load dataset

Since no `default` config is published for this dataset and I would like to iterate over diverse data, I tried streaming every config and interleaving them:
```py
from datasets import get_dataset_config_names, load_dataset, interleave_datasets

configs = get_dataset_config_names("HuggingFaceFW/fineweb-2")
print(configs)

streams = [
    load_dataset("HuggingFaceFW/fineweb-2", c, split="train", streaming=True)
    for c in configs
]

# Option A: round-robin (equal mixing across languages)
ds = interleave_datasets(streams, seed=42)

# ds is now an IterableDataset; languages are naturally mixed as you iterate.
for ex in ds.take(3):
    print(ex.keys())
```

This prints all configs (`['aai_Latn', 'aak_Latn', 'aau_Latn', 'aaz_Latn', ...]`)
and then fails with:
> ValueError: At least one valid data file must be specified, all the data_files are invalid: {'test': [], 'train': ['hf://datasets/HuggingFaceFW/fineweb-2@af9c13333eb981300149d5ca60a8e9d659b276b9/data/abi_Latn/train/000_00000.parquet']}


Minimally:
```py
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/fineweb-2", "abi_Latn", split="train", streaming=True)
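# take() is lazy on a streaming dataset, so iterate to actually fetch one example
print(next(iter(ds.take(1))))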
```

This works on my Mac but fails on my server.
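To help narrow down the difference between the two machines, here is a minimal sketch that prints the installed versions of the packages that might be involved, so the output can be compared side by side. The helper name `collect_versions` and the package list are my own assumptions; I have not confirmed that a version mismatch is the cause.

```py
import platform
from importlib import metadata


def collect_versions(packages=("datasets", "huggingface_hub", "fsspec", "pyarrow")):
    """Return a dict of installed versions (None if a package is missing).

    The default package list is a guess at what could affect data-file
    resolution; it is not taken from the error message.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running this on both the Mac and the server and diffing the output should show whether the two environments differ.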
