python core dump when downloading dataset #7879

@hansewetz

Describe the bug

When a dataset is loaded in streaming mode and the program exits before the download completes, the Python process aborts with a core dump on exit:

terminate called without an active exception
Aborted (core dumped)

Tested with Python 3.12.3 and Python 3.9.21.

Steps to reproduce the bug

Create a Python venv:

python -m venv venv
source venv/bin/activate
pip install datasets==4.4.1

Execute the following program:

from datasets import load_dataset
ds = load_dataset("HuggingFaceFW/fineweb-2", 'hrv_Latn', split="test", streaming=True)
for sample in ds:
    break
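
This is not a fix, but one way to gather more detail for the report: Python's standard-library `faulthandler` module can dump the Python traceback of every thread when the process receives a fatal signal such as SIGABRT. A minimal sketch is to put these lines at the very top of the reproduction script above (running it as `python -X faulthandler repro.py` has a similar effect; `repro.py` is just a placeholder file name). It only shows Python-level frames, so if the abort comes from a native thread, a native backtrace from the core file may still be needed.

```python
import faulthandler

# Diagnostic only: on fatal signals such as SIGABRT ("Aborted (core dumped)")
# or SIGSEGV, write the Python traceback of every thread to stderr.
faulthandler.enable(all_threads=True)

# ... the original reproduction (load_dataset + loop with break) goes below.
```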

Expected behavior

The program exits cleanly.

Environment info

See above: datasets 4.4.1 with Python 3.12.3 and Python 3.9.21.

Note: the example works correctly with datasets==3.1.0.

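Because the same script behaves differently under datasets==3.1.0 and datasets==4.4.1, it may help to record the exact versions of datasets and a few of its dependencies in each venv. Below is a small stdlib-only sketch; `environment_report` is a hypothetical helper, and the package list is an assumption about which dependencies might be relevant, not something this report establishes.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("datasets", "huggingface_hub", "pyarrow", "fsspec")):
    """Hypothetical helper: collect Python and package versions for a bug report."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = version(name)  # installed distribution version
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once in the datasets==4.4.1 venv and once in a datasets==3.1.0 venv gives two lists that can be compared line by line.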