Allow kagglehub.dataset_load to load a specific file from a nested directory in a dataset #233

Open
@andrei-diaconescu

Description

I have a dataset with the following directory structure:

data
├── day_1
├── day_3
├── hour_1
├── hour_12
├── hour_4
├── hour_8
├── minute_15
├── minute_30
├── minute_5
├── monthly
└── weekly

Each child directory contains hundreds of .csv files. I want to be able to download or load a single file from a single directory, without downloading or loading the whole dataset.

The call I would expect to work is:
weekly_candles = kagglehub.dataset_load(KaggleDatasetAdapter.PANDAS, "andreidiaconescu/binancepricedata", "weekly/BTCUSDT.csv")

Currently, this call fails with the following error:

KaggleApiHTTPError: 404 Client Error.
Resource not found at URL: https://www.kaggle.com/datasets/andreidiaconescu/binancepricedata/versions/3
The server reported the following issues: Not found
Please make sure you specified the correct resource identifiers.
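
To illustrate the intended usage across the directories listed above, here is a minimal sketch. The helper names `dataset_file_path` and `load_candles` are hypothetical and not part of kagglehub. The directory names are copied from the tree in this issue, and the path format (`<directory>/<symbol>.csv`) is assumed from the call shown above. `load_candles` would only succeed once nested paths are supported.

```python
import posixpath

# Directory names copied from the tree in this issue.
INTERVAL_DIRS = (
    "day_1", "day_3", "hour_1", "hour_12", "hour_4", "hour_8",
    "minute_15", "minute_30", "minute_5", "monthly", "weekly",
)

# Dataset handle used in the call above.
DATASET_HANDLE = "andreidiaconescu/binancepricedata"


def dataset_file_path(interval: str, symbol: str) -> str:
    """Build the path of one CSV file inside the dataset (hypothetical helper)."""
    if interval not in INTERVAL_DIRS:
        raise ValueError(f"unknown interval directory: {interval!r}")
    # The call in this issue uses a forward-slash path, so join with posixpath
    # to get the same result on every operating system.
    return posixpath.join(interval, f"{symbol}.csv")


def load_candles(interval: str, symbol: str):
    """Load a single file as a pandas DataFrame (hypothetical helper).

    Assumes kagglehub accepts a nested path, which is what this issue requests.
    """
    # Imported lazily so the path helper works without kagglehub installed.
    import kagglehub
    from kagglehub import KaggleDatasetAdapter

    return kagglehub.dataset_load(
        KaggleDatasetAdapter.PANDAS,
        DATASET_HANDLE,
        dataset_file_path(interval, symbol),
    )
```

With nested-path support, `load_candles("weekly", "BTCUSDT")` would be equivalent to the call shown above.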
