
feat: Allow reading hive style partitioned parquet files with duckdb backend #10939

Open
@vspinu

Description

Is your feature request related to a problem?

Problem: Hive-style partitioning of parquet files is not recognized by the DuckDB backend.

What is the motivation behind your request?

I have partitioned directories of the form root/company=ABC/day=YYYY-MM-DD/df.parquet.

When I try to read those with duckdb_con.read_parquet("/path/to/root"), it does work, but the partition columns company and day are not recognized.
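For context, hive-style partitioning encodes column values in the directory names themselves rather than inside the parquet files. A minimal stdlib sketch of how such keys can be recovered from a path (the path below is hypothetical, mirroring the layout described above):

```python
from pathlib import PurePosixPath

def hive_partition_keys(path: str) -> dict:
    """Extract key=value partition segments from a hive-style path."""
    keys = {}
    for part in PurePosixPath(path).parts:
        if "=" in part:
            key, _, value = part.partition("=")
            keys[key] = value
    return keys

# The partition columns live in the directory names, not in the file:
print(hive_partition_keys("root/company=ABC/day=2024-01-31/df.parquet"))
# {'company': 'ABC', 'day': '2024-01-31'}
```

A reader that ignores this convention sees only the columns stored inside df.parquet, which is exactly the symptom reported here.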

I figured out that the reading is handled by the pyarrow dataset reader, because native DuckDB fails to recognize the directory:

>>>     df = duck._read_parquet_duckdb_native(files, "tmp")
IOException: IO Error: No files found that match the pattern "/Users/vitalie/data/smart_slicer/100000"

The pyarrow dataset reader has a partitioning argument, but it is not propagated when passed to duckdb_con.read_parquet(), because the native backend fails with a wrong-argument error.

Describe the solution you'd like

To be able to seamlessly read hive-partitioned data by passing the root folder to the read_parquet function.

What version of ibis are you running?

'9.5.0'

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: no one assigned
Labels: feature (Features or general enhancements)
Project status: backlog
Milestone: no milestone