Is your feature request related to a problem?
Problem: Partitioning of parquet files is not recognized.
What is the motivation behind your request?
I have partitioned directories of the form root/company=ABC/day=YYYY-MM-DD/df.parquet.
When I try to read those with duckdb_con.read_parquet("/path/to/root"), the read succeeds, but the partition columns company and day are not recognized.
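To make the expected behavior concrete, here is a minimal sketch of what "recognizing" the partitioning means: each key=value directory segment should become a column. The helper name parse_hive_partitions is hypothetical and only illustrates the layout; it is not part of ibis, DuckDB, or pyarrow.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only.
    """
    partitions: dict[str, str] = {}
    # Skip the last component (the file name); only directories carry keys.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


print(parse_hive_partitions("root/company=ABC/day=2024-01-31/df.parquet"))
```

For the layout above, the result should contain company and day as keys, which are exactly the columns missing from the table I get back.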
I figured out that the reading is done by the pyarrow dataset reader, because the native DuckDB reader fails to recognize the directory:
>>> df = duck._read_parquet_duckdb_native(files, "tmp")
IOException: IO Error: No files found that match the pattern "/Users/vitalie/data/smart_slicer/100000"
The pyarrow dataset reader has a partitioning argument, but it is not propagated when passed to duckdb_con.read_parquet(), because the native backend fails with a wrong-argument error.
Describe the solution you'd like
To be able to seamlessly read hierarchical data storage by passing the root folder to the read_parquet function.
What version of ibis are you running?
'9.5.0'
What backend(s) are you using, if any?
DuckDB
Code of Conduct
- I agree to follow this project's Code of Conduct