fix(duckdb): support materializing enum types to pyarrow #11214
See the added test, and the comment.
I'm not sure if this approach, of `pa_table.cast(ibis_table.schema().to_pyarrow())`, is too broad. Instead of casting the entire pyarrow table to the expected schema, we could look for the specific case where the ibis type is `string` and the pyarrow type is `dictionary<values=string, indices=[intlike]>`, and convert just those columns. The broad cast could be masking some other semantic mistake that we are making. I assume that pyarrow is smart enough not to actually perform computation unless the cast is really required, so I don't think this should be computationally wasteful.

This also makes it much more explicit that to get to pandas, we first convert to pyarrow, then to pandas, further cementing pyarrow as our first-class citizen ahead of pandas. I think I remember seeing this stated as a desire in a different issue/comment. It certainly makes it easier to reason about consistency between `to_pyarrow()` and `to_pandas()`.