fix(duckdb): support materializing enum types to pyarrow #11214
See the added test, and the comment.
I'm not sure if this approach, of `pa_table.cast(ibis_table.schema().to_pyarrow())`, is too broad. Instead of casting the entire pyarrow table to the expected schema, we could look for the specific case where the ibis type is `string` and the pyarrow type is `dictionary<values=string, indices=[intlike]>`, and convert just those columns. The broad cast could be masking some other semantic mistake that we are making. I assume that pyarrow is smart enough not to actually perform computation unless the cast is really required, so I don't think this should be computationally wasteful.

This also makes it much more explicit that to get to pandas, we first convert to pyarrow, then to pandas, further cementing pyarrow as our first-class citizen ahead of pandas. I think I remember seeing this stated as a desire in a different issue/comment. It certainly makes it easier to reason about consistency between `to_pyarrow()` and `to_pandas()`.