fix: Fix to_pandas() on empty enum Series did not preserve enum dictionary #26610
nameexhaustion wants to merge 6 commits into main
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main   #26610      +/-   ##
==========================================
+ Coverage   80.76%   81.39%   +0.63%
==========================================
  Files        1795     1795
  Lines      244990   244992       +2
  Branches     3079     3079
==========================================
+ Hits       197871   199423    +1552
+ Misses      46333    44783    -1550
  Partials      786      786
if pa.types.is_dictionary(pa_arr.type):
    pa_arr = pa_arr.cast(pa.dictionary(pa.int64(), pa.large_string()))
    pa_dtype = pa.dictionary(pa.int64(), pa.large_string())
    # Forcibly have at least 1 row as otherwise cast makes
What is this cast? Is this a pyarrow cast? Is this not supposed to be fixed upstream? Seems rather hacky all this.
Yes, pyarrow cast, indeed very hacky.
Not sure this would be considered a bug in pyarrow; I don't think they promise to preserve the dictionary across operations - it's similar to our old categoricals in that every chunk has its own local mapping.
I've pushed an update where we instead just dispatch to DataFrame.to_pandas, which does this cast using our own cast kernels in polars-compute.
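To illustrate the "every chunk has its own local mapping" point, here is a minimal pure-Python sketch. It is a toy model only, not pyarrow's or polars' actual implementation: `DictChunk`, `encode_local`, and `remap_to_enum` are hypothetical names made up for this example. It shows why a dictionary derived from the values in a chunk is empty for a zero-row chunk, while remapping against a fixed enum category list keeps the categories even when there are no rows.

```python
from dataclasses import dataclass


@dataclass
class DictChunk:
    """Toy dictionary-encoded chunk: integer codes into a local dictionary."""
    codes: list[int]
    dictionary: list[str]


def encode_local(values: list[str]) -> DictChunk:
    """Encode values with a dictionary built only from the values seen."""
    dictionary: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return DictChunk(codes, dictionary)


def remap_to_enum(chunk: DictChunk, categories: list[str]) -> DictChunk:
    """Re-express a chunk against a fixed, ordered category list."""
    position = {cat: i for i, cat in enumerate(categories)}
    # Raises KeyError if the chunk holds a value outside the categories.
    lookup = [position[value] for value in chunk.dictionary]
    return DictChunk([lookup[code] for code in chunk.codes], list(categories))


categories = ["low", "mid", "high"]

chunk_a = encode_local(["high", "low", "high"])  # local dictionary: first-seen order
chunk_b = encode_local([])                       # empty chunk: empty local dictionary

fixed_a = remap_to_enum(chunk_a, categories)
fixed_b = remap_to_enum(chunk_b, categories)     # no rows, but categories preserved
```

In this toy model the empty chunk only keeps its categories because the fixed list is passed in explicitly; nothing in the chunk's own values can recover them.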