Python Polars 1.31.0-beta.1
Pre-release
💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate `allow_missing_columns` in `scan_parquet` in favor of `missing_columns` (#22784)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Don't call unnest for objects implementing `__arrow_c_array__` (#23069)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Narrow return type for `DataType.is_`, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
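
For context on the `str.zfill` fix (#22985) listed above: the entry says the method was inconsistent with Python when the string had a leading '+'. The sketch below shows only the plain-Python reference behaviour, using the built-in `str.zfill`. It does not call Polars, and the sample values are illustrative assumptions rather than cases taken from the PR.

```python
# Reference behaviour of Python's built-in str.zfill for signed strings.
# A leading '+' or '-' stays in front; the zeros are inserted after it.
samples = ["+12", "-12", "12"]

# Pad every sample to a total width of 5 characters.
padded = {s: s.zfill(5) for s in samples}

for original, result in padded.items():
    print(f"{original!r} -> {result!r}")
```

The width counts the sign character, so "+12" padded to width 5 gains two zeros, not three.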
📖 Documentation
- Document aggregations that return identity when there's no non-null values, suggest workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent `Expr.replace_all` in `replace_strict` docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing `DataFrame.__setitem__` to API reference (#22938)
- Add missing entry for LazyFrame `__getitem__` (#22924)
- Add missing `top_k_by` and `bottom_k_by` to `Series` reference (#22917)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.25` (#22763)
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck