Rust Polars 0.49.0
💥 Breaking changes
- Remove old streaming engine (#23103)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Make match_chunks public (#23101)
- Implement StructFunction expressions in into_py (#23022)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Add and test DataFrame equality functionality (#22865)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Restrict custom `aggregate_function` in `pivot` to `pl.element()` (#23155)
- Don't leak `SourceToken` in in-memory sink linearize (#23201)
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Don't create i128 scalars if dtype-128 is not set (#23118)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect `size_hint()` for `FlatIter` (#23010)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Fix cum_min and cum_max does not preserve inf or -inf values at series start (#22896)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Set the sorted flag on Array after it is sorted (#22822)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Update when_then in user guide (#23245)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing entry for LazyFrame `__getitem__` (#22924)
📦 Build system
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Update Rust Polars versions (#23229)
- Change flake to use venv (#23219)
- Add `default_alloc` feature to `py-polars` (#23202)
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Improve Series equality functionality and prepare for Python integration (#23136)
- Add PolarsPhysicalType and use it to dispatch into_series (#23080)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Use a ref-counted `UniqueId` instead of `usize` for `cache_id` (#22984)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Only check for unknown DSL fields if minor is higher (#22970)
- Don't enable `ir_serde` together with `serde` (#22969)
- Make dtype field on Logical non-optional (#22966)
- Add new (Frozen)Categories and CategoricalMapping (#22956)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @math-hiyoko, @mcrumiller, @mrkn, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst, @thomasfrederikhoeck and @zyctree