Releases: pola-rs/polars
Python Polars 1.31.0
💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate `allow_missing_columns` in `scan_parquet` in favor of `missing_columns` (#22784)
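A minimal migration sketch for the deprecation above; it assumes `missing_columns="insert"` reproduces the old `allow_missing_columns=True` behaviour (insert nulls for columns absent from a file) while `"raise"` keeps the strict default:

```python
import polars as pl

# Before (deprecated): fill columns missing from some files with nulls.
lf_old = pl.scan_parquet("data/*.parquet", allow_missing_columns=True)

# After: the same behaviour, spelled out via the new parameter.
lf_new = pl.scan_parquet("data/*.parquet", missing_columns="insert")  # or "raise"
```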
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690) (see the example after this list)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
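A small illustration of the new SQL `CORR` aggregate via `pl.sql()`, which resolves frames from the calling scope; this sketch assumes the usual Pearson-correlation semantics:

```python
import polars as pl

df = pl.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 7.8]})

# pl.sql() returns a LazyFrame; `df` is picked up by variable name.
out = pl.sql("SELECT CORR(x, y) AS xy_corr FROM df").collect()
print(out)
```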
🐞 Bug fixes
- Remove axis in `show_graph` (#23218)
- Remove axis ticks in `show_graph` (#23210)
- Restrict custom `aggregate_function` in `pivot` to `pl.element()` (#23155)
- Don't leak `SourceToken` in in-memory sink linearize (#23201)
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Don't call unnest for objects implementing `__arrow_c_array__` (#23069)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985) (see the example after this list)
- Integer underflow in `propagate_nulls` (#22986)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Narrow return type for `DataType.is_`, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
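For the `str.zfill` fix above, a quick sketch of the now Python/pandas-consistent behaviour (a leading sign is kept and zeros are padded after it); the commented output is what Python's own `str.zfill` produces:

```python
import polars as pl

s = pl.Series(["+12", "-7", "34"])
print(s.str.zfill(5))
# expected: ["+0012", "-0007", "00034"], matching "+12".zfill(5) == "+0012"
```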
📖 Documentation
- Document aggregations that return the identity when there are no non-null values, and suggest a workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent `Expr.replace_all` in `replace_strict` docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing `DataFrame.__setitem__` to API reference (#22938)
- Add missing entry for LazyFrame `__getitem__` (#22924)
- Add missing `top_k_by` and `bottom_k_by` to `Series` reference (#22917)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.25` (#22763)
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Change flake to use venv (#23219)
- Add `default_alloc` feature to `py-polars` (#23202)
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mcrumiller, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck
Python Polars 1.31.0-beta.1
💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate `allow_missing_columns` in `scan_parquet` in favor of `missing_columns` (#22784)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Don't call unnest for objects implementing `__arrow_c_array__` (#23069)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Narrow return type for `DataType.is_`, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Document aggregations that return the identity when there are no non-null values, and suggest a workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent `Expr.replace_all` in `replace_strict` docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing `DataFrame.__setitem__` to API reference (#22938)
- Add missing entry for LazyFrame `__getitem__` (#22924)
- Add missing `top_k_by` and `bottom_k_by` to `Series` reference (#22917)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.25` (#22763)
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck
Rust Polars 0.48.1
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
🛠️ Other improvements
- Update Rust Polars versions (#22854)
Thank you to all our contributors for making this release possible!
@JakubValtar, @bschoenmaeckers, @nameexhaustion and @stijnherfst
Python Polars 1.30.0
🚀 Performance improvements
- Switch eligible casts to non-strict in optimizer (#22850)
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for `list.eval` (#22715)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off `maintain_order` in group-by followed by sort (#22492)
✨ Enhancements
- Load AWS `endpoint_url` using boto3 (#22851)
- Implemented `list.filter` (#22749) (see the example after this list)
- Support binaryoffset in search sorted (#22786)
- Add `nulls_equal` flag to `list/arr.contains` (#22773)
- Implement `LazyFrame.match_to_schema` (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for `.over` to be called without `partition_by` (#22712)
- Support `AnyValue` translation from `PyMapping` values (#22722)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Support inference of `Int128` dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add `cast_options` parameter to control type casting in `scan_parquet` (#22617)
- Allow casting `List<UInt8>` to `Binary` (#22611)
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT` (#22651)
- Support use of literal values as "other" when evaluating `Series.zip_with` (#22632)
- Allow to read and write custom file-level parquet metadata (#21806)
- Support PEP702 `@deprecated` decorator behaviour (#22594)
- Support grouping by `pl.Array` (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
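A short sketch of the new `list.filter` mentioned above; this assumes it accepts an element-wise predicate built from `pl.element()`, analogous to `list.eval`:

```python
import polars as pl

df = pl.DataFrame({"vals": [[1, 2, 3, 4], [5, 6], []]})

# keep only the even elements within each list
out = df.select(pl.col("vals").list.filter(pl.element() % 2 == 0))
print(out)  # expected rows: [2, 4], [6], []
```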
🐞 Bug fixes
- Fix RuntimeError when serializing the same DataFrame from multiple threads (#22844)
- Fix map_elements predicate pushdown (#22833)
- Fix reverse list type (#22832)
- Don't require numpy for search_sorted (#22817)
- Add type equality checking for relevant methods (#22802)
- Invalid output for `fill_null` after `when.then` on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of `PartitionByKey` and `PartitionParted` (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge `Enum` categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in `Schema` init (#22589)
- Correct name in `unnest` error message (#22740)
- Provide "schema" to `DataFrame`, even if empty JSON (#22739)
- Properly account for nulls in the `is_not_nan` check made in `drop_nans` (#22707)
- Incorrect result from SQL `count(*)` with `partition by` (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark `str.replace_many` with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix `index out of bounds` panic when scanning hugging face (#22661)
- Panic on `group_by` with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()` (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve `get()` SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add `new_streaming` feature to `polars` crate (#22601)
- Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets which start on Mondays) (#22592) (see the example after this list)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make `replace` and `replace_strict` mapping use list literals (#22566)
- Allow pivot on `Time` column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
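To make the `dt.truncate` origin change above concrete, a small sketch assuming the stated behaviour (buckets anchored at the Unix epoch; weekly buckets start on Mondays):

```python
from datetime import datetime

import polars as pl

s = pl.Series([datetime(2025, 5, 2, 13, 45)])  # a Friday afternoon

print(s.dt.truncate("1d"))  # expected: 2025-05-02 00:00:00
print(s.dt.truncate("1w"))  # expected: 2025-04-28 00:00:00 (preceding Monday)
```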
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add `match_to_schema` to API reference (#22777)
- Provide additional explanation and examples for the `value_counts` "normalize" parameter (#22756)
- Rework documentation for `drop`/`fill` for nulls/nans (#22657)
- Add documentation to new `RoundMode` parameter in `round` (#22555)
- Add missing `repeat_by` to API reference, fixup `list.get` (#22698)
- Fix non-rendering bullet points in `scan_iceberg` (#22694)
- Improve `insert_column` docstring (description and examples) (#22551)
- Improve `join` documentation (#22556)
📦 Build system
- Fix building `polars-lazy` with certain features (#22846)
- Add missing features (#22839)
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Rust Polars versions (#22854)
- Add basic smoke test for free-threaded python (#22481)
- Update Polars Rust versions (#22834)
- Fix `nix build` (#22809)
- Fix flake.nix to work on macos (#22803)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Fix unstable `list.eval` performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch `.write_*` to `.lazy().sink_*(engine='in-memory')` (#22582)
- Move all optimization flags to `QueryOptFlags` (#22680)
- Add test for `str.replace_many` (#22615)
- Stabilize `sink_*` (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add `proptest` testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling `.collect(_eager=True)` (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind `in-memory-map` (#22552)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.48.0
💥 Breaking changes
- Use a wrapper struct to store time zone (#22523)
🚀 Performance improvements
- Allow predicate passing set_sorted (#22797)
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for `list.eval` (#22715)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off `maintain_order` in group-by followed by sort (#22492)
✨ Enhancements
- Format named functions (#22831)
- Implemented `list.filter` (#22749)
- Support binaryoffset in search sorted (#22786)
- Add `nulls_equal` flag to `list/arr.contains` (#22773)
- Allow named opaque functions for serde (#22734)
- Implement `LazyFrame.match_to_schema` (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for `.over` to be called without `partition_by` (#22712)
- Support `AnyValue` translation from `PyMapping` values (#22722)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add options to write Parquet field metadata (#22652)
- Allow casting `List<UInt8>` to `Binary` (#22611)
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT` (#22651)
🐞 Bug fixes
- Fix reverse list type (#22832)
- Add type equality checking for relevant methods (#22802)
- Invalid output for `fill_null` after `when.then` on structs (#22798)
- Don't panic for cross join with misaligned chunking (#22799)
- Panic on quantile over nulls in rolling window (#22792)
- Respect BinaryOffset metadata (#22785)
- Correct the output order of `PartitionByKey` and `PartitionParted` (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge `Enum` categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Correct name in `unnest` error message (#22740)
- Properly account for nulls in the `is_not_nan` check made in `drop_nans` (#22707)
- Incorrect result from SQL `count(*)` with `partition by` (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark `str.replace_many` with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix `index out of bounds` panic when scanning hugging face (#22661)
- Fix polars crate not compiling when lazy feature enabled (#22655)
- Panic on `group_by` with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()` (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Fix nested dtype row encoding (#22557)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve `get()` SchemaMismatch panic (#22350)
📖 Documentation
- Add pre-release policy (#22808)
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Rework documentation for `drop`/`fill` for nulls/nans (#22657)
📦 Build system
- Patch pyo3 to disable recompilation (#22796)
🛠️ Other improvements
- Update Polars Rust versions (#22834)
- Cleanup `polars-python` lifetimes (#22548)
- Fix `nix build` (#22809)
- Fix flake.nix to work on macos (#22803)
- Remove unused dependencies in `polars-arrow` (#22806)
- Unused variables on release build (#22800)
- Update cloud docs (#22624)
- Add proptest implementations for all Array types (#22711)
- Dispatch `.write_*` to `.lazy().sink_*(engine='in-memory')` (#22582)
- Move all optimization flags to `QueryOptFlags` (#22680)
- Add test for `str.replace_many` (#22615)
- Stabilize `sink_*` (#22643)
- Add proptest for row-encode (#22626)
- Emphasize PolarsDataType::get_dtype is static-only (#22648)
- Use named fields for Logical (#22647)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add `proptest` testing for parquet decoding kernels (#22608)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JakubValtar, @Julian-J-S, @LucioFranco, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Python Polars 1.30.0-beta.1
🚀 Performance improvements
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for `list.eval` (#22715)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off `maintain_order` in group-by followed by sort (#22492)
✨ Enhancements
- Support binaryoffset in search sorted (#22786)
- Add `nulls_equal` flag to `list/arr.contains` (#22773)
- Implement `LazyFrame.match_to_schema` (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for `.over` to be called without `partition_by` (#22712)
- Support `AnyValue` translation from `PyMapping` values (#22722)
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors (#22638)
- Support inference of `Int128` dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add `cast_options` parameter to control type casting in `scan_parquet` (#22617)
- Allow casting List<UInt8> to Binary (#22611)
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT` (#22651)
- Support use of literal values as "other" when evaluating `Series.zip_with` (#22632)
- Allow to read and write custom file-level parquet metadata (#21806)
- Support PEP702 `@deprecated` decorator behaviour (#22594)
- Support grouping by `pl.Array` (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
🐞 Bug fixes
- Respect BinaryOffset metadata (#22785)
- Correct the output order of `PartitionByKey` and `PartitionParted` (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge `Enum` categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in `Schema` init (#22589)
- Correct name in `unnest` error message (#22740)
- Provide "schema" to `DataFrame`, even if empty JSON (#22739)
- Properly account for nulls in the `is_not_nan` check made in `drop_nans` (#22707)
- Incorrect result from SQL `count(*)` with `partition by` (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark `str.replace_many` with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix `index out of bounds` panic when scanning hugging face (#22661)
- Panic on `group_by` with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()` (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve `get()` SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add `new_streaming` feature to `polars` crate (#22601)
- Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make `replace` and `replace_strict` mapping use list literals (#22566)
- Allow pivot on `Time` column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
📖 Documentation
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add `match_to_schema` to API reference (#22777)
- Provide additional explanation and examples for the `value_counts` "normalize" parameter (#22756)
- Rework documentation for `drop`/`fill` for nulls/nans (#22657)
- Add documentation to new `RoundMode` parameter in `round` (#22555)
- Add missing `repeat_by` to API reference, fixup `list.get` (#22698)
- Fix non-rendering bullet points in `scan_iceberg` (#22694)
- Improve `insert_column` docstring (description and examples) (#22551)
- Improve `join` documentation (#22556)
🛠️ Other improvements
- Update cloud docs (#22624)
- Fix unstable `list.eval` performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch `.write_*` to `.lazy().sink_*(engine='in-memory')` (#22582)
- Move all optimization flags to `QueryOptFlags` (#22680)
- Add test for `str.replace_many` (#22615)
- Stabilize `sink_*` (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add `proptest` testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling `.collect(_eager=True)` (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind `in-memory-map` (#22552)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Julian-J-S, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.47.1
🏆 Highlights
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
💥 Breaking changes
- Make bottom interval closed in `hist` (#22090)
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Use optimize rolling_quantile with varying window sizes (#22353)
- Dedicated `rolling_skew` kernel (#22333)
- Call large munmap's in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on `parallel=prefiltered` by default for new streaming (#22190)
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
- Use views for binary hash tables and add single-key binary variant (#21872)
- Avoid rechunking in gather (#21876)
- Switch ahash for foldhash (#21852)
- Put THP behind feature flag (#21853)
- Enable THP by default (#21829)
- Improve join performance for expanding joins (#21821)
- Use binary_search instead of contains in business-day functions (#21775)
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathologic `rolling + group-by` performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Speed up from_pandas when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve `list.min` and `list.max` performance for logical types (#20972)
- Ensure count query select minimal columns (#20923)
✨ Enhancements
- Support grouping by `pl.Array` (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
- Highlight nodes in streaming phys plan graph (#22535)
- Support BinaryOffset serde (#22528)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with `IN` and `NOT IN` expressions (#22487)
- Add more IRBuilder utils (#22482)
- Support `DataFrame` and `Series` init from torch `Tensor` objects (#22177)
- Add `RoundMode` for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
- Make streaming dispatch public (#22347)
- Add `rolling_kurtosis` (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add `.sort(nulls_last=True)` to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support `implode + agg` (#22230)
- Dispatch scans to new-streaming by default (#22153)
- Improved expression autocomplete for `IPython`, `Jupyter`, and `Marimo` (#22221)
- Expose `FunctionIR::FastCount` in the python visitor (#22195)
- Add `SPLIT_PART` string function to the SQL interface (#22158)
- Allow scalar expr in `Expr.diff` (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add `STRING_TO_ARRAY` function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add support for `Int128` parsing/recognition to the SQL interface (#22104)
- Allow sinking to abstract python `io` and `fs` classes (#21987)
- Add `add_alp_optimize_exprs` to `IRBuilder` (#22061)
- Add `cat.slice` (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe `GilOnceCell` with `Mutex` (#21927)
- Support modified dsl in file cache (#21907)
- Add support for io-plugins in new-streaming (#21870)
- Add `PartitionParted` (#21788)
- Add DoubleEndedIterator for CatIter (#21816)
- Minor improvements to EXPLAIN plan output (#21822)
- Add `polars_testing` folder with relevant files and `add_series_equal!()` functionality (#21722)
- Allow to use `repeat_by` with (nested) lists and structs (#21206)
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add `mkdir` flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new `PartitionMaxSize` sink (#21573)
- Implement `unpack_dtypes()` functionality with unit tests (#21574)
- Support engine callback for `LazyFrame.profile` (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Add lossy decoding to `read_csv` for non-utf8 encodings (#21433)
- Add 'nulls_equal' parameter to `is_in` (#21426)
- Improve numeric stability `rolling_{std, var, cov, corr}` (#21528)
- IR Serde cross-filter (#21488)
- Support writing `Time` type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add `AssertionError` variant to `PolarsError` in `polars-error` (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
- Implement i128 -> str cast (#21411)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Pub-licize Expr DSL Function enums (#20421)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated `remove` method for `DataFrame` and `LazyFrame` (#21259)
- Expose `include_file_paths` to python visitor (#21279)
- Implement `merge_sorted` for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the `DELETE` statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable new streaming multiscan for CSV (#21124)
- Environment `POLARS_MAX_CONCURRENT_SCANS` in multiscan for new streaming (#21127)
- Multi/Hive scans in new streaming engine (#21011)
- Add `linear_spaces` (#20941)
- Implement `merge_sorted` for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog `schema` to `namespace` (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Improved support for KeyboardInterrupts (#20961...
Python Polars 1.29.0
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
✨ Enhancements
- Highlight nodes in streaming phys plan graph (#22535)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with `IN` and `NOT IN` expressions (#22487)
- Support `DataFrame` and `Series` init from torch `Tensor` objects (#22177) (see the example after this list)
- Add `RoundMode` for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
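A quick sketch of the torch interop noted above, assuming a 1-D CPU tensor goes straight through the normal `Series`/`DataFrame` constructors:

```python
import polars as pl
import torch

t = torch.tensor([1, 2, 3, 4])

s = pl.Series("t", t)                     # Series from a 1-D tensor
df = pl.DataFrame({"a": t, "b": t * 10})  # DataFrame from tensor columns
print(s, df)
```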
🐞 Bug fixes
- Streaming outer join coalesce bug (#22530)
- Remove redundant print statement in `assert_frame_schema_equal()` (#22529)
- Bug in `.unique()` followed by `.slice()` (#22471)
- Fix error reading parquet with datetimes written by pandas (#22524)
- Fix `schema_overrides` not taking effect in NDJSON (#22521)
- Fold flags and verify scalar correctness in apply (#22519)
- Invalid values were triggering panics instead of returning `null` in `dt.to_date`/`dt.to_datetime` (#22500)
- Ensure numpy `isinstance` check is lazy (avoid forcing the dependency) (#22486)
- Incorrectly dropped sort after unique for some queries (#22489)
- Fix incorrect ternary agg state with mixed columns and scalars (#22496)
- Make `replace` and `replace_strict` properly elementwise (#22465)
- Fix index out of bounds panic on parquet prefiltering (#22458)
- Integer underflow when checking parquet UTF-8 (#22472)
- Add implementation for `array.get` with idx overflow (#22449)
- Deprecate `str.` collection functions with flat strings and mark as elementwise (#22461)
- Deprecate flat `list.gather` and mark as elementwise (#22456)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
📖 Documentation
- Fix typo in structs page (#22504)
🛠️ Other improvements
- Don't store name/dtype in grouper (#22525)
- Add structure for dispatching iceberg to native scans (#22405)
- Remove unused reduction code (#22462)
- Pin to explicit macOS version in code coverage (#22432)
Thank you to all our contributors for making this release possible!
@AH-Merii, @JakubValtar, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @alexander-beedie, @brianmakesthings, @coastalwhite, @nameexhaustion, @orlp and @ritchie46
Python Polars 1.28.1
🐞 Bug fixes
- Reading of reencoded categorical in Parquet (#22436)
- Last thread in parquet predicate filter oob (#22429)
📦 Build system
- Update `pyo3` and `numpy` crates to version `0.24` (#22015)
🛠️ Other improvements
- Add test for `implode` + `over` (#22437)
- Fix CI by removing use_legacy_dataset (#22438)
- Only use pytorch index-url for `pytorch` package (#22355)
Thank you to all our contributors for making this release possible!
@bschoenmaeckers, @coastalwhite, @etiennebacher, @mcrumiller and @ritchie46
Python Polars 1.28.0
🚀 Performance improvements
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Use optimize rolling_quantile with varying window sizes (#22353)
- Dedicated `rolling_skew` kernel (#22333)
- Call large munmap's in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on `parallel=prefiltered` by default for new streaming (#22190)
✨ Enhancements
- When reporting unexpected types in errors, module-qualify the typename (#22390)
- Add Series `backward_fill`/`forward_fill` (#22360) (see the example after this list)
- Add GPU support to sink_* APIs (#20940)
- Changed mapping type from `dict` to `Mapping` (#19400) (#19436)
- Make streaming dispatch public (#22347)
- Add `rolling_kurtosis` (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add `.sort(nulls_last=True)` to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support running Polars SQL queries against any objects implementing the PyCapsule interface (#22235)
- Support `implode + agg` (#22230)
- Dispatch scans to new-streaming by default (#22153)
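A short sketch of the new `Series.forward_fill`/`Series.backward_fill` mentioned above (previously expression-only); this assumes they mirror the expression variants, including the optional `limit` argument:

```python
import polars as pl

s = pl.Series("a", [None, 1, None, None, 4, None])

print(s.forward_fill())         # expected: [null, 1, 1, 1, 4, 4]
print(s.backward_fill())        # expected: [1, 1, 4, 4, 4, null]
print(s.forward_fill(limit=1))  # fill at most one null after each value
```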
🐞 Bug fixes
- Ensure `write_excel` correctly preserves null values in nested dtype data on export (#22379)
- Panic when visualizing streaming physical plan with joins (#22404)
- Fix incorrect filter after `LazyFrame.rename().select()` (#22380)
- Fix `select(len())` performance regression (#22363)
- Handle pytz named timezone in `lit` (#21785)
- Don't leak state during prefill CSE cache (#22341)
- Maintain float32 type in partitioned group-by (#22340)
- Resolve streaming panic on multiple `merge_sorted` (#22205)
- Fix ndjson nested types (#22325)
- Fix nested datetypes in ndjson (#22321)
- Check matching lengths for `pl.corr` (#22305)
- Move type coercion for `pl.duration` to planner (#22304)
- Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
- Coalesce correct column for new streaming full join (#22301)
- Don't collect `NaN` from Parquet Statistics (#22294)
- Set revmap for empty `AnyValue` to `Series` (#22293)
- Add an `__all__` entry to internal type definition module (#22254)
- Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
- More robust `str → date` conversion when reading from spreadsheet (#22276)
- Deprecate using `is_in` with 2 equal types and mark as elementwise (#22178)
- Duplicate key column name in streaming group_by due to CSE (#22280)
- Raise `ColumnNotFoundError` for missing columns in `join_where` (#22268)
- Parquet filters for logical types and operations (#22253)
- Ensure floating-point accuracy in `hist` (#22245)
- Check matching key datatypes for new streaming joins (#22247)
- Incorrect length BinaryArray/ListBuilder (#22227)
📖 Documentation
- Update docs for schema arg in scan_csv to match read_csv (#22357)
- Update `pl.when` documentation (#22345)
- Add missing `is_business_day` to documentation reference (#22338)
- Improve interpolation documentation to clarify behavior of null values (#22274)
🛠️ Other improvements
- Install pytorch for 3.13 on Windows (#22356)
- Make interpolate fix more robust (#22421)
- Fix interpolate test (#22417)
- Reduce hot table size in debug mode (#22400)
- Replace intrinsic with non-intrinsic (#22401)
- Make streaming dispatch public (#22347)
- Update rustc to 'nightly-2025-04-19' (#22342)
- Update mozilla-actions/sccache-action (#22319)
- Purge old parquet and scan code (#22226)
- Add an `__all__` entry to internal type definition module (#22254)
- Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
- Add Polars Cloud 0.0.7 release notes (#22223)
- Change format name from list to implode (#22240)
- Make other parallel parquet modes filter afterwards (#22228)
- Close async reader issues (#22224)
- Add BinaryArrayBuilder (#22225)
Thank you to all our contributors for making this release possible!
@DavideCanton, @JakubValtar, @Jesse-Bakker, @MarcoGorelli, @NeejWeej, @Shoeboxam, @adamreeve, @alexander-beedie, @axellpadilla, @cmdlineluser, @coastalwhite, @d-reynol, @dongchao-1, @florian-klein, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @yiteng-guo