Python Polars 1.27.0
💥 Breaking changes
- Make bottom interval closed in `hist` (#22090)
- Change Partition API to `base_path` and `file_path` (#21888)
🚀 Performance improvements
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
✨ Enhancements
- Add `SPLIT_PART` string function to the SQL interface (#22158)
- Allow scalar expr in `Expr.diff` (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add `STRING_TO_ARRAY` function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add an `eager` parameter to `pl.cov` (#22098)
- Add support for `Int128` parsing/recognition to the SQL interface (#22104)
- Add an `eager` parameter to `pl.coalesce` (#22092)
- Add an `eager` parameter to `pl.corr` (#22097)
- Allow sinking to abstract python `io` and `fs` classes (#21987)
- Add `add_alp_optimize_exprs` to `IRBuilder` (#22061)
- Add `cat.slice` (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe `GilOnceCell` with `Mutex` (#21927)
- Support modified dsl in file cache (#21907)
🐞 Bug fixes
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in `join_where` (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to `list.set_*` operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark `Expr.repeat_by` as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add `i128` as supertype to boolean (#22138)
- Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for `n=0` on `list.gather_every` (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length `str.to_integer` arguments (#22100)
- Make bottom interval closed in `hist` (#22090)
- Relative path resolution for plugin libraries (#21911)
- Avoid panic with strptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in `merge_sorted` (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic for unequal length `ewm_mean_by` args (#22093)
- Add scalarity checks to `pl.repeat` (#22088)
- Type check `n` parameter of `pl.repeat` (#22071)
- Mark `bitwise_{count,leading,trailing}_{ones,zeros}` as elementwise (#22044)
- Mark `pl.*_ranges` functions correctly as element-wise (#22059)
- Correctly type check `pl.arctan2` (#22060)
- Mark `pl.business_day_count` as elementwise (#22055)
- Check input python type for `str.extract_groups` (#22032)
- Check types for `fill_char` in `str.pad_{start,end}` (#22036)
- Mark `str.to_decimal` properly as non-elementwise (#22040)
- Documented return type for `bin.encode` and `bin.decode` (#22022)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in `dt.offset_by()` and `dt.round()` (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in `collect_with_callback` (#21941)
- Panic in new RegexCache (#21935)
- Type hint of `cs.exclude()` is `SelectorType` instead of `Expr` (#21892)
- Add correct deprecation warning for .str.concat (#21666)
- Use absolute paths by default for plugins (#21904)
📖 Documentation
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Change ctx to compute=ctx for all remote query examples (#21930)
🛠️ Other improvements
- Remove old `MultiScanExec` for in-memory (#22184)
- Separate `FunctionOptions` from DSL calls (#22133)
- Undeprecate `backward_fill` and `forward_fill` (#22156)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate `backward_fill` and `forward_fill` interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive `MaxBound` and `MinBound` fill strategies (#22063)
- Change Partition API to `base_path` and `file_path` (#21888)
- Fix pydantic model_fields deprecation (#21958)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence-, @zachlefevre and dependabot[bot]