Releases: pola-rs/polars
Python Polars 1.2.1
🚀 Performance improvements
✨ Enhancements
- Raise error instead of panic in unsupported serde (#17679)
- Expose Arrow C interface directly on Polars (#17696)
- Include file path option for NDJSON (#17681)
🐞 Bug fixes
- Use bytemuck in slice reinterpret for Parquet ArrayChunks (#17700)
- Remove non-existing names from __all__ (#17494)
- Fix return type hint for LazyFrame sink methods (#17698)
- Propagate struct outer nullability eagerly (#17697)
- Address read_database issue with batched reads from Snowflake (#17688)
- Use ETag for HTTP file cache invalidation (#17684)
📖 Documentation
- Fixed default name for value_counts methods based on normalize parameter (#17685)
📦 Build system
- Pin setuptools to fix failing CI (#17695)
🛠️ Other improvements
- Fix return type hint for LazyFrame sink methods (#17698)
- Pin setuptools to fix failing CI (#17695)
- Name tests so they actually run (#17690)
- Add reduce ComputeNode in new streaming engine (#17389)
Thank you to all our contributors for making this release possible!
@5j9, @ByteNybbler, @MarcoGorelli, @alexander-beedie, @coastalwhite, @diegoglozano, @eitsupi, @nameexhaustion, @orlp, @ragyabraham, @ritchie46 and @ruihe774
Python Polars 1.2.0
🚀 Performance improvements
- Fix pathological perf issue in window-order-by (#17650)
- Cache path resolving of scan functions (#17616)
- Add ArrayChunks to optimize codegen of BatchDecoder (#17632)
- Rechunk before we go into grouped gathers (#17623)
- Cache schema resolve back to DSL (#17610)
- Add fastpath for when rounding by single constant durations (#17580)
- Improve parallelism in writing hive parquet (#17512)
- Support datetime in predicate during hive partition pruning (#17545)
- Batch nested embed parquet decoding (#17549)
- Batch nested Parquet decoding (#17542)
- Collect Parquet dictionary binary as view (#17475)
✨ Enhancements
- Hugging Face path expansion (#17665)
- Add DSL validation for cloud eligible check (#17287)
- Raise informative error message if non-IntoExpr is passed by name in *Frame.group_by (#17654)
- Add infer_schema parameter to read_csv/scan_csv (#17617)
- Change API for writing partitioned Parquet to reduce code duplication (#17586)
- Cache schema resolve back to DSL (#17610)
- Expose returns_scalar to map_elements (#17613)
- Add option to include file path for Parquet, IPC, CSV scans (#17563)
- Support describe on decimal (#15092)
- Support datetime in predicate during hive partition pruning (#17545)
- Raise more informative error message for directories containing files with mixed extensions (#17480)
- Exclude empty files from directory/glob expansion (#17478)
- Support use of SQLAlchemy "Connectable" in write_database (#17470)
🐞 Bug fixes
- Support duplicate expression names when calling ufuncs (#17641)
- Interpret %y consistently with Chrono in to_date/to_datetime/strptime (#17661)
- Fix explode invalid check (#17651)
- Raise for overlapping index/column names in pandas dataframes post string coercion (#17628)
- Expand brackets in async glob expansion (#17630)
- Fix row index disappearing after projection pushdown in NDJSON (#17631)
- Fix struct -> enum is_in (#17622)
- Don't needlessly unwrap in pivot_schema (#17611)
- Reject literal input in sort_by_exprs() (#17606)
- Don't enforce row order in join test results where not guaranteed (#17596)
- Bitmap collect into safety (#17588)
- Make schema picklable (#17524)
- Handle current position of file objects (#17543)
- Set O_CLOEXEC on duplicated file descriptor (#17537)
- Method dt.truncate was sometimes returning incorrect results for pre-1970 datetimes (#17582)
- Defer path expansion until collect in file scan methods (#17532)
- Fix retries parameter in scan functions not taking effect when it was set to 0 (#17564)
- Don't unwrap send attempt to oneshot channel (#17566)
- Fix scanning from HTTP cloud paths (#17571)
- Properly implement struct (#17522)
- Add right to lazyframe join docstring (#17529)
- Fix predicate pushdown for .list.(get|gather) (#17511)
- Make sure scan_ipc does not go through fsspec (#17495)
- Turn panic into error when serializing Object types (#17353)
- Fix struct expansion and raise on exclude (#17489)
- Normalize path in sink_csv (#17476)
📖 Documentation
- Update plot docs to refer to docstrings (#17504)
- Rename str.lengths to str.len_bytes in description text (#11577) (#17626)
- Create example for polars.Expr.bin.decode (#17508)
- Add right join in the user guide (#17608)
- Adjust rendering of links in read_database_uri docstring (#17536)
- Update SQL examples in README (#17568)
- Fixup "deprecated" directive for DataFrame.melt and LazyFrame.melt (#17530)
- Add write_parquet_partitioned (#17488)
- Add example for writing hive partitioned parquet to user guide (#17483)
- Fix typo in Getting Started section of user guide (#17465)
🛠️ Other improvements
- Add DSL validation for cloud eligible check (#17287)
- Add ArrayChunks to optimize codegen of BatchDecoder (#17632)
- Move path logic from utils to path_utils in polars-io (#17635)
- Fix struct gather (#17621)
- Back to StructChunked name (#17609)
- Remove unused with_column method of PyLazyFrame (#17607)
- Re-enable struct related tests (#17597)
- Completely redo structure of Parquet decoder (#17589)
- Fix struct outer validity;fmt;is_in;cast;cmp (#17590)
- Add/fix version-gating in some SQLAlchemy and Pandas tests (#17538)
- Add style accessor to DataFrame (#17502)
- Remove unused is_supported_cloud util (#17493)
Thank you to all our contributors for making this release possible!
@Julian-J-S, @MarcoGorelli, @alexander-beedie, @anergictcell, @arnabanimesh, @brandon-b-miller, @cmdlineluser, @coastalwhite, @deanm0000, @eitsupi, @flisky, @henryharbeck, @itamarst, @jonaylor89, @moritzwilksch, @nameexhaustion, @orlp, @phi-friday, @r-brink, @rcorty, @ritchie46, @ruihe774, @stinodego, @tylerriccio33 and @wence-
Python Polars 1.1.0
🚀 Performance improvements
- Keep more parallelism when CSE plan cache hits (#17463)
- Batch parquet primitive decoding (#17462)
- Respect allow_threading in some more operators (#17450)
- Parallelize parquet metadata deserialization (#17399)
- Use underlying fileno for Python files when possible (#17315)
- Add future arg to Series.to_arrow (#17371)
✨ Enhancements
- Add "future" versioning (#17421)
- Apply slice pushdown immediately to in-memory frames (#17459)
- Support writing hive partitioned parquet (#17324)
- Add right join support (#17441)
- Support hive partitioning in scan_ipc (#17434)
- Improve error message when passing string key to Series.__getitem__ (#17408)
🐞 Bug fixes
- Handle DB cursor descriptions that contain more fields than the DBAPI2 standard (#17468)
- Fix decimal dyn float supertype (#17464)
- Verify the integrity of pandas column names before implied string conversion (#17433)
- Don't rechunk on phys_repr (#17461)
- Harden alchemy session for old sqlalchemy versions (#17366)
- Fix swapping rename schema (#17458)
- Make boolean reads consistent across all read_excel engines (#17448)
- Raise on oob decimal precision (#17445)
- Fix handling of TextIOWrapper in write_csv (#17328)
- Support sa session (#17435)
- Fix from_pandas for string columns with missing values (#17397)
- Fix a global variable table-discovery edge case for the SQL interface (#17400)
- Don't allow json inference method to be chunked/streaming (#17396)
- Set literal nesting to 0 (#17392)
- Fix scanning cloud paths with spaces (#17379)
- Fix slice length no longer allowing None (#17372)
- Fix typo in SchemaError exception message (#17350)
- Raise proper error for mismatching parquet schema instead of panicking (#17321)
📖 Documentation
- Add examples for scanning hive datasets to user guide (#17431)
- Update partition_by docstring to match new behavior (#17394)
- Update GroupBy.__iter__ docstring to match new behavior (#17383)
📦 Build system
- Add support for NumPy 2.0 (#17384)
🛠️ Other improvements
- Add automated check for PR title formatting (#17412)
- Remove transmute for object store path (#17395)
- Fix Python version resolver in release drafter (#17390)
- Avoid use of np.trapz in tests to prepare for NumPy 2.0 (#17387)
- Avoid writing to disk when running sink_csv test (#17386)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @brunobbaraujo, @cmdlineluser, @coastalwhite, @dependabot, @dependabot[bot], @nameexhaustion, @orlp, @phi-friday, @ritchie46, @ruihe774, @sherlockbeard, @stinodego, @tylerriccio33 and @wence-
Rust Polars 0.41.3
🚀 Performance improvements
- Improve unique performance by adding RangedUniqueKernel for primitive arrays (#17166)
- faster decode on Parquet HybridRLE (#17208)
✨ Enhancements
- Add SQL support for NATURAL joins and the COLUMNS function (#17295)
- Add str.extract_many expression (#17304)
- Support '%' in pathnames for async scan (#17271)
- Support SQL Struct/JSON field access operators (#17226)
- Exclude directories from glob expansion result (#17174)
- Support SQL ORDER BY ALL syntax (#17212)
- Support PostgreSQL ^@ ("starts with"), and ~~, ~~*, !~~, !~~* ("like", "ilike") string-matching operators (#17251)
- Support SQL SELECT * ILIKE wildcard syntax (#17169)
- Support SQL temporal functions STRFTIME and STRPTIME, and typed literal syntax (#17245)
- Support date/datetime for hive parts (#17256)
- Expose some more information in translated expression IR to python (#17209)
- Allow no-op round/ceil/floor on integer types (#17241)
- Support loading from datasets where the hive columns are also stored in the file (#17203)
- Implement serde for Null columns (#17218)
- Support Decimal types in write_csv/write_json (#14209)
- Improve SQL support for array indexing, increase test coverage (#16972)
- Support reading byte stream split encoded floats and doubles in parquet (#17099)
- Add float_scientific option to write_csv/sink_csv (#17111)
🐞 Bug fixes
- Raise proper error for mismatching parquet schema instead of panicking (#17321)
- Raise on invalid shape dataframe arithmetic (#17322)
- Fix panic in window case (#17320)
- Raise errors instead of panicking when sink_csv fails (#17313)
- Raise if join keys are passed to cross join (#17305)
- Don't null on oob in list.get for column index (#17276)
- Fix issue where sliced PyArrow record batches were not handled correctly (#17058)
- Don't oob on nulls in list.get (#17262)
- Fix list getter with nulls (#17261)
- Respect nulls_last parameter in aggregate sort_by (#17249)
- Fix literal slice in group by (#17242)
- Fix DataFrame.top_k not handling nulls correctly (#17239)
- Avoid using the regex dependency when the regex feature is not used (#17206)
- properly check the BMI2 uleb128 (#17191)
📖 Documentation
- Minor layout/terminology improvement for selector set ops (#17299)
- Fix polars-plan docs.rs build (#17266)
- Add SQL docs for the CAST and TRY_CAST functions (#17214)
🛠️ Other improvements
- Prefer ParquetError::oos to ParquetError::OutOfSpec (#17314)
- remove seqmacro and u8,u16 bitpack (#17290)
- Fix typo in join validation error message (#17296)
- Use typed iter in list.get (#17286)
- add ability to have pipeline blockers in new streaming engine (#17247)
- Support date/datetime for hive parts (#17256)
- Add elementwise select and with_columns to new streaming engine (#17185)
- chrono's ParseErrorKind is now public (#17201)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JamesCE2001, @MarcoGorelli, @SeanTater, @adamreeve, @alexander-beedie, @coastalwhite, @datapythonista, @flisky, @itamarst, @jqnatividad, @lukeshingles, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego and @wence-
Python Polars 1.0.0
This is the first major release for Python Polars. Please check out the upgrade guide for help navigating the breaking changes when upgrading to this version.
💥 Breaking changes
- Change default engine for read_excel to "calamine" (#17263)
- Implement binary serialization of LazyFrame/DataFrame/Expr and set it as the default format (#17223)
- Streamline optional dependency definitions in pyproject.toml (#17168)
- Update read/scan_parquet to disable Hive partitioning by default for file inputs (#17106)
- Split replace functionality into two separate methods (#16921)
- Default to writing binview data to IPC, mark compression argument as keyword-only (#17084)
- Remove re-export of type aliases (#17032)
- Rename ModuleUpgradeRequired and PolarsPanicError error, remove InvalidAssert error (#17033)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply strict parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Set infer_schema_length as keyword-only argument in str.json_decode (#16835)
- Update set_sorted to only accept a single column (#16800)
- Remove deprecated parameters in Series.cut/qcut and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default offset in group_by_dynamic from 'negative every' to 'zero' (#16658)
- Constrain access to globals from DataFrame.sql in favor of top-level pl.sql (#16598)
- Read 2D NumPy arrays as Array type instead of List (#16710)
- Update clip to no longer propagate nulls in the given bounds (#14413)
- Change str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
- Update resulting column names in pivot when pivoting by multiple values (#16439)
- Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
- Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
- Rename struct fields of rle output to len/value and update data type of len field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add check_names parameter to Series.equals and default to False (#16610)
⚠️ Deprecations
- Deprecate LazyFrame.fetch (#17278)
- Deprecate size parameter in parametric testing strategies in favor of min_size/max_size (#17128)
- Split replace functionality into two separate methods (#16921)
- Rename DataFrame.melt to unpivot and make parameters consistent with pivot (#17095)
- Remove re-export of exceptions at top-level (#17059)
- Deprecate dt.mean/dt.median in favor of mean/median (#16888)
- Deprecate LazyFrame.with_context in favor of horizontal concatenation (#16860)
- Rename parameter descending to reverse in top_k methods (#16817)
- Rename str.concat to str.join and update default delimiter (#16790)
- Deprecate arctan2d in favor of arctan2(...).degrees() (#16786)
🚀 Performance improvements
- Rechunk before group_by iteration (#17302)
- Improve unique performance by adding RangedUniqueKernel for primitive arrays (#17166)
- Improve unique performance by creating UniqueKernel and improve bool implementation (#17160)
- Default to writing binview data to IPC, mark compression argument as keyword-only (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- Garbage collect buffers in if-then-else view kernel (#16993)
- Desugar AND filter into multiple nodes (#16992)
- Optimize generic arg_sort of row-encoding (#16894)
- Improve rle_id iteration performance and set sorted flags (#16893)
- Optimize sort for String and Binary types (#16871)
- Use split_at in split (#16865)
- Use split_at instead of double slice in chunk splits. (#16856)
- Don't rechunk in align_ if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in arg_sort (#16808)
- Speed up dt.offset_by 2x for constant durations (#16728)
- Toggle coalesce in join if non-coalesced key isn't projected (#16677)
- Make dt.truncate 1.5x faster when every is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Add SQL support for NATURAL joins and the COLUMNS function (#17295)
- Add str.extract_many expression (#17304)
- Change default engine for read_excel to "calamine" (#17263)
- Deprecate LazyFrame.fetch (#17278)
- Support '%' in pathnames for async scan (#17271)
- Support SQL Struct/JSON field access operators (#17226)
- Exclude directories from glob expansion result (#17174)
- Support SQL ORDER BY ALL syntax (#17212)
- Support PostgreSQL ^@ ("starts with"), and ~~, ~~*, !~~, !~~* ("like", "ilike") string-matching operators (#17251)
- Support SQL SELECT * ILIKE wildcard syntax (#17169)
- Support SQL temporal functions STRFTIME and STRPTIME, and typed literal syntax (#17245)
- Support date/datetime for hive parts (#17256)
- Implement binary serialization of LazyFrame/DataFrame/Expr and set it as the default format (#17223)
- Allow no-op round/ceil/floor on integer types (#17241)
- Support loading from datasets where the hive columns are also stored in the file (#17203)
- Implement serde for Null columns (#17218)
- Support Decimal types in write_csv/write_json (#14209)
- Add optional "default" to get_column DataFrame method (#17176)
- Improve SQL support for array indexing, increase test coverage (#16972)
- Support reading byte stream split encoded floats and doubles in parquet (#17099)
- Add float_scientific option to write_csv/sink_csv (#17111)
- Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options (#17109)
- Update DataFrame.pivot to allow index=None when values is set (#17126)
- Update read/scan_parquet to disable Hive partitioning by default for file inputs (#17106)
- Improve ipython autocomplete for LazyFrame and DataFrame (#17091)
- Split replace functionality into two separate methods (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename DataFrame.melt to unpivot and make parameters consistent with pivot (#17095)
- Print row index in explain and show_graph (#17074)
- Support top-level pl.col autocompletion for iPython (#17080)
- Remove re-export of exceptions at top-level (#17059)
- Implement predicate and projection pushdown for read_ndjson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided base directory path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add strict parameter to DataFrame/LazyFrame.drop and fix behavior to default to True (#17044)
- Rename ModuleUpgradeRequired and PolarsPanicError error, remove InvalidAssert error (#17033)
- Add rechunk parameter to read_delta (#16991)
- allow experimental metadata use on release (#17005)
- Add simple version of json_normalize (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar AND filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply strict parameter in Series constructor (#16939)
- Add SQL support for INTERSECT and EXCEPT ops (#16960)
- Add PerformanceWarning to LazyFrame properties (#16964)
- Add collect_schema method to LazyFrame and DataFrame (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for lit (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- Add DataFrame.style namespace (#16809)
- Add Schema class (#16873)
- Normalize value_counts (#16917)
- Implement equality for more Array types (#16902)
- Set up some of the infrastructure for new streaming engine (#16900)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve read_csv SQL table reading function defaults (better handle dates) (#16866)
- Support SQL VALUES clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python Enum values in lit (#16858)
- Convert to given time zone in .str.to_datetime when values are offset-aware (#16742)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised EXTRACT and DATE_PART SQL part abbreviations (#167...
Rust Polars 0.41.2
🚀 Performance improvements
- create UniqueKernel and improve bool implementation (#17160)
- parallel linearize in new streaming engine (#17050)
✨ Enhancements
- Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options (#17109)
- POC metadata reading and writing (#17112)
- Use AsRef<Path> instead of PathBuf in sink_ methods (#17150)
- Update DataFrame.pivot to allow index=None when values is set (#17126)
🐞 Bug fixes
- Use explicit turbofish to help rustc (#17159)
- Raise on invalid set dtypes (#17157)
- Fix corrupted reads for hive parts from cloud and projection pushdown failure on hive parts (#17152)
- Set intersection supertype (#17154)
- Fix feature gates (#17141)
- Fix feature gate (#17134)
🛠️ Other improvements
- MinMaxKernel in primitive/binary parquet stats (#17158)
- Add a test for AnonymousScan options (projection and slice pushdown) (#17149)
- MinMaxKernel in prim parquet stats (#17153)
- Add missing spaces in cargo.toml (#17145)
- Update rustc 2024-06-23 (#17135)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @coastalwhite, @datapythonista, @eitsupi, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 1.0.0-rc.2
💥 Breaking changes
- Make hive_partitioning parameter default to None, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split replace functionality into two separate functions (#16921)
- Default to writing binview data to IPC (#17084)
- Remove re-export of type aliases (#17032)
- Add strict parameter to DataFrame/LazyFrame.drop and fix behavior to default to True (#17044)
- Rename ModuleUpgradeRequired and PolarsPanicError error, remove InvalidAssert error (#17033)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply strict parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Set infer_schema_length as keyword-only argument in str.json_decode (#16835)
- Update set_sorted to only accept a single column (#16800)
- Remove deprecated parameters in Series.cut/qcut and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default offset in group_by_dynamic from 'negative every' to 'zero' (#16658)
- Constrain access to globals from DataFrame.sql in favor of top-level pl.sql (#16598)
- Read 2D NumPy arrays as Array type instead of List (#16710)
- Update clip to no longer propagate nulls in the given bounds (#14413)
- Change str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" (#13597)
- Update resulting column names in pivot when pivoting by multiple values (#16439)
- Preserve nulls in ewm_mean, ewm_std, and ewm_var (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from pl.read_json and DataFrame.write_json (#16550)
- Update function signature of nth to allow positional input of indices, remove columns parameter (#16510)
- Rename struct fields of rle output to len/value and update data type of len field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add check_names parameter to Series.equals and default to False (#16610)
⚠️ Deprecations
- Deprecate size parameter in parametric testing strategies in favor of min_size/max_size (#17128)
- Split replace functionality into two separate functions (#16921)
- Rename DataFrame.melt to unpivot and make parameters consistent with pivot (#17095)
- Remove re-export of exceptions at top-level (#17059)
- Deprecate dt.mean/dt.median in favor of mean/median (#16888)
- Deprecate LazyFrame.with_context in favor of horizontal concatenation (#16860)
- Rename parameter descending to reverse in top_k methods (#16817)
- Rename str.concat to str.join and update default delimiter (#16790)
- Deprecate arctan2d in favor of arctan2(...).degrees() (#16786)
🚀 Performance improvements
- create UniqueKernel and improve bool implementation (#17160)
- parallel linearize in new streaming engine (#17050)
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar AND filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use split_at in split (#16865)
- Use split_at instead of double slice in chunk splits. (#16856)
- Don't rechunk in align_ if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in arg_sort (#16808)
- Speed up dt.offset_by 2x for constant durations (#16728)
- Toggle coalesce in join if non-coalesced key isn't projected (#16677)
- Make dt.truncate 1.5x faster when every is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- Support reading byte stream split encoded floats and doubles in parquet (#17099)
- Add float_scientific option to write_csv/sink_csv (#17111)
- Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options (#17109)
- Update DataFrame.pivot to allow index=None when values is set (#17126)
- Make hive_partitioning parameter default to None, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Improve ipython autocomplete for LazyFrame and DataFrame (#17091)
- Split replace functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename DataFrame.melt to unpivot and make parameters consistent with pivot (#17095)
- print row index in explain + dot (#17074)
- Support top-level pl.col autocompletion for iPython (#17080)
- Remove re-export of exceptions at top-level (#17059)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided base directory path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add strict parameter to DataFrame/LazyFrame.drop and fix behavior to default to True (#17044)
- Rename ModuleUpgradeRequired and PolarsPanicError error, remove InvalidAssert error (#17033)
- Add rechunk parameter to read_delta (#16991)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of json_normalize (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar AND filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply strict parameter in Series constructor (#16939)
- Add SQL support for INTERSECT and EXCEPT ops (#16960)
- Add PerformanceWarning to LazyFrame properties (#16964)
- Add collect_schema method to LazyFrame and DataFrame (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for lit (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add Schema class (#16873)
- Normalize value_counts (#16917)
- add eq/ne for more FixedSizeLists (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve read_csv SQL table reading function defaults (better handle dates) (#16866)
- Support SQL VALUES clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python Enum values in lit (#16858)
- convert to given time zone in .str.to_datetime when values are offset-aware (#16742)
- Update reshape to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all get/gather operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native selector XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised EXTRACT and DATE_PART SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update set_sorted to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update group_by iteration and partition_by to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of read_database_uri passthrough from read_database (#16783)
- Remove pyxlsb engine from read_database (#16784)
- Add check_order parameter to assert_series_equal (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in scan_csv (#16674)
- Streamline SQL INTERVAL handling and improve related error messages, update sqlparser-rs lib (#16744)
- Support use of ordinal values in SQL ORDER BY clause (#16745)
- Support executing polars SQL against pandas and pyarrow objects (#16746)
- Remove deprecated parameters in Series.cut/qcut and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update date_range to no longer produce datetime ranges (#16734)
- Mark min_periods as keyword-only for rolling methods (#16738)
- Remove deprecated top_k parameters nulls_last, maintain_order, and multithreaded (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for NULLS FIRST/LAST ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for INTERVAL strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `off...
Python Polars 1.0.0-rc.1
💥 Breaking changes
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Default to writing binview data to IPC (#17084)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Remove re-export of type aliases (#17032)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply `strict` parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
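One of the breaking changes above renames the struct fields of the `rle` output to `len`/`value`. As a rough illustration of what records with those two fields look like, here is a plain-Python sketch of run-length encoding. It is not Polars code, and the helper name `rle_records` is hypothetical.

```python
from itertools import groupby


def rle_records(values):
    """Run-length encode `values` into records keyed by `len` and `value`.

    Hypothetical helper for illustration only; it mirrors the field names
    mentioned in the release note, not the Polars implementation.
    """
    records = []
    for key, run in groupby(values):
        # Count how many consecutive items share this value.
        records.append({"len": sum(1 for _ in run), "value": key})
    return records


print(rle_records([1, 1, 2, 3, 3, 3]))
```

Code that read the old field names from the struct would need to switch to `len` and `value` after upgrading.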
⚠️ Deprecations
- Deprecate `size` parameter in parametric testing strategies in favor of `min_size`/`max_size` (#17128)
- Split `replace` functionality into two separate functions (#16921)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- Remove re-export of exceptions at top-level (#17059)
- Deprecate `dt.mean`/`dt.median` in favor of `mean`/`median` (#16888)
- Deprecate `LazyFrame.with_context` in favor of horizontal concatenation (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Deprecate `arctan2d` in favor of `arctan2(...).degrees()` (#16786)
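The `arctan2d` deprecation above replaces a single call with a two-step form: take the two-argument arctangent, then convert to degrees. The same composition can be sketched with the standard library's `math` module. The function name `arctan2_degrees` is made up for this example and is not part of Polars.

```python
import math


def arctan2_degrees(y, x):
    """Two-argument arctangent in degrees, composed from two steps.

    Hypothetical stand-in showing the shape of `arctan2(...).degrees()`.
    """
    radians = math.atan2(y, x)    # step 1: angle in radians
    return math.degrees(radians)  # step 2: convert to degrees


print(arctan2_degrees(1.0, 1.0))
```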
🚀 Performance improvements
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
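The "Desugar `AND` filter into multiple nodes" entry above describes splitting one conjunctive predicate into several separate filters. The notes do not say how the optimizer treats the resulting nodes, so the following is only a toy sketch of the general idea in plain Python. The tuple encoding `("and", left, right)` and both function names are assumptions made for this example.

```python
def split_conjunction(pred):
    """Flatten nested ("and", left, right) tuples into a list of leaf predicates."""
    if isinstance(pred, tuple) and pred and pred[0] == "and":
        _, left, right = pred
        return split_conjunction(left) + split_conjunction(right)
    return [pred]


def apply_filters(rows, pred):
    """Apply each leaf predicate as its own filtering step."""
    for leaf in split_conjunction(pred):
        rows = [row for row in rows if leaf(row)]
    return rows


rows = [{"a": 1, "b": 10}, {"a": 5, "b": 20}, {"a": 7, "b": 5}]
pred = (
    "and",
    lambda r: r["a"] > 2,
    ("and", lambda r: r["b"] > 8, lambda r: r["a"] < 7),
)
print(apply_filters(rows, pred))
```

Once the conjunction is a list of independent steps, each step can in principle be handled on its own, which is the property the sketch is meant to show.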
✨ Enhancements
- Update `DataFrame.pivot` to allow `index=None` when `values` is set (#17126)
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Improve ipython autocomplete for LazyFrame and DataFrame (#17091)
- Split `replace` functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- print row index in explain + dot (#17074)
- Support top-level `pl.col` autocompletion for iPython (#17080)
- Remove re-export of exceptions at top-level (#17059)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Add `rechunk` parameter to `read_delta` (#16991)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of `json_normalize` (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar `AND` filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply `strict` parameter in Series constructor (#16939)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Add `PerformanceWarning` to LazyFrame properties (#16964)
- Add `collect_schema` method to `LazyFrame` and `DataFrame` (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for `lit` (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add `Schema` class (#16873)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Support `SQL` "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update `set_sorted` to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Upda...
Rust Polars 0.41.0
💥 Breaking changes
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- Default to writing binview data to IPC (#17084)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Remove supertype definition of List and non-List types (#16918)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- move offset_by implementation from polars-plan to polars-time, rename feature from DateOffset to OffsetBy (#16796)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Deprecate `str.explode` in favor of `str.split("").explode()` (#16508)
- Deprecate `how="outer"` join type in favour of `how="full"` (left/right are *also* outer joins) (#16417)
- Change `DataFrame.is_empty()` to check `height == 0` instead of `width == 0` (#16351)
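The last entry above changes `DataFrame.is_empty()` from a `width == 0` check to a `height == 0` check. The two differ for a frame that has columns but no rows. The sketch below models that case with a tiny plain-Python class. `TinyFrame` and its method names are hypothetical and are not the Polars API.

```python
class TinyFrame:
    """Hypothetical stand-in for a data frame: a dict of equal-length columns."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns

    @property
    def width(self):
        # Number of columns.
        return len(self.columns)

    @property
    def height(self):
        # Number of rows (0 when there are no columns at all).
        return len(next(iter(self.columns.values()))) if self.columns else 0

    def is_empty_by_width(self):
        """Old-style check described in the note: no columns."""
        return self.width == 0

    def is_empty_by_height(self):
        """New-style check described in the note: no rows."""
        return self.height == 0


frame = TinyFrame({"a": [], "b": []})
print(frame.is_empty_by_width(), frame.is_empty_by_height())
```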
🚀 Performance improvements
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
- make truncate 4x faster in simple cases (#16615)
- Cache arena's (and conversion) in SQL context (#16566)
- Partial schema cache. (#16549)
- improved numeric fill_(forward/backward) (#16475)
- only rechunk once per aggregate (#16469)
- Fix pathological small chunk parquet writing (#16433)
✨ Enhancements
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- print row index in explain + dot (#17074)
- Support top-level `pl.col` autocompletion for iPython (#17080)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided directory/glob path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Desugar `AND` filter into multiple nodes (#16992)
- use min/max metadata on debug builds with `POLARS_METADATA_FLAGS=extensive` (#16963)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Allow setting file cache TTL on a per-file basis (#16891)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Support `SQL` "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Expose overflowing cast (#16805)
- Expose a few more expression nodes in the expression IR (#16781)
- Support array arithmetic for equally sized shapes (#16791)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- add `env` locked metadata functions (#16719)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- eliminate ProjectionExprs and handle CSE by stacking extra columns (#16682)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- add additional control to `write_parquet::statistics` parameter (#16575)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
- Support per-column `nulls_last` on sort operations (#16639)
- Add `split_at` method to arrow `Array` (#16620)
- Initial support for SQL `ARRAY` literals and the `UNNEST` table function (#16330)
- Don't allow `struct.with_fields` in grouping (#16629)
- Add SQL support for `TRY_CAST` function (#16589)
- add fuzzer for expressions (#16581)
- handle CSE dtypes in NodeTraverser.get_dtype (#16552)
- check if by column is sorted, rather than just checking sorted flag, in `group_by_dynamic`, `upsample`, and `rolling` (#16494)
- Add general metadata structure to `ChunkedArray` (#16399)
- Add `is_column_selection()` to expression meta, enhance `expand_selector` (#16479)
- NDarray/Tensor support (#16466)
- Allow designation of a custom name for the `value_counts` "count" column (#16434)
- Default rechunk=False for read_parquet (#16427)
- Add `field` expression as selector with a struct scope (#16402)
- Field expansion renaming (#16397)
- add cluster_with_columns plan optimization (#16274)
- Change `DataFrame.is_empty()` to check `height == 0` instead of `width == 0` (#16351)
- add Expr.interpolate_by (#16313)
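Several enhancements above extend the SQL interface, including `INTERSECT`/`EXCEPT` and ordinal values in `ORDER BY`. To show what those constructs mean, the sketch below runs them through Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in engine; the Polars SQL dialect may differ in details, and the table and column names are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER, name TEXT)")
con.execute("CREATE TABLE b (id INTEGER, name TEXT)")
con.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
con.executemany("INSERT INTO b VALUES (?, ?)", [(2, "y"), (3, "z"), (4, "w")])

# INTERSECT keeps rows present in both inputs; ORDER BY 1 sorts by the first output column.
in_both = con.execute(
    "SELECT id, name FROM a INTERSECT SELECT id, name FROM b ORDER BY 1"
).fetchall()

# EXCEPT keeps rows of the left input that are absent from the right input.
only_in_a = con.execute(
    "SELECT id, name FROM a EXCEPT SELECT id, name FROM b ORDER BY 1"
).fetchall()

# ORDER BY 2 refers to the second output column (name) by position.
by_name_desc = con.execute("SELECT id, name FROM a ORDER BY 2 DESC").fetchall()

print(in_both, only_in_a, by_name_desc)
con.close()
```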
🐞 Bug fixes
- Expand i128 primitive type match (#17076)
- Fix decompress_impl for csv with n_rows set (#17118)
- adds "polars-ops/timezones" dependency for "timezones" feature (#17115)
- Fix incorrect window std for chunked series (#17110)
- make `GetOutput::get_field` fallible (#17114)
- bubble error when no available bitrepr (#17116)
- Fix melt panic (#17088)
- Exclude index from expansion in rolling/group_by_dynamic (#17086)
- fix #17043 binary compare (#17052)
- Fix oob of join with literals and empty table (#17047)
- Don't silently accept multi-table FROM clauses (implicit JOIN syntax) (#17028)
- fix get categories on multiple row groups (#17041...
Python Polars 1.0.0-beta.1
💥 Breaking changes
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply `strict` parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
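The `clip` entry above says nulls in the given bounds are no longer propagated. One way to read that, which is an assumption here rather than something the note spells out, is that a missing bound is treated as "no bound" so the original value is kept. The plain-Python sketch below models that reading, with `None` standing in for null. The function names are hypothetical.

```python
def clip_value(value, lower=None, upper=None):
    """Clip one value; a `None` bound is ignored instead of nulling the result."""
    if value is None:
        return None  # a missing value stays missing
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value


def clip(values, lower_bounds, upper_bounds):
    """Element-wise clip with per-row bounds (assumed semantics, not Polars code)."""
    return [
        clip_value(v, lo, hi)
        for v, lo, hi in zip(values, lower_bounds, upper_bounds)
    ]


print(clip([1, 5, 9], [2, None, 2], [8, 8, None]))
```

Under this reading, the third value stays at 9 because its upper bound is missing, rather than becoming null.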
⚠️ Deprecations
- Deprecate `dt.mean`/`dt.median` in favor of `mean`/`median` (#16888)
- Deprecate `LazyFrame.with_context` in favor of horizontal concatenation (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Deprecate `arctan2d` in favor of `arctan2(...).degrees()` (#16786)
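The `top_k` entry above renames the `descending` parameter to `reverse`. As a sketch of what such a flag can mean, the plain-Python function below returns the k largest values by default and the k smallest when `reverse=True`. That interpretation of `reverse` is an assumption for illustration, and the function is built on the standard library's `heapq`, not on Polars.

```python
import heapq


def top_k(values, k, reverse=False):
    """Return the k largest values, or the k smallest when `reverse` is True.

    Hypothetical helper; the meaning of `reverse` here is assumed.
    """
    pick = heapq.nsmallest if reverse else heapq.nlargest
    return pick(k, values)


print(top_k([4, 1, 9, 7], 2))
print(top_k([4, 1, 9, 7], 2, reverse=True))
```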
🚀 Performance improvements
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
✨ Enhancements
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of `json_normalize` (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar `AND` filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply `strict` parameter in Series constructor (#16939)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Add `PerformanceWarning` to LazyFrame properties (#16964)
- Add `collect_schema` method to `LazyFrame` and `DataFrame` (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for `lit` (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add `Schema` class (#16873)
- Normalize `value_counts` (#16917)
- add `eq`/`ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get`/`gather` operations (#16841)
- Support `SQL` "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update `set_sorted` to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len`/`value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
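Two of the enhancements above concern temporal arithmetic: multiplication and division with a duration on the left-hand side, and raising on invalid combinations. Python's standard `datetime` module follows a similar pattern, so it can serve as a rough analogy. This is not Polars code, and the exact set of operations Polars accepts or rejects may differ.

```python
from datetime import date, timedelta

duration = timedelta(hours=3)

doubled = duration * 2                    # duration * number -> duration
halved = duration / 2                     # duration / number -> duration
ratio = duration / timedelta(minutes=30)  # duration / duration -> plain number


def try_add_dates(a, b):
    """Adding two dates has no sensible meaning, so the stdlib raises TypeError."""
    try:
        return a + b
    except TypeError as exc:
        return f"rejected: {type(exc).__name__}"


print(doubled, halved, ratio)
print(try_add_dates(date(2024, 1, 1), date(2024, 1, 2)))
```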
🐞 Bug fixes
- properly catch not found explode cols (#17020)
- Correctly convert data frames to NumPy for C index order (#17000)
- Raise on invalid arithmetic shapes (#16986)
- Don't pushdown predicates in cross join if they refer to both tables (#16983)
- Fix projection pushdown with literal joins (#16981)
- Fix edge case in DataFrame constructor data orientation inference (#16975)
- Raise on list of objects (#16959)
- Handle strictness for Decimal Series construction (#15309)
- Don't panic in object to anyvalue (#16957)
- properly set `FAST_EXPLODE_LIST` metadata (#16951)
- Raise informative error when writing object to file (#16954)
- Remove supertype definition of List and non-List types (#16918)
- Remove unwrap in `extend()` (#16890)
- Fix `should_rechunk` check (#16852)
- Ensure `read_excel` and `read_ods` return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when "infer_schema_length=0" for `read_excel` (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that splitted ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
-...