This release consists of 475 commits from 114 contributors. See credits at the end of this changelog for more information.
See the upgrade guide for information on how to upgrade from previous versions.
Breaking changes:
- Allow logical optimizer to be run without evaluating now() & refactor SimplifyInfo #19505 (adriangb)
- Make default ListingFilesCache table scoped #19616 (jizezhang)
- chore(deps): Update sqlparser to 0.60 #19672 (Standing-Man)
- Do not require mut in memory reservation methods #19759 (gabotechs)
- refactor: make PhysicalExprAdapterFactory::create fallible #20017 (niebayes)
- Add `ScalarValue::RunEndEncoded` variant #19895 (Jefffrey)
- minor: remove unused crypto functions & narrow public API #20045 (Jefffrey)
- Wrap immutable plan parts into Arc (make creating `ExecutionPlan`s less costly) #19893 (askalt)
- feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations #19930 (mkleen)
- Remove the statistics() api in execution plan #20319 (xudong963)
- Remove recursive const check in `simplify_const_expr` #20234 (AdamGS)
- Cache `PlanProperties`, add fast-path for `with_new_children` #19792 (askalt)
- [branch-53] feat: parse `JsonAccess` as a binary operator, add `Operator::Colon` #20717 (Samyak2)
Performance related:
- perf: optimize `HashTableLookupExpr::evaluate` #19602 (UBarney)
- perf: Improve performance of `split_part` #19570 (andygrove)
- Optimize `Nullstate` / accumulators #19625 (Dandandan)
- perf: optimize `NthValue` when `ignore_nulls` is true #19496 (mzabaluev)
- Optimize `concat/concat_ws` scalar path by pre-allocating memory #19547 (lyne7-sc)
- perf: optimize left function by eliminating double chars() iteration #19571 (viirya)
- perf: Optimize floor and ceil scalar performance #19752 (kumarUjjawal)
- perf: improve performance of `spark hex` function #19738 (lyne7-sc)
- perf: Optimize initcap scalar performance #19776 (kumarUjjawal)
- Row group limit pruning for row groups that entirely match predicates #18868 (xudong963)
- perf: Optimize trunc scalar performance #19788 (kumarUjjawal)
- perf: optimize `spark_hex` dictionary path by avoiding dictionary expansion #19832 (lyne7-sc)
- Add FilterExecBuilder to avoid recomputing properties multiple times #19854 (adriangb)
- perf: Optimize round scalar performance #19831 (kumarUjjawal)
- perf: Optimize signum scalar performance with fast path #19871 (kumarUjjawal)
- perf: Optimize scalar performance for cot #19888 (kumarUjjawal)
- perf: Optimize scalar fast path for iszero #19919 (kumarUjjawal)
- Misc hash / hash aggregation performance improvements #19910 (Dandandan)
- perf: Optimize scalar path for ascii function #19951 (kumarUjjawal)
- perf: Optimize factorial scalar path #19949 (kumarUjjawal)
- Speedup statistics_from_parquet_metadata #20004 (Dandandan)
- perf: improve performance of `array_remove`, `array_remove_n` and `array_remove_all` functions #19996 (lyne7-sc)
- perf: Optimize ArrowBytesViewMap with direct view access #19975 (Tushar7012)
- perf: Optimize repeat function for scalar and array fast #19976 (kumarUjjawal)
- perf: Push down join key filters for LEFT/RIGHT/ANTI joins #19918 (nuno-faria)
- perf: Optimize scalar path for chr function #20073 (kumarUjjawal)
- perf: improve performance of `array_repeat` function #20049 (lyne7-sc)
- perf: optimise right for byte access and StringView #20069 (theirix)
- Optimize `PhysicalExprSimplifier` #20111 (AdamGS)
- Improve performance of `CASE WHEN x THEN y ELSE NULL` expressions #20097 (pepijnve)
- perf: Optimize scalar fast path of to_hex function #20112 (kumarUjjawal)
- perf: Optimize scalar fast path & write() encoding for sha2 #20116 (kumarUjjawal)
- perf: improve performance of `array_union`/`array_intersect` with batched row conversion #20243 (lyne7-sc)
- perf: various optimizations to eliminate branch misprediction in hash_utils #20168 (notashes)
- perf: Optimize strpos() for ASCII-only inputs #20295 (neilconway)
- perf: Optimize compare_element_to_list #20323 (neilconway)
- perf: Optimize replace() fastpath by avoiding alloc #20344 (neilconway)
- perf: optimize `array_distinct` with batched row conversion #20364 (lyne7-sc)
- perf: Optimize scalar fast path of atan2 #20336 (kumarUjjawal)
- perf: Optimize concat()/concat_ws() UDFs #20317 (neilconway)
- perf: Optimize translate() UDF for scalar inputs #20305 (neilconway)
- perf: Optimize `array_has()` for scalar needle #20374 (neilconway)
- perf: Optimize lpad, rpad for ASCII strings #20278 (neilconway)
- perf: Optimize trim UDFs for single-character trims #20328 (neilconway)
- perf: Optimize scalar fast path for `regexp_like` and rejects g inside combined flags like ig #20354 (kumarUjjawal)
- perf: Use zero-copy slice instead of take kernel in sort merge join #20463 (andygrove)
- perf: Optimize `initcap()` #20352 (neilconway)
- perf: Fix quadratic behavior of `to_array_of_size` #20459 (neilconway)
- perf: Optimize `array_has_any()` with scalar arg #20385 (neilconway)
- perf: Use Hashbrown for array_distinct #20538 (neilconway)
- perf: Cache num_output_rows in sort merge join to avoid O(n) recount #20478 (andygrove)
- perf: Optimize heap handling in TopK operator #20556 (AdamGS)
- perf: Optimize `array_position` for scalar needle #20532 (neilconway)
- perf: Use Arrow vectorized eq kernel for IN list with column references #20528 (zhangxffff)
- perf: Optimize `array_agg()` using `GroupsAccumulator` #20504 (neilconway)
- perf: Optimize `array_to_string()`, support more types #20553 (neilconway)
- [branch-53] perf: sort replace free()->try_grow() pattern with try_resize() to reduce memory pool interactions #20733 (mbutrovich)
Implemented enhancements:
- feat: add list_files_cache table function for `datafusion-cli` #19388 (jizezhang)
- feat: implement metrics for AsyncFuncExec #19626 (feniljain)
- feat: split BatchPartitioner::try_new into hash and round-robin constructors #19668 (mohit7705)
- feat: add Time type support to date_trunc function #19640 (kumarUjjawal)
- feat: Allow log with non-integer base on decimals #19372 (Yuvraj-cyborg)
- feat(spark): implement array_repeat function #19702 (cht42)
- feat(spark): Implement collect_list/collect_set aggregate functions #19699 (cht42)
- feat: implement Spark size function for arrays and maps #19592 (CuteChuanChuan)
- feat: support Set Comparison Subquery #19109 (waynexia)
- feat(spark): implement array slice function #19811 (cht42)
- feat(spark): implement substring function #19805 (cht42)
- feat: Add support for 'isoyear' in date_part function #19821 (cht42)
- feat: support `SELECT DISTINCT id FROM t ORDER BY id LIMIT n` query use GroupedTopKAggregateStream #19653 (haohuaijin)
- feat(spark): add trunc, date_trunc and time_trunc functions #19829 (cht42)
- feat(spark): implement Spark `date_diff` function #19845 (cht42)
- feat(spark): implement add_months function #19711 (cht42)
- feat: support pushdown alias on dynamic filter with `ProjectionExec` #19404 (discord9)
- feat(spark): add `base64` and `unbase64` functions #19968 (cht42)
- feat: Show the number of matched Parquet pages in `DataSourceExec` #19977 (nuno-faria)
- feat(spark): Add `SessionStateBuilderSpark` to datafusion-spark #19865 (cht42)
- feat(spark): implement `from/to_utc_timestamp` functions #19880 (cht42)
- feat(spark): implement `StringView` for `SparkConcat` #19984 (aryan-212)
- feat(spark): add unix date and timestamp functions #19892 (cht42)
- feat: implement protobuf converter trait to allow control over serialization and deserialization processes #19437 (timsaucer)
- feat: optimise copying in `left` for Utf8 and LargeUtf8 #19980 (theirix)
- feat: support Spark-compatible abs math function part 2 - ANSI mode #18828 (hsiang-c)
- feat: add AggregateMode::PartialReduce for tree-reduce aggregation #20019 (njsmith)
- feat: add ExpressionPlacement enum for optimizer expression placement decisions #20065 (adriangb)
- feat: support f16 in coercion logic #18944 (Jefffrey)
- feat: unify left and right functions and benches #20114 (theirix)
- feat(spark): Adds negative spark function #20006 (SubhamSinghal)
- feat: support limited deletion #20137 (askalt)
- feat: Pushdown filters through `UnionExec` nodes #20145 (haohuaijin)
- feat: support Spark-compatible `string_to_map` function #20120 (unknowntpo)
- feat: Add `partition_stats()` for `EmptyExec` #20203 (jonathanc-n)
- feat: add ExtractLeafExpressions optimizer rule for get_field pushdown #20117 (adriangb)
- feat: Push limit into hash join #20228 (jonathanc-n)
- feat: Optimize hash util for `MapArray` #20179 (jonathanc-n)
- feat: Implement Spark `bitmap_bit_position` function #20275 (kazantsev-maksim)
- feat: support sqllogictest output coloring #20368 (theirix)
- feat: support Spark-compatible `json_tuple` function #20412 (CuteChuanChuan)
- feat: Implement Spark `bitmap_bucket_number` function #20288 (kazantsev-maksim)
- feat: support `arrays_zip` function #20440 (comphead)
- feat: Implement Spark `bin` function #20479 (kazantsev-maksim)
- feat: support extension planner for `TableScan` #20548 (linhr)
Fixed bugs:
- fix: Return Int for Date - Date instead of duration #19563 (kumarUjjawal)
- fix: DynamicFilterPhysicalExpr violates Hash/Eq contract #19659 (kumarUjjawal)
- fix: unnest struct field with an alias failed with internal error #19698 (kumarUjjawal)
- fix(accumulators): preserve state in evaluate() for window frame queries #19618 (GaneshPatil7517)
- fix: Don't treat quoted column names as placeholder variables in SQL #19339 (pmallex)
- fix: enhance CTE resolution with identifier normalization #19519 (kysshsy)
- feat: Add null-aware anti join support #19635 (viirya)
- fix: expose `ListFilesEntry` #19804 (lonless9)
- fix: trunc function with precision uses round instead of trunc semantics #19794 (kumarUjjawal)
- fix: calculate total seconds from interval fields for `extract(epoch)` #19807 (lemorage)
- fix: predicate cache stats calculation #19561 (feniljain)
- fix: preserve state in DistinctMedianAccumulator::evaluate() for window frame queries #19887 (kumarUjjawal)
- fix: null in array_agg with DISTINCT and IGNORE #19736 (davidlghellin)
- fix: union should return error instead of panic when input schema's len different #19922 (haohuaijin)
- fix: change token consumption to pick to test on EOF in parser #19927 (askalt)
- fix: maintain inner list nullability for `array_sort` #19948 (Jefffrey)
- fix: Make `generate_series` return an empty set with invalid ranges #19999 (nuno-faria)
- fix: return correct length array for scalar null input to `calculate_binary_math` #19861 (Jefffrey)
- fix: respect DataFrameWriteOptions::with_single_file_output for paths without extensions #19931 (kumarUjjawal)
- fix: correct weight handling in approx_percentile_cont_with_weight #19941 (sesteves)
- fix: The limit_pushdown physical optimization rule removes limits in some cases leading to incorrect results #20048 (masonh22)
- Add duplicate name error reproducer #20106 (gabotechs)
- fix: filter pushdown when merge filter #20110 (haohuaijin)
- fix: Make `serialize_to_file` test cross platform #20147 (nuno-faria)
- fix: regression of `dict_id` in physical plan proto #20063 (kumarUjjawal)
- fix: panic in ListingTableFactory when session is not SessionState #20139 (evangelisilva)
- fix: update comment on FilterPushdownPropagation #20040 (niebayes)
- fix: datatype_is_logically_equal for dictionaries #20153 (dd-annarose)
- fix: Avoid integer overflow in split_part() #20198 (neilconway)
- fix: Fix panic in regexp_like() #20200 (neilconway)
- fix: Handle NULL inputs correctly in find_in_set() #20209 (neilconway)
- fix: Ensure columns are casted to the correct names with Unions #20146 (nuno-faria)
- fix: Avoid assertion failure on divide-by-zero #20216 (neilconway)
- fix: Throw coercion error for `LIKE` operations for nested types. #20212 (jonathanc-n)
- fix: disable dynamic filter pushdown for non min/max aggregates #20279 (notashes)
- fix: Avoid integer overflow in substr() #20199 (neilconway)
- fix: Fix scalar broadcast for to_timestamp() #20224 (neilconway)
- fix: Add integer check for bitwise coercion #20241 (Acfboy)
- fix: percentile_cont interpolation causes NaN for f16 input #20208 (kumarUjjawal)
- fix: validate inter-file ordering in eq_properties() #20329 (adriangb)
- fix: update filter predicates for min/max aggregates only if bounds change #20380 (notashes)
- fix: Handle Utf8View and LargeUtf8 separators in concat_ws #20361 (neilconway)
- fix: HashJoin panic with dictionary-encoded columns in multi-key joins #20441 (Tim-53)
- fix: handle out of range errors in DATE_BIN instead of panicking #20221 (mishop-15)
- fix: prevent duplicate alias collision with user-provided __datafusion_extracted names #20432 (adriangb)
- fix: SortMergeJoin don't wait for all input before emitting #20482 (rluvaton)
- fix: `cardinality()` of an empty array should be zero #20533 (neilconway)
- fix: Unaccounted spill sort in row_hash #20314 (EmilyMatt)
- fix: IS NULL panic with invalid function without input arguments #20306 (Acfboy)
- fix: handle empty delimiter in split_part (closes #20503) #20542 (gferrate)
- fix(substrait): Correctly parse field references in subqueries #20439 (neilconway)
- fix: increase ROUND decimal precision to prevent overflow truncation #19926 (kumarUjjawal)
- fix: Fix `array_to_string` with columnar third arg #20536 (neilconway)
- fix: Fix and Refactor Spark `shuffle` function #20484 (erenavsarogullari)
Documentation updates:
- perfect hash join #19411 (UBarney)
- docs: Fix two small issues in introduction.md #19712 (AdamGS)
- docs: Refine Communication documentation to highlight Discord #19714 (alamb)
- chore(deps): bump maturin from 1.10.2 to 1.11.5 in /docs #19740 (dependabot[bot])
- chore: remove LZO Parquet compression #19726 (kumarUjjawal)
- Update 52.0.0 release version number and changelog #19767 (xudong963)
- Update the upgrading.md #19769 (xudong963)
- chore: update copyright notice year #19758 (Jefffrey)
- doc: Add an auto-generated dependency graph for internal crates #19280 (2010YOUY01)
- Docs: Fix some links in docs #19834 (alamb)
- Docs: add additional links to blog posts #19833 (alamb)
- Ensure null inputs to array setop functions return null output #19683 (Jefffrey)
- chore(deps): bump sphinx from 8.2.3 to 9.1.0 in /docs #19647 (dependabot[bot])
- Fix struct casts to align fields by name (prevent positional mis-casts) #19674 (kosiew)
- chore(deps): bump setuptools from 80.9.0 to 80.10.1 in /docs #19988 (dependabot[bot])
- minor: Fix doc about `write_batch_size` #19979 (nuno-faria)
- Fix broken links in the documentation #19964 (alamb)
- minor: Add favicon #20000 (nuno-faria)
- docs: Fix some broken / missing links in the DataFusion documentation #19958 (alamb)
- chore(deps): bump setuptools from 80.10.1 to 80.10.2 in /docs #20022 (dependabot[bot])
- docs: Automatically update DataFusion version in docs #20001 (nuno-faria)
- docs: update data_types.md to reflect current Arrow type mappings #20072 (karuppuchamysuresh)
- Runs-on for `linux-build-lib` and `linux-test` (2X faster CI) #20107 (blaginin)
- Disallow positional struct casting when field names don’t overlap #19955 (kosiew)
- docs: fix docstring formatting #20158 (Jefffrey)
- Break upgrade guides into separate pages #20183 (mishop-15)
- Better document the relationship between `FileFormat::projection` / `FileFormat::filter` and `FileScanConfig::Statistics` #20188 (alamb)
- Document the relationship between FileFormat::projection / FileFormat::filter and FileScanConfig::output_ordering #20196 (alamb)
- More documentation on `FileSource::table_schema` and `FileSource::projection` #20242 (alamb)
- chore(deps): bump setuptools from 80.10.2 to 82.0.0 in /docs #20255 (dependabot[bot])
- docs: fix typos and improve wording in README #20301 (iampratap7997-dot)
- Reduce ExtractLeafExpressions optimizer overhead with fast pre-scan #20341 (adriangb)
- chore(deps): bump maturin from 1.11.5 to 1.12.2 in /docs #20400 (dependabot[bot])
- Migrate Python usage to uv workspace #20414 (adriangb)
- test: Extend Spark Array functions: `array_repeat`, `shuffle` and `slice` test coverage #20420 (erenavsarogullari)
- Runs-on for more actions #20274 (blaginin)
- docs: Document that adding new optimizer rules are expensive #20348 (alamb)
- add redirect for old upgrading.html URL to fix broken changelog links #20582 (mishop-15)
- Upgrade DataFusion to arrow-rs/parquet 58.0.0 / `object_store` 0.13.0 #19728 (alamb)
- Document guidance on how to evaluate breaking API changes #20584 (alamb)
- [branch-53] chore: prepare 53 release #20649 (comphead)
Other:
- [branch-53] chore: Add branch protection (comphead)
- Add a protection to release candidate branch 52 #19660 (xudong963)
- Downgrade aws-smithy-runtime, update `rust_decimal`, ignore RUSTSEC-2026-0001 to get clean CI #19657 (alamb)
- Update dependencies #19667 (alamb)
- Refactor PartitionedFile: add ordering field and new_from_meta constructor #19596 (adriangb)
- Remove coalesce batches rule and deprecate CoalesceBatchesExec #19622 (feniljain)
- Perf: Optimize `substring_index` via single-byte fast path and direct indexing #19590 (lyne7-sc)
- refactor: Use `Signature::coercible` for isnan/iszero #19604 (kumarUjjawal)
- Parquet: Push down supported list predicates (array_has/any/all) during decoding #19545 (kosiew)
- Remove dependency on `rust_decimal`, remove ignore of `RUSTSEC-2026-0001` #19666 (alamb)
- Store example data directly inside the datafusion-examples (#19141) #19319 (cj-zhukov)
- minor: More comments to `ParquetOpener::open()` #19677 (2010YOUY01)
- Feat: Allow pow with negative & non-integer exponent on decimals #19369 (Yuvraj-cyborg)
- chore(deps): bump taiki-e/install-action from 2.65.13 to 2.65.15 #19676 (dependabot[bot])
- Refactor cache APIs to support ordering information #19597 (adriangb)
- Record sort order when writing Parquet with WITH ORDER #19595 (adriangb)
- implement var distinct #19706 (thinh2)
- Fix TopK aggregation for UTF-8/Utf8View group keys and add safe fallback for unsupported string aggregates #19285 (kosiew)
- infer parquet file order from metadata and use it to optimize scans #19433 (adriangb)
- Add support for additional numeric types in to_timestamp functions #19663 (gokselk)
- Fix internal error "Physical input schema should be the same as the one converted from logical input schema." #18412 (alamb)
- fix(functions-aggregate): drain CORR state vectors for streaming aggregation #19669 (geoffreyclaude)
- chore: bump dependabot PR limit for cargo from 5 to 15 #19730 (Jefffrey)
- chore(deps): bump taiki-e/install-action from 2.65.15 to 2.66.1 #19741 (dependabot[bot])
- chore(deps): bump sqllogictest from 0.28.4 to 0.29.0 #19744 (dependabot[bot])
- chore(deps): bump blake3 from 1.8.2 to 1.8.3 #19746 (dependabot[bot])
- chore(deps): bump libc from 0.2.179 to 0.2.180 #19748 (dependabot[bot])
- chore(deps): bump async-compression from 0.4.36 to 0.4.37 #19742 (dependabot[bot])
- chore(deps): bump indexmap from 2.12.1 to 2.13.0 #19747 (dependabot[bot])
- Improve comment for predicate_cache_inner_records #19762 (xudong963)
- Fix dynamic filter is_used function #19734 (LiaCastaneda)
- slt: Add test for REE arrays in group by #19763 (brancz)
- Fix run_tpcds data dir #19771 (gabotechs)
- chore(deps): bump taiki-e/install-action from 2.66.1 to 2.66.2 #19778 (dependabot[bot])
- Include .proto files in datafusion-proto distribution #19490 (DarkWanderer)
- Simplify `expr = L1 AND expr != L2` to `expr = L1` when `L1 != L2` #19731 (simonvandel)
- chore(deps): bump flate2 from 1.1.5 to 1.1.8 #19780 (dependabot[bot])
- Upgrade DataFusion to arrow-rs/parquet 57.2.0 #19355 (alamb)
- Expose Spilling Progress Interface in DataFusion #19708 (xudong963)
- dev: Add a script to auto fix all lint violations #19560 (2010YOUY01)
- refactor: Optimize `required_columns` from `BTreeSet` to `Vec` in struct `PushdownChecker` #19678 (kumarUjjawal)
- Revert Workaround for Empty FixedSizeBinary Values Buffer After arrow-rs Upgrade #19801 (tobixdev)
- chore(deps): bump taiki-e/install-action from 2.66.2 to 2.66.3 #19802 (dependabot[bot])
- Add Reproducer for Issues with LEFT joins on Fixed Size Binary Columns #19800 (tobixdev)
- Improvements to `list_files_cache` table function #19703 (alamb)
- Issue 19781 : Internal error: Assertion failed: !self.finished: LimitedBatchCoalescer #19785 (bert-beyondloops)
- physical plan: add `reset_plan_states`, plan re-use benchmark #19806 (askalt)
- chore(deps): bump actions/setup-node from 6.1.0 to 6.2.0 #19825 (dependabot[bot])
- Use correct setting for click bench queries in sql_planner benchmark #19835 (alamb)
- chore(deps): bump taiki-e/install-action from 2.66.3 to 2.66.5 #19824 (dependabot[bot])
- chore: refactor scalarvalue/encoding using available upstream arrow-rs methods #19797 (Jefffrey)
- Refactor Spark `date_add`/`date_sub`/`bitwise_not` to remove unnecessary scalar arg check #19473 (Jefffrey)
- Add BatchAdapter to simplify using PhysicalExprAdapter / Projector to map RecordBatch between schemas #19716 (adriangb)
- [Minor] Reuse indices buffer in RepartitionExec #19775 (Dandandan)
- Fix(optimizer): Make `EnsureCooperative` optimizer idempotent under multiple runs #19757 (danielhumanmod)
- Allow dropping qualified columns #19549 (ntjohnson1)
- Doc: Add more blog links to doc comments #19837 (alamb)
- datafusion/common: Add support for hashing ListView arrays #19814 (brancz)
- Project sort expressions in StreamingTable #19719 (timsaucer)
- Fix grouping set subset satisfaction #19853 (freakyzoidberg)
- Spark date part #19823 (cht42)
- chore(deps): bump wasm-bindgen-test from 0.3.56 to 0.3.58 #19898 (dependabot[bot])
- chore(deps): bump tokio-postgres from 0.7.15 to 0.7.16 #19899 (dependabot[bot])
- chore(deps): bump postgres-types from 0.2.11 to 0.2.12 #19902 (dependabot[bot])
- chore(deps): bump insta from 1.46.0 to 1.46.1 #19901 (dependabot[bot])
- chore(deps): bump taiki-e/install-action from 2.66.5 to 2.66.7 #19883 (dependabot[bot])
- Consolidate cte_quoted_reference.slt into cte.slt #19862 (AnjaliChoudhary99)
- Disable failing `array_union` edge-case with nested null array #19904 (Jefffrey)
- chore(deps): bump the proto group across 1 directory with 5 updates #19745 (dependabot[bot])
- test(wasmtest): enable compression feature for wasm build #19860 (ChanTsune)
- Feat : added truncate table support #19633 (Nachiket-Roy)
- Remove UDAF manual Debug impls and simplify signatures #19727 (Jefffrey)
- chore(deps): bump thiserror from 2.0.17 to 2.0.18 #19900 (dependabot[bot])
- Include license and notice files in more crates #19913 (ankane)
- chore(deps): bump actions/setup-python from 6.1.0 to 6.2.0 #19935 (dependabot[bot])
- Coerce expressions to udtf #19915 (XiangpengHao)
- Fix trailing whitespace in CROSS JOIN logical plan formatting #19936 (mkleen)
- chore(deps): bump chrono from 0.4.42 to 0.4.43 #19897 (dependabot[bot])
- Improve error message when string functions receive Binary types #19819 (lemorage)
- Refactor ListArray hashing to consider only sliced values #19500 (Jefffrey)
- feat(datafusion-spark): implement spark compatible `unhex` function #19909 (lyne7-sc)
- Support API for "pre-image" for pruning predicate evaluation #19722 (sdf-jkl)
- Support LargeUtf8 as partition column #19942 (paleolimbot)
- chore(deps): bump actions/checkout from 6.0.1 to 6.0.2 #19953 (dependabot[bot])
- preserve FilterExec batch size during ser/de #19960 (askalt)
- Add struct pushdown query benchmark and projection pushdown tests #19962 (adriangb)
- Improve error messages with nicer formatting of Date and Time types #19954 (emilk)
- export `SessionState::register_catalog_list(...)` #19925 (askalt)
- Change GitHub actions dependabot schedule to weekly #19981 (Jefffrey)
- chore(deps): bump taiki-e/install-action from 2.66.7 to 2.67.9 #19987 (dependabot[bot])
- chore(deps): bump quote from 1.0.43 to 1.0.44 #19992 (dependabot[bot])
- chore(deps): bump nix from 0.30.1 to 0.31.1 #19991 (dependabot[bot])
- chore(deps): bump sysinfo from 0.37.2 to 0.38.0 #19990 (dependabot[bot])
- chore(deps): bump uuid from 1.19.0 to 1.20.0 #19993 (dependabot[bot])
- minor: pull `uuid` into workspace dependencies #19997 (Jefffrey)
- Fix ClickBench EventDate handling by casting UInt16 days-since-epoch to DATE via `hits` view #19881 (kosiew)
- refactor: extract pushdown test utilities to shared module #20010 (adriangb)
- chore(deps): bump taiki-e/install-action from 2.67.9 to 2.67.13 #20020 (dependabot[bot])
- add more projection pushdown slt tests #20015 (adriangb)
- minor: Move metric `page_index_rows_pruned` to verbose level in `EXPLAIN ANALYZE` #20026 (2010YOUY01)
- Tweak `adapter serialization` example #20035 (adriangb)
- Simplify wait_complete function #19937 (LiaCastaneda)
- [main] Update version to `52.1.0` (#19878) #20028 (alamb)
- Fix/parquet opener page index policy #19890 (aviralgarg05)
- minor: add tests for coercible signature considering nulls/dicts/ree #19459 (Jefffrey)
- Enforce `clippy::allow_attributes` globally across workspace #19576 (Jefffrey)
- Fix constant value from stats #20042 (gabotechs)
- Simplify Spark `sha2` implementation #19475 (Jefffrey)
- Further refactoring of type coercion function code #19603 (Jefffrey)
- replace private is_volatile_expression_tree with equivalent public is_volatile #20056 (adriangb)
- Improve documentation for ScalarUDFImpl::preimage #20008 (alamb)
- Use BooleanBufferBuilder rather than Vec in ArrowBytesViewMap #20064 (etk18)
- chore: Add microbenchmark (compared to ExprOrExpr) #20076 (CuteChuanChuan)
- Minor: update tests in limit_pushdown.rs to insta #20066 (alamb)
- Reduce number of traversals per node in `PhysicalExprSimplifier` #20082 (AdamGS)
- Automatically generate examples documentation adv (#19294) #19750 (cj-zhukov)
- Implement preimage for floor function to enable predicate pushdown #20059 (devanshu0987)
- Refactor `iszero()` and `isnan()` to accept all numeric types #20093 (kumarUjjawal)
- Use return_field_from_args in information schema and date_trunc #20079 (AndreaBozzo)
- Preserve PhysicalExpr graph in proto round trip using Arc pointers as unique identifiers #20037 (adriangb)
- add ability to customize tokens in parser #19978 (askalt)
- Adjust `case_when DivideByZeroProtection` benchmark so that "percentage of zeroes" corresponds to "number of times protection is needed" #20105 (pepijnve)
- refactor: Rename `FileSource::try_reverse_output` to `FileSource::try_pushdown_sort` #20043 (kumarUjjawal)
- Improve memory accounting for ArrowBytesViewMap #20077 (vigneshsiva11)
- chore: reduce production noise by using `debug` macro #19885 (Standing-Man)
- chore(deps): bump taiki-e/install-action from 2.67.13 to 2.67.18 #20124 (dependabot[bot])
- chore(deps): bump actions/setup-node from 4 to 6 #20125 (dependabot[bot])
- chore(deps): bump tonic from 0.14.2 to 0.14.3 #20127 (dependabot[bot])
- chore(deps): bump insta from 1.46.1 to 1.46.3 #20129 (dependabot[bot])
- chore(deps): bump flate2 from 1.1.8 to 1.1.9 #20130 (dependabot[bot])
- chore(deps): bump clap from 4.5.54 to 4.5.56 #20131 (dependabot[bot])
- Add BufferExec execution plan #19760 (gabotechs)
- Optimize the evaluation of date_part() == when pushed down #19733 (sdf-jkl)
- chore(deps): bump bytes from 1.11.0 to 1.11.1 #20141 (dependabot[bot])
- Make session state builder clonable #20136 (askalt)
- chore: remove datatype check functions in favour of upstream versions #20104 (Jefffrey)
- Add Decimal support for floor preimage #20099 (devanshu0987)
- Add more struct pushdown tests and planning benchmark #20143 (adriangb)
- Add RepartitionExec test to projection_pushdown.slt #20156 (adriangb)
- chore: Fix typos in comments #20157 (neilconway)
- Fix `array_repeat` handling of null count values #20102 (lyne7-sc)
- Refactor schema rewriter: remove lifetimes, extract column/cast helpers, add mismatch coverage #20166 (kosiew)
- chore(deps): bump time from 0.3.44 to 0.3.47 #20172 (dependabot[bot])
- chore(deps-dev): bump webpack from 5.94.0 to 5.105.0 in /datafusion/wasmtest/datafusion-wasm-app #20178 (dependabot[bot])
- Fix Arrow Spill Underrun #20159 (cetra3)
- nom parser instead of ad-hoc in examples #20122 (cj-zhukov)
- fix(datafusion-cli): solve row count bug adding `saturating_add` to prevent potential overflow #20185 (dariocurr)
- Enable inlist support for preimage #20051 (sdf-jkl)
- unify the prettier versions #20167 (cj-zhukov)
- chore: Unbreak doctest CI #20218 (neilconway)
- Minor: verify plan output and unique field names #20220 (alamb)
- Add more tests to projection_pushdown.slt #20236 (adriangb)
- Add Expr::Alias passthrough to Expr::placement() #20237 (adriangb)
- Make PushDownFilter and CommonSubexprEliminate aware of Expr::placement #20239 (adriangb)
- Refactor example metadata parsing utilities(#20204) #20233 (cj-zhukov)
- add module structure and unit tests for expression pushdown logical optimizer #20238 (adriangb)
- repro and disable dyn filter for preserve file partitions #20175 (gene-bordegaray)
- chore(deps): bump taiki-e/install-action from 2.67.18 to 2.67.27 #20254 (dependabot[bot])
- chore(deps): bump sysinfo from 0.38.0 to 0.38.1 #20261 (dependabot[bot])
- chore(deps): bump clap from 4.5.56 to 4.5.57 #20265 (dependabot[bot])
- chore(deps): bump tempfile from 3.24.0 to 3.25.0 #20262 (dependabot[bot])
- chore(deps): bump regex from 1.12.2 to 1.12.3 #20260 (dependabot[bot])
- chore(deps): bump criterion from 0.8.1 to 0.8.2 #20258 (dependabot[bot])
- chore(deps): bump regex-syntax from 0.8.8 to 0.8.9 #20264 (dependabot[bot])
- chore(deps): bump aws-config from 1.8.12 to 1.8.13 #20263 (dependabot[bot])
- chore(deps): bump async-compression from 0.4.37 to 0.4.39 #20259 (dependabot[bot])
- Support JSON arrays reader/parse for datafusion #19924 (zhuqi-lucas)
- chore: Add confirmation before tarball is released #20207 (milenkovicm)
- FilterExec should remap indices of parent dynamic filters #20286 (jackkleeman)
- Clean up expression placement UDF usage in tests #20272 (adriangb)
- chore(deps): bump the arrow-parquet group with 7 updates #20256 (dependabot[bot])
- Cleanup example metadata parsing utilities(#20251) #20252 (cj-zhukov)
- Add `StructArray` and `RunArray` benchmark tests to `with_hashes` #20182 (notashes)
- Add protoc support for ArrowScanExecNode (#20280) #20284 (JoshElkind)
- Improve ExternalSorter ResourcesExhausted Error Message #20226 (erenavsarogullari)
- Introduce ProjectionExprs::unproject_exprs/project_exprs and improve docs #20193 (alamb)
- chore: Remove "extern crate criterion" in benches #20299 (neilconway)
- Support pushing down empty projections into joins #20191 (jackkleeman)
- chore: change width_bucket buckets parameter from i32 to i64 #20330 (comphead)
- fix null handling for `nanvl` & implement fast path #20205 (kumarUjjawal)
- unify the prettier version adv(#20024) #20311 (cj-zhukov)
- chore: Make memchr a workspace dependency #20345 (neilconway)
- feat(datafusion-cli): enhance CLI helper with default hint #20310 (dariocurr)
- Adds support for ANSI mode in negative function #20189 (SubhamSinghal)
- Support parent dynamic filters for more join types #20192 (jackkleeman)
- Fix incorrect `SortExec` removal before `AggregateExec` (option 2) #20247 (alamb)
- Fix `try_shrink` not freeing back to pool #20382 (cetra3)
- chore(deps): bump sysinfo from 0.38.1 to 0.38.2 #20411 (dependabot[bot])
- chore(deps): bump indicatif from 0.18.3 to 0.18.4 #20410 (dependabot[bot])
- chore(deps): bump liblzma from 0.4.5 to 0.4.6 #20409 (dependabot[bot])
- chore(deps): bump aws-config from 1.8.13 to 1.8.14 #20407 (dependabot[bot])
- chore(deps): bump tonic from 0.14.3 to 0.14.4 #20406 (dependabot[bot])
- chore(deps): bump clap from 4.5.57 to 4.5.59 #20404 (dependabot[bot])
- chore(deps): bump sqllogictest from 0.29.0 to 0.29.1 #20405 (dependabot[bot])
- chore(deps): bump env_logger from 0.11.8 to 0.11.9 #20402 (dependabot[bot])
- chore(deps): bump actions/stale from 10.1.1 to 10.2.0 #20397 (dependabot[bot])
- chore(deps): bump uuid from 1.20.0 to 1.21.0 #20401 (dependabot[bot])
- [Minor] Update object_store to 0.12.5 #20378 (Dandandan)
- chore(deps): bump syn from 2.0.114 to 2.0.116 #20399 (dependabot[bot])
- chore(deps): bump taiki-e/install-action from 2.67.27 to 2.68.0 #20398 (dependabot[bot])
- chore: Cleanup returning null arrays #20423 (neilconway)
- chore: fix labeler for `datafusion-functions-nested` #20442 (comphead)
- build: update Rust toolchain version from 1.92.0 to 1.93.0 in `rust-toolchain.toml` #20309 (dariocurr)
- chore: Cleanup "!is_valid(i)" -> "is_null(i)" #20453 (neilconway)
- refactor: Extract sort-merge join filter logic into separate module #19614 (viirya)
- Implement FFI table provider factory #20326 (davisp)
- bench: Add criterion benchmark for sort merge join #20464 (andygrove)
- chore: group minor dependencies into single PR #20457 (comphead)
- chore(deps): bump taiki-e/install-action from 2.68.0 to 2.68.6 #20467 (dependabot[bot])
- chore(deps): bump astral-sh/setup-uv from 6.1.0 to 7.3.0 #20468 (dependabot[bot])
- chore(deps): bump the all-other-cargo-deps group with 6 updates #20470 (dependabot[bot])
- chore(deps): bump testcontainers-modules from 0.14.0 to 0.15.0 #20471 (dependabot[bot])
- [Minor] Use buffer_unordered #20462 (Dandandan)
- bench: Add IN list benchmarks for non-constant list expressions #20444 (zhangxffff)
- feat(memory-tracking): implement arrow_buffer::MemoryPool for MemoryPool #18928 (notfilippo)
- chore: Avoid build fails on MinIO rate limits #20472 (comphead)
- chore: Add end-to-end benchmark for array_agg, code cleanup #20496 (neilconway)
- Upgrade to sqlparser 0.61.0 #20177 (alamb)
- Switch to the latest Mac OS #20510 (blaginin)
- Fix name tracker #19856 (xanderbailey)
- Runs-on for extended CI checks #20511 (blaginin)
- chore(deps): bump strum from 0.27.2 to 0.28.0 #20520 (dependabot[bot])
- chore(deps): bump taiki-e/install-action from 2.68.6 to 2.68.8 #20518 (dependabot[bot])
- chore(deps): bump the all-other-cargo-deps group with 2 updates #20519 (dependabot[bot])
- Make `custom_file_casts` example schema nullable to allow null `id` values during casting #20486 (kosiew)
- Add support for FFI config extensions #19469 (timsaucer)
- chore: Cleanup code to use `repeat_n` in a few places #20527 (neilconway)
- chore(deps): bump strum_macros from 0.27.2 to 0.28.0 #20521 (dependabot[bot])
- chore: Replace `matches!` on fieldless enums with `==` #20525 (neilconway)
- Update comments on OptimizerRule about function name matching #20346 (alamb)
- Fix incorrect regex pattern in regex_replace_posix_groups #19827 (GaneshPatil7517)
- Improve `HashJoinExecBuilder` to save state from previous fields #20276 (askalt)
- [Minor] Fix error messages for `shrink` and `try_shrink` #20422 (hareshkh)
- Fix physical expr adapter to resolve physical fields by name, not column index #20485 (kosiew)
- [fix] Add type coercion from NULL to Interval to make date_bin more postgres compatible #20499 (LiaCastaneda)
- Clamp early aggregation emit to the sort boundary when using partial group ordering #20446 (jackkleeman)
- Split `push_down_filter.slt` into standalone sqllogictest files to reduce long-tail runtime #20566 (kosiew)
- Add deterministic per-file timing summary to sqllogictest runner #20569 (kosiew)
- chore: Enable workspace lint for all workspace members #20577 (neilconway)
- Fix serde of window lead/lag defaults #20608 (avantgardnerio)
- [branch-53] fix: make the `sql` feature truly optional (#20625) #20680 (linhr)
- [53] fix: Fix bug in `array_has` scalar path with sliced arrays (#20677) #20700 (neilconway)
- [branch-53] fix: Return `probe_side.len()` for RightMark/Anti count(*) queries (#… #20726 (jonathanc-n)
- [branch-53] FFI_TableOptions are using default values only #20722 (timsaucer)
- chore(deps): pin substrait to `0.62.2` #20827 (milenkovicm)
- chore(deps): pin substrait version #20848 (milenkovicm)
- [branch-53] Fix repartition from dropping data when spilling (#20672) #20792 (xanderbailey)
- [branch-53] fix: `HashJoin` panic with String dictionary keys (don't flatten keys) (#20505) #20791 (alamb)
- [branch-53] cli: Fix datafusion-cli hint edge cases (#20609) #20887 (comphead)
- [branch-53] perf: Optimize `to_char` to allocate less, fix NULL handling (#20635) #20885 (neilconway)
- [branch-53] fix: interval analysis error when have two filterexec that inner filter proves zero selectivity (#20743) #20882 (haohuaijin)
- [branch-53] correct parquet leaf index mapping when schema contains struct cols (#20698) #20884 (friendlymatthew)
- [branch-53] ser/de fetch in FilterExec (#20738) #20883 (haohuaijin)
- [branch-53] fix: use try_shrink instead of shrink in try_resize (#20424) #20890 (ariel-miculas)
- [branch-53] Reattach parquet metadata cache after deserializing in datafusion-proto (#20574) #20891 (nathanb9)
- [branch-53] fix: do not recompute hash join exec properties if not required (#20900) #20903 (askalt)
- [branch-53] fix(spark): handle divide-by-zero in Spark `mod`/`pmod` with ANSI mode support (#20461) #20896 (davidlghellin)
- [branch-53] fix: Provide more generic API for the capacity limit parsing (#20372) #20893 (erenavsarogullari)
- [branch-53] fix: sqllogictest cannot convert to Substrait (#19739) #20897 (kumarUjjawal)
- [branch-53] Fix DELETE/UPDATE filter extraction when predicates are pushed down into TableScan (#19884) #20898 (kosiew)
- [branch-53] fix: preserve None projection semantics across FFI boundary in ForeignTableProvider::scan (#20393) #20895 (Kontinuation)
- [branch-53] Fix FilterExec converting Absent column stats to Exact(NULL) (#20391) #20892 (fwojciec)
- [branch-53] backport: Support Spark `array_contains` builtin function (#20685) #20914 (comphead)
- [branch-53] Fix duplicate group keys after hash aggregation spill (#20724) (#20858) #20918 (gboucher90)
- [branch-53] fix: SanityCheckPlan error with window functions and NVL filter (#20231) #20932 (EeshanBembi)
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
73 dependabot[bot]
37 Neil Conway
32 Kumar Ujjawal
28 Andrew Lamb
26 Adrian Garcia Badaracco
21 Jeffrey Vo
13 cht42
11 Albert Skalt
11 kosiew
10 lyne
8 Nuno Faria
8 Oleks V
7 Sergey Zhukov
7 xudong.w
6 Daniël Heres
6 Huaijin
5 Adam Gutglick
5 Gabriel
5 Jonathan Chen
4 Andy Grove
4 Dmitrii Blaginin
4 Eren Avsarogullari
4 Jack Kleeman
4 notashes
4 theirix
4 Tim Saucer
4 Yongting You
3 dario curreri
3 feniljain
3 Kazantsev Maksim
3 Kosta Tarasov
3 Liang-Chi Hsieh
3 Lía Adriana
3 Marko Milenković
3 mishop-15
3 Yu-Chuan Hung
2 Acfboy
2 Alan Tang
2 David López
2 Devanshu
2 Frederic Branczyk
2 Ganesh Patil
2 Heran Lin
2 jizezhang
2 Miao
2 Michael Kleen
2 niebayes
2 Pepijn Van Eeckhoudt
2 Peter L
2 Subham Singhal
2 Tobias Schwarzinger
2 UBarney
2 Xander
2 Yuvraj
2 Zhang Xiaofeng
1 Andrea Bozzo
1 Andrew Kane
1 Anjali Choudhary
1 Anna-Rose Lescure
1 Ariel Miculas-Trif
1 Aryan Anand
1 Aviral Garg
1 Bert Vermeiren
1 Brent Gardner
1 ChanTsune
1 comphead
1 danielhumanmod
1 Dewey Dunnington
1 discord9
1 Divyansh Pratap Singh
1 Eesh Sagar Singh
1 EeshanBembi
1 Emil Ernerfeldt
1 Emily Matheys
1 Eric Chang
1 Evangeli Silva
1 Filip Wojciechowski
1 Filippo
1 Gabriel Ferraté
1 Gene Bordegaray
1 Geoffrey Claude
1 Goksel Kabadayi
1 Guillaume Boucher
1 Haresh Khanna
1 hsiang-c
1 iamthinh
1 Josh Elkind
1 karuppuchamysuresh
1 Kristin Cowalcijk
1 Mason
1 Matt Butrovich
1 Matthew Kim
1 Mikhail Zabaluev
1 Mohit rao
1 nathan
1 Nathaniel J. Smith
1 Nick
1 Oleg V. Kozlyuk
1 Paul J. Davis
1 Pierre Lacave
1 pmallex
1 Qi Zhu
1 Raz Luvaton
1 Rosai
1 Ruihang Xia
1 Samyak Sarnayak
1 Sergio Esteves
1 Simon Vandel Sillesen
1 Siyuan Huang
1 Tim-53
1 Tushar Das
1 Vignesh
1 Xiangpeng Hao
1 XL Liang
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.