
[GLUTEN-6887][VL] Daily Update Velox Version (2026_04_08)#11891

Open
GlutenPerfBot wants to merge 4 commits into apache:main from GlutenPerfBot:tagging-2026_04_08

GlutenPerfBot commented Apr 8, 2026

Upstream Velox's New Commits:

d7891436c by Masha Basmanova, fix: Skip custom type coercion for parameterized types (17064)
fbda33022 by Han Yan, feat(dwio): Add BufferPool for reusing cached BufferPtr objects (17042)
6e5224164 by Krishna Pai, fix(ci): Add OIDC permission and unrestrict Bash for CI failure analysis (17061)
b43a8c892 by Peter Enescu, feat: Allow EncodedVectorCopy to generate FlatMapVector in non-NULL vectors (16161)
1355dd3ab by Pratik Pugalia, fix: GetTimestampFunction recompiling datetime format on every row (17037)
c7d5b0104 by Krishna Pai, fix(ci): Use bash parameter expansion for multiline metadata substitution (17058)
472701f4a by Pratik Pugalia, fix: Remove per-query timeout in TableEvolutionFuzzer (17046)
933dd4e10 by Kevin Wilfong, fix: Remove unnecessary output_ field from IndexLookupJoin (17043)
034f86cb0 by Pratik Pugalia, fix: Increase Presto request timeout for parallel fuzzer runs (17049)
ba21e5661 by Ke Wang, fix: Allow IoStats to override storageReadBytes in getRuntimeStats (17036)
39d3494de by Masha Basmanova, fix: Change array_sort comparator lambda return type from bigint to integer (17030)
682c4a8e7 by Artem Selishchev, fix: Catch exceptions from TaskCompletionListeners in Task::onTaskCompletion() (17051)
c3f34536b by Varun Srinivas, fix(remote): Use VELOX_USER_FAIL for remote error re-throwing (16903)
a7d9036ae by Krishna Pai, feat(ci): Use Claude to analyze CI failures and post diagnostic PR comments (17039)
e9d03d8b3 by Kk Pulla, fix(exec): Fix data race in OutputBuffer::getUtilization and isOverutilized (17009)
fe24ae068 by Pratik Pugalia, fix: BetweenFunction to handle NaN with correct Spark semantics (17025)
dd58b1536 by Rui Mo, fix: Change count metric from signed to unsigned (int64_t -> uint64_t) (16989)
303bba60c by Mahadevuni Naveen Kumar, refactor: Revert iceberg data file statistics changes (16999)
a649489c1 by Simon Eves, fix(cudf): Fix failure in ToCudfSelectionTest.zeroColumnCountConstantFallsBack (17031)
95894c30a by Christian Zentgraf, feat(s3): Add support for hive.s3.min-part-size when writing (16935)
01b86e20d by Konjac Huang, refactor: refactor filebased datasource (16914)
7ea56098a by Rajeev Singh, feat(expr-eval): Fix flaky adaptiveCpuSamplingPerFunctionRates test (17002)
37e897b30 by joey.ljy, test: Use VectorFuzzer for random RowVector generation in `semiJoinDeduplicateResetCapacity` test (15748)
509ab8fd2 by Chengcheng Jin, feat(cudf): Add config to set timestamp unit (16769)
338598815 by Masha Basmanova, refactor: Migrate production code to ConnectorRegistry API and deprecate free functions (16986)
b65b5c1c5 by Pratik Pugalia, fix: TempFilePath fd_ member initialization order bug causing flaky test failures (17020)
95ce76125 by Rui Mo, test: Extend cast tests in the expression fuzzer test (16990)
4fb74c52f by Miguel Blanco Godón, feat: Support reading PARQUET files with zero offset (16456)
1dfcfbbcc by Kent Yao, fix(sparksql): Default ignoreNulls to true for collect_set backward compatibility (16947)
65800681f by Masha Basmanova, refactor: Migrate test and fuzzer code to ConnectorRegistry API (16985)
4acf9bb28 by Masha Basmanova, feat: Add ScopedRegistry and query-scoped connector lookups (16982)
4a966b2ef by Pratik Pugalia, Fix: SIGSEGV in AggregationFuzzer when reference query returns empty result vector (17018)
4bbea83dc by Krishna Pai, feat(ci): Add workflow_run workflow for posting CI failure comments on PRs (17022)
d14cd0c27 by Matt Gara, fix(cudf): Enable GPU execution for count(*), count(column), and count(NULL) (16522)
084f2221a by Rui Mo, misc: Make `DirectBufferedInput` clone fields protected (16979)
d9c1b6ea3 by Krishna Pai, build(ci): Grant pull-requests write permission to Linux build workflow (17021)
cff0a6e36 by David Reveman, build: Update perfetto SDK to v54 (17004)
7534c2e47 by Bradley Dice, fix(cudf): Refactor CudfToVelox output batching to avoid O(n) D->H syncs (16620)
f736ec1d8 by Masha Basmanova, refactor: Add thread safety to connector registry (16978)
b79f0d188 by Krishna Pai, feat(ci): Add flaky test retry and JUnit XML reporting (17003)
cf7d5a7b7 by Natasha Sehgal, feat: Add pmod (positive modulo) function to Presto SQL (17008)
388105ba3 by Bradley Dice, fix(build): Add missing GTest::gmock link to velox_hive_connector_test (16996)
9d7a2ee24 by Andrii Rosa, fix: support NaN and Inf serialization for Variant (17007)

velox_branch: https://github.com/IBM/velox/commits/dft-2026_04_08

Related issue: #6887

Signed-off-by: glutenperfbot <glutenperfbot@glutenproject-internal.com>
JkSelf and others added 2 commits April 9, 2026 10:02
Add ignoreNulls parameter to VeloxCollectList/VeloxCollectSet to support
Spark's RESPECT NULLS syntax (SPARK-55256). When ignoreNulls=false, null
elements are included in the collected array.

- VeloxCollect: conditionally skip nulls based on ignoreNulls parameter
- CollectRewriteRule: propagate ignoreNulls from Spark's CollectList/CollectSet
  via reflection (backward-compatible with Spark versions without ignoreNulls)
- The result ArrayType's containsNull flag reflects the ignoreNulls setting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
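The ignoreNulls semantics described in this commit can be illustrated with a small C++ sketch. It models only how the output array is built, not the actual VeloxCollectList/VeloxCollectSet code or Velox's aggregate functions; the names `Elem`, `collectList`, `collectSet`, and `resultContainsNull` are hypothetical, `std::nullopt` stands in for a SQL NULL, and keeping at most one NULL in collect_set is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical element type: std::nullopt models a SQL NULL.
using Elem = std::optional<int>;

// collect_list sketch: keeps input order and duplicates. NULLs are dropped
// only when ignoreNulls is true (the default behavior); with RESPECT NULLS
// (ignoreNulls == false) they are kept.
std::vector<Elem> collectList(const std::vector<Elem>& input, bool ignoreNulls) {
  std::vector<Elem> out;
  for (const Elem& v : input) {
    if (ignoreNulls && !v.has_value()) {
      continue;  // skip NULL elements
    }
    out.push_back(v);
  }
  return out;
}

// collect_set sketch: distinct elements only. With ignoreNulls == false a NULL
// is kept at most once (assumption). std::set orders nullopt before any value;
// Spark itself does not guarantee element order.
std::vector<Elem> collectSet(const std::vector<Elem>& input, bool ignoreNulls) {
  std::set<Elem> seen;
  for (const Elem& v : input) {
    if (ignoreNulls && !v.has_value()) {
      continue;
    }
    seen.insert(v);
  }
  return std::vector<Elem>(seen.begin(), seen.end());
}

// The result ArrayType can contain NULL elements only when nulls are respected.
bool resultContainsNull(bool ignoreNulls) { return !ignoreNulls; }
```

Deriving the containsNull flag directly from ignoreNulls keeps the declared result type consistent with what the aggregation can actually produce.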
JkSelf force-pushed the tagging-2026_04_08 branch 3 times, most recently from 2bd6fa9 to f3db522 (April 9, 2026 15:42)
…t_set/list

When an aggregate function has multiple signatures with the same intermediate
type (e.g., collect_set with 1-arg and 2-arg signatures), Velox registers its
companion functions with a name suffix built from generic type variables (e.g.,
collect_set_merge_extract_array_T). The Substrait layer, however, was constructing
concrete type suffixes (e.g., array_row_VARCHAR_BIGINT_BIGINT_endrow), which
don't match the registered names.

Fix: if the lookup by exact concrete suffix fails, fall back to discovering
companion function names via the getCompanionFunctionSignatures() API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
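The lookup-with-fallback described in this commit can be sketched as follows. This is a hedged illustration, not Gluten's Substrait code: `resolveCompanionName`, `registeredNames`, and `companionNames` are hypothetical stand-ins for the function registry and for the names discovered (for example via getCompanionFunctionSignatures()), and the candidate list is assumed to already be restricted to a single companion kind such as merge_extract.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical resolver. registeredNames stands in for the function registry;
// companionNames stands in for companion function names discovered for one
// companion kind of the aggregate.
std::optional<std::string> resolveCompanionName(
    const std::unordered_set<std::string>& registeredNames,
    const std::vector<std::string>& companionNames,
    const std::string& baseName,         // e.g. "collect_set_merge_extract"
    const std::string& concreteSuffix) { // e.g. "array_row_VARCHAR_BIGINT_BIGINT_endrow"
  // 1. Exact lookup using the concrete type suffix built from the plan.
  const std::string exact = baseName + "_" + concreteSuffix;
  if (registeredNames.count(exact) > 0) {
    return exact;
  }
  // 2. Fallback: take the first discovered companion name that starts with
  //    baseName + "_", e.g. a generic-suffixed "..._array_T" registration.
  const std::string prefix = baseName + "_";
  for (const std::string& name : companionNames) {
    if (name.rfind(prefix, 0) == 0) {
      return name;
    }
  }
  return std::nullopt;  // no matching companion function
}
```

Trying the exact name first preserves the existing behavior for functions whose companions are registered with concrete suffixes; the discovery step is only consulted when that lookup misses.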
JkSelf force-pushed the tagging-2026_04_08 branch from f3db522 to 40beb5e (April 10, 2026 08:27)