@findinpath findinpath commented Jan 19, 2026

Description

If the Parquet data file statistics contain more columns (e.g. the partition columns) than the data columns of the Delta Lake table, an NPE is thrown when encoding the min/max stats, because no type can be resolved for the extra columns.
Flip the existing logic to build the stats from the columns of the table instead of from the stats of the Parquet data file.

NOTE: as a consequence, min/max stats written by Trino will omit the partition columns, even when the data files contain such stats.
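
For illustration, the flipped lookup might look roughly like the sketch below. It uses plain Java collections instead of Trino's types, and the class `StatsByTableColumns`, the method `encodeMin`, and the string type names are hypothetical stand-ins, not the actual code in `DeltaLakeParquetStatisticsUtils`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class StatsByTableColumns {
    // Hypothetical helper: iterate over the table's data columns and look each one up
    // in the file statistics, so columns that exist only in the Parquet file
    // (e.g. materialized partition columns) are never visited.
    static Map<String, Object> encodeMin(
            Map<String, String> typeForColumn,
            Map<String, Optional<Object>> fileMinStats)
    {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : typeForColumn.entrySet()) {
            Optional<Object> min = fileMinStats.get(column.getKey());
            // Skip table columns for which the file carries no usable statistic
            if (min != null && min.isPresent()) {
                // column.getValue() is the (always non-null) type that a real encoder would use here
                result.put(column.getKey(), min.get());
            }
        }
        return result;
    }
}
```

Because the loop is driven by `typeForColumn`, the type is always known for every column that reaches the encoding step, at the cost of dropping statistics for any column the table schema does not list.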

Relevant stacktrace

Read timed out.)': [io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20260116_092601_00281_2vc9a): Unable to write deletion vector file
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:54)
tests               | 	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
tests               | 	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
tests               | 	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
tests               | 	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
tests               | 	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
tests               | 	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:54)
tests               | 	at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition(TestDeltaLakeWriteDatabricksCompatibility.java:197)
tests               | 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:1447)
tests               | Caused by: java.sql.SQLException: Query failed (#20260116_092601_00281_2vc9a): Unable to write deletion vector file
tests               | 	at io.trino.jdbc.ResultUtils.resultsException(ResultUtils.java:33)
tests               | 	at io.trino.jdbc.AsyncResultIterator.lambda$new$1(AsyncResultIterator.java:94)
tests               | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
tests               | 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
tests               | 	... 3 more
tests               | 	Suppressed: java.lang.Exception: Query: UPDATE delta.default.update_case_compat_0qppmd20g1 SET upper = 0 WHERE lower = 1
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               | 		at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:54)
tests               | 		at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
tests               | 		at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
tests               | 		at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
tests               | 		at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
tests               | 		at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
tests               | 		at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
tests               | 		at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:54)
tests               | 		at io.trino.tests.product.deltalake.TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition(TestDeltaLakeWriteDatabricksCompatibility.java:197)
tests               | 		at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
tests               | 		at java.base/java.lang.reflect.Method.invoke(Method.java:565)
tests               | 		at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
tests               | 		at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
tests               | 		at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
tests               | 		at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               | 		at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               | 		at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               | 		... 3 more
tests               | Caused by: io.trino.spi.TrinoException: Unable to write deletion vector file
tests               | 	at io.trino.plugin.deltalake.DeltaLakeMergeSink.writeDeletionVector(DeltaLakeMergeSink.java:448)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeMergeSink.writeMergeResult(DeltaLakeMergeSink.java:399)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeMergeSink.lambda$finish$1(DeltaLakeMergeSink.java:347)
tests               | 	at java.base/java.util.HashMap.forEach(HashMap.java:1430)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeMergeSink.finish(DeltaLakeMergeSink.java:345)
tests               | 	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMergeSink.finish(ClassLoaderSafeConnectorMergeSink.java:52)
tests               | 	at io.trino.operator.MergeWriterOperator.finish(MergeWriterOperator.java:196)
tests               | 	at io.trino.operator.Driver.processInternal(Driver.java:418)
tests               | 	at io.trino.operator.Driver.lambda$process$0(Driver.java:303)
tests               | 	at io.trino.operator.Driver.tryWithLock(Driver.java:706)
tests               | 	at io.trino.operator.Driver.process(Driver.java:295)
tests               | 	at io.trino.operator.Driver.processForDuration(Driver.java:266)
tests               | 	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:891)
tests               | 	at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:189)
tests               | 	at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:651)
tests               | 	at io.trino.$gen.Trino_474_e_15_6_3_gbf0e8983____20260116_085601_2.run(Unknown Source)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:1447)
tests               | Caused by: java.lang.NullPointerException: Cannot invoke "io.trino.spi.type.Type.equals(Object)" because "type" is null
tests               | 	at io.trino.plugin.deltalake.transactionlog.DeltaLakeParquetStatisticsUtils.getMin(DeltaLakeParquetStatisticsUtils.java:303)
tests               | 	at io.trino.plugin.deltalake.transactionlog.DeltaLakeParquetStatisticsUtils.lambda$jsonEncode$1(DeltaLakeParquetStatisticsUtils.java:256)
tests               | 	at com.google.common.collect.CollectCollectors.lambda$toImmutableMap$0(CollectCollectors.java:193)
tests               | 	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
tests               | 	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:197)
tests               | 	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
tests               | 	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
tests               | 	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
tests               | 	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
tests               | 	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
tests               | 	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:727)
tests               | 	at io.trino.plugin.deltalake.transactionlog.DeltaLakeParquetStatisticsUtils.jsonEncode(DeltaLakeParquetStatisticsUtils.java:256)
tests               | 	at io.trino.plugin.deltalake.transactionlog.DeltaLakeParquetStatisticsUtils.jsonEncodeMin(DeltaLakeParquetStatisticsUtils.java:244)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeWriter.mergeStats(DeltaLakeWriter.java:226)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeWriter.readStatistics(DeltaLakeWriter.java:211)
tests               | 	at io.trino.plugin.deltalake.DeltaLakeMergeSink.writeDeletionVector(DeltaLakeMergeSink.java:434)
tests               | 	... 18 more
tests               | ]

Additional context and related issues

The issue can be easily reproduced with the TestDeltaLakeWriteDatabricksCompatibility.testCaseUpdateInPartition test on a Databricks 14.x runtime.

It seems that the partition columns are materialized in the data files even when the table does not have the materializePartitionColumns feature enabled.

https://github.com/delta-io/delta/blob/master/PROTOCOL.md#materialize-partition-columns

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jan 19, 2026
@github-actions github-actions bot added the delta-lake Delta Lake connector label Jan 19, 2026
@findinpath

trinodb/trino maintainers, could anyone please run this PR with secrets?

ebyhr commented Jan 19, 2026

/test-with-secrets sha=7995f9ed389f086439459fa89d7dc5afd2bd8fbe

@github-actions

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/21152087604


 // Test partition case sensitivity when updating
-@Test(groups = {DELTA_LAKE_DATABRICKS, PROFILE_SPECIFIC_TESTS}, dataProvider = "partition_column_names")
+@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, PROFILE_SPECIFIC_TESTS}, dataProvider = "partition_column_names")

@ebyhr ebyhr Jan 19, 2026

#27956 passes without DeltaLakeParquetStatisticsUtils change.

The materializePartitionColumns failure should be handled by the last condition of:

.filter(entry -> entry.getValue() != null && entry.getValue().isPresent() && !entry.getValue().get().isEmpty() && typeForColumn.containsKey(entry.getKey()))
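
To see why that last condition is enough, here is a rough, self-contained sketch of the same filter over plain Java collections. `StatsFilterSketch`, `ColumnStats`, and `encodeMin` are hypothetical stand-ins for the Parquet statistics and Trino types; only the shape of the filter is taken from the line above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class StatsFilterSketch {
    // Hypothetical stand-in for a Parquet column statistics object
    static final class ColumnStats {
        final Object min;

        ColumnStats(Object min)
        {
            this.min = min;
        }

        boolean isEmpty()
        {
            return min == null;
        }
    }

    static Map<String, Object> encodeMin(
            Map<String, Optional<ColumnStats>> stats,
            Map<String, String> typeForColumn)
    {
        return stats.entrySet().stream()
                // The final containsKey check drops columns that appear in the file
                // statistics but have no type in the table schema, so the encoder
                // is never called with a null type.
                .filter(entry -> entry.getValue() != null
                        && entry.getValue().isPresent()
                        && !entry.getValue().get().isEmpty()
                        && typeForColumn.containsKey(entry.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().get().min));
    }
}
```

With this guard in place, an extra partition column in the file statistics is silently skipped rather than reaching the type lookup.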

@findinpath findinpath Jan 20, 2026

I see, you're right.
I was working on 474 when reproducing the issue.
Thanks for the heads-up @ebyhr 🙏

@findinpath findinpath closed this Jan 20, 2026