Conversation
@krvikash krvikash commented Nov 20, 2025

Description

Fixes #26109

Additional context and related issues

Stack trace:

io.trino.testing.QueryFailedException: Invalid schema: multiple fields for name partition.part_trunc: 1000 and 1001

	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:138)
	at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:587)
	at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:570)
	at io.trino.sql.query.QueryAssertions$QueryAssert.lambda$new$1(QueryAssertions.java:317)
	at com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:201)
	at io.trino.sql.query.QueryAssertions$QueryAssert.result(QueryAssertions.java:436)
	at io.trino.sql.query.QueryAssertions$QueryAssert.matches(QueryAssertions.java:357)
	at io.trino.plugin.iceberg.BaseIcebergSystemTables.testFilesPartitionEvolutionUsingTruncateOnSameColumn(BaseIcebergSystemTables.java:574)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1450)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2019)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
	Suppressed: java.lang.Exception: SQL: SELECT partition FROM "test_files_tableu9g13hdwhk$files"
		at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:594)
		... 12 more
Caused by: org.apache.iceberg.exceptions.ValidationException: Invalid schema: multiple fields for name partition.part_trunc: 1000 and 1001
	at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
	at org.apache.iceberg.types.IndexByName.addField(IndexByName.java:200)
	at org.apache.iceberg.types.IndexByName.field(IndexByName.java:161)
	at org.apache.iceberg.types.IndexByName.field(IndexByName.java:34)
	at org.apache.iceberg.types.TypeUtil.visit(TypeUtil.java:663)
	at org.apache.iceberg.types.TypeUtil.visit(TypeUtil.java:659)
	at org.apache.iceberg.types.TypeUtil.indexNameById(TypeUtil.java:173)
	at org.apache.iceberg.Schema.lazyIdToName(Schema.java:225)
	at org.apache.iceberg.Schema.<init>(Schema.java:154)
	at org.apache.iceberg.Schema.<init>(Schema.java:111)
	at org.apache.iceberg.Schema.<init>(Schema.java:99)
	at org.apache.iceberg.Schema.<init>(Schema.java:95)
	at org.apache.iceberg.BaseFilesTable.schema(BaseFilesTable.java:49)
	at org.apache.iceberg.FilesTable.schema(FilesTable.java:24)
	at io.trino.plugin.iceberg.system.FilesTable.splitSource(FilesTable.java:141)
	at io.trino.plugin.base.classloader.ClassLoaderSafeSystemTable.splitSource(ClassLoaderSafeSystemTable.java:123)
	at io.trino.connector.system.SystemSplitManager.getSplits(SystemSplitManager.java:74)
	at io.trino.split.SplitManager.getSplits(SplitManager.java:89)
	at io.trino.sql.planner.SplitSourceFactory$Visitor.createSplitSource(SplitSourceFactory.java:191)
	at io.trino.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:158)
	at io.trino.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:132)
	at io.trino.sql.planner.plan.TableScanNode.accept(TableScanNode.java:219)
	at io.trino.sql.planner.SplitSourceFactory$Visitor.visitOutput(SplitSourceFactory.java:368)
	at io.trino.sql.planner.SplitSourceFactory$Visitor.visitOutput(SplitSourceFactory.java:132)
	at io.trino.sql.planner.plan.OutputNode.accept(OutputNode.java:82)
	at io.trino.sql.planner.SplitSourceFactory.createSplitSources(SplitSourceFactory.java:112)
	at io.trino.execution.scheduler.PipelinedQueryScheduler$DistributedStagesScheduler.createStageScheduler(PipelinedQueryScheduler.java:1075)
	at io.trino.execution.scheduler.PipelinedQueryScheduler$DistributedStagesScheduler.create(PipelinedQueryScheduler.java:949)
	at io.trino.execution.scheduler.PipelinedQueryScheduler.createDistributedStagesScheduler(PipelinedQueryScheduler.java:328)
	at io.trino.execution.scheduler.PipelinedQueryScheduler.start(PipelinedQueryScheduler.java:311)
	at io.trino.execution.SqlQueryExecution.start(SqlQueryExecution.java:441)
	at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:284)
	at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:150)
	at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$1(LocalDispatchQuery.java:134)
	at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$0(MoreFutures.java:570)
	at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1132)
	at io.trino.$gen.Trino_testversion____20251120_073657_1.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
	at java.base/java.lang.Thread.run(Thread.java:1474)
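The failure above comes from Iceberg's by-name schema index rejecting two partition fields that derive the same name. A minimal standalone sketch (not Iceberg code; class and method names here are hypothetical) of why two historical specs that both truncate the same column produce this exact message:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each historical partition spec contributes a partition field to the
// $files "partition" struct. After evolving truncate(part, N) on the same
// column, both fields derive the name "part_trunc", and a by-name index must
// reject the duplicate, producing the message seen in the stack trace.
public class PartitionNameIndex
{
    private final Map<String, Integer> nameToId = new HashMap<>();

    // Returns null on success, or a validation message when the name is taken
    public String addField(String name, int fieldId)
    {
        Integer existing = nameToId.putIfAbsent(name, fieldId);
        if (existing != null) {
            return "Invalid schema: multiple fields for name partition." + name
                    + ": " + existing + " and " + fieldId;
        }
        return null;
    }

    public static void main(String[] args)
    {
        PartitionNameIndex index = new PartitionNameIndex();
        index.addField("part_trunc", 1000);                 // field from the old spec
        String error = index.addField("part_trunc", 1001);  // field from the new spec
        System.out.println(error);
    }
}
```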

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

## Iceberg
* Fix failure when querying the `$files` table after partition evolution using `truncate` or `bucket` on the same column. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 20, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Nov 20, 2025
@krvikash krvikash force-pushed the krvikash/fix-iceberg-file-system-table branch 2 times, most recently from e7364d6 to 95ab895 on November 20, 2025 at 10:49
  String column = fromIdentifierToColumn(match.group(1));
- builder.bucket(column, parseInt(match.group(2)), column + "_bucket" + suffix);
+ int numBuckets = parseInt(match.group(2));
+ builder.bucket(column, numBuckets, column + "_bucket_" + numBuckets + suffix);
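A standalone sketch of the naming change in this diff (the regex, helper name, and inputs below are assumptions for illustration, not the actual Trino code): including the transform parameter in the derived field name keeps two bucket transforms on the same column distinct.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketNaming
{
    // Hypothetical helper mirroring the diff: parse "bucket(<col>, <n>)" and
    // include the bucket count in the generated partition-field name, so
    // bucket(a, 4) and bucket(a, 8) on the same column no longer collide.
    static String bucketFieldName(String transform, String suffix)
    {
        Matcher match = Pattern.compile("bucket\\((\\w+), (\\d+)\\)").matcher(transform);
        if (!match.matches()) {
            throw new IllegalArgumentException("not a bucket transform: " + transform);
        }
        String column = match.group(1);
        int numBuckets = Integer.parseInt(match.group(2));
        return column + "_bucket_" + numBuckets + suffix;
    }

    public static void main(String[] args)
    {
        System.out.println(bucketFieldName("bucket(a, 4)", "")); // a_bucket_4
        System.out.println(bucketFieldName("bucket(a, 8)", "")); // a_bucket_8
    }
}
```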
@ebyhr ebyhr (Member) commented Nov 21, 2025
This change causes a behavior change even without partition evolution.

CREATE TABLE test(a varchar) WITH (partitioning = ARRAY['truncate(a, 1)']);
INSERT INTO test VALUES 'abc';
SELECT "$partition" FROM test;

The final SELECT returned a_trunc=a before this PR; now it returns a_trunc_1=a. Is it possible to keep the original behavior as much as possible? If I remember correctly, Spark doesn't append the number unless partition evolution happens.
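One possible shape for the compromise the review asks for can be sketched as follows (hypothetical helper, not the actual Trino or Spark logic): keep the plain `<col>_<transform>` name when it is unique, and append the transform parameter only when the plain name would collide, i.e. only when evolution reused the same column.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of conditional suffixing: the first field on a column keeps the
// historical plain name (a_trunc), and only a later colliding field from an
// evolved spec gets the parameter appended (a_trunc_2).
public class PartitionFieldNames
{
    private final Set<String> used = new HashSet<>();

    public String name(String column, String transform, int parameter)
    {
        String plain = column + "_" + transform;
        // Set.add returns false if the plain name was already taken
        String candidate = used.add(plain) ? plain : plain + "_" + parameter;
        used.add(candidate);
        return candidate;
    }

    public static void main(String[] args)
    {
        PartitionFieldNames names = new PartitionFieldNames();
        System.out.println(names.name("a", "trunc", 1)); // a_trunc (no collision)
        System.out.println(names.name("a", "trunc", 2)); // a_trunc_2 (after evolution)
    }
}
```

This preserves the pre-PR `$partition` output for tables that were never evolved, at the cost of name stability depending on spec order.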

@krvikash krvikash force-pushed the krvikash/fix-iceberg-file-system-table branch from 95ab895 to 180a6a9 on November 21, 2025 at 13:41
@krvikash krvikash force-pushed the krvikash/fix-iceberg-file-system-table branch from 180a6a9 to 0607069 on November 21, 2025 at 17:08

Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

Invalid schema error when querying $files table after updating Iceberg partition spec

2 participants