
Pass allowFailure flag to statistics access #29006

Open
chenjian2664 wants to merge 2 commits into trinodb:master from chenjian2664:jack/fix-stats-flaky-delta-lake

Conversation

@chenjian2664
Contributor

@chenjian2664 chenjian2664 commented Apr 7, 2026

Description

Concurrent access to metadata statistics may lead to errors:

io.trino.testing.QueryFailedException: Failed to write Delta Lake transaction log entry
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:138)
	at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:664)
	at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:640)
	at io.trino.plugin.deltalake.TestDeltaLakeLocalConcurrentWritesTest.lambda$testConcurrentInsertsSelectingFromTheSameVersionedTable$2(TestDeltaLakeLocalConcurrentWritesTest.java:240)
	Suppressed: java.lang.Exception: SQL: INSERT INTO test_concurrent_inserts_select_from_same_versioned_table_jlty6w4qop SELECT 3, 'd' AS part FROM test_concurrent_inserts_select_from_same_versioned_table_jlty6w4qop FOR VERSION AS OF 0
Caused by: io.trino.spi.TrinoException: Failed to write Delta Lake transaction log entry
	at io.trino.plugin.deltalake.DeltaLakeMetadata.finishInsert(DeltaLakeMetadata.java:2696)
Caused by: io.trino.spi.TrinoException: Error reading statistics from cache
	at io.trino.plugin.deltalake.statistics.CachingExtendedStatisticsAccess.readExtendedStatistics(CachingExtendedStatisticsAccess.java:68)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.updateTableStatistics(DeltaLakeMetadata.java:4818)
Caused by: java.lang.RuntimeException: Failed to decode JSON
	at io.trino.plugin.deltalake.statistics.MetaDirStatisticsAccess.decodeAndRethrowIfNotFound(MetaDirStatisticsAccess.java:142)
	at io.trino.plugin.deltalake.statistics.MetaDirStatisticsAccess.readExtendedStatistics(MetaDirStatisticsAccess.java:83)
Caused by: java.lang.IllegalArgumentException: Invalid JSON bytes for [simple type, class io.trino.plugin.deltalake.statistics.ExtendedStatistics]
	at io.airlift.json.JsonCodec.fromJson(JsonCodec.java:242)
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
 at [Source: REDACTED (StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION disabled); line: 1]

as the current implementation does not provide concurrency guarantees.
Since statistics access is typically non-essential to query execution, this PR introduces a flag that defines the failure semantics, enabling best-effort handling where metadata statistics access failures do not interrupt the main execution path.
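The allowFailure semantics described above can be sketched roughly as follows. This is an illustration only, not the Trino implementation; the class and method names here are invented, and the real code threads the flag through `ExtendedStatisticsAccess` instead of a generic helper:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: when allowFailure is true, a failed statistics read
// degrades to Optional.empty() instead of propagating and failing the query.
public class BestEffortRead
{
    public static <T> Optional<T> readStats(Supplier<Optional<T>> reader, boolean allowFailure)
    {
        try {
            return reader.get();
        }
        catch (RuntimeException e) {
            if (allowFailure) {
                // best-effort: treat unreadable statistics as absent
                return Optional.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args)
    {
        Optional<String> stats = readStats(() -> {
            throw new RuntimeException("Failed to decode JSON");
        }, true);
        System.out.println(stats.isPresent()); // prints "false"
    }
}
```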

Fixes #21725
Fixes #22455

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 7, 2026
@github-actions github-actions bot added the delta-lake (Delta Lake connector) and lakehouse labels Apr 7, 2026
@chenjian2664 chenjian2664 force-pushed the jack/fix-stats-flaky-delta-lake branch from 036cb3b to 4a9095d on April 7, 2026 07:25
Metadata statistics are treated as a best-effort optimization and should not
fail query execution. This change does not fully address concurrency issues
in metadata statistics access, but allows the primary execution path to proceed
when statistics are non-critical.
@chenjian2664 chenjian2664 force-pushed the jack/fix-stats-flaky-delta-lake branch from 4a9095d to 6045d9b on April 7, 2026 07:54
@findepi
Member

findepi commented Apr 7, 2026

Concurrent access to metadata statistics may lead to errors:

We can fix concurrent access/modification quite easily.
E.g. we can have a backup file a reader would use.

How will we fix concurrent writes?
I think it might be time to fix them?

Since statistics access is typically non-essential to query execution, this PR introduces a flag to define the failure semantics

This is not great

  • it hides problem rather than fix it
  • the assumption is wrong. A bad query plan can lead to query execution being 1000x more expensive or simply impossible to execute.
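The torn-read symptom in the stack trace above ("No content to map due to end-of-input") is typical of a reader observing a file mid-write. One conventional remedy, sketched below purely for illustration (file names and class are invented, and this is not what the PR implements), is to stage the JSON in a temp file and atomically rename it over the target, so readers see either the old or the new complete contents:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: writing directly to the target path can expose a
// partially written (or empty) file to concurrent readers. An atomic move
// from a temp file in the same directory avoids torn reads on POSIX
// file systems. Object stores need a different mechanism entirely.
public class AtomicStatsWrite
{
    public static void writeAtomically(Path target, String json)
            throws IOException
    {
        Path temp = Files.createTempFile(target.getParent(), "stats", ".tmp");
        Files.writeString(temp, json, StandardCharsets.UTF_8);
        // single atomic rename replaces the target in one step
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args)
            throws IOException
    {
        Path dir = Files.createTempDirectory("demo");
        Path stats = dir.resolve("extended_stats.json");
        writeAtomically(stats, "{\"modelVersion\":1}");
        System.out.println(Files.readString(stats)); // prints {"modelVersion":1}
    }
}
```

Note this only protects readers against partial writes; two concurrent writers can still silently overwrite each other's statistics, which is the separate concurrent-write question raised above.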

@chenjian2664
Contributor Author

E.g. we can have a backup file a reader would use.

What would that look like? There would still be a conflict while we are (concurrently) backing up, or still a chance of reading a backup that is being updated.

But if you see an easy way, I can give it a try.

How will we fix concurrent writes?

We don't fix concurrent writes here; previously we were already passing ignoreFailure for writes.
But we didn't notice that reading could also fail, so I pass ignoreFailure into the read path as well. This is the main semantic change I want to make in this PR.

Could you clarify what kind of "concurrent writes" you have in mind -- are we talking about within a single cluster or across multiple clusters?
If it's just within a single cluster, we could likely handle it by adding synchronization around the read/write methods.
For cross-cluster cases, I don't think we currently have a reliable way to support all environments, similar to the log synchronizer challenges discussed in #28092.

But it wouldn't matter if it is just an optimization that we could ignore.
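The single-cluster synchronization idea mentioned above could look roughly like the following. All names here are invented for illustration; this is a sketch of per-table read/write locking, not code from this PR:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: one ReadWriteLock per table location serializes
// writers against each other and against readers within a single cluster,
// while still allowing concurrent readers. It does nothing for writers on
// other clusters sharing the same storage, which is the harder case.
public class PerTableLocks
{
    private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String tableLocation)
    {
        return locks.computeIfAbsent(tableLocation, ignored -> new ReentrantReadWriteLock());
    }

    public <T> T read(String tableLocation, Supplier<T> reader)
    {
        ReadWriteLock lock = lockFor(tableLocation);
        lock.readLock().lock();
        try {
            return reader.get();
        }
        finally {
            lock.readLock().unlock();
        }
    }

    public void write(String tableLocation, Runnable writer)
    {
        ReadWriteLock lock = lockFor(tableLocation);
        lock.writeLock().lock();
        try {
            writer.run();
        }
        finally {
            lock.writeLock().unlock();
        }
    }
}
```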

it hides problem rather than fix it

I might be missing something, but why do we have an ignoreFailure flag when writing statistics? My understanding is that they're mainly for optimizing metadata access, so I assumed failures wouldn't be critical.

{
try {
-    return uncheckedCacheGet(cache, new CacheKey(schemaTableName, tableLocation), () -> delegate.readExtendedStatistics(session, schemaTableName, tableLocation, credentialsHandle));
+    return uncheckedCacheGet(cache, new CacheKey(schemaTableName, tableLocation), () -> delegate.readExtendedStatistics(session, schemaTableName, tableLocation, credentialsHandle, allowFailure));
Contributor


nit: split to multi-line

SchemaTableName schemaTableName,
String tableLocation,
VendedCredentialsHandle credentialsHandle,
boolean allowFailure)
Contributor


add #18506 to the description of the PR as a related PR for additional reviewer context

Contributor

@findinpath findinpath left a comment


This contribution is in line with #18506

While the approach of "ignoring failures" is certainly not ideal, pragmatically this contribution at least helps avoid test flakiness.

Let's use the opportunity to come up with a rough action plan for how to deal with Delta Lake table extended statistics before we merge and forget about the underlying issue (the fact that multiple readers and writers sometimes try to read and write the very same extended stats file concurrently).


3 participants