Trino version
481-SNAPSHOT
Please describe the bug
Consider the test showcased in the commit findinpath@65e56c0
tldr;
Session insertOverwriteSession = Session.builder(getQueryRunner().getDefaultSession())
.setCatalogSessionProperty("hive", "insert_existing_partitions_behavior", "OVERWRITE")
.build();
// Simulate a timeout exception while updating the partition statistics of the table
// as part of the logic of the method io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared
var metastore = bridgingHiveMetastoreHolder.get();
metastore.setThrowTimeoutExceptionOnUpdatePartitionStatistics(true);
assertThatThrownBy(() -> assertUpdate(insertOverwriteSession, "INSERT INTO " + tableName + " SELECT * FROM " + tableName + " UNION SELECT * FROM " + stagingTable))
.hasMessageContaining("All operations other than the following update operations were completed: replace partition parameter");
metastore.setThrowTimeoutExceptionOnUpdatePartitionStatistics(false);
// All the content of the table is gone (both the files which existed before running the query,
// as well as the files inserted by the insert overwrite query)
assertQueryReturnsEmptyResult("SELECT * FROM " + tableName);
assertQuery("SELECT part FROM \"" + tableName + "$partitions\"", "VALUES 1, 10, 20");
When performing an INSERT OVERWRITE like query on top of AWS S3, this can lead to a catastrophic failure in the table because all the data of the table may be wiped out by a metastore timeout in the update of partition stats.
Technical context
In case of AWS S3 it creates in place new files corresponding to the insert statement and removes every other file from the partition that is not created by the ongoing query.
While doing rollback as a consequence of metastore timeout, it wipes out also the files created by the query.
This leaves the table's partitions touched by the query completely wiped out of content.
The documentation of the Hive connector https://trino.io/docs/current/connector/hive.html does not specify any warning with regards to using hive.insert-existing-partitions-behavior or associated insert_existing_partitions_behavior session property set to OVERWRITE.
Is it still safe to expose this functionality ?
UPDATE
In findinpath@02a6920 I've also reproduced on top of a hierarchical file system (local) another data loss scenario (this time limited only to one partition)
assertQuery("SELECT * FROM " + tableName, "VALUES (1, 1), (2, 1), (11, 10), (21, 20)");
// Simulate a timeout exception while updating the 2nd partition
metastore.setThrowTimeoutExceptionOnAlterPartitionCall(2);
assertThatThrownBy(() -> assertUpdate(insertOverwriteSession, "INSERT INTO " + tableName + " SELECT * FROM " + tableName + " UNION SELECT * FROM " + stagingTableName))
.hasMessageContaining("timeout exception");
metastore.setThrowTimeoutExceptionOnAlterPartitionCall(-1);
// The content of the partition "part=1" is not available anymore
assertQuery("SELECT * FROM " + tableName, "VALUES (11, 10), (21, 20)");
io.trino.spi.TrinoException: Unable to rename from local:///tpch/test_insert_overwrite_ble7w9zh74/_temp_part=1_20260512_073234_00006_59gvk to local:///tpch/test_insert_overwrite_ble7w9zh74/part=1: target directory already exists
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.renameDirectory(SemiTransactionalHiveMetastore.java:2603)
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore$Committer.executeRenameTasksForAbort(SemiTransactionalHiveMetastore.java:2029)
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:1545)
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:1260)
at io.trino.plugin.hive.HiveMetadata.commit(HiveMetadata.java:3929)
at io.trino.plugin.hive.HiveTransactionManager.lambda$commit$0(HiveTransactionManager.java:58)
at java.base/java.util.Optional.ifPresent(Optional.java:178)
at io.trino.plugin.hive.HiveTransactionManager.commit(HiveTransactionManager.java:56)
Technical context
In case of a hierarchical file system, the logic renames the partition directory to a _temp_ prefixed directory which is supposed to be restored in case of a rollback.
Trino version
481-SNAPSHOT
Please describe the bug
Consider the test showcased in the commit findinpath@65e56c0
tldr;
When performing an
INSERT OVERWRITElike query on top of AWS S3, this can lead to a catastrophic failure in the table because all the data of the table may be wiped out by a metastore timeout in the update of partition stats.Technical context
In case of AWS S3 it creates in place new files corresponding to the insert statement and removes every other file from the partition that is not created by the ongoing query.
While doing rollback as a consequence of metastore timeout, it wipes out also the files created by the query.
This leaves the table's partitions touched by the query completely wiped out of content.
The documentation of the Hive connector https://trino.io/docs/current/connector/hive.html does not specify any warning with regards to using
hive.insert-existing-partitions-behavioror associatedinsert_existing_partitions_behaviorsession property set toOVERWRITE.Is it still safe to expose this functionality ?
UPDATE
In findinpath@02a6920 I've also reproduced on top of a hierarchical file system (local) another data loss scenario (this time limited only to one partition)
Technical context
In case of a hierarchical file system, the logic renames the partition directory to a
_temp_prefixed directory which is supposed to be restored in case of a rollback.