Using hive's `insert_existing_partitions_behavior` session property set to `OVERWRITE` can lead to data loss

### Trino version

481-SNAPSHOT

### Please describe the bug

Consider the test showcased in the commit https://github.com/findinpath/trino/commit/65e56c05e33c7b3a27e678e6cfcce814a85b5047

tldr;

```
            Session insertOverwriteSession = Session.builder(getQueryRunner().getDefaultSession())
                    .setCatalogSessionProperty("hive", "insert_existing_partitions_behavior", "OVERWRITE")
                    .build();

            // Simulate a timeout exception while updating the partition statistics of the table
            // as part of the logic of the method io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared
            var metastore = bridgingHiveMetastoreHolder.get();
            metastore.setThrowTimeoutExceptionOnUpdatePartitionStatistics(true);
            assertThatThrownBy(() -> assertUpdate(insertOverwriteSession, "INSERT INTO " + tableName + " SELECT * FROM " + tableName + " UNION SELECT * FROM " + stagingTable))
                    .hasMessageContaining("All operations other than the following update operations were completed: replace partition parameter");
            metastore.setThrowTimeoutExceptionOnUpdatePartitionStatistics(false);

            // All the content of the table is gone (both the files which existed before running the query,
            // as well as the files inserted by the insert overwrite query)
            assertQueryReturnsEmptyResult("SELECT * FROM " + tableName);
            assertQuery("SELECT part FROM \"" + tableName + "$partitions\"", "VALUES 1, 10, 20");
```

When performing an `INSERT OVERWRITE` like query on top of AWS S3, this can lead to a catastrophic failure in the table because all the data of the table may be wiped out by a metastore timeout in the update of partition stats.

**Technical context**
In case of AWS S3 it creates in place new files corresponding to the insert statement and removes every other file from the partition that is not created by the ongoing query.
While doing rollback as a consequence of metastore timeout, it wipes out also the files created by the query.
This leaves the table's partitions touched by the query completely wiped out of content.


The documentation of the Hive connector https://trino.io/docs/current/connector/hive.html does not specify any warning with regards to using `hive.insert-existing-partitions-behavior` or associated `insert_existing_partitions_behavior` session property set to `OVERWRITE`.

Is it still safe to expose this functionality ?


**UPDATE**
In https://github.com/findinpath/trino/commit/02a6920061a82aa5558ec9e338a3ca8767368f2f I've also reproduced on top of a hierarchical file system (local) another data loss scenario (this time limited only to one partition)

```
            assertQuery("SELECT * FROM " + tableName, "VALUES (1, 1), (2, 1), (11, 10), (21, 20)");
            // Simulate a timeout exception while updating the 2nd partition
            metastore.setThrowTimeoutExceptionOnAlterPartitionCall(2);
            assertThatThrownBy(() -> assertUpdate(insertOverwriteSession, "INSERT INTO " + tableName + " SELECT * FROM " + tableName + " UNION SELECT * FROM " + stagingTableName))
                    .hasMessageContaining("timeout exception");
            metastore.setThrowTimeoutExceptionOnAlterPartitionCall(-1);

            // The content of the partition "part=1" is not available anymore
            assertQuery("SELECT * FROM " + tableName, "VALUES (11, 10), (21, 20)");
```

```
io.trino.spi.TrinoException: Unable to rename from local:///tpch/test_insert_overwrite_ble7w9zh74/_temp_part=1_20260512_073234_00006_59gvk to local:///tpch/test_insert_overwrite_ble7w9zh74/part=1: target directory already exists
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.renameDirectory(SemiTransactionalHiveMetastore.java:2603)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore$Committer.executeRenameTasksForAbort(SemiTransactionalHiveMetastore.java:2029)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:1545)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:1260)
	at io.trino.plugin.hive.HiveMetadata.commit(HiveMetadata.java:3929)
	at io.trino.plugin.hive.HiveTransactionManager.lambda$commit$0(HiveTransactionManager.java:58)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at io.trino.plugin.hive.HiveTransactionManager.commit(HiveTransactionManager.java:56)
```

**Technical context**

In case of a hierarchical file system, the logic renames the partition directory to a `_temp_` prefixed directory which is supposed to be restored in case of a rollback.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Using hive's `insert_existing_partitions_behavior` session property set to `OVERWRITE` can lead to data loss #29428

Trino version

Please describe the bug

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Using hive's insert_existing_partitions_behavior session property set to OVERWRITE can lead to data loss #29428

Description

Trino version

Please describe the bug

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Using hive's `insert_existing_partitions_behavior` session property set to `OVERWRITE` can lead to data loss #29428