Description
Feature Request / Improvement
Currently, Hive provides a PURGE option when dropping tables, allowing users to choose whether to immediately delete the data or move it to the Trash. (reference code)
In the past, we experienced a situation where dropping a table with a large amount of data triggered a massive number of delete requests to the Hadoop NameNode, causing response delays and system load issues. We believe that if there had been an option to move the data to the Trash, as Hive provides, this issue could have been mitigated.
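Introducing a comparable option in Iceberg, so that the data of a dropped table can be moved to the Trash instead of being deleted immediately, would help reduce the load on the NameNode during large-scale data deletions and improve overall operational stability.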
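To illustrate the difference in request volume, here is a minimal sketch using only the local filesystem (`java.nio.file`) as a stand-in for HDFS. It is not Iceberg or Hadoop code. The class `TrashSketch` and the methods `purge`, `moveToTrash`, and `createTable` are hypothetical names made up for this example. The assumption is that a purge issues one delete request per path, while a move to the Trash is a single rename of the table directory. On HDFS, the equivalent call would presumably be Hadoop's `Trash.moveToAppropriateTrash`, but that is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local-filesystem stand-in for the purge-vs-trash choice.
final class TrashSketch {

  // Purge path: one delete call per file or directory, children before parents.
  // Returns the number of delete calls issued.
  static int purge(Path tableDir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(tableDir)) {
      // Reverse order puts children ahead of their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return paths.size();
  }

  // Trash path: a single rename of the whole table directory into trashRoot.
  // Returns the new location of the table directory.
  static Path moveToTrash(Path tableDir, Path trashRoot) throws IOException {
    Files.createDirectories(trashRoot);
    // Suffix the name so repeated drops of the same table name do not collide.
    Path target = trashRoot.resolve(tableDir.getFileName() + "." + System.nanoTime());
    return Files.move(tableDir, target, StandardCopyOption.ATOMIC_MOVE);
  }

  // Helper for the demo: creates <root>/<name>/data/part-N.parquet files.
  static Path createTable(Path root, String name, int files) throws IOException {
    Path dataDir = Files.createDirectories(root.resolve(name).resolve("data"));
    for (int i = 0; i < files; i++) {
      Files.createFile(dataDir.resolve("part-" + i + ".parquet"));
    }
    return dataDir.getParent();
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("trash-sketch");

    Path t1 = createTable(root, "t1", 5);
    System.out.println("purge issued " + purge(t1) + " delete calls");

    Path t2 = createTable(root, "t2", 5);
    Path moved = moveToTrash(t2, root.resolve(".Trash"));
    System.out.println("trash issued 1 rename, table now at " + moved);
  }
}
```

In the sketch, the number of delete calls grows with the number of files in the table, while the Trash path stays at one rename regardless of table size. That is the effect we are hoping for on the NameNode.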
Related Source Links
- https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L94-L139
- https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java#L101-L109
Query engine
Spark
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time