Proposal to Introduce Trash Option for Hadoop Data Deletion to Mitigate NameNode Load #12560

Open
@Sunwoo-Shin

Feature Request / Improvement

Hive provides a PURGE option when dropping tables, which lets users choose between deleting the data immediately and moving it to the Trash. (reference code)

We previously hit a case where dropping a table holding a large amount of data sent a massive number of delete requests to the Hadoop NameNode, causing response delays and heavy system load. Had there been an option to move the data to the Trash, as Hive provides, we believe this could have been mitigated.

We therefore propose adding a similar option to Iceberg. We believe it would reduce the load on the NameNode during large-scale data deletions and improve overall operational stability.
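To make the request concrete, here is a minimal sketch of the decision such an option would control. It is only an illustration: `Store`, `InMemoryStore`, and `remove` are hypothetical names, not Iceberg or Hadoop APIs, and the in-memory store stands in for HDFS so the example runs without a cluster. A Hadoop-backed version could plausibly delegate to `Trash.moveToAppropriateTrash(fs, path, conf)` and `FileSystem.delete(path, false)`, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class TrashAwareDelete {

    /** Hypothetical storage abstraction; not an Iceberg or Hadoop interface. */
    interface Store {
        /** Returns true if the path was moved to trash, false if trash is unavailable. */
        boolean moveToTrash(String path);

        /** Permanently deletes the path. */
        void delete(String path);
    }

    /** In-memory stand-in for a file system, used only to make the sketch runnable. */
    static class InMemoryStore implements Store {
        final boolean trashEnabled;
        final List<String> trashed = new ArrayList<>();
        final List<String> deleted = new ArrayList<>();

        InMemoryStore(boolean trashEnabled) {
            this.trashEnabled = trashEnabled;
        }

        @Override
        public boolean moveToTrash(String path) {
            if (!trashEnabled) {
                return false; // mirrors a file system whose trash is disabled
            }
            trashed.add(path);
            return true;
        }

        @Override
        public void delete(String path) {
            deleted.add(path);
        }
    }

    /**
     * Removes a file. With purge=false the file goes to trash when possible;
     * with purge=true, or when trash is unavailable, it is deleted immediately.
     */
    static void remove(Store store, String path, boolean purge) {
        if (!purge && store.moveToTrash(path)) {
            return;
        }
        store.delete(path);
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(true);
        remove(store, "/warehouse/db/t/data/a.parquet", false); // moved to trash
        remove(store, "/warehouse/db/t/data/b.parquet", true);  // deleted immediately
        System.out.println("trashed=" + store.trashed + " deleted=" + store.deleted);
    }
}
```

Falling back to an immediate delete when trash is unavailable keeps the drop from failing on clusters where trash is not configured. Whether that fallback is the right default is a design question for the community.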

Related Source Links

Query engine

Spark

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time