[Iceberg][RESTCatalog][Optimization]- clean_expired_metadata not working as expected #28106

@Swarupfule

Description

Problem Statement: The clean_expired_metadata parameter of expire_snapshots is not working as expected.
Expected: clean_expired_metadata should also clear the unused metadata files in storage.
Observed: The parameter does not clean up the files.

Performed Steps

  1. Created a schema named iceberg.optimization_bucket.
  2. Created a partitioned table named orders.
  3. Inserted some data.
  4. Took note of the current counts of metadata files and snapshots.
  5. Executed expire_snapshots without clean_expired_metadata.
     Observation: The older snapshot file is deleted and a new manifest JSON file is added that points to the latest snapshot. The older manifest JSON files were not cleared from the bucket.
  6. Inserted some more data.
  7. Ran the expire_snapshots query with clean_expired_metadata => TRUE:

SET SESSION iceberg.expire_snapshots_min_retention = '0s';

ALTER TABLE iceberg.optimization_bucket.orders
EXECUTE expire_snapshots(retention_threshold => '0s', retain_last => 1, clean_expired_metadata => TRUE);

Attachment: ExpiresnapshotparamNotWorkingAsExpected.docx

Observation: Same behaviour as running without it. The older manifest JSON files are still present in the bucket.

Question: Based on the Trino 479 release notes (https://trino.io/docs/current/release/release-479.html), my expectation is that clean_expired_metadata should also delete the old (unreferenced) manifest JSON files from the warehouse. Is this expectation correct for Trino 479 with the Iceberg REST catalog, or is the scope of clean_expired_metadata different?
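
For reference, the counts noted in step 4 could be taken from Trino itself rather than from the bucket. The following is only a sketch, assuming the Iceberg connector's metadata tables ($snapshots, $metadata_log_entries, $manifests) are exposed for this table through the REST catalog. The column aliases are illustrative names, not part of the original steps.

```sql
-- Snapshots currently tracked by the table metadata
SELECT count(*) AS snapshot_count
FROM iceberg.optimization_bucket."orders$snapshots";

-- Metadata JSON files recorded in the table's metadata log
SELECT count(*) AS metadata_file_count
FROM iceberg.optimization_bucket."orders$metadata_log_entries";

-- Manifest files referenced by the current snapshot
SELECT count(*) AS manifest_count
FROM iceberg.optimization_bucket."orders$manifests";
```

These queries report what the table metadata references, not what physically exists in storage, so running them before and after each expire_snapshots call and comparing the results with a bucket listing would show which files are unreferenced but still present.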
