[FEATURE] Support Iceberg snapshot maintenance procedures (expire_snapshots, remove_orphan_files, rewrite_data_files) via Gravitino Trino Connector #10280

@awslife

Describe the feature

Currently, Iceberg snapshot maintenance procedures available in the native Trino Iceberg Connector are not supported through the Gravitino Trino Connector.
This feature request proposes that the Gravitino Trino Connector delegate Iceberg system procedures to the underlying catalog, so that users can manage the snapshot lifecycle entirely within Trino without relying on external tools such as Spark or the Iceberg Java API.

Motivation

When using Gravitino as a unified metadata layer, users naturally expect that all catalog operations — including maintenance tasks — are accessible through the same interface.
Currently, snapshot cleanup must be performed via a separate tool (e.g., Spark, Iceberg Java API), which introduces operational complexity and breaks the unified access model that Gravitino aims to provide.
Supporting these procedures through the Gravitino Trino Connector would:

  • Allow users to fully manage Iceberg table lifecycle within a single Trino interface.
  • Eliminate the need to maintain a separate Spark or Java-based pipeline solely for snapshot cleanup.
  • Strengthen Gravitino's value as a truly unified metadata and catalog management layer.

Describe the solution

The following Iceberg system procedures should be supported via the Gravitino Trino Connector:

| Procedure                  | Description                                              |
|----------------------------|----------------------------------------------------------|
| system.expire_snapshots    | Remove snapshots older than a given timestamp            |
| system.remove_orphan_files | Delete orphan data files not referenced by any snapshot  |
| system.rewrite_data_files  | Compact small data files into larger ones                |
| system.rewrite_manifests   | Rewrite manifest files for improved query performance    |

Example Usage (Expected to work after this feature is implemented)

-- Expire old snapshots
CALL gravitino_catalog.system.expire_snapshots(
    schema_name => 'my_schema',
    table_name  => 'my_table',
    older_than  => TIMESTAMP '2024-01-01 00:00:00'
);

-- Remove orphan files
CALL gravitino_catalog.system.remove_orphan_files(
    schema_name => 'my_schema',
    table_name  => 'my_table'
);

-- Compact small files
CALL gravitino_catalog.system.rewrite_data_files(
    schema_name => 'my_schema',
    table_name  => 'my_table'
);
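For comparison, the native Trino Iceberg connector exposes its maintenance operations as table procedures run through `ALTER TABLE ... EXECUTE` rather than `CALL`. The sketch below shows that form, assuming the Gravitino connector would pass these statements through unchanged. The catalog, schema, and table names are reused from the example above, and the threshold values are illustrative only.

```sql
-- Expire snapshots older than the retention threshold
-- ('7d' is an illustrative value, not a recommendation)
ALTER TABLE gravitino_catalog.my_schema.my_table
    EXECUTE expire_snapshots(retention_threshold => '7d');

-- Remove files no longer referenced by any snapshot
ALTER TABLE gravitino_catalog.my_schema.my_table
    EXECUTE remove_orphan_files(retention_threshold => '7d');

-- Compact small data files (Trino's counterpart to rewrite_data_files)
ALTER TABLE gravitino_catalog.my_schema.my_table
    EXECUTE optimize(file_size_threshold => '128MB');
```

Either invocation style would cover the use case, as long as the operation is actually delegated to the underlying Iceberg catalog.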

Current Behavior
These procedure calls either fail with an error (e.g., procedure not found, unsupported operation) or complete silently without actually performing the expected maintenance operations.
This is because the Gravitino Trino Connector acts as a metadata proxy layer and does not currently delegate Iceberg-specific system procedures to the underlying catalog.

Expected Behavior
The Gravitino Trino Connector should properly intercept and delegate Iceberg system procedure calls to the underlying Iceberg catalog, in the same way that the native Trino Iceberg Connector handles them.

Environment

Apache Gravitino version: 1.2.0-rc6
Trino version: 472
Iceberg version: 1.8
Catalog type: Iceberg (backed by REST)

Additional context

This feature is particularly important for production environments where automated snapshot expiration and storage cost management are critical operational requirements.
Without this feature, Gravitino cannot be adopted as a complete metadata management solution for Iceberg-heavy workloads.
Thank you for considering this feature request!
