Skip to content

RuntimeStatsMetric Reporter for table scans in Iceberg Connector #24904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

j-sund
Copy link

@j-sund j-sund commented Apr 10, 2025

Description

Adds a RuntimeStatsMetricsReporter class that exposes iceberg metrics.

Motivation and Context

Exposing these metrics can improve query observability, which thus makes future improvements to the Iceberg integration a little easier. This also enables better performance monitoring.

Impact

Allows for better visibility into the Iceberg integration.

Test Plan

Ensure the metric is visible and being populated. Basic tests are included in IcebergDistributedSmokeTestBase.java class

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • [] Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

Copy link

linux-foundation-easycla bot commented Apr 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@j-sund j-sund marked this pull request as ready for review April 12, 2025 03:16
@j-sund j-sund requested review from hantangwangd, ZacBlanco and a team as code owners April 12, 2025 03:16
@j-sund j-sund requested a review from presto-oss April 12, 2025 03:16
@ZacBlanco
Copy link
Contributor

@j-sund Can you sign the CLA using the link in the comment above? You can just sign as a personal contributor.

Also, before your next push try running ./mvnw validate -pl presto-iceberg to ensure your code passes all the checkstyle and license rules. I see the maven checks / maven-checks CI action is failing with the following:

Warning:  Missing header in: /home/runner/work/presto/presto/presto-iceberg/src/main/java/com/facebook/presto/iceberg/RuntimeStatsMetricsReporter.java

@j-sund j-sund force-pushed the runtimeStatsReporter branch from cfc3890 to 67e5b9b Compare April 23, 2025 16:42
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor things. This looks great!

@j-sund j-sund force-pushed the runtimeStatsReporter branch 2 times, most recently from 72ef7e5 to 6124180 Compare May 5, 2025 21:42
@j-sund j-sund force-pushed the runtimeStatsReporter branch from 6124180 to 5b96c26 Compare May 15, 2025 19:49
@j-sund j-sund force-pushed the runtimeStatsReporter branch from 5b96c26 to 8b19e78 Compare May 16, 2025 18:50
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great feature! Some nits, and a minor problem in the test code.

Comment on lines +45 to +55
/**
* Helper method to construct the full metric name for a table scan.
*
* @param tableName the name of the table
* @param metricName the name of the metric
* @return the composed metric name in the format: table.scan.metric
*/
private String tableScanString(final String tableName, final String metricName)
{
return tableName + ".scan." + metricName;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this private method to the bottom, and it can be declared static.

@@ -96,16 +97,19 @@ public ConnectorSplitSource getSplits(
return new ChangelogSplitSource(session, typeManager, icebergTable, scan);
}
else if (table.getIcebergTableName().getTableType() == EQUALITY_DELETES) {
RuntimeStats runtimeStats = session.getRuntimeStats();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't comment above, so commenting here. Should we also register metrics reporter for CHANGELOG table scan?

@@ -168,14 +168,15 @@ public static TableStatistics getTableStatistics(
Table icebergTable,
List<IcebergColumnHandle> columns)
{
return new TableStatisticsMaker(icebergTable, session, typeManager).makeTableStatistics(statisticsFileCache, tableHandle, currentPredicate, constraint, columns);
return new TableStatisticsMaker(icebergTable, session, typeManager).makeTableStatistics(statisticsFileCache, tableHandle, currentPredicate, constraint, columns, session);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newly added parameter session is useless.

}

private TableStatistics makeTableStatistics(StatisticsFileCache statisticsFileCache,
IcebergTableHandle tableHandle,
Optional<TupleDomain<IcebergColumnHandle>> currentPredicate,
Constraint constraint,
List<IcebergColumnHandle> selectedColumns)
List<IcebergColumnHandle> selectedColumns,
ConnectorSession session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newly added parameter session is useless.


assertTrue(runtimestats
.getMetrics()
.get("iceberg.tpch.orders.scan.totalPlanningDuration")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't hardcode iceberg.tpch.orders as the table name here and below, as it may very across different catalog test subclasses. This seems to be the core reason of failures in TestIcebergSmokeHive and TestIcebergSmokeRestNestedNameSpace.

@hantangwangd
Copy link
Member

I have submit a PR #25140 to make sure that the table name loaded from all catalogs have consistent formatting <catalog>.<schema>.<table>. So I think after that change being merged, we can use the following code to build table name uniformly:

        String catalog = getSession().getCatalog().get();
        String schema = getSession().getSchema().get();
        String tableName = catalog + "." + schema + ".orders";

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants