Parallelize and bulk-resolve getViews in the Iceberg JDBC catalog by tbaeg · Pull Request #29751 · trinodb/trino

tbaeg · 2026-06-04T02:29:09Z

Description

#29738 should be merged first.

Problem

Listing views in the Iceberg JDBC catalog (e.g. querying information_schema.views) was slow and got linearly worse as the number of views grew.

Loading each view requires two network round trips:

1. A query to the metastore database to resolve where the view's metadata file lives.
2. A read of that metadata file from the underlying file system (e.g. an S3 GET).

For a schema with N views, this meant N database queries plus N file reads, all performed one after another. Listing views in a busy schema therefore hammered the metastore DB and stacked up file-read latency serially.

Solution

Two changes, one per commit:

1. Load views in parallel.
   Instead of loading views one at a time, the per-view loads are fanned out across the shared metadata executor (bounded by iceberg.metadata-parallelism). The file reads now happen concurrently rather than back-to-back.
2. Resolve all view metadata pointers in a single bulk query.
   Parallelizing still left one separate database query per view. This adds a bulk lookup that resolves every view's metadata location for a schema in one query, replacing the N per-view queries.
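
As a rough back-of-the-envelope model of the two changes: the symbols $t_{\text{db}}$ (one metastore query), $t_{\text{fs}}$ (one metadata file read) and $p$ (the value of iceberg.metadata-parallelism) are introduced here for illustration and do not appear in the PR. The model assumes uniform per-call latency, ignores contention on the database and file system, and assumes the bulk query costs about as much as a single per-view query.

$$
\begin{aligned}
T_{\text{serial}} &\approx N\,(t_{\text{db}} + t_{\text{fs}}) \\
T_{\text{parallel}} &\approx \lceil N/p \rceil\,(t_{\text{db}} + t_{\text{fs}}) \\
T_{\text{bulk}} &\approx t_{\text{db}} + \lceil N/p \rceil\, t_{\text{fs}}
\end{aligned}
$$

Under these assumptions the first change shortens wall-clock time but still issues N database queries, while the second reduces the database load for a schema from N queries to one.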

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

…atalogTest

TrinoJdbcCatalog.getViews listed views across every namespace in the catalog, ignoring the namespace argument it was given. Route it through the listNamespaces helper so it only scans the requested namespace.

The helper now returns an empty list when a namespace is supplied but does not exist, rather than falling back to listing all namespaces. Callers already verify that a namespace exists, so this is not a correctness change for listTables/listIcebergTables/listViews; it just avoids issuing extra JDBC queries for a namespace that does not exist.

Add TestTrinoJdbcCatalog, a BaseTrinoCatalogTest implementation backed by the Postgres-based TestingIcebergJdbcServer, and add testNamespaceFilter to the shared base test, covering that listTables, listViews and getViews honor the namespace filter. TrinoRestCatalog.getViews throws for a non-existent namespace, so TestTrinoRestCatalog overrides the test to document that behavior until it is normalized.

Co-authored-by: jdchandler88 <jdchandler88@gmail.com>
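
The namespace-filter behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the helper from the PR: the class name, method name, and the use of an in-memory set of existing namespaces are all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the catalog's namespace helper; all names are illustrative.
final class NamespaceFilterSketch
{
    private NamespaceFilterSketch() {}

    static List<String> namespacesToScan(Set<String> existing, Optional<String> requested)
    {
        if (!requested.isPresent()) {
            // No filter supplied: scan every namespace in the catalog.
            return existing.stream().sorted().collect(Collectors.toList());
        }
        String namespace = requested.get();
        // Filter supplied: scan only that namespace, or nothing at all if it does not exist.
        // The old behavior, as described above, fell back to scanning everything here.
        return existing.contains(namespace) ? List.of(namespace) : List.of();
    }
}
```

With existing namespaces `a` and `b`, an empty filter yields both, a filter of `a` yields only `a`, and a filter naming a missing namespace yields an empty list, so no further queries are issued for it.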

TrinoJdbcCatalog.getViews loaded each view sequentially. Every loadView resolves the view metadata pointer over JDBC and then reads the view metadata JSON from the underlying file system (e.g. an S3 GET), so listing views for information_schema.views incurred one network round-trip per view, serialized.

Fan out the per-view loads across the shared @ForIcebergMetadata executor via processWithAdditionalThreads, bounded by iceberg.metadata-parallelism, mirroring TrinoGlueCatalog. Because the work is I/O-wait bound, this collapses the wall-clock cost from N sequential round-trips to roughly N/parallelism.

Extract the load-and-convert logic into loadViewDefinition, which skips the viewExists check since getViews has already listed the view, halving its JDBC calls. getView keeps the viewExists check for callers that pass an unverified name. loadViewDefinition treats NoSuchViewException as an absent view to stay graceful when a view is dropped concurrently with listing.

Co-authored-by: jdchandler88 <jdchandler88@gmail.com>
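
The fan-out and the "dropped view is absent" handling described in this commit can be sketched with plain java.util.concurrent. This is only an illustration under stated assumptions: the PR uses Trino's shared executor and processWithAdditionalThreads, whereas the sketch creates its own fixed-size pool, and ViewGoneException and loadAll are hypothetical names standing in for Iceberg's NoSuchViewException and the catalog code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelViewLoadSketch
{
    private ParallelViewLoadSketch() {}

    // Hypothetical stand-in for Iceberg's NoSuchViewException.
    static final class ViewGoneException extends RuntimeException
    {
        ViewGoneException(String name)
        {
            super("View does not exist: " + name);
        }
    }

    // Loads every view with at most `parallelism` loads in flight; results keep input order.
    static <V> List<V> loadAll(List<String> viewNames, Function<String, V> loader, int parallelism)
    {
        // Assumption: a private pool stands in for the shared metadata executor.
        ExecutorService executor = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Optional<V>>> futures = new ArrayList<>();
            for (String name : viewNames) {
                Callable<Optional<V>> task = () -> {
                    try {
                        return Optional.of(loader.apply(name));
                    }
                    catch (ViewGoneException e) {
                        // View was dropped between listing and loading: treat it as absent.
                        return Optional.empty();
                    }
                };
                futures.add(executor.submit(task));
            }
            List<V> loaded = new ArrayList<>();
            for (Future<Optional<V>> future : futures) {
                try {
                    future.get().ifPresent(loaded::add);
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
                catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
            return loaded;
        }
        finally {
            executor.shutdownNow();
        }
    }
}
```

Only the "view is gone" failure is swallowed; any other exception from a loader still propagates to the caller, so genuine read errors are not silently hidden.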

getViews loaded each view via jdbcCatalog.loadView, whose JdbcViewOperations.doRefresh issues a per-view JDBC query (JdbcUtil.loadView) to resolve the metadata pointer before reading the metadata JSON. So even after the prior parallelization there was still one JDBC round trip per view, just spread across threads.

Add IcebergJdbcClient.getViewMetadataLocations, which resolves every view's metadata_location for a namespace in a single query (views are stored in iceberg_tables with iceberg_type = 'VIEW'). getViews now runs that one bulk query per namespace (also subsuming the listViews call) and parses each view's metadata JSON directly via ViewMetadataParser on the metadata-fetching executor, mirroring how tables already resolve a location through IcebergJdbcClient and then read JSON via FileIO.

The view-definition conversion is factored into createViewDefinition, shared by the bulk path and the single-view getView path (which keeps using loadView). A view dropped concurrently with listing is treated as absent (NotFoundException), matching the prior NoSuchViewException handling.

Co-authored-by: jdchandler88 <jdchandler88@gmail.com>
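
The bulk lookup described in this commit might look roughly like the following plain-JDBC sketch. It is not the PR's IcebergJdbcClient.getViewMetadataLocations: the class and method names are hypothetical, and the column names (catalog_name, table_namespace, table_name) are an assumption based on the Iceberg JDBC catalog's iceberg_tables layout; only metadata_location and iceberg_type = 'VIEW' are taken from the commit message.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

final class BulkViewLookupSketch
{
    // Assumed column names; verify against the actual iceberg_tables schema in use.
    static final String BULK_QUERY =
            "SELECT table_name, metadata_location FROM iceberg_tables " +
            "WHERE catalog_name = ? AND table_namespace = ? AND iceberg_type = 'VIEW'";

    private BulkViewLookupSketch() {}

    // Returns view name -> metadata file location for one namespace, using a single query.
    static Map<String, String> viewMetadataLocations(Connection connection, String catalogName, String namespace)
            throws SQLException
    {
        Map<String, String> locations = new LinkedHashMap<>();
        try (PreparedStatement statement = connection.prepareStatement(BULK_QUERY)) {
            statement.setString(1, catalogName);
            statement.setString(2, namespace);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    locations.put(rows.getString("table_name"), rows.getString("metadata_location"));
                }
            }
        }
        return locations;
    }
}
```

The keys of the returned map double as the list of views in the namespace, which is why a separate listing query is no longer needed on this path; each location can then be read and parsed in parallel.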

tbaeg · 2026-06-07T00:17:05Z

Failure doesn't appear to be related.

github-actions bot added the iceberg (Iceberg connector) and cla-signed labels Jun 4, 2026

tbaeg force-pushed the perf/jdbc-parallel-get-views branch from d5e21b1 to 49687d2 Compare June 6, 2026 03:13

tbaeg changed the title ~~Parallelize getViews in Iceberg JDBC catalog~~ Parallelize and bulk-resolve getViews in the Iceberg JDBC catalog Jun 6, 2026

tbaeg and others added 4 commits June 6, 2026 10:40

Co-locate methods of similar access/non-access modifier in BaseTrinoC…

671e32c

…atalogTest

tbaeg force-pushed the perf/jdbc-parallel-get-views branch from 49687d2 to 32040eb Compare June 6, 2026 17:32

tbaeg wants to merge 4 commits into trinodb:master from tbaeg:perf/jdbc-parallel-get-views
