Spark: Improve table existence verification logic #14457
Conversation
LGTM, thanks @jerqi!
Shall we also apply this change in v3.5?
LGTM, shall we also add this to `SparkCatalog`?
Added.
Applied to Spark 3.5 and Spark 3.4.
```java
  return true;
}

// if the original load didn't work, try using the namespace as an identifier because
```
I kept this consistent with the original logic, which reuses the `loadTable` method. Could we simplify the code? We don't need to consider the cases where the identifier includes a snapshot selector or points to the changelog: the upper layer usually verifies table existence before creating, altering, or dropping tables, and those identifiers don't contain a snapshot selector or a changelog pointer.
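For illustration, a hedged sketch of the identifier shapes in question (the selector spellings are assumptions based on the `BRANCH`/`SNAPSHOT_ID` matchers and the changelog table name quoted later in this thread):

```java
import org.apache.spark.sql.connector.catalog.Identifier;

class IdentifierShapes {
  public static void main(String[] args) {
    // Plain identifier: the shape CREATE/ALTER/DROP statements resolve.
    Identifier plain = Identifier.of(new String[] {"db"}, "tbl");

    // Selector-style identifiers: the real table name shifts into the
    // namespace and the name part encodes a selector or the changelog.
    Identifier branch = Identifier.of(new String[] {"db", "tbl"}, "branch_main");
    Identifier snapshot = Identifier.of(new String[] {"db", "tbl"}, "snapshot_id_12345");
    Identifier changelog = Identifier.of(new String[] {"db", "tbl"}, "changes");

    System.out.printf("%s vs %s, %s, %s%n", plain, branch, snapshot, changelog);
  }
}
```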
could we reuse the load logic here?
```java
@Override
public boolean tableExists(Identifier ident) {
  if (isPathIdentifier(ident)) {
    try {
      tables.load(((PathIdentifier) ident).location());
      return true;
    } catch (org.apache.iceberg.exceptions.NoSuchTableException e) {
      return false;
    }
  } else {
    return icebergCatalog.tableExists(buildIdentifier(ident));
  }
}
```
If this gets too complicated, feel free to ignore my comment and let's just move forward with only the SparkSessionCatalog change.
OK for me. I can reuse the load logic. Thanks.
Thanks. I have modified it.
It may be better to use the `exists` method of the `HadoopTables` class.
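As a minimal sketch of that suggestion (the warehouse path below is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopTables;

class HadoopTablesExistsSketch {
  public static void main(String[] args) {
    HadoopTables tables = new HadoopTables(new Configuration());
    // exists(...) checks for valid table metadata at the location, avoiding
    // a full table load followed by catching NoSuchTableException.
    boolean present = tables.exists("/tmp/warehouse/db/tbl"); // hypothetical location
    System.out.println(present);
  }
}
```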
Thanks for adding this for all the Spark versions. Left a minor comment.
@singhpk234 could you double check the new logic for SparkCatalog.java?
```java
@Override
public boolean tableExists(Identifier ident) {
  if (isPathIdentifier(ident)) {
    return tables.exists(((PathIdentifier) ident).location());
```
Do we need to parse the ident and use the base location?
It depends on the existence semantics. Do we need to consider snapshot and changelog tables? There are two ways:

- Way 1: for snapshot and changelog tables, return false directly. The current implementation follows this way.
- Way 2: for snapshot and changelog tables, parse them and verify the existence. I had this in a previous commit:
```java
public boolean tableExists(Identifier ident) {
  try {
    if (isPathIdentifier(ident)) {
      loadFromPathIdentifier((PathIdentifier) ident);
      return true;
    } else {
      boolean isExists = icebergCatalog.tableExists(buildIdentifier(ident));
      if (isExists) {
        return true;
      }
      if (ident.namespace().length == 0) {
        return false;
      }
      // if the original load didn't work, try using the namespace as an identifier because
      // the original identifier may include a snapshot selector or may point to the changelog
      TableIdentifier namespaceAsIdent =
          buildIdentifier(namespaceToIdentifier(ident.namespace()));
      Matcher tag = TAG.matcher(ident.name());
      if (tag.matches()) {
        org.apache.iceberg.Table table = icebergCatalog.loadTable(namespaceAsIdent);
        Snapshot tagSnapshot = table.snapshot(tag.group(1));
        return tagSnapshot != null;
      }
      if (icebergCatalog.tableExists(namespaceAsIdent)) {
        if (ident.name().equalsIgnoreCase(SparkChangelogTable.TABLE_NAME)) {
          return true;
        }
        Matcher at = AT_TIMESTAMP.matcher(ident.name());
        if (at.matches()) {
          return true;
        }
        Matcher id = SNAPSHOT_ID.matcher(ident.name());
        if (id.matches()) {
          return true;
        }
        Matcher branch = BRANCH.matcher(ident.name());
        if (branch.matches()) {
          return true;
        }
        if (ident.name().equalsIgnoreCase(REWRITE)) {
          return true;
        }
      }
      return false;
    }
  } catch (org.apache.iceberg.exceptions.NoSuchTableException e) {
    return false;
  }
}
```
Way 1 is clearer. Way 2 follows the original code's semantics. Do you have any suggestions?
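For contrast, a minimal sketch of way 1, assembled from the snippets quoted earlier in this thread:

```java
@Override
public boolean tableExists(Identifier ident) {
  if (isPathIdentifier(ident)) {
    return tables.exists(((PathIdentifier) ident).location());
  }
  // Selector/changelog identifiers fall through here and report "does not
  // exist", since no catalog table matches the selector-suffixed name.
  return icebergCatalog.tableExists(buildIdentifier(ident));
}
```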
I agree way 1 is clearer.
The default implementation loads the table to determine whether it exists. It's better to use the `tableExists` method in the `SparkCatalog` and `SessionCatalog` classes directly, because the underlying catalog may have a more efficient implementation, like `RESTCatalog`.
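As a sketch of why delegating matters (the `RestTransport` helper below is hypothetical, but the Iceberg REST catalog spec does expose a HEAD route for table existence checks):

```java
import org.apache.iceberg.catalog.TableIdentifier;

// Hypothetical transport interface, for illustration only.
interface RestTransport {
  /** Returns true when a HEAD request to the path gets a 2xx response. */
  boolean head(String path);
}

class RestExistenceSketch {
  private final RestTransport transport;

  RestExistenceSketch(RestTransport transport) {
    this.transport = transport;
  }

  // A catalog that overrides tableExists can answer with one lightweight
  // request instead of loading full table metadata and catching
  // NoSuchTableException, which is what the default implementation does.
  boolean tableExists(TableIdentifier ident) {
    return transport.head(
        "/v1/namespaces/" + ident.namespace() + "/tables/" + ident.name());
  }
}
```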