Core,Api: Add overwrite option when register external table to catalog #12228
Conversation
Java CI failure is timing out on concurrent fast append and seems unrelated to the change. @rdblue @RussellSpitzer @danielcweeks do you want to take a look?
Thanks for working on this, @dramaticlly !
aws/src/integration/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
aws/src/integration/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
aws/src/integration/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java
@@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() {
    assertThat(catalog.dropTable(identifier)).isTrue();
  }

  @Test
  public void testRegisterAndOverwriteExistingTable() {
    C catalog = catalog();
Are we just adding a bucket to test the change? Why not just use the table UUID? I feel like we should be able to just:
Make table 1
Make table 2
Register overwrite table 1 with table 2
Check that the metadata of table 1 matches table 2?
Initially I think register with overwrite helps revert an existing table to a previous healthy state. If we want to support overwrite with another table's metadata, it seems better suited to drop + register, to reflect the table UUID change.
From the table spec: it asks implementations to throw an exception if a table's UUID does not match the expected UUID when refreshing metadata. What do you think?
I added a similar question in the code where it checks that the UUID doesn't change for register with overwrite. Because the API is called registerTable, I would think a UUID change should be allowed. Interested to hear other people's takes on this one.
Initially I think register with overwrite helps revert an existing table to a previous healthy state.
For this initial use case, agree that the UUID shouldn't change. But if we want to only solve this specific/narrower problem, maybe a narrower API would make more sense. Enforcing the UUID check for that narrow API is totally the right thing to do:
resetTable(TableIdentifier identifier, String metadataFileLocation)
Sharing my thoughts here: I believe the table UUID is the ideal way for a consumer to identify the uniqueness of a given table, instead of relying on the table identifier in the given catalog.
Today, the table operation on refresh will check whether the underlying UUID has changed, and the REST catalog also requires the table UUID to stay unchanged for replace and update of the table. If we want to support register overwrite with a foreign table, then it secretly breaks that assumption, and implies the catalog needs to evict the cached table (even with the same table identifier) and force a reload.
Personally I think it's probably better to only support the same table UUID on register with overwrite (which provides atomicity), and support a table UUID change with drop table first and then re-register.
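The refresh-time UUID validation referred to above can be sketched as follows. This is an illustrative stand-in, not Iceberg's actual TableOperations code; the class and method names are hypothetical:

```java
import java.util.Objects;

// Illustrative sketch (not Iceberg's actual code) of the refresh-time UUID
// validation the table spec asks for: if refreshed metadata carries a
// different table UUID than the one this client last loaded, fail instead of
// silently switching to a different table.
class UuidCheckingOps {
    private String knownUuid; // UUID from the last metadata this client loaded

    void checkUuid(String refreshedUuid) {
        if (knownUuid != null && !Objects.equals(knownUuid, refreshedUuid)) {
            throw new IllegalStateException(
                "Table UUID does not match: expected " + knownUuid + " but found " + refreshedUuid);
        }
        knownUuid = refreshedUuid; // first load just records the UUID
    }
}
```

This is the assumption a register-overwrite with a different UUID would break: any client holding the old UUID would fail its next refresh instead of transparently seeing the new table.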
After some thought, I changed the behavior from reusing the existing commit logic (which passes ops.current()) to dropping the table first and then registering with the given metadata. This relaxes the constraint of the table UUID check, and also ensures that the latest metadataFileLocation in TableMetadata after overwrite is the same as the one the user provided.
I also added a comment in the interface to highlight the potential table UUID change.
let me unresolve this comment. wondering if it is desirable to have two different tables share the UUID.
Make table 1
Make table 2
Register overwrite table 1 with table 2
E.g., the MV spec PR currently defines storage table refresh-state with only UUID as the table/view identifier.
Interested to hear more inputs.
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java
      ops.commit(null, metadata);

      TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : null;
      ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), metadataFile));
I'm a little worried about passing through the current metadata here. Is this just a workaround for the normal commit logic?
If the metadata changes from "current" by the time an overwrite request goes through, then we don't want a retry or failure; it should still pass?
I was hoping to reuse the existing commit logic for atomicity support, and also for better lineage to track the previous/old metadata for Hive and JDBC tables.
As for the potential conflict when the base is out of date, I think that's a valid concern, and we probably do not want this operation to fail, as the user's intent is to replace with the provided table metadata. I am thinking about adding a retry block to help; please let me know if you feel otherwise:
AtomicBoolean isRetry = new AtomicBoolean(false);
// commit with retry
Tasks.foreach(ops)
    .retry(COMMIT_NUM_RETRIES_DEFAULT)
    .exponentialBackoff(
        COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
        COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
        COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
        2.0 /* exponential */)
    .onlyRetryOn(CommitFailedException.class)
    .run(
        taskOps -> {
          TableMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current();
          isRetry.set(true);
          taskOps.commit(base, newMetadata);
        });
Ended up dropping the table first and then committing with current TableMetadata = null, so that we do not need to pass the current metadata here. This also avoids the need for a retry here.
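The drop-then-register flow can be modeled in a few lines, with a plain map standing in for catalog state. All names here are simplified stand-ins for the actual BaseMetastoreCatalog code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of drop-then-register: overwrite drops the existing catalog
// entry (keeping data/metadata files) and registers the given metadata file
// directly, so no current-metadata base has to be threaded through the commit.
class DropThenRegisterCatalog {
    // identifier -> metadata file location; stand-in for real catalog state
    private final Map<String, String> tables = new HashMap<>();

    String registerTable(String identifier, String metadataFileLocation, boolean overwrite) {
        if (tables.containsKey(identifier)) {
            if (!overwrite) {
                throw new IllegalStateException("Table already exists: " + identifier);
            }
            // dropTable(identifier, purge = false): keep all data and metadata files
            tables.remove(identifier);
        }
        // register with base == null: the input metadata.json becomes current as-is
        tables.put(identifier, metadataFileLocation);
        return metadataFileLocation;
    }

    String currentMetadata(String identifier) {
        return tables.get(identifier);
    }
}
```

Note the trade-off raised later in the thread: the drop and the register are two separate steps, so a failure in between can leave the table deleted but not re-registered.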
aws/src/integration/java/org/apache/iceberg/aws/dynamodb/TestDynamoDbCatalog.java
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
core/src/main/java/org/apache/iceberg/rest/requests/RegisterTableRequest.java
core/src/main/java/org/apache/iceberg/rest/requests/RegisterTableRequestParser.java
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java
          "The requested metadata matches the existing metadata. No changes will be committed.");
      return new BaseTable(ops, fullTableName(name(), identifier), metricsReporter());
    }
    dropTable(identifier, false /* Keep all data and metadata files */);
drop the table first and then register with given metadata
what if the job fails between these two steps? we can end up with the table deleted (but the new metadata not registered), which is also not ideal.
Thinking about it again, enforcing a UUID match for overwrite seems reasonable.
Additional notes on switching from TableOperations.commit(ops.current(), newTableMetadata) to drop and then re-register:
- This ensures the same input metadata.json is used to commit and ends up as the latest TableMetadata.current().metadataLocation(), like in "Core: Avoid creating new metadata file when registerTable API is used" #6591. The existing doCommit logic (core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java, lines 147 to 151 in c16cefa):

    protected String writeNewMetadataIfRequired(boolean newTable, TableMetadata metadata) {
      return newTable && metadata.metadataFileLocation() != null
          ? metadata.metadataFileLocation()
          : writeNewMetadata(metadata, currentVersion() + 1);
    }

  can be reused by the doCommit method without changing its interface for all TableOperations.
- This also relaxes the constraint of matching the table UUID between the existing table and the new metadata to be overwritten.
- Register-with-force should generally be retryable by the user, as the end state is having the input metadata as the latest state of the table, whereas ops.current() is subject to change under a race condition and would require a retry within the method.
This ensures the same input metadata.json is used to commit
Register-with-force should generally be retryable, as the end state is having the input metadata as the latest state of the table, whereas ops.current() is subject to change under a race condition

I understand this may look intuitive for the registerTable API. But what overwrite essentially does is update the state of an existing table. Hence, the current behavior of writeNewMetadataIfRequired is fine to me. It would create and commit a new metadata file with the same content as the input file.
Let's assume the current metadata file is meta-009.json. It is reasonable to me that registerTable("meta-005.json", true) would commit a new file meta-010.json with the same content as meta-005.json.
The real problem might be piggybacking overwrite on registerTable. If it were a separate overwriteTable/resetTable API, then it wouldn't be confusing that the committed metadata file is a new file with the same content as the input metadata file.
This also relaxes the constraint of matching the table UUID between the existing table and the new metadata to be overwritten
This is where I have second thoughts from my earlier stance. overwrite should enforce a UUID match check. See my other comment.
I think I have a disagreement here. I think the behavior should be identical to registerTable without overwrite. The file that is being passed is the file that should be used for registration. For example, if I'm updating a table to match an existing one and I rely on the metadata.json path to tell which files are copied (like in DR), having a different file name could really make things difficult.
thank you @guykhazma, I don't think we have any clear semantic expectation for register-table with overwrite in the REST API to complete atomically. The table specification states that:
Table state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic swap.
IMO, overwrite of table metadata is not a valid state change but rather a state overwrite, where state can even come from another table with a different table UUID. I had some offline discussion with @RussellSpitzer on this, where we agreed that this can be catalog-implementation specific, leaving room for an atomic swap if the catalog can support it.
As for your proposed alternative approach, I think we can write multiple metadata.json files on the file system first and rely on the catalog for an atomic swap, but we might hit the same limitation in the TableOperations API, where the new metadata.json will be rewritten with a different file name than the input and is difficult to verify.
thank you @dramaticlly, could you elaborate on why you see this as a state overwrite?
I can imagine a scenario where some state or partial state is transferred from another table (with a different UUID), which might be interpreted as a state change. However, I'm not sure the register API is the appropriate mechanism for that.
From my understanding, the purpose of the register API is to create a named reference to a metadata JSON. It doesn't inherently imply any change to the actual state of a table. Even if you register it against an existing table and the resulting metadata reflects a different state, it doesn't mean that the underlying storage state has changed.
For instance, it's possible to register the same table multiple times under different names using distinct metadata files, effectively simulating branching using different entries in the catalog.
Sorry for the delayed response. Let's say the original table with identifier mytable and state A is represented by A.metadata.json, and now we are registering with B.metadata.json. There's no guarantee that A and B have anything in common.
Probably because we are looking from different angles: while the underlying storage state may remain unchanged during a register-table operation, the perception of the table can shift significantly. From a data consumer's standpoint, if the identifier mytable now references a different collection of data due to registration with the overwrite flag, its internal state, including metadata, schema, partitioning, and data, may have changed entirely.
@dramaticlly I see your point. It seems to me the core question here is whether the responsibility for maintaining the lineage of a table identifier lies with the catalog or the table itself. From my perspective, it makes more sense for the catalog to handle this, especially since the overwrite operation doesn't alter the physical state of the table. Ideally, this reference change should be atomic, but the implementation details can be left to individual catalogs.
Maybe it makes sense to move this into an explicit table operations API. We essentially want something like
ops.setMetadata(newMetadata)
which ignores validations and transactionally swaps. May be cleaner than doing the drop/create we are currently doing. This is essentially what any REST catalog could do, and it would fix @stevenzwu's issues with atomicity by letting each catalog implementation decide whether to make it atomic or not.
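A sketch of what such a setMetadata operation could look like, using an AtomicReference as a stand-in for catalog state. All names here are hypothetical; this is not an existing Iceberg API:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of ops.setMetadata(newMetadata): unlike a normal commit,
// it skips base-metadata validation and swaps the state unconditionally,
// leaving the atomicity guarantee to the catalog implementation.
class SwappableTableOperations {
    // Stand-in for Iceberg's TableMetadata
    record Metadata(String uuid, String metadataFileLocation) {}

    private final AtomicReference<Metadata> current = new AtomicReference<>();

    // Normal optimistic commit: fails if the base is no longer current.
    void commit(Metadata base, Metadata updated) {
        if (!current.compareAndSet(base, updated)) {
            throw new IllegalStateException("Commit failed: base metadata is stale");
        }
    }

    // Proposed operation: replace whatever is current in one atomic swap,
    // regardless of who committed in between.
    void setMetadata(Metadata updated) {
        current.set(updated);
    }

    Metadata current() {
        return current.get();
    }
}
```

The contrast with the drop-then-register approach is that the swap is a single step, so there is no window where the table exists in a dropped-but-not-yet-registered state.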
core/src/main/java/org/apache/iceberg/rest/requests/RegisterTableRequest.java
… catalog Update REST RegisterTableRequest model and parser to support overwrite Enforce table UUID requirement Add commit conflict and retry test in TestHiveCommits
Signed-off-by: Hongyue Zhang <[email protected]>
…e constraint on table UUID check between existing and new TableMetadata
Signed-off-by: Hongyue Zhang <[email protected]>
   * false.
   */
  default Table registerTable(
      TableIdentifier identifier, String metadataFileLocation, boolean overwrite) {
    throw new UnsupportedOperationException("Registering tables is not supported");
Not sure if this is worthwhile, but you could decide to fail only if overwrite is true.
Thanks @RussellSpitzer, in this PR I introduced a new register-overwrite API on the catalog interface, and made it the new base for register-table (which calls the new API with overwrite=false).
Before:
default register-table API throws UnsupportedOperationException
After:
default register-table API -> register-overwrite(overwrite=false)
default register-overwrite API throws UnsupportedOperationException
The benefit is that all concrete catalog implementations can just implement the new API, and the interface is only used for delegation between the two APIs. This is easier to reason about (all register logic sits in one place) and follows the convention of drop-table and drop-table-purge.
The potential downside is that some custom catalog implementations outside the iceberg repo that implement the register-table API might need to update their code when upgrading the iceberg dependency, due to the interface change. I feel it's generally justified for customized catalogs to keep up with iceberg interface changes. Please let me know if you think otherwise.
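The delegation described above can be sketched with default methods. The types below are simplified stand-ins (the real interface is org.apache.iceberg.catalog.Catalog and returns Table, not String):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the default-method delegation: the 2-arg registerTable delegates
// to the new 3-arg API with overwrite=false, so concrete catalogs only need
// to implement the 3-arg variant.
interface RegisteringCatalog {
    // Existing API: kept for compatibility, now pure delegation.
    default String registerTable(String identifier, String metadataFileLocation) {
        return registerTable(identifier, metadataFileLocation, false);
    }

    // New API: the single place concrete catalogs implement registration.
    default String registerTable(String identifier, String metadataFileLocation, boolean overwrite) {
        throw new UnsupportedOperationException("Registering tables is not supported");
    }
}

class MapBackedCatalog implements RegisteringCatalog {
    private final Map<String, String> tables = new HashMap<>();

    @Override
    public String registerTable(String identifier, String metadataFileLocation, boolean overwrite) {
        if (tables.containsKey(identifier) && !overwrite) {
            throw new IllegalStateException("Table already exists: " + identifier);
        }
        tables.put(identifier, metadataFileLocation);
        return metadataFileLocation;
    }
}
```

A catalog that overrides only the 3-arg method automatically gets the 2-arg behavior for free, which is the "all register logic sits in one place" property mentioned above.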
    // register table t1 with metadata from table t2
    Table registered = catalog.registerTable(identT1, opsT2.current().metadataFileLocation(), true);

    assertThat(registered.uuid())
We can just check that it matches t2 uuid rather than checking it doesn't match t1.
In fact why aren't we checking if the T2 object matches the "registered" table? Shouldn't they just be completely identical?
yeah I think the assertions below on line 3210 are basically checking for complete equality.
        .isTrue();
    assertThat(operation(registered).refresh())
        .usingRecursiveComparison()
        // Nessie catalog holds a different Nessie commit-ID from which the metadata has been loaded.
I don't think we should have nessie specific issues in the core module
yeah I agree, but since CatalogTests is abstract and the existing TestNessieCatalog implements it, this requires excluding some of the table properties. Do you think I should add an assumeTrue to exclude Nessie from this test instead?
  }

  @Test
  public void testRegisterAndOverwriteExistingTable() throws IOException {
This probably shouldn't be supported at all for Hadoop Catalog? Won't this only work if you are replacing the Highest-metadata.json with a Higher-metadata.json?
yeah I think this is a good point, as Hadoop table operations always look for the highest version file in the metadata root.
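The highest-version lookup is why register-overwrite is awkward for HadoopCatalog: the "current" metadata is whichever vN.metadata.json has the largest N, so an overwrite only takes effect if its version number is higher. A minimal illustrative sketch of that resolution (not the actual Iceberg code):

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of version-based metadata resolution in a Hadoop-style
// table: the current metadata is simply the file with the highest version
// prefix, so registering an older/lower-versioned file cannot win.
class HadoopVersionResolution {
    static int version(String fileName) {
        // e.g. "v3.metadata.json" -> 3
        return Integer.parseInt(fileName.substring(1, fileName.indexOf('.')));
    }

    static String currentMetadata(List<String> metadataFiles) {
        return metadataFiles.stream()
            .max(Comparator.comparingInt(HadoopVersionResolution::version))
            .orElseThrow();
    }
}
```

Under this scheme, "overwriting" v3 with a file resolved as v2 would simply be ignored on the next lookup, which is why the test likely should not be supported for HadoopCatalog at all.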
… in HadoopCatalog
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Not stale, going to send a discussion email on dev-list to understand the feasibility of #13057
This PR adds a new register-table-with-overwrite option on the Catalog interface, to allow overwriting the table metadata of an existing Iceberg table. The overwrite is achieved via TableOperations.commit(base, new) for catalogs extending BaseMetastoreCatalog.