Core, API: Add overwrite option when registering an external table to a catalog #12228
base: main
Conversation
The Java CI failure is timing out on a concurrent fast append and seems unrelated to this change. @rdblue @RussellSpitzer @danielcweeks do you want to take a look?
Thanks for working on this, @dramaticlly!
@@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() {
    assertThat(catalog.dropTable(identifier)).isTrue();
  }

  @Test
  public void testRegisterAndOverwriteExistingTable() {
    C catalog = catalog();
Are we just adding a bucket to test the change? Why not just use the table UUID? I feel like we should be able to just:
Make Table 1
Make Table 2
Register-overwrite Table 1 with Table 2
Check that Table 1's metadata matches Table 2's?
Initially I think register with overwrite helps revert an existing table to a known previous healthy state. If we want to support overwriting with another table's metadata, that seems better suited to drop + register, to reflect the table UUID change.
The table spec asks implementations to throw an exception if a table's UUID does not match the expected UUID when refreshing metadata. What do you think?
I added a similar question in the code where it checks that the UUID doesn't change for register with overwrite. Because the API is called registerTable, I would think a UUID change should be allowed. Interested to hear other people's takes on this one.
> Initially I think register with overwrite helps revert an existing table to a known previous healthy state.

For this initial use case, I agree the UUID shouldn't change. But if we only want to solve this specific, narrower problem, maybe a narrower API would make more sense. Enforcing the UUID check for that narrow API is totally the right thing to do:
resetTable(TableIdentifier identifier, String metadataFileLocation)
Sharing my thoughts here: I believe the table UUID is the ideal way for a consumer to identify the uniqueness of a given table, instead of relying on the table identifier in a given catalog.
Today, a table operation checks on refresh whether the underlying UUID has changed, and the REST catalog also requires the table UUID to stay unchanged across replace and update operations. If we support register-overwrite with a foreign table, we silently break that assumption, which implies the catalog needs to evict the cached table (even with the same table identifier) and force a reload.
Personally I think it's probably better to only support the same table UUID on register with overwrite (which provides atomicity), and to support a table UUID change by dropping the table first and then re-registering.
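The UUID-match guard discussed above can be sketched as a small stand-alone check. This is a hypothetical helper with illustrative names, not the actual code in this PR:

```java
import java.util.Objects;

// Hypothetical guard mirroring the proposed constraint: refuse an overwrite when
// the new metadata carries a different table UUID than the registered table.
public class UuidGuard {
    static boolean sameTableUuid(String currentUuid, String newUuid) {
        return Objects.equals(currentUuid, newUuid);
    }

    // Throws when the overwrite would silently swap in a foreign table's metadata.
    static void validateOverwrite(String currentUuid, String newUuid) {
        if (!sameTableUuid(currentUuid, newUuid)) {
            throw new IllegalStateException(
                "Cannot overwrite: table UUID changed from " + currentUuid + " to " + newUuid);
        }
    }
}
```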
After some thought, I changed the behavior from reusing the existing commit logic (which passes ops.current()) to dropping the table first and then registering with the given metadata. This relaxes the constraint of the table UUID check, and also ensures that the latest metadataFileLocation in TableMetadata after the overwrite is the same one the user provided.
I also added a comment in the interface to highlight the potential table UUID change.
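The drop-then-register flow described here can be modeled with a toy in-memory catalog (a Map standing in for identifier-to-metadata-location entries). All names are illustrative, not the actual Iceberg API; the comment marks the non-atomic window that concerns reviewers later in this thread:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the drop-then-register overwrite flow (hypothetical, not Iceberg code).
public class DropThenRegister {
    static void registerWithOverwrite(Map<String, String> catalog, String ident, String metadataFile) {
        // Step 1: drop the existing entry (keeping data/metadata files on storage).
        catalog.remove(ident);
        // Step 2: register the user-provided metadata file as the table's latest state.
        // A failure between the two steps leaves the table dropped but not re-registered,
        // which is the intermediate-state risk of this approach.
        catalog.put(ident, metadataFile);
    }
}
```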
Let me unresolve this comment. I'm wondering if it is desirable to have two different tables share a UUID:
Make Table 1
Make Table 2
Register-overwrite Table 1 with Table 2
E.g., the MV spec PR currently defines storage table refresh-state with only UUID as the table/view identifier.
Interested to hear more inputs.
ops.commit(null, metadata);

TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : null;
ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), metadataFile));
I'm a little worried about passing through the current metadata here. Is this just a workaround for the normal commit logic?
If the metadata changes from "current" by the time an overwrite request goes through, we don't want a retry or a failure; it should still pass, right?
I was hoping to reuse the existing commit logic for atomicity support, and also for better lineage to track the previous/old metadata for Hive and JDBC tables.
As for a potential conflict when the base is out of date, I think that's a valid concern, and we probably do not want this operation to fail since the user's intent is to replace with the provided table metadata. I am thinking about adding a retry block to help; please let me know if you feel otherwise:
AtomicBoolean isRetry = new AtomicBoolean(false);
// commit with retry
Tasks.foreach(ops)
    .retry(COMMIT_NUM_RETRIES_DEFAULT)
    .exponentialBackoff(
        COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
        COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
        COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
        2.0 /* exponential */)
    .onlyRetryOn(CommitFailedException.class)
    .run(
        taskOps -> {
          TableMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current();
          isRetry.set(true);
          taskOps.commit(base, newMetadata);
        });
I ended up dropping the table first and then committing with the current TableMetadata = null, so we do not need to pass the current metadata here. This also avoids the need for a retry.
        "The requested metadata matches the existing metadata. No changes will be committed.");
    return new BaseTable(ops, fullTableName(name(), identifier), metricsReporter());
  }

  dropTable(identifier, false /* Keep all data and metadata files */);
> drop the table first and then register with the given metadata

What if the job fails between these two steps? We can end up with the table deleted (but the new metadata not registered), which is also not ideal.
Thinking about it again, enforcing a UUID match for overwrite seems reasonable.
Additional notes on switching from TableOperations.commit(ops.current(), newTableMetadata) to drop-and-then-re-register:
- This ensures the same input metadata.json is used for the commit and ends up as the latest TableMetadata.current().metadataLocation, as in "Core: Avoid creating new metadata file when registerTable API is used #6591". It reuses the existing doCommit logic (BaseMetastoreTableOperations.java, lines 147 to 151 in c16cefa) without changing the doCommit interface for all TableOperations:

  protected String writeNewMetadataIfRequired(boolean newTable, TableMetadata metadata) {
    return newTable && metadata.metadataFileLocation() != null
        ? metadata.metadataFileLocation()
        : writeNewMetadata(metadata, currentVersion() + 1);
  }

- This also relaxes the constraint of matching the table UUID between the existing table and the new metadata to be overwritten.
- Register-with-force should in general be retriable by the user, as the end state is having the input metadata as the latest state of the table, whereas ops.current() is subject to change under a race condition and would require a retry within the method.
> This ensures the same input metadata.json is used to commit
> Register-with-force should in general be retriable, as the end state is having the input metadata as the latest state of the table, whereas ops.current() is subject to change under a race condition

I understand this may look intuitive for the registerTable API. But what overwrite essentially does is update the state of an existing table. Hence, the current behavior of writeNewMetadataIfRequired is fine to me: it would create and commit a new metadata file with the same content as the input file.
Let's assume the current metadata file is meta-009.json. It is reasonable to me that registerTable("meta-005.json", true) would commit a new file meta-010.json with the same content as meta-005.json.
The real problem might be piggybacking overwrite onto registerTable. If it were a separate overwriteTable/resetTable API, then it wouldn't be confusing that the committed metadata file is a new file with the same content as the input metadata file.
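The separate, narrower API floated in this thread might look like the sketch below. The interface and the in-memory stand-in are entirely hypothetical; nothing here is part of the actual Iceberg Catalog interface:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical narrower API: repoint an *existing* table at a previous metadata file.
// A real implementation would also enforce the UUID-match check discussed above.
public interface ResettableCatalog {
    void resetTable(String identifier, String metadataFileLocation);

    // Minimal in-memory stand-in used only to make the sketch concrete.
    class InMemory implements ResettableCatalog {
        final Map<String, String> locations = new HashMap<>();

        @Override
        public void resetTable(String identifier, String metadataFileLocation) {
            if (!locations.containsKey(identifier)) {
                // Unlike registerTable, reset only applies to tables that already exist.
                throw new IllegalArgumentException("Table does not exist: " + identifier);
            }
            locations.put(identifier, metadataFileLocation);
        }
    }
}
```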
> This also relaxes the constraint of matching the table UUID between the existing table and the new metadata to be overwritten

This is where I have second thoughts about my earlier stance. overwrite should enforce a UUID match check. See my other comment.
I think I have a disagreement here. I think the behavior should be identical to registerTable without overwrite: the file that is being passed is the file that should be used for registration. For example, if I'm updating a table to match an existing one and I rely on the metadata.json path to tell which files are copied (like in DR), having a different file name could really make things difficult.
I do agree there is an issue where the job fails during the drop, or concurrent operations happen after the drop. I think things get pretty confusing there.
I really just wanted to swap out the metadata.json path in the catalog with this command and not go through the normal table machinery.
Steve changed the implementation to delete-then-register. That works around the commit API limitation (writing a new metadata JSON file), but this choice has the potential problem of leaving the table in a bad intermediate state (deleted but not re-registered).
If we really want to implement the overwrite properly, we would need to change/expand the TableOperations interface to support an atomic swap. The scope is non-trivial.
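The atomic swap being asked for has a compare-and-swap shape: replace the metadata pointer only if it still equals the expected value. As an illustration only (a real catalog would do this against its backing store, not an in-memory map), ConcurrentHashMap.replace(key, expected, next) models the semantics:

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of an atomic metadata-pointer swap; not the Iceberg TableOperations API.
public class AtomicSwapSketch {
    final ConcurrentHashMap<String, String> pointers = new ConcurrentHashMap<>();

    // Returns true only when the pointer was still `expected` at swap time,
    // so a concurrent commit in between causes the swap to fail rather than clobber.
    boolean swapMetadata(String ident, String expected, String next) {
        return pointers.replace(ident, expected, next);
    }
}
```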
- … catalog: Update REST RegisterTableRequest model and parser to support overwrite; enforce the table UUID requirement; add a commit conflict and retry test in TestHiveCommits
- …e constraint on table UUID check between existing and new TableMetadata

Signed-off-by: Hongyue Zhang <[email protected]>
This PR adds a new register-table-with-overwrite option to the Catalog interface to allow overwriting the table metadata of an existing Iceberg table. The overwrite is achieved via
TableOperations.commit(base, new)
for catalogs extending BaseMetastoreCatalog.
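The register-with-overwrite semantics the PR describes can be demonstrated with a toy in-memory catalog. The class and exception choice below are illustrative stand-ins (the real change lives in BaseMetastoreCatalog, which throws AlreadyExistsException):

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory catalog illustrating register-with-overwrite; hypothetical, not Iceberg code.
public class RegisterOverwriteDemo {
    final Map<String, String> tables = new HashMap<>();

    String registerTable(String ident, String metadataFile, boolean overwrite) {
        if (tables.containsKey(ident) && !overwrite) {
            // Without overwrite, registering over an existing table is an error.
            throw new IllegalStateException("Table already exists: " + ident);
        }
        // With overwrite, the user-provided file becomes the table's latest metadata.
        tables.put(ident, metadataFile);
        return metadataFile;
    }
}
```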