Should a Delta table’s TableId ever change? What are the reader/writer semantics if it does?
#5311
Given a Delta table whose version 0 defines a tableId (e.g., guid1), what is the correct interpretation if, at a later version n, the log contains a metadata action referencing a different tableId (e.g., guid2)? Should the table be considered the same logical table, with the tableId change ignored? Or does a new tableId imply a new logical table lineage, meaning anything before the change should no longer be considered part of the same history? In other words, when replaying the Delta log, does a tableId change:
What is the expected or specified resolution rule for clients and readers when such a tableId change is observed during log reconstruction? Thanks,
Replies: 5 comments
Adding @larsk-db, who might be able to clarify this or redirect us to the right person.
The tableId only changes in response to variants of (CREATE OR) REPLACE TABLE. Downstream consumers of the table should therefore treat it as a boundary that signals relatively incompatible changes (e.g. a completely different schema). This matters, for example, when trying to read the change data feed across this boundary. For the purposes of log replay, however, it can conceivably be ignored if that makes the implementation easier, since the newer metadata will simply override the older one in any correct replay implementation. Mildly related:
Thanks @larsk-db for your response. To clarify: when a new tableId is created (for example, via a (CREATE OR) REPLACE TABLE operation), can we expect the transaction log to fully capture all related data file changes? Specifically, will it remove all files associated with the previous tableId? Is there any scenario where the new tableId would not include remove actions and would instead preserve the existing data files? In other words, should we assume that the transaction log after the tableId change fully represents the current state of the table, or is it necessary to reference the log entries from before the change as well? Additionally, if we ignored the tableId change entirely, would it be safe to resolve the state of the Delta table solely based on the other actions (e.g.,
Yes
Mostly. An exception might be files that exist both before and after: think of a CLONE, for example, which may contain files that originate from the target table (e.g. a back-and-forth CLONE). I think it might be legal to retain the files, though I'm not quite sure whether that's what we do at the moment, or whether we just rename them and pretend they are new.
I thought so in my previous answer, but now after thinking of the CLONE case, I'm not so certain anymore. That might need to be investigated.
This should always be safe.
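To make the "ignore the tableId during replay" reading concrete, here is a rough sketch of what such a replay could look like. It is only an illustration under assumptions, not how any particular Delta client does it: `replay` and the sample `commits` are made-up names, commits are assumed to be supplied in ascending version order as lists of JSON action lines, and files are keyed by `path` alone (a real reader would also have to account for deletion vectors, protocol actions, and checkpoints). The newest `metaData` action wins, `add`/`remove` actions decide the active files, and a change in `metaData.id` is only recorded as a boundary for downstream consumers.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def replay(commits: Iterable[Tuple[int, List[str]]]):
    """Replay commits (ascending versions) and return (metadata, active files, boundaries)."""
    metadata: Optional[dict] = None
    active: Dict[str, dict] = {}   # path -> latest add action (simplified file key)
    boundaries: List[int] = []     # versions at which metaData.id changed

    for version, lines in commits:
        for line in lines:
            action = json.loads(line)
            if "metaData" in action:
                new_meta = action["metaData"]
                # A different id does not affect replay; it is only noted as a boundary.
                if metadata is not None and new_meta["id"] != metadata["id"]:
                    boundaries.append(version)
                metadata = new_meta  # newer metadata overrides older metadata
            elif "add" in action:
                active[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                active.pop(action["remove"]["path"], None)
            # Other actions (protocol, commitInfo, txn, ...) are skipped in this sketch.
    return metadata, active, boundaries


# Hypothetical log: version 2 behaves like a REPLACE that swaps the tableId.
commits = [
    (0, [
        json.dumps({"metaData": {"id": "guid1"}}),
        json.dumps({"add": {"path": "a.parquet"}}),
        json.dumps({"add": {"path": "b.parquet"}}),
    ]),
    (1, [json.dumps({"add": {"path": "c.parquet"}})]),
    (2, [
        json.dumps({"metaData": {"id": "guid2"}}),
        json.dumps({"remove": {"path": "a.parquet"}}),
        json.dumps({"remove": {"path": "b.parquet"}}),
        json.dumps({"remove": {"path": "c.parquet"}}),
        json.dumps({"add": {"path": "d.parquet"}}),
    ]),
]

metadata, active, boundaries = replay(commits)
print(metadata["id"], sorted(active), boundaries)
```

In this sketch the resulting file set depends only on the add and remove actions. If a REPLACE retained a file without removing it (the CLONE case above), that file would simply stay active, which is why replay does not need to special-case the tableId; a consumer that cares about lineage, such as a change data feed reader, could stop at the versions listed in `boundaries`.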
Thanks @larsk-db for your response. :)