-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[Kernel] Add support for ICT-based time travel (part 1) #4581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@scovich I have copied over your comments from https://github.com/delta-io/delta/pull/4483/files#diff-07b47ed7d50294a001f579299ebd0e7d2328a991babcf6ba23af3c9cacdf8cdcR182 to this PR. This PR just focuses on the binary search part. I will add the exponential search incrementally to make the reviews simpler. |
if (ictEnablementCommit.getTimestamp() <= timestamp) { | ||
// The target commit is in the ICT range. | ||
long latestSnapshotTimestamp = latestSnapshot.getTimestamp(engine); | ||
if (latestSnapshotTimestamp <= timestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment that explains this: https://github.com/delta-io/delta/pull/4581/files#diff-07b47ed7d50294a001f579299ebd0e7d2328a991babcf6ba23af3c9cacdf8cdcR91
searchResult = | ||
lastCommitBeforeOrAtTimestamp(commits, timestamp) | ||
.orElse( | ||
commits.get(0)); // This is only returned if canReturnEarliestCommit (see below) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the formatter. I can't find a way to undo this.
* @return An optional which contains a tuple containing the index and the value of the greatest | ||
* lower bound when found, or an empty optional if not found. | ||
*/ | ||
public static Optional<Tuple2<Long, Long>> greatestLowerBound( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scovich I have updated the implementation so that it is more explicit about when None is returned. I have also added explicit test cases for greatestLowerBound
a347fa1
to
c12d9a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks
+ "based lookup. This can happen when the commit log is corrupted or when " | ||
+ "there is a parallel operation like metadata cleanup that is deleting " | ||
+ "commits. Please retry the query.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe skip the corruption mention? Metadata cleanup is by far the more likely scenario, because Delta clients don't honor the configured data retention window when resolving time travel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, updated.
if (enablementTimestampOpt.isPresent() && enablementVersionOpt.isPresent()) { | ||
return new Commit(enablementVersionOpt.get(), enablementTimestampOpt.get()); | ||
} else if (!enablementTimestampOpt.isPresent() && !enablementVersionOpt.isPresent()) { | ||
// This means that ICT has been enabled for the entire history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help me to understand why this is the case? If my table doesn't support ICT --> then won't these fields be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we have already established that ICT is enabled on the table (L209). When ICT is enabled, this fields being empty indicates that ICT has been enabled for all of available history.
* Get the version of the checkpoint, checksum or delta file. Returns an empty optional if the | ||
* file is not a checkpoint, checksum or delta file. | ||
*/ | ||
public static Optional<Long> getFileVersionOpt(Path path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change? I don't see any usages of getFileVersionOpt
--> maybe I'm just missing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, this is not need for part 1. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Thanks!
e45ebb5
to
3c5fb63
Compare
Which Delta project/connector is this regarding?
Description
Adds initial support for performing ICT-based time travel. More specifically, this PR:
How was this patch tested?
Added test cases to DeltaHistoryManagerSuite
Does this PR introduce any user-facing changes?
No
Resolves: #4294