-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Kernel] Add support for ICT-based time travel (part 1) #4581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@scovich I have copied over your comments from https://github.com/delta-io/delta/pull/4483/files#diff-07b47ed7d50294a001f579299ebd0e7d2328a991babcf6ba23af3c9cacdf8cdcR182 to this PR. This PR just focuses on the binary search part. I will add the exponential search incrementally to make the reviews simpler. |
if (ictEnablementCommit.getTimestamp() <= timestamp) { | ||
// The target commit is in the ICT range. | ||
long latestSnapshotTimestamp = latestSnapshot.getTimestamp(engine); | ||
if (latestSnapshotTimestamp <= timestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment that explains this: https://github.com/delta-io/delta/pull/4581/files#diff-07b47ed7d50294a001f579299ebd0e7d2328a991babcf6ba23af3c9cacdf8cdcR91
searchResult = | ||
lastCommitBeforeOrAtTimestamp(commits, timestamp) | ||
.orElse( | ||
commits.get(0)); // This is only returned if canReturnEarliestCommit (see below) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the formatter. I can't find a way to undo this.
* @return An optional which contains a tuple containing the index and the value of the greatest | ||
* lower bound when found, or an empty optional if not found. | ||
*/ | ||
public static Optional<Tuple2<Long, Long>> greatestLowerBound( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scovich I have updated the implementation so that it is more explicit about when None is returned. I have also added explicit test cases for greatestLowerBound
a347fa1
to
c12d9a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks
+ "based lookup. This can happen when the commit log is corrupted or when " | ||
+ "there is a parallel operation like metadata cleanup that is deleting " | ||
+ "commits. Please retry the query.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe skip the corruption mention? Metadata cleanup is by far the more likely scenario, because Delta clients don't honor the configured data retention window when resolving time travel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, updated.
if (enablementTimestampOpt.isPresent() && enablementVersionOpt.isPresent()) { | ||
return new Commit(enablementVersionOpt.get(), enablementTimestampOpt.get()); | ||
} else if (!enablementTimestampOpt.isPresent() && !enablementVersionOpt.isPresent()) { | ||
// This means that ICT has been enabled for the entire history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help me to understand why this is the case? If my table doesn't support ICT --> then won't these fields be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we have already established that ICT is enabled on the table (L209). When ICT is enabled, this fields being empty indicates that ICT has been enabled for all of available history.
* Get the version of the checkpoint, checksum or delta file. Returns an empty optional if the | ||
* file is not a checkpoint, checksum or delta file. | ||
*/ | ||
public static Optional<Long> getFileVersionOpt(Path path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change? I don't see any usages of getFileVersionOpt
--> maybe I'm just missing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, this is not need for part 1. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Thanks!
e45ebb5
to
3c5fb63
Compare
<!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> Adds initial support for performing ICT-based time travel. More specifically, this PR: 1. Updates all code paths that rely on timestamp-based time travel such that they all create a snapshot of the latest table version first. This allows us to determine whether ICT is enabled. 2. Adds an initial binary-search-based greatest lower bound implementation for time traveling based on ICT. A follow PR will make this more efficient by intelligently utilizing file modification timestamps for narrowing the search window. ## How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> Added test cases to DeltaHistoryManagerSuite ## Does this PR introduce _any_ user-facing changes? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> No Resolves: delta-io#4294
Which Delta project/connector is this regarding?
Description
Adds initial support for performing ICT-based time travel. More specifically, this PR:
How was this patch tested?
Added test cases to DeltaHistoryManagerSuite
Does this PR introduce any user-facing changes?
No
Resolves: #4294