-
Couldn't load subscription status.
- Fork 118
chore!: Modify read_actions to not require callers to know details about checkpoints. #1407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore!: Modify read_actions to not require callers to know details about checkpoints. #1407
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1407 +/- ##
==========================================
+ Coverage 84.66% 84.68% +0.01%
==========================================
Files 116 116
Lines 29723 29762 +39
Branches 29723 29762 +39
==========================================
+ Hits 25165 25203 +38
Misses 3343 3343
- Partials 1215 1216 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly looks good thanks! couple of small things
| assert!(record_batch.column(3).is_null(0)); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for completeness should probably check that it works if you have ADD_NAME and also something that's not a file action. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in #1406 (should show up here as well).
kernel/src/log_segment.rs
Outdated
| .read_json_files( | ||
| &commits_and_compactions, | ||
| commit_read_schema, | ||
| commit_read_schema.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be left over from some refactoring that didn't work out. removing it does not seem to cause issues.
kernel/src/log_segment.rs
Outdated
| let checkpoint_read_schema = if !need_file_actions || | ||
| // Don't duplicate the column if it exists | ||
| action_schema.contains(SIDECAR_NAME) || | ||
| // Can't be v2 checkpoint so side case in't needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Can't be v2 checkpoint so side case in't needed | |
| // Can't be v2 checkpoint, so side cars aren't needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rephrased this and the comment above.
|
|
||
| pub(crate) const INTERNAL_DOMAIN_PREFIX: &str = "delta."; | ||
|
|
||
| static LOG_SCHEMA: LazyLock<SchemaRef> = LazyLock::new(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now the COMMIT_SCHEMA isn't it. How horrible is it to rename it that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now private, do we mean also renaming get_log_schema() to get_commit_schema(), might be somewhat large, if it is OK, I'll do that it in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, even though it's private I'd like the naming to be more clear. But sounds good to take as a follow-up. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I think renaming here or in a follow-up is the last remaining question.
8975f45 to
2e102c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
|
@emkornfield I reviewed the whole thing (although thanks for breaking it up!) so I'm okay with you just merging this and closing the parent. |
2e102c4 to
37f161d
Compare
37f161d to
ae64fb5
Compare
Rename schema's for more accurately describe schemas. This is follow-up from #1407
🥞 Stacked PR
Use this link to review incremental changes.
This PR proposes refactoring log replay to focus on actions that can occur in commits. It moves the logic for deciding on reading other checkpoint only metadata into
read_actions, to isolate other APIs from understanding the details of what is needed to expand checkpoints. it also makes the common case for reading actions in the code (identical checkpoint schema and commit file schema) the common case onlog_segmentwith a new method for the use-case where schemas are different.