-
Couldn't load subscription status.
- Fork 118
feat!(catalog-managed): introduce Committer (with FileSystemCommitter) #1349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1349 +/- ##
========================================
Coverage 84.65% 84.66%
========================================
Files 115 116 +1
Lines 29601 29723 +122
Branches 29601 29723 +122
========================================
+ Hits 25059 25165 +106
- Misses 3331 3343 +12
- Partials 1211 1215 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bd31ba4 to
e73b9da
Compare
e73b9da to
5394cb8
Compare
e3cc553 to
d6a63f4
Compare
d6a63f4 to
7a26d0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good. I just advocate for removing read_snapshot from FileSystemCommitter.
kernel/src/committer.rs
Outdated
| pub(crate) struct FileSystemCommitter { | ||
| read_snapshot: SnapshotRef, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me that a FileSystemCommitter needs a read_snapshot. Feels like this is a check that needs to be done in the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I advocate for:
Transaction.committer: Option<Box>
If !is_catalogManagad => set the filesystem committer
upon commit => If no committer specified, return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i had toyed with this but ultimately decided it seemed cleaner to just always have a committer - and pushing the check into the committer itself is misuse-resistant (otherwise semantics are "use this method but only if you aren't catalog-managed")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move to taking committer as an arg to Transaction::try_new we can do the check there potentially. Might need one more trait function on committer like is_catalog() or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, one small thing
kernel/src/transaction/mod.rs
Outdated
| Ok(Transaction { | ||
| read_snapshot, | ||
| read_snapshot: read_snapshot.clone(), | ||
| committer: Box::new(FileSystemCommitter::new(read_snapshot)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a sharp edge. I don't think we should default to a file system committer if we just give a snapshot, since that's not the semantics that just having a snapshot imply. Could we instead take the committer as an argument so callers have to be explicit?
Reading lower I see you do catch the case that we passed a cc table here, but I still prefer explicit at compile time over a runtime error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yea fair - I'm mostly attempting to keep the FileSystemCommitter an internal detail though - for non-cc tables everything can just proceed as normal, but for cc tables we check that you did pass in a different committer.
What do you think? Do you think it's better to just expose FileSystemCommitter as pub and then make all callers include a Committer explicitly in the builder? I do agree that perhaps being explicit is nice but it comes at the cost of users having to suddenly grok a "filesystem committer"..
kernel/src/committer.rs
Outdated
| pub(crate) struct FileSystemCommitter { | ||
| read_snapshot: SnapshotRef, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move to taking committer as an arg to Transaction::try_new we can do the check there potentially. Might need one more trait function on committer like is_catalog() or something.
88843ea to
4c8cc9d
Compare
3a482d6 to
f285d6c
Compare
f285d6c to
20dcf84
Compare
What changes are proposed in this pull request?
add new
Committertrait for catalog-managed tables to provide their own catalog-based committer. when not provided we (1) ensure it's not catalog-managed (which requires a committer) and (2) fall back to the "old" way of atomically writing a JSON file, now encapsulated in a newFileSystemCommitternew APIs include:
Committertrait itself: currently just has one requiredcommitmethod which the kernel calls itself (there was previously some discussion on handing off actions to the engine to commit, but this having kernel call commiter.commit alows kernel to orchestrateTransactionstates more effectively)CommitMetadataandCommitResponse- new types for the input/output ofcommit- for now they include the minimal information to carry out a commit (paths, version) and report back either success or conflict at a version.I snuck in a somewhat ancillary change: adding a
LogRoottype which allows us to take a more structured approach to handling log paths and generating staged commit/published commit paths from it. Would like to gather some feedback and perhaps pursue as a follow-up?How was this change tested?
added a UT for new behavior, otherwise existing suffice - the new Committer is effectively a refactor.