-
Couldn't load subscription status.
- Fork 118
feat!: Add with_data_change to transaction #1281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Add with_data_change to transaction #1281
Conversation
|
Need to look into CI failures. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1281 +/- ##
==========================================
- Coverage 84.96% 84.96% -0.01%
==========================================
Files 114 114
Lines 29439 29445 +6
Branches 29439 29445 +6
==========================================
+ Hits 25012 25017 +5
Misses 3220 3220
- Partials 1207 1208 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
553b09b to
14e87e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally good. I'm not sure we need to worry about maintaining backwards compatibility though. Let's just break things :)
ffi/src/transaction/mod.rs
Outdated
| // Explicitly define the schema here which duplicates Transaction::add_files_schema, to demonstrate | ||
| // engines can still specify data change explicitly (i.e. preserve backwards compatibility for | ||
| // non-default engines). | ||
| let mut schema_fields = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add a static method on Transaction that can return the right schema depending on if you want data change or not, and call that, since you have both correct schemas statically in txn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to decouple the test from add_schema code in general. I think if we are going to make more modifications to the code maintaining the static would potentially add complexity for test only code. If this is a common pattern in rust, happy to adapt it.
This is also only necessary because we want to test backwards compatibility, if we don't care about it it would be easier to cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after discussing offline, i've removed backwads compatibility and now pass this schema through from the transaction add_files_schema
56720ad to
0b10e6f
Compare
|
@nicklan thanks for the review, the code is now simplified to not be backwards compatible. Is the windows failure a known issue (on the surface seems like it might be unrelated to this PR but could be missing something) |
b8bbfa5 to
48d49e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks for simplifying! one more suggestion on the ffi
ffi/src/transaction/mod.rs
Outdated
| /// | ||
| /// # Safety | ||
| /// | ||
| /// Caller is responsible for passing a valid handle. CONSUMES TRANSACTION and commit info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These apis that consume the argument are inherently more dangerous because there's no compiler on the other side enforcing things like there in pure rust.
Could we just as_ref_mut() it instead, and set the bool on the txn? I think that might require a new method in handle.rs but it should be pretty trivial to write that (and if it's not stop and we can do it this way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take a look at doing this. I think the current underlying function consumes the input, so I might need to adjust there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a a new set_data_change method on Transaction and called it here (renaming this method as well) to avoid consuming the value.
Please let me know if this is what you had in mind. It seems at least at the surface as_mut on handle works here but I guess there might be subtleties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ffi/src/transaction/mod.rs
Outdated
| unsafe { | ||
| txn_with_engine_info | ||
| .shallow_copy() | ||
| .as_ref() | ||
| .add_files_schema() | ||
| } | ||
| .as_ref() | ||
| .try_into_arrow()?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: preference for pulling this out to a separate binding instead of inline deep nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
kernel/src/engine/default/parquet.rs
Outdated
| /// Note that the schema does not contain the dataChange column. Users of the default engine | ||
| /// must add this column by calling [`crate::transaction::Transaction::with_data_change`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Note that the schema does not contain the dataChange column. Users of the default engine | |
| /// must add this column by calling [`crate::transaction::Transaction::with_data_change`]. | |
| /// Note that the schema does not contain the dataChange column. In order to set `data_change` flag, use [`crate::transaction::Transaction::with_data_change`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| /// Same as [`Transaction::with_data_change`] but set the value directly instead of | ||
| /// using a fluent API. | ||
| pub fn set_data_change(&mut self, data_change: bool) { | ||
| self.data_change = data_change; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm if this is mostly used for FFI how about just make this internal_api so we don't have many ways to do the same thing? (not a super strong preference, just hoping to simplify user API)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
kernel/src/transaction/mod.rs
Outdated
| /// Concretely, it is the expected schema for [`EngineData`] passed to [`add_files`], as it is the base | ||
| /// for constructing an add_file. Each row represents metadata about a | ||
| /// file to be added to the table. Kernel takes this information and extends it to the full add_file | ||
| /// action schema, adding additional fields (e.g., baseRowID) as necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// action schema, adding additional fields (e.g., baseRowID) as necessary. | |
| /// action schema, adding internal fields (e.g., baseRowID) as necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| /// | ||
| /// Note: While currently static, in the future the schema might change depending on | ||
| /// options set on the transaction or features enabled on the table. | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, thank you so much for the docs improvements!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh haha realizing this is code movement :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but still appreciate the other docs above etc!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, lgtm assuming you address zach's comments!
3ed0d98 to
67b8b31
Compare
1b0849e to
c20c377
Compare
6da6011 to
f431f59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, sorry forgot to post some of these comments!
| /// | ||
| /// Note: While currently static, in the future the schema might change depending on | ||
| /// options set on the transaction or features enabled on the table. | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh haha realizing this is code movement :)
| /// | ||
| /// Note: While currently static, in the future the schema might change depending on | ||
| /// options set on the transaction or features enabled on the table. | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but still appreciate the other docs above etc!)
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](delta-io#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](delta-io#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](delta-io#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
🥞 Stacked PR
Use this link to review incremental changes.
This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in.
In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation.
Some implications of this change:
Testing:
BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method.
add_files_schemais moved to be scoped on a the transaction.