-
Couldn't load subscription status.
- Fork 118
feat!: add remove_files API #1353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3512a37 to
11784e3
Compare
45fcb18 to
d59a24e
Compare
52fe8f1 to
b31735d
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1353 +/- ##
==========================================
+ Coverage 84.96% 85.00% +0.04%
==========================================
Files 114 114
Lines 29445 29571 +126
Branches 29445 29571 +126
==========================================
+ Hits 25017 25137 +120
+ Misses 3220 3216 -4
- Partials 1208 1218 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4338d44 to
94e2108
Compare
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
94e2108 to
fd28a11
Compare
| .with_inserted_field( | ||
| // extended_file_metadata | ||
| Some("path"), | ||
| Expression::literal(true).into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to figure out if this needs to be adaptive when the fields it is true for are null, or if hard-coded true is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Java kernel also hard-codes this.
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](delta-io#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](delta-io#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](delta-io#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
🥞 Stacked PR
Use this link to review incremental changes.
This PR addes the ability to remove files in a Transaction.
To accomplish this we re-use FilteredEngineData returned in the scan plan to remove file. In order to not drop fields some additional metadata is not passed through via "fileLevelConstants" struct, and transformed back. This does not add FFI Bindings yet.
IMPORTANT: One edge case this introduced is the ability to potentially drop stats if a table only has
parsed_statsavailable in checkpoints, as kernel does not currently read them. We should discuss if we are OK to merge given this constraint.Testing: Add unit tests to ensure serialized fields are round-tripped and that scan planning ignores the files.
BREAKING_CHANGE: Schema for scan rows is changed but this should mostly be backwards compatible.