-
Couldn't load subscription status.
- Fork 118
feat!: Add optional stats field to remove action #1390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
08fdb5c to
d959169
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1390 +/- ##
==========================================
- Coverage 84.96% 84.96% -0.01%
==========================================
Files 114 114
Lines 29445 29449 +4
Branches 29445 29449 +4
==========================================
+ Hits 25017 25020 +3
Misses 3220 3220
- Partials 1208 1209 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Please also run+fix the examples (inspect-table is already broken from previous schema changes that weren't incorporated, #1372)
d959169 to
05176bf
Compare
05176bf to
7637fb9
Compare
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
7637fb9 to
9317082
Compare
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to review incremental changes. - [**stack/with_data_change**](delta-io#1281) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)] - [stack/add_stats_to_remove](delta-io#1390) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)] - [stack/remove_files](delta-io#1353) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)] --------- This PR adds a with_data_change method to apply whether a transaction represents a data change to all the file additions (and in the future removals). This helps prevents clients from making accidentally mixing files marked with dataChange = true or false, and is in the long term the direction we want to go in. In the future we will also like want to physically record dataChange at the commit level, and move away from per file denotation. Some implications of this change: * The schema of add_files is now dependent on whether this flag is sent (this is to allow connectors to maintain backwards compatibility). * Change the default engine to no longer pass through dataChange at the file level but instead require using the new setter. Testing: 1. Existing FFI tests verify backwards compatibility. 2. Write tests now call this method with the default engine and no change of output happens. 3. An additional test is added to verify setting the flag to false, writes the appropriate value on add actions. BREAKING CHANGE: Engines must use with_data_change on the transaction level instead of passing it to the method. `add_files_schema` is moved to be scoped on a the transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, but I think there's a small bug
| let base_row_id: Option<i64> = getters[12].get_opt(row_index, "remove.baseRowId")?; | ||
| let default_row_commit_version: Option<i64> = | ||
| getters[13].get_opt(row_index, "remove.defaultRowCommitVersion")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have to bump up too right?
| let base_row_id: Option<i64> = getters[13].get_opt(row_index, "remove.baseRowId")?; | |
| let default_row_commit_version: Option<i64> = | |
| getters[14].get_opt(row_index, "remove.defaultRowCommitVersion")?; |
Probably indicates we have a lack of test coverage for these
🥞 Stacked PR
Use this link to review incremental changes.
An optional stats field is part of the remove schema. This PR adds it here. While utility of stats can be limited it is important to propagate for remove_actions in transactions to appropriately record number of records removed on a remove action. It is likely also important to not drop this field when doing things like log_compaction.
BREAKING CHANGE: Adds a field in the middle of a the remove_file schema.