-
Couldn't load subscription status.
- Fork 118
refactor: Make get_cdf_transform_expr return Option<ExpressionRef> #1401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Make get_cdf_transform_expr return Option<ExpressionRef> #1401
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1401 +/- ##
=======================================
Coverage 84.65% 84.65%
=======================================
Files 115 115
Lines 29557 29601 +44
Branches 29557 29601 +44
=======================================
+ Hits 25021 25059 +38
- Misses 3329 3331 +2
- Partials 1207 1211 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly lgtm, but I think we should push the optional logic lower if we can.
| partition_values.extend(cdf_values); | ||
|
|
||
| get_transform_expr(transform_spec, partition_values, physical_schema) | ||
| let expr = get_transform_expr(transform_spec, partition_values, physical_schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it not make more sense to do the is_identity check inside get_transform_expr and have that return: DeltaResult<Option<ExpressionRef>>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i think it'd look like this:
if transform_spec.empty() {
return Ok(None);
}
// Regular get_transform_expr code
let mut transform = Transform::new_top_level();that way, we avoid any allocation of expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried doing this, but ColumnMapping breaks for normal read paths. We still need the Transform expression, even if it's the identity, for the evaluator to apply the logical schema and map column names in
delta-kernel-rs/kernel/src/scan/state.rs
Line 96 in 2ec1462
| pub fn transform_to_logical( |
I moved the transform expr skipping up so we don't construct the Expression unnecessarily for CDFs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, okay thanks. I guess an "identity" expression does communicate that we need some kind of transform, but it's at the schema level. I'll think a bit if we could express this better, but for this PR this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
| partition_values.extend(cdf_values); | ||
|
|
||
| get_transform_expr(transform_spec, partition_values, physical_schema) | ||
| let expr = get_transform_expr(transform_spec, partition_values, physical_schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, okay thanks. I guess an "identity" expression does communicate that we need some kind of transform, but it's at the schema level. I'll think a bit if we could express this better, but for this PR this is fine.
What changes are proposed in this pull request?
#1378
Make get_cdf_transform_expr return Option to avoid creating unnecessary identity expressions when no transformation is needed for CDF scans.
How was this change tested?
Existing unit tests