Skip to content

Conversation

@DrakeLin
Copy link
Collaborator

What changes are proposed in this pull request?

#1378
Make get_cdf_transform_expr return Option to avoid creating unnecessary identity expressions when no transformation is needed for CDF scans.

How was this change tested?

Existing unit tests

@codecov
Copy link

codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 89.09091% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.65%. Comparing base (2ec1462) to head (d8637ac).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/table_changes/physical_to_logical.rs 88.88% 1 Missing and 4 partials ⚠️
kernel/src/table_changes/scan.rs 90.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1401   +/-   ##
=======================================
  Coverage   84.65%   84.65%           
=======================================
  Files         115      115           
  Lines       29557    29601   +44     
  Branches    29557    29601   +44     
=======================================
+ Hits        25021    25059   +38     
- Misses       3329     3331    +2     
- Partials     1207     1211    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly lgtm, but I think we should push the optional logic lower if we can.

partition_values.extend(cdf_values);

get_transform_expr(transform_spec, partition_values, physical_schema)
let expr = get_transform_expr(transform_spec, partition_values, physical_schema)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it not make more sense to do the is_identity check inside get_transform_expr and have that return: DeltaResult<Option<ExpressionRef>>?

Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i think it'd look like this:

if transform_spec.empty() {
    return Ok(None);
}
// Regular get_transform_expr code
let mut transform = Transform::new_top_level();

that way, we avoid any allocation of expression.

Copy link
Collaborator Author

@DrakeLin DrakeLin Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried doing this, but ColumnMapping breaks for normal read paths. We still need the Transform expression, even if it's the identity, for the evaluator to apply the logical schema and map column names in

pub fn transform_to_logical(

I moved the transform expr skipping up so we don't construct the Expression unnecessarily for CDFs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, okay thanks. I guess an "identity" expression does communicate that we need some kind of transform, but it's at the schema level. I'll think a bit if we could express this better, but for this PR this is fine.

@DrakeLin DrakeLin requested a review from nicklan October 20, 2025 23:21
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

partition_values.extend(cdf_values);

get_transform_expr(transform_spec, partition_values, physical_schema)
let expr = get_transform_expr(transform_spec, partition_values, physical_schema)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, okay thanks. I guess an "identity" expression does communicate that we need some kind of transform, but it's at the schema level. I'll think a bit if we could express this better, but for this PR this is fine.

@DrakeLin DrakeLin merged commit 40bc46b into delta-io:main Oct 21, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants