@DrakeLin
Collaborator

@DrakeLin DrakeLin commented Oct 1, 2025

What changes are proposed in this pull request?

Consolidates field handling for regular scans and Change Data Feed (CDF) scans by introducing a `TransformFieldClassifier` trait.
Removes `all_fields` and `ColumnType`.
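
A rough sketch of what a classifier trait of this kind could look like, so readers can picture the consolidation. Only `TransformFieldClassifier` and `classify_field` appear in this PR; `FieldTransform`, `ScanClassifier`, `CdfClassifier`, `classify_schema`, and every signature below are assumptions made for illustration, not the kernel's actual API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a per-field transform.
#[derive(Debug, PartialEq)]
enum FieldTransform {
    /// Value comes from file metadata (for example a partition value).
    MetadataDerived { insert_at: usize },
    /// Value is produced by the scan itself (for example a CDF column).
    Generated { insert_at: usize },
}

/// Decides whether a logical field needs a transform.
/// `None` means the field is read directly from the data file.
trait TransformFieldClassifier {
    fn classify_field(
        &self,
        name: &str,
        index: usize,
        partition_columns: &HashSet<String>,
    ) -> Option<FieldTransform>;
}

/// Assumed classifier for regular scans: only partition columns need a transform.
struct ScanClassifier;

/// Assumed classifier for CDF scans: CDF columns first, then the regular rules.
struct CdfClassifier;

/// Column names that Delta's Change Data Feed adds to the output.
const CDF_COLUMNS: [&str; 3] = ["_change_type", "_commit_version", "_commit_timestamp"];

impl TransformFieldClassifier for ScanClassifier {
    fn classify_field(
        &self,
        name: &str,
        index: usize,
        partition_columns: &HashSet<String>,
    ) -> Option<FieldTransform> {
        if partition_columns.contains(name) {
            Some(FieldTransform::MetadataDerived { insert_at: index })
        } else {
            None
        }
    }
}

impl TransformFieldClassifier for CdfClassifier {
    fn classify_field(
        &self,
        name: &str,
        index: usize,
        partition_columns: &HashSet<String>,
    ) -> Option<FieldTransform> {
        if CDF_COLUMNS.iter().any(|c| *c == name) {
            Some(FieldTransform::Generated { insert_at: index })
        } else {
            ScanClassifier.classify_field(name, index, partition_columns)
        }
    }
}

/// One shared loop serves both scan types; only the classifier differs.
fn classify_schema<C: TransformFieldClassifier>(
    classifier: &C,
    logical_fields: &[&str],
    partition_columns: &HashSet<String>,
) -> (Vec<String>, Vec<FieldTransform>) {
    let mut read_fields = Vec::new();
    let mut transforms = Vec::new();
    for (index, name) in logical_fields.iter().enumerate() {
        match classifier.classify_field(name, index, partition_columns) {
            Some(transform) => transforms.push(transform),
            None => read_fields.push((*name).to_string()),
        }
    }
    (read_fields, transforms)
}
```

Under these assumptions, calling `classify_schema(&ScanClassifier, ...)` and `classify_schema(&CdfClassifier, ...)` runs the same loop; the CDF variant additionally pulls the change-feed columns out of the fields read from the file.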

How was this change tested?

Existing unit tests

@DrakeLin DrakeLin changed the title vibe central [Do Not Review] vibe central Oct 1, 2025
@DrakeLin DrakeLin marked this pull request as draft October 1, 2025 05:52
@codecov

codecov bot commented Oct 1, 2025

Codecov Report

❌ Patch coverage is 95.45455% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.90%. Comparing base (f431de0) to head (095156b).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| kernel/src/table_changes/scan.rs | 91.66% | 2 Missing and 3 partials ⚠️ |
| kernel/src/table_changes/physical_to_logical.rs | 93.54% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1359   +/-   ##
=======================================
  Coverage   84.89%   84.90%           
=======================================
  Files         113      114    +1     
  Lines       28966    28935   -31     
  Branches    28966    28935   -31     
=======================================
- Hits        24592    24566   -26     
  Misses       3200     3200           
+ Partials     1174     1169    -5     

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Oct 1, 2025
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch 3 times, most recently from 6bb2d3f to 15e632a Compare October 1, 2025 22:05
@DrakeLin DrakeLin changed the title [Do Not Review] vibe central refactor: Consolidate regular scan and CDF scan field handling Oct 2, 2025
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch 2 times, most recently from 9127339 to e49f2f3 Compare October 3, 2025 00:38
@DrakeLin DrakeLin marked this pull request as ready for review October 3, 2025 00:38
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch 3 times, most recently from f1ea04a to 1908575 Compare October 3, 2025 21:49
Collaborator

@OussamaSaoudi OussamaSaoudi left a comment

mostly good, just some style stuff 👍

@DrakeLin DrakeLin requested a review from OussamaSaoudi October 6, 2025 18:53
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch from 5d9d80f to 5d64781 Compare October 6, 2025 22:01
Collaborator

@OussamaSaoudi OussamaSaoudi left a comment

looks good, just some small testing nits

Comment on lines 80 to 84
pub(crate) fn get_cdf_transform_expr(
    scan_file: &CdfScanFile,
    logical_schema: &SchemaRef,
    transform_spec: &TransformSpec,
    state_info: &StateInfo,
    physical_schema: &StructType,
) -> DeltaResult<ExpressionRef> {
Collaborator

I think we should probably avoid creating a new expression if it's going to end up being the identity expression anyway. Could you add an issue to make this return Option<ExpressionRef>?

Collaborator

Then we also get to avoid creating the empty_spec needlessly.
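
To make the suggestion concrete, here is a minimal sketch of the "return `None` instead of an identity expression" idea. `Expression`, `transform_expr`, and `apply` are simplified, hypothetical stand-ins written for this illustration; they are not the kernel's real types or the signature of `get_cdf_transform_expr`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the kernel's expression type.
#[derive(Debug, PartialEq)]
enum Expression {
    /// Insert literal values at the given output positions.
    Transform(Vec<(usize, String)>),
}

type ExpressionRef = Arc<Expression>;

/// Returns `None` when no field needs rewriting, so no identity
/// expression (and no empty spec) is ever allocated.
fn transform_expr(field_transforms: Vec<(usize, String)>) -> Option<ExpressionRef> {
    if field_transforms.is_empty() {
        return None;
    }
    Some(Arc::new(Expression::Transform(field_transforms)))
}

/// Caller side: pass the batch through untouched when there is no expression.
fn apply(batch: Vec<String>, expr: Option<&ExpressionRef>) -> Vec<String> {
    let expr = match expr {
        Some(expr) => expr,
        None => return batch,
    };
    // Single-variant enum, so this pattern is irrefutable.
    let Expression::Transform(inserts) = &**expr;
    let mut out = batch;
    for (pos, value) in inserts {
        // Clamp the position so the insert can never panic.
        let at = (*pos).min(out.len());
        out.insert(at, value.clone());
    }
    out
}
```

With this shape the "nothing to do" case is visible in the type, and callers can skip evaluation entirely rather than running an expression that copies its input.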

Collaborator Author

Collaborator

@nicklan nicklan left a comment

generally looks great. i had one question


/// Regular scan field classifier for standard Delta table scans.
/// Handles partition columns as metadata-derived fields.
pub(crate) struct DefaultTransformFieldClassifier;
Collaborator

should this be a ScanTransformFieldClassifier? It's used in scan, not by default, right?

for (index, logical_field) in logical_schema.fields().enumerate() {
if partition_columns.contains(logical_field.name()) {
if logical_field.is_metadata_column() {
let transform = classifier.classify_field(
Collaborator

Some fields that need a transform also require modifying read_fields. For example, if the user requested row ids, we need to tell the parquet reader to read both a row-index column and the physical row-id column if it's present. I think we can do that by changing the classify_field call to either return something that includes other fields to read, or by passing in &mut read_fields, but I just want to make sure that makes sense to you.
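
A small sketch of the first option (returning the extra fields to read), purely for discussion. `Classification`, `build_read_fields`, the `classify_field` signature, and all column and transform names are hypothetical, and the "if it's present" check on the physical row-id column is left out for brevity.

```rust
/// Hypothetical result of classifying one logical field.
#[derive(Debug, Default, PartialEq)]
struct Classification {
    /// Name of the transform to apply, if any (stand-in for a real transform type).
    transform: Option<&'static str>,
    /// Extra physical columns the parquet reader must also read.
    extra_read_fields: Vec<String>,
}

/// Hypothetical classifier: a requested `row_id` column needs a transform
/// plus two helper columns. All names here are illustrative only.
fn classify_field(name: &str) -> Classification {
    match name {
        "row_id" => Classification {
            transform: Some("generate_row_id"),
            extra_read_fields: vec![
                "row_index".to_string(),
                "physical_row_id".to_string(),
            ],
        },
        _ => Classification::default(),
    }
}

/// Builds the list of columns to read, adding any extra columns a
/// classification asks for (without duplicates).
fn build_read_fields(logical_fields: &[&str]) -> Vec<String> {
    let mut read_fields: Vec<String> = Vec::new();
    for name in logical_fields {
        let classification = classify_field(name);
        // Fields without a transform are read directly from the file.
        if classification.transform.is_none() {
            read_fields.push((*name).to_string());
        }
        for extra in classification.extra_read_fields {
            if !read_fields.contains(&extra) {
                read_fields.push(extra);
            }
        }
    }
    read_fields
}
```

Returning the extra fields keeps `classify_field` free of side effects; the `&mut read_fields` alternative would avoid the intermediate `Vec` but lets each classifier mutate the reader's column list directly.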

Collaborator Author

That makes sense to me. Would you be able to incorporate that in your row ids PR?

@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch from 0913eb5 to 1dbb566 Compare October 9, 2025 01:31
@DrakeLin DrakeLin requested a review from nicklan October 9, 2025 01:32
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch from 1dbb566 to 3df2019 Compare October 9, 2025 17:44
@DrakeLin DrakeLin force-pushed the drake-lin_data/stack/cdf-state-info branch from 3df2019 to 095156b Compare October 9, 2025 17:47
@DrakeLin DrakeLin merged commit 94a15e0 into delta-io:main Oct 9, 2025
21 checks passed
@DrakeLin DrakeLin removed the breaking-change Change that require a major version bump label Oct 9, 2025
samansmink pushed a commit to samansmink/delta-kernel-rs that referenced this pull request Oct 19, 2025
…-io#1359)

3 participants