Skip to content

Conversation

@FrankPortman
Copy link
Contributor

@FrankPortman FrankPortman commented Sep 9, 2025

Description

write_deltalake with mode="overwrite" and schema_mode=None was overwriting the nullability constraints of various columns in the schema, instead of accepting the existing schema fully (or failing on drift, if there was actually schema drift besides column constraints).

Upon reflection of the merge operation, it turned out that worked as intended, but this PR still introduces a test to ensure the schema behavior is consistent between the two.

Related Issue(s)

Closes #3744

Documentation

N/A

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Sep 9, 2025
@github-actions
Copy link

github-actions bot commented Sep 9, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@FrankPortman FrankPortman changed the title fix schema fix: write_deltalake with mode="overwrite" mode and schema_mode=None does not overwrite schema metadata Sep 9, 2025
@FrankPortman FrankPortman force-pushed the frank/null_overwrite_v2 branch from 50a7e2d to 3dc1ad5 Compare September 9, 2025 14:42
@FrankPortman
Copy link
Contributor Author

Do we feel covered by existing tests for the following cases:

  • schema_mode=None and schema drift detected ---> should fail with informative error
  • schema_mode=None and mode="append" ---> should enforce constraints that exist, not simply accept them
  • schema_mode=None and mode="overwrite" ---> should enforce constraints that exist, not simply accept them
    • This test case I am worried about, since this PR fixes a bug with this codepath to begin with, but perhaps the fix also hits the constraint validation test cases after the fact?

@codecov
Copy link

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 91.18236% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.55%. Comparing base (d148c32) to head (a982489).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/operations/write/mod.rs 90.47% 0 Missing and 38 partials ⚠️
crates/core/src/operations/merge/mod.rs 94.00% 0 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3747      +/-   ##
==========================================
+ Coverage   75.37%   75.55%   +0.17%     
==========================================
  Files         145      145              
  Lines       43946    44424     +478     
  Branches    43946    44424     +478     
==========================================
+ Hits        33126    33565     +439     
+ Misses       9217     9215       -2     
- Partials     1603     1644      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ion-elgreco
Copy link
Collaborator

ion-elgreco commented Sep 9, 2025

Do we feel covered by existing tests for the following cases:

  • schema_mode=None and schema drift detected ---> should fail with informative error

This should be already catched by existing tests I believe, and also raise at the beginning of schema checking

  • schema_mode=None and mode="append" ---> should enforce constraints that exist, not simply accept them

This should be already catched by existing tests I believe, and also raise at the beginning of schema checking

  • schema_mode=None and mode="overwrite" ---> should enforce constraints that exist, not simply accept them
    • This test case I am worried about, since this PR fixes a bug with this codepath to begin with, but perhaps the fix also hits the constraint validation test cases after the fact?

I guess it depends on the type of change if we go from not nullable to nullable with schema mode = None, then it's incorrect. If our data however gets casted from nullable to not nullable, but the table_schema remains the same it's logically correct, but the data should ideally not be changed because it affects the parquet schema

@FrankPortman
Copy link
Contributor Author

Can you expand more on what that means and any implications for this PR? I can't tell if what you said is just regarding testing or regarding the behavior this PR should change.

@ion-elgreco
Copy link
Collaborator

Can you expand more on what that means and any implications for this PR? I can't tell if what you said is just regarding testing or regarding the behavior this PR should change.

The first two points is rather, that it shouldn't require changing existing tests since that's already covered, the last remark I feel we need to add a test for to be sure

@FrankPortman
Copy link
Contributor Author

Added 2 test cases - please take a look.

Signed-off-by: Frank Portman <[email protected]>
Signed-off-by: Frank Portman <[email protected]>
Signed-off-by: Frank Portman <[email protected]>
@FrankPortman FrankPortman force-pushed the frank/null_overwrite_v2 branch from 8380a5f to a982489 Compare September 9, 2025 17:18
@FrankPortman
Copy link
Contributor Author

Also added a more specific test for merge even though that worked as intended already.

@rtyler rtyler merged commit 538a90c into delta-io:main Sep 10, 2025
29 checks passed
@FrankPortman
Copy link
Contributor Author

@rtyler you may want to squash my commits and give it the message from the PR title. That was my assumption as to what would happen when I gave relatively uninformative intra-PR commits, but it seems like they all landed as is 😄 .

Thanks for taking a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binding/rust Issues for the Rust crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

write_deltalake with schema_mode=None will overwrite nullable properties of columns

3 participants