fix(pandas): implement-automatic-conversion-for-pandas-null-types #3695
Codecov Report

✅ All modified and coverable lines are covered by tests.

| Metric | main | #3695 | +/- |
|---|---|---|---|
| Coverage | 75.36% | 75.37% | |
| Files | 144 | 144 | |
| Lines | 43607 | 43607 | |
| Branches | 43607 | 43607 | |
| Hits | 32866 | 32868 | +2 |
| Misses | 9133 | 9131 | -2 |
| Partials | 1608 | 1608 | |
Pull Request Overview
This PR implements automatic conversion of Pandas DataFrames containing null-typed columns to match their corresponding schema-defined types in Delta Lake. This fixes type inconsistencies that occur when object dtype columns with null values are converted to null types in PyArrow, which can cause validation or serialization errors when appending to existing tables.
- Modifies the `_convert_arro3_schema_to_delta` function to accept an optional existing schema parameter for null type conversion
- Updates the writer to pass the existing table schema when available for type conversion
- Adds comprehensive test coverage for null column conversion scenarios
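
The behaviour summarized above can be illustrated with a small, self-contained sketch. It is a toy model, not the code from this PR: `Field`, `Struct`, and `resolve_nulls` are hypothetical names invented for illustration, and plain strings such as `"null"` and `"string"` stand in for real Arrow or Delta data types. It only shows the general idea of borrowing a type from an existing schema for null-typed fields, including nested ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a schema field (hypothetical, not a deltalake class)."""

    name: str
    dtype: object  # a type name such as "string" or "null", or a Struct


@dataclass(frozen=True)
class Struct:
    """Toy stand-in for a struct type or a top-level schema (hypothetical)."""

    fields: tuple[Field, ...]

    def get(self, name: str) -> Field | None:
        # Look a field up by name; None when the schema does not declare it.
        return next((f for f in self.fields if f.name == name), None)


def resolve_nulls(incoming: Struct, existing: Struct | None = None) -> Struct:
    """Replace "null" types with the type the existing schema declares."""
    resolved = []
    for field in incoming.fields:
        known = existing.get(field.name) if existing is not None else None
        if field.dtype == "null" and known is not None:
            # Untyped column: borrow the type from the existing schema.
            resolved.append(Field(field.name, known.dtype))
        elif isinstance(field.dtype, Struct):
            # Nested type: recurse with the matching nested schema, if any.
            nested = (
                known.dtype
                if known is not None and isinstance(known.dtype, Struct)
                else None
            )
            resolved.append(Field(field.name, resolve_nulls(field.dtype, nested)))
        else:
            # Already typed, or no existing type to borrow: keep as-is.
            resolved.append(field)
    return Struct(tuple(resolved))


existing = Struct((
    Field("id", "long"),
    Field("name", "string"),
    Field("address", Struct((Field("city", "string"),))),
))
incoming = Struct((
    Field("id", "long"),
    Field("name", "null"),  # all-null column with no inferred type
    Field("address", Struct((Field("city", "null"),))),
    Field("extra", "null"),  # not in the existing schema, stays null
))
result = resolve_nulls(incoming, existing)
```

In this sketch a null-typed field with no counterpart in the existing schema is left untouched, and calling `resolve_nulls(incoming)` without an existing schema returns an equal structure.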
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/deltalake/writer/writer.py | Passes existing table schema to conversion function for null type handling |
| python/deltalake/writer/_conversion.py | Implements null type conversion logic with recursion prevention |
| python/tests/test_conversion.py | Adds test cases for null column conversion and edge cases |
- implement the conversion of object dtype columns (null) when the corresponding type exists in the schema
- works recursively for nested types

Signed-off-by: Florian Valeye <[email protected]>
Thanks for jumping on this so quickly!

def dtype_to_delta_dtype(
    dtype: DataType, field_name: str | None = None
) -> DataType:
    if DataType.is_null(dtype) and existing_schema and field_name:
I'd prefer we also check `is not None` explicitly here; it's a bit clearer. The rest of the code LGTM!
Do you mean:
if dtype and DataType.is_null(dtype) and existing_schema and field_name:
No, rather `field_name is not None` and `schema is not None`.
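
For context, a minimal sketch of why the two styles of check are not interchangeable in Python. The helper names below are hypothetical, and a plain `dict` stands in for the schema object; whether the real schema type can ever be falsy is not stated in this thread, so the empty-value case is an assumption used only to show the difference.

```python
def should_resolve_truthy(field_name, existing_schema):
    # Truthiness check: rejects None, but also "" and empty containers.
    return bool(existing_schema and field_name)


def should_resolve_explicit(field_name, existing_schema):
    # Explicit check: rejects only None, so the intent is unambiguous.
    return existing_schema is not None and field_name is not None


# Both agree when the values are present and non-empty...
both_present = (
    should_resolve_truthy("name", {"name": "string"}),
    should_resolve_explicit("name", {"name": "string"}),
)

# ...and when a value is missing entirely.
both_missing = (
    should_resolve_truthy(None, None),
    should_resolve_explicit(None, None),
)

# They differ only for present-but-empty values.
empty_values = (
    should_resolve_truthy("", {}),
    should_resolve_explicit("", {}),
)
```

The explicit form states that only a missing value should skip the conversion, which is the clarity argument made in the review comment.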
I see!
Thanks for your review @ion-elgreco 🙏
Let's merge this after this change!
Signed-off-by: Florian Valeye <[email protected]>
Description
This PR fixes type inconsistencies in Pandas DataFrames by implementing automatic conversion of `object` dtype columns containing null values into their correct schema-defined types (when available).

Previously, `object` dtype columns with `null` values could remain untyped, leading to mismatches with schema-defined types and causing validation or serialization errors.

Now, these columns are automatically converted, ensuring consistency between DataFrame values and the schema while reducing type-related issues.
Related Issue(s)