fix(pandas): implement-automatic-conversion-for-pandas-null-types #3695

fvaleye · 2025-08-22T13:29:14Z

Description

This PR fixes type inconsistencies in Pandas DataFrames by implementing automatic conversion of object dtype columns containing null values into their correct schema-defined types (when available).

Previously, object dtype columns with null values could remain untyped, leading to mismatches with schema-defined types and causing validation or serialization errors.

Now, these columns are automatically converted, ensuring consistency between DataFrame values and the schema while reducing type-related issues.

Related Issue(s)

Closes #3691 (Python: Automatically convert Pandas null types to valid Delta Lake types in write_deltalake())

codecov · 2025-08-22T13:31:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.37%. Comparing base (ef9e077) to head (dcb808d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3695   +/-   ##
=======================================
  Coverage   75.36%   75.37%           
=======================================
  Files         144      144           
  Lines       43607    43607           
  Branches    43607    43607           
=======================================
+ Hits        32866    32868    +2     
+ Misses       9133     9131    -2     
  Partials     1608     1608


Copilot

Pull Request Overview

This PR implements automatic conversion of Pandas DataFrames containing null-typed columns to match their corresponding schema-defined types in Delta Lake. This fixes type inconsistencies that occur when object dtype columns with null values are converted to null types in PyArrow, which can cause validation or serialization errors when appending to existing tables.

- Modifies the _convert_arro3_schema_to_delta function to accept an optional existing schema parameter for null type conversion
- Updates the writer to pass the existing table schema when available for type conversion
- Adds comprehensive test coverage for null column conversion scenarios
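
The null-type resolution described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function name `resolve_null_types` is hypothetical, leaf types are modelled as strings such as `"null"` or `"int64"`, and structs are modelled as nested dicts instead of arro3 `DataType` objects.

```python
from typing import Optional


def resolve_null_types(incoming: dict, existing: Optional[dict]) -> dict:
    """Replace "null" leaves in `incoming` with the type that `existing`
    declares for the same field name, descending into nested structs.

    Hypothetical stand-in: a type is either a string or a dict of fields.
    """
    if existing is None:
        # No existing table schema (e.g. first write): nothing to resolve.
        return dict(incoming)

    resolved = {}
    for name, dtype in incoming.items():
        known = existing.get(name)
        if dtype == "null" and known is not None:
            # Null-typed column with a known counterpart: adopt its type.
            resolved[name] = known
        elif isinstance(dtype, dict) and isinstance(known, dict):
            # Nested struct on both sides: resolve its fields recursively.
            resolved[name] = resolve_null_types(dtype, known)
        else:
            # Already typed, or unknown to the existing schema: keep as-is.
            resolved[name] = dtype
    return resolved


incoming_schema = {
    "id": "int64",
    "comment": "null",
    "address": {"city": "null", "zip": "string"},
}
existing_schema = {
    "id": "int64",
    "comment": "string",
    "address": {"city": "string", "zip": "string"},
}
print(resolve_null_types(incoming_schema, existing_schema))
```

In this sketch a null-typed field with no counterpart in the existing schema stays `"null"`, which matches the "when available" wording of the PR description.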

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
python/deltalake/writer/writer.py	Passes existing table schema to conversion function for null type handling
python/deltalake/writer/_conversion.py	Implements null type conversion logic with recursion prevention
python/tests/test_conversion.py	Adds test cases for null column conversion and edge cases


python/deltalake/writer/_conversion.py

- implement the conversion of object dtype columns (null) when we found the corresponding type existing in the schema
- works recursively for nested types

Signed-off-by: Florian Valeye <[email protected]>

FrankPortman · 2025-08-24T12:58:44Z

Thanks for jumping on this so quickly!

ion-elgreco · 2025-08-24T18:33:14Z

python/deltalake/writer/_conversion.py

+    def dtype_to_delta_dtype(
+        dtype: DataType, field_name: str | None = None
+    ) -> DataType:
+        if DataType.is_null(dtype) and existing_schema and field_name:


I'd prefer we also check `is not None` explicitly here; it's a bit clearer. Rest of the code LGTM!

Do you mean:
if dtype and DataType.is_null(dtype) and existing_schema and field_name:

No, rather `field_name is not None` and the schema `is not None`.

I see!
Thanks for your review @ion-elgreco 🙏
Let's merge this after this change!
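
The difference between the two styles of check discussed in this thread can be illustrated with a small sketch. The helper names are hypothetical and a plain dict stands in for the schema; the thread does not say whether the real schema object can ever be falsy, so the empty-dict case is only an assumption used to show why the explicit form is easier to reason about.

```python
from typing import Optional


def should_resolve_truthy(
    is_null: bool, existing_schema: Optional[dict], field_name: Optional[str]
) -> bool:
    # Truthiness check: rejects None, but also "" and {} (which are not None).
    return bool(is_null and existing_schema and field_name)


def should_resolve_explicit(
    is_null: bool, existing_schema: Optional[dict], field_name: Optional[str]
) -> bool:
    # Explicit check: only a missing value (None) disables the conversion.
    return is_null and existing_schema is not None and field_name is not None


schema = {"comment": "string"}
print(should_resolve_truthy(True, schema, ""))    # empty name counts as "missing"
print(should_resolve_explicit(True, schema, ""))  # empty name is still a name
```

Both helpers agree whenever the arguments are either `None` or non-empty; they only differ for falsy values that are not `None`.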

Signed-off-by: Florian Valeye <[email protected]>

Copilot AI review requested due to automatic review settings August 22, 2025 13:29

fvaleye requested a review from ion-elgreco as a code owner August 22, 2025 13:29

github-actions bot added the binding/python Issues for the Python package label Aug 22, 2025

fvaleye force-pushed the feature/implement-automatic-conversion-for-pandas-null-types branch 4 times, most recently from 0a75592 to e62e814 Compare August 22, 2025 13:49

fvaleye requested a review from Copilot August 22, 2025 13:50

Copilot AI reviewed Aug 22, 2025

python/deltalake/writer/_conversion.py (outdated)

python/deltalake/writer/_conversion.py (outdated)

fvaleye force-pushed the feature/implement-automatic-conversion-for-pandas-null-types branch 6 times, most recently from fd8d945 to 20775be Compare August 22, 2025 14:47

fix(pandas): implement-automatic-conversion-for-pandas-null-types

0f3ed7f

- implement the conversion of object dtype columns (null) when we found the corresponding type existing in the schema
- works recursively for nested types

Signed-off-by: Florian Valeye <[email protected]>

fvaleye force-pushed the feature/implement-automatic-conversion-for-pandas-null-types branch from 20775be to 0f3ed7f Compare August 22, 2025 14:55

ion-elgreco self-assigned this Aug 22, 2025

ion-elgreco reviewed Aug 24, 2025

python/deltalake/writer/_conversion.py (outdated)

ion-elgreco reviewed Aug 24, 2025

feat(pandas): use field access using arro3 on type conversion

dcb808d

Signed-off-by: Florian Valeye <[email protected]>

fvaleye force-pushed the feature/implement-automatic-conversion-for-pandas-null-types branch from 6f0c6ca to dcb808d Compare August 24, 2025 18:59

fvaleye enabled auto-merge (rebase) August 24, 2025 19:00

fvaleye merged commit b89d7a4 into delta-io:main Aug 24, 2025
28 of 29 checks passed
