
write_deltalake with schema_mode=None will overwrite nullable properties of columns #3744

@FrankPortman

Description


Environment

Delta-rs version: 1.1.4

Binding: Python

Environment:

  • Cloud provider: Azure/Local
  • OS: Mac/Linux
  • Other:

Bug

What happened: Instantiating a Delta table with a specific schema that includes nullable=False columns, and later writing to it via write_deltalake with mode="overwrite", schema_mode=None, and predicate=<some predicate to emulate replaceWhere>, overwrites the nullability properties of the columns.

What you expected to happen: All columns are safely coerced to the types of the existing Delta table, and the existing nullability properties are preserved. When all columns and dtypes of the to-be-written data already match the Delta table on disk, the nullability of the to-be-written data shouldn't even be considered. The write should only validate the data via the Delta protocol against the existing schema, where that schema marks some columns as not nullable.
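To make the expected semantics concrete, here is a minimal pure-Python sketch. It is not delta-rs code: Field, SchemaDriftError, resolve_write_schema, and validate_rows are hypothetical names invented for illustration, and type coercion is simplified to an exact type match. It only shows the rule described above: names and types decide schema drift, the table's own nullability is kept, and nulls are checked against the existing schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a column definition (not a delta-rs type)."""
    name: str
    dtype: str
    nullable: bool = True


class SchemaDriftError(ValueError):
    """Raised when incoming column names or types differ from the table's."""


def resolve_write_schema(existing: list[Field], incoming: list[Field]) -> list[Field]:
    """Return the schema a write should use when no schema mode is set.

    Only names and types are compared; the nullability of the incoming
    data is deliberately ignored, so the table's own flags survive.
    """
    existing_shape = [(f.name, f.dtype) for f in existing]
    incoming_shape = [(f.name, f.dtype) for f in incoming]
    if existing_shape != incoming_shape:
        raise SchemaDriftError(f"expected {existing_shape}, got {incoming_shape}")
    return existing


def validate_rows(schema: list[Field], rows: list[dict[str, Any]]) -> None:
    """Reject nulls in columns that the existing schema marks as non-nullable."""
    for index, row in enumerate(rows):
        for field in schema:
            if not field.nullable and row.get(field.name) is None:
                raise ValueError(f"row {index}: column {field.name!r} must not be null")


# The table declares id as non-nullable; the incoming batch claims it is nullable.
table_schema = [Field("id", "long", nullable=False), Field("value", "string")]
batch_schema = [Field("id", "long", nullable=True), Field("value", "string")]

resolved = resolve_write_schema(table_schema, batch_schema)
validate_rows(resolved, [{"id": 1, "value": "a"}, {"id": 2, "value": None}])
```

Under this rule the resolved schema still has id as non-nullable after the write, and a batch containing a null id would be rejected rather than silently relaxing the column.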

How to reproduce it: https://gist.github.com/FrankPortman/4a967f5bcb0bf5e7136eff5087ed6880

More details:

I went back and forth on whether this should be filed as a bug, but IMO the behavior is unexpected, and it differs from Spark (with no schema mode set, Spark attempts all safe column coercions to the existing types but doesn't override nullability). Also, the Rust code documents this option as "/// whether to overwrite the schema or to merge it. None means to fail on schmema drift", and IMO failing on schema drift does not implicitly mean overwriting column nullability.


Labels: bug (Something isn't working)
