Description
Environment
Delta-rs version: 1.1.4
Binding: Python
Environment:
- Cloud provider: Azure/Local
- OS: Mac/Linux
- Other:
Bug
What happened: Instantiating a Delta table with a specific schema, including nullable=False columns, and later writing to it via write_deltalake with mode="overwrite", schema_mode=None, and predicate=<some predicate to emulate replaceWhere> overwrites the nullability properties of the columns.
What you expected to happen: All columns are safely coerced to the types of the existing Delta table, and the existing nullability properties are respected. When all columns and dtypes of the to-be-written data already match the Delta table on disk, the nullability of the to-be-written data shouldn't even be considered. The write should only validate the data against the existing schema, per the Delta protocol, when that schema marks some columns as not nullable.
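To make the expectation concrete, here is a rough pure-Python sketch of the behavior I have in mind. It does not use or mirror any delta-rs API: Field, reconcile_schema, and validate_rows are hypothetical names made up for illustration, dtypes are plain strings, and type coercion is left out, so only names and dtypes are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a table column; not a delta-rs type."""
    name: str
    dtype: str
    nullable: bool


def reconcile_schema(existing, incoming):
    """Return the schema to keep after an overwrite with no schema mode.

    Names and dtypes must match the existing table; anything else counts as
    schema drift and fails. Incoming nullability flags are ignored.
    """
    existing_types = {f.name: f.dtype for f in existing}
    incoming_types = {f.name: f.dtype for f in incoming}
    if existing_types != incoming_types:
        raise ValueError("schema drift: column names or dtypes differ")
    return list(existing)


def validate_rows(schema, rows):
    """Reject nulls in columns that the existing schema marks non-nullable."""
    required = [f.name for f in schema if not f.nullable]
    for index, row in enumerate(rows):
        for name in required:
            if row.get(name) is None:
                raise ValueError(
                    f"row {index}: null in non-nullable column {name!r}"
                )


existing = [
    Field("id", "int64", nullable=False),
    Field("value", "string", nullable=True),
]
# Incoming data often arrives with every column flagged as nullable.
incoming = [
    Field("id", "int64", nullable=True),
    Field("value", "string", nullable=True),
]

kept = reconcile_schema(existing, incoming)
# Passes: the only null is in a column the existing schema allows to be null.
validate_rows(kept, [{"id": 1, "value": None}])
```

In this sketch the table keeps id as non-nullable after the write, and a row with a null id would be rejected instead of silently relaxing the column.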
How to reproduce it: https://gist.github.com/FrankPortman/4a967f5bcb0bf5e7136eff5087ed6880
More details:
I went back and forth on whether this should be filed as a bug, but the behavior is unexpected IMO, and it differs from Spark, which with no schema mode set attempts all safe column coercions to the existing types but doesn't override nullability. Also, the Rust code documents this option as: "/// whether to overwrite the schema or to merge it. None means to fail on schmema drift". IMO, failing on schema drift does not implicitly mean overwriting column nullability.