Closed
Description
What happened?
I create a DeltaTable with an explicit schema and then append records to it. The appended data may contain a column whose values are all null.
I get an exception:
Traceback (most recent call last):
  File "/workspace/tmp/deltalake_write_null.py", line 43, in <module>
    deltalake.writer.write_deltalake(
  File "/usr/local/lib/python3.12/site-packages/deltalake/writer/writer.py", line 125, in write_deltalake
    table._table.write(
_internal.SchemaMismatchError: Invalid data type for Delta Lake: Null
I'm running deltalake==1.1.4.
Expected behavior
With mode=append and schema_mode=merge, I expect the data to be appended.
Operating System
Linux
Binding
Python
Bindings Version
No response
Steps to reproduce
Reproducible example:
import os
import shutil

import deltalake.writer
import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable

table_uri = "./tmp/deltalake_write_null/table"
if os.path.exists(table_uri):
    shutil.rmtree(table_uri)

data = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "category": ["A", "A", "B", "B"],
        "value": [100, 200, 300, 400],
    }
)
data["value"] = None

schema = pa.schema(
    [
        pa.field("id", pa.int64()),
        pa.field("category", pa.string()),
        pa.field("value", pa.int64(), nullable=True),
    ]
)
partition_by = ["category"]
storage_options = {}

DeltaTable.create(
    table_uri=table_uri,
    schema=schema,
    storage_options=storage_options,
    partition_by=partition_by,
    mode="ignore",
)

deltalake.writer.write_deltalake(
    table_uri,
    data,
    partition_by=partition_by,
    mode="append",
    schema_mode="merge",
    storage_options=storage_options,
)
Relevant logs