[Bug]: SchemaMismatchError: Invalid data type for Delta Lake: Null with mode=append #3891

@j-bennet

Description

What happened?

I'm creating a DeltaTable with an explicit schema and then appending records to it. The appended data may contain a column that is entirely null.

I get an exception:

Traceback (most recent call last):
  File "/workspace/tmp/deltalake_write_null.py", line 43, in <module>
    deltalake.writer.write_deltalake(
  File "/usr/local/lib/python3.12/site-packages/deltalake/writer/writer.py", line 125, in write_deltalake
    table._table.write(
_internal.SchemaMismatchError: Invalid data type for Delta Lake: Null

I'm running deltalake==1.1.4.

Expected behavior

With mode="append" and schema_mode="merge", I expect the data to be appended, even when a nullable column contains only nulls.

Operating System

Linux

Binding

Python

Bindings Version

No response

Steps to reproduce

Reproducible example:

import os
import shutil

import deltalake.writer
import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable


table_uri = "./tmp/deltalake_write_null/table"

if os.path.exists(table_uri):
    shutil.rmtree(table_uri)

data = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "category": ["A", "A", "B", "B"],
        "value": [100, 200, 300, 400],
    }
)
# Overwrite "value" with None so the whole column is null (pandas object dtype).
data["value"] = None

schema = pa.schema(
    [
        pa.field("id", pa.int64()),
        pa.field("category", pa.string()),
        pa.field("value", pa.int64(), nullable=True),
    ]
)

partition_by = ["category"]
storage_options = {}

DeltaTable.create(
    table_uri=table_uri,
    schema=schema,
    storage_options=storage_options,
    partition_by=partition_by,
    mode="ignore",
)

deltalake.writer.write_deltalake(
    table_uri,
    data,
    partition_by=partition_by,
    mode="append",
    schema_mode="merge",
    storage_options=storage_options,
)

Relevant logs
