Description
Environment
Delta-rs version: v0.25.4 (see below for specifics)
Binding: Python (Rust engine)
Environment:
Local, S3
Bug
What happened:
Since the adoption of DataFusion, delta-rs appears to struggle with schema merges when the existing table schema contains a list of structs (a PyArrow list type, to be precise).
What you expected to happen:
Adding a non-list field to a schema that contains a list-of-structs field should merge successfully, as it did previously.
How to reproduce it:
On v0.25.4, run the following Python code:
import pyarrow as pa
from deltalake import write_deltalake

# Define the path for the Delta table
delta_table_path = "./datafusion-repro-test-table"

# Define the data for the first write
data_first_write = [
    {
        "uid": "ws_2",
        "event": {
            "properties": {
                "fields": [
                    {
                        "messageId": "veniam sed et elit adipisicing"
                    }
                ],
            },
        }
    }
]

schema = pa.schema([
    pa.field("uid", pa.string()),
    pa.field("event", pa.struct([
        pa.field("properties", pa.struct([
            pa.field("fields", pa.list_(pa.struct([
                pa.field("messageId", pa.string()),
            ]))),
        ])),
    ])),
])
print(schema)

first_write = pa.Table.from_pylist(data_first_write, schema=schema)

# Write data to Delta table for the first write
write_deltalake(delta_table_path, first_write, mode="append", engine="rust", schema_mode="merge")

#### NOW FOR THE SECOND WRITE THAT BREAKS ####
data_second_write = [
    {
        "uid": "ws_2",
        "event": {
            "properties": {
                "someNewField": "test-value",  # New field
                "fields": [
                    {
                        "messageId": "veniam sed et elit adipisicing"
                    }
                ],
            },
        }
    }
]

second_schema = pa.schema([
    pa.field("uid", pa.string()),
    pa.field("event", pa.struct([
        pa.field("properties", pa.struct([
            pa.field("someNewField", pa.string()),  # New field
            pa.field("fields", pa.list_(pa.struct([
                pa.field("messageId", pa.string()),
            ]))),
        ])),
    ])),
])

second_write = pa.Table.from_pylist(data_second_write, schema=second_schema)

# Write data to Delta table for the second write
write_deltalake(delta_table_path, second_write, mode="append", engine="rust", schema_mode="merge")
More details:
The above code works as expected on v0.19.2, the last version I was using.