Description
Environment
Delta-rs version:
How do I find the delta-rs version as a Python user?
Binding: 0.18.1
Environment:
- OS: Ubuntu 22.04 LTS
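On the version question above, a minimal sketch that reads the installed binding version from package metadata. It assumes the binding was installed under the distribution name `deltalake`; the helper `installed_version` is a hypothetical name used only for illustration. This reports the Python package version, which may differ from the version of the underlying Rust crate.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Assumption: "deltalake" is the distribution name of the Python binding.
print("deltalake binding:", installed_version("deltalake"))
```

Recent releases also appear to expose `deltalake.rust_core_version()` for the Rust core, though that is not verified here.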
Bug
What happened: I cannot use a predicate containing a column with a dot in its name, such as " \"Product.Id\" = '1' ", when writing with the Rust engine. The column is interpreted as "Product"."Id" instead of "Product.Id".
What you expected to happen: the column name containing the dot is parsed correctly.
How to reproduce it:
import deltalake
import pyarrow as pa

ta = pa.Table.from_pydict(
    {
        "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
    }
)
fp = "./resources/path/to/table"
deltalake.write_deltalake(
    table_or_uri=fp,
    data=ta,
    partition_by=["Product.Id"],
    engine="rust",
    mode="overwrite",
    predicate="\"Product.Id\" = 'x-1'"
)
More details:
Here is the stack trace:
DeltaError Traceback (most recent call last)
Cell In[89], line 12
4 ta = pa.Table.from_pydict(
5 {
6 "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
7 }
8 )
10 fp = "./resources/path/to/table"
---> 12 deltalake.write_deltalake(
13 table_or_uri=fp,
14 data=ta,
15 partition_by=["Product.Id"],
16 engine="rust",
17 mode="overwrite",
18 predicate="\"Product.Id\" = 'x-1'"
19 )
File ~/mambaforge/envs/delta/lib/python3.12/site-packages/deltalake/writer.py:304, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
301 return
303 data = RecordBatchReader.from_batches(schema, (batch for batch in data))
--> 304 write_deltalake_rust(
305 table_uri=table_uri,
306 data=data,
307 partition_by=partition_by,
308 mode=mode,
309 table=table._table if table is not None else None,
310 schema_mode=schema_mode,
311 predicate=predicate,
312 name=name,
313 description=description,
314 configuration=configuration,
315 storage_options=storage_options,
316 writer_properties=(
317 writer_properties._to_dict() if writer_properties else None
318 ),
319 custom_metadata=custom_metadata,
320 )
321 if table:
322 table.update_incremental()
DeltaError: Generic DeltaTable error: Schema error: No field named "Product"."Id". Valid fields are "88e03a2f-8d4f-407c-98de-cb67462708d2"."Product.Id".
It seems that the predicate parsing splits the column name on the dot, and the SQL backend (DataFusion, I suppose) then interprets the first part as a table name.
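To illustrate the suspected behavior, here is a toy sketch. It is not the actual delta-rs or DataFusion parser, and both function names are hypothetical. It contrasts splitting an identifier on every dot with splitting only on dots that fall outside double quotes.

```python
from typing import List


def naive_split(identifier: str) -> List[str]:
    """Split on every dot, ignoring quotes (the behavior suspected above)."""
    return [part.strip('"') for part in identifier.split(".")]


def quote_aware_split(identifier: str) -> List[str]:
    """Split on dots only when they fall outside double quotes."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in identifier:
        if ch == '"':
            in_quotes = not in_quotes  # toggle state; the quote itself is dropped
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


column = '"Product.Id"'
print(naive_split(column))        # ['Product', 'Id']
print(quote_aware_split(column))  # ['Product.Id']
```

With the naive split, the first part is treated as a table qualifier, which matches the `No field named "Product"."Id"` error above. The quote-aware version keeps `Product.Id` as a single column name.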