
Unable to specify columns with a dot in the name in predicate #2624

Open
@emanueledomingo

Description


Environment

Delta-rs version:

How do I find the delta-rs version as a Python user?

Binding: 0.18.1

Environment:

  • OS: Ubuntu 22.04 LTS
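On the version question above, one way to report versions from Python is a minimal sketch like the following. It reads the installed `deltalake` package version through the standard library (this is the "Binding" version). It also calls `deltalake.rust_core_version()` if that function exists. Whether a given release exposes that function is an assumption, so the sketch guards it with `getattr`. The helper name `deltalake_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def deltalake_versions() -> dict:
    """Hypothetical helper: report the installed deltalake binding version and,
    when available, the version of the delta-rs core it was built against."""
    info = {"binding": None, "rust_core": None}
    try:
        # Version of the Python package, i.e. the "Binding" field in the template.
        info["binding"] = version("deltalake")
    except PackageNotFoundError:
        return info  # deltalake is not installed in this environment

    import deltalake

    # Assumption: some releases expose rust_core_version(); getattr keeps the
    # sketch working on releases that do not.
    core = getattr(deltalake, "rust_core_version", None)
    if callable(core):
        info["rust_core"] = core()
    return info


print(deltalake_versions())
```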

Bug

What happened: I cannot use a predicate that references a column whose name contains a dot, such as "\"Product.Id\" = '1'", when writing with the Rust engine. The quoted name is interpreted as "Product"."Id" (column Id of a relation Product) instead of as the single column "Product.Id".

What you expected to happen: the double-quoted column name is parsed as a single identifier, "Product.Id", even though it contains a dot.
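In standard SQL, a double-quoted identifier is one name, so a dot inside the quotes is not a qualifier separator. The sketch below builds such a predicate from an arbitrary column name. `quote_ident` and `quote_literal` are hypothetical helpers, not part of deltalake. They follow the ANSI convention of doubling any embedded quote character.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: wrap a column name in double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Hypothetical helper: wrap a string value in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


predicate = f"{quote_ident('Product.Id')} = {quote_literal('x-1')}"
print(predicate)  # "Product.Id" = 'x-1'
```

The resulting string is exactly the predicate used in the reproduction. This suggests the quoting is correct and the problem lies in how the predicate is parsed.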

How to reproduce it:

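```python
import deltalake
import pyarrow as pa

# A single column whose name contains a dot.
ta = pa.Table.from_pydict(
    {
        "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
    }
)

fp = "./resources/path/to/table"

# The double quotes should make "Product.Id" one identifier in the predicate,
# but this call fails with the DeltaError in the stack trace below.
deltalake.write_deltalake(
    table_or_uri=fp,
    data=ta,
    partition_by=["Product.Id"],
    engine="rust",
    mode="overwrite",
    predicate="\"Product.Id\" = 'x-1'"
)
```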

More details:

Here is the stack trace:

```
DeltaError                                Traceback (most recent call last)
Cell In[89], line 12
      4 ta = pa.Table.from_pydict(
      5     {
      6         "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
      7     }
      8 )
     10 fp = "./resources/path/to/table"
---> 12 deltalake.write_deltalake(
     13     table_or_uri=fp,
     14     data=ta,
     15     partition_by=["Product.Id"],
     16     engine="rust",
     17     mode="overwrite",
     18     predicate="\"Product.Id\" = 'x-1'"
     19 )

File ~/mambaforge/envs/delta/lib/python3.12/site-packages/deltalake/writer.py:304, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
    301     return
    303 data = RecordBatchReader.from_batches(schema, (batch for batch in data))
--> 304 write_deltalake_rust(
    305     table_uri=table_uri,
    306     data=data,
    307     partition_by=partition_by,
    308     mode=mode,
    309     table=table._table if table is not None else None,
    310     schema_mode=schema_mode,
    311     predicate=predicate,
    312     name=name,
    313     description=description,
    314     configuration=configuration,
    315     storage_options=storage_options,
    316     writer_properties=(
    317         writer_properties._to_dict() if writer_properties else None
    318     ),
    319     custom_metadata=custom_metadata,
    320 )
    321 if table:
    322     table.update_incremental()

DeltaError: Generic DeltaTable error: Schema error: No field named "Product"."Id". Valid fields are "88e03a2f-8d4f-407c-98de-cb67462708d2"."Product.Id".
```

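It seems that the predicate is split at the dot in the column name, and the SQL backend (DataFusion, I suppose) then interprets the first part as a table name. The error message is consistent with this. The only valid field is listed as a relation qualifier (a UUID) followed by the full name "Product.Id". The predicate, however, was resolved as column "Id" of a relation "Product".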
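The suspected behaviour can be illustrated with a hypothetical quote-aware splitter. This is not delta-rs or DataFusion code. It splits a compound identifier only on dots that are outside double quotes. The sketch compares it with a naive split that ignores the quotes. It is a simplification: for example, it does not case-normalize unquoted identifiers as SQL engines commonly do.

```python
def split_identifier(ident: str) -> list[str]:
    """Hypothetical helper: split a compound SQL identifier on dots outside
    double quotes; a doubled quote ("") inside quotes is an escaped quote."""
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(ident):
        ch = ident[i]
        if ch == '"':
            if in_quotes and i + 1 < len(ident) and ident[i + 1] == '"':
                buf.append('"')  # escaped quote inside a quoted identifier
                i += 2
                continue
            in_quotes = not in_quotes  # opening or closing quote
        elif ch == "." and not in_quotes:
            parts.append("".join(buf))  # dot outside quotes separates parts
            buf = []
        else:
            buf.append(ch)  # ordinary character, including dots inside quotes
        i += 1
    parts.append("".join(buf))
    return parts


def naive_split(ident: str) -> list[str]:
    # Splits on every dot and then strips quotes: the behaviour the error suggests.
    return [part.strip('"') for part in ident.split(".")]


print(split_identifier('"Product.Id"'))      # ['Product.Id']
print(naive_split('"Product.Id"'))           # ['Product', 'Id']
print(split_identifier('"t"."Product.Id"'))  # ['t', 'Product.Id']
```

With the quote-aware rule, "Product.Id" is a single column, which matches the schema field in the error. The naive rule yields relation Product, column Id, which is what the error reports.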

Assignees: No one assigned

Labels: bug (Something isn't working)
