Filter expressions not being applied #3032

@jpereiranexar

Environment

Delta-rs version: 0.22 (Tested multiple versions up to latest)
PyArrow version: 18.0.0

  • OS: Mac 14.4 (23E214)
  • Delta table on S3

Bug

What happened:

When I apply a filter expression (lighting == "day") through pyarrow.dataset, no rows are returned. However, if I skip the filter at this stage and instead filter the resulting pandas DataFrame (results[results["lighting"] == "day"]), matching rows are returned, confirming that data satisfying the condition exists in the dataset.

What you expected to happen:

The filter method should correctly return rows where lighting == "day" when applied directly on the pyarrow.dataset.

How to reproduce it:

Given a Delta table defined as follows:

CREATE TABLE hive_metastore.dwh.table_name (
  key STRING,
  ...
  lighting STRING,
  ...
  h3_id_res9 BIGINT)
USING delta
PARTITIONED BY (h3_id_res9)
LOCATION 'dbfs:s3_path'
TBLPROPERTIES (
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')
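
# Python code
import pyarrow.compute as pc
import pyarrow.dataset as ds

# get_delta_table is a project-specific helper (not shown); it is assumed to return a deltalake.DeltaTable
delta_table = get_delta_table(table_path, dynamo_table_name)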
partitions = [("h3_id_res9", "in", [str(608716487191953407)])]  # "in" takes a list of string values
condition = pc.equal(ds.field("lighting"), "day")

# Apply filter directly on pyarrow dataset
results = (
    delta_table.to_pyarrow_dataset(partitions=partitions)
    .filter(expression=condition)
    .to_table()
    .to_pandas()
)

# Results are empty: this assertion passes, which demonstrates the bug
assert results.empty, "Filter returned rows, so the bug did not reproduce."

# Drop the pyarrow filter and filter with pandas instead
results = (
    delta_table.to_pyarrow_dataset(partitions=partitions)
    .to_table()
    .to_pandas()
)
results_filtered = results[results["lighting"] == "day"].reset_index(drop=True)

# Results are non-empty and rows were filtered as expected
assert not results_filtered.empty, "Expected non-empty results, but got none after pandas filtering."

More Details:

  • Filtering works on other tables, so I don't think the problem is tied to the data type

Labels: bug (Something isn't working), mre-needed (Whether an MRE needs to be provided)
