Skip to content

Optimizer: infer same-table filter from transitive join equalities #909

@mbasmanova

Description

@mbasmanova

When a query has two join conditions that transitively imply an equality between columns of the same table, the optimizer should infer the same-table filter and reduce the join to a single key.

Example:

SELECT count(*) FROM t JOIN u ON t.a = u.x AND t.b = u.x

The conditions t.a = u.x and t.b = u.x transitively imply t.a = t.b. The optimal plan would:

  1. Push down a = b as a filter on t
  2. Join on a single key (t.a = u.x)

Current behavior:

The optimizer keeps both conditions as join keys without inferring the same-table equality:

Aggregation
  HashJoin[INNER a=x AND b=x]
    TableScan[t]
    TableScan[u]

Expected behavior:

Aggregation
  HashJoin[INNER a=x]
    Filter[a = b]
      TableScan[t]
    TableScan[u]

Filtering t first reduces the number of rows entering the join. Joining on a single key is also cheaper than joining on two keys.

Context:

addImpliedJoins in DerivedTable.cpp already infers transitive equalities across tables (e.g., t.a = u.x AND u.x = v.n implies t.a = v.n). This issue is about extending the inference to same-table equalities and using them as pushed-down filters.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions