-
Notifications
You must be signed in to change notification settings - Fork 71
Description
When a query has two join conditions that transitively imply an equality between columns of the same table, the optimizer should infer the same-table filter and reduce the join to a single key.
Example:
SELECT count(*) FROM t JOIN u ON t.a = u.x AND t.b = u.xThe conditions t.a = u.x and t.b = u.x transitively imply t.a = t.b. The optimal plan would:
- Push down
a = bas a filter ont - Join on a single key (
t.a = u.x)
Current behavior:
The optimizer keeps both conditions as join keys without inferring the same-table equality:
Aggregation
HashJoin[INNER a=x AND b=x]
TableScan[t]
TableScan[u]
Expected behavior:
Aggregation
HashJoin[INNER a=x]
Filter[a = b]
TableScan[t]
TableScan[u]
Filtering t first reduces the number of rows entering the join. Joining on a single key is also cheaper than joining on two keys.
Context:
addImpliedJoins in DerivedTable.cpp already infers transitive equalities across tables (e.g., t.a = u.x AND u.x = v.n implies t.a = v.n). This issue is about extending the inference to same-table equalities and using them as pushed-down filters.