Summary
Add an optimizer rule that pushes a repartitioning ExchangeNode below an adjacent UnnestNode (equivalently, pulls the UnnestNode above the exchange), valid when none of the exchange's partitioning keys is an unnested output variable (or the ordinality variable), i.e. the exchange partitions only on columns that UNNEST passes through unchanged (its replicate columns). Under that condition UNNEST is partition-preserving on those keys, so Exchange[K](Unnest(x)) ≡ Unnest(Exchange[K](x)) when K ∩ unnestedVariables = ∅ (equivalently K ⊆ replicate(UNNEST)). Choosing between the two placements is then a pure cost decision.
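The equivalence can be sanity-checked on a toy model. The sketch below is not engine code: `unnest`, `exchange`, `as_multisets`, and the sample rows are hypothetical stand-ins. It assumes a deterministic hash on the key columns and compares partition contents as multisets, so row order within a partition is not modeled.

```python
from collections import Counter


def unnest(rows, array_col, out_col):
    """Emit one row per array element; every other column is replicated."""
    out = []
    for row in rows:
        replicated = {k: v for k, v in row.items() if k != array_col}
        for element in row[array_col]:
            out.append({**replicated, out_col: element})
    return out


def exchange(rows, keys, num_partitions):
    """Hash-repartition rows on `keys` into `num_partitions` buckets."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        bucket = hash(tuple(row[k] for k in keys)) % num_partitions
        partitions[bucket].append(row)
    return partitions


def as_multisets(partitions):
    """Order-insensitive view of each partition's contents."""
    return [Counter(tuple(sorted(r.items())) for r in p) for p in partitions]


rows = [
    {"orderkey": 1, "items": ["a", "b", "c"]},
    {"orderkey": 2, "items": []},          # empty array: produces no output rows
    {"orderkey": 3, "items": ["d"]},
    {"orderkey": 4, "items": ["e", "f"]},
]
K = ["orderkey"]  # partitioning key is a replicate column only

# Exchange[K](Unnest(x)): shuffle the fan-out-expanded rows.
above = exchange(unnest(rows, "items", "item"), K, 3)
# Unnest(Exchange[K](x)): shuffle the input, then unnest inside each partition.
below = [unnest(p, "items", "item") for p in exchange(rows, K, 3)]

same_placement = as_multisets(above) == as_multisets(below)
```

Each output row inherits its key values from exactly one input row, so it lands in the same bucket under either placement. If `K` contained `item` instead, the second plan could not even be built, because the column does not exist before the unnest.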
Motivation
When an UNNEST sits just below a repartitioning exchange (e.g. the build side of a downstream join that keys on a replicate column carried through the unnest), the exchange shuffles the post-unnest, fan-out-expanded row set. Whether that's good or bad depends on the shape:
- If the unnested array element is narrow and the source row is wide (the wide payload is consumed by the unnest), the post-unnest rows are more rows but fewer bytes, so shuffling above the unnest (current behavior) is cheaper — the rule should NOT fire.
- If the fan-out is large enough that the post-unnest row set carries many more bytes than the input, shuffling the pre-unnest input moves far fewer bytes, so the rule should fire.
Today the placement is fixed; there is no rule to move UNNEST across the exchange even when doing so reduces shuffle bytes. This also helps stacked self-join / late-materialization plans keep the repartition on the cheaper side.
Proposal
- New rule (default OFF, session-property gated) matching
  Exchange (repartition on K) -> [pass-through Project?] -> Unnest
  where K contains no unnested output variable and no ordinality variable (K ⊆ UNNEST.replicateVariables).
- Rewrite to
  [Project?] -> Unnest -> Exchange (repartition on K)
  (push the exchange below the unnest), preserving semantics because every key in K is a replicate column.
- Cost-based trigger: apply only when the estimated pre-unnest shuffle size (rows × width) is smaller than the estimated post-unnest shuffle size, i.e. when the fan-out expansion adds more bytes than consuming the source columns removes.
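The guard and the cost trigger can be sketched together as two small predicates. This is a minimal illustration under stated assumptions: `UnnestStats`, its fields, and both function names are hypothetical, and the real rule would read its estimates from whatever statistics the optimizer already provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnnestStats:
    # Hypothetical estimates; these are not field names from the optimizer.
    input_rows: float        # rows entering UNNEST
    input_row_bytes: float   # average width of a pre-unnest row
    fan_out: float           # average output rows per input row
    output_row_bytes: float  # average width of a post-unnest row


def keys_are_replicate(partition_keys, replicate_vars, unnested_vars,
                       ordinality_var=None):
    """Guard: every exchange key is a replicate column, none is produced by UNNEST."""
    produced = set(unnested_vars)
    if ordinality_var is not None:
        produced.add(ordinality_var)
    keys = set(partition_keys)
    return keys.isdisjoint(produced) and keys <= set(replicate_vars)


def should_push_exchange_below_unnest(stats):
    """Cost trigger: fire only if shuffling the pre-unnest input moves fewer bytes."""
    pre_bytes = stats.input_rows * stats.input_row_bytes
    post_bytes = stats.input_rows * stats.fan_out * stats.output_row_bytes
    return pre_bytes < post_bytes


# Wide source row, narrow element: the post-unnest set is smaller, keep the plan.
wide_source = UnnestStats(input_rows=1000, input_row_bytes=4096,
                          fan_out=4, output_row_bytes=24)
# Large fan-out: the post-unnest set is larger, shuffle the input instead.
large_fan_out = UnnestStats(input_rows=1000, input_row_bytes=200,
                            fan_out=50, output_row_bytes=40)

guard_ok = keys_are_replicate(["orderkey"], ["orderkey", "custkey"], ["item"], "ord")
guard_rejects_ordinality = keys_are_replicate(["ord"], ["orderkey"], ["item"], "ord")
fire_wide_source = should_push_exchange_below_unnest(wide_source)
fire_large_fan_out = should_push_exchange_below_unnest(large_fan_out)
```

With these made-up numbers the wide-source case shuffles 4,096,000 bytes before the unnest versus 96,000 after, so the rule stays off; the large-fan-out case shuffles 200,000 bytes before versus 2,000,000 after, so the rule fires.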
Acceptance
- Unit (plan-shape) test: the rule fires or does not fire according to the "no exchange key is an unnested/ordinality variable" guard and the cost threshold, and the rewritten plan is semantics-preserving.
- e2e: enabled-vs-disabled result equality over
... CROSS JOIN UNNEST(...) ... shapes (TPC-H, e.g. lineitem / orders).
(Follow-up; relates to existing UNNEST pushdown rules such as pushdown_through_unnest.)