Background
#14833 / #14837 fixed DPP broadcast duplication in non-AQE mode by adding fixupNonAdaptiveBroadcastReuse — a post-hoc pass that walks the final plan, indexes the main-plan GpuBroadcastExchangeExecs, and rewrites matching DPP-side GpuBroadcastExchangeExecs to ReusedExchangeExec.
@revans2 observed on the PR that the root cause is upstream of this fixup: GpuSubqueryBroadcastExec builds its underlying GpuBroadcastExchangeExec directly during GpuOverrides via exMeta.convertToGpu(), which bypasses GpuTransitionOverrides. Because of that bypass, insertCoalesce / optimizeCoalesce add GpuCoalesceBatches to the main-plan broadcast but not to the DPP-side broadcast, which is exactly the structural divergence that breaks ReuseExchangeAndSubquery's canonical-equality match.
The fixup pass papers over the symptom. The cleaner fix is to make the DPP-side broadcast go through the same GpuTransitionOverrides rewrites as any other broadcast, so the two converge to identical shapes at the same point in the pipeline and ReuseExchangeAndSubquery works without any rapids-side post-processing.
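To make the divergence concrete, here is a minimal toy sketch in plain Scala. It does not use the real Spark or spark-rapids classes: `Scan`, `CoalesceBatches`, `BroadcastExchange`, `ReusedExchange`, `insertCoalesce` and `reuseExchanges` are all hypothetical stand-ins, and case-class equality is assumed as a rough proxy for the canonicalized-plan comparison that ReuseExchangeAndSubquery performs.

```scala
import scala.collection.mutable

// Toy model only: none of these are the real Spark / spark-rapids classes.
object BroadcastReuseSketch {
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class CoalesceBatches(child: Plan) extends Plan
  final case class BroadcastExchange(child: Plan) extends Plan
  final case class ReusedExchange(original: BroadcastExchange) extends Plan

  // Hypothetical stand-in for the insertCoalesce rewrite:
  // wrap the broadcast child in a coalesce node unless one is already there.
  def insertCoalesce(b: BroadcastExchange): BroadcastExchange = b.child match {
    case _: CoalesceBatches => b
    case other              => BroadcastExchange(CoalesceBatches(other))
  }

  // Hypothetical stand-in for exchange reuse: the first exchange of a given
  // shape is kept; later structurally equal ones become references to it.
  def reuseExchanges(exchanges: Seq[BroadcastExchange]): Seq[Plan] = {
    val seen = mutable.Map.empty[BroadcastExchange, BroadcastExchange]
    exchanges.map { e =>
      seen.get(e) match {
        case Some(first) => ReusedExchange(first)
        case None =>
          seen(e) = e
          e
      }
    }
  }

  // Both sides start from the same logical exchange.
  val cpuShape: BroadcastExchange = BroadcastExchange(Scan("dim"))

  // Main-plan broadcast: goes through the transition rewrite.
  val mainPlan: BroadcastExchange = insertCoalesce(cpuShape)
  // DPP side as described above: built directly, so the rewrite is skipped.
  val dppBypassed: BroadcastExchange = cpuShape
  // DPP side if it went through the same rewrite path.
  val dppUnified: BroadcastExchange = insertCoalesce(cpuShape)

  val divergent: Seq[Plan] = reuseExchanges(Seq(mainPlan, dppBypassed))
  val converged: Seq[Plan] = reuseExchanges(Seq(mainPlan, dppUnified))

  def main(args: Array[String]): Unit = {
    println(s"bypassed DPP side reused: ${divergent(1).isInstanceOf[ReusedExchange]}")
    println(s"unified DPP side reused:  ${converged(1).isInstanceOf[ReusedExchange]}")
  }
}
```

In this toy model the bypassed DPP-side exchange is not reused (the first line prints `false`), while the one that went through the same rewrite is (the second prints `true`). The real rule compares canonicalized plans rather than raw case-class equality, so this shows only the shape of the problem, not the actual matching logic.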
Goal
Apply GpuTransitionOverrides (or at least the insertCoalesce / optimizeCoalesce rewrites it depends on) to the broadcast child that GpuSubqueryBroadcastExec.convertToGpu constructs, so that:
The non-AQE DPP-side GpuBroadcastExchangeExec ends up structurally identical to the main-plan broadcast for the same logical CPU exchange.
Spark's stock ReuseExchangeAndSubquery matches them without any rapids-side intervention.
fixupNonAdaptiveBroadcastReuse (added in #14837, "[BUG] Dedup GpuBroadcastExchange across DPP subqueries in non-AQE mode") becomes redundant and can be removed.
Other transition-order divergences that might surface in the future (not just GpuCoalesceBatches) are handled by the same consistent rewrite path, so we do not accumulate more special-case fixups.
Scope notes
AQE mode already has fixupAdaptiveExchangeReuse; this issue is specifically about the non-AQE path, but the design should ideally remove the need for either fixup pass.
The case where an array/struct build key forces BroadcastHashJoin to CPU fallback (CPU BroadcastHashJoin + GPU DPP subquery) is tracked separately by #14836, "[BUG] DPP broadcast cannot be reused when array/struct build key forces BroadcastHashJoin to CPU fallback", and is out of scope here.
fixupNonAdaptiveBroadcastReuse and its spark.rapids.sql.nonAqeBroadcastReuseFixup.enable kill switch should be retired in the same change.
References
GpuTransitionOverrides.fixupAdaptiveExchangeReuse: the analogous AQE-side fixup