[FEA] Apply GpuTransitionOverrides to GpuSubqueryBroadcast's broadcast child consistently in non-AQE mode #14892

## Background

#14833 / #14837 fixed DPP broadcast duplication in non-AQE mode by adding `fixupNonAdaptiveBroadcastReuse` — a post-hoc pass that walks the final plan, indexes the main-plan `GpuBroadcastExchangeExec`s, and rewrites matching DPP-side `GpuBroadcastExchangeExec`s to `ReusedExchangeExec`.
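For readers who have not looked at #14837, the post-hoc pass can be pictured roughly as below. This is a minimal Scala sketch over a hypothetical toy plan model (`Plan`, `BroadcastExchange`, `ReusedExchange`, `Query` and the rest are illustrative stand-ins, not Spark or spark-rapids classes). The matching key (`stripCoalesce`, which ignores coalesce wrappers) is an assumption made for illustration; the real `fixupNonAdaptiveBroadcastReuse` may match on different criteria.

```scala
object PostHocFixupSketch {
  // Hypothetical toy plan nodes, not Spark's SparkPlan hierarchy.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class CoalesceBatches(child: Plan) extends Plan
  final case class BroadcastExchange(child: Plan) extends Plan
  final case class ReusedExchange(target: BroadcastExchange) extends Plan
  final case class SubqueryBroadcast(child: Plan) extends Plan
  // Toy root that keeps the main-plan side and the DPP side apart.
  final case class Query(main: Plan, dpp: Plan) extends Plan

  // Assumed matching key: drop coalesce wrappers so a coalesced and an
  // uncoalesced broadcast of the same input compare equal.
  def stripCoalesce(p: Plan): Plan = p match {
    case CoalesceBatches(child)   => stripCoalesce(child)
    case BroadcastExchange(child) => BroadcastExchange(stripCoalesce(child))
    case other                    => other
  }

  // Collect the broadcasts reachable from a plan fragment.
  def collectBroadcasts(p: Plan): Seq[BroadcastExchange] = p match {
    case b: BroadcastExchange => Seq(b)
    case CoalesceBatches(c)   => collectBroadcasts(c)
    case SubqueryBroadcast(c) => collectBroadcasts(c)
    case _                    => Nil
  }

  def fixup(q: Query): Query = {
    // 1. Index the main-plan broadcasts by their normalized shape.
    val index: Map[Plan, BroadcastExchange] =
      collectBroadcasts(q.main).map(b => stripCoalesce(b) -> b).toMap
    // 2. Rewrite DPP-side broadcasts that match an indexed one into reuse nodes.
    def rewrite(p: Plan): Plan = p match {
      case b: BroadcastExchange =>
        index.get(stripCoalesce(b)).map(ReusedExchange(_)).getOrElse(b)
      case SubqueryBroadcast(c) => SubqueryBroadcast(rewrite(c))
      case other                => other
    }
    q.copy(dpp = rewrite(q.dpp))
  }

  // Main side was coalesced by the transition pass; DPP side was not.
  val before: Query = Query(
    BroadcastExchange(CoalesceBatches(Scan("dim"))),
    SubqueryBroadcast(BroadcastExchange(Scan("dim"))))
  val after: Query = fixup(before)
}
```

The sketch makes the drawback visible: the pass has to know which structural differences to ignore, so every new kind of divergence needs the key to be widened.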

@revans2 [observed on the PR](https://github.com/NVIDIA/spark-rapids/pull/14837#pullrequestreview-3320434181) that the *root* cause is upstream of this fixup: `GpuSubqueryBroadcastExec` builds its underlying `GpuBroadcastExchangeExec` directly during `GpuOverrides` via `exMeta.convertToGpu()`, which bypasses `GpuTransitionOverrides`. Because of that bypass, `insertCoalesce` / `optimizeCoalesce` add `GpuCoalesceBatches` to the main-plan broadcast but not to the DPP-side broadcast — exactly the structural divergence that breaks `ReuseExchangeAndSubquery`'s canonical-equality match.
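A toy model of the divergence follows. It assumes case-class equality as a stand-in for Spark's canonicalized-plan comparison, and a simplified `insertCoalesce` that always wraps the broadcast input. Both are hypothetical simplifications of the real `GpuTransitionOverrides` behavior, and every type here is illustrative only.

```scala
object DivergenceSketch {
  // Hypothetical toy plan nodes, not Spark's SparkPlan hierarchy.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class CoalesceBatches(child: Plan) extends Plan
  final case class BroadcastExchange(child: Plan) extends Plan

  // Simplified stand-in for insertCoalesce: always wraps the broadcast input.
  def insertCoalesce(p: Plan): Plan = p match {
    case BroadcastExchange(child) => BroadcastExchange(CoalesceBatches(child))
    case other                    => other
  }

  // Main-plan broadcast: converted by GpuOverrides, then rewritten by the transition pass.
  val mainSide: Plan = insertCoalesce(BroadcastExchange(Scan("dim")))
  // DPP-side broadcast: built directly (as via exMeta.convertToGpu()), so the pass never sees it.
  val dppSide: Plan = BroadcastExchange(Scan("dim"))

  // Case-class equality stands in for comparing canonicalized plans.
  val reusable: Boolean = mainSide == dppSide
}
```

Here `reusable` is `false`: the two broadcasts read the same data but differ by one wrapper node, which is enough to defeat an equality-based reuse match.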

The fixup pass papers over the symptom. The cleaner fix is to route the DPP-side broadcast through the same `GpuTransitionOverrides` rewrites as every other broadcast, so that both broadcasts converge to identical shapes at the same point in the pipeline and `ReuseExchangeAndSubquery` works without any rapids-side post-processing.

## Goal

Apply `GpuTransitionOverrides` (or at least the `insertCoalesce` / `optimizeCoalesce` rewrites it depends on) to the broadcast child that `GpuSubqueryBroadcastExec.convertToGpu` constructs, so that:

1. The non-AQE DPP-side `GpuBroadcastExchangeExec` ends up structurally identical to the main-plan broadcast for the same logical CPU exchange.
2. Spark's stock `ReuseExchangeAndSubquery` matches them without any rapids-side intervention.
3. `fixupNonAdaptiveBroadcastReuse` (added in #14837) becomes redundant and can be removed.
4. Other transition-order divergences that might surface in the future — not just `GpuCoalesceBatches` — are handled by the same consistent rewrite path, so we do not accumulate more special-case fixups.
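The intended end state can be sketched in the same toy style. One shared transition rewrite runs over both the main-plan and the DPP-side broadcast. After that, a reuse pass in the spirit of `ReuseExchangeAndSubquery` (here, a first-seen map keyed on plan equality) collapses the duplicate with no extra fixup. All names are hypothetical stand-ins, and `transitions` is only a placeholder for whatever `GpuTransitionOverrides` actually applies.

```scala
import scala.collection.mutable

object ConvergenceSketch {
  // Hypothetical toy plan nodes, not Spark's SparkPlan hierarchy.
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class CoalesceBatches(child: Plan) extends Plan
  final case class BroadcastExchange(child: Plan) extends Plan
  final case class ReusedExchange(target: BroadcastExchange) extends Plan
  final case class SubqueryBroadcast(child: Plan) extends Plan
  final case class Query(main: Plan, dpp: Plan) extends Plan

  // One shared transition rewrite (placeholder for insertCoalesce/optimizeCoalesce),
  // applied to every broadcast no matter where it was constructed.
  def transitions(p: Plan): Plan = p match {
    case BroadcastExchange(child) => BroadcastExchange(CoalesceBatches(transitions(child)))
    case SubqueryBroadcast(child) => SubqueryBroadcast(transitions(child))
    case Query(main, dpp)         => Query(transitions(main), transitions(dpp))
    case other                    => other
  }

  // Toy reuse pass: the first broadcast seen with a given shape is kept,
  // and later structurally equal ones become ReusedExchange references to it.
  def reuseExchanges(p: Plan): Plan = {
    val seen = mutable.Map.empty[Plan, BroadcastExchange]
    def go(node: Plan): Plan = node match {
      case b: BroadcastExchange =>
        val first = seen.getOrElseUpdate(b, b)
        if (first eq b) b else ReusedExchange(first)
      case SubqueryBroadcast(child) => SubqueryBroadcast(go(child))
      case Query(main, dpp) =>
        val newMain = go(main) // visit the main plan first so it owns the exchange
        Query(newMain, go(dpp))
      case other => other
    }
    go(p)
  }

  // Both sides pass through `transitions` before reuse is attempted.
  val raw: Plan = Query(
    BroadcastExchange(Scan("dim")),
    SubqueryBroadcast(BroadcastExchange(Scan("dim"))))
  val optimized: Plan = reuseExchanges(transitions(raw))
}
```

Because every broadcast goes through the same rewrite before the reuse pass, any future transition-level difference (not only coalescing) is applied symmetrically. That symmetry is the property goal 4 asks for.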

## Scope notes

- AQE mode already has `fixupAdaptiveExchangeReuse`; this issue is specifically about the non-AQE path, but the design should ideally remove the need for either fixup pass.
- The cross-runtime case (CPU `BroadcastHashJoin` + GPU DPP subquery) is tracked separately by #14836 and is out of scope here.
- Once the proper fix lands, #14837's `fixupNonAdaptiveBroadcastReuse` and its `spark.rapids.sql.nonAqeBroadcastReuseFixup.enable` kill switch should be retired in the same change.

## References

- #14833 — original bug
- #14837 — current (post-hoc fixup) PR
- #14836 — related but distinct cross-runtime DPP fallback case
- `GpuTransitionOverrides.fixupAdaptiveExchangeReuse` — the analogous AQE-side fixup
