Commit 143e9ca
[BUG] Dedup GpuBroadcastExchange across DPP subqueries in non-AQE mode (#14837)
Closes #14833
## Summary
In non-AQE mode with DPP, `GpuSubqueryBroadcastExec` builds its
underlying `GpuBroadcastExchangeExec` directly during `GpuOverrides`
(the first pass) via `exMeta.convertToGpu()`, bypassing
`GpuTransitionOverrides`. The DPP-side broadcast therefore ends up with
a structurally different child (missing `GpuCoalesceBatches` that
`insertCoalesce`/`optimizeCoalesce` add on the main plan) than the
join-side broadcast for the same logical CPU exchange. The
`cpuCanonical` field is also computed after GPU rewriting on the DPP
side but before it on the join side, so `ExchangeMappingCache` lookup by
`cpuCanonical` also fails to match. Spark's `ReuseExchangeAndSubquery`
rule cannot merge the two broadcasts, so the dim side is materialized
and broadcast twice — defeating DPP's intended performance benefit.
This adds `fixupNonAdaptiveBroadcastReuse` to `GpuTransitionOverrides`,
gated by a new `spark.rapids.sql.nonAqeBroadcastReuseFixup.enable` conf
(internal, default true). The pass collects main-plan
`GpuBroadcastExchangeExec` instances, indexes them by
`(mode.canonicalized, stripTransitions(child).canonicalized)` (where
`stripTransitions` removes `GpuCoalesceBatches` so the structural
difference does not block matching), then walks subquery expressions and
rewrites any matching DPP-side broadcast to `ReusedExchangeExec`
referencing the main-plan instance. Mirrors
`fixupAdaptiveExchangeReuse`, which already handles the equivalent gap
in AQE mode.
The cross-runtime case (CPU `BroadcastHashJoin` + GPU DPP subquery when
`array`/`struct` build keys force CPU fallback) is tracked separately by
#14836 and not addressed here.
## Traceability
- Issue: #14833 (filed before this PR per the issue-first rule)
- Follow-on (proper root-cause fix tracked separately): #14892 — apply
`GpuTransitionOverrides` to `GpuSubqueryBroadcast`'s broadcast child so
the post-hoc fixup added here can be retired.
- Migration PR (independent, adds the end-to-end DPP suite that exposes
this bug): #14781
- Related: #14836 (cross-runtime fallback case, distinct root cause)
- Existing analogue this fix mirrors:
`GpuTransitionOverrides.fixupAdaptiveExchangeReuse`.
## Testing
### Unit coverage (added in this PR)
`tests/src/test/scala/com/nvidia/spark/rapids/NonAqeBroadcastReuseFixupSuite.scala`
(added per @res-life's review request) directly exercises
`fixupNonAdaptiveBroadcastReuse`. It hand-builds the #14833 structural
divergence — a main-plan
`GpuBroadcastExchangeExec(GpuCoalesceBatches(range))` and a DPP-side
`GpuBroadcastExchangeExec(range)` sharing one dim-side `range` (so their
broadcast modes canonicalize identically, but the children differ by
exactly the `GpuCoalesceBatches` wrap):
- `fixupNonAdaptiveBroadcastReuse rewrites matching DPP broadcast to
ReusedExchangeExec` — once `stripGpuCoalesceBatches` normalizes the
structural difference, the DPP-side broadcast is rewritten to a
`ReusedExchangeExec` pointing at the main-plan instance.
- `fixupNonAdaptiveBroadcastReuse leaves plans with no main-plan
broadcast unchanged` — the `mainPlanBroadcasts.isEmpty` early-exit
returns the input plan unmodified.
- `ENABLE_NON_AQE_BROADCAST_REUSE_FIXUP conf accessor flips with the
kill switch` — the accessor reflects the kill switch and defaults to
`true` via `createWithDefault(true)`.
```
mvn package -pl tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo \
-DwildcardSuites=com.nvidia.spark.rapids.NonAqeBroadcastReuseFixupSuite \
-Drapids.test.gpu.allocFraction=0.3 -Drapids.test.gpu.maxAllocFraction=0.3 \
-Drapids.test.gpu.minAllocFraction=0 -s jenkins/settings.xml -P mirror-apache-to-urm
```
→ **`Tests: succeeded 3, failed 0`** (verified locally on Apache Spark
3.3, buildver 330)
### End-to-end (DPP suite from #14781, applied locally)
Validated against Apache Spark 3.3 in a worktree, with the migrated
`RapidsDynamicPartitionPruningV1Suite` from #14781 applied locally and
the four `#14833` KNOWN_ISSUE excludes removed:
```
mvn package -pl tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo \
-DwildcardSuites=org.apache.spark.sql.rapids.suites.RapidsDynamicPartitionPruningV1SuiteAEOff,org.apache.spark.sql.rapids.suites.RapidsDynamicPartitionPruningV1SuiteAEOn \
-Drapids.test.gpu.allocFraction=0.3 -Drapids.test.gpu.maxAllocFraction=0.3 \
-Drapids.test.gpu.minAllocFraction=0 -s jenkins/settings.xml -P mirror-apache-to-urm
```
- `RapidsDynamicPartitionPruningV1SuiteAEOff`: **`Tests: succeeded 34,
failed 1`** (the one failure is `SPARK-32659` — known partial-fallback
case from #14836, unrelated to this fix). The four previously-failing
tests now pass:
- `avoid reordering broadcast join keys to match input hash
partitioning`
- `Plan broadcast pruning only when the broadcast can be reused`
- `SPARK-32817: DPP throws error when the broadcast side is empty`
- `SPARK-38148: Do not add dynamic partition pruning if there exists
static partition pruning`
- `RapidsDynamicPartitionPruningV1SuiteAEOn`: **`Tests: succeeded 31,
failed 0`** — no AQE regression.
- `BroadcastHashJoinSuite` smoke: `Tests: succeeded 2, failed 0`.
### Cross-shim compile (per CLAUDE.md shim coverage rule)
- `mvn package -DskipTests -pl sql-plugin -am -Dbuildver=330`: BUILD
SUCCESS
- `mvn package -DskipTests -pl sql-plugin -am -Dbuildver=340`: BUILD
SUCCESS
- `mvn package -DskipTests -pl sql-plugin -am -Dbuildver=400` (via
`scala2.13/`): BUILD SUCCESS
`scripts/check-shim-coverage.sh`: no shim files changed, no
`Origin.context` leaks.
## Performance impact
Cold-path analysis. The pass runs once at the end of
`GpuTransitionOverrides` per query plan:
- For queries without `GpuBroadcastExchangeExec` (the vast majority),
the pass exits early after a single `SparkPlan.foreach` traversal (no
`transformAllExpressions` invocation).
- For queries with broadcasts and DPP, the work is O(n) plan traversal
plus O(m) canonical computation on small subquery trees, where m is the
size of the DPP subquery (typically a few nodes).
This is negligible compared to query planning overhead and execution
time. No benchmark required.
## Test recovery follow-up
The fix has direct unit coverage in this PR
(`NonAqeBroadcastReuseFixupSuite`). The end-to-end DPP recovery
additionally depends on #14781, which adds the migrated
`RapidsDynamicPartitionPruningV1Suite`. After both this fix and #14781
merge, a small follow-up PR removes the four `#14833` KNOWN_ISSUE
excludes in `RapidsTestSettings.scala` so the migrated suite exercises
the fix in CI. The two PRs are independent but both required for
end-to-end recovery.
Documentation
- [ ] Updated for new or modified user-facing features or behaviors
- [x] No user-facing change
Testing
- [x] Added or modified tests to cover new code paths
- [ ] Covered by existing tests
(Please provide the names of the existing tests in the PR description.)
- [ ] Not required
Performance
- [ ] Tests ran and results are added in the PR description
- [ ] Issue filed with a link in the PR description
- [x] Not required
---------
Signed-off-by: Allen Xu <allxu@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent f10cd40 commit 143e9ca
3 files changed
Lines changed: 269 additions & 1 deletion
File tree
- sql-plugin/src/main/scala/com/nvidia/spark/rapids
- tests/src/test/scala/com/nvidia/spark/rapids
Lines changed: 76 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
41 | | - | |
| 41 | + | |
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
| |||
759 | 759 | | |
760 | 760 | | |
761 | 761 | | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
| 815 | + | |
| 816 | + | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | + | |
| 821 | + | |
| 822 | + | |
| 823 | + | |
| 824 | + | |
| 825 | + | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
| 832 | + | |
762 | 833 | | |
763 | 834 | | |
764 | 835 | | |
| |||
850 | 921 | | |
851 | 922 | | |
852 | 923 | | |
| 924 | + | |
| 925 | + | |
| 926 | + | |
| 927 | + | |
853 | 928 | | |
854 | 929 | | |
855 | 930 | | |
| |||
Lines changed: 15 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2748 | 2748 | | |
2749 | 2749 | | |
2750 | 2750 | | |
| 2751 | + | |
| 2752 | + | |
| 2753 | + | |
| 2754 | + | |
| 2755 | + | |
| 2756 | + | |
| 2757 | + | |
| 2758 | + | |
| 2759 | + | |
| 2760 | + | |
| 2761 | + | |
| 2762 | + | |
2751 | 2763 | | |
2752 | 2764 | | |
2753 | 2765 | | |
| |||
4012 | 4024 | | |
4013 | 4025 | | |
4014 | 4026 | | |
| 4027 | + | |
| 4028 | + | |
| 4029 | + | |
4015 | 4030 | | |
4016 | 4031 | | |
4017 | 4032 | | |
| |||
Lines changed: 178 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
0 commit comments