Skip to content

[auto-merge] release/26.06 to main [skip ci] [bot]#14916

Merged
nvauto merged 1 commit into
mainfrom
release/26.06
May 30, 2026
Merged

[auto-merge] release/26.06 to main [skip ci] [bot]#14916
nvauto merged 1 commit into
mainfrom
release/26.06

Conversation

@nvauto
Copy link
Copy Markdown
Collaborator

@nvauto nvauto commented May 30, 2026

auto-merge triggered by github actions on release/26.06 to create a PR keeping main up-to-date. If this PR is unable to be merged due to conflicts, it will remain open until manually fix.

### Description

When AQE is enabled, transition cleanup can expose a bare GPU exchange
at the root of the final adaptive plan. That is valid while Spark is
preparing AQE query stages because Spark needs to see the exchange node,
but it is invalid once Spark is collecting rows from the final plan. On
a proprietary Spark distro this showed up as `IllegalStateException:
Row-based execution should not occur for GpuColumnarExchange` for
repartition collect paths.

This PR tags exchanges that are seen by `GpuQueryStagePrepOverrides` and
copies that tag from CPU exchange nodes to the GPU exchange
replacements. `optimizeAdaptiveTransitions` now exposes a root GPU
exchange only when that tag is present, and otherwise keeps
`GpuColumnarToRowExec` for final execution.

The user-visible effect is that final AQE repartition results can be
collected normally. There are no new configs or documented behavior
changes.

Test coverage added:

- `AdaptiveQueryExecSuite`: `Keep transition to row for final AQE
repartition exchange`

Validation performed:

- `mvn package -pl tests -am -Dbuildver=358
-DwildcardSuites=com.nvidia.spark.rapids.AdaptiveQueryExecSuite
-Dtests="Keep transition to row for final AQE repartition exchange"`
- Built and ran the affected `repart_test.py` subset in a proprietary
Spark distro Docker image with AQE enabled and `NUM_LOCAL_EXECS=2`; 63
tests passed with 0 failures/errors
- Confirmed the pre-fix parent still reproduces the original failure
without setting `NUM_LOCAL_EXECS`; 24 of 63 selected cases failed with
the `GpuColumnarExchange` row-execution error

### Checklists

Documentation
- [ ] Updated for new or modified user-facing features or behaviors
- [x] No user-facing change

Testing
- [x] Added or modified tests to cover new code paths
- [ ] Covered by existing tests
(Please provide the names of the existing tests in the PR description.)
- [ ] Not required

Performance
- [ ] Tests ran and results are added in the PR description
- [ ] Issue filed with a link in the PR description
- [x] Not required

---------

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@nvauto nvauto merged commit d217a24 into main May 30, 2026
@nvauto
Copy link
Copy Markdown
Collaborator Author

nvauto commented May 30, 2026

SUCCESS - auto-merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants