
planner: preserve rollup alias grouping positions#68435

Open
hawkingrei wants to merge 3 commits into
pingcap:masterfrom
hawkingrei:issue-65965-rollup-duplicate-groups

Conversation

@hawkingrei
Member

@hawkingrei hawkingrei commented May 16, 2026

What problem does this PR solve?

Issue Number: close #65965

Problem Summary:

`GROUP BY ... WITH ROLLUP` could return duplicate visible rows when a GROUP BY item was a SELECT alias or ordinal that repeated an earlier grouping expression. For example, in `SELECT a, b, a AS d, SUM(c) FROM t1 GROUP BY a, b, d WITH ROLLUP`, the planner reused the same projected grouping column for both `a` and `d`, so the `{a,b,d}` and `{a,b}` rollup levels both rendered `d` as non-NULL and emitted identical rows.
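To make the failure mode concrete, here is a minimal Go sketch. It is a toy in-memory model of ROLLUP, not TiDB's Expand operator: each level keeps the grouping columns read by its leading GROUP BY items, nulls the rest, and is aggregated separately (as a grouping-set id would). The `row` type, `rollupRows`, and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row is a hypothetical t1 row with columns a, b, c.
type row struct{ a, b, c int }

// rollupRows is a toy model of
//   SELECT a, b, a AS d, SUM(c) FROM t1 GROUP BY a, b, d WITH ROLLUP
// colOf maps each GROUP BY position (a, b, d) to the grouping column it reads.
// Each rollup level keeps the columns read by its leading items and nulls the rest.
func rollupRows(rows []row, colOf []int) []string {
	type groupKey struct {
		level   int    // plays the role of a grouping-set id
		visible string // projected a, b, d values
	}
	sums := map[groupKey]int{}
	for level := 0; level <= len(colOf); level++ {
		kept := map[int]bool{}
		for pos := 0; pos < level; pos++ {
			kept[colOf[pos]] = true
		}
		for _, r := range rows {
			vals := []int{r.a, r.b, r.a} // d is an alias of a
			parts := make([]string, len(vals))
			for pos, v := range vals {
				if kept[colOf[pos]] {
					parts[pos] = fmt.Sprint(v)
				} else {
					parts[pos] = "NULL"
				}
			}
			sums[groupKey{level, strings.Join(parts, ",")}] += r.c
		}
	}
	out := make([]string, 0, len(sums))
	for k, s := range sums {
		out = append(out, fmt.Sprintf("%s,%d", k.visible, s))
	}
	sort.Strings(out)
	return out
}

func main() {
	rows := []row{{1, 1, 10}, {1, 2, 20}}
	fmt.Println(strings.Join(rollupRows(rows, []int{0, 1, 0}), "\n")) // d shares a's column
	fmt.Println("---")
	fmt.Println(strings.Join(rollupRows(rows, []int{0, 1, 2}), "\n")) // d keeps its own slot
}
```

With `colOf = {0, 1, 0}` the `{a,b,d}` and `{a,b}` levels produce the same visible rows, so each `(a, b)` group appears twice; with `{0, 1, 2}` the `{a,b}` level shows `d` as NULL and every row is distinct.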

What changed and how does it work?

This PR preserves the original GROUP BY item positions in the Expand plan, but only when a repeated grouping expression is introduced through a SELECT alias or ordinal. Each rollup level can then set the repeated visible positions to NULL independently.

The existing deduplicated grouping-expression path is kept for ordinary repeated grouping expressions, which limits planner behavior changes to the affected case.
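The selection rule can be illustrated with a small predicate. This is a hypothetical sketch, not the PR's `needPreserveRollupGbyItemPositions`: plain strings stand in for canonical expression hashes, and it assumes a repeat counts as alias- or ordinal-introduced when it is bound to a different SELECT field than the first occurrence.

```go
package main

import "fmt"

// shouldPreservePositions is a hypothetical helper, not the PR's code.
// keys[i] stands in for the canonical hash of GROUP BY item i; src[i] is the
// SELECT-field index the item was bound to, or -1 when it has no binding.
// It reports whether some repeated expression arrives through a different
// SELECT field (an alias or ordinal) than its first occurrence.
func shouldPreservePositions(keys []string, src []int) bool {
	firstSrc := make(map[string]int, len(keys))
	for i, k := range keys {
		prev, seen := firstSrc[k]
		if !seen {
			firstSrc[k] = src[i]
			continue
		}
		if src[i] >= 0 && src[i] != prev {
			return true
		}
	}
	return false
}

func main() {
	// SELECT a, b, a AS d ... GROUP BY a, b, d: d repeats a via field 2.
	fmt.Println(shouldPreservePositions([]string{"a", "b", "a"}, []int{0, 1, 2})) // true
	// GROUP BY a, b, a with no SELECT-field binding: ordinary repeat, dedup path.
	fmt.Println(shouldPreservePositions([]string{"a", "b", "a"}, []int{-1, -1, -1})) // false
}
```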

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Validation:

go test ./pkg/planner/core/issuetest -run TestPlannerIssueRegressions -count=1 -tags=intest,deadlock
./tools/check/failpoint-go-test.sh pkg/planner/core/casetest/physicalplantest -run TestExplainExpand -count=1
git diff --check
make lint

Manual replay covered the issue query, GROUP BY 1, 2, 3 WITH ROLLUP, ordinary repeated grouping keys, and warning checks.

make bazel_prepare was not required because this PR only changes existing Go files and extends an existing top-level Go test function. It does not add, remove, move, or rename Go files, add a new top-level TestXxx, or touch Bazel/go.mod/go.sum files.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix an issue that `GROUP BY ... WITH ROLLUP` could return duplicate visible rows when a SELECT alias or ordinal repeated an earlier grouping expression.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed GROUP BY ... WITH ROLLUP to preserve grouping-item ordering and produce consistent summary rows and null placement for queries using aliases, ordinals, or repeated grouping keys.
  • Tests

    • Added a regression test exercising multiple GROUP BY WITH ROLLUP variants (including a mixed alias/repeated-key case) and verifying exact results and no warnings.


@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-triage-completed labels May 16, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 0xpoe for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/planner SIG: Planner labels May 16, 2026
@hawkingrei hawkingrei added the AI-Correction Bugfix by AI label May 16, 2026
@coderabbitai

coderabbitai Bot commented May 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c31382c-4a12-49a1-8e70-94f3688d0b91

📥 Commits

Reviewing files that changed from the base of the PR and between e6939df and aeab453.

📒 Files selected for processing (1)
  • pkg/planner/core/logical_plan_builder.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/planner/core/logical_plan_builder.go

📝 Walkthrough

Tracks the SELECT-field origin of each GROUP BY item through resolver and rollup expansion, preserves/restores GROUP BY positions when required for WITH ROLLUP, and performs field-index-aware grouping-set substitution; includes a cascades regression test covering explicit, positional, and mixed GROUP BY variants.

Changes

WITH ROLLUP grouping key deduplication and position tracking

  • LogicalExpand operator: source field index tracking
    pkg/planner/core/operator/logicalop/logical_expand.go
    New GbyItemSourceFieldIndices field records the SELECT-field origin of each GROUP BY item (-1 means no SELECT-field binding). Adds TrySubstituteExprWithGroupingSetColByFieldIndex for index-aware substitution with fallbacks.
  • GROUP BY resolver: record source field indices
    pkg/planner/core/logical_plan_builder.go
    gbyResolver now includes sourceFieldIndex and sets it for non-expression GROUP BY columns and PositionExpr. resolveGbyExprs returns the expressions plus a parallel sourceFieldIndices slice.
  • buildExpand: deduplication and position preservation
    pkg/planner/core/logical_plan_builder.go
    PlanBuilder.buildExpand accepts gbyItemSourceFieldIndices, uses canonical-hash checks to determine whether ROLLUP requires preserving duplicate GROUP BY positions, computes deduplicated expressions and aligned source-index mappings, and passes them into LogicalExpand. Includes the bytes import and call-site wiring.
  • Index-aware grouping-set substitution and projection
    pkg/planner/core/logical_plan_builder.go
    resolveGroupingTraverseAction gains SelectFieldIndex. Adds replaceGroupingFuncByFieldIndex and projection-time calls so grouping-set expressions are rewritten by select-field index when an Expand exists.
  • buildSelect: propagate gbyItemSourceFieldIndices
    pkg/planner/core/logical_plan_builder.go
    buildSelect captures gbyItemSourceFieldIndices from resolveGbyExprs and forwards them into buildExpand when ROLLUP expansion is enabled.
  • Regression test: repeated grouping key with ROLLUP
    pkg/planner/core/issuetest/planner_issue_test.go
    New test issue-65965-rollup-alias-repeated-grouping-key resets the DB, creates and populates t1, runs explicit, positional, and mixed GROUP BY ... WITH ROLLUP queries, and asserts the expected sorted rows and that SHOW WARNINGS returns none.
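A toy Go model of the resolver and buildExpand steps summarized above: GROUP BY items resolve to expressions plus a parallel slice of SELECT-field indices, and deduplication returns each position's reference into the unique list. `selectField`, `resolveGbyItems`, and `dedupAligned` are hypothetical names; strings stand in for expressions and their canonical hashes, and name resolution is heavily simplified.

```go
package main

import (
	"fmt"
	"strconv"
)

// selectField is a hypothetical, simplified SELECT field: expression text plus optional alias.
type selectField struct{ expr, alias string }

// resolveGbyItems resolves GROUP BY items to expressions plus a parallel slice of
// source SELECT-field indices (-1 means no SELECT-field binding).
// Ordinals are 1-based; a bare name binds to a matching alias or unaliased field.
// A real resolver would reject an out-of-range ordinal; this sketch does not.
func resolveGbyItems(fields []selectField, items []string) (exprs []string, src []int) {
	for _, it := range items {
		idx := -1
		if n, err := strconv.Atoi(it); err == nil {
			if n >= 1 && n <= len(fields) {
				idx = n - 1
			}
		} else {
			for i, f := range fields {
				if f.alias == it || (f.alias == "" && f.expr == it) {
					idx = i
					break
				}
			}
		}
		if idx >= 0 {
			exprs = append(exprs, fields[idx].expr)
		} else {
			exprs = append(exprs, it) // plain expression, kept as written
		}
		src = append(src, idx)
	}
	return exprs, src
}

// dedupAligned removes repeated expressions. refPos maps every original position to
// its slot in uniq, and uniqSrc keeps the source index of each slot's first occurrence.
func dedupAligned(exprs []string, src []int) (uniq []string, refPos, uniqSrc []int) {
	slotOf := map[string]int{}
	for i, e := range exprs {
		j, ok := slotOf[e]
		if !ok {
			j = len(uniq)
			slotOf[e] = j
			uniq = append(uniq, e)
			uniqSrc = append(uniqSrc, src[i])
		}
		refPos = append(refPos, j)
	}
	return uniq, refPos, uniqSrc
}

func main() {
	// SELECT a, b, a AS d, SUM(c) ... GROUP BY a, b, d
	fields := []selectField{{"a", ""}, {"b", ""}, {"a", "d"}, {"SUM(c)", ""}}
	exprs, src := resolveGbyItems(fields, []string{"a", "b", "d"})
	fmt.Println(exprs, src) // [a b a] [0 1 2]
	uniq, refPos, uniqSrc := dedupAligned(exprs, src)
	fmt.Println(uniq, refPos, uniqSrc) // [a b] [0 1 0] [0 1]
}
```

After deduplication the single `a` slot only remembers field 0, so the fact that position 2 came from alias `d` (field 2) cannot be recovered from the unique list alone. Carrying the parallel source indices, or keeping the original positions, is what lets later steps tell `a` and `d` apart.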

Sequence Diagram

sequenceDiagram
  participant Client
  participant Planner as PlanBuilder.resolveGbyExprs
  participant BuildExpand as PlanBuilder.buildExpand
  participant LogicalExpand
  participant Projection

  Client->>Planner: parse SELECT with GROUP BY (incl. aliases/positions)
  Planner->>BuildExpand: resolved gbyExprs + sourceFieldIndices
  BuildExpand->>LogicalExpand: deduped/ordered gbyExprs + GbyItemSourceFieldIndices
  Projection->>LogicalExpand: TrySubstituteExprWithGroupingSetColByFieldIndex(fieldIndex)
  LogicalExpand-->>Projection: substituted grouping-set column expr
  Projection-->>Client: final projected rows (ROLLUP nullability/layout applied)
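The `TrySubstituteExprWithGroupingSetColByFieldIndex` step in the diagram amounts to a lookup that prefers the SELECT-field index and falls back to expression equality. The sketch below is a hypothetical model of that idea, assuming the Expand keeps one output column per preserved GROUP BY position; `expandSketch` and `substituteByFieldIndex` are not TiDB APIs, and the real fallback order may differ.

```go
package main

import "fmt"

// expandSketch is a hypothetical stand-in for an Expand operator's bookkeeping,
// not TiDB's LogicalExpand. cols[i] is the grouping-set output column for GROUP BY
// position i; src[i] is the SELECT-field index that position came from (-1 if none).
type expandSketch struct {
	exprs []string
	cols  []string
	src   []int
}

// substituteByFieldIndex prefers the position bound to the same SELECT field,
// then falls back to the first position with an equal expression.
func (e *expandSketch) substituteByFieldIndex(expr string, fieldIdx int) (string, bool) {
	if fieldIdx >= 0 {
		for i, s := range e.src {
			if s == fieldIdx && e.exprs[i] == expr {
				return e.cols[i], true
			}
		}
	}
	for i, x := range e.exprs {
		if x == expr {
			return e.cols[i], true
		}
	}
	return "", false
}

func main() {
	// Positions preserved for GROUP BY a, b, d where d is an alias of a.
	e := &expandSketch{
		exprs: []string{"a", "b", "a"},
		cols:  []string{"gcol#0", "gcol#1", "gcol#2"},
		src:   []int{0, 1, 2},
	}
	fmt.Println(e.substituteByFieldIndex("a", 0))  // gcol#0 true
	fmt.Println(e.substituteByFieldIndex("a", 2))  // gcol#2 true
	fmt.Println(e.substituteByFieldIndex("a", -1)) // gcol#0 true (fallback)
}
```

Because `d` resolves to its own column, the projection can show it as NULL at a rollup level where `a` is still grouped.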

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • guo-shaoge
  • qw4990

Poem

🐰
I hopped through GROUP BY, maps in paw,
Traced every alias to its first raw law,
Restored lost places where rollups would stray,
No duplicate shadows now lead us astray — hooray! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title clearly and concisely summarizes the main change: preserving rollup alias grouping positions to fix duplicate row issues.
  • Description check (✅ Passed): The description follows the template with all required sections completed: issue number, problem summary, changes explanation, comprehensive test checklist, side effects assessment, and detailed release note.
  • Linked Issues check (✅ Passed): The PR code changes directly address issue #65965 by preserving GROUP BY item positions when aliases/ordinals repeat expressions, fixing duplicate rows in WITH ROLLUP queries.
  • Out of Scope Changes check (✅ Passed): All changes are directly scoped to fixing the WITH ROLLUP duplicate row issue: test additions, GROUP BY resolution enhancement, and Expand plan modifications for position tracking.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/planner/core/logical_plan_builder.go`:
- Around line 153-157: The current logic uses the single boolean
keepGbyItemPositions (from needPreserveRollupGbyItemPositions) to skip
deduplication for the entire ROLLUP list; change this to compute and use a
set/list of specific positions to preserve (e.g., preservePositions or
preserveIndexMask) so only alias/ordinal-backed duplicate positions are kept and
the rest of gbyItems can be deduplicated. Update the code paths that call
expression.DeduplicateGbyExpression and deriveDeduplicatedGbySourceFieldIndices
(including the similar blocks around the other mentioned regions) to: 1) compute
deduplicated expandGbyExprs/gbyExprsRefPos by skipping only the preserved
indices, 2) derive expandGbySourceFieldIndices for the deduplicated result using
deriveDeduplicatedGbySourceFieldIndices with the preserved-index mapping, and 3)
ensure buildExpand logic uses the preserved positions to reinsert aliases while
still allowing ordinary repeated items to be removed by the deduplication
routine.
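The reviewer's per-position suggestion can be sketched as a mask plus a deduplication pass that skips masked positions. This is a hypothetical illustration with invented names, assuming the same "different SELECT field" rule for deciding which repeats to preserve; strings stand in for canonical hashes.

```go
package main

import "fmt"

// preserveMask marks GROUP BY positions that repeat an earlier expression through a
// different SELECT field (alias or ordinal). Hypothetical helper illustrating the
// reviewer's per-position suggestion; it is not code from the PR.
func preserveMask(keys []string, src []int) []bool {
	mask := make([]bool, len(keys))
	firstSrc := map[string]int{}
	for i, k := range keys {
		prev, seen := firstSrc[k]
		if !seen {
			firstSrc[k] = src[i]
			continue
		}
		mask[i] = src[i] >= 0 && src[i] != prev
	}
	return mask
}

// dedupExceptPreserved removes ordinary repeats but gives each preserved position
// its own slot. refPos maps every original position to its slot.
func dedupExceptPreserved(keys []string, mask []bool) (slots []string, refPos []int) {
	slotOf := map[string]int{}
	for i, k := range keys {
		if mask[i] {
			refPos = append(refPos, len(slots))
			slots = append(slots, k)
			continue
		}
		j, ok := slotOf[k]
		if !ok {
			j = len(slots)
			slotOf[k] = j
			slots = append(slots, k)
		}
		refPos = append(refPos, j)
	}
	return slots, refPos
}

func main() {
	// SELECT a, b, a AS d ... GROUP BY a, a, d: the second a is an ordinary
	// repeat (same field 0); d is an alias-backed repeat (field 2).
	keys := []string{"a", "a", "a"}
	mask := preserveMask(keys, []int{0, 0, 2})
	slots, refPos := dedupExceptPreserved(keys, mask)
	fmt.Println(mask, slots, refPos) // [false false true] [a a] [0 0 1]
}
```

The ordinary repeat of `a` collapses into slot 0 while the alias-backed `d` keeps slot 1, rather than skipping deduplication for the whole list.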
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 725da1e3-21fd-4d92-bc6d-4d5c1b5098db

📥 Commits

Reviewing files that changed from the base of the PR and between 6a6eefe and d42ec75.

📒 Files selected for processing (3)
  • pkg/planner/core/issuetest/planner_issue_test.go
  • pkg/planner/core/logical_plan_builder.go
  • pkg/planner/core/operator/logicalop/logical_expand.go

@codecov
codecov Bot commented May 16, 2026

Codecov Report

❌ Patch coverage is 74.52830% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.4886%. Comparing base (6a6eefe) to head (aeab453).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68435        +/-   ##
================================================
- Coverage   77.2762%   76.4886%   -0.7877%     
================================================
  Files          2010       1992        -18     
  Lines        555477     557697      +2220     
================================================
- Hits         429252     426575      -2677     
- Misses       125305     131077      +5772     
+ Partials        920         45       -875     
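The Codecov figures are internally consistent, as a quick check shows. Assuming patch coverage is covered lines divided by changed lines, 27 missing lines at 74.52830% imply 106 patch lines, 79 of them covered; the report itself does not state those two numbers, and `patchLines` is a helper introduced here. Hits divided by Lines also reproduces the project percentages to two decimals.

```go
package main

import (
	"fmt"
	"math"
)

// patchLines infers total and covered patch lines from the number of missing
// lines and the patch coverage percentage (assumes coverage = covered / total).
func patchLines(missing int, pct float64) (total, covered int) {
	total = int(math.Round(float64(missing) / (1 - pct/100)))
	return total, total - missing
}

func main() {
	total, covered := patchLines(27, 74.52830)
	fmt.Printf("patch lines=%d covered=%d\n", total, covered) // patch lines=106 covered=79

	// Hits + Misses + Partials should add up to Lines for head and base.
	fmt.Println(426575+131077+45 == 557697, 429252+125305+920 == 555477) // true true
	fmt.Printf("head=%.2f%% base=%.2f%%\n", 426575.0/557697*100, 429252.0/555477*100) // head=76.49% base=77.28%
}
```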
Flag Coverage Δ
integration 41.4937% <74.5283%> (+1.6997%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9725% <ø> (-13.0354%) ⬇️

@hawkingrei
Member Author

/test check-dev

@tiprow

tiprow Bot commented May 17, 2026

@hawkingrei: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow
/test tidb_parser_test

Use /test all to run all jobs.

In response to this:

/test check-dev

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

AI-Correction Bugfix by AI release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

WITH ROLLUP generates duplicate group rows when grouping keys are repeated
