
[feat][cp] Add mode Spark aggregate function#334

Draft
guhaiyan0221 wants to merge 1 commit into bytedance:main from guhaiyan0221:fix_cp_mode_agg

@guhaiyan0221
Collaborator

What problem does this PR solve?

Issue Number: discussion #191

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Summary:
Doc: https://spark.apache.org/docs/latest/api/sql/#mode
Code: https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala#L42
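For reviewers unfamiliar with the function, here is a minimal C++ sketch of the semantics described in the Spark docs linked above: NULL inputs are ignored, a group with no non-NULL values yields NULL, and otherwise the most frequent value is returned. `ModeAccumulator` is a hypothetical name, not the class added by this PR or by the Velox PR, and it handles only BIGINT values modelled as `std::optional<int64_t>`. Breaking ties toward the smallest value is an assumption that matches the `deterministic = true` option described in recent Spark docs; the Spark 3.5 implementation may return any of the tied values.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical sketch of Spark `mode` semantics for a BIGINT column.
// std::nullopt stands for SQL NULL.
// Assumption: ties are broken by returning the smallest tied value.
class ModeAccumulator {
 public:
  // Adds one input row; NULL inputs are ignored.
  void add(std::optional<int64_t> value) {
    if (value.has_value()) {
      ++counts_[*value];
    }
  }

  // Merges another partial state by summing per-value counts.
  void merge(const ModeAccumulator& other) {
    for (const auto& [value, count] : other.counts_) {
      counts_[value] += count;
    }
  }

  // Returns the most frequent value, or NULL if no non-NULL input was seen.
  std::optional<int64_t> result() const {
    std::optional<int64_t> best;
    int64_t bestCount = 0;
    for (const auto& [value, count] : counts_) {
      // Every stored count is >= 1, so `best` is set before any tie check.
      if (count > bestCount || (count == bestCount && value < *best)) {
        best = value;
        bestCount = count;
      }
    }
    return best;
  }

 private:
  std::unordered_map<int64_t, int64_t> counts_;  // value -> occurrences
};
```

The state keeps full per-value counts rather than a running winner because counts from separate partial aggregations can be summed exactly in `merge`; a single "current most frequent" value could not be combined correctly.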

`ComplexTypeAccumulator` requires an `inputType` parameter, which does not match the signature of `SimpleAggregateAdapter`'s initializer. Therefore, for complex types, the mode function is not implemented with the simple aggregate API.
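The signature mismatch can be illustrated with a toy model. None of the names below (`Allocator`, `InputType`, `PrimitiveAccumulator`, `ComplexAccumulator`, `ToySimpleAdapter`, `ToyComplexAggregate`) are Velox APIs; they are hypothetical stand-ins chosen only to show why an adapter that builds every accumulator through one fixed constructor signature cannot supply the input type a complex-type accumulator needs, while a dedicated aggregate that receives the type at construction can forward it.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; NOT the real Velox types.
struct Allocator {};
struct InputType {
  std::string name;
};

// Accumulator for primitive inputs: an Allocator* is all it needs.
struct PrimitiveAccumulator {
  explicit PrimitiveAccumulator(Allocator* /*allocator*/) {}
};

// Accumulator for complex inputs (arrays, maps, rows): it must know the
// input type, so its constructor takes an extra argument.
struct ComplexAccumulator {
  ComplexAccumulator(Allocator* /*allocator*/, const InputType& type)
      : typeName(type.name) {}
  std::string typeName;
};

// Toy "simple" adapter: it only knows how to call Accumulator(Allocator*).
template <typename Accumulator>
struct ToySimpleAdapter {
  static constexpr bool kSupported =
      std::is_constructible_v<Accumulator, Allocator*>;

  static std::unique_ptr<Accumulator> create(Allocator* allocator) {
    static_assert(kSupported, "accumulator needs more than an Allocator*");
    return std::make_unique<Accumulator>(allocator);
  }
};

// A dedicated aggregate receives the input type up front and forwards it
// to every accumulator it creates.
struct ToyComplexAggregate {
  explicit ToyComplexAggregate(InputType type) : type_(std::move(type)) {}

  std::unique_ptr<ComplexAccumulator> create(Allocator* allocator) const {
    return std::make_unique<ComplexAccumulator>(allocator, type_);
  }

  InputType type_;
};
```

In this toy model, instantiating `ToySimpleAdapter<ComplexAccumulator>::create` fails at compile time through the `static_assert`, which is the analogue of the mismatch described above; routing complex types through a separate aggregate class avoids it.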

Corresponding PR: facebookincubator/velox#10462

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:

- Fixed a crash in `substr` when input is null.
- Optimized `group by` performance by 20%.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

@CLAassistant

CLAassistant commented Mar 3, 2026

CLA assistant check
All committers have signed the CLA.

guhaiyan0221 marked this pull request as draft March 3, 2026 13:43
guhaiyan0221 force-pushed the main branch 2 times, most recently from 906a3a9 to b88fc0e March 4, 2026 15:38

Labels

None yet

Projects

None yet


2 participants