Skip to content

[feat][cp] Adds shuffle compression support#346

Open
Weixin-Xu wants to merge 1 commit intobytedance:mainfrom
Weixin-Xu:shuffle_compression
Open

[feat][cp] Adds shuffle compression support#346
Weixin-Xu wants to merge 1 commit intobytedance:mainfrom
Weixin-Xu:shuffle_compression

Conversation

@Weixin-Xu
Copy link
Collaborator

cp from facebookincubator/velox#11914

Summary:
Adds per query shuffle compression support. Currently we configure compression kind in partition output buffer manager which enforces all the queries use the same compression kind and assume all the workers having the same compression kind which is not flexible not align with Presto java as well. This change removes the compression kind from partition output buffer manager and instead configure it through query config. Also the shuffle operators report the compression kind.

The followup is to integrate with Prestissimo work by setting LZ4 compression kind if the shuffle compression session property is set. Note Presto java doesn't allow to configure compression kind to use.

With Meta internal workloads, LZ4 compression kind can reduce e2e execution time by 20% with half of shuffle data volume reduction

What problem does this PR solve?

Issue Number: close #xxx

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Describe your changes in detail.
For complex logic, explain the "Why" and "How".

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:

Release Note:
- add shuffle compression option from query config

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

cp from facebookincubator/velox#11914

Summary:
Adds per query shuffle compression support. Currently we configure compression kind in partition output
buffer manager which enforces all the queries use the same compression kind and assume all the
workers having the same compression kind which is not flexible not align with Presto java as well. This change
removes the compression kind from partition output buffer manager and instead configure it through query config.
Also the shuffle operators report the compression kind.

The followup is to integrate with Prestissimo work by setting LZ4 compression kind if the shuffle compression
session property is set. Note Presto java doesn't allow to configure compression kind to use.

With Meta internal workloads, LZ4 compression kind can reduce e2e execution time by 20% with half
of shuffle data volume reduction
@frankobe frankobe requested a review from zhangxffff March 5, 2026 18:23
@frankobe
Copy link
Collaborator

frankobe commented Mar 5, 2026

@zhangxffff can you review this to see any conflict with the existing Spark shuffle compression?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants