fix: Dynamically adjust NestedLoopJoinProbe output batch size to prevent OOM #402

Open
zhangxffff wants to merge 1 commit into bytedance:main from zhangxffff:fix/nlj-dynamic-batch-size

Conversation

zhangxffff (Collaborator) commented Mar 17, 2026

What problem does this PR solve?

NestedLoopJoinProbe OOMs when build-side rows are wide (e.g., containing complex nested types like ARRAY, MAP, or large VARCHAR columns). The fixed outputBatchSize_ causes two OOM scenarios:

  1. Inside NLJ: copyBuildValues() calls copyRanges on nested types, which triggers buffer doubling that requests huge single allocations, exceeding the memory pool cap.
  2. Downstream: PartialProject flattens the dictionary-wrapped probe columns via ensureFlatten, allocating numOutputRows × avgRowSize bytes, which also exceeds memory limits for large batches.
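
To make the two allocation patterns above concrete, here is a minimal sketch of the arithmetic involved. Both helpers (`doubledCapacity`, `flattenBytes`) are hypothetical names introduced for illustration and are not taken from the codebase. The doubling loop assumes a plain power-of-two growth policy, which may differ from the actual buffer implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: capacity a doubling buffer ends up requesting in order
// to hold `neededBytes`, starting from `initialBytes`. Assumes simple 2x growth.
uint64_t doubledCapacity(uint64_t initialBytes, uint64_t neededBytes) {
  assert(initialBytes > 0);
  assert(neededBytes <= (uint64_t{1} << 62));  // keep the doubling from overflowing
  uint64_t capacity = initialBytes;
  while (capacity < neededBytes) {
    capacity *= 2;  // one byte past a boundary costs a full doubling
  }
  return capacity;
}

// Hypothetical helper: bytes needed to flatten a batch, following the
// numOutputRows × avgRowSize estimate from scenario 2.
uint64_t flattenBytes(uint64_t numOutputRows, uint64_t avgRowSize) {
  return numOutputRows * avgRowSize;
}
```

Under these assumptions, a batch of 1024 rows averaging 64 KiB each needs a 64 MiB flat allocation, and a buffer that is one byte short of its required size asks for twice its current capacity in a single request.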

Issue Number: close #401

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

The root cause is that outputBatchSize_ is set once in the constructor via outputBatchRows() (no arguments → returns preferredOutputBatchRows) and never adjusted based on actual row size.
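
The difference between the no-argument call and the size-aware call can be sketched as follows. This is a standalone illustration, not the project's `outputBatchRows` implementation: the `BatchLimits` struct, its field names, and the clamping rule are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical limits; names and semantics are illustrative only.
struct BatchLimits {
  uint32_t preferredRows;   // row-count target used when no row size is known
  uint32_t maxRows;         // upper bound on rows per batch
  uint64_t preferredBytes;  // byte budget per batch
};

// Without a row size, fall back to the fixed row target (the behaviour the PR
// identifies as the root cause). With a row size, derive the row count from
// the byte budget and clamp it to [1, maxRows].
uint32_t batchRows(const BatchLimits& limits,
                   std::optional<uint64_t> avgRowSize = std::nullopt) {
  assert(limits.maxRows >= 1);
  if (!avgRowSize.has_value()) {
    return limits.preferredRows;
  }
  if (*avgRowSize == 0) {
    return limits.maxRows;  // zero-sized rows: only the row cap applies
  }
  const uint64_t rows = limits.preferredBytes / *avgRowSize;
  return static_cast<uint32_t>(std::clamp<uint64_t>(rows, 1, limits.maxRows));
}
```

With a 10 MiB budget and 1 MiB rows, this sketch yields 10 rows per batch instead of the fixed preferred row count, which is the kind of reduction the PR relies on for wide rows.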

This PR adds dynamic batch size adjustment in:

  1. prepareOutput() — pre-allocation estimate: Before creating the output vectors and before any copyBuildValues call, estimate avgRowSize from probe + build data and call outputBatchRows(avgRowSize) to compute a byte-budget-aware batch size. This prevents the first batch from OOM.

  2. getOutput() — post-output refinement: After each output batch is produced, refine the batch size based on actual output->estimateFlatSize():
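
The two adjustment points above can be sketched as a pair of small functions. The function names, parameters, and formulas here are assumptions for illustration; the PR's actual code is not shown on this page and may compute these values differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Step 1 (assumed formula): each join output row carries one probe row and
// one build row, so its size is estimated as the sum of the two averages.
uint64_t estimateAvgRowSize(uint64_t probeBytes, uint64_t probeRows,
                            uint64_t buildBytes, uint64_t buildRows) {
  assert(probeRows > 0 && buildRows > 0);
  return probeBytes / probeRows + buildBytes / buildRows;
}

// Step 2 (assumed formula): after a batch is produced, recompute the row
// count from its measured flat size and clamp it to [1, maxRows].
uint32_t refineBatchRows(uint64_t flatBytes, uint32_t producedRows,
                         uint64_t byteBudget, uint32_t maxRows,
                         uint32_t currentRows) {
  assert(maxRows >= 1);
  if (producedRows == 0 || flatBytes == 0) {
    return currentRows;  // nothing was measured, keep the current size
  }
  const uint64_t avgRowSize = std::max<uint64_t>(1, flatBytes / producedRows);
  const uint64_t rows = byteBudget / avgRowSize;
  return static_cast<uint32_t>(std::clamp<uint64_t>(rows, 1, maxRows));
}
```

In this sketch the first function would feed the pre-allocation estimate in prepareOutput(), while the second would run after each batch in getOutput(), with `flatBytes` standing in for the value returned by output->estimateFlatSize().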

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:

- Fixed a crash in `substr` when input is null.
- optimized `group by` performance by 20%.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

@zhangxffff zhangxffff requested a review from fzhedu March 17, 2026 07:10

Development

Successfully merging this pull request may close these issues.

[Bug] NestedLoopJoinProbe OOM with wide build-side rows
