Unnest operator honor kPreferredOutputBatchRows strictly #10655

Open
jinchengchenghh opened this issue Aug 5, 2024 · 3 comments · May be fixed by #10711
Labels
enhancement New feature or request

Comments

@jinchengchenghh
Contributor

Description

This PR honors the kPreferredOutputBatchRows config:
#7051

Currently there is a constraint that the output for a single input row must go into a single batch.
But when an input row has a very large nested array+struct, the output batch is also very large.
So we need to respect kPreferredOutputBatchRows strictly.
There are several strategies:

  1. Split a row only if the output for that one row exceeds maxOutputBatchSize.
  2. Always split the last row so that the batch matches the output batch size exactly.

I would prefer the second way, since it leads to accurate batch sizes.
We could add a benchmark to test the performance if we always split the last row.
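
To make the difference between the two strategies concrete, here is a minimal standalone sketch of how the batch boundaries could be planned. It is not the Unnest operator's actual code: `RowRange`, `Batch`, `planSplitOversizedOnly`, `planAlwaysSplit` and `rowCount` are hypothetical names, `sizes[i]` is assumed to be the number of unnested rows produced by input row `i`, and `maxRows` stands in for the configured output batch size (assumed to be positive).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A contiguous slice of the unnested output of one input row (hypothetical type).
struct RowRange {
  int32_t inputRow; // index of the input row
  int64_t offset;   // first unnested element taken from that row
  int64_t count;    // number of unnested elements taken
};

using Batch = std::vector<RowRange>;

// Total number of output rows in a batch.
int64_t rowCount(const Batch& batch) {
  int64_t n = 0;
  for (const auto& range : batch) {
    n += range.count;
  }
  return n;
}

// Strategy 1: keep each input row whole unless its output alone exceeds maxRows.
std::vector<Batch> planSplitOversizedOnly(
    const std::vector<int64_t>& sizes, int64_t maxRows) {
  std::vector<Batch> batches;
  Batch current;
  int64_t rowsInCurrent = 0;
  auto flush = [&]() {
    if (!current.empty()) {
      batches.push_back(current);
      current.clear();
      rowsInCurrent = 0;
    }
  };
  for (int32_t row = 0; row < static_cast<int32_t>(sizes.size()); ++row) {
    const int64_t size = sizes[row];
    if (size > maxRows) {
      // Oversized row: emit it on its own, in chunks of at most maxRows.
      flush();
      for (int64_t offset = 0; offset < size; offset += maxRows) {
        batches.push_back(
            Batch{RowRange{row, offset, std::min(maxRows, size - offset)}});
      }
      continue;
    }
    if (rowsInCurrent + size > maxRows) {
      flush(); // the whole row does not fit, so start a new batch
    }
    if (size > 0) {
      current.push_back({row, 0, size});
      rowsInCurrent += size;
    }
  }
  flush();
  return batches;
}

// Strategy 2: always split the row that crosses the batch boundary, so every
// batch except possibly the last one has exactly maxRows rows.
std::vector<Batch> planAlwaysSplit(
    const std::vector<int64_t>& sizes, int64_t maxRows) {
  std::vector<Batch> batches;
  Batch current;
  int64_t rowsInCurrent = 0;
  for (int32_t row = 0; row < static_cast<int32_t>(sizes.size()); ++row) {
    int64_t offset = 0;
    while (offset < sizes[row]) {
      const int64_t take =
          std::min(sizes[row] - offset, maxRows - rowsInCurrent);
      current.push_back({row, offset, take});
      offset += take;
      rowsInCurrent += take;
      if (rowsInCurrent == maxRows) {
        batches.push_back(current);
        current.clear();
        rowsInCurrent = 0;
      }
    }
  }
  if (!current.empty()) {
    batches.push_back(current);
  }
  return batches;
}
```

For example, with `sizes = {3, 10, 2}` and `maxRows = 4`, the first plan yields batches of 3, 4, 4, 2 and 2 rows, while the second yields 4, 4, 4 and 3 rows. The second plan is the one that gives the accurate batch size, at the cost of slicing a row at almost every batch boundary, which is what the benchmark would need to measure.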

#7051 (comment)

@jinchengchenghh jinchengchenghh added the enhancement New feature or request label Aug 5, 2024
@jinchengchenghh
Contributor Author

CC @mbasmanova @FelixYBW

@mbasmanova
Contributor

CC: @bikramSingh91 @pedroerp

@mbasmanova
Contributor

I don't have a strong preference between 1 or 2.

It would be nice to figure out how to produce batches of a specified "size in bytes", rather than "number of rows". This doesn't have to happen right away, but it is something to keep in mind.
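
As a rough illustration of what a byte-based limit could look like, here is a minimal sketch that picks how many output rows go into the next batch under a byte budget. Everything in it is an assumption for illustration: `rowsForByteBudget` is a hypothetical helper, and `rowBytes[i]` is assumed to be an already-computed size estimate for output row `i`. How to obtain such estimates cheaply is the open question.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns how many consecutive output rows, starting at `start`, fit into a
// batch whose estimated size should not exceed `targetBytes`. At least one row
// is always taken when rows remain, so a single oversized row still makes
// progress instead of stalling.
int64_t rowsForByteBudget(
    const std::vector<int64_t>& rowBytes, size_t start, int64_t targetBytes) {
  int64_t bytes = 0;
  size_t end = start;
  while (end < rowBytes.size() &&
         (end == start || bytes + rowBytes[end] <= targetBytes)) {
    bytes += rowBytes[end];
    ++end;
  }
  return static_cast<int64_t>(end - start);
}
```

With `rowBytes = {100, 200, 5000, 50}` and a budget of 1000 bytes, the first batch takes 2 rows, the second takes the single 5000-byte row on its own, and the third takes the last row.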

2 participants