Description
Is your feature request related to a problem or challenge?
This is a follow-up for #14644:
See details in #14644 (comment) cc @alamb @Kontinuation
We need a complete solution about stable sort with spill, several issues until now:
- Ensuring we can always spill data (now spilling will sometimes fail if we run out of memory to sort the batches in)
- Ensuring that we can always merge the data that was spilled, even if it had a really wide fanout (like 1000 of spill files)
The problem 1 could be solved by potentially spilling unsorted batches (and then sorting them separately). This would be less efficient (read/write some tuples twice but would work.
Problem 2 could use the multi-pass merge:
We can implement a more complicated multi-stage merging phase which merges a small portion of streams each time, and perform multiple rounds of merges to produce the final merged stream.
Describe the solution you'd like
The problem 1 could be solved by potentially spilling unsorted batches (and then sorting them separately). This would be less efficient (read/write some tuples twice but would work.
Problem 2 could use the multi-pass merge:
We can implement a more complicated multi-stage merging phase which merges a small portion of streams each time, and perform multiple rounds of merges to produce the final merged stream.
Describe alternatives you've considered
No response
Additional context
No response