-
Notifications
You must be signed in to change notification settings - Fork 381
feat: Channel-less intermediate op #5999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| fn op_type(&self) -> NodeType; | ||
| fn multiline_display(&self) -> Vec<String>; | ||
| fn make_state(&self) -> DaftResult<Self::State>; | ||
| fn make_state(&self) -> Self::State; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a drive by, make_state should not need to be fallible
Greptile OverviewGreptile SummaryThis PR fundamentally refactors the IntermediateOp execution model to fix a critical bug where workers producing Key ChangesArchitecture Overhaul:
HasMoreOutput API Change:
State Management:
Code Sharing:
ImpactPerformance: The PR demonstrates a 50% reduction in peak memory usage for long pipelines of intermediate ops, as shown in the benchmark with chained UDFs. Correctness: Fixes the HasMoreOutput deadlock that could occur with operations like CrossJoin when maintain_order=True. Future Work: PR description notes that streaming sinks and blocking sinks still use the old dispatcher/channel architecture and will be refactored in future work to get similar benefits. Review FindingsThe implementation is sound and correct. The state machine logic properly handles:
All intermediate operators have been correctly updated to the new API. The change is internal to the crate and does not break external APIs. Confidence Score: 5/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant Child as Child Node
participant IntermediateOp as IntermediateNode
participant ProcessInput as process_input
participant TaskSet as OrderingAwareJoinSet
participant StatePool as State Pool
participant Buffer as RowBasedBuffer
participant Output as Output Channel
Child->>IntermediateOp: start()
IntermediateOp->>StatePool: Initialize states (max_concurrency)
IntermediateOp->>ProcessInput: spawn task
loop Until input_closed AND task_set empty
alt Task completion available
TaskSet-->>ProcessInput: join_next() returns result
ProcessInput->>ProcessInput: handle_task_completion()
alt NeedMoreInput(Some(mp))
ProcessInput->>Output: send(mp)
ProcessInput->>StatePool: return state
end
alt NeedMoreInput(None)
ProcessInput->>StatePool: return state
end
alt HasMoreOutput{input, output}
ProcessInput->>Output: send(output)
ProcessInput->>TaskSet: spawn_execution_task(same input, same state)
Note over ProcessInput,TaskSet: State NOT returned - will be reused
end
ProcessInput->>Buffer: spawn_ready_batches()
loop While states available AND batches ready
Buffer-->>ProcessInput: next_batch_if_ready()
StatePool-->>ProcessInput: get state
ProcessInput->>TaskSet: spawn_execution_task()
end
else Input available
Child-->>ProcessInput: recv() returns morsel
ProcessInput->>Buffer: push(morsel)
ProcessInput->>Buffer: spawn_ready_batches()
loop While states available AND batches ready
Buffer-->>ProcessInput: next_batch_if_ready()
StatePool-->>ProcessInput: get state
ProcessInput->>TaskSet: spawn_execution_task()
end
end
end
opt Remaining buffered data
Buffer-->>ProcessInput: pop_all()
StatePool-->>ProcessInput: get state
ProcessInput->>TaskSet: spawn_execution_task()
loop Until all tasks complete
TaskSet-->>ProcessInput: join_next()
ProcessInput->>ProcessInput: handle_task_completion()
opt HasMoreOutput
ProcessInput->>TaskSet: spawn again
Note over ProcessInput,TaskSet: Loop continues until NeedMoreInput
end
end
end
ProcessInput->>IntermediateOp: Done
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No files reviewed, no comments
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5999 +/- ##
==========================================
+ Coverage 72.69% 72.86% +0.16%
==========================================
Files 970 970
Lines 127333 126113 -1220
==========================================
- Hits 92570 91889 -681
+ Misses 34763 34224 -539
🚀 New features to boost your workflow:
|
Changes Made
There is currently a bug with
IntermediateOps where the worker that produces aHasMoreOutputis not being polled from again, in themaintain_order = Truecase.Recap
Quick recap, what is actually going in with intermediate ops, and why does this only affect HasMoreOutput.
We have intermediate op workers, that are responsible for actually executing the op on input morsels, a dispatcher that dispatches work to them, and a receiver that receives results. in the
maintain order = Truecase, we use a round robin dispatcher that dispatches work in round robin fashion, and anOrderingAwareReceiverthat receives results in a round robin fashion.Problem
If a worker produces a result with HasMoreOutput, the round robin receiver SHOULD PULL FROM THAT WORKER AGAIN, BUT IT DOES NOT and instead it moves on to the next worker.
This can create a deadlock where the worker that produces has more output wants to send again, but the round robin reciever is not awaiting it, and is instead awaiting from some other worker that could be, say, awaiting it's own input.
Solution
Just give the OrderingAwareReceiver info that it should pull from a certain receiver again. Bleh, this is now so messy.
So, lets just get rid of the channels altogether. No more channels.
We can remodel the intermediate op to a single concurrent state machine.
This allows us to get rid channels within the intermediate op altogether, and only have a single channel connecting intermdiate ops.
Results
This script simulates a long pipeline of connecting intermediate ops. Theoretically less channels means less intermediate data.
Before:

After:

We reduced peak memory by half.
Future
This
HasMoreOutputbug affects streaming sinks as well actually. So if this fix works we should implement it on streaming sink as well, and then might as well do so for blocking sinks and we can get rid of the intra-op channels altogether.Related Issues