-
Notifications
You must be signed in to change notification settings - Fork 6.9k
Open
Labels
dataRay Data-related issuesRay Data-related issuesgood-first-issueGreat starter issue for someone just starting to contribute to RayGreat starter issue for someone just starting to contribute to Ray
Description
The Problem
The completed() method in PhysicalOperator has a hidden side effect. When you call completed(), which sounds like it should just check if an operator is done, it actually modifies internal state:
def completed(self) -> bool:
# ... other code ...
if not self._execution_finished:
if (
self._inputs_complete
and internal_input_queue_num_blocks == 0
and self.num_active_tasks() == 0
):
self._execution_finished = True # 🐛 Side effect in a query method!
return (
self._execution_finished
and not self.has_next()
and internal_output_queue_num_blocks == 0
)This is problematic because it's not obvious from the method name that state is being changed.
The Fix
We need to refactor to separate the "checking if marked finished" from "computing if actually finished":
- Rename
_execution_finished→_is_execution_marked_finished- This field now only tracks whether someone explicitly called
mark_execution_finished()
- This field now only tracks whether someone explicitly called
- Rename
execution_finished()→has_execution_finished()- Make it a pure computed property that returns True if either:
- The operator was explicitly marked finished, OR
- The auto-completion conditions are met
- No side effects - just computes and returns
- Make it a pure computed property that returns True if either:
- Update
completed()to remove the side effect- Just call
has_execution_finished()instead of modifying state
- Just call
After the fix, both has_execution_finished() and completed() will be pure query methods - you can call them as many times as you want without changing anything.
Files to Modify
python/ray/data/_internal/execution/interfaces/physical_operator.py- A few test files that directly access
_execution_finished LimitOperatorwhich has one direct field access
Hints
- Search for
_execution_finishedto find all places that need updating - The logic you're moving should go into
has_execution_finished()without the assignment statement - Make sure to update
mark_execution_finished()to set the renamed field
Metadata
Metadata
Assignees
Labels
dataRay Data-related issuesRay Data-related issuesgood-first-issueGreat starter issue for someone just starting to contribute to RayGreat starter issue for someone just starting to contribute to Ray