Skip to content

[Data] Refactor PhysicalOperator.completed to avoid hidden side effect #58884

@bveeramani

Description

@bveeramani

The Problem

The completed() method in PhysicalOperator has a hidden side effect. When you call completed(), which sounds like it should just check if an operator is done, it actually modifies internal state:

  def completed(self) -> bool:
      # ... other code ...
      if not self._execution_finished:
          if (
              self._inputs_complete
              and internal_input_queue_num_blocks == 0
              and self.num_active_tasks() == 0
          ):
              self._execution_finished = True  # 🐛 Side effect in a query method!

      return (
          self._execution_finished
          and not self.has_next()
          and internal_output_queue_num_blocks == 0
      )

This is problematic because it's not obvious from the method name that state is being changed.

The Fix

We need to refactor to separate the "checking if marked finished" from "computing if actually finished":

  1. Rename _execution_finished_is_execution_marked_finished
    1. This field now only tracks whether someone explicitly called mark_execution_finished()
  2. Rename execution_finished()has_execution_finished()
    1. Make it a pure computed property that returns True if either:
      1. The operator was explicitly marked finished, OR
      2. The auto-completion conditions are met
    2. No side effects - just computes and returns
  3. Update completed() to remove the side effect
    1. Just call has_execution_finished() instead of modifying state

After the fix, both has_execution_finished() and completed() will be pure query methods - you can call them as many times as you want without changing anything.

Files to Modify

  • python/ray/data/_internal/execution/interfaces/physical_operator.py
  • A few test files that directly access _execution_finished
  • LimitOperator which has one direct field access

Hints

  • Search for _execution_finished to find all places that need updating
  • The logic you're moving should go into has_execution_finished() without the assignment statement
  • Make sure to update mark_execution_finished() to set the renamed field

Metadata

Metadata

Assignees

Labels

dataRay Data-related issuesgood-first-issueGreat starter issue for someone just starting to contribute to Ray

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions