
Commit 3ddaca3

fix: use graceful actor termination to avoid Ray task_manager assertion
ray.kill() races with task completion callbacks in Ray's C++ task_manager, triggering a fatal assertion (ray-project/ray#54260) that crashes the process. Switch to __ray_terminate__, which queues behind pending tasks and escalates to a force-kill after 30 s.

Fixes flaky CI failures in the integration test, where the zephyr html-to-md step succeeds but the process crashes during actor cleanup with:

task_manager.cc:983: Check failed: it != submissible_tasks_.end() Tried to complete task that was not pending
1 parent e4121ee commit 3ddaca3
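As a rough illustration of the ordering difference described in the commit message, here is a toy, single-threaded model in Python. It is not Ray's implementation. `ToyActor`, the `TERMINATE` sentinel, the one-time-unit-per-task clock, and `force_kill_after` (a stand-in for the 30 s escalation) are all hypothetical. It only shows the claim made above: an immediate kill abandons queued work, while a queued termination lets everything submitted before it finish, unless the deadline forces a kill first.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy model only, NOT Ray's implementation. It illustrates the ordering claim in
# the commit message: ray.kill() stops the actor immediately, while a
# __ray_terminate__ request waits behind already-submitted tasks.

TERMINATE = object()  # hypothetical sentinel standing in for __ray_terminate__


@dataclass
class ToyActor:
    mailbox: deque = field(default_factory=deque)
    completed: list = field(default_factory=list)
    alive: bool = True

    def submit(self, task_name: str) -> None:
        self.mailbox.append(task_name)

    def kill(self) -> None:
        # Like ray.kill(): the actor dies now; anything still queued never runs.
        self.alive = False

    def terminate_gracefully(self, force_kill_after: int) -> None:
        # Like __ray_terminate__: enqueue the exit request behind pending tasks.
        self.mailbox.append(TERMINATE)
        elapsed = 0  # simulated time: one unit per completed task
        while self.alive:
            msg = self.mailbox.popleft()
            if msg is TERMINATE:
                self.alive = False  # clean exit; every earlier task has finished
            elif elapsed >= force_kill_after:
                self.kill()  # deadline reached: escalate to a force-kill
            else:
                self.completed.append(msg)
                elapsed += 1


killed = ToyActor()
for name in ("a", "b", "c"):
    killed.submit(name)
killed.kill()

graceful = ToyActor()
for name in ("a", "b", "c"):
    graceful.submit(name)
graceful.terminate_gracefully(force_kill_after=30)

slow = ToyActor()
for i in range(5):
    slow.submit(f"t{i}")
slow.terminate_gracefully(force_kill_after=2)

print(killed.completed)    # []
print(graceful.completed)  # ['a', 'b', 'c']
print(slow.completed)      # ['t0', 't1']
```

The third case mirrors the escalation path: when draining takes too long, the remaining tasks are dropped just as they would be under an immediate kill.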

1 file changed

Lines changed: 10 additions & 3 deletions

lib/fray/src/fray/v2/ray_backend/backend.py

@@ -612,9 +612,16 @@ def is_done(self) -> bool:
         return False
 
     def shutdown(self) -> None:
-        """Kill all Ray actors."""
+        """Gracefully terminate all Ray actors.
+
+        Uses __ray_terminate__ instead of ray.kill() so that in-flight tasks
+        finish before the actor exits. ray.kill() races with task completion
+        callbacks in Ray's C++ task_manager, triggering a fatal assertion
+        (ray-project/ray#54260). __ray_terminate__ queues behind pending
+        tasks and escalates to a force-kill after 30 s.
+        """
         for handle in self._handles:
             try:
-                ray.kill(handle._actor_ref)
+                handle._actor_ref.__ray_terminate__.remote()
             except Exception as e:
-                logger.warning("Failed to kill Ray actor: %s", e)
+                logger.warning("Failed to terminate Ray actor: %s", e)
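The patched shutdown loop can be exercised without a Ray cluster by substituting stand-in handles. This is a minimal sketch, assuming hypothetical fake classes (`_FakeHandle`, `_FakeActorRef`, `_FakeRemoteMethod`) that mimic only the attribute path `handle._actor_ref.__ray_terminate__.remote()` used in the diff. `shutdown` here is a free-function copy of the method body, not the backend class's actual method. It checks one property of the loop: a failure on one handle is logged and does not stop termination requests to the others.

```python
import logging

logger = logging.getLogger(__name__)


class _FakeRemoteMethod:
    """Hypothetical stand-in for an actor method's `.remote` entry point."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.calls = 0

    def remote(self) -> None:
        self.calls += 1
        if self.fail:
            raise RuntimeError("actor already dead")


class _FakeActorRef:
    """Hypothetical stand-in for a Ray actor handle (dunder names are not mangled)."""

    def __init__(self, fail: bool = False) -> None:
        self.__ray_terminate__ = _FakeRemoteMethod(fail)


class _FakeHandle:
    def __init__(self, fail: bool = False) -> None:
        self._actor_ref = _FakeActorRef(fail)


def shutdown(handles) -> None:
    # Same loop as the patched method: request termination for each actor and
    # log (rather than raise) failures so one bad handle doesn't block the rest.
    for handle in handles:
        try:
            handle._actor_ref.__ray_terminate__.remote()
        except Exception as e:
            logger.warning("Failed to terminate Ray actor: %s", e)


handles = [_FakeHandle(), _FakeHandle(fail=True), _FakeHandle()]
shutdown(handles)  # logs one warning for the middle handle; does not raise
```

Note that `.remote()` only submits the termination request; the loop does not wait for the actors to exit.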
