Open
Labels: bug, community-backlog, core, performance, stability, train, triage
Description
What happened + What you expected to happen
Recently (with Ray 2.54.0), I have observed nodes in autoscaling worker groups that remain marked as Active and are never scaled down, even though no job is running. I would expect these idle nodes to be removed by the autoscaler.
This happened after running a Ray Train job. The affected nodes still had live PlacementGroupCleaner actors on them (sometimes several such actors on a single node). Since the placement group cleaner was added recently, I suspect it may be the culprit.
This is a significant issue for me because it wastes money on nodes that are doing no work.
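To check whether a cluster is in this state, a diagnostic sketch like the one below could list the leftover cleaner actors per node using Ray's state API (`ray.util.state.list_actors`). It assumes the actors report the class name `PlacementGroupCleaner` (the name seen on the affected nodes) and that the script runs somewhere that can connect to the cluster, such as the head pod. The helper `count_alive_cleaners_by_node` is hypothetical and only for illustration.

```python
from collections import Counter
from typing import Any, Iterable

# Assumption: the cleaner actors appear under this class name in the state API.
CLEANER_CLASS_NAME = "PlacementGroupCleaner"


def count_alive_cleaners_by_node(actors: Iterable[Any]) -> Counter:
    """Count ALIVE actors whose class_name matches CLEANER_CLASS_NAME, keyed by node_id."""
    counts: Counter = Counter()
    for actor in actors:
        if actor.class_name == CLEANER_CLASS_NAME and actor.state == "ALIVE":
            counts[actor.node_id] += 1
    return counts


def main() -> None:
    # Imported lazily so the counting helper can be used without Ray installed.
    import ray
    from ray.util.state import list_actors

    # Connect to the running cluster so the state API can resolve its address.
    ray.init(address="auto")
    actors = list_actors(
        filters=[("class_name", "=", CLEANER_CLASS_NAME)],
        limit=10_000,
    )
    for node_id, n in count_alive_cleaners_by_node(actors).most_common():
        print(f"node {node_id}: {n} alive {CLEANER_CLASS_NAME} actor(s)")


if __name__ == "__main__":
    main()
```

If nodes that should be idle show up in this output after all jobs have finished, that would be consistent with the behavior described above.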
Versions / Dependencies
Ray 2.54.0
KubeRay 1.5.0
Reproduction script
Not reliably reproducible. Try running some Ray Train jobs that scale up the cluster by using a placement group.
Issue Severity
Medium: It is a significant difficulty but I can work around it.
Status: Todo