[Core, Train] Cluster does not downscale despite no jobs running, possibly due to PlacementGroupCleaner #61689

### What happened + What you expected to happen

Recently (with Ray 2.54.0) I have observed nodes from autoscaling worker groups that are still considered Active and are not deleted, even though no job is running.

This happened after running a Ray Train job. I noticed the problematic nodes still had active PlacementGroupCleaner actors on them (sometimes multiple such actors on a single node). I know the placement group cleaner was added recently, so I wonder if that's the main culprit.
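To check which nodes are still holding such actors, something like the following could be used. This is a minimal sketch, not a confirmed diagnosis: it assumes the actor's class name as reported by the Ray State API contains the string `PlacementGroupCleaner` (the name is taken from the observation above, not verified against the Ray source), and `group_cleaners_by_node` and `fetch_actors` are hypothetical helper names.

```python
from collections import defaultdict

# Assumption: the class name reported by the State API contains this string.
CLEANER_CLASS_HINT = "PlacementGroupCleaner"


def _field(actor, name):
    """Read a field from either a plain dict or a State API record object."""
    if isinstance(actor, dict):
        return actor.get(name)
    return getattr(actor, name, None)


def group_cleaners_by_node(actors, hint=CLEANER_CLASS_HINT):
    """Map node_id -> list of actor_ids for ALIVE actors whose class name contains `hint`."""
    by_node = defaultdict(list)
    for actor in actors:
        class_name = _field(actor, "class_name") or ""
        if hint in class_name and _field(actor, "state") == "ALIVE":
            by_node[_field(actor, "node_id")].append(_field(actor, "actor_id"))
    return dict(by_node)


def fetch_actors(limit=1000):
    """Query a running cluster; needs Ray installed and a reachable cluster."""
    # Imported lazily so the grouping helper above can be used without Ray.
    from ray.util.state import list_actors

    return list_actors(filters=[("state", "=", "ALIVE")], limit=limit)
```

Running `group_cleaners_by_node(fetch_actors())` from a driver connected to the cluster should list, per node, the cleaner actors that are still alive. A node with more than one entry would match the "multiple such actors on a single node" observation.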

This is a significant issue for me, because it wastes money on nodes that are doing no work.

### Versions / Dependencies

Ray 2.54.0
KubeRay 1.5.0

### Reproduction script

Not reliably reproducible. Try running several Ray Train jobs that use a placement group and force the cluster to scale up.
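A rough sketch of the kind of job meant here, under stated assumptions: the worker count and CPU sizes are hypothetical and should be set higher than the cluster's idle capacity so the autoscaler has to add nodes, and the training loop is a placeholder that only sleeps. Since the problem is not reliably reproducible, running this may or may not trigger it.

```python
def scaling_kwargs(num_workers=4, cpus_per_worker=2):
    """Hypothetical sizing; choose values that exceed idle capacity to force a scale-up."""
    return {
        "num_workers": num_workers,
        "use_gpu": False,
        "resources_per_worker": {"CPU": cpus_per_worker},
    }


def train_loop_per_worker():
    # Placeholder workload: just keeps each worker alive for a short while.
    import time

    time.sleep(30)


def run_once(**overrides):
    """Run one short Ray Train job; needs Ray Train and PyTorch installed."""
    # Imported lazily so the sizing helper can be inspected without Ray.
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker,
        scaling_config=ScalingConfig(**scaling_kwargs(**overrides)),
    )
    return trainer.fit()
```

Calling `run_once()` a few times in a row, then waiting past the autoscaler's idle timeout with nothing running, is one way to look for worker nodes that stay Active.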

### Issue Severity

Medium: It is a significant difficulty but I can work around it.
