
[Core] Autoscaler for virtual clusters #506

Open
@Chong-Li

Description

Currently, when the autoscaler (v2) is periodically triggered, it goes through the following steps:

  1. Reconciler._sync_from: get the latest Ray cluster resource data (including pending resource demands) from GCS.
  2. Reconciler._step_next: make autoscaling decisions. It calls ResourceDemandScheduler.schedule, which includes:
    a. ResourceDemandScheduler._enforce_min/max_workers: determine the nodes to add or terminate to enforce the cluster's min/max node limits.
    b. ResourceDemandScheduler._sched_(gang)_resource_requests: determine the nodes to add to fulfill the pending PGs, actors, and tasks (resource demands).
    c. ResourceDemandScheduler._enforce_idle_termination: determine which nodes to terminate because they have been idle for too long.
  3. KubeRayProvider aggregates the autoscaling decisions and sends a patch to k8s.
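As a rough illustration, the three scheduling sub-steps (2a-2c) can be sketched as below. ClusterState, the node dicts, and the heuristics here are simplified stand-ins, not the actual Ray internals:

```python
from dataclasses import dataclass

@dataclass
class ClusterState:
    nodes: list                 # currently running worker nodes
    pending_demands: list       # resource demands not yet satisfied
    min_workers: int = 0
    max_workers: int = 10

def schedule(state: ClusterState):
    """Stand-in for ResourceDemandScheduler.schedule: returns (to_add, to_terminate)."""
    to_add, to_terminate = [], []
    # 2a. enforce the min worker limit
    if len(state.nodes) < state.min_workers:
        to_add += ["worker"] * (state.min_workers - len(state.nodes))
    # 2b. add one node per pending (gang) resource demand
    to_add += ["worker"] * len(state.pending_demands)
    # 2c. terminate nodes flagged as idle for too long
    to_terminate += [n for n in state.nodes if n.get("idle")]
    # 2a again: cap additions so the max worker limit is respected
    headroom = state.max_workers - len(state.nodes)
    to_add = to_add[:max(headroom, 0)]
    return to_add, to_terminate
```

The real scheduler bin-packs demands against node shapes; the point here is only the ordering of 2a/2b/2c within one schedule() call.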

To make the autoscaler support virtual clusters, we need the following changes:

  • Reconciler._sync_from() has to additionally fetch the latest metadata (including pending demands) of virtual clusters from GCS.

  • Do 2a, 2b, and 2c for each virtual cluster. To make this happen, we additionally have to support:
    a. User APIs to configure each virtual cluster's min/max node limits.
    b. ResourceDemandScheduler._sched_(gang)_resource_requests simulating the scheduling of pending demands within the corresponding virtual cluster.
    c. Shrinking specific Ray nodes out of a virtual cluster, which involves node draining.
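A hedged sketch of running 2a-2c per virtual cluster: demands are bucketed by the virtual cluster they belong to and scheduled only against that cluster's own nodes and limits. All names and the decision format are illustrative assumptions, not the actual scheduler interface:

```python
def schedule_per_virtual_cluster(demands, limits, nodes_by_vc):
    """Run the 2a-2c decisions independently for each virtual cluster."""
    decisions = {}
    for vc_id, vc_limits in limits.items():
        vc_nodes = nodes_by_vc.get(vc_id, [])
        # only demands belonging to this virtual cluster are considered (2b)
        vc_demands = [d for d in demands if d["virtual_cluster_id"] == vc_id]
        # target size: one node per unsatisfied demand, clamped to min/max (2a)
        want = len(vc_nodes) + len(vc_demands)
        target = min(max(want, vc_limits["min_workers"]), vc_limits["max_workers"])
        # idle nodes are shrink candidates only while staying above min_workers (2c)
        idle = [n for n in vc_nodes if n.get("idle")]
        decisions[vc_id] = {
            "add": max(target - len(vc_nodes), 0),
            "terminate": idle[: max(len(vc_nodes) - vc_limits["min_workers"], 0)],
        }
    return decisions
```

The one-node-per-demand heuristic is deliberately naive; the real simulation would bin-pack demands against the virtual cluster's node shapes.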

  • KubeRayProvider aggregates the autoscaling decisions made for each virtual cluster. Before sending the patch to k8s, it should look for rebalancing opportunities (nodes shrunk by one virtual cluster can be replenished to another), which reduces the cost of pod creation and termination.
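The rebalancing pass could be sketched as follows: nodes freed by one virtual cluster's shrink decision are matched against another's pending additions, so those pods are moved rather than deleted and recreated. The per-virtual-cluster decision format (add count plus terminate list) is an assumption, not the actual provider interface:

```python
def rebalance(decisions):
    """Turn (terminate in vc A, add in vc B) pairs into moves, mutating decisions."""
    # collect nodes freed by shrinking virtual clusters
    freed = [(vc, n) for vc, d in decisions.items() for n in list(d["terminate"])]
    moves = []
    for vc_id, d in decisions.items():
        for src_vc, node in list(freed):
            if d["add"] == 0:
                break
            if src_vc == vc_id:
                continue  # a cluster does not reclaim its own shrunk nodes
            moves.append({"node": node, "from": src_vc, "to": vc_id})
            decisions[src_vc]["terminate"].remove(node)  # pod is kept, not deleted
            freed.remove((src_vc, node))
            d["add"] -= 1
    return moves
```

After this pass, whatever remains in the add counts and terminate lists becomes actual pod creations and deletions in the k8s patch.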

Besides the changes to the autoscaler, we have to adapt GcsAutoscalerStateManager so that it provides the virtual cluster metadata to the autoscaler. But if that requires too many changes to the existing RPC protocol, we should do it in GcsVirtualClusterManager instead.
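For concreteness, here is an assumed shape for the virtual cluster metadata that Reconciler._sync_from would consume, whichever GCS manager ends up serving it. Every field name below is a guess for illustration, not the real proto or RPC reply:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VirtualClusterState:
    virtual_cluster_id: str
    node_ids: List[str]                                  # Ray nodes owned by this vc
    pending_resource_requests: List[Dict[str, float]]    # e.g. [{"CPU": 1.0}]
    pending_gang_resource_requests: List[List[Dict[str, float]]]  # PG bundles
    min_workers: int = 0
    max_workers: int = 2**31 - 1

def sync_from_gcs(raw: dict) -> Dict[str, VirtualClusterState]:
    """Parse a GCS reply (shape assumed) into per-virtual-cluster state keyed by id."""
    return {
        vc["id"]: VirtualClusterState(
            virtual_cluster_id=vc["id"],
            node_ids=vc.get("node_ids", []),
            pending_resource_requests=vc.get("pending", []),
            pending_gang_resource_requests=vc.get("pending_gang", []),
            min_workers=vc.get("min_workers", 0),
            max_workers=vc.get("max_workers", 2**31 - 1),
        )
        for vc in raw.get("virtual_clusters", [])
    }
```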

Use case

No response

Metadata


Labels

enhancement (New feature or request)
