Description
Use case: An administrator creates a composability request CR1: {size: 10, type: gpu, model: A100}, but only 5 composable GPUs can be connected.
There are two possible approaches:
- approach 1: the operator waits for all GPUs to be available before connecting any of them
- approach 2: the operator connects the subset of the requested GPUs that is available (5 GPUs)
Approach 1:
The operator keeps CR1 in a pending state and connects nothing until the full number of requested resources (10 GPUs) can be allocated.
Pros: easy lifecycle management of the ComposabilityRequest
Cons: if a targetNode is specified, a new ComposableResource might come in for a different node and take the 5 GPUs while CR1 is still waiting, making the mechanism potentially unfair.
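The fairness concern with approach 1 can be illustrated with a small simulation. This is only a sketch, not operator code: the `Request` class, the `reconcile_all_or_nothing` function, and the "Pending"/"Ready" state names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a ComposabilityRequest."""
    name: str
    size: int              # number of GPUs requested
    attached: int = 0      # number of GPUs currently connected
    state: str = "Pending"  # assumed state names: "Pending" / "Ready"


def reconcile_all_or_nothing(request: Request, free_gpus: int) -> int:
    """Approach 1: connect GPUs only when the whole request fits.

    Returns the number of free GPUs left after this reconcile pass.
    """
    if request.state == "Pending" and free_gpus >= request.size:
        request.attached = request.size
        request.state = "Ready"
        return free_gpus - request.size
    # Not enough GPUs: nothing is reserved, so the free GPUs stay up for grabs.
    return free_gpus


# CR1 asks for 10 GPUs while only 5 are free, so it stays pending.
cr1 = Request("CR1", size=10)
free = reconcile_all_or_nothing(cr1, free_gpus=5)

# A later, smaller request fits and takes the 5 GPUs CR1 was waiting for.
cr2 = Request("CR2", size=5)
free = reconcile_all_or_nothing(cr2, free)
```

After both passes CR1 is still pending with no GPUs attached, CR2 is ready, and no free GPUs remain, even though CR1 was created first.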
Approach 2:
The operator connects the 5 available GPUs right away, and the whole lifecycle of each composable resource is managed separately. The ComposableResource will stay in a pending state until the remaining 5 GPUs become available.
Pros: resources (GPUs in the example) are assigned to the ComposabilityRequest (CR1) as soon as it is created, removing the risk of unfair allocation.
Cons: the operator will need to keep CR1 in a pending state while managing the lifecycle of the composable resources already connected.
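Approach 2 can be sketched in the same spirit. Again this is a rough illustration under assumptions: `PartialRequest`, `reconcile_partial`, and the "Pending"/"Ready" state names are hypothetical and do not come from the operator's API.

```python
from dataclasses import dataclass


@dataclass
class PartialRequest:
    """Hypothetical stand-in for a ComposabilityRequest under approach 2."""
    name: str
    size: int          # number of GPUs requested
    attached: int = 0  # number of GPUs currently connected

    @property
    def state(self) -> str:
        # The request stays pending until every requested GPU is connected.
        return "Ready" if self.attached == self.size else "Pending"


def reconcile_partial(request: PartialRequest, free_gpus: int) -> int:
    """Approach 2: connect as many of the missing GPUs as are free now.

    Returns the number of free GPUs left after this reconcile pass.
    """
    missing = request.size - request.attached
    take = min(missing, free_gpus)
    request.attached += take
    return free_gpus - take


# CR1 asks for 10 GPUs while only 5 are free: 5 are connected immediately.
cr1 = PartialRequest("CR1", size=10)
free = reconcile_partial(cr1, free_gpus=5)   # cr1.attached == 5, still pending

# When 5 more GPUs show up, a later pass completes the request.
free = reconcile_partial(cr1, free_gpus=5)   # cr1.attached == 10, ready
```

The trade-off is visible in the sketch: the request holds on to its 5 GPUs while pending, so a later request cannot take them, but the operator has to track a partially satisfied request across reconcile passes.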
With approach 2, based on the extension proposal for the composable resource operator by @hase1128 and @fj-zhang-lei, we are already considering separate lifecycle management for the resources (ComposableResource), while the ComposabilityRequest has a NodeAllocating state in which nodes and resources are paired.