-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Description
For starters let's see the current implementation:
volcano/pkg/scheduler/plugins/capacity/capacity.go
Lines 105 to 133 in 0c6f3bf
| ssn.AddReclaimableFn(cp.Name(), func(reclaimer *api.TaskInfo, reclaimees []*api.TaskInfo) ([]*api.TaskInfo, int) { | |
| var victims []*api.TaskInfo | |
| allocations := map[api.QueueID]*api.Resource{} | |
| if !readyToSchedule { | |
| klog.V(3).Infof("Capacity plugin failed to check queue's hierarchical structure!") | |
| return victims, util.Reject | |
| } | |
| for _, reclaimee := range reclaimees { | |
| job := ssn.Jobs[reclaimee.Job] | |
| attr := cp.queueOpts[job.Queue] | |
| if _, found := allocations[job.Queue]; !found { | |
| allocations[job.Queue] = attr.allocated.Clone() | |
| } | |
| allocated := allocations[job.Queue] | |
| exceptReclaimee := allocated.Clone().Sub(reclaimee.Resreq) | |
| // When scalar resource not specified in deserved such as "pods", we should skip it and consider it as infinity, | |
| // so the following first condition will be true and the current queue will not be reclaimed. | |
| if allocated.LessEqual(attr.deserved, api.Infinity) || !attr.guarantee.LessEqual(exceptReclaimee, api.Zero) { | |
| continue | |
| } | |
| allocated.Sub(reclaimee.Resreq) | |
| victims = append(victims, reclaimee) | |
| } | |
| klog.V(4).Infof("Victims from capacity plugin, victims=%+v reclaimer=%s", victims, reclaimer) | |
| return victims, util.Permit | |
| }) |
Bug 1: Reclaimer Resource Dimension Ignored
The reclaimableFn currently ignores the actual resource dimensions requested by the reclaimer.
It expects it as a parameter but does not use it in the function at all:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L105
The linters are not responding to this bug because in the klog output we use the reclaimer variable:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L131
| klog.V(4).Infof("Victims from capacity plugin, victims=%+v reclaimer=%s", victims, reclaimer) |
Translating this means that the function is overselective regarding the reclaimees. Even when the reclaimer has no intersecting resource dimension with the reclaimee we will be reclaiming it.
Yes, we are evicting pods in the reclaiming process, even though it doesn't help to free up actual resources for the reclaimer.
Enhancement 1: Improve Victim Selection
The latter bug implies an enhancement idea regarding the reclaimee-deserved check.
If a reclaimee does not intersect with the queue's deserved resources but does intersect with the reclaimer, it should be considered a victim without further checks.
Enhancement 2: Logging and Comments
The victim selection process lacks sufficient logging and comments, making it difficult to understand and debug. (I went errands on this.)
This is mainly the reason why the bugs are hard to track here.
We just have a log line about the selected victims at the end of the function https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L131 and that's all.
To understand what's the purpose of allocated/allocations in this function is mind-throttling:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L120
Bug 2: Incorrect Use of LessEqual
As @zhifei92 correctly points out in the PR: #4893.
LessEqual is used wrongfully in:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L125
If the allocated part in this comparision doesn't have a resource dimension but deserved has it, we will incorrectly assume that every task in the queue is a reclaimee.
Bug 3/Enhancement 3: Scalar-Only Queues are Not Supported yet from reclaiming perspective
There were already requests to be able to define Queues without cpu and memory in volcano in deserved. This is not possible yet from the capacity plugin's reclaiming perspective.
Resource API problem description
The current resource API in volcano does not allow us to define a resource actually without cpu or memory:
volcano/pkg/scheduler/api/resource_info.go
Lines 58 to 69 in 0c6f3bf
| type Resource struct { | |
| MilliCPU float64 | |
| Memory float64 | |
| // ScalarResources | |
| ScalarResources map[v1.ResourceName]float64 | |
| // MaxTaskNum is only used by predicates; it should NOT | |
| // be accounted in other operators, e.g. Add. | |
| MaxTaskNum int | |
| } | |
The scalar resources are handled as map so we can check if a scalar resource dimension with name exists in r resource with:
_, ok := r.ScalarResources[name]
On the other hand r.MilliCPU will return 0 if you define an r := api.EmptyResource() since it's go's default value for float64.
This is the reason why if we create a queue (just create a queue), it will not report scalar resources as allocated but cpu and memory as 0:
kubectl get queues default -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
name: default
spec:
guarantee: {}
parent: root
reclaimable: true
weight: 1
status:
allocated:
cpu: "0"
memory: "0"
reservation: {}
state: OpenThis is probably historical, since we had cpu and memory dimensions first and appended scalar resources later on.
Why does this matter?
We cannot define the deserved part of a queue without CPU and Memory in volcano because every pod in the queue will be considered as a reclaimee as in Bug2.
This comes from the problem that the allocated.LessEqual(attr.deserved, api.Infinity) comparision won't actually compare these dimensions to be Infinity when they are not defined. We want in this comparision the cpu and memory dimensions to be disregarded like the scalars to support such queues.
This is currently not possible in our resource representation, see:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/api/resource_info.go#L409
volcano/pkg/scheduler/api/resource_info.go
Line 409 in 0c6f3bf
| if !lessEqualFunc(r.MilliCPU, rr.MilliCPU, minResource) { |
And I came to the conclusion that it shouldn't be, we can create a new resources API or live with this and come up with a hacky workaround for this special case.
Documentation Updates
- This feature requires design or user documentation changes.
- If documentation changes are required, I will ensure the relevant documents are updated and published to the Volcano official website (https://volcano.sh) via the volcano-sh/website repository.
PRs to solve the issue:
- As an initial step we shall have the proper resource comparision logic implemented:
- Add unit tests for these issues.
- Second commit of: fix/feat(capacity): AddReclaimableFn refactorย #4919
- Based on the defined resource comparision logic we can fix the bugs.
- First commit of: fix/feat(capacity): AddReclaimableFn refactorย #4919
- Update the documentation about the new capability of the plugin.