Skip to content

Various bugs/enhancement requests in the reclaimableFn in the capacity pluginย #4918

@hajnalmt

Description

@hajnalmt

Description

For starters let's see the current implementation:

ssn.AddReclaimableFn(cp.Name(), func(reclaimer *api.TaskInfo, reclaimees []*api.TaskInfo) ([]*api.TaskInfo, int) {
var victims []*api.TaskInfo
allocations := map[api.QueueID]*api.Resource{}
if !readyToSchedule {
klog.V(3).Infof("Capacity plugin failed to check queue's hierarchical structure!")
return victims, util.Reject
}
for _, reclaimee := range reclaimees {
job := ssn.Jobs[reclaimee.Job]
attr := cp.queueOpts[job.Queue]
if _, found := allocations[job.Queue]; !found {
allocations[job.Queue] = attr.allocated.Clone()
}
allocated := allocations[job.Queue]
exceptReclaimee := allocated.Clone().Sub(reclaimee.Resreq)
// When scalar resource not specified in deserved such as "pods", we should skip it and consider it as infinity,
// so the following first condition will be true and the current queue will not be reclaimed.
if allocated.LessEqual(attr.deserved, api.Infinity) || !attr.guarantee.LessEqual(exceptReclaimee, api.Zero) {
continue
}
allocated.Sub(reclaimee.Resreq)
victims = append(victims, reclaimee)
}
klog.V(4).Infof("Victims from capacity plugin, victims=%+v reclaimer=%s", victims, reclaimer)
return victims, util.Permit
})

Bug 1: Reclaimer Resource Dimension Ignored

The reclaimableFn currently ignores the actual resource dimensions requested by the reclaimer.

It expects it as a parameter but does not use it in the function at all:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L105

The linters are not responding to this bug because in the klog output we use the reclaimer variable:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L131

klog.V(4).Infof("Victims from capacity plugin, victims=%+v reclaimer=%s", victims, reclaimer)

Translating this means that the function is overselective regarding the reclaimees. Even when the reclaimer has no intersecting resource dimension with the reclaimee we will be reclaiming it.
Yes, we are evicting pods in the reclaiming process, even though it doesn't help to free up actual resources for the reclaimer.

Enhancement 1: Improve Victim Selection

The latter bug implies an enhancement idea regarding the reclaimee-deserved check.

If a reclaimee does not intersect with the queue's deserved resources but does intersect with the reclaimer, it should be considered a victim without further checks.

Enhancement 2: Logging and Comments

The victim selection process lacks sufficient logging and comments, making it difficult to understand and debug. (I went errands on this.)
This is mainly the reason why the bugs are hard to track here.

We just have a log line about the selected victims at the end of the function https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L131 and that's all.

To understand what's the purpose of allocated/allocations in this function is mind-throttling:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L120

Bug 2: Incorrect Use of LessEqual

As @zhifei92 correctly points out in the PR: #4893.

LessEqual is used wrongfully in:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/plugins/capacity/capacity.go#L125

If the allocated part in this comparision doesn't have a resource dimension but deserved has it, we will incorrectly assume that every task in the queue is a reclaimee.

Bug 3/Enhancement 3: Scalar-Only Queues are Not Supported yet from reclaiming perspective

There were already requests to be able to define Queues without cpu and memory in volcano in deserved. This is not possible yet from the capacity plugin's reclaiming perspective.

Resource API problem description

The current resource API in volcano does not allow us to define a resource actually without cpu or memory:

type Resource struct {
MilliCPU float64
Memory float64
// ScalarResources
ScalarResources map[v1.ResourceName]float64
// MaxTaskNum is only used by predicates; it should NOT
// be accounted in other operators, e.g. Add.
MaxTaskNum int
}

The scalar resources are handled as map so we can check if a scalar resource dimension with name exists in r resource with:

_, ok := r.ScalarResources[name]

On the other hand r.MilliCPU will return 0 if you define an r := api.EmptyResource() since it's go's default value for float64.
This is the reason why if we create a queue (just create a queue), it will not report scalar resources as allocated but cpu and memory as 0:

kubectl get queues default -o yaml

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  guarantee: {}
  parent: root
  reclaimable: true
  weight: 1
status:
  allocated:
    cpu: "0"
    memory: "0"
  reservation: {}
  state: Open

This is probably historical, since we had cpu and memory dimensions first and appended scalar resources later on.

Why does this matter?

We cannot define the deserved part of a queue without CPU and Memory in volcano because every pod in the queue will be considered as a reclaimee as in Bug2.

This comes from the problem that the allocated.LessEqual(attr.deserved, api.Infinity) comparision won't actually compare these dimensions to be Infinity when they are not defined. We want in this comparision the cpu and memory dimensions to be disregarded like the scalars to support such queues.

This is currently not possible in our resource representation, see:
https://github.com/volcano-sh/volcano/blob/v1.13.1/pkg/scheduler/api/resource_info.go#L409

if !lessEqualFunc(r.MilliCPU, rr.MilliCPU, minResource) {

And I came to the conclusion that it shouldn't be, we can create a new resources API or live with this and come up with a hacky workaround for this special case.

Documentation Updates

  • This feature requires design or user documentation changes.
  • If documentation changes are required, I will ensure the relevant documents are updated and published to the Volcano official website (https://volcano.sh) via the volcano-sh/website repository.
PRs to solve the issue:

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions