Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-1040: Weighted borrowing #4733

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 63 additions & 12 deletions keps/sig-api-machinery/1040-priority-and-fairness/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -787,11 +787,7 @@ type ExemptPriorityLevelConfiguration struct {

The fields of `ExemptPriorityLevelConfiguration` limit the borrowing
from exempt priority levels. This type and its use are added in all
the versions (`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`). In
the next version, the common fields of
`LimitedPriorityLevelConfiguration` and
`ExemptPriorityLevelConfiguration` will move to their common ancestor
`PriorityLevelConfigurationSpec`.
Comment on lines -791 to -794
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cutting this out because this plan did not come to fruition.

the versions (`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`).

The limits on borrowing are two-sided: a given priority level has a
limit on how much it may borrow and a limit on how much may be
Expand Down Expand Up @@ -829,6 +825,40 @@ configuration.
| ---- | -------------- | -------- |
| exempt | 0 | 50% |

From release 1.32 onward, borrowing is weighted and the weight is
given in the configuration objects. Prior to release 1.32, borrowing
is unweighted. Put another way, prior to release 1.32 every priority
level has the same weight and it is not a field in the configuration
objects but rather fixed in the implementation.

In release 1.32 the following field is added to
`LimitedPriorityLevelConfiguration`.

```go
type LimitedPriorityLevelConfiguration struct {
...
// `weight` gives this priority level more or less priority in the borrowing
// of concurrency.
// The default configuration objects use values in the range 10--100.
// The default value is 30.
// +optional
Weight *int32
}
```

Following are the weights in the default non-exempt
PriorityLevelConfiguration objects.

| Name | Weight |
| ---- | -----: |
| leader-election | 100 |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel LE should be higher than node-high & system as it can cause severe disruptions than the other two but not sure what practical difference it'd make with different weight values.

| node-high | 100 |
| system | 100 |
| workload-high | 60 |
| workload-low | 30 |
| global-default | 10 |
| catch-all | 10 |

Every priority level `i` has two concurrency limits: its
NominalConcurrencyLimit (`NominalCL(i)`) as defined above by
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) ---
Expand Down Expand Up @@ -953,11 +983,14 @@ When `MinCLSum < RemainingServerCL < MinCurrentCLSum`, the concurrency
available to non-exempt levels is above their rock-bottom limit but
not enough to give each its `MinCurrentCL`. In this case and when
`RemainingServerCL = MinCurrentCLSum` we share the excess above
`MinCL` proportionally, using the following formula.
`MinCL` proportionally to each level's deficit and weight, using the
following formulae.

```
CurrentCL(i) = MinCL(i) + ( MinCurrentCL(i) - MinCL(i) ) *
(RemainingServerCL-MinCLSum) / (MinCurrentCLSum-MinCLSum)
CurrentCL(i) = MinCL(i) + ( MinCurrentCL(i) - MinCL(i) ) * Weight(i) *
(RemainingServerCL-MinCLSum) / WeightedDiffSum
WeightedDiffSum = sum[non-exempt priority level i]
( MinCurrentCL(i) - MinCL(i) ) * Weight(i)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not have fully understood those formulae, but questions:

  • why can't we just apply the previous logic -- scaling down the CurrentCL proportional to RemainingServerCL / LowerBoundSum? why do the weights have to impact this if no borrowing is happening here.
  • A more direct question: I tested APF a while ago and learned a rule of "the owner keeps the shares if inuse", now would the weights break this rule? i.e. PLs with lower weights will be borrowed even when their demands are higher than their NominalCL?

Copy link
Member Author

@MikeSpreitzer MikeSpreitzer Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, remember that the PR that you are looking at is actually combining two changes: one is #4623 , which aims to heal the long-standing divergence between code and KEP by moving both to a middle ground, and then there is the change to add weighting to the priority levels.

It is true that when MinCLSum < RemainingServerCL < MinCurrentCLSum there is no borrowing between non-exempt priority levels, but the exempt level(s) have borrowed so much from the non-exempt ones that each must give up some concurrency. So it is actually a borrowing scenario.

Remember that you have not tested the behavior described in #4623 unless you tested its implementation in kubernetes/kubernetes#124736 .

Regarding "the owner keeps the shares if in use", I would be a little more precise: the rule is that each non-exempt level gets its nominal concurrency limit if at least that much has been in use AND the borrowing by exempt levels is not so great as to cut into that. The formulae that you have commented on concern the case where the exempt level(s) have borrowed so much that the non-exempt ones can not get their nominal concurrency limits.

```

Finally, when `MinCurrentCLSum < RemainingServerCL` there _is_ wiggle
Expand All @@ -974,22 +1007,24 @@ Target(i) = max( MinCurrentCL(i), SmoothSeatDemand(i) )

Sadly, the sum of the Target values --- let's name that TargetSum ---
is not necessarily equal to `RemainingServerCL`. However, if
`TargetSum < RemainingServerCL` then all the Targets could be scaled
`TargetSum < RemainingServerCL` then all the Targets could (if we wanted even weighting) be scaled
up in the same proportion `FairProp = RemainingServerCL / TargetSum`
(if that did not violate any upper bound) to get the new concurrency
limits `CurrentCL(i) := FairProp * Target(i)` for each non-exempt
priority level `i`. Similarly, if `TargetSum > RemainingServerCL`
then all the Targets could be scaled down in the same proportion (if
then all the Targets could (if we wanted even weighting) be scaled down in the same proportion (if
that did not violate any lower bound) to get the new concurrency
limits. This shares the wealth or the pain proportionally among the
priority levels (but note: the upper bound does not affect the target,
lest the pain of not achieving a high SmoothSeatDemand be distorted,
while the lower bound _does_ affect the target, so that merely
achieving the lower bound is not considered a gain). The following
computation generalizes this idea to respect the relevant bounds.
computation generalizes this idea to (a) respect the relevant bounds
and (b) give different weights to different levels' borrowing/lending.

First consider just the bounds, assuming even weighting.
We can not necessarily scale all the Targets by the same factor ---
because that might violate some upper or lower bounds. The problem is
because that might violate some upper or lower bounds. The even-weighted problem is
to find a proportion `FairProp` that can be shared by all the
non-exempt priority levels except those with a bound that forbids it.
This means to find a value of `FairProp` that simultaneously solves
Expand All @@ -1002,6 +1037,22 @@ that is OK, because they all produce the same CurrentCL values.
CurrentCL(i) = min( MaxCL(i), max( MinCurrentCL(i), FairProp * Target(i) ))
```

The weighted version replaces `FairProp` with `1 + Delta * Weight(i)
** sgn(Delta)`, with `Delta` becoming the scalar value to solve
for. Here `**` is the power function (X to the power of Y) and `sgn`
is the sign function: it maps 0 to 0, every negative number to -1, and
every positive number to 1. That is: when there is extra concurrency
to distribute among the non-exempt priority levels, the ones with
higher weight get more, and when all must give up some, the ones with
higher weight give up less. Thus, taking weights into account, the
formula for CurrentCL is as follows.

```
CurrentCL(i) = min( MaxCL(i), max( MinCurrentCL(i),
(1 + Delta * Weight(i) ** sgn(Delta))
* Target(i) ))
```

This is similar to the max-min fairness problem and can be solved
using sorting and then a greedy algorithm, taking O(N log N) time and
O(N) space.
Expand Down