Commit 27081bb

Update README.md
1 parent 7b4f893 commit 27081bb

1 file changed

+53
-39
lines changed

kep/594-resourcepolicy/README.md

@@ -23,39 +23,34 @@
2323
<!-- /toc -->
2424

2525
## Summary
26-
This proposal introduces a plugin to allow users to specify the priority of different resources and max resource
27-
consumption for workload on differnet resources.
26+
This proposal introduces a plugin that enables users to set priorities for various resources and define maximum resource consumption limits for workloads across different resources.
2827

2928
## Motivation
30-
The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To
29+
A Kubernetes cluster typically consists of heterogeneous machines whose SKUs vary in CPU, memory, GPU, and pricing. To
3130
efficiently utilize the different resources available in the cluster, users can set priorities for machines of different
3231
types and configure resource allocations for different workloads. Additionally, they may choose to delete pods running
3332
on low priority nodes instead of high priority ones.
3433

3534
### Use Cases
3635

37-
1. As a user of cloud services, there are some stable but expensive ECS instances and some unstable but cheaper Spot
38-
instances in my cluster. I hope that my workload can be deployed first on stable ECS instances, and during business peak
39-
periods, the Pods that are scaled out are deployed on Spot instances. At the end of the business peak, the Pods on Spot
40-
instances are prioritized to be scaled in.
36+
1. As a user of cloud services, I have some static but expensive VM instances and some dynamic but cheaper Spot
37+
instances in my cluster. I hope that my workload can be deployed first on static VM instances, and during business peak
38+
periods, the Pods that are scaled up are deployed on Spot instances. At the end of the business peak, the Pods on Spot
39+
instances are prioritized to be scaled down.
4140

4241
### Goals
4342

44-
1. Develop a filter plugin to restrict the resource consumption on each unit for different workloads.
45-
2. Develop a score plugin to favor nodes matched by a high priority unit.
43+
1. Develop a filter plugin to restrict the resource consumption on each kind of resource for different workloads.
44+
2. Develop a score plugin to favor nodes matched by a high priority kind of resource.
4645
3. Automatically set deletion costs on Pods through a controller to control the scale-in sequence of workloads.
4746

4847
### Non-Goals
4948

50-
1. Modify the workload controller to support deletion costs. If the workload don't support deletion costs, scaling in
51-
sequence will be random.
52-
2. When creating a ResourcePolicy, if the number of Pods has already violated the quantity constraint of the
53-
ResourcePolicy, we will not attempt to delete the excess Pods.
54-
49+
1. The scheduler will not delete pods.
5550

5651
## Proposal
5752

58-
### CRD API
53+
### API
5954
```yaml
6055
apiVersion: scheduling.sigs.x-k8s.io/v1alpha1
6156
kind: ResourcePolicy
@@ -67,8 +62,6 @@ spec:
6762
- pod-template-hash
6863
matchPolicy:
6964
ignoreTerminatingPod: true
70-
ignorePreviousPod: false
71-
forceMaxNum: false
7265
podSelector:
7366
matchExpressions:
7467
- key: key1
@@ -105,49 +98,70 @@ spec:
10598
key1: value3
10699
```
107100
108-
`Priority` define the priority of each unit. Pods will be scheduled on units with a higher priority.
101+
```go
102+
type ResourcePolicy struct {
103+
	metav1.ObjectMeta
104+
	metav1.TypeMeta
105+
106+
	Spec ResourcePolicySpec
107+
}
108+
type ResourcePolicySpec struct {
109+
	MatchLabelKeys []string
110+
	MatchPolicy    MatchPolicy
111+
	Strategy       string
112+
	PodSelector    metav1.LabelSelector
113+
	Units          []Unit
114+
}
115+
type MatchPolicy struct {
116+
	IgnoreTerminatingPod bool
117+
}
118+
type Unit struct {
119+
	Priority     *int32
120+
	MaxCount     *int32
121+
	NodeSelector metav1.LabelSelector
122+
}
123+
```
124+
125+
Pods are matched by a ResourcePolicy in the same namespace when their labels match `.spec.podSelector`. If `.spec.matchPolicy.ignoreTerminatingPod` is `true`, pods with a non-zero `.metadata.deletionTimestamp` are ignored.
126+
A ResourcePolicy never matches pods in a different namespace. A pod cannot be matched by more than one ResourcePolicy.
127+
128+
Pods can only be scheduled on units defined in `.spec.units`, unless `.spec.strategy` changes this behavior. Each item in `.spec.units` is the set of nodes matching its `NodeSelector`, which describes one kind of resource in the cluster.
129+
130+
`.spec.units[].priority` defines the priority of each unit. Units with a higher priority get a higher score in the score plugin.
109131
If all units have the same priority, the ResourcePolicy only limits the maximum number of pods on these units.
132+
If `.spec.units[].priority` is not set, it defaults to 0.
133+
`.spec.units[].maxCount` defines the maximum number of pods that can be scheduled on each unit. If `.spec.units[].maxCount` is not set, pods can always be scheduled on the unit unless its nodes lack sufficient resources.
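
The defaulting rules above can be sketched in Go. This is only an illustration under stated assumptions: the `Unit` type here copies just the two optional pointer fields, and the helper names `effectivePriority` and `hasRoom` are hypothetical, not part of the proposed API.

```go
package main

import "fmt"

// Unit copies only the optional fields of the proposed Unit type.
type Unit struct {
	Priority *int32
	MaxCount *int32
}

// effectivePriority (hypothetical helper) returns the unit priority,
// falling back to 0 when the field is unset.
func effectivePriority(u Unit) int32 {
	if u.Priority == nil {
		return 0
	}
	return *u.Priority
}

// hasRoom (hypothetical helper) reports whether one more pod fits under
// MaxCount. An unset MaxCount means the unit has no pod-count limit.
func hasRoom(u Unit, scheduled int32) bool {
	return u.MaxCount == nil || scheduled < *u.MaxCount
}

func main() {
	limit := int32(2)
	limited := Unit{MaxCount: &limit}
	fmt.Println(effectivePriority(limited), hasRoom(limited, 1), hasRoom(limited, 2))
	fmt.Println(hasRoom(Unit{}, 1000))
}
```

Using pointer fields lets the code distinguish "not set" from an explicit zero, which is why the helpers check for `nil` rather than comparing against 0.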
110134

111-
`Strategy` indicate how we treat the nodes doesn't match any unit.
135+
`.spec.strategy` indicates how nodes that do not match any unit are treated.
112136
If the strategy is `required`, the pod can only be scheduled on nodes that match the units in the resource policy,
113137
and we return `unschedulable` for nodes that do not match any unit.
114138
If the strategy is `prefer`, the pod can be scheduled on all nodes, but nodes that do not match any unit are
115139
considered only after all nodes that match the units.
116140

117-
`MatchLabelKeys` indicate how we group the pods matched by `podSelector` and `matchPolicy`, its behavior is like
118-
`MatchLabelKeys` in `PodTopologySpread`.
119-
120-
`matchPolicy` indicate if we should ignore some kind pods when calculate pods in certain unit.
121-
122-
If `forceMaxNum` is set `true`, we will not try the next units when one unit is not full, this property have no effect
123-
when `max` is not set in units.
141+
`.spec.matchLabelKeys` indicates how we group the pods matched by `podSelector` and `matchPolicy`. It behaves like
142+
`matchLabelKeys` in `PodTopologySpread`.
124143

125144
### Implementation Details
126145

127-
#### Scheduler Plugins
128-
129-
For each unit, we will record which pods were scheduled on it to prevent too many pods scheduled on it.
130-
131-
##### PreFilter
146+
#### PreFilter
132147
PreFilter checks whether the current pod matches only one resource policy. If not, PreFilter rejects the pod.
133148
If so, PreFilter gets the number of pods on each unit to determine which units are available for the pod
134149
and writes this information into cycleState.
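
The counting step in PreFilter could look roughly like the following Go sketch. It is not the plugin's implementation: the `pod` and `unit` types and the functions `countPods` and `availableUnits` are hypothetical stand-ins, and the result is returned as a plain map rather than written into cycleState.

```go
package main

import "fmt"

// pod is a hypothetical summary of a pod already matched by the policy.
type pod struct {
	unit        string // name of the unit whose nodes host the pod
	terminating bool   // true when a deletion timestamp is set
}

// unit is a hypothetical summary of one entry in .spec.units.
type unit struct {
	name     string
	maxCount *int // nil means no pod-count limit
}

// countPods tallies pods per unit, optionally skipping terminating pods
// as ignoreTerminatingPod describes.
func countPods(pods []pod, ignoreTerminating bool) map[string]int {
	counts := make(map[string]int)
	for _, p := range pods {
		if ignoreTerminating && p.terminating {
			continue
		}
		counts[p.unit]++
	}
	return counts
}

// availableUnits marks each unit as available while it is below maxCount.
func availableUnits(units []unit, counts map[string]int) map[string]bool {
	avail := make(map[string]bool, len(units))
	for _, u := range units {
		avail[u.name] = u.maxCount == nil || counts[u.name] < *u.maxCount
	}
	return avail
}

func main() {
	two := 2
	units := []unit{{name: "static", maxCount: &two}, {name: "spot"}}
	pods := []pod{{unit: "static"}, {unit: "static"}, {unit: "spot", terminating: true}}
	counts := countPods(pods, true)
	fmt.Println(counts, availableUnits(units, counts))
}
```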
135150

136-
##### Filter
151+
#### Filter
137152
Filter checks whether the node belongs to an available unit. If the node doesn't belong to any unit, we will return
138-
success if the strategy is `prefer`, otherwise we will return unschedulable.
153+
success if the `.spec.strategy` is `prefer`, otherwise we will return unschedulable.
139154

140155
Filter also checks whether the number of pods already scheduled on the unit violates the quantity constraint.
141-
If the number of pods has reach the `maxCount`, all the nodes in unit will be marked unschedulable.
156+
If the number of pods has reached `.spec.units[].maxCount`, all nodes in the unit are marked unschedulable.
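
A minimal Go sketch of the Filter decision described above, assuming the availability of each unit was already computed in PreFilter. The `status` type, the `filterNode` function, and the convention that an empty unit name means "node matches no unit" are all assumptions made for illustration.

```go
package main

import "fmt"

// status is a hypothetical stand-in for the scheduler framework's status codes.
type status string

const (
	success       status = "Success"
	unschedulable status = "Unschedulable"
)

// filterNode decides whether a node passes the filter.
// nodeUnit is the name of the unit the node belongs to, or "" if it matches none.
// available maps unit names to whether they are still below maxCount.
func filterNode(nodeUnit, strategy string, available map[string]bool) status {
	if nodeUnit == "" {
		// Nodes outside every unit are allowed only under the prefer strategy.
		if strategy == "prefer" {
			return success
		}
		return unschedulable
	}
	if available[nodeUnit] {
		return success
	}
	// The unit is full, so every node in it is rejected.
	return unschedulable
}

func main() {
	available := map[string]bool{"static": false, "spot": true}
	fmt.Println(filterNode("static", "required", available))
	fmt.Println(filterNode("spot", "required", available))
	fmt.Println(filterNode("", "prefer", available))
	fmt.Println(filterNode("", "required", available))
}
```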
142157

143-
##### Score
144-
If `priority` is set in resource policy, we will schedule pod based on `priority`. Default priority is 1, and minimum
145-
priority is 1.
158+
#### Score
159+
If `.spec.units[].priority` is set in the resource policy, we schedule the pod based on `.spec.units[].priority`. The default priority is 0, and the minimum
160+
priority is 0.
146161

147162
Score calculation details:
148163

149-
1. calculate priority score, `scorePriority = (priority-1) * 20`, to make sure we give nodes without priority a minimum
150-
score.
164+
1. Calculate the priority score, `scorePriority = priority * 20`, so that nodes without a priority receive the minimum score.
151165
2. Normalize the score.
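
The two scoring steps can be sketched in Go as follows. The multiplication by 20 comes from the text above; the normalization shown here (scaling so the highest score maps to 100) is an assumption, since the proposal does not say how scores are normalized, and the names `priorityScore`, `normalize`, and `maxNodeScore` are hypothetical.

```go
package main

import "fmt"

// maxNodeScore is the assumed upper bound of a normalized score.
const maxNodeScore = 100

// priorityScore implements scorePriority = priority * 20.
// A node whose unit has no priority uses priority 0 and scores 0.
func priorityScore(priority int32) int64 {
	return int64(priority) * 20
}

// normalize rescales non-negative scores so the highest becomes maxNodeScore.
// This scaling rule is an assumption, not taken from the proposal.
func normalize(scores []int64) []int64 {
	var highest int64
	for _, s := range scores {
		if s > highest {
			highest = s
		}
	}
	out := make([]int64, len(scores))
	if highest == 0 {
		return out // all priorities are 0, so all nodes score equally
	}
	for i, s := range scores {
		out[i] = s * maxNodeScore / highest
	}
	return out
}

func main() {
	raw := []int64{priorityScore(0), priorityScore(1), priorityScore(2)}
	fmt.Println(raw, normalize(raw))
}
```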
152166

153167
#### Resource Policy Controller
