Commit 595d47d

add kep
1 parent 5af1783 commit 595d47d

File tree

2 files changed

+149
-0
lines changed

kep/##-resourcepolicy/README.md

+144
@@ -0,0 +1,144 @@
1+
# Resource Policy
2+
3+
## Table of Contents
4+
5+
- Summary
6+
- Motivation
7+
- Goals
8+
- Non-Goals
9+
- Proposal
10+
- CRD API
11+
- Implementation details
12+
- Use Cases
13+
- Known limitations
14+
- Test plans
15+
- Graduation criteria
16+
- Production Readiness Review Questionnaire
17+
- Feature enablement and rollback
18+
- Implementation history
19+
20+
## Summary
21+
This proposal introduces a plugin that lets users specify the priority of different resources and the maximum resource consumption of a workload on each of those resources.
22+
23+
## Motivation
24+
The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To efficiently utilize the different resources available in the cluster, users can set priorities for machines of different types and configure resource allocations for different workloads. Additionally, when scaling in, they may want pods running on low-priority nodes to be deleted before pods on high-priority nodes.
25+
26+
### Use Cases
27+
28+
1. As a user of cloud services, I have some stable but expensive ECS instances and some unstable but cheaper Spot instances in my cluster. I want my workload to be deployed on the stable ECS instances first, and the Pods that are scaled out during business peak periods to be deployed on Spot instances. At the end of the business peak, the Pods on Spot instances should be scaled in first.
29+
30+
### Goals
31+
32+
1. Develop a filter plugin to restrict the resource consumption on each unit for different workloads.
33+
2. Develop a score plugin to favor nodes matched by a high priority unit.
34+
3. Automatically set deletion costs on Pods through a controller to control the scale-in order of workloads.
35+
36+
### Non-Goals
37+
38+
1. Modify the workload controller to support deletion costs. If the workload doesn't support deletion costs, the scale-in order will be random.
39+
2. When creating a ResourcePolicy, if the number of Pods has already violated the quantity constraint of the ResourcePolicy, we will not attempt to delete the excess Pods.
40+
41+
42+
## Proposal
43+
44+
### CRD API
45+
```yaml
apiVersion: scheduling.sigs.x-k8s.io/v1alpha1
kind: ResourcePolicy
metadata:
  name: xxx
  namespace: xxx
spec:
  podSelector:
    matchExpressions:
      - key: key1
        operator: In
        values:
          - value1
    matchLabels:
      key1: value1
  strategy: prefer
  units:
    - name: unit1
      priority: 5
      maxCount: 10
      nodeSelector:
        matchExpressions:
          - key: key1
            operator: In
            values:
              - value1
    - name: unit2
      priority: 5
      maxCount: 10
      nodeSelector:
        matchExpressions:
          - key: key1
            operator: In
            values:
              - value2
    - name: unit3
      priority: 4
      maxCount: 20
      nodeSelector:
        matchLabels:
          key1: value3
```
87+
88+
`priority` defines the priority of each unit. Pods are scheduled on units with a higher priority first.
89+
If all units have the same priority, the resource policy only limits the maximum number of pods on each of these units.
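
The interplay between `priority` and `maxCount` can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Unit` struct and the `preferredUnits` helper are hypothetical names, and the per-unit pod counts are assumed to be known already.

```go
package main

import (
	"fmt"
	"sort"
)

// Unit is a simplified, hypothetical stand-in for one entry of spec.units.
type Unit struct {
	Name     string
	Priority int32
	MaxCount int32
}

// preferredUnits returns the names of the highest-priority units that still
// have room, given the number of pods already placed on each unit.
func preferredUnits(units []Unit, podCount map[string]int32) []string {
	// Keep only the units that are still below their maxCount.
	var open []Unit
	for _, u := range units {
		if podCount[u.Name] < u.MaxCount {
			open = append(open, u)
		}
	}
	// Sort by descending priority; a stable sort keeps the declared order for ties.
	sort.SliceStable(open, func(i, j int) bool {
		return open[i].Priority > open[j].Priority
	})

	var names []string
	for _, u := range open {
		if u.Priority != open[0].Priority {
			break
		}
		names = append(names, u.Name)
	}
	return names
}

func main() {
	units := []Unit{
		{Name: "unit1", Priority: 5, MaxCount: 10},
		{Name: "unit2", Priority: 5, MaxCount: 10},
		{Name: "unit3", Priority: 4, MaxCount: 20},
	}
	// With no pods placed yet, both priority-5 units are preferred.
	fmt.Println(preferredUnits(units, map[string]int32{}))
	// Once unit1 and unit2 are full, the lower-priority unit3 takes over.
	fmt.Println(preferredUnits(units, map[string]int32{"unit1": 10, "unit2": 10}))
}
```

When every unit shares one priority, the helper returns all units that are not yet full, which matches the "only limits the maximum number of pods" behavior described above.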
90+
91+
`strategy` indicates how nodes that do not match any unit are treated.
If the strategy is `required`, the pod can only be scheduled on nodes that match a unit in the resource policy, and we return `unschedulable` for nodes that match no unit.
If the strategy is `prefer`, the pod can be scheduled on any node, but nodes that match no unit are considered only after all nodes that match a unit.
96+
97+
### Implementation Details
98+
99+
100+
#### Scheduler Plugins
101+
102+
For each unit, we record which pods have been scheduled on it, so that no more than `maxCount` pods are placed on that unit.
103+
104+
##### PreFilter
105+
PreFilter checks whether the current pod matches only one resource policy. If not, PreFilter rejects the pod.
If it does, PreFilter counts the pods on each unit to determine which units are available for the pod
and writes this information into cycleState.
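
A rough sketch of this step, under simplifying assumptions: the types below are hypothetical, only `matchLabels`-style selection is modeled, and a plain map stands in for the data written to cycleState. The sketch also assumes that a pod matching no policy is left alone and that only a pod matching several policies is rejected, which the text above does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// Unit and Policy are simplified, hypothetical stand-ins for the CRD types.
type Unit struct {
	Name     string
	MaxCount int32
}

type Policy struct {
	Name      string
	PodLabels map[string]string // matchLabels only; matchExpressions omitted
	Units     []Unit
}

// matches reports whether every label required by the policy is set on the pod.
func matches(p Policy, podLabels map[string]string) bool {
	for k, want := range p.PodLabels {
		if got, ok := podLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// preFilter returns, per unit, whether the unit can still accept the pod.
// The returned map plays the role of the state stored in cycleState.
func preFilter(policies []Policy, podLabels map[string]string, podCount map[string]int32) (map[string]bool, error) {
	var matched []Policy
	for _, p := range policies {
		if matches(p, podLabels) {
			matched = append(matched, p)
		}
	}
	if len(matched) > 1 {
		return nil, errors.New("pod matches more than one resource policy")
	}
	if len(matched) == 0 {
		// Assumption: pods outside every policy are not constrained.
		return nil, nil
	}
	available := make(map[string]bool, len(matched[0].Units))
	for _, u := range matched[0].Units {
		available[u.Name] = podCount[u.Name] < u.MaxCount
	}
	return available, nil
}

func main() {
	policies := []Policy{{
		Name:      "demo",
		PodLabels: map[string]string{"key1": "value1"},
		Units:     []Unit{{Name: "unit1", MaxCount: 10}, {Name: "unit3", MaxCount: 20}},
	}}
	pod := map[string]string{"key1": "value1"}
	available, err := preFilter(policies, pod, map[string]int32{"unit1": 10, "unit3": 3})
	fmt.Println(available, err)
}
```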
108+
109+
##### Filter
110+
Filter checks whether the node belongs to an available unit. If the node doesn't belong to any unit, Filter returns
success if the strategy is `prefer`; otherwise it returns unschedulable.
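
The decision above can be sketched as a small function. The names here are hypothetical, node selection is reduced to exact label matches, and a boolean stands in for the status a real scheduler plugin would return.

```go
package main

import "fmt"

// Strategy mirrors the two values of spec.strategy described in this document.
type Strategy string

const (
	StrategyRequired Strategy = "required"
	StrategyPrefer   Strategy = "prefer"
)

// Unit is a simplified, hypothetical stand-in for one entry of spec.units.
type Unit struct {
	Name       string
	NodeLabels map[string]string // matchLabels only; matchExpressions omitted
}

// unitForNode returns the first unit whose node labels all appear on the node.
func unitForNode(units []Unit, nodeLabels map[string]string) (string, bool) {
	for _, u := range units {
		ok := true
		for k, want := range u.NodeLabels {
			if got, found := nodeLabels[k]; !found || got != want {
				ok = false
				break
			}
		}
		if ok {
			return u.Name, true
		}
	}
	return "", false
}

// filter reports whether the node is acceptable for the pod.
// available is the per-unit result computed during PreFilter.
func filter(strategy Strategy, units []Unit, available map[string]bool, nodeLabels map[string]string) bool {
	name, ok := unitForNode(units, nodeLabels)
	if !ok {
		// The node belongs to no unit: only `prefer` lets it through.
		return strategy == StrategyPrefer
	}
	return available[name]
}

func main() {
	units := []Unit{
		{Name: "unit1", NodeLabels: map[string]string{"key1": "value1"}},
		{Name: "unit3", NodeLabels: map[string]string{"key1": "value3"}},
	}
	available := map[string]bool{"unit1": false, "unit3": true}

	fmt.Println(filter(StrategyPrefer, units, available, map[string]string{"key1": "value3"}))
	fmt.Println(filter(StrategyPrefer, units, available, map[string]string{"key1": "value1"}))
	fmt.Println(filter(StrategyRequired, units, available, map[string]string{"zone": "a"}))
}
```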
112+
113+
##### Score
114+
If `priority` and `weight` are set in the resource policy, pods are scheduled based on `priority` first. For units with the same `priority`, pods are spread based on `weight`.
115+
116+
Score calculation details:
117+
118+
1. Calculate the priority score: `scorePriority = priority * 20`.
2. Normalize the score.
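
The two steps can be sketched as below. The normalization method is not specified in this document, so the sketch assumes the raw scores are rescaled so that the highest one maps to 100, the maximum node score used by the kube-scheduler framework. It also assumes that a node matching no unit gets a raw score of 0. Function names are hypothetical.

```go
package main

import "fmt"

// maxNodeScore mirrors the upper bound of node scores in the scheduler framework.
const maxNodeScore int64 = 100

// priorityScore implements step 1: scorePriority = priority * 20.
func priorityScore(priority int64) int64 {
	return priority * 20
}

// normalize implements step 2 under the assumption stated above:
// scores are rescaled so that the highest raw score becomes maxNodeScore.
func normalize(raw map[string]int64) map[string]int64 {
	var highest int64
	for _, s := range raw {
		if s > highest {
			highest = s
		}
	}
	out := make(map[string]int64, len(raw))
	for node, s := range raw {
		if highest == 0 {
			out[node] = 0
			continue
		}
		out[node] = s * maxNodeScore / highest
	}
	return out
}

func main() {
	raw := map[string]int64{
		"node-a": priorityScore(5), // node in a priority-5 unit
		"node-b": priorityScore(4), // node in a priority-4 unit
		"node-c": 0,                // assumption: node matching no unit
	}
	fmt.Println(normalize(raw))
}
```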
120+
121+
##### PostFilter
122+
123+
124+
#### Resource Policy Controller
125+
The resource policy controller sets deletion costs on pods when a related resource policy is added or updated.
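
As an illustration of what the controller might write, the sketch below sets the standard `controller.kubernetes.io/pod-deletion-cost` annotation, which the ReplicaSet controller consults when scaling in (pods with a lower cost are removed first). Using the unit priority directly as the cost is an assumption made for this example, and the `Pod` struct and `setDeletionCosts` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// podDeletionCostAnnotation is the well-known annotation read by the
// ReplicaSet controller when it chooses which pods to delete.
const podDeletionCostAnnotation = "controller.kubernetes.io/pod-deletion-cost"

// Pod is a simplified, hypothetical stand-in for a Kubernetes Pod.
type Pod struct {
	Name         string
	UnitPriority int32 // priority of the unit the pod's node belongs to
	Annotations  map[string]string
}

// setDeletionCosts annotates each pod so that pods on lower-priority units
// carry a lower deletion cost and are therefore scaled in first.
// Assumption: the cost is simply the unit priority.
func setDeletionCosts(pods []*Pod) {
	for _, p := range pods {
		if p.Annotations == nil {
			p.Annotations = map[string]string{}
		}
		p.Annotations[podDeletionCostAnnotation] = strconv.Itoa(int(p.UnitPriority))
	}
}

func main() {
	pods := []*Pod{
		{Name: "on-stable", UnitPriority: 5},
		{Name: "on-spot", UnitPriority: 4},
	}
	setDeletionCosts(pods)
	for _, p := range pods {
		fmt.Println(p.Name, p.Annotations[podDeletionCostAnnotation])
	}
}
```

In the Spot instance use case, this ordering means the pods on the lower-priority Spot unit are the first candidates for removal when the workload scales in.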
126+
127+
## Known limitations
128+
129+
- Currently, deletion costs only take effect on Deployment workloads.
130+
131+
## Test plans
132+
133+
1. Add detailed unit and integration tests for the plugin and controller.
134+
2. Add basic e2e tests to ensure all components work together.
135+
136+
## Graduation criteria
137+
138+
## Production Readiness Review Questionnaire
139+
140+
## Feature enablement and rollback
141+
142+
## Implementation history
143+
144+

kep/##-resourcepolicy/kep.yaml

+5
@@ -0,0 +1,5 @@
1+
title: Resourcepolicy
2+
kep-number: 593
3+
authors:
4+
- "@KunWuLuan"
5+
- "@fjding"
