Commit 851b89d

Add deploy yaml&guide
Signed-off-by: Monokaix <[email protected]>
1 parent 7b7e170 commit 851b89d

4 files changed

+264
-4
lines changed

.github/workflows/code_verify.yaml

+1-1
@@ -34,6 +34,6 @@ jobs:
3434
- name: Run verify test
3535
run: |
3636
make verify
37-
make all
37+
make image
3838
sudo make unit-test
3939
working-directory: ./src/github.com/${{ github.repository }}

Makefile

+2-2
@@ -54,7 +54,7 @@ else
5454
GOARCH?=$(OSARCH)
5555
endif
5656

57-
# Run `make images DOCKER_PLATFORMS="linux/amd64,linux/arm64" BUILDX_OUTPUT_TYPE=registry IMAGE_PREFIX=[yourregistry]` to push multi-platform
57+
# Run `make image DOCKER_PLATFORMS="linux/amd64,linux/arm64" BUILDX_OUTPUT_TYPE=registry IMAGE_PREFIX=[yourregistry]` to push multi-platform
5858
DOCKER_PLATFORMS ?= "linux/${GOARCH}"
5959

6060
GOOS ?= linux
@@ -74,7 +74,7 @@ vc-descheduler: init
7474

7575
image_bins: vc-descheduler
7676

77-
images:
77+
image:
7878
for name in descheduler; do\
7979
docker buildx build -t "${IMAGE_PREFIX}/vc-$$name:$(TAG)" . -f ./installer/dockerfile/$$name/Dockerfile --output=type=${BUILDX_OUTPUT_TYPE} --platform ${DOCKER_PLATFORMS} --build-arg APK_MIRROR=${APK_MIRROR} --build-arg OPEN_EULER_IMAGE_TAG=${OPEN_EULER_IMAGE_TAG}; \
8080
done

README.md

+122-1
@@ -42,4 +42,125 @@ The principle of LoadAware is shown in the figure above:
4242

4343
- Over-utilized nodes: nodes with resource utilization higher than 80%. Hotspot nodes will evict some Pods and reduce the load level to no more than 80%. The descheduler will schedule the Pods on the hotspot nodes to the idle nodes.
4444

45-
- Under-utilized nodes: nodes with resource utilization lower than 30%.
45+
- Under-utilized nodes: nodes with resource utilization lower than 30%.
46+
47+
# Quick start
48+
49+
## Prepare
50+
51+
Install [prometheus](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus) or [prometheus-adapter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-adapter), and [prometheus-node-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter). The real load of each node is exposed to the `Volcano descheduler` through node-exporter and prometheus.
52+
53+
Add the following automatic discovery and node label replacement rules for the node-exporter service in the `scrape_configs` configuration of prometheus. This step is required; without it, `Volcano descheduler` cannot get the real load metrics of the node. For more details about `scrape_configs`, please refer to [Configuration | Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
54+
55+
```yaml
56+
scrape_configs:
57+
- job_name: 'kubernetes-service-endpoints'
58+
kubernetes_sd_configs:
59+
- role: endpoints
60+
relabel_configs:
61+
- source_labels: [__meta_kubernetes_pod_node_name]
62+
action: replace
63+
target_label: instance
64+
```
65+
66+
## Install Volcano descheduler
67+
68+
### Install via yaml
69+
70+
```shell
71+
# create ns first.
72+
kubectl create ns volcano-system
73+
# deploy descheduler yaml.
74+
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/descheduler/main/installer/volcano-descheduler-development.yaml
75+
```
76+
77+
## Configurations
78+
79+
The default descheduling configuration is in the `volcano-descheduler` configMap under the `volcano-system` namespace. You can update the descheduling configuration by modifying the data in the configMap. The plugins enabled by default are `LoadAware` and `DefaultEvictor`, which perform load-aware descheduling and eviction respectively.
80+
81+
```yaml
82+
apiVersion: "descheduler/v1alpha2"
83+
kind: "DeschedulerPolicy"
84+
profiles:
85+
- name: default
86+
pluginConfig:
87+
- args:
88+
ignorePvcPods: true
89+
nodeFit: true
90+
priorityThreshold:
91+
value: 10000
92+
name: DefaultEvictor
93+
- args:
94+
evictableNamespaces:
95+
exclude:
96+
- kube-system
97+
metrics:
98+
address: null
99+
type: null
100+
targetThresholds:
101+
cpu: 80 # Eviction will be triggered when the node CPU utilization exceeds 80%
102+
memory: 85 # Eviction will be triggered when the node memory utilization exceeds 85%
103+
thresholds:
104+
cpu: 30 # Pods can be scheduled to nodes whose CPU resource utilization is less than 30%
105+
memory: 30 # Pods can be scheduled to nodes whose memory resource utilization is less than 30%.
106+
name: LoadAware
107+
plugins:
108+
balance:
109+
enabled:
110+
- LoadAware
111+
```
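
The default policy above ships with `metrics.address` and `metrics.type` set to `null`, while the `LoadAware` parameter description lists `metrics` as a required field. A minimal sketch of filling in those placeholders before applying a policy file follows. The helper name `fill_metrics` and the Prometheus service address are assumptions for illustration, not part of the descheduler.

```shell
# fill_metrics TYPE ADDRESS: read a policy on stdin and write it to stdout
# with the `null` placeholders under `metrics` replaced.
# Hypothetical helper; ADDRESS must not contain `|` or `&`.
fill_metrics() {
  sed -e "s|type: null|type: $1|" -e "s|address: null|address: $2|"
}

# Example with an assumed in-cluster Prometheus address:
printf 'metrics:\n  address: null\n  type: null\n' |
  fill_metrics prometheus http://prometheus-server.monitoring.svc:80
```

The live policy can be edited in place with `kubectl -n volcano-system edit configmap volcano-descheduler`.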
112+
113+
For the full configuration and parameter description of the `DefaultEvictor` plugin, please refer to: [DefaultEvictor Configuration](https://github.com/kubernetes-sigs/descheduler/tree/master#evictor-plugin-configuration-default-evictor).
114+
115+
`LoadAware` plugin parameter description:
116+
117+
| Name | Type | Default Value | Description |
118+
| :-----------------: | :------------------: | :-----------: | :----------------------------------------------------------: |
119+
| nodeSelector | string | nil | Limiting the nodes which are processed |
120+
| evictableNamespaces | map(string:[]string) | nil | Pods in the excluded namespaces are not evicted |
121+
| nodeFit | bool | false | When set to `true`, the descheduler considers whether the pods that meet the eviction criteria will fit on other nodes before evicting them. |
122+
| numberOfNodes | int | 0 | Activates the strategy only when the number of under-utilized nodes is above the configured value. This can help in large clusters where a few nodes may be under-utilized frequently or for short periods. |
123+
| duration | string | 2m | The time range specified when querying the actual utilization metrics of nodes, only takes effect when `metrics.type` is configured as `prometheus`. |
124+
| metrics | map(string:string) | nil | **Required Field**<br/>Contains two parameters: <br/>type: The type of the metrics source; only `prometheus` and `prometheus_adaptor` are supported.<br/>address: The service address of `prometheus`. |
125+
| targetThresholds | map(string:int) | nil | **Required Field**<br/>Supported configuration keys are `cpu`, `memory`, and `pods`.<br/>When the node resource utilization (for `cpu` or `memory`) exceeds the configured threshold, Pods on the node are evicted; the unit is %.<br/>When the number of Pods on the node exceeds the configured threshold, Pods on the node are evicted; the unit is a Pod count. |
126+
| thresholds | map(string:int) | nil | **Required Field**<br/>The evicted Pods should be scheduled to nodes with utilization below the `thresholds`.<br/>The threshold for the same resource type cannot exceed the threshold set in `targetThresholds`. |
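
The constraint in the `thresholds` row (a value may not exceed the `targetThresholds` value for the same resource) can be checked by hand before editing the ConfigMap. A minimal sketch, assuming a hypothetical helper `check_threshold` that is not part of the descheduler; the sample values are the defaults from the policy above:

```shell
# check_threshold NAME LOW HIGH
# Succeeds when thresholds.NAME (LOW) does not exceed targetThresholds.NAME (HIGH).
check_threshold() {
  name="$1"; low="$2"; high="$3"
  if [ "$low" -gt "$high" ]; then
    echo "invalid: thresholds.$name ($low) exceeds targetThresholds.$name ($high)"
    return 1
  fi
  echo "ok: thresholds.$name=$low <= targetThresholds.$name=$high"
}

# Default values from the policy shown earlier.
check_threshold cpu 30 80
check_threshold memory 30 85
```

The function returns a non-zero status for an invalid pair, so it can gate a deploy script.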
127+
128+
In addition to the above `LoadAware plugin` enhancements, `Volcano descheduler` also supports native descheduler functions and plugins. If you want to configure other native plugins, please refer to: [kubernetes-sigs/descheduler](https://github.com/kubernetes-sigs/descheduler/blob/master/docs/user-guide.md).
129+
130+
# Best practices
131+
132+
When Pods on a node with relatively high resource utilization are evicted, the newly created Pods should not be scheduled back onto a node with relatively high resource utilization. Therefore, the `Volcano scheduler` also needs to enable the `usage` plugin, which is based on real load awareness. For a detailed description and configuration of `usage`, please refer to: [volcano usage plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md).
133+
134+
# Troubleshooting
135+
136+
When the configuration parameter `metrics.type` of the LoadAware plugin is set to `prometheus`, `Volcano descheduler` queries the actual utilization of cpu and memory through the following `PromQL` statements. When the expected eviction behavior does not occur, you can run these queries manually in prometheus, check whether the node metrics are correctly exposed, and compare the results with the log of `Volcano descheduler` to judge its actual behavior.
137+
138+
**cpu:**
139+
140+
```shell
141+
avg_over_time((1 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle",instance="$replace_with_your_node_name"}[30s])) * 1))[2m:30s])
142+
```
143+
144+
**memory:**
145+
146+
```shell
147+
avg_over_time(((1-node_memory_MemAvailable_bytes{instance="$replace_with_your_node_name"}/node_memory_MemTotal_bytes{instance="$replace_with_your_node_name"}))[2m:30s])
148+
```
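
To avoid hand-editing the node name in the two statements above, they can be generated per node. A minimal sketch, assuming hypothetical helper names `cpu_query` and `mem_query`, and assuming the `2m` range in the statements corresponds to the `duration` parameter (the source does not state this explicitly):

```shell
# cpu_query NODE [RANGE]: print the CPU utilization PromQL for one node.
# NODE is the value of the `instance` label; RANGE defaults to 2m.
cpu_query() {
  node="$1"; range="${2:-2m}"
  printf 'avg_over_time((1 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle",instance="%s"}[30s])) * 1))[%s:30s])\n' "$node" "$range"
}

# mem_query NODE [RANGE]: print the memory utilization PromQL for one node.
mem_query() {
  node="$1"; range="${2:-2m}"
  printf 'avg_over_time(((1-node_memory_MemAvailable_bytes{instance="%s"}/node_memory_MemTotal_bytes{instance="%s"}))[%s:30s])\n' "$node" "$node" "$range"
}

# Running a query through the Prometheus HTTP API; PROM_URL is an assumption:
#   curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$(cpu_query node-1)"
```

An empty result for a node usually means the `instance` label does not carry the node name, which points back to the relabel rule in the Prepare section.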
149+
150+
# Development
151+
152+
## Build binary
153+
154+
```shell
155+
make vc-descheduler
156+
```
157+
158+
## Build image
159+
160+
```shell
161+
make image
162+
```
163+
164+
# Release Guide
165+
166+
The release cadence of the `descheduler` is not synchronized with that of [Volcano](https://github.com/volcano-sh/volcano). This is because the `descheduler` is a sub-repository under volcano-sh, and its code and feature changes are relatively minor. We will adapt to the upstream Kubernetes community's descheduler project as needed and release new versions accordingly.
@@ -0,0 +1,139 @@
1+
apiVersion: v1
2+
kind: ServiceAccount
3+
metadata:
4+
name: volcano-descheduler
5+
namespace: volcano-system
6+
7+
---
8+
apiVersion: v1
9+
kind: ConfigMap
10+
metadata:
11+
name: volcano-descheduler
12+
namespace: volcano-system
13+
data:
14+
policy.yaml: |
15+
apiVersion: "descheduler/v1alpha2"
16+
kind: "DeschedulerPolicy"
17+
profiles:
18+
- name: default
19+
pluginConfig:
20+
- args:
21+
ignorePvcPods: true
22+
nodeFit: true
23+
priorityThreshold:
24+
value: 10000
25+
name: DefaultEvictor
26+
- args:
27+
evictableNamespaces:
28+
exclude:
29+
- kube-system
30+
metrics:
31+
address: null
32+
type: null
33+
targetThresholds:
34+
cpu: 80
35+
memory: 85
36+
thresholds:
37+
cpu: 30
38+
memory: 30
39+
name: LoadAware
40+
plugins:
41+
balance:
42+
enabled:
43+
- LoadAware
44+
45+
---
46+
kind: ClusterRole
47+
apiVersion: rbac.authorization.k8s.io/v1
48+
metadata:
49+
name: volcano-descheduler
50+
rules:
51+
- apiGroups: ["events.k8s.io"]
52+
resources: ["events"]
53+
verbs: ["create", "update"]
54+
- apiGroups: [""]
55+
resources: ["nodes"]
56+
verbs: ["get", "watch", "list"]
57+
- apiGroups: [""]
58+
resources: ["namespaces"]
59+
verbs: ["get", "watch", "list"]
60+
- apiGroups: [""]
61+
resources: ["pods"]
62+
verbs: ["get", "watch", "list", "delete"]
63+
- apiGroups: [""]
64+
resources: ["pods/eviction"]
65+
verbs: ["create"]
66+
- apiGroups: ["scheduling.k8s.io"]
67+
resources: ["priorityclasses"]
68+
verbs: ["get", "watch", "list"]
69+
- apiGroups: ["metrics.k8s.io"]
70+
resources: ["pods"]
71+
verbs: ["get", "list", "watch"]
72+
73+
---
74+
apiVersion: rbac.authorization.k8s.io/v1
75+
kind: ClusterRoleBinding
76+
metadata:
77+
name: volcano-descheduler
78+
roleRef:
79+
apiGroup: rbac.authorization.k8s.io
80+
kind: ClusterRole
81+
name: volcano-descheduler
82+
subjects:
83+
- kind: ServiceAccount
84+
name: volcano-descheduler
85+
namespace: volcano-system
86+
87+
---
88+
kind: Deployment
89+
apiVersion: apps/v1
90+
metadata:
91+
name: volcano-descheduler
92+
namespace: volcano-system
93+
labels:
94+
app: descheduler
95+
k8s-app: descheduler
96+
spec:
97+
replicas: 1
98+
revisionHistoryLimit: 10
99+
selector:
100+
matchLabels:
101+
app: descheduler
102+
k8s-app: descheduler
103+
template:
104+
metadata:
105+
labels:
106+
app: descheduler
107+
k8s-app: descheduler
108+
spec:
109+
serviceAccountName: volcano-descheduler
110+
volumes:
111+
- name: policy-volume
112+
configMap:
113+
name: volcano-descheduler
114+
- name: log
115+
hostPath:
116+
path: /var/log/volcano/descheduler
117+
containers:
118+
- name: descheduler
119+
image: docker.io/volcanosh/vc-descheduler:latest
120+
command: ["sh", "-c"]
121+
args:
122+
- >
123+
/vc-descheduler --descheduling-interval-cron-expression='*/10 * * * *'
124+
--descheduling-interval=10m
125+
--policy-config-file=/policy-dir/policy.yaml
126+
--leader-elect=false
127+
--leader-elect-resource-namespace=volcano-system
128+
--v=3 1>>/var/log/volcano/descheduler/descheduler.log 2>&1
129+
imagePullPolicy: Always
130+
env:
131+
- name: POD_NAMESPACE
132+
valueFrom:
133+
fieldRef:
134+
fieldPath: metadata.namespace
135+
volumeMounts:
136+
- mountPath: /policy-dir
137+
name: policy-volume
138+
- name: log
139+
mountPath: /var/log/volcano/descheduler
