- Over-utilized nodes: nodes with resource utilization higher than 80%. The descheduler evicts some Pods from these hotspot nodes to bring their load down to no more than 80%, and the evicted Pods are rescheduled onto idle nodes.
- Under-utilized nodes: nodes with resource utilization lower than 30%.

# Quick start

## Prepare

Install [prometheus](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus) or [prometheus-adapter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-adapter), and [prometheus-node-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter). The real load of each node is exposed to the `Volcano descheduler` through node-exporter and Prometheus.

Add automatic discovery and node label replacement rules for the node-exporter service to the `scrape_configs` section of the Prometheus configuration. This step is essential: without it, `Volcano descheduler` cannot obtain the real load metrics of the nodes. For more details about `scrape_configs`, refer to [Configuration | Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

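
A minimal sketch of such a scrape job is shown below. It assumes that node-exporter is exposed through a Kubernetes Service named `prometheus-node-exporter` and is discovered via `kubernetes_sd_configs` with the `endpoints` role; the job name and the Service name are hypothetical, so adjust them to your deployment. The second relabeling rule copies the node name into the `instance` label, which is what the `instance="$replace_with_your_node_name"` filter in the PromQL query later in this document expects.

```yaml
scrape_configs:
- job_name: node-exporter # hypothetical job name
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only targets that belong to the node-exporter Service (assumed name).
  - source_labels: [__meta_kubernetes_service_name]
    regex: prometheus-node-exporter
    action: keep
  # Overwrite the default instance label (<address>:<port>) with the node name.
  - source_labels: [__meta_kubernetes_endpoint_node_name]
    target_label: instance
    action: replace
```

With this relabeling, each node-exporter series carries `instance="<node name>"`, so per-node queries can filter on the Kubernetes node name instead of a pod IP and port.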

The default descheduling configuration is stored in the `volcano-descheduler` ConfigMap in the `volcano-system` namespace. You can update the descheduling configuration by modifying the data in this ConfigMap. The plugins enabled by default are `LoadAware` and `DefaultEvictor`, which perform load-aware descheduling and eviction, respectively.

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: default
  pluginConfig:
  - args:
      ignorePvcPods: true
      nodeFit: true
      priorityThreshold:
        value: 10000
    name: DefaultEvictor
  - args:
      evictableNamespaces:
        exclude:
        - kube-system
      metrics:
        address: null
        type: null
      targetThresholds:
        cpu: 80 # Eviction will be triggered when the node CPU utilization exceeds 80%
        memory: 85 # Eviction will be triggered when the node memory utilization exceeds 85%
      thresholds:
        cpu: 30 # Pods can be scheduled to nodes whose CPU resource utilization is less than 30%
        memory: 30 # Pods can be scheduled to nodes whose memory resource utilization is less than 30%
    name: LoadAware
  plugins:
    balance:
      enabled:
      - LoadAware
```

For the full configuration and parameter description of the `DefaultEvictor` plugin, please refer to: [DefaultEvictor Configuration](https://github.com/kubernetes-sigs/descheduler/tree/master#evictor-plugin-configuration-default-evictor).

| Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| nodeFit | bool | false | When set to `true`, the descheduler considers whether Pods that meet the eviction criteria will fit on other nodes before evicting them. |
| numberOfNodes | int | 0 | Activates the strategy only when the number of under-utilized nodes is above the configured value. This can be helpful in large clusters where a few nodes may become under-utilized frequently or for short periods of time. |
| duration | string | 2m | The time range used when querying the actual utilization metrics of nodes. Only takes effect when `metrics.type` is set to `prometheus`. |
| metrics | map(string:string) | nil | **Required Field**<br/>Contains two parameters: <br/>type: the type of metrics source; only `prometheus` and `prometheus_adaptor` are supported.<br/>address: the service address of `prometheus`. |
| targetThresholds | map(string:int) | nil | **Required Field**<br/>Supported keys are `cpu`, `memory`, and `pods`.<br/>When the node resource utilization (for `cpu` or `memory`) exceeds the set threshold, Pod eviction is triggered on the node; the unit is %.<br/>When the number of Pods on the node exceeds the set threshold, Pod eviction is triggered on the node; the unit is the number of Pods. |
| thresholds | map(string:int) | nil | **Required Field**<br/>Evicted Pods should be scheduled to nodes whose utilization is below `thresholds`.<br/>The threshold for a resource type cannot exceed the threshold set for the same resource in `targetThresholds`. |

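
Because `metrics` is a required field and the default configuration above leaves `metrics.type` and `metrics.address` as `null`, they need to be set before `LoadAware` can read the real node load. The following is a sketch of the `LoadAware` entry under `profiles[].pluginConfig` with a Prometheus source. The address is a hypothetical in-cluster Service URL, and placing `duration` next to `metrics` in `args` is an assumption based on the parameter table.

```yaml
- name: LoadAware
  args:
    evictableNamespaces:
      exclude:
      - kube-system
    metrics:
      type: prometheus # or prometheus_adaptor
      address: http://prometheus-server.monitoring.svc:80 # hypothetical address, replace with your own
    duration: 2m # query window; only used when metrics.type is prometheus
    targetThresholds:
      cpu: 80 # evict when node CPU utilization exceeds 80%
      memory: 85 # evict when node memory utilization exceeds 85%
    thresholds:
      cpu: 30 # evicted Pods should land on nodes below 30% CPU
      memory: 30 # evicted Pods should land on nodes below 30% memory
```

Each key in `thresholds` stays at or below the matching key in `targetThresholds`, as the table requires.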
127
+
128
+
In addition to the above `LoadAware plugin` enhancements, `Volcano descheduler` also supports native descheduler functions and plugins. If you want to configure other native plugins, please refer to: [kubernetes-sigs/descheduler](https://github.com/kubernetes-sigs/descheduler/blob/master/docs/user-guide.md).
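
As an illustration, the sketch below enables the upstream `RemovePodsViolatingNodeAffinity` plugin in the same profile as `LoadAware`, using the upstream v1alpha2 policy format. Whether a given upstream plugin is available depends on the descheduler version that `Volcano descheduler` is built on, so treat this as an assumption and check the user guide linked above. The `LoadAware` and `DefaultEvictor` arguments from the default configuration are omitted for brevity.

```yaml
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: default
  pluginConfig:
  # ... DefaultEvictor and LoadAware args as in the default configuration ...
  - name: RemovePodsViolatingNodeAffinity
    args:
      nodeAffinityType:
      - requiredDuringSchedulingIgnoredDuringExecution
  plugins:
    balance:
      enabled:
      - LoadAware
    deschedule:
      enabled:
      - RemovePodsViolatingNodeAffinity
```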

# Best practices

When Pods on a node with relatively high resource utilization are evicted, newly created Pods should avoid being scheduled back onto that node. Therefore, `Volcano scheduler` also needs to enable the `usage` plugin, which schedules based on real node load. For a detailed description and configuration of `usage`, refer to [volcano usage plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md).
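
A sketch of a `volcano-scheduler` configuration with `usage` enabled follows. The tier layout and the other plugin names are illustrative, and the `enablePredicate`, `arguments`, and top-level `metrics` keys follow our reading of the usage plugin design document linked above; verify the exact keys against that document for your Volcano version. The address is hypothetical.

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: predicates
  - name: nodeorder
  - name: usage
    enablePredicate: true # assumed: stop placing new Pods on nodes above the thresholds
    arguments:
      usage.weight: 5
      thresholds:
        cpu: 80 # assumed to be kept in line with the descheduler's targetThresholds.cpu
        mem: 80
metrics:
  type: prometheus
  address: http://prometheus-server.monitoring.svc:80 # hypothetical address
  interval: 30s
```

Keeping the scheduler's `usage` thresholds close to the descheduler's `targetThresholds` is the intent here: a node the descheduler treats as a hotspot should also be one the scheduler avoids.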

# Troubleshooting

When the `metrics.type` parameter of the LoadAware plugin is set to `prometheus`, `Volcano descheduler` queries the actual CPU and memory utilization with the following `PromQL` queries. If the expected eviction does not occur, run the queries manually in Prometheus to check whether the node metrics are exposed correctly, and compare the results with the `Volcano descheduler` logs to determine its actual behavior.

**cpu:**

```shell
avg_over_time((1 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle",instance="$replace_with_your_node_name"}[30s])) * 1))[2m:30s])
```

The release cadence of the `descheduler` is not synchronized with that of [Volcano](https://github.com/volcano-sh/volcano). This is because the `descheduler` is a sub-repository under volcano-sh whose code and features change relatively little. We will adapt to the upstream Kubernetes community's descheduler project as needed and release new versions accordingly.