-
Notifications
You must be signed in to change notification settings - Fork 762
feat: Adding a feature maxNodesToProcess to HighNodeUtilization Plugin to limit the nodes processed on each execution
#1706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
a37f36f
d5c3d14
e5b5069
dd5a5c1
5f8bddd
a9c4a94
db64c81
a5b9323
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -483,6 +483,7 @@ actual usage metrics. Implementing metrics-based descheduling is currently TODO | |
| |---|---| | ||
| |`thresholds`|map(string:int)| | ||
| |`numberOfNodes`|int| | ||
| | `maxNodesToProcess` | int | 0 | Maximum number of nodes to process in each pass (0 means no limit). | | ||
| |`evictionModes`|list(string)| | ||
| |`evictableNamespaces`|(see [namespace filtering](#namespace-filtering))| | ||
|
|
||
|
|
@@ -527,7 +528,9 @@ Policy should pass the following validation checks: | |
| There is another parameter associated with the `HighNodeUtilization` strategy, called `numberOfNodes`. | ||
| This parameter can be configured to activate the strategy only when the number of under utilized nodes | ||
| is above the configured value. This could be helpful in large clusters where a few nodes could go | ||
| under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero. | ||
| under utilized frequently or for a short period of time. By default, `numberOfNodes` is set to zero. The parameter `maxNodesToProcess` is used to limit how many nodes should be processed by the descheduler plugin on each execution. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current implementation for Can you also put the new sentence on a separate line? To limit the number of characters per line. |
||
|
|
||
|
|
||
|
|
||
| ### RemovePodsViolatingInterPodAntiAffinity | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -213,6 +213,12 @@ func (h *HighNodeUtilization) Balance(ctx context.Context, nodes []*v1.Node) *fr | |
|
|
||
| lowNodes, schedulableNodes := nodeInfos[0], nodeInfos[1] | ||
|
|
||
| // limit the number of nodes processed each execution if `MaxNodesToProcess` is set | ||
| if h.args.MaxNodesToProcess > 0 && len(lowNodes) > h.args.MaxNodesToProcess { | ||
| lowNodes = lowNodes[:h.args.MaxNodesToProcess] | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The index needs to rotate so all nodes are eventually processed.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👀
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at possible ways to do this I see a couple ways to solve it, but would love feedback.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An observation: it's not guaranteed the list of
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Option 1. looks good. So the second version of the code changes in the comment. Worth moving the code under a function so it can be unit test separately. Also, |
||
| logger.V(1).Info("Limiting the number of underutilized nodes to process", "maxNodesToProcess", h.args.MaxNodesToProcess) | ||
| } | ||
|
|
||
| logger.V(1).Info("Criteria for a node below target utilization", h.criteria...) | ||
| logger.V(1).Info("Number of underutilized nodes", "totalNumber", len(lowNodes)) | ||
|
|
||
|
|
@@ -269,6 +275,7 @@ func (h *HighNodeUtilization) Balance(ctx context.Context, nodes []*v1.Node) *fr | |
| continueEvictionCond, | ||
| h.usageClient, | ||
| nil, | ||
| h.handle, | ||
| ) | ||
|
|
||
| return nil | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -481,6 +481,29 @@ func TestHighNodeUtilization(t *testing.T) { | |
| }, | ||
| expectedPodsEvicted: 0, | ||
| }, | ||
| { | ||
| name: "limits number of underutilized nodes processed per run with MaxNodesToProcess", | ||
| thresholds: api.ResourceThresholds{ | ||
| v1.ResourceCPU: 30, | ||
| v1.ResourcePods: 30, | ||
| }, | ||
| nodes: []*v1.Node{ | ||
| test.BuildTestNode("n1", 4000, 3000, 10, nil), | ||
| test.BuildTestNode("n2", 4000, 3000, 10, nil), | ||
| test.BuildTestNode("n3", 4000, 3000, 10, nil), | ||
| test.BuildTestNode("n4", 4000, 3000, 10, nil), | ||
| }, | ||
| pods: []*v1.Pod{ | ||
| test.BuildTestPod("p1", 400, 0, "n1", test.SetRSOwnerRef), | ||
| test.BuildTestPod("p2", 400, 0, "n2", test.SetRSOwnerRef), | ||
| test.BuildTestPod("p3", 400, 0, "n3", test.SetRSOwnerRef), | ||
| test.BuildTestPod("p4", 400, 0, "n4", test.SetRSOwnerRef), | ||
| }, | ||
| expectedPodsEvicted: 1, // Only one node's pod should be evicted per run | ||
| evictedPods: []string{"p1", "p2", "p3", "p4"}, // Any one of these is valid | ||
| evictionModes: nil, | ||
| // We'll set MaxNodesToProcess in the plugin args below | ||
| }, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This new test is broken. Also, please make "maxNodesToProcess" part of the test definition so each test can set its own value. |
||
| } | ||
|
|
||
| for _, testCase := range testCases { | ||
|
|
@@ -639,6 +662,7 @@ func TestHighNodeUtilizationWithTaints(t *testing.T) { | |
| } | ||
|
|
||
| plugin, err := NewHighNodeUtilization(ctx, &HighNodeUtilizationArgs{ | ||
| MaxNodesToProcess: 1, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would not expect this to be necessary. Have you introduced this by mistake ? |
||
| Thresholds: api.ResourceThresholds{ | ||
| v1.ResourceCPU: 40, | ||
| }, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth deprecating
numberOfNodesand renaming it tominNodesToProcess. Alternatively, introducing a newnodesToProcessLimits(or a better name) withminandmaxfields. Are min and max the only useful limits to introduce? Resp. should only a number be introduced? Or, percentages as well?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For us we're specifically looking to slow this down, so unsure if there is other use cases out there. The default settings are aggressive enough I'm not sure if
minwould be useful for company in any way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just supporting
maxandpercentagewould be fine? We might care more about avoiding evicting all high utilization nodes. 🤔(The name is just an example, not recommendation.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ingvagabund here, the
numberOfNodesconfiguration relates a lot with the newly introducedmaxNodesToProcess. To deprecate the first one and to make both part of the same structure (with min and max properties) would be better here.I don't think we should be going as far as have "percentages" here except if there is a clear usecase.