perf: respect multinode consolidation timeout in all cases #2025

rschalo · 2025-02-21T21:42:24Z

Fixes #N/A

Description
Scheduling for 50 nodes in multinode consolidation can take a long time, especially in large clusters where a scheduling decision for a node can take 20 seconds or longer. This can cause multinode consolidation to block drift, emptiness, and single node consolidation for longer than intended.

How was this change tested?
Deployed with a 5 second timeout and saw multinode consolidation bail before exhausting the list of candidates.

disruption/multinodeconsolidation.go:83","message":"failed to find a multi-node consolidation after timeout, last considered batch had 8 candidates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

pkg/controllers/disruption/multinodeconsolidation.go

coveralls · 2025-02-21T22:04:46Z

Pull Request Test Coverage Report for Build 13956030074

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

25 of 47 (53.19%) changed or added relevant lines in 4 files are covered.
12 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.1%) to 81.707%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/controllers/disruption/helpers.go	3	5	60.0%
pkg/controllers/provisioning/scheduling/scheduler.go	9	11	81.82%
pkg/controllers/disruption/multinodeconsolidation.go	5	13	38.46%
pkg/controllers/provisioning/provisioner.go	8	18	44.44%

Files with Coverage Reduction	New Missed Lines	%
pkg/controllers/provisioning/provisioner.go	1	83.33%
pkg/test/expectations/expectations.go	2	95.0%
pkg/controllers/disruption/consolidation.go	4	85.55%
pkg/controllers/disruption/singlenodeconsolidation.go	5	93.75%

Totals
Change from base Build 13728019186:	0.1%
Covered Lines:	9657
Relevant Lines:	11819

💛 - Coveralls

pkg/controllers/disruption/multinodeconsolidation.go

pkg/controllers/disruption/helpers.go

This reverts commit ff9f119.

pkg/controllers/disruption/drift.go

pkg/controllers/provisioning/provisioner.go

pkg/controllers/provisioning/scheduling/scheduler.go

jonathan-innis

/lgtm
/approve

k8s-ci-robot · 2025-03-19T20:26:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonathan-innis, rschalo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jonathan-innis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…s-sigs#2025)

perf: respect multinode consolidation timeout in all cases

b4cfac1

k8s-ci-robot requested review from jackfrancis and tallaxes February 21, 2025 21:42

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 21, 2025

rschalo added 2 commits February 21, 2025 13:46

put comments back

40dde20

inline clock.Now()

37323a4

tzneal reviewed Feb 21, 2025

View reviewed changes

pkg/controllers/disruption/multinodeconsolidation.go Outdated Show resolved Hide resolved

pkg/controllers/disruption/multinodeconsolidation.go Outdated Show resolved Hide resolved

rschalo marked this pull request as draft February 24, 2025 18:50

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2025

expand context deadline handling

8753963

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2025

rschalo added 3 commits February 24, 2025 15:45

fix comment

d6ab2b6

short circuit ctx canceled in solve

2c432e3

Merge branch 'main' into multinode-consolidation-timeout-fix

56a681d

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2025

rschalo added 4 commits February 24, 2025 15:59

fix spacing

b92e99b

remove topology short circuit

767fbed

nolint gocyclo silence

43dfd0a

add test for context deadline timeout

f021362

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 25, 2025

rschalo marked this pull request as ready for review February 25, 2025 21:12

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2025

tzneal reviewed Feb 25, 2025

View reviewed changes

pkg/controllers/disruption/multinodeconsolidation.go Outdated Show resolved Hide resolved

pkg/controllers/disruption/helpers.go Outdated Show resolved Hide resolved

remove select block

12d1adc

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 25, 2025

fix comment

9b552a8

rschalo added 9 commits March 7, 2025 15:09

add additional context handling in provisioner

f631c7f

reduce scope

0320177

remove unused function

fb32f53

return error directly

3729641

use timeout instead of context cancels

ff9f119

Revert "use timeout instead of context cancels"

e77389d

This reverts commit ff9f119.

remove pod error and break out of loop if ctx err

f8be51a

remove binary from history

1f6f9bf

remove ctx check in nodePools

b153118

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 14, 2025

jonathan-innis reviewed Mar 17, 2025

View reviewed changes

pkg/controllers/disruption/drift.go Outdated Show resolved Hide resolved

pkg/controllers/provisioning/provisioner.go Outdated Show resolved Hide resolved

pkg/controllers/provisioning/scheduling/scheduler.go Show resolved Hide resolved

rschalo added 3 commits March 18, 2025 15:03

another pr round

c56e10f

reduce diff

6f0187c

return error from solve

b34e1b2

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 18, 2025

rschalo added 6 commits March 18, 2025 16:06

fix presubmit

eb5849c

fix test assertion

05d24b7

add comment for provisioning loop

8120724

updates for pr

209f725

more updates

f3d18de

log change

3c7a50f

jonathan-innis approved these changes Mar 19, 2025

View reviewed changes

k8s-ci-robot assigned jonathan-innis Mar 19, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 19, 2025

k8s-ci-robot merged commit deba19e into kubernetes-sigs:main Mar 19, 2025
14 checks passed

rschalo added a commit to rschalo/karpenter that referenced this pull request Mar 21, 2025

perf: respect multinode consolidation timeout in all cases (kubernete…

6c646da

…s-sigs#2025)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: respect multinode consolidation timeout in all cases #2025

perf: respect multinode consolidation timeout in all cases #2025

Uh oh!

rschalo commented Feb 21, 2025

Uh oh!

Uh oh!

Uh oh!

coveralls commented Feb 21, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jonathan-innis left a comment

Uh oh!

k8s-ci-robot commented Mar 19, 2025

Uh oh!

Uh oh!

Uh oh!

perf: respect multinode consolidation timeout in all cases #2025

perf: respect multinode consolidation timeout in all cases #2025

Uh oh!

Conversation

rschalo commented Feb 21, 2025

Uh oh!

Uh oh!

Uh oh!

coveralls commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 13956030074

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jonathan-innis left a comment

Choose a reason for hiding this comment

Uh oh!

k8s-ci-robot commented Mar 19, 2025

Uh oh!

Uh oh!

Uh oh!

coveralls commented Feb 21, 2025 •

edited

Loading