fix(nodeclaim): dedup logs marking consolidatable #2018

flavono123 · 2025-02-21T05:21:29Z

Fixes #2002

Description

After a NodeClaim is marked consolidatable, pause briefly so that the node handler does not do duplicate work.

How was this change tested?

It was not tested, but I expect it to work since it follows other existing cases that were referenced.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

k8s-ci-robot · 2025-02-21T05:21:38Z

Hi @flavono123. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coveralls · 2025-02-21T05:47:32Z

Pull Request Test Coverage Report for Build 14258446430

Details

2 of 2 (100.0%) changed or added relevant lines in 2 files are covered.
13 unchanged lines in 4 files lost coverage.
Overall coverage decreased (-0.08%) to 81.601%

Files with Coverage Reduction	New Missed Lines	%
pkg/test/cachesyncingclient.go	2	84.38%
pkg/utils/termination/termination.go	2	88.24%
pkg/controllers/provisioning/scheduling/topologydomaingroup.go	4	86.21%
pkg/controllers/node/termination/controller.go	5	69.54%

Totals
Change from base Build 14255724831:	-0.08%
Covered Lines:	9713
Relevant Lines:	11903

💛 - Coveralls

jonathan-innis · 2025-02-21T12:12:26Z

pkg/controllers/nodeclaim/disruption/consolidation.go

 	nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeConsolidatable)
+	// We sleep here after a set operation since we want to ensure that we are able to read our own writes
+	// so that we avoid duplicating log lines due to quick re-queues from our node watcher
+	time.Sleep(100 * time.Millisecond)


I think we should only sleep if the condition has changed. Otherwise, we risk sleeping on every iteration

jonathan-innis · 2025-02-22T16:15:45Z

pkg/controllers/nodeclaim/disruption/consolidation.go

+		nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeConsolidatable)
+		// We sleep here after a set operation since we want to ensure that we are able to read our own writes
+		// so that we avoid duplicating log lines due to quick re-queues from our node watcher
+		time.Sleep(100 * time.Millisecond)


So, this sleep here isn't actually going to do anything. The sleep needs to come after the Patch to give the cache time to update. If we just sleep here without doing a cache update, we aren't really accomplishing the deduping that we intended

I changed this to follow what the lifecycle controller does, as you mentioned in the issue, but I'm not sure it actually stops the duplicate logs.
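To make the two review points above concrete (sleep only when the condition actually changed, and only after the patch has been sent), here is a minimal sketch. It is not the code in this PR: `nodeClaim`, `patcher`, `markConsolidatable`, `settleDelay`, and the counting helpers are hypothetical stand-ins, and the sleep function is injected so the demo does not actually block.

```go
package main

import (
	"fmt"
	"time"
)

// settleDelay is a hypothetical pause that gives the informer cache time to
// observe the patch before the next reconcile reads from it.
const settleDelay = 100 * time.Millisecond

// nodeClaim is a simplified stand-in for the real NodeClaim type.
type nodeClaim struct {
	Name           string
	Consolidatable bool
}

// patcher is a hypothetical interface standing in for the API client.
type patcher interface {
	Patch(nc *nodeClaim) error
}

// markConsolidatable patches and sleeps only when the condition changes.
// It reports whether a change was written.
func markConsolidatable(nc *nodeClaim, p patcher, sleep func(time.Duration)) (bool, error) {
	if nc.Consolidatable {
		// Nothing changed: no patch, no sleep, no duplicate log line.
		return false, nil
	}
	nc.Consolidatable = true
	if err := p.Patch(nc); err != nil {
		nc.Consolidatable = false // undo the in-memory change on failure
		return false, err
	}
	// Sleep after the patch, not before it, so the cache can catch up.
	sleep(settleDelay)
	return true, nil
}

// countingPatcher and sleepCounter record calls for the demo.
type countingPatcher struct{ calls int }

func (c *countingPatcher) Patch(*nodeClaim) error { c.calls++; return nil }

type sleepCounter struct{ calls int }

func (s *sleepCounter) Sleep(time.Duration) { s.calls++ }

func main() {
	nc := &nodeClaim{Name: "nodeclaim-a"}
	p, s := &countingPatcher{}, &sleepCounter{}
	for i := 0; i < 3; i++ {
		changed, err := markConsolidatable(nc, p, s.Sleep)
		fmt.Println(changed, err, p.calls, s.calls)
	}
}
```

Only the first of the three calls patches and sleeps; the later calls return early because the condition is already set.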

engedaam · 2025-03-03T23:08:35Z

/assign @jonathan-innis

k8s-ci-robot · 2025-03-18T06:56:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flavono123
Once this PR has been reviewed and has the lgtm label, please ask for approval from jonathan-innis. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jonathan-innis · 2025-05-27T05:50:47Z

pkg/controllers/nodeclaim/disruption/controller.go

+		// We sleep here after a patch operation since we want to ensure that we are able to read our own writes
+		// so that we avoid duplicating metrics and log lines due to quick re-queues from our node watcher
+		// USE CAUTION when determining whether to increase this timeout or remove this line
+		time.Sleep(time.Second)


I get that we are doing this in the lifecycle controller but looking back at this change, this sleep actually makes me a tad nervous about performance. Given that we only have 10 async threads occurring at one time, this has the potential to drastically slow down the performance of this controller.

I wonder if a better option here is to create some caching mechanism for the controller where we track how recently a key has been seen and processed and then avoid re-processing that key within a certain amount of time since we know that we just processed it

Added a *cache.Cache to the controller, following what the consistency controller does: https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/consistency/controller.go#L52

Please check whether the dedupe time window needs tuning.
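The "recently processed key" idea discussed above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `recentlySeen`, `ShouldProcess`, `dedupeWindow`, and `fakeClock` are hypothetical names, and a plain map stands in for the `*cache.Cache`. Unlike a sleep, a skipped key returns immediately, so it does not hold a reconcile worker.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// dedupeWindow is a hypothetical value; the right window is a tuning question.
const dedupeWindow = time.Second

// recentlySeen remembers when each key was last processed.
type recentlySeen struct {
	mu     sync.Mutex
	window time.Duration
	now    func() time.Time
	seen   map[string]time.Time
}

func newRecentlySeen(window time.Duration, now func() time.Time) *recentlySeen {
	return &recentlySeen{window: window, now: now, seen: map[string]time.Time{}}
}

// ShouldProcess reports whether key is outside the dedupe window. When it
// returns true it also records the key, so an immediate re-queue is skipped.
func (r *recentlySeen) ShouldProcess(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	now := r.now()
	if last, ok := r.seen[key]; ok && now.Sub(last) < r.window {
		return false
	}
	r.seen[key] = now
	return true
}

// fakeClock lets the demo advance time without sleeping.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock { return &fakeClock{t: time.Unix(0, 0)} }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := newFakeClock()
	seen := newRecentlySeen(dedupeWindow, clock.Now)
	fmt.Println(seen.ShouldProcess("nodeclaim-a")) // first reconcile
	fmt.Println(seen.ShouldProcess("nodeclaim-a")) // immediate re-queue
	clock.Advance(2 * dedupeWindow)
	fmt.Println(seen.ShouldProcess("nodeclaim-a")) // after the window
}
```

The map in this sketch never evicts old keys; a cache with built-in expiration would handle that cleanup.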

flavono123 · 2025-05-27T06:40:38Z

`make verify` passes on my local machine. I have no idea why it fails in CI.

k8s-ci-robot · 2025-05-31T01:56:40Z

PR needs rebase.

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 21, 2025

k8s-ci-robot requested review from jmdeal and tallaxes February 21, 2025 05:21

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 21, 2025

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 21, 2025

flavono123 force-pushed the fix/dedup-logs-marking-consolidatable branch from 501f898 to 518ff10 Compare February 21, 2025 05:24

jonathan-innis reviewed Feb 21, 2025

View reviewed changes

flavono123 requested a review from jonathan-innis February 22, 2025 00:25

jonathan-innis reviewed Feb 22, 2025

View reviewed changes

flavono123 requested a review from jonathan-innis February 26, 2025 06:14

k8s-ci-robot assigned jonathan-innis Mar 3, 2025

flavono123 force-pushed the fix/dedup-logs-marking-consolidatable branch from 91a9b2c to e55458c Compare May 21, 2025 13:49

jonathan-innis reviewed May 27, 2025

View reviewed changes

flavono123 added 4 commits May 27, 2025 15:15

fix(nodeclaim): dedup logs marking consolidatable

c532a02

Signed-off-by: flavono123 <[email protected]>

fix(nodeclaim): set and sleep only when the nodeclaim becomes consoli…

8aa997a

…datable Signed-off-by: flavono123 <[email protected]>

fix(nodeclaim): sleep for each reconciliation of consolidation contro…

7cefbcf

…ller Signed-off-by: flavono123 <[email protected]>

feat(nodeclaim): dedupe disruption reconcile

a849e41

Signed-off-by: flavono123 <[email protected]>

flavono123 force-pushed the fix/dedup-logs-marking-consolidatable branch from e55458c to a849e41 Compare May 27, 2025 06:15

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 27, 2025

refactor(nodeclaim): extract method to avoid gocyclo

e15173e

Signed-off-by: flavono123 <[email protected]>

flavono123 requested a review from jonathan-innis May 27, 2025 06:54

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2025
