fix(nodeclaim): dedup logs marking consolidatable #2018
Conversation
Hi @flavono123. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 501f898 to 518ff10
Pull Request Test Coverage Report for Build 14258446430
💛 - Coveralls
```go
nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeConsolidatable)
// We sleep here after a set operation since we want to ensure that we are able to read our own writes
// so that we avoid duplicating log lines due to quick re-queues from our node watcher
time.Sleep(100 * time.Millisecond)
```
I think we should only sleep if the condition has changed. Otherwise, we risk sleeping on every iteration.
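A minimal sketch of that suggestion, assuming the ConditionSet API exposes `Get` returning a condition with an `IsTrue` method (the exact signatures and nil-handling are assumptions, not taken from this PR):

```go
// Hypothetical sketch: only pay the sleep when the condition actually flips.
cond := nodeClaim.StatusConditions().Get(v1.ConditionTypeConsolidatable)
if cond == nil || !cond.IsTrue() {
	nodeClaim.StatusConditions().SetTrue(v1.ConditionTypeConsolidatable)
	// Sleep only after a real transition, so steady-state reconciles
	// skip the 100ms penalty entirely.
	time.Sleep(100 * time.Millisecond)
}
```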
So, this sleep here isn't actually going to do anything. The sleep needs to come after the Patch to give the cache time to update. If we just sleep here without doing a cache update, we aren't really accomplishing the deduping that we intended
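For illustration, a sketch of the ordering being suggested here; `kubeClient` and `stored` are hypothetical names, not code from this PR:

```go
// Persist the status change first, then sleep, so the informer cache has a
// window to observe our own write before the next reconcile is processed.
if err := kubeClient.Status().Patch(ctx, nodeClaim, client.MergeFrom(stored)); err != nil {
	return reconcile.Result{}, err
}
time.Sleep(100 * time.Millisecond)
```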
I changed this to follow what the lifecycle controller does, as you mentioned in the issue, but I'm not sure it actually stops the duplicated logs.
/assign @jonathan-innis
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flavono123
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 91a9b2c to e55458c
```go
// We sleep here after a patch operation since we want to ensure that we are able to read our own writes
// so that we avoid duplicating metrics and log lines due to quick re-queues from our node watcher
// USE CAUTION when determining whether to increase this timeout or remove this line
time.Sleep(time.Second)
```
I get that we are doing this in the lifecycle controller, but looking back at this change, this sleep actually makes me a tad nervous about performance. Given that we only have 10 async threads occurring at one time, this has the potential to drastically slow down the performance of this controller.
I wonder if a better option here is to create some caching mechanism for the controller where we track how recently a key has been seen and processed and then avoid re-processing that key within a certain amount of time since we know that we just processed it
I added a *cache.Cache to the controller, following how the consistency controller does it: https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/nodeclaim/consistency/controller.go#L52
Please check whether the dedupe time window needs tuning.
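For reference, a minimal sketch of that dedupe cache using github.com/patrickmn/go-cache, mirroring the consistency controller's pattern; the one-minute window, the `lastSeen` field, and the `shouldLog` helper are illustrative placeholders, not the actual names or values in this PR:

```go
package nodeclaim

import (
	"time"

	"github.com/patrickmn/go-cache"
)

// Controller tracks recently processed keys so quick re-queues from the
// node watcher don't emit duplicate logs and metrics.
type Controller struct {
	lastSeen *cache.Cache
}

func NewController() *Controller {
	return &Controller{
		// Entries expire after the dedupe window; the cleanup interval
		// just bounds memory growth.
		lastSeen: cache.New(time.Minute, 2*time.Minute),
	}
}

// shouldLog reports whether this key is outside the dedupe window,
// recording it as seen when it is.
func (c *Controller) shouldLog(key string) bool {
	if _, ok := c.lastSeen.Get(key); ok {
		return false
	}
	c.lastSeen.SetDefault(key, struct{}{})
	return true
}
```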
Signed-off-by: flavono123 <[email protected]>
…datable Signed-off-by: flavono123 <[email protected]>
…ller Signed-off-by: flavono123 <[email protected]>
Signed-off-by: flavono123 <[email protected]>
Force-pushed from e55458c to a849e41
Signed-off-by: flavono123 <[email protected]>
Fixes #2002
Description
After marking a NodeClaim consolidatable, pause briefly so that the node event handler does not repeat work that was already done (and duplicate the log lines).
How was this change tested?
Not directly; I expect it to work since it follows the same pattern used in the other cases referenced above.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.