[InPlacePodVerticalScaling] create an admission plugin to perform the OS and node capacity checks #136043
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.

This pull-request has been approved by: natasha41575. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
omerap12 left a comment:
From a VPA perspective, I have concerns about the error handling here (please correct me if I’m missing something).
VPA needs to be able to programmatically distinguish between different failure modes, such as:
- infeasible resizes (requests exceed node allocatable),
- unsupported platforms (e.g., non-Linux nodes),
- transient errors, etc.
Each of these cases should be handled differently. However, they all currently return admission.NewForbidden() with only different error messages. This forces consumers to parse error strings, which is fragile: any change in the message text could cause the VPA to break silently.
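To make the concern concrete, here is a rough sketch of the string matching consumers are pushed toward today, reusing the client variables from the example later in this thread; the message substrings are hypothetical, not the plugin's actual wording:

```go
// Fragile classification: the only signal is free-form message text,
// so any server-side rewording silently breaks these branches.
_, err := clientset.CoreV1().Pods(ns).UpdateResize(ctx, podName, latestPod, metav1.UpdateOptions{})
if apierrors.IsForbidden(err) {
	switch msg := err.Error(); {
	case strings.Contains(msg, "exceeds node allocatable"): // hypothetical wording
		// infeasible: stop retrying this resize
	case strings.Contains(msg, "not supported on"): // hypothetical wording
		// unsupported platform: fall back to evict-and-recreate
	default:
		// unknown Forbidden error: no safe way to tell these apart
	}
}
```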
```go
	p.SetReadyFunc(nodeInformer.Informer().HasSynced)
}

// SetFeatures sets the feature gates for the plugin.
```
nit:

```suggestion
// InspectFeatureGates sets the feature gates for the plugin.
```
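(For context, `InspectFeatureGates` is the method name from the `initializer.WantsFeatures` interface in k8s.io/apiserver, which is presumably why the comment should match it. A minimal sketch of the usual wiring, with illustrative field names and the surrounding imports assumed:)

```go
// InspectFeatureGates sets the feature gates for the plugin.
func (p *Plugin) InspectFeatureGates(featureGates featuregate.FeatureGate) {
	// Record whether in-place resize is enabled so validation can
	// short-circuit when the feature is off.
	p.inPlacePodVerticalScalingEnabled = featureGates.Enabled(features.InPlacePodVerticalScaling)
	p.inspectedFeatureGates = true
}
```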
Today, all "infeasible" resizes (i.e. the requests exceed node allocatable, the node is on an unsupported platform, or the node has a feature enabled that is not compatible with resize, like swap or the static CPU/memory manager) are surfaced in the API the same way, through a […].

(I'll also think on what options we have to make it easier to programmatically distinguish them.)
It doesn't; the VPA checks if the pod is in […].
Understood. Transient errors would not return Forbidden.

Distinguishing the node capacity check from the other feasibility checks (like OS) is a net-new feature request that I don't necessarily think this change needs to be blocked on... but let me circle back on this. I have an idea, but I'm not 100% sure about it, so I need to double-check and might need to ask some other folks about it too. I see how this could be useful for […].

ETA: I pushed a change. See my comment below: #136043 (comment)
Force-pushed from b472849 to 45c2557.
```diff
-	return admission.NewForbidden(a, err)
+	statusErr := admission.NewForbidden(a, err).(*apierrors.StatusError)
+	statusErr.ErrStatus.Details.Causes = append(statusErr.ErrStatus.Details.Causes, metav1.StatusCause{
+		Type: ReasonNodeCapacity,
```
@omerap12 I added a CauseType here. Programmatically, clients can do this:
```go
_, err = clientset.CoreV1().Pods(ns).UpdateResize(ctx, podName, latestPod, metav1.UpdateOptions{})
if err != nil {
	// A Forbidden response from the admission plugin is a *StatusError
	// whose Details carry the machine-readable causes.
	if statusErr, ok := err.(*apierrors.StatusError); ok {
		for _, cause := range statusErr.ErrStatus.Details.Causes {
			fmt.Printf("Cause is: %s\n", cause.Type)
		}
	}
}
```
I tried this out with a quick little Go script. Hope this solves your use case.
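(In a real consumer, the follow-up would presumably be to compare `cause.Type` against the exported `ReasonNodeCapacity` constant rather than printing it; the sketch under the compatibility discussion below shows one way to do that.)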
Yup. I think that solves it :)
Thanks!
@adrianmoisey, we will incorporate this logic into the VPA, so I believe we are good to go, right?
Yup, seems good to me.
We'll need to handle both the old and new methods though, since we can't guarantee which version of Kubernetes someone is running the VPA on.
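For illustration, a sketch of what that dual-path classification might look like on the VPA side; the cause-type string and the fallback message substring are assumptions, not confirmed values:

```go
package vpaclassify // illustrative

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// reasonNodeCapacity mirrors the CauseType the admission plugin attaches;
// the exact string value here is an assumption.
const reasonNodeCapacity metav1.CauseType = "NodeCapacity"

// isNodeCapacityInfeasible reports whether a resize was rejected for
// exceeding node capacity, preferring the structured StatusCause (newer
// API servers) and falling back to message matching (older servers).
func isNodeCapacityInfeasible(err error) bool {
	statusErr, ok := err.(*apierrors.StatusError)
	if !ok || !apierrors.IsForbidden(err) {
		return false
	}
	// New path: machine-readable cause emitted by the admission plugin.
	if details := statusErr.ErrStatus.Details; details != nil {
		for _, cause := range details.Causes {
			if cause.Type == reasonNodeCapacity {
				return true
			}
		}
	}
	// Old path: fragile text matching; the substring is illustrative.
	return strings.Contains(statusErr.ErrStatus.Message, "exceeds node allocatable")
}
```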
Already done :)
@natasha41575: The following tests failed; say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
Create an admission plugin to perform the OS and node capacity checks for pod resizes.
The last commit removes the OS feasibility check from the kubelet: the OS label on the node should be reliable, up-to-date, and (IIUC) immutable by the time the node is ready to have pods scheduled to it. But I would still like a second opinion that this is safe to remove.
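For readers who want the overall shape: a minimal sketch of a validating admission plugin scoped to the pod resize subresource. All names and the two feasibility helpers are illustrative stand-ins, not the PR's actual code:

```go
package podresize // illustrative package name

import (
	"context"
	"fmt"

	"k8s.io/apiserver/pkg/admission"
	api "k8s.io/kubernetes/pkg/apis/core"
)

// Plugin validates pod resize requests against the node's OS and capacity.
type Plugin struct {
	*admission.Handler
	// node lister, feature gates, etc. elided
}

// Validate rejects resizes that can never succeed on the pod's node.
func (p *Plugin) Validate(ctx context.Context, a admission.Attributes, _ admission.ObjectInterfaces) error {
	// Only the resize subresource of pods is of interest.
	if a.GetResource().GroupResource() != api.Resource("pods") || a.GetSubresource() != "resize" {
		return nil
	}
	pod, ok := a.GetObject().(*api.Pod)
	if !ok {
		return fmt.Errorf("unexpected object type: %T", a.GetObject())
	}
	// Hypothetical helpers standing in for the PR's actual checks.
	if err := checkNodeOS(ctx, pod); err != nil {
		return admission.NewForbidden(a, err)
	}
	if err := checkNodeCapacity(ctx, pod); err != nil {
		return admission.NewForbidden(a, err)
	}
	return nil
}

// Stubs so the sketch compiles; the real checks would consult the node.
func checkNodeOS(ctx context.Context, pod *api.Pod) error       { return nil }
func checkNodeCapacity(ctx context.Context, pod *api.Pod) error { return nil }
```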
Which issue(s) this PR is related to:
#135341
Special notes for your reviewer:
I wasn't sure if it is safe to remove the node capacity check from the kubelet. If a node is downsized, could there be a race window where the node has finished downsizing and the kubelet has restarted, but the new node allocatable in the status is not yet updated to reflect the smaller size?
Does this PR introduce a user-facing change?
/hold
for alignment with kubernetes/autoscaler#8818
/sig node
/assign @tallclair