
Conversation

@natasha41575
Contributor

@natasha41575 natasha41575 commented Jan 5, 2026

What type of PR is this?

/kind cleanup
/kind feature

What this PR does / why we need it:

Create an admission plugin to perform the OS and node capacity checks for pod resizes.

The last commit removes the OS feasibility check from the kubelet: the OS label on the node should be reliable, up-to-date, and (if I understand correctly) immutable by the time the node is ready to have pods scheduled to it. But I would still like a second opinion that this is safe to remove.

Which issue(s) this PR is related to:

#135341

Special notes for your reviewer:

I wasn't sure whether it is safe to remove the node capacity check from the kubelet. If a node is downsized, could there be a race window where the node has finished downsizing and the kubelet has restarted, but the node allocatable in the status has not yet been updated to reflect the smaller size?

Does this PR introduce a user-facing change?

For pod resizes requested on nodes where the resize request exceeds the node's allocatable capacity, or where the node runs an OS that does not support resize, the request will now fail at admission rather than being marked Infeasible in the pod status later.

/hold
for alignment with kubernetes/autoscaler#8818

/sig node
/assign @tallclair

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 5, 2026
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 5, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 5, 2026
@k8s-ci-robot k8s-ci-robot requested review from deads2k and jpbetz January 5, 2026 21:46
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
Once this PR has been reviewed and has the lgtm label, please assign deads2k, derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 5, 2026
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jan 5, 2026
@natasha41575 natasha41575 changed the title [InPlacePodVerticalScaling] move trivial feasibility checks to an admission plugin [InPlacePodVerticalScaling] create an admission plugin to perform the OS and node capacity checks Jan 5, 2026
@omerap12 omerap12 (Member) left a comment

From a VPA perspective, I have concerns about the error handling here (please correct me if I’m missing something).

VPA needs to be able to programmatically distinguish between different failure modes, such as:

  • infeasible resizes (requests exceed node allocatable),
  • unsupported platforms (e.g., non-Linux nodes),
  • transient errors, etc.

Each of these cases should be handled differently. However, they all currently return admission.NewForbidden() with only different error messages. This forces consumers to parse error strings, which is fragile - any change in the message text could cause VPA to break silently.

p.SetReadyFunc(nodeInformer.Informer().HasSynced)
}

// SetFeatures sets the feature gates for the plugin.
Member

nit:

Suggested change
-// SetFeatures sets the feature gates for the plugin.
+// InspectFeatureGates sets the feature gates for the plugin.

@natasha41575
Contributor Author

natasha41575 commented Jan 6, 2026

> From a VPA perspective, I have concerns about the error handling here (please correct me if I’m missing something).
>
> VPA needs to be able to programmatically distinguish between different failure modes, such as:
>
> • infeasible resizes (requests exceed node allocatable),
> • unsupported platforms (e.g., non-Linux nodes),
> • transient errors, etc.
>
> Each of these cases should be handled differently. However, they all currently return admission.NewForbidden() with only different error messages. This forces consumers to parse error strings, which is fragile - any change in the message text could cause VPA to break silently.

Today, all "infeasible" resizes -- i.e. the request exceeds node allocatable, the node is on an unsupported platform, or the node has a feature enabled that is incompatible with resize (such as swap or the static CPU/memory manager) -- are surfaced in the API the same way: through a PodResizePending condition in the status with Reason set to Infeasible, with the only differentiation between the failure modes being a human-readable message in the Message field. How does VPA distinguish them today?

(I'll also think about what options we have to make it easier to programmatically distinguish them.)

@omerap12
Member

omerap12 commented Jan 6, 2026

>> From a VPA perspective, I have concerns about the error handling here (please correct me if I’m missing something). […]
>
> Today, all "infeasible" resizes -- i.e. exceeding node allocatable, the node is on an unsupported platform, or the node has a feature enabled that is not compatible with resize like swap or static cpu/memory manager -- are all surfaced in the API the same way through a PodResizePending condition in the status with the Reason set to Infeasible and the only differentiation between the different failure modes being a human-readable message in the Message field. How does VPA distinguish them today?
>
> (I'll also think on what options we have to make it easier to programmatically distinguish them)

It doesn't. The VPA checks whether the pod has a PodResizePending condition with Reason set to PodReasonInfeasible and evicts based on that (we have some logic to skip evictions in some cases, but that's irrelevant here).
But we can't follow the same pattern in admission, and we want to have some logic based on that error.
/cc @maxcao13

@k8s-ci-robot k8s-ci-robot requested a review from maxcao13 January 6, 2026 15:42
@natasha41575
Contributor Author

natasha41575 commented Jan 6, 2026

>>> From a VPA perspective, I have concerns about the error handling here (please correct me if I’m missing something). […]
>>
>> Today, all "infeasible" resizes […] How does VPA distinguish them today?
>
> It doesn't, the VPA checks if the pod is in PodResizePending state with a Reason set to PodReasonInfeasible and evicts based on that (we have some logic to skip evictions in some cases but that's irrelevant). But we can't follow the same pattern in admission and we want to have some logic based on that error. /cc @maxcao13

Understood.

Transient errors would not return admission.NewForbidden(). VPA should be able to filter those out programmatically, so from that perspective this PR does not introduce any behavior worse than what exists today.

Distinguishing the node capacity check from the other feasibility checks (like OS) is a net-new feature request that I don't necessarily think this change needs to be blocked on... but let me circle back on this. I have an idea but I'm not 100% sure about it so I need to double check and might need to ask some other folks about it too. I see how this could be useful for InPlace mode.

ETA: I pushed a change. See my comment below: #136043 (comment)

-	return admission.NewForbidden(a, err)
+	statusErr := admission.NewForbidden(a, err).(*apierrors.StatusError)
+	statusErr.ErrStatus.Details.Causes = append(statusErr.ErrStatus.Details.Causes, metav1.StatusCause{
+		Type: ReasonNodeCapacity,
Contributor Author

@natasha41575 natasha41575 Jan 6, 2026

@omerap12 I added a CauseType here. Programmatically, clients can do this:

	// latestPod holds the desired resize; clientset, ctx, ns, and podName are set up elsewhere.
	_, err = clientset.CoreV1().Pods(ns).UpdateResize(ctx, podName, latestPod, metav1.UpdateOptions{})
	if err != nil {
		// Guard against a nil Details before walking the causes.
		if statusErr, ok := err.(*apierrors.StatusError); ok && statusErr.ErrStatus.Details != nil {
			for _, cause := range statusErr.ErrStatus.Details.Causes {
				fmt.Printf("Cause is: %s\n", cause.Type)
			}
		}
	}

I tried this out with a quick little Go script. Hope this solves your use case.

Member

Yup. I think that solves it :)
Thanks!
@adrianmoisey, we will incorporate this logic into the VPA, so I believe we are good to go, right?

Member

Yup, seems good to me.
We'll need to handle both old and new methods, though, since we can't guarantee which version of Kubernetes someone is running the VPA on.

Member

Already done :)

@k8s-ci-robot
Contributor

@natasha41575: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                     Commit   Required  Rerun command
pull-kubernetes-integration-go-compatibility  b472849  true      /test pull-kubernetes-integration-go-compatibility
pull-kubernetes-unit                          45c2557  true      /test pull-kubernetes-unit
pull-kubernetes-linter-hints                  45c2557  false     /test pull-kubernetes-linter-hints
pull-kubernetes-verify                        45c2557  true      /test pull-kubernetes-verify
pull-kubernetes-unit-windows-master           45c2557  false     /test pull-kubernetes-unit-windows-master
pull-kubernetes-e2e-capz-windows-master       45c2557  false     /test pull-kubernetes-e2e-capz-windows-master
pull-kubernetes-integration                   45c2557  true      /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.



Labels

area/apiserver area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
