Skip to content

Conversation

@maxcao13
Copy link
Member

@maxcao13 maxcao13 commented May 8, 2025

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR is the initial alpha attempt to implement VPA in-place vertical scaling/in-place resize according to AEP-4016. It uses the VPA updater to actuate recommendations by sending resize patch requests to pods which allows automatic in-place resize as enabled by the InPlacePodVerticalScaling feature flag in k8s 1.27.0 alpha/1.33 beta and above (and by eventual graduation).

Also introduces feature-gates to VPA, and includes the first feature-gate InPlaceOrRecreate which allows the use of InPlaceOrRecreate update mode.

This PR is the amalgamation of these following PRs that were merged into the in-place-updates branch.

This PR is based on this large PR that was broken up into the above pieces: #7673 which was the continuation of #6652 started by @jkyros.

Which issue(s) this PR fixes:

Fixes #4016

Special notes for your reviewer:

Most of the reviews happened in the separate PRs linked above. This PR should serve as the final sanity check to be able to merge to master, and to be able to resolve any merge conflicts that accrued from the time the feature branch was created, to now.

Does this PR introduce a user-facing change?

In-place resize with the VPA is implemented. It can be enabled by setting `updateMode` on your VPA to `InPlaceOrRecreate` and enabling the required VPA-level feature gates on the vpa-updater and vpa-admission deployments.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[AEP] https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support
Depends on: 
[KEP] https://github.com/kubernetes/enhancements/tree/25e53c93e4730146e4ae2f22d0599124d52d02e7/keps/sig-node/1287-in-place-update-pod-resources

jkyros and others added 26 commits March 14, 2025 19:17
This adds the UpdateModeInPlaceOrRecreate mode to the types so we
can use it.

Signed-off-by: Max Cao <[email protected]>
…ediately after cutting a release branch so that new development is done against the new version
VPA: Add UpdateModeInPlaceOrRecreate to types
VPA: Introduce VPA feature gates, add `InPlaceOrRecreate` feature gate
Allows you to specify an env var FEATURE_GATES which adds feature gates to all vpa components during vpa-up and e2e tests.
Also allows local e2e tests to run kind with a new kind-config file which enables KEP-1287 InPlacePodVerticalScaling feature gate.
Separates the admission-controller service into a separate deploy manifest.

Signed-off-by: Max Cao <[email protected]>
VPA: Allow local dev and e2e scripts to run feature-gated features
Only allow VPA objects with InPlaceOrRecreate update mode to be created if InPlaceOrRecreate feature gate is enabled. If a VPA object already exists with this mode on, and the feature gate is disabled, this prevents further objects to be created with InPlaceOrRecreate, but this does not prevent the existing InPlaceOrRecreate VPA objects with from being modified.

Signed-off-by: Max Cao <[email protected]>
VPA: (InPlaceOrRecreate) Allow admission-controller to validate in-place spec
We might want to add a few more that are combined disruption counters,
e.g. in-place + eviction totals, but for now just add some separate
counters to keep track of what in-place updates are doing.
Introduces large changes in the updater component to allow InPlaceOrRecreate mode.
If the feature gate is enabled and the VPA update mode is InPlaceOrRecreate, the updater will attempt an in place update by first
checking a number of preconditions before actuation (e.g., if the pod's qosClass would be changed, whether we are already in-place resizing,
whether an in-place update may potentially violate disruption(previously eviction) tolerance, etc.).
After the preconditions are validated, we send an update signal to the InPlacePodVerticalScaling API with the recommendation, which may or may not fail.
Failures are handled in subsequent updater loops.

As for implementation details, patchCalculators have been re-used from the admission-controllers code for the updater in order to calculate recommendations for the updater to actuate.
InPlace logic has been mostly stuffed in the eviction package for now because of similarities and ease (user-initated API calls eviction vs. in-place; both cause disruption).
It may or may not be useful to refactor this later.

Signed-off-by: Max Cao <[email protected]>
The script needs to also check if the yaml input is a Deployment, and no longer needs to check for vpa-component names.

Signed-off-by: Max Cao <[email protected]>
This commit refactors inplace logic outside of the pods eviction restriction and separates them into their own files.
Also this commit adds PatchResourceTarget to calculators to allow them to explictly specify to the caller which
resource/subresource they should be patched to. This commit also creates a utils subpackage in order to prevent
dependency cycles in the unit tests, and adds various unit tests. Lastly, this commit adds a rateLimiter specifically
for limiting inPlaceResize API calls.

Signed-off-by: Max Cao <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
Co-authored-by: Max Cao <[email protected]>
VPA: (InPlaceOrRecreate) Allow updater to actuate InPlaceOrRecreate updates
This commit refactors the VPA e2e test ginkgo wrappers so that they we can easily supply ginkgo decorators.
This allows us to add ginkgo v2 labels to suites so that later we can run tests that only run FG tests.
For now, this would only be useful for FG:InPlaceOrRecreate

Signed-off-by: Max Cao <[email protected]>
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 8, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: maxcao13
Once this PR has been reviewed and has the lgtm label, please assign adrianmoisey for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from jbartosik and omerap12 May 8, 2025 21:15
@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 8, 2025
@k8s-ci-robot
Copy link
Contributor

@maxcao13: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-autoscaling-vpa-full 97a67b7 link false /test pull-kubernetes-e2e-autoscaling-vpa-full

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@maxcao13
Copy link
Member Author

maxcao13 commented May 8, 2025

I'm going to try resolve the conflicts now, wish me luck 🙃

EDIT: although I am not actually sure if I can push directly to this branch. I guess we will see.
EDIT2: I cannot, oops.

@maxcao13
Copy link
Member Author

maxcao13 commented May 8, 2025

Sorry, if I disturbed anyone with this PR. Gonna close and Pr from my own branch.

/close

@k8s-ci-robot
Copy link
Contributor

@maxcao13: Closed this PR.

In response to this:

Sorry, if I disturbed anyone with this PR. Gonna close and Pr from my own branch.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support in-place Pod vertical scaling in VPA

5 participants