Skip to content

Conversation

@maxcao13
Copy link
Member

@maxcao13 maxcao13 commented May 9, 2025

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR is the initial alpha attempt to implement VPA in-place vertical scaling/in-place resize according to AEP-4016. It uses the VPA updater to actuate recommendations by sending resize patch requests to pods which allows automatic in-place resize as enabled by the InPlacePodVerticalScaling feature flag in k8s 1.27.0 alpha/1.33 beta and above (and by eventual graduation).

Also introduces feature-gates to VPA, and includes the first feature-gate InPlaceOrRecreate which allows the use of InPlaceOrRecreate update mode.

This PR is the amalgamation of these following PRs that were merged into the in-place-updates branch.

This PR is based on this large PR that was broken up into the above pieces: #7673 which was the continuation of #6652 started by @jkyros.

Which issue(s) this PR fixes:

Fixes #4016

Special notes for your reviewer:

Most of the reviews happened in the separate PRs linked above. This PR should serve as the final sanity check to be able to merge to master, and to be able to resolve any merge conflicts that accrued from the time the feature branch was created, to now.

However, there are some new commits that were made in order to get this branch up to date with master. They are the ones including and after this commit: 0fa50f3

Does this PR introduce a user-facing change?

In-place resize with the VPA is implemented. It can be enabled by setting `updateMode` on your VPA to `InPlaceOrRecreate` and enabling the required VPA-level feature gates on the vpa-updater and vpa-admission containers.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[AEP] https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support
Depends on: 
[KEP] https://github.com/kubernetes/enhancements/tree/25e53c93e4730146e4ae2f22d0599124d52d02e7/keps/sig-node/1287-in-place-update-pod-resources

jkyros and others added 18 commits May 8, 2025 14:37
This adds the UpdateModeInPlaceOrRecreate mode to the types so we
can use it.

Signed-off-by: Max Cao <[email protected]>
Allows you to specify an env var FEATURE_GATES which adds feature gates to all vpa components during vpa-up and e2e tests.
Also allows local e2e tests to run kind with a new kind-config file which enables KEP-1287 InPlacePodVerticalScaling feature gate.
Separates the admission-controller service into a separate deploy manifest.

Signed-off-by: Max Cao <[email protected]>
Only allow VPA objects with InPlaceOrRecreate update mode to be created if InPlaceOrRecreate feature gate is enabled. If a VPA object already exists with this mode on, and the feature gate is disabled, this prevents further objects to be created with InPlaceOrRecreate, but this does not prevent the existing InPlaceOrRecreate VPA objects with from being modified.

Signed-off-by: Max Cao <[email protected]>
We might want to add a few more that are combined disruption counters,
e.g. in-place + eviction totals, but for now just add some separate
counters to keep track of what in-place updates are doing.
Introduces large changes in the updater component to allow InPlaceOrRecreate mode.
If the feature gate is enabled and the VPA update mode is InPlaceOrRecreate, the updater will attempt an in place update by first
checking a number of preconditions before actuation (e.g., if the pod's qosClass would be changed, whether we are already in-place resizing,
whether an in-place update may potentially violate disruption(previously eviction) tolerance, etc.).
After the preconditions are validated, we send an update signal to the InPlacePodVerticalScaling API with the recommendation, which may or may not fail.
Failures are handled in subsequent updater loops.

As for implementation details, patchCalculators have been re-used from the admission-controllers code for the updater in order to calculate recommendations for the updater to actuate.
InPlace logic has been mostly stuffed in the eviction package for now because of similarities and ease (user-initated API calls eviction vs. in-place; both cause disruption).
It may or may not be useful to refactor this later.

Signed-off-by: Max Cao <[email protected]>
The script needs to also check if the yaml input is a Deployment, and no longer needs to check for vpa-component names.

Signed-off-by: Max Cao <[email protected]>
This commit refactors inplace logic outside of the pods eviction restriction and separates them into their own files.
Also this commit adds PatchResourceTarget to calculators to allow them to explictly specify to the caller which
resource/subresource they should be patched to. This commit also creates a utils subpackage in order to prevent
dependency cycles in the unit tests, and adds various unit tests. Lastly, this commit adds a rateLimiter specifically
for limiting inPlaceResize API calls.

Signed-off-by: Max Cao <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
Co-authored-by: Max Cao <[email protected]>
This commit refactors the VPA e2e test ginkgo wrappers so that they we can easily supply ginkgo decorators.
This allows us to add ginkgo v2 labels to suites so that later we can run tests that only run FG tests.
For now, this would only be useful for FG:InPlaceOrRecreate

Signed-off-by: Max Cao <[email protected]>
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 9, 2025
@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 9, 2025
@maxcao13 maxcao13 force-pushed the kubernetes-in-place-updates branch from 5d67d4d to 3039f3c Compare May 9, 2025 20:38
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 9, 2025
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@omerap12
Copy link
Member

omerap12 commented May 10, 2025

so currently the e2e tests for the in-place feature are triggered every time right?

@maxcao13
Copy link
Member Author

They are not running because the feature gates are not automatically being applied through the CI.

@omerap12
Copy link
Member

They are not running because the feature gates are not automatically being applied through the CI.

I think they should run by default for now. We can figure out how to run them dynamically later. Since our main focus is the in-place feature, we need the e2e tests to run.

@adrianmoisey
Copy link
Member

They are not running because the feature gates are not automatically being applied through the CI.

The Kubernetes feature-gate is enabled by default, and we could enable the VPA featute-gate in the CI tests

Copy link
Member

@raywainman raywainman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2025
@maxcao13
Copy link
Member Author

maxcao13 commented May 12, 2025

The Kubernetes feature-gate is enabled by default, and we could enable the VPA featute-gate in the CI tests

I'm assuming we are okay with enabling it after merging this then. I can attest to the e2e tests passing, but there's a small chance that it won't on the kube-infra.

@omerap12
Copy link
Member

The Kubernetes feature-gate is enabled by default, and we could enable the VPA featute-gate in the CI tests

I'm assuming we are okay with enabling it after merging this then. I can attest to the e2e tests passing, but there's a small chance that it won't on the kube-infra.

Fine with me.
/lgtm

@raywainman
Copy link
Member

I'm fine with that too since this functionality is all off by default.

I reviewed this with a keen eye for the "default happy path" where in-place updates is disabled and it all LGTM. It would be great to have folks play with this now that we have this in the main branch to make sure we didn't miss anything.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maxcao13, raywainman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2b33c4c into kubernetes:master May 12, 2025
7 checks passed
@adrianmoisey
Copy link
Member

I reviewed this with a keen eye for the "default happy path" where in-place updates is disabled and it all LGTM. It would be great to have folks play with this now that we have this in the main branch to make sure we didn't miss anything.

Agreed!

/approve

Thanks Max for getting this done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support in-place Pod vertical scaling in VPA

7 participants