Skip to content

Conversation

@maxcao13
Copy link
Member

@maxcao13 maxcao13 commented Mar 22, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduces large changes in the updater component to allow InPlaceOrRecreate mode. If the feature gate is enabled and the VPA update mode is InPlaceOrRecreate, the updater will attempt an in place update by first checking a number of preconditions before actuation (e.g., whether we are already in-place resizing,
whether an in-place update may potentially violate disruption(previously eviction) tolerance, etc.). After the preconditions are validated, we send an update signal to the InPlacePodVerticalScaling API with the recommendation, which may or may not fail. Failures are handled in subsequent updater loops.

We need this in order to actually actuate in place updates.

Which issue(s) this PR fixes:

Part of AEP-4016 (InPlaceVerticalScaling/InPlaceOrRecreate)
This PR is part of the larger feature PR in #7673

Depends on:

Note that this PR is merging into the in-place-updates feature branch, which will be merged when this feature is all reviewed and ready.

Special notes for your reviewer:

As for implementation details, patchCalculators have been re-used from the admission-controllers code for the updater in order to calculate recommendations for the updater to actuate. InPlace logic has been mostly stuffed in the eviction package for now because of similarities and ease (user-initated API calls eviction vs. in-place; both cause disruption), maybe we also need a "disruption limiter". It may or may not be useful to refactor this later.

There are some TODOs littered throughout the code. I've limited as many TODOS as possible, but I'm using the rest of the TODOs as review markers to draw attention to parts of the discussion that need community discussion. Otherwise, they are items that should be fixed after 1.33 1287 KEP changes are released, or other future work.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[AEP-4016]: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/vertical-pod-autoscaler labels Mar 22, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 22, 2025
@maxcao13
Copy link
Member Author

maxcao13 commented Mar 22, 2025

/hold
unit tests are failing bc we need #7934 to merge

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2025
@upodroid
Copy link
Member

/test pull-kubernetes-e2e-autoscaling-vpa-full

@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from 6dd33cc to 7f9b9c4 Compare March 24, 2025 19:09
@maxcao13
Copy link
Member Author

FYI:

Verify tests are failing b/c I rebased on the admission PR, which had the same issue which I fixed, but I didn't fix the issue in this PR. Once that's merged, I'll rebase again.

We might want to add a few more that are combined disruption counters,
e.g. in-place + eviction totals, but for now just add some separate
counters to keep track of what in-place updates are doing.
@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from 7f9b9c4 to 114091d Compare March 25, 2025 21:18
Copy link
Member

@raywainman raywainman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass through. Need to do another pass but thought I'd shoot off some questions to start.

Lots of great work here, thanks a lot @maxcao13!

@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from 114091d to 93a37e4 Compare March 27, 2025 22:57
@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from e6356b6 to 5a3f674 Compare April 7, 2025 00:54
@maxcao13
Copy link
Member Author

maxcao13 commented Apr 7, 2025

I refactored a lot of the in-place stuff out of the eviction code and separated them in 5a3f674.

I need to write more unit tests for the inplaceRestriction stuff but wanted to push this for now for review.

@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from 9818035 to 34d1df0 Compare April 10, 2025 16:53
@maxcao13
Copy link
Member Author

maxcao13 commented Apr 10, 2025

Sorry if it looks like the latest change was big, a lot of the changes were moving things around, renaming, lowercasing some fields, and adding/fixing unit tests.

Other than that, I've refactored the tolerance code a bit to address @raywainman's comments, and changed it so that CanEvict take into accoun about the in-place fallback to evict conditions.

I've added some unit tests to confirm the functionality.

Reviewers, please take a look when you can.

This diff is the list of changes since the last commit to now if that helps: https://github.com/kubernetes/autoscaler/compare/93a37e47308ad58275e909bfeaa0347b2ef6b4ba..34d1df06b3b84611e888dbbf988741bb1f3744dd

Copy link
Member

@raywainman raywainman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good. Generally just some nits. Thanks Max!

Introduces large changes in the updater component to allow InPlaceOrRecreate mode.
If the feature gate is enabled and the VPA update mode is InPlaceOrRecreate, the updater will attempt an in place update by first
checking a number of preconditions before actuation (e.g., if the pod's qosClass would be changed, whether we are already in-place resizing,
whether an in-place update may potentially violate disruption(previously eviction) tolerance, etc.).
After the preconditions are validated, we send an update signal to the InPlacePodVerticalScaling API with the recommendation, which may or may not fail.
Failures are handled in subsequent updater loops.

As for implementation details, patchCalculators have been re-used from the admission-controllers code for the updater in order to calculate recommendations for the updater to actuate.
InPlace logic has been mostly stuffed in the eviction package for now because of similarities and ease (user-initated API calls eviction vs. in-place; both cause disruption).
It may or may not be useful to refactor this later.

Signed-off-by: Max Cao <[email protected]>
The script needs to also check if the yaml input is a Deployment, and no longer needs to check for vpa-component names.

Signed-off-by: Max Cao <[email protected]>
This commit refactors inplace logic outside of the pods eviction restriction and separates them into their own files.
Also this commit adds PatchResourceTarget to calculators to allow them to explictly specify to the caller which
resource/subresource they should be patched to. This commit also creates a utils subpackage in order to prevent
dependency cycles in the unit tests, and adds various unit tests. Lastly, this commit adds a rateLimiter specifically
for limiting inPlaceResize API calls.

Signed-off-by: Max Cao <[email protected]>
@maxcao13 maxcao13 force-pushed the in-place-updates-updater branch from 0cbc46a to ed9358c Compare April 11, 2025 21:18
@raywainman
Copy link
Member

Overall LGTM. I would LOVE a second pair of eyes on this just given there is a lot happening here. @adrianmoisey, @omerap12, @voelzmo any chance I could get a second look?

Thanks a lot Max for patiently dealing with my waves of comments :)

Copy link
Member

@omerap12 omerap12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just my initial thoughts—planning to do another pass later.

@maxcao13
Copy link
Member Author

Is there anything else that needs to be addressed here?
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2025
@omerap12
Copy link
Member

lgtm, thank you! Excellent work, Max :)
The next PR is for the end-to-end tests, right? I can't recall if we already merged it :)

@maxcao13
Copy link
Member Author

maxcao13 commented Apr 29, 2025

Nope, I'll put that PR up now thanks Omer!

(and it should be the last one before the big PR to main)

@omerap12
Copy link
Member

Nope, I'll put that PR up now thanks Omer!

(and it should be the last one before the big PR to main)

Great, Thanks!

@maxcao13
Copy link
Member Author

maxcao13 commented Apr 29, 2025

PR is here: #8072

@raywainman
Copy link
Member

I know there are still some outstanding TODOs here but in the interest of unblocking this I think this is OK to go now. We can iterate from here. Really great work @maxcao13, thanks again.

/lgtm

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maxcao13, raywainman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2025
@k8s-ci-robot k8s-ci-robot merged commit 9440674 into kubernetes:in-place-updates May 5, 2025
7 checks passed
@maxcao13 maxcao13 deleted the in-place-updates-updater branch May 5, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants