Member

@omerap12 omerap12 commented Apr 1, 2025

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Add docs for in-place updates.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Add docs for in-place updates

Depends on #7673
/hold

rakechill and others added 30 commits February 11, 2025 14:31
…prover

Add adrianmoisey to VPA approvers
…ediately after cutting a release branch so that new development is done against the new version
Bump VPA version in main branch and change release process
This change ensures that when DecreaseTargetSize counts the nodes, it
does not include any instances which are considered to be pending (i.e.
not having a node ref), deleting, or failed. This allows the core
autoscaler to decrease the size of the node group accordingly, instead
of raising an error.

This change also adds some code to the unit tests to make detection of
this condition easier.
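
As a rough illustration of the counting rule described in this commit message, here is a minimal Go sketch. The `machine` type, its fields, and `countRegisteredNodes` are hypothetical names invented for this example; they are not the clusterapi provider's actual types or functions.

```go
package main

import "fmt"

// machine is a hypothetical, simplified stand-in for a cloud provider instance.
type machine struct {
	Name       string
	HasNodeRef bool // false means the instance is still pending
	Deleting   bool
	Failed     bool
}

// countRegisteredNodes counts only instances that are backed by a node and
// are neither being deleted nor failed. Pending, deleting, and failed
// instances are skipped so they do not inflate the node count.
func countRegisteredNodes(machines []machine) int {
	count := 0
	for _, m := range machines {
		if !m.HasNodeRef || m.Deleting || m.Failed {
			continue
		}
		count++
	}
	return count
}

func main() {
	machines := []machine{
		{Name: "healthy-1", HasNodeRef: true},
		{Name: "healthy-2", HasNodeRef: true},
		{Name: "pending", HasNodeRef: false},
		{Name: "deleting", HasNodeRef: true, Deleting: true},
		{Name: "failed", HasNodeRef: true, Failed: true},
	}
	fmt.Println("registered nodes:", countRegisteredNodes(machines))
}
```

With the sample data above, only the two healthy machines are counted, so a target size computed from this count can drop below the raw instance total without being treated as an error.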
…ze-fix

make DecreaseTargetSize more accurate for clusterapi
This change makes it so that when a failed machine is found during
`findScalableResourceProviderIDs`, it will always gain a normalized
provider ID with a failure guard prepended. This ensures that
machines which have gained a provider ID from the infrastructure and
then later go into a failed state can be properly removed by the
autoscaler when it wants to correct the size of a node group.
…-detection

improve failed machine detection in clusterapi
…odeHasValidProviderID

capi: node and provider ID accounting funcs
* Update default value for scaleDownDelayAfterDelete

Setting default value for scaleDownDelayAfterDelete to be scanInterval
instead of 0.

* Revert the change and fix the flag description
…up-sample-scheduled

Allow using scheduled pods as samples in proactive scale up
Fix log for node filtering in static autoscaler
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 1, 2025
Member Author

omerap12 commented Apr 1, 2025

/assign @maxcao13

Could you please double-check that I didn’t miss anything? I kept the docs high-level and avoided technical details, since they’re not relevant to the end user.
Appreciate you taking a look :)

Signed-off-by: Omer Aplatony <[email protected]>
@adrianmoisey
Member

Thanks for making this!
Does it make sense to put this in the in-place-updates branch, along with all the other In-place work?

Member

@maxcao13 maxcao13 left a comment

Looks great, have a couple suggestions. Thanks for writing this!

Member

maxcao13 commented Apr 1, 2025

> Thanks for making this!
> Does it make sense to put this in the in-place-updates branch, along with all the other In-place work?

I think we can do that or we can just merge it afterwards, both seem reasonable to me.

Maybe we should merge this after the in-place-updates branch is merged in case there are any extremely final review changes that affect this doc?

Member Author

omerap12 commented Apr 2, 2025

> > Thanks for making this!
> > Does it make sense to put this in the in-place-updates branch, along with all the other In-place work?
>
> I think we can do that or we can just merge it afterwards, both seem reasonable to me.
>
> Maybe we should merge this after the in-place-updates branch is merged in case there are any extremely final review changes that affect this doc?

Agree.

@omerap12 omerap12 requested a review from maxcao13 April 2, 2025 06:51
* All containers in a pod are updated together (partial updates not supported)
* Memory downscaling requires careful consideration to prevent OOMs
* Updates still respect VPA's standard update conditions and timing restrictions
* In-place updates will fail for pods with Guaranteed QoS class (requires pod recreation)
Member

Is this true?

If the QoS class is guaranteed then requests == limits and VPA will just update both together (since the ratio between them is 1.0), which means the QoS class will never change.

My understanding is that the in-place feature will fail when you try to change the QoS class.

Am I missing something?
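
The reasoning in this comment can be sketched in Go. This is a simplified illustration only: the `resources` type, `scaleKeepingRatio`, and `isGuaranteed` are hypothetical names, the sketch models a single resource of a single container, and it assumes (as the comment states) that the limit is scaled to keep the limit-to-request ratio unchanged. The real QoS class depends on both CPU and memory across all containers in the pod.

```go
package main

import "fmt"

// resources is a hypothetical, simplified model of one container resource,
// expressed in integer units (for example CPU millicores).
type resources struct {
	Request int64
	Limit   int64
}

// scaleKeepingRatio applies a new request and scales the limit so that the
// original limit/request ratio is preserved.
func scaleKeepingRatio(cur resources, newRequest int64) resources {
	if cur.Request == 0 {
		// No ratio can be derived; leave the limit untouched.
		return resources{Request: newRequest, Limit: cur.Limit}
	}
	return resources{
		Request: newRequest,
		Limit:   cur.Limit * newRequest / cur.Request,
	}
}

// isGuaranteed is a simplified check: request and limit are set and equal.
func isGuaranteed(r resources) bool {
	return r.Request > 0 && r.Request == r.Limit
}

func main() {
	guaranteed := resources{Request: 500, Limit: 500}
	burstable := resources{Request: 200, Limit: 400}

	g := scaleKeepingRatio(guaranteed, 750)
	b := scaleKeepingRatio(burstable, 300)

	fmt.Println(isGuaranteed(guaranteed), isGuaranteed(g)) // true true
	fmt.Println(isGuaranteed(burstable), isGuaranteed(b))  // false false
}
```

Under these assumptions, a ratio of 1.0 stays 1.0 after the update, so the simplified class is the same before and after the resize in both cases.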

Member Author

You are right, my misunderstanding, sorry.
fixed in: 125209d

Signed-off-by: Omer Aplatony <[email protected]>
@omerap12 omerap12 changed the base branch from master to in-place-updates May 4, 2025 06:32
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 4, 2025
Member Author

omerap12 commented May 4, 2025

Gonna push this to in-place
/close

@k8s-ci-robot
Contributor

@omerap12: Closed this PR.

@omerap12 omerap12 deleted the in-place-docs branch May 4, 2025 06:34
@omerap12 omerap12 restored the in-place-docs branch May 4, 2025 06:35
@omerap12 omerap12 deleted the in-place-docs branch May 4, 2025 06:42
Labels

* `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
* `area/vertical-pod-autoscaler`
* `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
* `do-not-merge/hold`: Indicates that a PR should not merge because someone has issued a /hold command.
* `kind/documentation`: Categorizes issue or PR as related to documentation.
* `size/XXL`: Denotes a PR that changes 1000+ lines, ignoring generated files.