Merged
Changes from 11 commits
@@ -7,117 +7,175 @@
- [Proposal](#proposal)
- [Context](#context)
- [Design Details](#design-details)
- [Applying Updates During Pod Admission](#applying-updates-during-pod-admission)
- [Applying Disruption-free Updates](#Applying-disruption-free-updates)
- [Applying Disruptive Updates](#applying-disruptive-updates)
- [1. Applying Updates During Pod Admission](#pod-admission)
- [2. In-Place Updates (**NEW**)](#in-place)
- [Comparison of `UpdateMode`s](#comparison-of-updatemodes)
- [Test Plan](#test-plan)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary


VPA applies its recommendations with a mutating webhook, during pod creation. It can also evict
pods expecting that it will apply the recommendation when the pod is recreated. This is a
disruptive process so VPA has some mechanism to avoid too frequent disruptive updates.
pods expecting that it will apply the recommendation when the pod is recreated. Today, this process
is potentially disruptive, as any change in recommendations requires a pod to be recreated.

We can instead reduce the amount of disruption by leveraging the [in-place update feature], which
has been an [alpha feature since 1.27] and is graduating to [beta in 1.33].

This proposal allows VPA to apply its recommendations more frequently, with less disruption by
using the
[in-place update feature] which is an alpha feature [available in Kubernetes 1.27.] This proposal enables only core uses
of in place updates in VPA with intention of gathering more feedback. Any more advanced uses of in place updates in VPA
(like applying different recommendations during pod initialization) will be introduced as separate enhancement
proposals.
This proposal enables only core uses of in-place updates in VPA, with the intention of providing the
foundational pieces. Further advanced uses of in-place updates in VPA (like applying different
recommendations during pod initialization or providing more frequent smaller updates) will be
introduced as separate enhancement proposals.

[in-place update feature]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources
[available in Kubernetes 1.27.]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3
[alpha feature since 1.27]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3
[beta in 1.33]: https://github.com/orgs/kubernetes/projects/178/views/1

### A Note On Disruptions {#disruptions}

It is important to note that **VPA cannot guarantee NO disruptions**. This is because the
underlying container runtime is responsible for actuating the resize operation and there are no
guarantees provided (see [this thread] for more information). However, in practice if the
underlying container runtime supports it, we expect these disruptions to be minimal and that MOST
of the time the updates will be done in-place.

This proposal therefore focuses on *reducing* disruptions while still harnessing the benefits of
VPA.

[this thread]: https://github.com/kubernetes/autoscaler/issues/7722#issue-2796215055

### Goals

* Allow VPA to actuate without disruption,
* Allow VPA to actuate more frequently, when it can actuate without disruption,
* Allow VPA to actuate in situations where actuation by eviction doesn't work.
* Allow VPA to actuate with reduced disruption.
* Allow VPA to actuate in situations where actuation by eviction is not desirable.

### Non-Goals

* Allow VPA to actuate more frequently.
* Allow VPA to operate with NO disruptions, see the [note above](#disruptions).
* Improved handling of injected sidecars
* A separate AEP will improve VPA's handling of injected sidecars.

## Proposal

Add new supported values of [`UpdateMode`]:
Add a new supported value of [`UpdateMode`]:

* `InPlaceOnly` and
* `InPlaceOrRecreate`.
* `InPlaceOrRecreate`

Here we specify `InPlaceOrRecreate` to make sure the user explicitly knows that the existing pod
may be replaced.
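
As an illustration, a VPA object opting into the new mode might look like the following minimal sketch. It builds the manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f` accepts). The helper `vpa_manifest` and the object names (`my-app-vpa`, `my-app`) are hypothetical, and the mode only takes effect on a VPA release that implements this AEP.

```python
import json


def vpa_manifest(name: str, target_deployment: str,
                 update_mode: str = "InPlaceOrRecreate") -> dict:
    """Build a VerticalPodAutoscaler manifest that uses the given update mode.

    `name` and `target_deployment` are placeholders chosen by the caller.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            # The new value proposed by this AEP.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    print(json.dumps(vpa_manifest("my-app-vpa", "my-app"), indent=2))
```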

[`UpdateMode`]: https://github.com/kubernetes/autoscaler/blob/71b489f5aec3899157b37472cdf36a1de223d011/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L124

For the initial release of in-place updates with VPA, in-place updates will only be available
using the `InPlaceOrRecreate` mode. In the future, once the SIG feels that the feature is
mature enough, this behavior will become the default behavior for the `Auto` mode. See the [`Auto`
mode documentation].

[`Auto` mode documentation]: https://github.com/kubernetes/autoscaler/blob/78c8173b979316f892327022d53369760b000210/vertical-pod-autoscaler/docs/api.md#updatemode

## Context

[In-place update of pod resources KEP] is available in alpha in [Kubernetes 1.27]. The feature allows changing container
resources while the container is running. It also adds [`ResizePolicy`] field to Container. This field indicates for an
individual resource if a container needs to be restarted by kubelet when the resource is changed. For example it may be
the case that a Container automatically adapts to a change in CPU, but needs to be restarted for a change in Memory to
take effect.
[In-place update of pod resources KEP] is available in alpha in 1.27 and graduating to beta in
1.33. The feature allows changing container resources while the container is running. It adds
two key features:

* A [`/resize` subresource] that can be used to mutate the `Pod.Spec.Containers[i].Resources`
field.
* A [`ResizePolicy`] field to Container. This field allows the user to specify the behavior when
  modifying a resource value. Currently it has two modes:
  - `PreferNoRestart` (default), which indicates to the container runtime that it should try to
    resize the container without restarting. However, it does not guarantee that a restart will not
    happen.
  - `RestartContainer`, which indicates that any mutation to the resource requires a restart (for
    example, this is important for Java apps using the `-XmxN` flag, which are unable to resize
    memory without restarting).

Note that resize operations will NOT change the pod's quality of service (QoS) class.

Note that in the initial Beta version of in-place updates, [memory limit downscaling is forbidden]
for pods with `resizePolicy: PreferNoRestart`. This means that when VPA attempts to apply such a
patch, it will fail and VPA will need to fall back to a regular eviction (see below).
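
To make the `/resize` request concrete, the sketch below builds the kind of patch body that targets `Pod.Spec.Containers[i].Resources`. The helper name `resize_patch` and the container name are hypothetical, and actually sending the patch is left out; with a kubectl version that supports the `resize` subresource, this could be done with `kubectl patch pod <name> --subresource resize --patch '<body>'`.

```python
import json
from typing import Dict, Optional


def resize_patch(container: str,
                 requests: Dict[str, str],
                 limits: Optional[Dict[str, str]] = None) -> dict:
    """Return a patch body that changes one container's resources.

    Only the named container is mentioned, so other containers are untouched.
    """
    resources = {"requests": dict(requests)}
    if limits:
        resources["limits"] = dict(limits)
    return {"spec": {"containers": [{"name": container, "resources": resources}]}}


if __name__ == "__main__":
    body = resize_patch("app", {"cpu": "800m"}, {"cpu": "800m"})
    print(json.dumps(body))
```

Because the patch can be rejected (for example, the memory limit downscaling case above), callers should treat an error from this request as a failed in-place update.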

[In-place update of pod resources KEP]: https://github.com/kubernetes/enhancements/issues/1287
[Kubernetes 1.27]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3
[`ResizePolicy`]: https://github.com/kubernetes/api/blob/8360d82aecbc72aa039281a394ebed2eaf0c0ccc/core/v1/types.go#L2448
[`/resize` subresource]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#api-changes
[`ResizePolicy`]: https://github.com/kubernetes/api/blob/4dccc5e86b957cea946a63c4f052ee7dec3946ce/core/v1/types.go#L2636
[memory limit downscaling is forbidden]: https://github.com/kubernetes/enhancements/pull/5089

## Design Details

In the update modes existing before (`Initial` and `Recreate`) only admission controller was changing pod spec (updater
was responsible for evicting pods so that admission controller could change them during admission).
Prior to this AEP, only the VPA admission controller was responsible for changing the pod spec.

The VPA updater is responsible for evicting pods so that the admission controller can change them
during admission.

In the newly added `InPlaceOrRecreate` mode, the VPA Updater will attempt to execute in-place
updates _FIRST_. If it is unable to process an in-place update in time, it will evict the pod to
force a change.

This will effectively match the current behavior in `Auto` except that resizes will first be
attempted in-place.

In the newly added `InPlaceOnly` and `InPlaceOrRecreate` modes VPA Updater will execute in-place updates (which change
pod spec without involving admission controller).
In the future, this logic may be improved to:
* Provide more frequent resizes.
* Make changes that are only attempted using in-place resizes and wouldn't ultimately result in an
eviction on failure.
* In the case of failure, make smaller updates to circumvent a node that does not have enough
headroom to accept the full resize but could accommodate a smaller one.

### Applying Updates During Pod Admission
We classify two types of updates in the context of this new mode:

For VPAs in `InPlaceOnly` and `InPlaceOrRecreate` modes VPA Admission Controller will apply updates to starting pods,
like it does for VPAs in `Initial`, `Auto`, and `Recreate` modes.
1. Updates on pod admission
2. In-place updates

### Applying Disruption-free Updates
### 1. Applying Updates During Pod Admission {#pod-admission}

When an update only changes resources for which a container indicates that it doesn't require a restart, then VPA can
attempt to actuate the change without disrupting the pod. VPA Updater will:
* attempt to actuate such updates in place,
* attempt to actuate them if difference between recommendation and request is at least 10%
* even if pod has been running for less than 12h,
* if necessary execute only partial updates (if some changes would require a restart).
For VPAs using the new `InPlaceOrRecreate` mode, the VPA Admission Controller will apply updates to
starting pods just as it does for VPAs in `Initial`, `Auto`, and `Recreate` modes.
Member

I'm not entirely sure if this AEP is the correct place to make this comment, but the promise of Auto was that it would one day support in-place, see https://github.com/kubernetes/autoscaler/blob/78c8173b979316f892327022d53369760b000210/vertical-pod-autoscaler/docs/api.md#updatemode

We've been promising users that if they use Auto today, one day in the future the VPA will start resizing their pods for them without recreating them.
It seems that this may not be the case since the container runtime can't guarantee it.

Do we need to start thinking about what to do with Auto in the future?

Again, I'm unsure if this really fits into the context of this AEP, but I thought I'd bring it up.

Member

My 2 cents on this are that, switching Auto to be InPlace or some variation of InPlace would be a breaking change that should be in something like VPA 2.0. It may not "break" someone's configuration, but the change probably won't be the same expected behaviour in VPA 0.x and 1.x when the updater was freely evicting pods in Auto mode. Although, what I'm talking about is probably a discussion for the future timeline of VPA in general.

Changing the behavior of Auto, good idea. Major version bump to accompany that, also a good idea.

Contributor

With the discussion around changing the behavior of auto, we're talking about a future where in-place vertical scaling is GA and not under a feature gate, right? IMHO it would feel a bit weird to make the default behavior of VPA depend on a beta feature gate.

Member

@adrianmoisey adrianmoisey Feb 12, 2025

> With the discussion around changing the behavior of auto, we're talking about a future where in-place vertical scaling is GA and not under a feature gate, right?

Correct. Since the work on in-place is starting to happen now, and we are potentially diverging on the promise given to users, it seems appropriate to discuss Auto now, just so it's considered in our discussions as we move forward with the in-place rollout.

Member Author

Specified in the AEP that the intention is that this eventually graduates into the Auto mode.


### Applying Disruptive Updates
### 2. In-Place Updates (**NEW**) {#in-place}

In both `InPlaceOnly` and `InPlaceOrRecreate` modes VPA updater will attempt to apply updates that require container
restart in place. It will update them under conditions that would trigger update with `Recreate` mode. That is it will
apply disruptive updates if:
In the `InPlaceOrRecreate` mode, and for updates that require a container restart, the VPA updater
will attempt to apply updates in place. It will update them under the same conditions that would
trigger an update with `Recreate` mode. That is, it will apply an in-place update if:

* Any container has a request below the corresponding `LowerBound` or
* Any container has a request above the corresponding `UpperBound` or
* Difference between sum of pods requests and sum of recommendation `Target`s is more than 10% and the pod has been
running undisrupted for at least 12h.
* Successful disruptive update counts as disruption here (and prevents further disruptive updates to the pod for 12h).
* NOTE: A successful update counts as disruption here (and prevents further disruptive updates to the pod for 12h).
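
The trigger conditions above can be sketched as follows. This is a simplified illustration rather than the updater's code: it tracks a single resource per container as a plain number, measures the change relative to the sum of current requests (an assumption), and all names (`ContainerState`, `should_attempt_update`) are hypothetical. The 10% and 12h values are the defaults quoted in the text.

```python
from dataclasses import dataclass
from typing import List

MIN_CHANGE_FRACTION = 0.10           # "more than 10%"
MIN_UNDISRUPTED_SECONDS = 12 * 3600  # "at least 12h"


@dataclass
class ContainerState:
    request: float      # current request, e.g. CPU cores
    lower_bound: float  # recommendation LowerBound
    target: float       # recommendation Target
    upper_bound: float  # recommendation UpperBound


def outside_recommended_range(containers: List[ContainerState]) -> bool:
    """True if any request is below LowerBound or above UpperBound."""
    return any(c.request < c.lower_bound or c.request > c.upper_bound
               for c in containers)


def significant_change(containers: List[ContainerState]) -> bool:
    """True if summed targets differ from summed requests by more than 10%."""
    total_request = sum(c.request for c in containers)
    total_target = sum(c.target for c in containers)
    if total_request == 0:
        return total_target > 0
    return abs(total_target - total_request) / total_request > MIN_CHANGE_FRACTION


def should_attempt_update(containers: List[ContainerState],
                          undisrupted_seconds: float) -> bool:
    """Apply the three conditions from the list above."""
    if outside_recommended_range(containers):
        return True
    return (significant_change(containers)
            and undisrupted_seconds >= MIN_UNDISRUPTED_SECONDS)
```

Since a successful update counts as a disruption, a caller would reset `undisrupted_seconds` to zero after each applied resize.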

In `InPlaceOrRecreate` mode (but not in `InPlaceOnly` mode) VPA updater will evict pod to actuate a recommendation if it
attempted to apply the recommendation in place and failed.
(NEW!) In addition, VPA will attempt an in-place update in some cases where we NORMALLY would not
be able to perform an eviction, including:

* If `CanEvict` is false.
* If any of the `EvictionRequirements` on the VPA are not true.

These additional resizes can be attempted because the eviction fallback would fail anyway.
Member

Just want to clarify this...
So an in-place update will be attempted, if that in-place fails, the VPA will not attempt an eviction?
Almost as if it was InPlaceOrNotAtAll ?

Member Author

It will attempt an eviction but the eviction won't go through because the actual condition will prevent it from happening. Let me clarify that in the text.


The VPA updater will evict a pod to actuate a recommendation if it attempted to apply the
recommendation in place and failed.
Member

These two sentences seem to conflict with each other

Member Author

Clarified.

The idea here is that we can be slightly more relaxed with in-place resizes and still allow VPA to proceed with eviction when the operation fails. The eviction would get blocked anyway since these conditions themselves prevent eviction from happening. Hopefully that makes sense.


VPA updater will consider that the update failed if:
* The pod has `.status.resize: Infeasible` or
* The pod has `.status.resize: Deferred` and more than 1 minute elapsed since the update or
* The pod has `.status.resize: InProgress` and more than 1 hour elapsed since the update:
* There seems to be a bug where containers that say they need to be restarted get stuck in update, hopefully it gets
fixed and we don't have to worry about this by beta.
* Patch attempt will return an error.
* If the attempt fails because it would change pods QoS:
* `InPlaceOrRecreate` will treat it as any other failure and consider evicting the pod.
* `InPlaceOnly` will consider applying request slightly lower than the limit.

Those failure modes shouldn't disrupt pod operations, only update. If there are problems that can disrupt pod operation
we should consider not implementing the `InPlaceOnly` mode.
* The pod has condition `PodResizePending` with reason `Infeasible` or
* The pod has condition `PodResizePending` with reason `Deferred` and more than 5 minutes elapsed
since the update or
* The pod has condition `PodResizing` and more than 1 hour elapsed since the update or
* Patch attempt returns an error.

Note that in the initial version of In-Place updates, memory limit downscaling will always fail
the patch operation. This means VPA will need to evict the pod normally for this change to happen.
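
The failure criteria based on pod conditions can be sketched like this. It is an illustration under assumptions: a condition is represented only by its type and reason (a real pod condition carries more fields, such as a status), presence in the list is taken to mean the condition is set, and the names `PodCondition` and `in_place_update_failed` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFERRED_TIMEOUT_SECONDS = 5 * 60    # Deferred for more than 5 minutes
IN_PROGRESS_TIMEOUT_SECONDS = 60 * 60  # Resizing for more than 1 hour


@dataclass
class PodCondition:
    type: str
    reason: str = ""


def in_place_update_failed(conditions: List[PodCondition],
                           seconds_since_update: float,
                           patch_error: Optional[Exception] = None) -> bool:
    """Return True when the in-place update should be considered failed."""
    # A rejected patch is an immediate failure.
    if patch_error is not None:
        return True
    for cond in conditions:
        if cond.type == "PodResizePending":
            if cond.reason == "Infeasible":
                return True
            if (cond.reason == "Deferred"
                    and seconds_since_update > DEFERRED_TIMEOUT_SECONDS):
                return True
        elif (cond.type == "PodResizing"
                and seconds_since_update > IN_PROGRESS_TIMEOUT_SECONDS):
            return True
    return False
```

A `Deferred` or in-progress resize inside its time window returns `False`, so the updater keeps waiting rather than falling back to eviction.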

#### A note on `ResizePolicy`.

VPA does not care and should not care about a container's `ResizePolicy` setting. In the new mode,
it will simply issue the `/resize` request and let the underlying machinery apply the resize
operation in a way that complies with the user's specification.

### Comparison of `UpdateMode`s

VPA updater considers the following conditions when deciding if it should apply an update:
Today, VPA updater considers the following conditions when deciding if it should apply an update:
- [`CanEvict`]:
- Pod is `Pending` or
- There are enough running pods in the controller.
@@ -133,77 +191,48 @@ VPA updater considers the following conditions when deciding if it should apply
recommendation or
- At least one container has at least one resource request higher than the upper bound of the corresponding
recommendation.
- Disruption-free update - doesn't change any resources for which the relevant container specifies
- **NEW** Disruption-free update - doesn't change any resources for which the relevant container specifies
`RestartPolicy: RestartContainer`.

`Auto` / `Recreate` evicts pod if:
* [`CanEvict`] returns true for the pod, and it meets at least one of the following conditions:
* Quick OOM,
* Outside recommended range,
* Long-lived pod with significant change.
* `EvictionRequirements` are all true.

`InPlaceOnly` and `InPlaceOrRecreate` will attempt to apply a disruption-free update in place if it meets at least one
`InPlaceOrRecreate` will attempt to apply an update in place if it meets at least one
of the following conditions:
* Quick OOM,
* Outside recommended range,
* Significant change.

`InPlaceOnly` and `InPlaceOrRecreate` when considering a disruption-free update in place ignore some conditions that
influence eviction decision in the `Recreate` mode:
* Long-lived Significant change.
* [`CanEvict`] won't be checked and
* Pods with significant change can be updated even if they are not long-lived.
* [`EvictionRequirements`] won't be checked

`InPlaceOnly` and `InPlaceOrRecreate` will attempt to apply updates that are **not** disruption-free in place under
the same conditions that apply to updates in the `Recreate` mode.

`InPlaceOrRecreate` will attempt to apply updates by eviction when:
* VPA already attempted to apply the update in-place and failed and
* it meets conditions for applying in the `Recreate` mode.
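
Putting the comparison together, the per-pod decision in `InPlaceOrRecreate` mode can be sketched as a small function. This is a simplified illustration, not the updater's implementation: it assumes the in-place and `Recreate` trigger conditions have already been evaluated into one boolean, and the names `Action` and `next_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESIZE_IN_PLACE = "resize-in-place"
    EVICT = "evict"


def next_action(meets_update_conditions: bool,
                in_place_attempted: bool,
                in_place_failed: bool,
                can_evict: bool,
                eviction_requirements_met: bool) -> Action:
    """Decide what the updater does next for one pod."""
    if not meets_update_conditions:
        return Action.NONE
    # In-place is always tried first; CanEvict and EvictionRequirements
    # are not consulted for this step.
    if not in_place_attempted:
        return Action.RESIZE_IN_PLACE
    # Eviction is only a fallback, and it is still gated by the usual checks.
    if in_place_failed and can_evict and eviction_requirements_met:
        return Action.EVICT
    return Action.NONE
```

Note how a failed resize on a pod with `CanEvict` false ends in `Action.NONE`: the eviction fallback is blocked by the same condition that would have blocked it in `Recreate` mode.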

[`CanEvict`]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/eviction/pods_eviction_restriction.go#LL100C10-L100C37
[by default less than 10 minutes]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L37
[`UpdatePriorityCalculator.AddPod`]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L81
[by default 12h]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L35
[by default 10%]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L33
[Outside recommended range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73
[`EvictionRequirements`]: https://github.com/kubernetes/autoscaler/blob/54fe60ed4d4bb4cb89fe4abe11284d1bd6b06390/vertical-pod-autoscaler/pkg/updater/priority/scaling_direction_pod_eviction_admission.go

### Test Plan

The following test scenarios will be added to e2e tests. Both `InPlaceOnly` and `InPlaceOrRecreate` modes will be tested
and they should behave the same:
The following test scenarios for the `InPlaceOrRecreate` mode will be added to the e2e tests:

* Admission controller applies recommendation to pod controlled by VPA.
* Disruption-free in-place update applied to all containers of a pod (request in recommendation bounds).
* Partial disruption-free update applied to some containers of a pod, some disruptive changes skipped (request in
* In-place update applied to all containers of a pod.
* Partial updates applied to some containers of a pod, some changes skipped (request in
recommendation bounds).
* Disruptive in-place update applied to all containers of a pod (request out of recommendation bounds).

There will also be scenarios testing differences between `InPlaceOnly` and `InPlaceOrRecreate` modes:
* Disruptive in-place update will fail. In `InPlaceOnly` pod should not be evicted, in the `InPlaceOrRecreate` pod
should be evicted and the recommendation applied.
* VPA attempts an update that would change Pods QoS (`RequestsOnly` scaling, request initially < limit, recommendation
equal to limit). In `InPlaceOnly` pod should not be evicted, request slightly lower than the recommendation will be
applied. In the `InPlaceOrRecreate` pod should be evicted and the recommendation applied.
* In-place update will fail. Pod should be evicted and the recommendation applied.
* In-place update will fail but `CanEvict` is false, pod should not be evicted.
* In-place update will fail but `EvictionRequirements` are false, pod should not be evicted.

### Details still to consider

#### Ensure in-place resize request doesn't cause restarts

Currently the container [resize policy](https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/#container-resize-policies)
can be either `NotRequired` or `RestartContainer`. With `NotRequired` in-place update could still end up
restarting the container if in-place update is not possible, depending on kubelet and container
runtime implementation. However in the proposed design it should be VPA's decision whether to fall back
to restarts or not.

Extending or changing the existing API for in-place updates is possible, e.g. adding a new
`MustNotRestart` container resize policy.

#### Should `InPlaceOnly` mode be dropped

The use case for `InPlaceOnly` is not understood yet. Unless we have a strong signal it solves real
needs we should not implement it. Also VPA cannot ensure no restart would happen unless
*Ensure in-place resize request doesn't cause restarts* (see above) is solved.

#### Careful with memory scale down

Downsizing memory may have to be done slowly to prevent OOMs if the application starts to allocate rapidly.
@@ -212,3 +241,4 @@ Needs more research on how to scale down on memory safely.
## Implementation History

- 2023-05-10: initial version
- 2025-02-19: Updates to align with latest changes to [KEP-1287](https://github.com/kubernetes/enhancements/issues/1287).