VPA: Allow CPU startup boost unboost mechanism to be quicker/reactive #9414

@maxcao13

Description

Which component are you using?:

/area vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

The current alpha implementation of VPA CPU Startup Boost relies on the updater interval to "unboost" CPU values back to the original manifest values. In most cases this delays the unboost, which is probably not the behaviour users expect after reading the enhancement proposal or the documentation. I would expect the boost to end immediately once durationSeconds is reached or the pod becomes Ready.

Describe the solution you'd like.:

I want to propose making the CPU unboosting mechanism event-driven and reactive, instead of relying on the configured updater-interval, which defaults to 60 seconds. I understand this requires a somewhat different architecture from the current one, so it needs careful implementation so that the new code paths don't blow up the Updater's memory/CPU usage.

Looking further ahead, I think there could be value in making the entire updater reactive to changes in pod recommendations as a whole. This would let VPA updates happen faster, spread updates out more over time, and let the Updater do work throughout its lifecycle instead of sitting idle between intervals. But that is out of scope for this issue.

As a side note: if no recommendation exists, the unboost should return to the original value; if a recommendation does exist, it falls back to that value, as documented in the AEP. However, if we unboost back to the original value and one second later an updater tick sees a new recommendation, we have technically wasted an API call. I think this is a worthwhile tradeoff.
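The fallback rule and the tradeoff above can be sketched as two small Go functions. `unboostTarget` picks the value to return to, and `patchCount` counts how many API calls a sequence of CPU request values costs. Both are hypothetical helpers. Representing CPU as an `int64` of millicores, with `nil` meaning "no recommendation", is an assumption made to keep the example dependency-free.

```go
package main

import "fmt"

// unboostTarget (hypothetical) picks the CPU request, in millicores, that a
// pod returns to when its boost ends: the VPA recommendation if one exists,
// otherwise the original manifest value. nil means "no recommendation".
func unboostTarget(originalMilli int64, recommendedMilli *int64) int64 {
	if recommendedMilli != nil {
		return *recommendedMilli
	}
	return originalMilli
}

// patchCount (hypothetical) counts the API calls needed to move a pod's CPU
// request through the given sequence of values; a step that leaves the value
// unchanged needs no call.
func patchCount(values ...int64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] != values[i-1] {
			n++
		}
	}
	return n
}

func main() {
	rec := int64(350)
	fmt.Println(unboostTarget(500, nil))  // 500: no recommendation, back to the manifest value
	fmt.Println(unboostTarget(500, &rec)) // 350: fall back to the recommendation

	// Boosted to 1000m; a 350m recommendation first appears on the next tick.
	fmt.Println(patchCount(1000, 500, 350)) // 2: reactive unboost, then the tick applies 350m
	fmt.Println(patchCount(1000, 350))      // 1: had the unboost waited for the tick
}
```

In this scenario the reactive path costs two calls instead of one. That extra call is the tradeoff described above.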

Describe any alternative solutions you've considered.:

The workaround is to set a very small updater interval and not use VPA to apply recommendations at all (updateMode=Off, so that we don't throttle the updater), but that is a lot of user intervention for a seemingly simple mechanism.

Additional context.:

None. I'm open to ideas, or to reasons why we might not want to do this.

Metadata

Labels

area/vertical-pod-autoscaler, kind/feature, triage/accepted
