KEP-2021: Support scaling HPA to/from zero pods for object/external metrics #2022
base: master
Conversation
Welcome @johanneswuerbach!
Hi @johanneswuerbach. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(force-pushed from aeff3ee to d09c208)
Enhancements Lead here, some notes on the current PR:
The dir structure should be: keps/sig-autoscaling/2021-scale-from-zero/ (Note we put the issue number in the title now)
The current file 20200926-scale-from-zero.md should be renamed README.md and a kep.yaml should also be added.
More detail can be found here: https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template
Best,
Kirsten
@kikisdeliveryservice thank you, folder structure fixed.
/assign @gjtempleton I hoped to get the current state documented first, before we start talking about how we could move towards beta. Does that make sense?
I think that's a good plan for now.
@johanneswuerbach @gjtempleton how's this going so far? Can I be of any help?
@jeffreybrowning yes. I assumed we could merge this KEP already as is to document the current state and then iterate on it towards beta. I'll try to present it at the next sig-autoscaling meeting to get some input and discuss next steps, but if you have any input I'm happy to incorporate it already.
@johanneswuerbach missed autoscaling meeting today -- do you have clarity on next steps?
Me too. I assumed those were bi-weekly, or are the meetings on-demand?
In all honesty, it would have been my first one. The work you started on this feature for beta has encouraged me to get involved and help you push this through. It will really help with using HPA + an async job queue to scale down to 0 workers when not processing tasks.
Hey, the meetings are held weekly at 14:00 UTC, you can see more info including a link to the agenda if you want to raise it here. I've raised this at a previous meeting, so the community's already aware this work has started, but it would be good to have some more in-depth discussion. |
Thanks, I got confused by the mention of biweekly here https://github.com/kubernetes/community/tree/master/sig-autoscaling#meetings and assumed the calendar invite was wrong. Will ask upfront next time :-)
Pinging here. How's this enhancement coming along? What are next steps? |
Pinging back before the holidays hit. What are next steps? |
@adrianmoisey not actively; I've tried to get this KEP reviewed, but there hasn't been much activity. Happy to help with the implementation once there is movement here, though.
Well... I'm not an approver, but I'm happy to help push this as much as I can. Something I'm not 100% sure about is that the feature seems to already exist, despite the KEP PR not being merged. But I'll do some digging and figure out the state of the world.
/ok-to-test
@johanneswuerbach can you make the suggested changes, then I'll ping some folks to try to un-stick this KEP.
(force-pushed from 9b6913c to f55305c)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: johanneswuerbach, josephburnett, mwielgus. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@adrianmoisey thanks, updated.
@johanneswuerbach Sorry for the runaround. It's taking me a bit of time to get my head around where this KEP is in terms of process. So... it's already implemented as alpha, and we need to promote it to beta. Again, sorry for the runaround! And just to summarise what I think needs to happen and the next steps:
(force-pushed from f55305c to 1c9aad2)
@adrianmoisey I've updated the KEP template, are you talking about any specific section? It would be great to reach out to sig-autoscaling again prior to iterating on this, to understand what exactly is missing at this point.
(force-pushed from 1c9aad2 to 14b9732)
(force-pushed from 14b9732 to 4b6cce0)
From my standpoint it seems fine. @gjtempleton will also take a look.
Mostly nits and typos, but some areas where we need to add some detail.
## Summary

[Horizontal Pod Autoscaler][] (HPA) automatically scales the number of pods in any resource which supports the `scale` subresource based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics) from one to many replicas. This proposal adds support for scaling from zero to many replicas and back to zero for object and external metrics.
I'd like to make this a bit more clear on why users would want to do this, and the limitations on these metrics.
Added, better?
In cases of a frequently idle queue or a less latency-sensitive workload there is no need to run one replica at all times; instead you want to dynamically scale to zero replicas, especially if those replicas have high resource requests. If replicas are scaled to 0, HPA also needs the ability to scale up once messages are available.
Worth making extra-clear the cost/energy-saving potential brought about by the increasing use of GPU workloads?
Added as well.
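For illustration, a minimal sketch of the kind of HPA spec this proposal enables once the alpha `HPAScaleToZero` feature gate is on; the workload name, external metric name, and target value here are made-up examples, not part of the KEP:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker            # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 0                # only accepted when the HPAScaleToZero gate is enabled
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # example metric exposed by a metrics adapter
      target:
        type: AverageValue
        averageValue: "5"
```

With a spec like this the HPA can drop the Deployment to zero replicas while the queue is empty and scale it back out once the external metric reports pending messages.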
Not required until feature graduated to beta.
- Testing: Are there any tests for failure mode? If not, describe why.
-->
What about the failure mode (albeit the same as currently) where the controller is unable to fetch the relevant object or external metrics being used for scale from zero?
Done.
-->

###### What steps should be taken if SLOs are not being met to determine the problem?
As above.
Done.
Co-authored-by: Guy Templeton <[email protected]>
Thanks for the review @gjtempleton, I'll look into your questions by the end of the week.
- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HPAScaleToZero`
  - Components depending on the feature gate: `kube-apiserver`
kube-controller-manager? Where does HPA controller live?
Currently the gate `HPAScaleToZero` is only required on the kube-apiserver.
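For anyone who wants to try the alpha behaviour locally, one way to enable the gate is via a kind cluster config (a sketch, assuming a kind release that supports the `featureGates` field); on other clusters the equivalent is passing `--feature-gates=HPAScaleToZero=true` to the kube-apiserver:

```yaml
# kind-config.yaml: enables the HPAScaleToZero gate on all components,
# which covers the kube-apiserver validation mentioned above.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAScaleToZero: true
```

Create the cluster with `kind create cluster --config kind-config.yaml`, after which HPAs with `minReplicas: 0` should pass validation.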
- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `HPAScaleToZero`
  - Components depending on the feature gate: `kube-apiserver`
- [x] Other
I don't think you need to fill this section, the feature gate is sufficient.
-->

Yes. To downgrade the cluster to a version that does not support the scale-to-zero feature:
or if you turn off the feature gate?
###### What happens if we reenable the feature if it was previously rolled back?

Nothing, the feature can be re-enabled without problems.
for example, will "stuck at 0" workloads start getting scaled again?
Added.
will rollout across nodes.
-->

There are no expected side-effects when the rollout fails, as the new `enableScaleToZero` flag should only be enabled once the version upgrade has completed, and should be disabled before attempting a rollback.
If the new code in the HPA controller crashes, will it block other HPAs that don't use the feature?
I'm not entirely certain about the details of panic handling in k8s controllers, but the HPA controller implements essentially a queue with multiple workers, so if one of those workers consistently crashes, other HPAs would be affected as well unless the default k8s queue implementation somehow protects against this.
are missing a bunch of machinery and tooling and can't do that now.
-->

Not yet, as no implementation is available.
don't you have an alpha implementation?
logs or events for this purpose.
-->

The feature is used if workloads are scaled to zero by the autoscaling controller.
If you have 50,000 clusters, a one liner is not really helpful. Metrics are best. Can kube state metrics help?
If it's not reasonable/practical to have a metric, then other means can be OK.
implementation difficulties, etc.).
-->

No, in regards to this KEP.
see comments above...
Would it be useful to track how often scaling to/from zero happens?
The number of replicas for each workload that can be targeted by an HPA is already a metric, so tracking how often a workload is scaled to 0 works similarly to observing scaling activity today. Is this sufficient?
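As a sketch of what that could look like at fleet scale, here is a Prometheus recording rule over kube-state-metrics data counting HPA-targeted workloads currently sitting at zero replicas; the metric names assume kube-state-metrics v2.x, and the rule and group names are arbitrary:

```yaml
# A sketch of a recording rule: per namespace, how many HPAs that allow
# scale-to-zero (minReplicas: 0) currently report zero replicas.
groups:
- name: hpa-scale-to-zero
  rules:
  - record: namespace:hpa_scaled_to_zero:count
    expr: |
      count by (namespace) (
          kube_horizontalpodautoscaler_status_current_replicas == 0
        and
          kube_horizontalpodautoscaler_spec_min_replicas == 0
      )
```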
Thanks for the feedback, pushed some updates.
Currently disabling HPA is possible by manually setting the scaled resource to `replicas: 0`. This works as the HPA could never reach this state itself.

As `replicas: 0` is now a possible state when using `minReplicas: 0`, it can no longer be used to differentiate between manually disabled and automatically scaled to zero.

Additionally, the `replicas: 0` state is problematic because updating an HPA object's `minReplicas` from `0` to `1` has different behavior: if `replicas` was `0` during the update, HPA will be disabled for the resource; if it was `> 0`, HPA will continue with the new `minReplicas` value.

To resolve this issue the KEP introduces an explicit `enableScaleToZero` property to enable/disable scaling from/to zero.
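For concreteness, a sketch of where the proposed property might sit in an HPA spec; `enableScaleToZero` is only a proposal in this KEP and does not exist in the current `autoscaling/v2` API, and the metric shown is a made-up example:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  enableScaleToZero: true       # proposed field, not part of the current API
  minReplicas: 0
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # example external metric
      target:
        type: AverageValue
        averageValue: "5"
```

The intent described above is that, with the flag unset or `false`, today's behaviour is kept and `replicas: 0` on the scaled resource continues to mean the HPA is disabled.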
What if instead of having an `enableScaleToZero` property, we allow HPAs to scale to 0 if they are for object or external metrics, and introduce a `disabled` flag, allowing someone to disable the HPA completely?

In the world with `enableScaleToZero` enabled, how would someone disable that HPA?
> What if instead of having an `enableScaleToZero` property, we allow HPAs to scale to 0 if they are for object or external metrics, and introduce a `disabled` flag, allowing someone to disable the HPA completely? In the world with `enableScaleToZero` enabled, how would someone disable that HPA?

Is it even true that 0 disables? Last time I tested, 0 was rejected as invalid and there was no option for disabling (I had to delete HPAs to "temporarily disable autoscaling"). (I'm not in a position to test this right now.)

FWIW, I think an explicit "allow scaling to zero" flag is a better option anyway. And an explicit disable flag would be very useful regardless of scale to zero.
> Is it even true that 0 disables? Last time I tested, 0 was rejected as invalid and there was no option for disabling (I had to delete HPAs to "temporarily disable autoscaling").

On which resource specifically? The disabling of the HPA happens when the scaled resource's replicas is set to zero, see:
- https://github.com/kubernetes/kubernetes/blob/1d73c5d882fdb70037e5944cb9492415b4187774/pkg/controller/podautoscaler/horizontal.go#L817-L821
- https://github.com/kubernetes/kubernetes/pull/29212/files

> FWIW, I think an explicit "allow scaling to zero" flag is a better option anyway.

Why so? Isn't setting `minReplicas: 0` an explicit way to say "I'm OK with scaling down to zero pods"? We can also safe-guard it by only allowing object or external metrics.
replaces:

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
I think this may need to stay as alpha, since `enableScaleToZero` is being introduced. See: https://github.com/kubernetes/community/blob/ecfd8bbf586d7bcd9767171c4055f2cf6502bb85/contributors/devel/sig-architecture/feature-gates.md?plain=1#L80-L81

The scenario I'm worried about is that someone uses `enableScaleToZero` in 1.33 (I know the feature missed 1.33, this is just an example) when the feature gate is set to beta and enabled by default. They then downgrade to 1.32, and now the API server fails on HPA resources with `enableScaleToZero` set.

If `enableScaleToZero` had been added in 1.32 with the alpha feature gate defaulted to off, this wouldn't have been a problem.
Enhancement issue: #2021
Rendered version https://github.com/johanneswuerbach/enhancements/blob/kep-2021-alpha/keps/sig-autoscaling/2021-scale-from-zero/README.md