Add cooldownOnlyAfterHpaMinReplica to control cooldown behavior #7349

Open

rlanhellas wants to merge 17 commits into kedacore:main from rlanhellas:issue-7204

Conversation

@rlanhellas
Contributor

@rlanhellas rlanhellas commented Dec 30, 2025

Added a boolean field spec.advanced.cooldownOnlyAfterHpaMinReplica. When enabled, it does the following (a minimal manifest sketch follows the list):

  • Checks whether the current replica count equals the HPA minimum replicas.
  • Only then waits for the cooldown period before scaling to zero.
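
A minimal ScaledObject sketch showing where the proposed field sits (resource names and the trigger are placeholders for illustration):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-so                          # placeholder name
spec:
  scaleTargetRef:
    name: my-app                           # placeholder Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 300                      # seconds to wait before scaling to zero
  advanced:
    cooldownOnlyAfterHpaMinReplica: true   # proposed field: wait until the HPA is at its minimum before starting the cooldown
  triggers:
    - type: cpu                            # placeholder trigger
      metricType: Utilization
      metadata:
        value: "50"
```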

Checklist

Fixes #7204

Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
@rlanhellas rlanhellas requested a review from a team as a code owner December 30, 2025 18:02
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team December 30, 2025 18:02
@snyk-io

snyk-io bot commented Dec 30, 2025

Snyk checks have passed. No issues have been found so far.

Open Source Security: 0 critical, 0 high, 0 medium, 0 low (0 issues total)


@semgrep-app semgrep-app bot Dec 30, 2025

In Kubernetes, each pod runs in its own isolated environment with its own set of security policies. However, certain container images may contain setuid or setgid binaries that could allow an attacker to perform privilege escalation and gain access to sensitive resources. To mitigate this risk, it's recommended to add a securityContext to the container in the pod, with the parameter allowPrivilegeEscalation set to false. This will prevent the container from running any privileged processes and limit the impact of any potential attacks. By adding the allowPrivilegeEscalation parameter to your securityContext, you can help to ensure that your containerized applications are more secure and less vulnerable to privilege escalation attacks.

🍰 Removed in commit c819f49 🍰

@semgrep-app

semgrep-app bot commented Dec 30, 2025

Semgrep found 2 context-todo findings:

Consider to use well-defined context

@rickbrouwer
Member

Since I don't think the example in the issue is entirely accurate, I'm wondering what we're actually trying to solve here. Could you describe the use case again?

@rlanhellas
Contributor Author

rlanhellas commented Dec 31, 2025

Since I don't think the example in the issue is entirely accurate, I'm wondering what we're actually trying to solve here. Could you describe the use case again?

Sure thing. Let me try to explain ...

Imagine you have the following config: minReplicaCount=0, maxReplicaCount=10. The HPA will scale out to 10 pods, but when there is no more load, KEDA will scale to 0 immediately after the cooldownPeriod. The request in issue #7204 was to let the HPA first scale in to its own minimum (1 replica) and only then have KEDA start the cooldownPeriod before scaling to 0. That way the pods get a chance to receive any remaining load before going to 0.

So the behavior will be something like this (a sketch of such a configuration follows the list):

  • We reach the maximum of 10 pods.
  • Load stops.
  • The HPA scales in to 1 pod.
  • KEDA starts the cooldownPeriod once the HPA is at 1 pod (min replicas).
  • KEDA scales to 0 after 5 minutes.
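
Something like the following ScaledObject could express that scenario. This is only a sketch: the trigger, names, and the scaleDown policy values are assumptions for illustration; cooldownOnlyAfterHpaMinReplica is the field proposed in this PR.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-so                          # placeholder name
spec:
  scaleTargetRef:
    name: worker                           # placeholder Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 300                      # wait 5 minutes before the final 1 -> 0 step
  advanced:
    cooldownOnlyAfterHpaMinReplica: true   # proposed: start the countdown only once the HPA is at its minimum
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
            - type: Pods
              value: 2                     # assumed policy: remove at most 2 pods per minute
              periodSeconds: 60
  triggers:
    - type: prometheus                     # placeholder trigger
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total[1m]))
        threshold: "100"
```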

Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
@rickbrouwer
Member

Ah, yes, now I remember. Thanks. Considering that the KEDA cooldown period only applies when scaling to 0, I find the current naming a bit confusing, but I'm still thinking about the best way to solve this. @JorTurFer suggests solving this by "not replacing current behavior but supporting both."

But reading the documentation about the cooldownPeriod:

"The period to wait after the last reported active trigger before scaling the resource back to 0, in seconds."

But also:

"Additionally, the KEDA cooldownPeriod only applies when scaling to 0; scaling from 1 to N replicas is handled by the Kubernetes Horizontal Pod Autoscaler."

The issue in that sense, I think, is that because of the scaleDown policy you can specify, the cooldownPeriod is already "exhausted" while the policy is still executing, and therefore scaling back to 0...(?)

In retrospect, and also because of what the documentation indicates, I think it might have been better if scaling from N to 1 were handled purely by the HPA and the KEDA cooldownPeriod handled scaling from 1 to 0. That would make it easier to explain the responsibilities, as we now have to consider both the scaleDown calculation and the cooldownPeriod (separation of concerns).

Perhaps I'm overthinking things, but I'm very curious how we'll explain this simply in the documentation.

@rlanhellas
Contributor Author

rlanhellas commented Dec 31, 2025

Ah, yes, now I remember. Thanks. Considering that the KEDA cooldown period only applies when scaling to 0, I find the current naming a bit confusing, but I'm still thinking about the best way to solve this. @JorTurFer suggests solving this by "not replacing current behavior but supporting both."

But reading the documentation about the cooldownPeriod:

"The period to wait after the last reported active trigger before scaling the resource back to 0, in seconds."

But also:

"Additionally, the KEDA cooldownPeriod only applies when scaling to 0; scaling from 1 to N replicas is handled by the Kubernetes Horizontal Pod Autoscaler."

The issue in that sense, I think, is that because of the scaleDown policy you can specify, the cooldownPeriod is already "exhausted" while the policy is still executing, and therefore scaling back to 0...(?)

In retrospect, and also because of what the documentation indicates, I think it might have been better if scaling from N to 1 were handled purely by the HPA and the KEDA cooldownPeriod handled scaling from 1 to 0. That would make it easier to explain the responsibilities, as we now have to consider both the scaleDown calculation and the cooldownPeriod (separation of concerns).

Perhaps I'm overthinking things, but I'm very curious how we'll explain this simply in the documentation.

IMO, with this change the cooldownPeriod can have two behaviors, which we can explain this way:

  • When cooldownOnlyAfterHpaMinReplica is disabled (the default), the cooldownPeriod countdown starts as soon as the scaler becomes inactive, so KEDA will scale to zero regardless of the HPA scale-down policy.
  • When cooldownOnlyAfterHpaMinReplica is enabled (the new behavior), the countdown starts ONLY after the pod count reaches the HPA minimum replicas; that is, we wait for the HPA to scale down to its minimum before scaling to zero, respecting the HPA scale-down policy before setting pods to zero.

We can of course give some examples to make it clearer; examples are the easiest way to explain it.
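
To illustrate the difference, here is an assumed timeline for the 10-replica scenario sketched earlier (the exact numbers depend on the configured scaleDown policy):

```yaml
# Illustrative timeline only, not output of any tool.
#
# cooldownOnlyAfterHpaMinReplica: false (current default)
#   t=0s      last trigger becomes inactive (10 replicas running)
#   t=300s    cooldownPeriod elapses; KEDA scales straight to 0, even if the
#             HPA scaleDown policy has only partially drained the replicas
#
# cooldownOnlyAfterHpaMinReplica: true (proposed)
#   t=0s      last trigger becomes inactive (10 replicas running)
#   t=0-300s  the HPA scaleDown policy gradually drains 10 -> 1 replicas
#   t=300s    replicas reach the HPA minimum (1); the cooldownPeriod countdown starts
#   t=600s    cooldownPeriod (300s) elapses; KEDA scales 1 -> 0
```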

@rlanhellas
Contributor Author

@rickbrouwer, could you run the e2e tests?

@rickbrouwer
Member

rickbrouwer commented Jan 3, 2026

I see several files point to your private repository, which can't be the intention.

Furthermore, I'd like to discuss adding this field with the other @kedacore/keda-contributors first because I still have some questions about this field

Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
@rlanhellas
Contributor Author

I see several files point to your private repository, which can't be the intention.

Furthermore, I'd like to discuss adding this field with the other @kedacore/keda-contributors first because I still have some questions about this field

Just removed the extra YAML files, @rickbrouwer. About the new field, maybe it's better to include @JorTurFer and @zroubalik; they were aware of this new field.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new feature cooldownOnlyAfterHpaMinReplica to allow KEDA to honor HPA scale-down policies before scaling to zero. When enabled, KEDA waits for the HPA to gradually scale down to the minimum replica count, then applies the cooldown period before finally scaling to zero. This addresses issue #7204 and helps reduce latency during short traffic bursts by ensuring pods remain available during gradual scale-down.

  • Adds CooldownOnlyAfterHpaMinReplica boolean field to ScaledObject spec
  • Introduces HpaMinReplicaSinceTime status field to track when replicas reach HPA minimum
  • Updates scaling logic to respect the new cooldown behavior when feature is enabled

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

Summary per file:

  • apis/keda/v1alpha1/scaledobject_types.go: Adds CooldownOnlyAfterHpaMinReplica field to AdvancedConfig and HpaMinReplicaSinceTime to ScaledObjectStatus
  • apis/keda/v1alpha1/zz_generated.deepcopy.go: Auto-generated DeepCopy method for the new HpaMinReplicaSinceTime field
  • pkg/scaling/executor/scale_scaledobjects.go: Core logic to track the HPA min-replica time and implement the new cooldown behavior
  • pkg/scaling/executor/scale_scaledobjects_test.go: Unit tests validating the new cooldown feature behavior
  • tests/internals/polling_cooldown_so/polling_cooldown_so_test.go: Integration test for the cooldown-only-after-HPA-min-replica feature
  • config/crd/bases/keda.sh_scaledobjects.yaml: CRD schema updates for the new fields
  • config/manager/kustomization.yaml: CRITICAL: Changes image to a personal Docker Hub account (should be reverted)
  • config/metrics-server/kustomization.yaml: CRITICAL: Changes image to a personal Docker Hub account (should be reverted)
  • config/webhooks/kustomization.yaml: CRITICAL: Changes image to a personal Docker Hub account (should be reverted)
  • pkg/scalers/liiklus/LiiklusService_grpc.pb.go: Auto-generated protobuf code with version updates and a typo
  • pkg/scalers/liiklus/LiiklusService.pb.go: Auto-generated protobuf code with version updates
  • pkg/scalers/externalscaler/externalscaler_grpc.pb.go: Auto-generated protobuf code with version updates and a typo
  • pkg/scalers/externalscaler/externalscaler.pb.go: Auto-generated protobuf code with version updates
  • pkg/metricsservice/api/metrics_grpc.pb.go: Auto-generated protobuf code with version updates and a typo
  • pkg/metricsservice/api/metrics.pb.go: Auto-generated protobuf code with version updates
  • pkg/mock/mock_scaling/mock_executor/mock_interface.go: Auto-generated mock with import reordering
  • CHANGELOG.md: Documents the new feature (contains a spelling error)


rlanhellas and others added 6 commits January 5, 2026 08:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
rlanhellas and others added 2 commits January 5, 2026 08:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Ronaldo Lanhellas <ronaldo.lanhellas@gmail.com>
@rickbrouwer
Member

rickbrouwer commented Jan 6, 2026

The proposal in the issue is to honor the scale-down policy defined in the Horizontal Pod Autoscaler (HPA) before enforcing the scale to zero.
I think I agree with that.
I just ran a test where I created a scaleDown policy that takes longer than my cooldownPeriod. However, as soon as the cooldownPeriod expires, KEDA immediately scales to 0 instead of respecting my configured scaleDown policy.

The cooldownPeriod should only apply to the step from 1 to 0. It's perfectly fine for the cooldownPeriod to start counting as soon as the last trigger indicates it's no longer active, but while the replica count is still greater than 1 it simply shouldn't scale to 0 in my opinion, since there's likely still an HPA scaleDown policy running and it must remain respected.

Once the cooldownPeriod is over and the replica count is down to 1 (the minimum), KEDA can scale it to 0 because everything is fine.

This way, we respect the HPA scaleDown policy, but also the KEDA cooldown period.
WDYT @wozniakjan ?

@rlanhellas
Contributor Author


It's exactly the way it's implemented, @rickbrouwer.

@rickbrouwer
Member

rickbrouwer commented Jan 7, 2026

That's nice, but I think it should be without an extra (new) configuration parameter.

The documentation also states that the cooldownPeriod can only be used to scale down to 0 replicas, and that everything else is handled by the HPA (however, the documentation says "scaling from 1 to N replicas is handled by the Kubernetes Horizontal Pod Autoscaler", which I believe should read "scaling from N to 1 replicas is handled by the Kubernetes Horizontal Pod Autoscaler", because this passage explains the cooldown).

I'll give another example.
Suppose I have a minReplicaCount of 0 and a maxReplicaCount of 50. I do NOT set a cooldownPeriod. I'm going to configure a proper scaleDown policy because I want it to gradually scale down at a certain rate. Because of my many running replicas, that could take 10 minutes, for example. I configure that, so that's what I expect. After all, we ourselves define KEDA as being on top of HPA. So, I don't configure a cooldownPeriod, but because the default is 300 seconds, KEDA suddenly overrules my own scaleDown policy and scales straight to 0 pods.
In that sense, that would mean I have to calculate for each ScaledObject how long my scaleDown policy can theoretically last and then calculate my cooldownPeriod based on that so they don't conflict. That can't be right.
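
To make the conflict concrete, here is a hedged sketch of such a ScaledObject (names and policy values are assumptions): the scaleDown policy alone would need roughly 10 minutes to drain 50 replicas, yet the default cooldownPeriod of 300 seconds forces 0 replicas after 5 minutes.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: burst-worker-so                    # placeholder name
spec:
  scaleTargetRef:
    name: burst-worker                     # placeholder Deployment
  minReplicaCount: 0
  maxReplicaCount: 50
  # cooldownPeriod is NOT set, so the default of 300 seconds applies
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
            - type: Pods
              value: 5                     # assumed: drop at most 5 pods per minute (~10 min from 50 to 1)
              periodSeconds: 60
  triggers:
    - type: prometheus                     # placeholder trigger
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(jobs_enqueued_total[1m]))
        threshold: "10"
```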

Furthermore, I asked several users yesterday, just to check, if they could explain the cooldownPeriod in combination with a scaleDown. When I indicated that the cooldownPeriod now overrules the configured scaleDown policy, they all said it must be a bug.

My argument now remains that we shouldn't want an extra parameter. That would only mean extra explanation for (new) users, making things unnecessarily complicated.

I think the fix should be that, once the cooldownPeriod has expired, KEDA only scales to 0 once the replica count has reached 1, not before.

@rlanhellas
Contributor Author

rlanhellas commented Jan 7, 2026

That's nice, but I think it should be without an extra (new) configuration parameter.

The documentation also states that the cooldownPeriod can only be used to scale down to 0 replicas, and that everything else is handled by the HPA (however, the documentation says "scaling from 1 to N replicas is handled by the Kubernetes Horizontal Pod Autoscaler", which I believe should read "scaling from N to 1 replicas is handled by the Kubernetes Horizontal Pod Autoscaler", because this passage explains the cooldown).

I'll give another example. Suppose I have a minReplicaCount of 0 and a maxReplicaCount of 50. I do NOT set a cooldownPeriod. I'm going to configure a proper scaleDown policy because I want it to gradually scale down at a certain rate. Because of my many running replicas, that could take 10 minutes, for example. I configure that, so that's what I expect. After all, we ourselves define KEDA as being on top of HPA. So, I don't configure a cooldownPeriod, but because the default is 300 seconds, KEDA suddenly overrules my own scaleDown policy and scales straight to 0 pods. In that sense, that would mean I have to calculate for each ScaledObject how long my scaleDown policy can theoretically last and then calculate my cooldownPeriod based on that so they don't conflict. That can't be right.

Furthermore, I asked several users yesterday, just to check, if they could explain the cooldownPeriod in combination with a scaleDown. When I indicated that the cooldownPeriod now overrules the configured scaleDown policy, they all said it must be a bug.

My argument now remains that we shouldn't want an extra parameter. That would only mean extra explanation for (new) users, making things unnecessarily complicated.

I think the fix should be that, once the cooldownPeriod has expired, KEDA only scales to 0 once the replica count has reached 1, not before.

I got your point. You see the current behavior as a bug, so we should fix it with the new behavior rather than creating a feature flag to enable or disable it. I agree with you; for me, going to zero without respecting the scaleDown policy is odd, and I can't see a user relying on it this way.

@rickbrouwer
Member

Great. I'm curious if @JorTurFer and/or @wozniakjan can confirm that we want to solve this so you (@rlanhellas) can continue with this PR (only then with the setup as I described).

@JorTurFer JorTurFer mentioned this pull request Jan 11, 2026
@JorTurFer
Member

JorTurFer commented Jan 11, 2026

I agree with you, @rickbrouwer, about how it should be; the problem I see is that a bug that has been there for 7 years is a feature, not a bug. For a hypothetical v3 we should definitely improve this behavior, but I wouldn't change it at this moment because IMHO it's a breaking change.

For example, we use scaling to zero for some workers: scaling up as fast as we need, but also removing resources fast, is part of the behavior we rely on. I agree that we could just set the scale-down cooldown to 0 and that's all, because it puts us in almost the same scenario, but it requires manual intervention and I'm not sure we should force it (we have just 2-3 cases, so my personal case doesn't matter, but other customers could complain about this too).

An option that comes to mind for handling this is managing the behavior via annotations or an operator flag: we could mark the current behavior as legacy and explain that eventually we will switch to the new behavior. The problem I see with this is that we need a solid plan to remove these flags, e.g. an eventual v3 or a CRD v2.

Personally, I'm not against changing the behavior but we need to honor the deprecation policy somehow (and probably revisit it in general because this is totally forbidden currently)

@kedacore/keda-core-maintainers ?

@rlanhellas
Contributor Author

rlanhellas commented Jan 11, 2026

I agree with you, @rickbrouwer, about how it should be; the problem I see is that a bug that has been there for 7 years is a feature, not a bug. For a hypothetical v3 we should definitely improve this behavior, but I wouldn't change it at this moment because IMHO it's a breaking change.

For example, we use scaling to zero for some workers: scaling up as fast as we need, but also removing resources fast, is part of the behavior we rely on. I agree that we could just set the scale-down cooldown to 0 and that's all, because it puts us in almost the same scenario, but it requires manual intervention and I'm not sure we should force it (we have just 2-3 cases, so my personal case doesn't matter, but other customers could complain about this too).

An option that comes to mind for handling this is managing the behavior via annotations or an operator flag: we could mark the current behavior as legacy and explain that eventually we will switch to the new behavior. The problem I see with this is that we need a solid plan to remove these flags, e.g. an eventual v3 or a CRD v2.

Personally, I'm not against changing the behavior but we need to honor the deprecation policy somehow (and probably revisit it in general because this is totally forbidden currently)

@kedacore/keda-core-maintainers ?

In the sense of a breaking change, and since this behavior has been in place for 7 years, it's really a risk to just change it; keeping the feature flag currently proposed in this PR would be better.

@rickbrouwer
Member

rickbrouwer commented Jan 12, 2026

Yes, in that sense, the bug might have become a feature because it's being misused.

If we're going to assume this has been in place for seven years, I'm not in favor of adding another parameter to fix potential bug abuse, because then we'll have to deal with it later and phase it out.

Perhaps we should indeed move to V3 then, along with a few other major items already marked for V3.

I'd really like the PR to wait until we're absolutely sure about the choice. We shouldn't rush into a decision and end up with a new parameter that we then have to phase out later, with all the new consequences that entails.

I'll try to be at the community call tomorrow. We can discuss it then.

@wozniakjan
Member

wozniakjan commented Jan 15, 2026

sorry for being late to the party, I heard this was a fun discussion so I finally arrived :)

I guess we can't easily claim this is a bug and fix it outright, but we also don't want to complicate the SO API UX with an extra field that will eventually need to be deprecated and phased out.

But the operator CLI argument doesn't sound half bad to me. It's something that won't burden KEDA users with a more complicated ScaledObject and won't be an API parameter we need to take through the deprecation path, yet it gives the KEDA admin the power to get the desired behavior for their organization. By default, we probably want to keep the original legacy behavior: spec.cooldownPeriod (and initialCooldownPeriod, which is IMHO worse) will ignore a custom spec.advanced.horizontalPodAutoscalerConfig.behavior.scaleDown. In the docs, though, we should be more explicit about what this means and how KEDA admins can enforce the alternative behavior that is more in line with general user expectations: wait for the HPA to scale down to 1 and then start the countdown for scaling to zero.

@rickbrouwer
Member

Then I would opt for an operator CLI argument/flag and, logically, mention it in the documentation. That ensures we don't end up with an extra parameter that we might have to deprecate later, which would then also affect users.

@rlanhellas
Contributor Author

Then I would opt for an operator CLI argument/flag and, logically, mention it in the documentation. That ensures we don't end up with an extra parameter that we might have to deprecate later, which would then also affect users.

Any suggestions for the CLI argument name?

@rickbrouwer
Member

I have these two proposals:

strictCooldownBehavior (true / false)
enableStrictCooldown (true / false)

@rlanhellas
Contributor Author


So, strictCooldownBehavior=true will enable the new behavior where KEDA respects the HPA scale-down policy, right?

@rickbrouwer
Member


So, strictCooldownBehavior=true will enable the new behavior where KEDA respects the HPA scale-down policy, right?

Yes. I think that will require an adjustment in the scaleToZeroOrIdle code.

@rlanhellas
Contributor Author


So, strictCooldownBehavior=true will enable the new behavior where KEDA respects the HPA scale-down policy, right?

Yes. I think that will require an adjustment in the scaleToZeroOrIdle code.

May I go ahead with this solution, or do you want to wait for someone else to take a look at this?

@rickbrouwer
Member

Yeah, I think it's good that you want to get started on it. And thank you also for your patience regarding this piece!
At most, the name of the flag may still change if others disagree with it or have a better one, but that is purely renaming the flag, which is not much extra work.

So from my side, go ahead :)

@wozniakjan
Member

wozniakjan commented Jan 16, 2026

PTAL @JorTurFer and @zroubalik. TL;DR: instead of a ScaledObject field, the behavior will be controlled by an operator CLI flag, defaulting to the legacy behavior that ignores custom HPA scale-down policies.

I personally like the operator CLI arg, and --strictCooldownBehavior sounds good to me (alternatively I would propose --honorHpaScaleDownForCooldownPeriod if we need more options, but IMHO --strictCooldownBehavior is better).
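
For illustration only, wiring such a flag on the keda-operator Deployment could look roughly like this. The flag is not implemented yet; the name, default, and exact wiring are assumptions, and existing operator arguments are omitted.

```yaml
# Hypothetical sketch of the proposed operator-level flag (not yet implemented).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          args:
            - --strictCooldownBehavior=true   # proposed: wait for the HPA to reach its minimum before starting the scale-to-zero cooldown
```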

@JorTurFer
Member

The CLI solution is perfect and the current name makes sense, so it's fine IMHO.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Honour HPA Scale-Down Policy When Scaling Down to Zero Replicas

4 participants