feat: Use informer cache for ReplicaSet replica count lookups#7466

Open
mattshep wants to merge 4 commits into kedacore:main from daily-co:replicaset-informer-cache

Conversation

@mattshep

GetCurrentReplicas() has special handling for Deployments and StatefulSets that uses the controller-runtime client (backed by informer cache) instead of the scale subresource API. This avoids live API calls on every poll.

ReplicaSets were missing this optimization and always used the scale subresource, causing a live API call every polling interval. With many ScaledObjects targeting ReplicaSets, this creates significant API server load.

This change adds ReplicaSet to the list of resource types that use the informer cache, reducing API calls for ReplicaSet-targeted ScaledObjects.
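The dispatch described above can be sketched as follows. This is a minimal, self-contained model of the idea, not KEDA's actual code: the `GroupVersionKind` struct and the `usesInformerCache` helper are illustrative names, and the real implementation reads replicas via the controller-runtime client rather than returning a boolean.

```go
package main

import "fmt"

// GroupVersionKind is a simplified stand-in for the Kubernetes GVK type.
type GroupVersionKind struct{ Group, Version, Kind string }

// usesInformerCache reports whether the replica count for the target kind
// can be read from the controller-runtime informer cache instead of the
// live /scale subresource. Before this change, only Deployment and
// StatefulSet qualified; the patch adds ReplicaSet to the list.
func usesInformerCache(gvk GroupVersionKind) bool {
	if gvk.Group != "apps" {
		return false
	}
	switch gvk.Kind {
	case "Deployment", "StatefulSet", "ReplicaSet": // ReplicaSet is the new case
		return true
	}
	return false
}

func main() {
	targets := []GroupVersionKind{
		{"apps", "v1", "Deployment"},
		{"apps", "v1", "ReplicaSet"},
		{"argoproj.io", "v1alpha1", "Rollout"},
	}
	for _, t := range targets {
		if usesInformerCache(t) {
			fmt.Printf("%s: informer cache\n", t.Kind)
		} else {
			fmt.Printf("%s: scale subresource (live API call)\n", t.Kind)
		}
	}
}
```

Kinds outside the list still fall through to the scale subresource, so custom resources (e.g. Argo Rollouts) keep their existing behavior.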

Validated in a staging environment with 67 ScaledObjects targeting ReplicaSets:

  • Before: ~767 GET requests/minute to /apis/apps/v1/.../replicasets/.../scale
  • After: Dropped to ~128/minute

Note: there are no existing unit tests covering these case statements.

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • I have verified that my change complies with the deprecations & breaking changes policy
  • Tests have been added (if applicable)
  • Ensure make generate-scalers-schema has been run to update any outdated generated files
  • Changelog has been updated and is aligned with our changelog requirements, only when the change impacts end users
  • A PR is opened to update our Helm chart (repo) (if applicable, i.e. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)

@mattshep mattshep requested a review from a team as a code owner February 21, 2026 14:05
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@snyk-io

snyk-io bot commented Feb 21, 2026

Snyk checks have passed. No issues have been found so far.

Open Source Security: 0 critical, 0 high, 0 medium, 0 low (0 issues total)


@keda-automation keda-automation requested a review from a team February 21, 2026 14:05
Member

@JorTurFer JorTurFer left a comment


I think there isn't any way to test this with unit tests. Could you tell us something about the use case where you need to scale ReplicaSets directly? We can cover it with e2e tests by adding a case where KEDA scales the ReplicaSet directly, ensuring that this code works.

@mattshep
Author

We run a platform where users deploy workloads that scale based on a Redis queue (pending session requests). When a user pushes a new version, we want the old version to remain independently scalable — for example, to handle traffic surges if the new version is broken.

With Deployment-targeted ScaledObjects, only the current ReplicaSet scales; old ReplicaSets are stuck at their replica count. We create ReplicaSets directly (rather than one Deployment per revision) because ReplicaSets are the natural primitive — Deployments add rollout logic we don't need since we manage revision lifecycle ourselves.

Happy to help with e2e test scenarios if useful.

@JorTurFer
Member

> We run a platform where users deploy workloads that scale based on a Redis queue (pending session requests). When a user pushes a new version, we want the old version to remain independently scalable — for example, to handle traffic surges if the new version is broken.
>
> With Deployment-targeted ScaledObjects, only the current ReplicaSet scales; old ReplicaSets are stuck at their replica count. We create ReplicaSets directly (rather than one Deployment per revision) because ReplicaSets are the natural primitive — Deployments add rollout logic we don't need since we manage revision lifecycle ourselves.
>
> Happy to help with e2e test scenarios if useful.

With this in mind, I'd take a test like https://github.com/kedacore/keda/tree/main/tests/internals/subresource_scale as inspiration and write another one specifically for ReplicaSets. The main goal is to have this new use case covered so we avoid breaking it in the future.

Does it make sense?

@keda-automation keda-automation requested a review from a team February 22, 2026 22:57
@JorTurFer
Member

JorTurFer commented Feb 22, 2026

/run-e2e replicaset
Update: You can check the progress here

mattshep added a commit to daily-co/keda-charts that referenced this pull request Feb 23, 2026
Syncs RBAC changes with core: kedacore/keda#7466

This adds list/watch permissions for ReplicaSets to enable the informer
cache optimization when targeting ReplicaSets directly with ScaledObjects.

Signed-off-by: Matt Sheppard <matthewlouissheppard@gmail.com>
@keda-automation keda-automation requested a review from a team February 23, 2026 13:21

Signed-off-by: Matt Sheppard <matt.sheppard@daily.co>
- Add replicasets to ClusterRole for informer cache list/watch
- Move WaitForReplicaSetReplicaReadyCount to tests/helper package
- Update e2e test to use shared helper function

Signed-off-by: Matt Sheppard <matt.sheppard@daily.co>
@mattshep mattshep force-pushed the replicaset-informer-cache branch from 5b08cfb to 740515e Compare February 23, 2026 13:27
@JorTurFer
Member

JorTurFer commented Feb 23, 2026

/run-e2e replicaset
Update: You can check the progress here
