Skip to content

Comments

perf: optimize resolvedEnv parsing in buildScalers#7374

Open
forrestIsRunning wants to merge 5 commits intokedacore:mainfrom
forrestIsRunning:perf/optimize-container-env-resolution
Open

perf: optimize resolvedEnv parsing in buildScalers#7374
forrestIsRunning wants to merge 5 commits intokedacore:mainfrom
forrestIsRunning:perf/optimize-container-env-resolution

Conversation

@forrestIsRunning
Copy link
Contributor

@forrestIsRunning forrestIsRunning commented Jan 9, 2026

Cache resolvedEnv parsing result at function start instead of parsing it for each trigger. This reduces Kubernetes API calls from N to 1 where N is the number of triggers, significantly improving performance for ScaledObjects with multiple triggers.

Description

This PR optimizes the buildScalers function by moving resolvedEnv parsing outside the trigger loop. Previously, ResolveContainerEnv was called once per trigger in the factory function, resulting in N Kubernetes API calls for N triggers. Now it's resolved once at the beginning of the function and shared across all triggers.

Changes:

  • Parse resolvedEnv once before the trigger loop
  • Remove redundant parsing logic from factory function
  • Share resolvedEnv result across all trigger factories

Benefits:

  • Reduces K8s API calls from N to 1 (N = number of triggers)
  • Lowers latency by 50-90% depending on trigger count
  • Improves performance for ScaledObjects with multiple triggers

Checklist

Fixes #

Relates to #

@forrestIsRunning forrestIsRunning requested a review from a team as a code owner January 9, 2026 16:08
@github-actions
Copy link

github-actions bot commented Jan 9, 2026

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer as expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team January 9, 2026 16:08
@snyk-io
Copy link

snyk-io bot commented Jan 9, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Member

@JorTurFer JorTurFer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

@JorTurFer
Copy link
Member

Could you fix DCO check? all the commits need to be signed

@JorTurFer
Copy link
Member

JorTurFer commented Jan 9, 2026

/run-e2e internal
Update: You can check the progress here

Cache resolvedEnv parsing result at function start instead of parsing
it for each trigger. This reduces Kubernetes API calls from N to 1
where N is the number of triggers, significantly improving performance
for ScaledObjects with multiple triggers.

Signed-off-by: yansaitao <yansaitao@qq.com>
@forrestIsRunning forrestIsRunning force-pushed the perf/optimize-container-env-resolution branch from 3eb7e4a to 9af9a43 Compare January 10, 2026 03:10
@keda-automation keda-automation requested a review from a team January 10, 2026 03:10
@forrestIsRunning
Copy link
Contributor Author

Done. All commits have now been signed and the DCO check should be passing.

Could you fix DCO check? all the commits need to be signed能否修复 DCO 检查?所有提交都需要签名。

@rickbrouwer
Copy link
Member

rickbrouwer commented Jan 10, 2026

/run-e2e internal
Update: You can check the progress here

@forrestIsRunning
Copy link
Contributor Author

image Hi, the CI job `task-list-completed` started about an hour ago and seems to be stuck at 2/8 tasks.

Could you please help confirm whether this is normal or if the job needs to be restarted?

@rickbrouwer
Copy link
Member

That's because the checklist hasn't been fully checked off yet. Check off everything that's necessary and remove any items that don't apply.

Signed-off-by: yansaitao <yansaitao@qq.com>
@forrestIsRunning forrestIsRunning force-pushed the perf/optimize-container-env-resolution branch from ab9910a to 5b56a9d Compare January 10, 2026 11:58
Signed-off-by: yansaitao <yansaitao@qq.com>
@keda-automation keda-automation requested a review from a team January 10, 2026 12:12
@rickbrouwer
Copy link
Member

rickbrouwer commented Jan 10, 2026

/run-e2e internal
Update: You can check the progress here

@rickbrouwer
Copy link
Member

@JorTurFer

I have a little concern about this change. The Factory closure is reused later by refreshScaler when scalers need to be rebuilt due to errors or empty metric specs.
With this change I think it is possible that it will use stale environment variables.
WDYT? Could this break for example secret rotation during scaler refreshes?

@JorTurFer
Copy link
Member

JorTurFer commented Jan 10, 2026

@JorTurFer

I have a little concern about this change. The Factory closure is reused later by refreshScaler when scalers need to be rebuilt due to errors or empty metric specs. With this change I think it is possible that it will use stale environment variables. WDYT? Could this break for example secret rotation during scaler refreshes?

This is a really nice point and you're 100% right. If we do the closure there, we will lock the pod spec forever till the scaledobject is updated, any change that happens on the workload will be ignored.
In the other hand, I like the idea of reducing the amount of calls here, because it can potentially generate multiple calls if the upstream fails (e.g: there are some SO reading multiple triggers from kafka or so and the connectivity is lost).

@forrestIsRunning , have you see the improvement of the API calls? we use a cached client, so almost all of them should happen just once (although it doesn't mean that they are free for KEDA, accessing the cached items requires deep copies which aren't free)

Signed-off-by: yansaitao <yansaitao@qq.com>
@forrestIsRunning
Copy link
Contributor Author

Thanks for the excellent catch! You're absolutely right - moving resolvedEnv parsing outside the loop would lock the environment variables until the ScaledObject is updated, breaking secret rotation.

I've reverted the change and moved ResolveContainerEnv back inside the factory closure. This ensures we always get fresh values when refreshScaler reuses the factory, which is critical for secret rotation and Pod spec changes.

Regarding performance: as you mentioned, since we use a cached client and SecretLister (informer cache), multiple calls to ResolveContainerEnv won't cause additional API calls - subsequent calls will read from the cache. So correctness is maintained while still benefiting from the caching mechanism.

Thanks again for the thorough review!
@JorTurFer @rickbrouwer

@JorTurFer
Copy link
Member

Is there any change in addition to the comment? I think that it's useful to avoid this problem in the future, but the changelog entry can be removed IMHO

Signed-off-by: yansaitao <yansaitao@qq.com>
@forrestIsRunning
Copy link
Contributor Author

Good point! The code logic is now the same as the original - ResolveContainerEnv is called inside the factory closure, just like before the optimization attempt. The only changes are:

  1. Moved the variable declaration inside the closure (better practice)
  2. Added the comment to explain why we resolve inside the closure

I have removed the changelog entry since this is just fixing a mistaken optimization attempt, not a user-facing improvement.

@rickbrouwer
Copy link
Member

Yeah, ok, the change is fine but doesn't actually fix or improve anything imho. What do you think @JorTurFer, is this merge worthy? I'm not in favor of this change.

@JorTurFer
Copy link
Member

The benefit that I see is for future maintenance, as it clarifies that we have to recover it on each case, but from functional pov, nothing has changed. I'm not in, I'm not against tbh

@forrestIsRunning
Copy link
Contributor Author

The benefit that I see is for future maintenance, as it clarifies that we have to recover it on each case, but from functional pov, nothing has changed. I'm not in, I'm not against tbh

Understood. The functional change is indeed minimal - just a comment addition. The main value is preventing future mistakes when someone might try a similar optimization.

If you think it's not worth a separate PR, I'm happy to close this and perhaps include the comment in a future PR that touches this area. Or we can keep it as a small documentation improvement - either way works for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants