
Conversation


@ramon-carrasco ramon-carrasco commented Dec 22, 2025

  • Add configurable maxRetries parameter (default: 0)
  • Implement exponential backoff (2s → 60s cap; see the sketch below)
  • Add unit tests for metadata parsing
  • Update schema files
  • Respect context cancellation

Fixes #7338

Relates to kedacore/keda-docs#1677
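
A minimal sketch of the retry shape described above; the package, helper name, signature, and error wrapping here are illustrative assumptions, not necessarily the exact code in this PR:

```go
package scalers

import (
	"context"
	"fmt"
	"time"
)

// retryWithBackoff is an illustrative helper: it retries fn up to maxRetries
// times (maxRetries = 0 means a single attempt, matching the default),
// doubling the wait between attempts from 2s up to a 60s cap, and it stops
// immediately if ctx is cancelled, including while sleeping between attempts.
func retryWithBackoff(ctx context.Context, maxRetries int, fn func(context.Context) error) error {
	const (
		initialDelay = 2 * time.Second
		maxDelay     = 60 * time.Second
	)

	delay := initialDelay
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if lastErr = fn(ctx); lastErr == nil {
			return nil
		}
		if attempt == maxRetries {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // respect context cancellation during the backoff
		case <-time.After(delay):
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, lastErr)
}
```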

Checklist

  • I have verified that my change is in line with the deprecations & breaking changes policy
  • Tests have been added
  • Ensure make generate-scalers-schema has been run to update any outdated generated files.
  • Changelog has been updated and is aligned with our changelog requirements
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)

Signed-off-by: Ramon Carrasco <[email protected]>
@ramon-carrasco ramon-carrasco requested a review from a team as a code owner December 22, 2025 20:24
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link the related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • Ensure GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team December 22, 2025 20:24

snyk-io bot commented Dec 22, 2025

Snyk checks have passed. No issues have been found so far.

Status   Scanner                 Critical   High   Medium   Low   Total (0)
✅       Open Source Security    0          0      0        0     0 issues


@JorTurFer
Member

Hello
Although I understand the problem, I think this won't work, because the HPA will hit timeouts during long backoffs. The current code holds the HPA request open for the whole backoff, so I'm sure timeouts will happen somewhere. The scenario you're trying to solve can be "mitigated" by using useCachedMetrics at the trigger level. With that, the metric is requested and cached by the operator, so the HPA always sees a value for it.

Could this solve your problem? If not, we need to think about something like storing the last valid value, returning it during the "failing cycle", and deferring the actual retry to the next HPA cycle. WDYT?
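
To make that idea concrete, a rough sketch of "return the last valid value during a failing cycle and retry on the next one"; every type and function name below is hypothetical, not KEDA's actual scaler cache or the useCachedMetrics implementation:

```go
package scalers

import (
	"context"
	"sync"
)

// lastKnownValue is a hypothetical helper illustrating the non-blocking idea:
// the scaler never blocks the HPA request on a backoff; a failed fetch falls
// back to the most recent successful reading, and the retry effectively
// happens on the next polling cycle.
type lastKnownValue struct {
	mu       sync.Mutex
	value    int64
	hasValue bool
}

// Read fetches a fresh queue length; on a transient failure it serves the
// cached value instead of holding the HPA request open during a backoff.
func (l *lastKnownValue) Read(ctx context.Context, fetch func(context.Context) (int64, error)) (int64, error) {
	v, err := fetch(ctx)

	l.mu.Lock()
	defer l.mu.Unlock()

	if err == nil {
		l.value, l.hasValue = v, true
		return v, nil
	}
	if l.hasValue {
		// Failing cycle: the HPA still sees a metric; the next scaler poll
		// retries the Service Bus call.
		return l.value, nil
	}
	return 0, err // nothing cached yet, surface the error
}
```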

@ramon-carrasco
Author

Thanks for the feedback! You're absolutely right about the HPA timeout issue; I missed that.

While I am almost sure the useCachedMetrics approach would reduce the number of errors we are seeing, we would still not get the real-time queue length we need to decide whether scaling is required, which is what we are trying to solve.

I'm happy to try to redesign this with a non-blocking approach (return last known value to HPA + background retry).



Development

Successfully merging this pull request may close these issues.

Azure Service Bus Scaler: Add retry logic for transient API failures
