Add retry logic to Azure Service Bus scaler #7339
ramon-carrasco wants to merge 1 commit into kedacore:main
Conversation
- Add configurable maxRetries parameter (default: 0)
- Implement exponential backoff (2s → 60s cap)
- Add unit tests for metadata parsing
- Update schema files
- Respect context cancellation

Fixes kedacore#7338
Signed-off-by: Ramon Carrasco <ramon.carrasco@duckcreek.com>
Thank you for your contribution! 🙏 Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.
Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have completed successfully, the PR may be merged at a later date. Please be patient. Learn more about our contribution guide.
✅ Snyk checks have passed. No issues have been found so far.
Hello, could this solve your problem? If not, we need to think of something like storing the last valid value and returning it during the "failing cycle", leaving the "retry" to the next cycle rather than blocking HPA cycles. WDYT?
Thanks for the feedback! You're absolutely right about the HPA timeout issue, I missed that. I'm happy to try redesigning this with a non-blocking approach (return the last known value to the HPA + background retry).
What about setting a polling interval of 15 seconds? With this interval you will match the HPA's interval, and the retry will happen automatically during the next HPA cycle. You could also reduce the time to 10 seconds, which makes 6 requests per minute to the upstream, the same amount as today (currently 2 from the operator and another 4 from the HPA; afterwards 6 from the operator and 0 from the HPA).
We are currently using the default polling interval in all ScaledObjects, which means everything should be polling every 30 seconds. I have always seen that as a pretty long interval that shouldn't be the cause of the errors we are seeing with Service Bus, but I would expect that decreasing it to 15 seconds would increase the number of errors we are seeing. Is that assumption correct? In periods when many instances are trying to read from the Service Bus and getting thousands of errors, I am worried about what increasing the polling frequency would cause.
The point here is that you are already pulling metrics 6 times per minute with the default values. When you enable cached metrics, the KEDA operator stores the metric value requested every polling interval and gives it to the HPA controller when it requests the metric. So if you reduce the polling interval from 30s to 15s and enable cached metrics, you will make 4 requests per minute from the operator but 0 from the metrics server, which actually reduces the requests to the upstream (because the HPA controller is querying every 15 seconds in any case).
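The suggestion above can be sketched as a ScaledObject using KEDA's per-trigger `useCachedMetrics` flag. Resource names and trigger metadata values here are placeholders, not taken from the reporter's setup:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: servicebus-scaledobject     # placeholder name
spec:
  scaleTargetRef:
    name: my-consumer               # placeholder Deployment
  pollingInterval: 15               # match the HPA's 15s cycle
  triggers:
    - type: azure-servicebus
      useCachedMetrics: true        # HPA reads the operator's cached value
      metadata:
        queueName: my-queue         # placeholder queue
        messageCount: "5"
      authenticationRef:
        name: azure-servicebus-auth # placeholder TriggerAuthentication
```

With this, only the operator queries Service Bus (4 times per minute at a 15s interval); the metrics server serves the cached value to the HPA, so upstream traffic goes down even though the polling interval is shorter.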
Thank you so much for taking the time to explain this in detail. I see your point now, and I think it's actually great advice. I'll give it a try and monitor the behavior.
nice! let us know how it goes 😄 can we close this PR in the meantime? you can re-open it again if needed |
Fixes #7338
Relates to kedacore/keda-docs#1677
Checklist
`make generate-scalers-schema` has been run to update any outdated generated files.