Retry all non-200 API responses with exponential backoff #4

Merged: emac-E merged 3 commits into emac-E:main from Lifto:feat/retry-all-non-200 on Apr 6, 2026

@Lifto Lifto commented Apr 6, 2026

Summary

  • Retries any non-200 HTTP response from the API, not just 429
  • Wraps the infer endpoint with the retry decorator (was missing — query and streaming already had it)
  • Renames _is_too_many_requests_error to _is_retryable_server_error to reflect the broader scope
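
For readers without the diff open, the broadened check can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `APIError`, its `status_code` attribute, and `is_retryable_status` are hypothetical names, and treating errors with no known status as non-retryable is an assumption made only for this example.

```python
from typing import Optional


class APIError(Exception):
    """Hypothetical error type that carries the HTTP status of a failed call."""

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code


def is_retryable_status(exc: BaseException) -> bool:
    """Return True for any APIError whose status is known and is not 200.

    The previous behavior would correspond to checking status_code == 429 only.
    """
    return isinstance(exc, APIError) and exc.status_code not in (None, 200)
```

With a predicate of this shape, 429, 500, 503, and 504 all qualify for a retry, while unrelated exceptions still propagate immediately.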

Motivation

Transient errors from lightspeed-stack are common and varied:

  • 500: Upstream rate limits (429 masked as 500), GCP auth network blips
  • 503: Upstream LLM timeouts
  • 504: nginx gateway timeouts

Previously only 429 was retried, and the infer endpoint had no retry at all. This caused false errors in evaluation runs — questions that would have passed on retry were reported as failures.

Behavior

  • Retries up to num_retries (default 3) with exponential backoff (4s, 4s, 8s)
  • Each retry is logged at WARNING level via tenacity's before_sleep_log
  • If all retries fail, the error is reported as before
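
The behavior above can be approximated with a standard-library sketch. It is not the tenacity-based code in this PR: `backoff_delays` and `call_with_retries` are hypothetical helpers, and the parameters (multiplier 2, minimum 4s, maximum 100s) are assumptions chosen only because they reproduce the 4s, 4s, 8s schedule. It also assumes num_retries counts retries after the first attempt.

```python
import logging
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def backoff_delays(num_retries: int, multiplier: float = 2.0,
                   min_wait: float = 4.0, max_wait: float = 100.0) -> List[float]:
    """Delay before retry n: multiplier * 2**(n - 1), clamped to [min_wait, max_wait]."""
    return [
        max(min_wait, min(multiplier * 2 ** (n - 1), max_wait))
        for n in range(1, num_retries + 1)
    ]


def call_with_retries(func: Callable[[], T], num_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on any exception up to num_retries times."""
    delays = backoff_delays(num_retries)
    for attempt in range(num_retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == num_retries:
                raise  # retries exhausted: surface the error as before
            # Mirrors the WARNING-level log emitted before each sleep.
            logger.warning("Attempt %d failed (%s); retrying in %.0fs",
                           attempt + 1, exc, delays[attempt])
            sleep(delays[attempt])
    raise ValueError("num_retries must be non-negative")
```

The raw exponential values are 2s, 4s, 8s; the 4s floor lifts the first one, which is why the first two waits are equal. Passing `sleep` as a parameter keeps the sketch testable without real delays.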

Previously only 429 was retried. Transient errors like 500, 503, and
504 from the upstream stack are common (rate limits, timeouts, GCP auth
blips) and worth retrying. Also wraps the infer endpoint with the retry
decorator, which was missing.
Comment thread src/lightspeed_evaluation/core/api/client.py Outdated
For consistency, kept double quotes for keys, like the other keys in this project. Removed the debugging color that I had put in temporarily.
Comment thread src/lightspeed_evaluation/core/api/client.py Outdated
Like line 298, for consistency; also removed the temporary color that I was watching in the DEBUG output.

@emac-E emac-E left a comment

Owner

This looks fine. I changed just a couple of lines for consistency, but thanks for adding this! I had made my changes before the LS-eval team added retry, so I skipped updating that, and luckily I have not had a lot of errors, so I didn't notice the need. Thank you!

@emac-E emac-E merged commit ffbf731 into emac-E:main Apr 6, 2026
5 of 15 checks passed
emac-E pushed a commit that referenced this pull request Apr 10, 2026
emac-E added a commit that referenced this pull request Apr 10, 2026
Retry all non-200 API responses with exponential backoff