fix: tolerate recoverable cloud API failures in APIBasedLLM#378

Open
officialasishkumar wants to merge 2 commits into kubeedge:main from officialasishkumar:fix/joint-inference-cloud-api-failures

Conversation

@officialasishkumar

What type of PR is this?

/kind bug
/kind test

What this PR does / why we need it:

This PR prevents a single recoverable cloud API failure from aborting the cloud-edge collaborative LLM benchmark.

  • treat content-filter 400s and retryable provider status failures as per-sample failures
  • return an empty structured response with error metadata so downstream parsing and metrics can continue
  • preserve fail-fast behavior for invalid request/configuration errors
  • add focused unit coverage for the recoverable and fail-fast paths

Which issue(s) this PR fixes:
Fixes #356

Return an empty structured response for content-filter and retryable
provider errors so a single bad sample does not abort the entire
benchmarking run.

Add focused unit coverage for the fallback path while preserving
fail-fast behavior for invalid request configuration errors.

Fixes kubeedge#356

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
@kubeedge-bot kubeedge-bot added the kind/bug label (Categorizes issue or PR as related to a bug.) Apr 9, 2026
@kubeedge-bot
Collaborator

@officialasishkumar: The label(s) kind/test cannot be applied, because the repository doesn't have them


In response to this:

/kind bug
/kind test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubeedge-bot kubeedge-bot added the do-not-merge/invalid-commit-message label (Indicates that a PR should not merge because it has an invalid commit message.) Apr 9, 2026
@kubeedge-bot kubeedge-bot requested review from Poorunga and hsj576 April 9, 2026 17:20
@kubeedge-bot
Collaborator

Welcome @officialasishkumar! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: officialasishkumar
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 9, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces robust API error handling to the APIBasedLLM class, defining recoverable status codes and content filter error types. It adds methods to extract error details, determine if an error is recoverable, and construct an empty response with error information for such cases. The _infer method's exception handling is updated to utilize this new logic, logging warnings for recoverable errors and re-raising others. New unit tests are also included to validate this behavior. Feedback indicates a significant issue with the inverted retry logic, where transient errors are not retried and permanent errors are, suggesting a refactoring to correct this. Additionally, it's recommended to preserve the original exception's traceback when re-raising RuntimeError for better debugging.

            throughput = 0

        except Exception as e:
            if self._is_recoverable_api_error(e):


high

The current implementation inverts the expected retry logic for transient vs. permanent errors.

  1. Transient Errors (e.g., 503, 429): These are included in RECOVERABLE_API_STATUS_CODES. Because they are caught inside the _infer method (which is decorated with @retry), the decorator will see a successful return value (the error response) and will not perform any retries. This reduces the robustness of the benchmark against temporary network or provider issues.
  2. Permanent Errors (e.g., 401, 404): These are not in the recoverable set, so they fall through to line 193 and raise a RuntimeError. The @retry decorator will catch this and retry the request 3 times, which is unnecessary and adds latency for errors that will never succeed on retry.

Consider refactoring to ensure transient errors are retried before being tolerated, and that permanent errors fail fast. A common pattern is to use a wrapper method for the toleration logic while keeping the retry logic on the actual API call.
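The suggested pattern can be sketched as below. The helper name `call_with_retry_then_tolerate` and its arguments are illustrative, not the actual ianvs code, and exponential backoff is elided; the point is that transient failures are retried first, recoverable ones are downgraded only after retries are exhausted, and permanent errors fail fast without retries.

```python
# Sketch of the retry-then-tolerate wrapper suggested above.
# Names and the status-code set are illustrative assumptions.
TRANSIENT_STATUS_CODES = {408, 429, 500, 502, 503, 504}
MAX_ATTEMPTS = 3

def call_with_retry_then_tolerate(call, is_recoverable, build_error_response):
    """Retry transient provider failures, then tolerate recoverable ones
    per sample; permanent errors (401, 404, ...) fail fast with no retry."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return call()
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status in TRANSIENT_STATUS_CODES and attempt < MAX_ATTEMPTS:
                continue  # exponential backoff elided in this sketch
            if is_recoverable(e):
                # Retries exhausted (or a content-filter 400): downgrade to
                # an empty per-sample response instead of aborting the run.
                return build_error_response(e)
            raise RuntimeError(f"Error during API inference: {e}") from e
```

Keeping toleration in the wrapper rather than inside the retried call means a retry decorator on the raw API call never mistakes the downgraded error response for a success.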

Author


Updated in 118baf1: transient 408/429/5xx failures are now retried before being downgraded to an empty per-sample response, while content-filter 400s are still tolerated immediately and invalid requests fail fast.

                    message,
                )
                return self._build_error_response(e)
            raise RuntimeError(f"Error during API inference: {e}")


medium

When re-raising the exception as a RuntimeError, it is best practice to use from e to preserve the original exception's traceback. This is crucial for debugging the root cause of the failure.

Suggested change
raise RuntimeError(f"Error during API inference: {e}")
raise RuntimeError(f"Error during API inference: {e}") from e
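
A minimal self-contained demonstration of what `from e` buys: the wrapped exception keeps the original as `__cause__`, so the root-cause traceback survives in logs (the `risky` function here is a stand-in for the provider call).

```python
def risky():
    # Stand-in for the provider call that fails with a permanent error
    raise ValueError("provider rejected the request")

try:
    try:
        risky()
    except Exception as e:
        # `from e` chains the exceptions explicitly: tracebacks show both
        # the RuntimeError and the original ValueError that caused it.
        raise RuntimeError(f"Error during API inference: {e}") from e
except RuntimeError as wrapped:
    assert isinstance(wrapped.__cause__, ValueError)
```

Without `from e`, Python still records the original as `__context__`, but explicit chaining marks it as the direct cause rather than an incidental error during handling.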

Author


Also handled in 118baf1: the fail-fast path now raises RuntimeError(... ) from e, and the unit test covers the non-retry invalid-request case.

@kubeedge-bot
Collaborator

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

  • 0edbcd9 fix: tolerate recoverable cloud API failures in APIBasedLLM



Labels

  • do-not-merge/invalid-commit-message: Indicates that a PR should not merge because it has an invalid commit message.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Joint inference pipeline crashes when cloud LLM returns HTTP 400 during API inference

2 participants