Summary
AppProxyClient in src/ai/backend/manager/clients/appproxy/client.py has inconsistent error handling across its methods. Only fetch_status maps low-level aiohttp exceptions to the AppProxyConnectionError / AppProxyResponseError domain exceptions defined in manager/errors/appproxy.py. The four endpoint methods (create_endpoint, create_endpoints_bulk, delete_endpoint, delete_endpoints_bulk) leak raw aiohttp exceptions or, in the case of delete_endpoint, silently swallow non-2xx responses entirely.
This was flagged during review of #11328 (BA-1929) as an out-of-scope but real robustness gap.
Concrete defects
delete_endpoint uses async with ... as resp: pass with no resp.raise_for_status() and no body read. A 4xx/5xx response from the coordinator is silently dropped: the manager logs and returns successfully, even though the deletion never happened.
create_endpoint, create_endpoints_bulk, delete_endpoints_bulk call resp.raise_for_status() followed by await resp.json(). These calls can raise raw aiohttp.ClientResponseError and aiohttp.ContentTypeError, neither of which is wrapped into a BackendAIError subclass. Callers in manager/sokovan/deployment/executor.py then see a non-domain exception, and the eventual DeploymentExecutionError does not carry the AppProxy domain.
The BackendAIError JSON body returned by the coordinator on validation failures (after #11329, "fix(BA-1929): Return JSON instead of HTML for coordinator API errors") is not preserved through to the caller. The error gets re-raised as an aiohttp exception with only the status code.
Proposed fix
Apply the same try / except (ClientConnectorError, ClientResponseError, ContentTypeError, JSONDecodeError) pattern that fetch_status uses to all four endpoint methods, and add raise_for_status() to delete_endpoint. Where possible, attach the parsed coordinator error body to AppProxyResponseError.extra_data so the upstream JSON error survives the translation.
Out of scope
Re-architecting the resilience policy or retry behavior.
Changing the public signatures of the four methods.
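As a rough illustration of the "attach the parsed coordinator error body" part of the proposed fix, the sketch below turns a non-2xx status and its body text into a domain error. It is not the actual client code. The AppProxyResponseError class here is a stand-in whose constructor signature is assumed, and parse_error_body and check_response are hypothetical helper names. It also checks the status explicitly instead of calling raise_for_status(), on the assumption that the body has to be read before the error is raised in order to keep it.

```python
import json
from typing import Any, Optional


class AppProxyResponseError(Exception):
    """Stand-in for the domain exception in manager/errors/appproxy.py.

    The real constructor is not shown in this issue; the ``extra_data``
    keyword is an assumption made for illustration only.
    """

    def __init__(self, message: str, extra_data: Optional[dict] = None) -> None:
        super().__init__(message)
        self.extra_data = extra_data or {}


def parse_error_body(body: str) -> Any:
    """Best-effort parse of a coordinator error body (hypothetical helper)."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Not JSON (for example an HTML error page): keep the raw text.
        return body


def check_response(status: int, body: str) -> None:
    """Raise AppProxyResponseError for any non-2xx status, else return None."""
    if 200 <= status < 300:
        return
    raise AppProxyResponseError(
        f"AppProxy coordinator returned HTTP {status}",
        extra_data={"status": status, "body": parse_error_body(body)},
    )
```

In the client, a check like this would presumably run inside the existing async with block on the values of resp.status and await resp.text(), with the surrounding try / except still mapping connection failures to AppProxyConnectionError. For delete_endpoint in particular, any such check replaces the bare pass, so a failed deletion no longer looks like a success.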
Related
Accept header fix (BA-1929)