Add startup health checks to validate inference backend before serving traffic (#24)
* Add startup health checks to validate inference backend before serving traffic
Gated behind the startup check for OpenAI Chat Completions compatibility. Runs
three sequential checks against the backend before the proxy binds its TCP
listener: model existence via /v1/models, non-streaming chat completions with
tools, and streaming chat completions with tools. Tool call arguments are
validated as JSON to catch inference engine bugs (e.g. vLLM producing malformed
arguments in streaming finish chunks). Retries only on transient errors
(connection refused, HTTP 503) and fails fast on validation errors.
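A minimal Python sketch of the check sequence and retry policy, assuming an OpenAI-compatible backend at a caller-supplied base URL. All helper names (`check_model`, `accumulate_stream_tool_calls`, `validate_tool_arguments`, `run_checks`, `is_transient`) and the retry parameters are hypothetical illustrations, not the proxy's actual code. The two chat completion checks are not spelled out: they would POST to /v1/chat/completions with `tools` (plus `stream: true` for the streaming case) and pass the resulting tool calls, or the parsed SSE chunks including the finish chunk, to the validator.

```python
from __future__ import annotations

import json
import time
import urllib.error
import urllib.request
from typing import Callable


class ValidationError(Exception):
    """The backend answered, but the answer is wrong: never retried."""


def is_transient(exc: BaseException) -> bool:
    """True only for 'backend not ready yet' errors: HTTP 503 or connection refused."""
    if isinstance(exc, urllib.error.HTTPError):  # check before URLError (its base class)
        return exc.code == 503
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, ConnectionRefusedError)
    return isinstance(exc, ConnectionRefusedError)


def check_model(base_url: str, model: str) -> None:
    """Check 1 (hypothetical helper): the configured model is listed by /v1/models."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        ids = {m["id"] for m in json.load(resp)["data"]}
    if model not in ids:
        raise ValidationError(f"model {model!r} not served; available: {sorted(ids)}")


def accumulate_stream_tool_calls(chunks: list[dict]) -> dict[int, dict]:
    """Merge streamed tool call deltas by index, including any finish chunk."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for tc in (choice.get("delta") or {}).get("tool_calls") or []:
                call = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
                fn = tc.get("function") or {}
                call["name"] += fn.get("name") or ""
                call["arguments"] += fn.get("arguments") or ""
    return calls


def validate_tool_arguments(calls: dict[int, dict]) -> None:
    """Every tool call's arguments must parse as a JSON object."""
    if not calls:
        raise ValidationError("expected at least one tool call")
    for index, call in sorted(calls.items()):
        try:
            args = json.loads(call["arguments"])
        except json.JSONDecodeError as exc:
            raise ValidationError(f"tool call {index}: malformed arguments: {exc}") from exc
        if not isinstance(args, dict):
            raise ValidationError(f"tool call {index}: arguments are not a JSON object")


def run_checks(checks: list[Callable[[], None]], attempts: int = 5, delay: float = 1.0) -> None:
    """Run checks in order; retry a check only while its failure is transient."""
    for check in checks:
        for attempt in range(1, attempts + 1):
            try:
                check()
                break  # this check passed; move on to the next one
            except Exception as exc:
                if not is_transient(exc) or attempt == attempts:
                    raise  # validation errors fail fast; transient errors give up eventually
                time.sleep(delay)
```

Only after `run_checks` returns would the proxy bind its listener. Keeping validation failures out of the retry loop means a backend that is up but returns malformed tool call JSON is reported immediately instead of after the full retry budget.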
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>