refactor(BA-5528): generalize chat client and address review comments

Address review comments from #11344:
- Drop chat_dto.py and switch the SDK to plain dict[str, Any] for both
  request and response, so it doesn't try to track every runtime
  variant's extension fields (vllm reasoning_content, tool_calls, etc.)
- Rename DeploymentChatClient -> InferenceChatClient and decouple it
  from the vllm runtime variant: works against any OpenAI-compatible
  endpoint (vllm, tgi, sglang, nim) and exposes a configurable path
  plus a list_models helper
- Rename the cached api key field vllm_api_key -> api_key throughout
  the cache schema, CLI options, show output, and tests
- chat-config set: --token is now optional and pairs with a new
  --no-token flag for deployments started without --api-key. The
  served model name is auto-discovered via GET /v1/models (option B
  from the discussion) so users no longer have to know it
- chat: replace the local _abort helper with click.ClickException,
  validate --max-tokens via click.IntRange(min=1) and the sampling
  knobs via click.FloatRange, and add --top-p, --frequency-penalty,
  --presence-penalty, --seed, --stop options
- inference_chat client: add ClientTimeout (sock_connect/sock_read)
  to the owned aiohttp session and normalize trailing slashes when
  building the chat / models URL
- cache loader: tolerate corrupted JSON (OSError/JSONDecodeError) and
  skip individual malformed entries instead of aborting the whole load
- tests: drop redundant atomic-write/permission-reset cases, add
  loader resilience cases, and shorten the changelog entry
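
For reviewers, the trailing-slash normalization and the tolerant cache loading described above could look roughly like the sketch below. This is not the code from this commit: the names `join_url` and `load_cache` and the `endpoint_url` entry field are hypothetical, chosen only for illustration, and the real cache schema may differ.

```python
import json
from pathlib import Path
from typing import Any


def join_url(base_url: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def load_cache(cache_path: Path) -> dict[str, dict[str, Any]]:
    """Load cached entries; return {} if the file is unreadable or corrupted."""
    try:
        raw = json.loads(cache_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A missing, unreadable, or corrupted file is treated as an empty cache.
        return {}
    if not isinstance(raw, dict):
        return {}

    entries: dict[str, dict[str, Any]] = {}
    for deployment_id, entry in raw.items():
        # Assumed schema: each entry is a dict holding a string "endpoint_url".
        # Malformed entries are skipped instead of failing the whole load.
        if not isinstance(entry, dict):
            continue
        if not isinstance(entry.get("endpoint_url"), str):
            continue
        entries[deployment_id] = entry
    return entries
```

With helpers like these, `join_url("http://host:8000/", "/v1/models")` and `join_url("http://host:8000", "v1/models")` build the same URL, and one bad cache entry does not hide the valid ones.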
-Add `./bai deployment chat`and `./bai deployment chat-config` v2 CLI commands for one-shot OpenAI-compatible chat with deployed vLLM models, including a local cache (`~/.backend.ai/deployment_chat.json`, `0600`) of per-deployment endpoint URLs and API keys.
+Add `./bai deployment chat` for one-shot OpenAI-compatible chat against deployed inference services.