Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
a0fdfd1
feat(BA-5528): add deployment chat CLI for vLLM-backed model services
jopemachine Apr 27, 2026
2365209
changelog: rename news fragment to PR number
jopemachine Apr 27, 2026
9eb1285
fix(BA-5528): format BackendAPIError in deployment chat/chat-config
jopemachine Apr 27, 2026
6fdfbf9
refactor(BA-5528): drop --endpoint-url from chat-config set
jopemachine Apr 27, 2026
414aaf8
refactor(BA-5528): generalize chat client and address review comments
jopemachine Apr 27, 2026
87c32c9
refactor(BA-5528): replace per-flag sampling args with --params JSON …
jopemachine Apr 27, 2026
8f588a1
refactor(BA-5528): use existing JSONParamType for chat --params
jopemachine Apr 27, 2026
7c74d2e
refactor(BA-5528): default --params to "{}" for chat command
jopemachine Apr 27, 2026
0828b82
refactor(BA-5528): split chat cache token, switch entry to BaseModel,…
jopemachine Apr 27, 2026
a4e7d15
refactor(BA-5528): split chat tokens into deployment_chat_config; bot…
jopemachine Apr 27, 2026
d58d145
refactor(BA-5528): apply review feedback for chat CLI
jopemachine Apr 28, 2026
60589d3
refactor(BA-5528): merge load_chat_cache/config try-except blocks
jopemachine Apr 28, 2026
8e05a1a
refactor(BA-5528): regroup chat CLI under deployment/chat package
jopemachine Apr 28, 2026
1bd6a8b
refactor(BA-5528): align DeploymentChatClient session lifecycle, add …
jopemachine Apr 28, 2026
85d6961
refactor(BA-5528): drop /v1/models auto-discover, consolidate cache save
jopemachine Apr 28, 2026
63a551f
refactor(BA-5528): drop schema_version handling from chat storage
jopemachine Apr 28, 2026
391cb28
refactor(BA-5528): collapse DeploymentChatClient.create into __init__
jopemachine Apr 28, 2026
79f1946
refactor(BA-5528): collapse _post/_get into a single _request
jopemachine Apr 28, 2026
28ee34b
refactor(BA-5528): rename DeploymentChatCache.upsert to set
jopemachine Apr 28, 2026
b15584e
refactor(BA-5528): use common.json.load_json, drop show-all, rename p…
jopemachine Apr 28, 2026
c93b660
refactor(BA-5528): rename _atomic_write to _atomic_write_text
jopemachine Apr 28, 2026
4efa590
refactor(BA-5528): move print_summary onto entry, split clear, drop r…
jopemachine Apr 28, 2026
195ed4d
refactor(BA-5528): drop frozen=True on DeploymentChatCacheEntry
jopemachine Apr 28, 2026
73de1a9
feat(BA-5528): expose --path on chat command for non-default routes
jopemachine Apr 28, 2026
11084bb
refactor(BA-5528): formatter class, inline writes, rename clear-token…
jopemachine Apr 28, 2026
3ad8eb9
refactor(BA-5528): add cache TTL, fixturize chat type tests
jopemachine Apr 28, 2026
516433c
refactor(BA-5528): move TTL check onto DeploymentChatCacheEntry.is_fresh
jopemachine Apr 28, 2026
d8c066f
refactor(BA-5528): drop resolved_key from chat-config set
jopemachine Apr 28, 2026
7ad8eff
refactor(BA-5528): formatter rename + drop path arg + load classmethods
jopemachine Apr 28, 2026
d668291
refactor(BA-5528): move DeploymentChatAuthError to client/v2/exceptions
jopemachine Apr 28, 2026
5ed3447
refactor(BA-5528): mask inside print_summary, move mask_token to form…
jopemachine Apr 28, 2026
473299b
refactor(BA-5528): drop 0600 enforcement on chat cache file
jopemachine Apr 28, 2026
59b0d72
fix(BA-5528): allow chat-config set while deployment is still provisi…
jopemachine Apr 28, 2026
c45ea42
refactor(BA-5528): move save onto types, rename is_fresh/connection
jopemachine Apr 28, 2026
0c67cb8
fix(BA-5528): clear cached token on chat 401/403, type chat body, pol…
jopemachine Apr 29, 2026
739506f
refactor(BA-5528): standardize on "token" naming, fixed mask placehol…
jopemachine Apr 29, 2026
f0716e3
refactor(BA-5528): align chat-config persistence with existing CLI cr…
jopemachine Apr 29, 2026
6585828
refactor(BA-5528): collapse SDK response handling, drop --path, mock …
jopemachine Apr 29, 2026
10c77d4
refactor(BA-5528): hoist deployment-endpoint HTTP plumbing into Backe…
jopemachine Apr 29, 2026
42b6920
refactor(BA-5528): rename to pop/pop_token, inline mask placeholder, …
jopemachine Apr 29, 2026
5ca7eac
refactor(BA-5528): group chat state under ~/.backend.ai/deployment_ch…
jopemachine Apr 29, 2026
678d190
feat(BA-5528): auto-derive default model from /v1/models when --model…
jopemachine Apr 29, 2026
b43bd18
refactor(BA-5528): split user-set vs auto-cached model, group config …
jopemachine Apr 29, 2026
296c963
refactor(BA-5528): use defaultdict for chat config so set_*() collaps…
jopemachine Apr 29, 2026
e280191
refactor(BA-5528): drop redundant defaultdict comment block on chat c…
jopemachine Apr 29, 2026
ced7acd
refactor(BA-5528): move chat data/DTO types under common package
jopemachine Apr 29, 2026
d8e0102
revert(BA-5528): roll back chat data types to client/cli, fold storag…
jopemachine Apr 30, 2026
50907da
fix(BA-5528): rename `chat` CLI argument from `content` to `message`
jopemachine Apr 30, 2026
57280c3
fix(BA-5528): make `chat-config show` only display the user-managed c…
jopemachine Apr 30, 2026
e5ba0b2
feat(BA-5528): split cache management into dedicated `chat-cache` com…
jopemachine Apr 30, 2026
cadeea7
refactor(BA-5528): use `DeploymentID` NewType for chat deployment_id …
jopemachine Apr 30, 2026
939d3a1
feat(BA-5903): persist deployment chat history and replay as request …
jopemachine Apr 30, 2026
5e9842c
fix: CI
jopemachine May 4, 2026
eae4cdf
refactor(BA-5528): consolidate OpenAI-compat extra="allow" and split …
jopemachine May 6, 2026
bc320bb
refactor(BA-5528): rename misleading `pop*` methods on chat cache/config
jopemachine May 6, 2026
d1608cb
refactor(BA-5528): extract `_parse_response` from `BackendAIAppProxyC…
jopemachine May 6, 2026
3e59aaf
test(BA-5826): mock preview_query_template instead of query_instant
jopemachine May 6, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changes/11344.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add `./bai deployment chat` for one-shot OpenAI-compatible chat against deployed inference services.
5 changes: 5 additions & 0 deletions src/ai/backend/client/cli/v2/deployment/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
from .access_token import access_token
from .auto_scaling_rule import auto_scaling_rule
from .chat import chat, chat_cache, chat_config, chat_history
from .commands import deployment as deployment
from .options import options
from .policy import policy
Expand All @@ -15,5 +16,9 @@
deployment.add_command(access_token)
deployment.add_command(auto_scaling_rule)
deployment.add_command(options)
deployment.add_command(chat)
deployment.add_command(chat_config)
deployment.add_command(chat_cache)
deployment.add_command(chat_history)

__all__ = ("deployment",)
17 changes: 17 additions & 0 deletions src/ai/backend/client/cli/v2/deployment/chat/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
"""``./bai deployment chat``, ``chat-config``, ``chat-cache``, and ``chat-history`` CLI commands.

Submodules:
- :mod:`commands` β€” Click command/group definitions.
- :mod:`types` β€” Pydantic models for the on-disk cache, config, and history,
including the ``.load()``/``.save()`` classmethods that wire them to
``~/.backend.ai/deployment_chat/*.json``.
- :mod:`utils` β€” file paths and shared JSON I/O helpers.
- :mod:`formatter` β€” display helpers (``mask_token``, ``DeploymentChatFormatter``).

OpenAI-compat wire DTOs live in :mod:`ai.backend.common.dto.clients.openai_compat`
so they can be reused by any backend.ai component.
"""

from .commands import chat, chat_cache, chat_config, chat_history

__all__ = ("chat", "chat_cache", "chat_config", "chat_history")
Loading
Loading