Skip to content

enhance(BA-6053): preserve str start_command form in model definitions#11625

Open
achimnol wants to merge 3 commits into
mainfrom
fix/preserve-str-start-command
Open

enhance(BA-6053): preserve str start_command form in model definitions#11625
achimnol wants to merge 3 commits into
mainfrom
fix/preserve-str-start-command

Conversation

@achimnol
Copy link
Copy Markdown
Member

@achimnol achimnol commented May 14, 2026

resolves #11624 (BA-6053)

Summary

Stop coercing string start_command values into argv lists via shlex.split in the API and config validators. The kernel runner already wraps str commands as [shell, "-c", str] at exec time (src/ai/backend/kernel/service.py:118-122), so passing the original string through gives users full shell semantics — line continuations, $VAR expansion, pipes — and removes the need to manually strip backslashes from copy-pasted multi-line vendor recipes (e.g. the vLLM recipes the issue cites).

What changed

  • common/config.py: dropped _normalize_start_command and the _coerce_start_command field validators on ModelServiceConfig / ModelServiceConfigDraft; widened start_command to str | list[str] | None. The trafaret schema model_definition_iv no longer routes strings through shlex.split either.
  • {model_path} substitution in ModelConfigDraft.to_resolved now dispatches on type — per-token .replace for the list form, in-place .replace for the string form.
  • ModelDefinition.with_args_appended appends preset ARGS as separate argv tokens for the list form, and shell-quotes them via shlex.join and concatenates with a single space for the string form so the same shell that runs the user's script parses them.
  • DTOs in common/dto/manager/v2/deployment/{request,types}.py are widened to str | list[str] | None.
  • GraphQL types (ModelServiceConfigGQL, ModelServiceConfigInputGQL) expose the field as JSON | None — same pattern already used for ModelMetadata.version since GraphQL cannot natively represent a scalar/list union.

Why pass-through (and not, e.g., shell-aware line-continuation stripping)?

Two reasonable options were on the table:

  1. Pre-strip \<newline> before shlex.split so multi-line strings still become argv lists.
  2. Keep the original string and let the kernel runner shell-eval it.

Option 1 fixes the immediate copy-paste paper cut but does not help users who actually want $VAR, pipes, &&, or globs in their command — which is the bigger ask in the vendor recipes the issue links. Option 2 (this PR) hands those semantics over to the user explicitly and relies on the kernel runner's already-existing [shell, "-c", str] wrap, so the change is concentrated in validators rather than runtime.

Trade-off: a string start_command is now eval'd by bash inside the container, so anything that flows into it from external input is shell-evaluated. The list form remains argv-only (no shell) and is the default for the variant baselines; users only opt into shell-eval by writing a string.

Scope

Per the issue: existing model definitions with list-style start_command are unaffected. The previously released backfill migration 8c1f7d3a9e2b (which wrapped legacy string values as one-item lists for the older narrowed schema) is intentionally left untouched.

Checklist

  • Mention to the original issue (Allow direct passing of str start_command in model definition #11624 / BA-6053)
  • API server-client counterparts — DTOs and GQL types widened in lockstep
  • Test case(s):
    • test_to_resolved_preserves_str_start_command — string survives validation and {model_path} substitution unchanged
    • test_model_service_config_accepts_str_start_command — strict type accepts the string form too
    • test_appends_args_to_str_start_command_with_shell_quotingwith_args_appended shell-quotes appended args
  • Documentation — field description= updated on ModelServiceConfig, the DTO, and both GraphQL types
  • Milestone metadata specifying the target backport version (set after review)
  • Installer updates (none required — no schema change)
  • Update of end-to-end CLI integration tests in ai.backend.test (N/A — covered by existing model-definition flow tests)

Test plan

  • pants check on the 5 changed Python files (mypy clean)
  • pants fmt / pants fix / pants lint clean
  • pants test tests/unit/common/test_config.py passes
  • pants test tests/unit/agent/test_model_service_start_command.py passes (no regression on the agent-side parser)
  • Manual: paste a multi-line vLLM recipe (with trailing \) into the WebUI's start_command field, deploy, and verify the service starts and serves requests

📚 Documentation preview 📚: https://sorna--11625.org.readthedocs.build/en/11625/


📚 Documentation preview 📚: https://sorna-ko--11625.org.readthedocs.build/ko/11625/

@achimnol achimnol requested a review from a team as a code owner May 14, 2026 22:50
Copilot AI review requested due to automatic review settings May 14, 2026 22:50
@github-actions github-actions Bot added the size:L 100~500 LoC label May 14, 2026
@github-actions github-actions Bot added comp:manager Related to Manager component comp:common Related to Common component labels May 14, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR resolves #11624 (BA-6053) by preserving start_command as a str when provided as a string in model definitions, instead of coercing it into an argv list via shlex.split. This aligns validation/DTO/GQL layers with the kernel runner’s existing behavior of wrapping string commands as [shell, "-c", str], enabling full shell semantics (including multi-line \ continuations) for users who opt into the string form.

Changes:

  • Stop normalizing string start_command values into argv tokens in config/API validation; widen types to str | list[str] | None.
  • Update {model_path} substitution to handle both list and string forms appropriately.
  • Update arg-appending to preserve semantics: argv concatenation for list form; shlex.join + string concatenation for string form, with new unit tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/ai/backend/common/config.py Removes shlex.split coercion, widens start_command type, updates placeholder substitution and arg-appending behavior.
src/ai/backend/common/dto/manager/v2/deployment/request.py Widens request DTO start_command to `str
src/ai/backend/common/dto/manager/v2/deployment/types.py Widens response DTO start_command and updates field description to document list-vs-string semantics.
src/ai/backend/manager/api/gql/deployment/types/revision.py Exposes start_command as `JSON
tests/unit/common/test_config.py Adds unit tests ensuring string preservation, placeholder substitution for strings, and shell-quoted arg appending for string form.
changes/11624.enhance.md Adds changelog entry describing the behavioral change and motivation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions Bot added the area:docs Documentations label May 14, 2026
@achimnol achimnol force-pushed the fix/preserve-str-start-command branch from 71e8974 to e7c0d9b Compare May 14, 2026 23:12
achimnol and others added 3 commits May 17, 2026 16:48
Stop coercing string ``start_command`` values into argv lists via
``shlex.split`` in the API and config validators. The kernel runner
already wraps ``str`` commands as ``[shell, "-c", str]`` at exec time
(``kernel/service.py``), so passing the original string through gives
users full shell semantics — line continuations, ``$VAR`` expansion,
pipes — and removes the need to manually strip backslashes from
copy-pasted multi-line vendor recipes (e.g. vLLM).

- ``ModelServiceConfig.start_command`` and the draft mirror now accept
  ``str | list[str] | None``; the previous ``shlex.split`` validator
  is removed.
- ``{model_path}`` substitution and ``with_args_appended`` handle both
  forms; preset ARGS appended to a string command are shell-quoted via
  ``shlex.join``.
- DTOs in ``common/dto/manager/v2/deployment`` are widened to match.
  The GraphQL types expose the field as ``JSON | None`` (the same
  pattern already used for ``ModelMetadata.version``) since GraphQL
  cannot natively represent a scalar/list union.

Existing list-form definitions are unaffected.

resolves #11624
11624.enhance.md -> 11625.enhance.md

Co-authored-by: octodog <mu001@lablup.com>
Co-authored-by: octodog <mu001@lablup.com>
@achimnol achimnol force-pushed the fix/preserve-str-start-command branch from e7c0d9b to 8a339f5 Compare May 17, 2026 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:docs Documentations comp:common Related to Common component comp:manager Related to Manager component size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow direct passing of str start_command in model definition

2 participants