feat(BA-5565): Unify AppProxy config generation with tomlkit in-place modification#10746
feat(BA-5565): Unify AppProxy config generation with tomlkit in-place modification#10746
Conversation
- Add TRAEFIK to FrontendMode enum - Configure coordinator with enable_traefik and traefik.etcd settings - Configure worker with traefik section (api_port, etcd, port_proxy) when frontend_mode is traefik Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only items (graylog, zabbix, license hwinfo) removed. Provides host.data defaults for SSH, OS, Docker, Python, and container registry config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2) - Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines) - Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra - Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun) - Keep enterprise config stubs with enabled=False in return dicts - Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id now use int() wrapper for os.getenv() calls - Add missing type annotations for lint compliance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Step 3) Create DevInventoryBuilder that targets @Local with Docker Compose halfstack ports matching DevContext.hydrate_install_info(). Provides all host.data attributes and services dict that deploy scripts expect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev - Fix bai_version default to use actual version instead of "dev" - Verified: deploy scripts successfully read all host.data attributes from DevInventoryBuilder (fails only on macOS subprocess, not inventory) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shared_defaults.py with centralized port/version/credential constants - Refactor DevInventoryBuilder to use shared_defaults instead of hardcoded values - Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults - Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency) - Remove import-not-found from disabled error codes - Fix stale import path in package_manager.py - Refactor APPPROXY_PORTS to typed constants for mypy compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port - ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field - StorageProxyConfig: hostname → removed, client_port → port - Update test to use correct field (advertised_client_port) - Add changelog for PR #10738 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS Workaround for gevent/gevent#2169: monkey.patch_all() sets subprocess._fork_exec to None on macOS + Python 3.13. This wrapper saves and restores _fork_exec before running pyinfra CLI. Can be removed once gevent fixes the issue upstream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations - Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (was missing from commit) - Regenerate python.lock with new dependencies - Fix dict → dict[str, object] type annotations in tests for mypy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create config_gen/appproxy.py with CoordinatorParams, WorkerParams, build_coordinator_config(), build_worker_config() - Refactor context.py:configure_appproxy() to use shared module (~150 lines of tomlkit manipulation → ~30 lines of param construction) - Frontend mode logic (port/wildcard/traefik) now lives in one place - Add 16 unit tests covering all frontend modes and edge cases - Import FrontendMode from types.py (single definition) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change from build_*_config() (returns new dict) to apply_*_config() (modifies existing tomlkit document in-place). This preserves comments and structure from sample.toml files, which are auto-generated by `backend.ai mgr config generate-sample`. - apply_coordinator_config(doc, params) modifies coordinator toml doc - apply_worker_config(doc, params) modifies worker toml doc - context.py loads sample, applies params, writes back - Tests use sample TOML strings to verify comment preservation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors AppProxy TOML configuration generation into a shared config_gen/appproxy.py module that mutates tomlkit documents in-place (to preserve comments/structure), and also introduces/extends pyinfra inventory tooling (defaults, dev inventory, and a unified inventory builder) along with associated tests.
Changes:
- Add shared AppProxy config generator (
CoordinatorParams/WorkerParams+apply_*functions) and refactorcontext.py:configure_appproxy()to use it. - Add pyinfra inventory infrastructure (shared defaults, group_data defaults, local/dev inventory, unified InventoryBuilder) and unit tests.
- Add dependencies (
pyinfra,passlib) and adjust mypy configuration for pyinfra modules.
Reviewed changes
Copilot reviewed 16 out of 20 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/install/pyinfra/inventory/test_dev_inventory.py | Adds unit tests for DevInventoryBuilder and shared defaults. |
| tests/unit/install/pyinfra/inventory/BUILD | Adds Pants python_tests target for inventory unit tests. |
| tests/unit/install/config_gen/test_appproxy.py | Adds unit tests for shared AppProxy config generation and TOML roundtrip. |
| tests/unit/install/config_gen/BUILD | Adds Pants python_tests target for config_gen unit tests. |
| src/ai/backend/install/types.py | Extends FrontendMode enum with TRAEFIK. |
| src/ai/backend/install/pyinfra/os_packages/package_manager.py | Fixes import path for platform detection utilities. |
| src/ai/backend/install/pyinfra/inventory/shared_defaults.py | Centralizes shared ports/versions/dev defaults for installer + pyinfra. |
| src/ai/backend/install/pyinfra/inventory/run_local.py | Adds local pyinfra runner wrapper with gevent/macOS workaround. |
| src/ai/backend/install/pyinfra/inventory/inventory_local.py | Adds a local inventory entrypoint for pyinfra CLI. |
| src/ai/backend/install/pyinfra/inventory/group_data/all.py | Defines pyinfra host.data defaults via env vars. |
| src/ai/backend/install/pyinfra/inventory/dev_inventory.py | Adds DevInventoryBuilder for local @local deployments. |
| src/ai/backend/install/pyinfra/inventory/builder.py | Adds unified InventoryBuilder for single-node and HA inventories. |
| src/ai/backend/install/context.py | Refactors AppProxy configuration to call shared TOML mutators. |
| src/ai/backend/install/config_gen/appproxy.py | Introduces shared AppProxy TOML in-place mutation logic (incl. Traefik mode). |
| src/ai/backend/install/pyinfra/inventory/init.py | Package init (no functional changes shown). |
| src/ai/backend/install/config_gen/init.py | Package init (no functional changes shown). |
| requirements.txt | Adds pyinfra and passlib runtime dependencies. |
| python.lock | Locks new dependencies and transitive updates. |
| pyproject.toml | Updates mypy overrides for pyinfra modules (removes follow_imports="skip"). |
| changes/10738.feature.md | Adds a changelog entry for pyinfra inventory migration (file naming may be inconsistent with this PR). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| enable_traefik: bool = False | ||
| etcd_host: str = "127.0.0.1" | ||
| etcd_port: int = 8120 | ||
| etcd_namespace: str = "backend" |
There was a problem hiding this comment.
CoordinatorParams.etcd_namespace defaults to "backend", but Traefik integration in AppProxy expects the etcd namespace to be "traefik" (see pyinfra template src/ai/backend/install/pyinfra/deploy/cores/appproxy/templates/app-proxy-coordinator.toml.j2:63-66 and coordinator config model src/ai/backend/appproxy/coordinator/config.py:169+). With the current default, configure_appproxy() will emit a coordinator TOML that points Traefik at the wrong etcd namespace unless every caller overrides it. Consider changing the default to "traefik" (or renaming the field to make the intent explicit).
| etcd_namespace: str = "backend" | |
| etcd_namespace: str = "traefik" |
| # Add traefik section | ||
| traefik_table = tomlkit.table() | ||
| traefik_table["api_port"] = params.traefik_api_port | ||
| traefik_table["last_used_time_marker_directory"] = params.traefik_last_used_dir | ||
| traefik_etcd_table = tomlkit.table() | ||
| traefik_etcd_table["namespace"] = params.traefik_etcd_namespace | ||
| traefik_etcd_table["addr"] = _make_inline_table({ | ||
| "host": params.traefik_etcd_host, | ||
| "port": params.traefik_etcd_port, | ||
| }) | ||
| traefik_table["etcd"] = traefik_etcd_table | ||
| port_proxy_table = tomlkit.table() | ||
| port_proxy_table["advertised_host"] = params.port_proxy_advertised_host | ||
| port_proxy_table["bind_port_range"] = [ | ||
| params.port_proxy_range_start, | ||
| params.port_proxy_range_end, | ||
| ] | ||
| traefik_table["port_proxy"] = port_proxy_table | ||
| doc["proxy_worker"]["traefik"] = traefik_table # type: ignore[index] |
There was a problem hiding this comment.
In TRAEFIK mode, the generated TOML keys don’t match the AppProxy worker schema / existing templates: Traefik port proxy uses port_range (not bind_port_range), and the traefik table requires a frontend_mode field (see src/ai/backend/install/pyinfra/deploy/cores/appproxy/templates/app-proxy-worker.toml.j2:74-90 and src/ai/backend/appproxy/worker/config.py TraefikConfig). As-is, the produced config is likely to fail validation at runtime. Update the traefik section generation to follow the expected keys/fields.
| # Configure based on frontend_mode | ||
| match params.frontend_mode: | ||
| case FrontendMode.PORT: | ||
| doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index] |
There was a problem hiding this comment.
PORT mode ignores port_proxy_range_start / port_proxy_range_end even though they’re part of WorkerParams; only advertised_host is updated. If callers customize the port range, the output TOML will silently keep whatever range is in the input document. Consider setting bind_port_range (and any corresponding advertised range field, if present) from the params in PORT mode as well.
| doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index] | |
| port_proxy = doc["proxy_worker"]["port_proxy"] # type: ignore[index] | |
| port_proxy["advertised_host"] = params.port_proxy_advertised_host | |
| # Update bind port range from params | |
| port_proxy["bind_port_range"] = _make_inline_table({ | |
| "start": params.port_proxy_range_start, | |
| "end": params.port_proxy_range_end, | |
| }) | |
| # Optionally update advertised port range if params provide it | |
| advertised_range_start = getattr(params, "port_proxy_advertised_range_start", None) | |
| advertised_range_end = getattr(params, "port_proxy_advertised_range_end", None) | |
| if advertised_range_start is not None and advertised_range_end is not None: | |
| port_proxy["advertised_port_range"] = _make_inline_table({ | |
| "start": advertised_range_start, | |
| "end": advertised_range_end, | |
| }) |
| # Configure based on frontend_mode | ||
| match params.frontend_mode: | ||
| case FrontendMode.PORT: | ||
| doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index] | ||
| case FrontendMode.WILDCARD: | ||
| # Remove port_proxy section | ||
| if "port_proxy" in doc["proxy_worker"]: # type: ignore[operator] | ||
| del doc["proxy_worker"]["port_proxy"] # type: ignore[union-attr] | ||
|
|
||
| # Override api_advertised_addr | ||
| doc["proxy_worker"]["api_advertised_addr"] = _make_inline_table( # type: ignore[index] | ||
| {"host": params.api_advertised_host, "port": params.wildcard_advertised_port} | ||
| ) | ||
|
|
||
| # Add wildcard_domain section | ||
| if params.wildcard_domain: | ||
| wildcard_table = tomlkit.table() | ||
| wildcard_table["domain"] = params.wildcard_domain | ||
| wildcard_table["bind_addr"] = _make_inline_table({ | ||
| "host": params.wildcard_bind_host, | ||
| "port": params.wildcard_bind_port, | ||
| }) | ||
| wildcard_table["advertised_port"] = params.wildcard_advertised_port | ||
| doc["proxy_worker"]["wildcard_domain"] = wildcard_table # type: ignore[index] | ||
| case FrontendMode.TRAEFIK: | ||
| # Remove port_proxy section | ||
| if "port_proxy" in doc["proxy_worker"]: # type: ignore[operator] | ||
| del doc["proxy_worker"]["port_proxy"] # type: ignore[union-attr] | ||
|
|
There was a problem hiding this comment.
apply_worker_config() doesn’t clean up mode-specific sections when switching modes or when optional params are unset (e.g., WILDCARD mode with wildcard_domain=None leaves any existing proxy_worker.wildcard_domain table untouched; switching back to PORT leaves any existing proxy_worker.traefik table untouched). Since this module claims to modify existing TOML documents in-place, this can produce configs with conflicting/stale sections. Consider explicitly deleting wildcard_domain / traefik tables when they’re not applicable to the selected mode, and re-creating port_proxy when returning to PORT mode.
| worker_inference_advertised_hostname=h, | ||
| worker_inference_port=10203, | ||
| worker_inference_app_port_start=10601, | ||
| worker_inference_app_port_end=10700, |
There was a problem hiding this comment.
DevInventoryBuilder hard-codes the inference worker port and port range (10203 / 10601-10700) instead of using the shared constants already defined in shared_defaults.py (APPPROXY_WORKER_INFERENCE_PORT / APPPROXY_WORKER_INFERENCE_RANGE). This undermines the goal of centralizing magic numbers and can drift if defaults change. Prefer wiring these fields from the shared_defaults constants and/or DevInventoryBuilder.PORTS.
| registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080") | ||
| registry_username = os.getenv("PYINFRA_CONTAINER_REGISTRY_USERNAME", "bai") | ||
| registry_projects = os.getenv("PYINFRA_CONTAINER_REGISTRY_PROJECTS", "bai,bai-user") | ||
| registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3") |
There was a problem hiding this comment.
registry_password defaults to a non-empty hardcoded value. This value is consumed by deploy scripts (e.g. pyinfra/deploy/cores/manager/deploy.py reads host.data.registry_password) and could accidentally be used in non-dev deployments, weakening security by providing a predictable default credential. Consider defaulting to an empty string and emitting a warning/error when it’s not explicitly set via env/inventory (similar to the PYINFRA_SUDO_PASSWORD warning).
| registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3") | |
| registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "") | |
| if not registry_password: | |
| logger.warning( | |
| "PYINFRA_CONTAINER_REGISTRY_PASSWORD is not set; please configure a secure " | |
| "registry password via environment variable or inventory." | |
| ) |
| _original_fork_exec = subprocess._fork_exec | ||
|
|
||
| from gevent import monkey # noqa: E402 | ||
|
|
||
| monkey.patch_all() | ||
|
|
||
| # Restore _fork_exec after monkey patching | ||
| subprocess._fork_exec = _original_fork_exec |
There was a problem hiding this comment.
This script unconditionally accesses the private subprocess._fork_exec attribute. On platforms where it doesn’t exist (e.g., Windows) or if Python internals change, this will crash before running pyinfra. Consider guarding with hasattr(subprocess, "_fork_exec") and/or restricting the workaround to affected platforms/versions (macOS + Python 3.13), falling back to a normal cli() run otherwise.
| _original_fork_exec = subprocess._fork_exec | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching | |
| subprocess._fork_exec = _original_fork_exec | |
| needs_original_fork_exec_workaround = ( | |
| sys.platform == "darwin" | |
| and sys.version_info >= (3, 13) | |
| and hasattr(subprocess, "_fork_exec") | |
| ) | |
| if needs_original_fork_exec_workaround: | |
| _original_fork_exec = subprocess._fork_exec | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching, if we saved it | |
| if needs_original_fork_exec_workaround: | |
| subprocess._fork_exec = _original_fork_exec |
| @@ -0,0 +1 @@ | |||
| Migrate pyinfra inventory system from backend.ai-installer with dev inventory support | |||
There was a problem hiding this comment.
The changelog entry filename changes/10738.feature.md appears to reference a different PR/issue number than this PR (BA-5565). If your project expects one changeset file per PR, this will misattribute release notes. Consider renaming the changes file to match the current PR number / convention used in the repo.
- Remove type: ignore comments from config_gen/appproxy.py (handled by mypy overrides) - Add index/operator to mypy disable_error_code for ai.backend.install.* - Add mypy override for tests.unit.install.* (tomlkit dict-like access) - Fix dict[str, object] → dict[str, Any] in test_dev_inventory.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove all unused type: ignore comments (now handled by mypy overrides) - Fix str | None → str arg-type for appproxy secrets (add `or ""` fallback) - Fix dict → dict[str, object] type param in config_gen/agent.py - Fix ConsoleRenderable → str in context.py:462 - Auto-format with ruff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cripts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Extract AppProxy config generation into a shared module (
config_gen/appproxy.py) that modifies existing tomlkit documents in-place, preserving comments and structure from auto-generated sample.toml files.Changes
New:
src/ai/backend/install/config_gen/appproxy.pyCoordinatorParams/WorkerParams— frozen dataclasses with all config parametersapply_coordinator_config(doc, params)— modifies coordinator tomlkit document in-placeapply_worker_config(doc, params)— modifies worker tomlkit document in-placeRefactored:
context.py:configure_appproxy()apply_*callWhy tomlkit in-place (not Jinja2)?
sample.toml files are auto-generated by
backend.ai mgr config generate-sample. Using tomlkit to modify them in-place means config schema changes are automatically picked up. Jinja2 templates would need manual sync with every schema change.Test plan
Related
🤖 Generated with Claude Code