feat(BA-5563): Support macOS/dev mode in pyinfra deploy with skip_systemd#10740
feat(BA-5563): Support macOS/dev mode in pyinfra deploy with skip_systemd#10740
Conversation
- Add TRAEFIK to FrontendMode enum - Configure coordinator with enable_traefik and traefik.etcd settings - Configure worker with traefik section (api_port, etcd, port_proxy) when frontend_mode is traefik Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only items (graylog, zabbix, license hwinfo) removed. Provides host.data defaults for SSH, OS, Docker, Python, and container registry config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2) - Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines) - Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra - Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun) - Keep enterprise config stubs with enabled=False in return dicts - Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id now use int() wrapper for os.getenv() calls - Add missing type annotations for lint compliance Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Step 3) Create DevInventoryBuilder that targets @Local with Docker Compose halfstack ports matching DevContext.hydrate_install_info(). Provides all host.data attributes and services dict that deploy scripts expect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev - Fix bai_version default to use actual version instead of "dev" - Verified: deploy scripts successfully read all host.data attributes from DevInventoryBuilder (fails only on macOS subprocess, not inventory) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shared_defaults.py with centralized port/version/credential constants - Refactor DevInventoryBuilder to use shared_defaults instead of hardcoded values - Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults - Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency) - Remove import-not-found from disabled error codes - Fix stale import path in package_manager.py - Refactor APPPROXY_PORTS to typed constants for mypy compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port - ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field - StorageProxyConfig: hostname → removed, client_port → port - Update test to use correct field (advertised_client_port) - Add changelog for PR #10738 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS Workaround for gevent/gevent#2169: monkey.patch_all() sets subprocess._fork_exec to None on macOS + Python 3.13. This wrapper saves and restores _fork_exec before running pyinfra CLI. Can be removed once gevent fixes the issue upstream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations - Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (was missing from commit) - Regenerate python.lock with new dependencies - Fix dict → dict[str, object] type annotations in tests for mypy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yment - Add skip_systemd flag to BaseSystemdDeploy that skips all systemd operations (create, remove, start, stop, restart) - Set skip_systemd=True in DevInventoryBuilder host_data - Read skip_systemd from host.data in AppProxyBaseDeploy - Add config_local.py with SUDO=False for local dev - Verified: full dry-run of appproxy coordinator deploy succeeds on macOS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… set Management scripts (start.sh, stop.sh, show_logs.sh, show_status.sh) are systemd wrappers and unnecessary in dev mode where ./dev is used instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the installer + pyinfra deployment tooling to better support local/macOS development by allowing systemd operations to be skipped, adding a local dev inventory/config, and introducing a Traefik frontend mode option for appproxy configuration.
Changes:
- Add
skip_systemdsupport toBaseSystemdDeployand propagate it fromhost.datafor AppProxy deploys. - Introduce local dev pyinfra inventory tooling (
DevInventoryBuilder, shared defaults,inventory_local.py,config_local.py) plus arun_local.pywrapper for a gevent/macOS subprocess issue. - Add
FrontendMode.TRAEFIKand update installer appproxy config generation to emit Traefik-related config sections.
Reviewed changes
Copilot reviewed 16 out of 19 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/install/pyinfra/inventory/test_dev_inventory.py | Adds unit tests covering dev inventory construction and shared default constants. |
| tests/unit/install/pyinfra/inventory/BUILD | Registers the new unit tests in the build system. |
| src/ai/backend/install/types.py | Adds FrontendMode.TRAEFIK to installer type definitions. |
| src/ai/backend/install/pyinfra/runner.py | Implements skip_systemd gating for systemd + management-script operations. |
| src/ai/backend/install/pyinfra/os_packages/package_manager.py | Fixes platform utils import path to the new module location. |
| src/ai/backend/install/pyinfra/inventory/shared_defaults.py | Centralizes default ports/versions/dev credentials for reuse by dev tooling. |
| src/ai/backend/install/pyinfra/inventory/run_local.py | Adds local pyinfra runner wrapper with gevent/macOS subprocess workaround. |
| src/ai/backend/install/pyinfra/inventory/inventory_local.py | Adds a pyinfra inventory entrypoint for @local dev deployments. |
| src/ai/backend/install/pyinfra/inventory/group_data/all.py | Adds global host.data defaults for pyinfra deployments via group_data. |
| src/ai/backend/install/pyinfra/inventory/dev_inventory.py | Implements DevInventoryBuilder and sets skip_systemd=True for local mode. |
| src/ai/backend/install/pyinfra/inventory/config_local.py | Adds pyinfra config disabling sudo for local @local runs. |
| src/ai/backend/install/pyinfra/inventory/builder.py | Adds unified inventory builder for single-node and HA deployment scenarios. |
| src/ai/backend/install/pyinfra/deploy/cores/appproxy/base.py | Reads skip_systemd from host.data so AppProxy deploys can skip systemd actions. |
| src/ai/backend/install/context.py | Updates installer-generated appproxy configs for Traefik mode. |
| requirements.txt | Adds pyinfra and passlib as dependencies. |
| python.lock | Locks new dependencies and transitive requirements. |
| pyproject.toml | Adjusts mypy overrides for pyinfra modules now that pyinfra is included. |
| changes/10738.feature.md | Adds a towncrier fragment (currently appears mismatched to this PR’s scope). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data["proxy_coordinator"]["enable_traefik"] = True # type: ignore[index] | ||
| traefik_table = tomlkit.table() | ||
| traefik_etcd_table = tomlkit.table() | ||
| traefik_etcd_table["namespace"] = "backend" |
There was a problem hiding this comment.
In Traefik mode, the coordinator Traefik etcd namespace is set to "backend", but the proxy coordinator’s TraefikConfig defaults/assumes the namespace is "traefik". Writing to the wrong namespace will prevent the coordinator from updating/reading Traefik routing state correctly. Please set this to "traefik" (or reuse the same namespace value used for the worker Traefik section).
| traefik_etcd_table["namespace"] = "backend" | |
| traefik_etcd_table["namespace"] = "traefik" |
| # Add traefik section | ||
| traefik_table = tomlkit.table() | ||
| traefik_table["api_port"] = 18080 | ||
| traefik_table["last_used_time_marker_directory"] = "./last_used" | ||
| traefik_etcd_table = tomlkit.table() | ||
| traefik_etcd_table["namespace"] = "traefik" |
There was a problem hiding this comment.
The worker-side Traefik config section you build does not set traefik.frontend_mode, but ProxyWorkerConfig.TraefikConfig requires it (no default) and validates it to decide whether port_proxy or wildcard_domain must be present. Without frontend_mode, loading the generated app-proxy-worker.toml will fail schema validation. Set traefik.frontend_mode explicitly (likely "port" since you configure port_proxy).
| # Port proxy sub-section under traefik | ||
| port_proxy_table = tomlkit.table() | ||
| port_proxy_table["advertised_host"] = public_facing_address | ||
| port_proxy_table["bind_port_range"] = [10205, 10300] |
There was a problem hiding this comment.
In the Traefik worker config, the port proxy subsection uses the key bind_port_range, but TraefikPortProxyConfig expects port_range (tuple[int,int]). With the current key name, the worker config will not validate and Traefik port routing won’t be configured. Rename the key to port_range (and keep the value as a 2-item range).
| port_proxy_table["bind_port_range"] = [10205, 10300] | |
| port_proxy_table["port_range"] = [10205, 10300] |
| _original_fork_exec = subprocess._fork_exec | ||
|
|
||
| from gevent import monkey # noqa: E402 | ||
|
|
||
| monkey.patch_all() | ||
|
|
||
| # Restore _fork_exec after monkey patching | ||
| subprocess._fork_exec = _original_fork_exec |
There was a problem hiding this comment.
This wrapper relies on the private attribute subprocess._fork_exec unconditionally. On some Python implementations/platforms this attribute may not exist (or its semantics may differ), causing an AttributeError before pyinfra starts. Please guard with getattr(subprocess, "_fork_exec", None) / hasattr and only restore when the attribute exists, so non-macOS (or future Python) runs don’t break.
| _original_fork_exec = subprocess._fork_exec | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching | |
| subprocess._fork_exec = _original_fork_exec | |
| _original_fork_exec = getattr(subprocess, "_fork_exec", None) | |
| from gevent import monkey # noqa: E402 | |
| monkey.patch_all() | |
| # Restore _fork_exec after monkey patching (if it existed originally) | |
| if _original_fork_exec is not None and hasattr(subprocess, "_fork_exec"): | |
| subprocess._fork_exec = _original_fork_exec |
| ssh_pubkey = os.getenv("PYINFRA_SSH_PUBKEY", "~/.ssh/id_rsa.pub") | ||
| ssh_port = int(os.getenv("PYINFRA_SSH_PORT", "22")) | ||
| ssh_password = os.getenv("PYINFRA_SSH_PASSWORD", "") | ||
| ssh_strict_host_key_checking = "no" |
There was a problem hiding this comment.
ssh_strict_host_key_checking is hard-coded to "no" and cannot be overridden via env vars, which contradicts the module docstring (“All values can be overridden via environment variables”) and weakens SSH security by default. Please make it configurable via an env var (e.g., PYINFRA_SSH_STRICT_HOST_KEY_CHECKING) and consider defaulting to strict checking in non-dev contexts.
| ssh_strict_host_key_checking = "no" | |
| ssh_strict_host_key_checking = os.getenv("PYINFRA_SSH_STRICT_HOST_KEY_CHECKING", "no") |
| registry_type = os.getenv("PYINFRA_CONTAINER_REGISTRY_TYPE", "harbor2") | ||
| registry_scheme = os.getenv("PYINFRA_CONTAINER_REGISTRY_SCHEME", "http") | ||
| registry_name = os.getenv("PYINFRA_CONTAINER_REGISTRY_NAME", "bai-repo") | ||
| registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080") |
There was a problem hiding this comment.
registry_port is read from the environment as a string, but several deploy scripts treat it as an int (e.g., assign to self.registry_port: int and pass into templates/config). Please cast it to int here (similar to ssh_port) to avoid subtle type/templating issues.
| registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080") | |
| registry_port = int(os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")) |
| worker_inference_advertised_hostname=h, | ||
| worker_inference_port=10203, | ||
| worker_inference_app_port_start=10601, | ||
| worker_inference_app_port_end=10700, | ||
| webserver_endpoint=f"http://{h}:{p['webserver']}", |
There was a problem hiding this comment.
The AppProxy inference port and range are hard-coded here even though shared_defaults defines APPPROXY_WORKER_INFERENCE_PORT and APPPROXY_WORKER_INFERENCE_RANGE. This can drift from the shared constants and produce inconsistent inventories. Please use the shared constants for inference port/range the same way interactive/tcp ports are derived.
| # Validate required keys (core services only) | ||
| required_keys = [ | ||
| "halfstack", | ||
| "manager", | ||
| "webserver", | ||
| "storage_proxy", | ||
| ] | ||
|
|
There was a problem hiding this comment.
_validate_service_placement only checks that required keys exist, but does not validate that each required placement list is non-empty. Empty lists will later lead to broken inventories (e.g., HA cluster configs with zero nodes or empty service groups) and harder-to-debug runtime errors. Please validate that required placement values are non-empty lists and (ideally) reference existing mgmt node names.
| @@ -0,0 +1 @@ | |||
| Migrate pyinfra inventory system from backend.ai-installer with dev inventory support | |||
There was a problem hiding this comment.
This news fragment filename/content looks like it belongs to PR/issue 10738 (“Migrate pyinfra inventory system...”), but this PR is BA-5563 (macOS/dev mode + skip_systemd). Please rename/update the fragment to reflect this PR’s change (or drop it if it was already added in #10738) so release notes don’t misattribute the feature.
- dict[str, object] → dict[str, Any] for proper indexing - Add mypy override for tests.unit.install.* (tomlkit access patterns) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
skip_systemdflag toBaseSystemdDeploythat skips systemd service and management script operationsrun_local.pywrapper to work around gevent subprocess bug on macOS + Python 3.13config_local.pywithSUDO=Falsefor local dev deploymentChanges
runner.py(BaseSystemdDeploy)skip_systemdattribute (default:False)create_systemd_service(),remove_systemd_service(),start_service(),stop_service(),restart_service()all skip whenskip_systemd=Truecreate_service_manage_scripts()skips whenskip_systemd=True(management scripts are systemd wrappers)dev_inventory.pyskip_systemd: Truein host_data for dev modebase.py(AppProxyBaseDeploy)skip_systemdfromhost.dataduring initrun_local.pysubprocess._fork_execbefore monkey patchingTest plan
pyinfra --dryappproxy coordinator deploy succeeds on macOSRelated
🤖 Generated with Claude Code