Skip to content

feat(BA-5563): Support macOS/dev mode in pyinfra deploy with skip_systemd#10740

Open
Yaminyam wants to merge 14 commits intomainfrom
feat/BA-5563-pyinfra-macos-support
Open

feat(BA-5563): Support macOS/dev mode in pyinfra deploy with skip_systemd#10740
Yaminyam wants to merge 14 commits intomainfrom
feat/BA-5563-pyinfra-macos-support

Conversation

@Yaminyam
Copy link
Copy Markdown
Member

@Yaminyam Yaminyam commented Apr 2, 2026

Summary

  • Add skip_systemd flag to BaseSystemdDeploy that skips systemd service and management script operations
  • Add run_local.py wrapper to work around gevent subprocess bug on macOS + Python 3.13
  • Add config_local.py with SUDO=False for local dev deployment

Changes

runner.py (BaseSystemdDeploy)

  • New skip_systemd attribute (default: False)
  • create_systemd_service(), remove_systemd_service(), start_service(), stop_service(), restart_service() all skip when skip_systemd=True
  • create_service_manage_scripts() skips when skip_systemd=True (management scripts are systemd wrappers)

dev_inventory.py

  • Set skip_systemd: True in host_data for dev mode

base.py (AppProxyBaseDeploy)

  • Read skip_systemd from host.data during init

run_local.py

Test plan

  • pyinfra --dry appproxy coordinator deploy succeeds on macOS
  • systemd and management scripts are skipped (8 operations instead of 12)
  • Existing unit tests pass

Related

🤖 Generated with Claude Code

Yaminyam and others added 12 commits April 2, 2026 14:17
- Add TRAEFIK to FrontendMode enum
- Configure coordinator with enable_traefik and traefik.etcd settings
- Configure worker with traefik section (api_port, etcd, port_proxy)
  when frontend_mode is traefik

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only
items (graylog, zabbix, license hwinfo) removed. Provides host.data
defaults for SSH, OS, Docker, Python, and container registry config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2)

- Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines)
- Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra
- Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun)
- Keep enterprise config stubs with enabled=False in return dicts
- Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id
  now use int() wrapper for os.getenv() calls
- Add missing type annotations for lint compliance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Step 3)

Create DevInventoryBuilder that targets @Local with Docker Compose
halfstack ports matching DevContext.hydrate_install_info(). Provides
all host.data attributes and services dict that deploy scripts expect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev
- Fix bai_version default to use actual version instead of "dev"
- Verified: deploy scripts successfully read all host.data attributes
  from DevInventoryBuilder (fails only on macOS subprocess, not inventory)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shared_defaults.py with centralized port/version/credential constants
- Refactor DevInventoryBuilder to use shared_defaults instead of
  hardcoded values
- Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults
- Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency)
- Remove import-not-found from disabled error codes
- Fix stale import path in package_manager.py
- Refactor APPPROXY_PORTS to typed constants for mypy compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port
- ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field
- StorageProxyConfig: hostname → removed, client_port → port
- Update test to use correct field (advertised_client_port)
- Add changelog for PR #10738

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS

Workaround for gevent/gevent#2169: monkey.patch_all() sets
subprocess._fork_exec to None on macOS + Python 3.13. This wrapper
saves and restores _fork_exec before running pyinfra CLI.

Can be removed once gevent fixes the issue upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations

- Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (was missing from commit)
- Regenerate python.lock with new dependencies
- Fix dict → dict[str, object] type annotations in tests for mypy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yment

- Add skip_systemd flag to BaseSystemdDeploy that skips all systemd
  operations (create, remove, start, stop, restart)
- Set skip_systemd=True in DevInventoryBuilder host_data
- Read skip_systemd from host.data in AppProxyBaseDeploy
- Add config_local.py with SUDO=False for local dev
- Verified: full dry-run of appproxy coordinator deploy succeeds on macOS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… set

Management scripts (start.sh, stop.sh, show_logs.sh, show_status.sh) are
systemd wrappers and unnecessary in dev mode where ./dev is used instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 2, 2026 08:05
@github-actions github-actions bot added the size:XL 500~ LoC label Apr 2, 2026
@Yaminyam Yaminyam requested a review from a team April 2, 2026 08:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the installer + pyinfra deployment tooling to better support local/macOS development by allowing systemd operations to be skipped, adding a local dev inventory/config, and introducing a Traefik frontend mode option for appproxy configuration.

Changes:

  • Add skip_systemd support to BaseSystemdDeploy and propagate it from host.data for AppProxy deploys.
  • Introduce local dev pyinfra inventory tooling (DevInventoryBuilder, shared defaults, inventory_local.py, config_local.py) plus a run_local.py wrapper for a gevent/macOS subprocess issue.
  • Add FrontendMode.TRAEFIK and update installer appproxy config generation to emit Traefik-related config sections.

Reviewed changes

Copilot reviewed 16 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/unit/install/pyinfra/inventory/test_dev_inventory.py Adds unit tests covering dev inventory construction and shared default constants.
tests/unit/install/pyinfra/inventory/BUILD Registers the new unit tests in the build system.
src/ai/backend/install/types.py Adds FrontendMode.TRAEFIK to installer type definitions.
src/ai/backend/install/pyinfra/runner.py Implements skip_systemd gating for systemd + management-script operations.
src/ai/backend/install/pyinfra/os_packages/package_manager.py Fixes platform utils import path to the new module location.
src/ai/backend/install/pyinfra/inventory/shared_defaults.py Centralizes default ports/versions/dev credentials for reuse by dev tooling.
src/ai/backend/install/pyinfra/inventory/run_local.py Adds local pyinfra runner wrapper with gevent/macOS subprocess workaround.
src/ai/backend/install/pyinfra/inventory/inventory_local.py Adds a pyinfra inventory entrypoint for @local dev deployments.
src/ai/backend/install/pyinfra/inventory/group_data/all.py Adds global host.data defaults for pyinfra deployments via group_data.
src/ai/backend/install/pyinfra/inventory/dev_inventory.py Implements DevInventoryBuilder and sets skip_systemd=True for local mode.
src/ai/backend/install/pyinfra/inventory/config_local.py Adds pyinfra config disabling sudo for local @local runs.
src/ai/backend/install/pyinfra/inventory/builder.py Adds unified inventory builder for single-node and HA deployment scenarios.
src/ai/backend/install/pyinfra/deploy/cores/appproxy/base.py Reads skip_systemd from host.data so AppProxy deploys can skip systemd actions.
src/ai/backend/install/context.py Updates installer-generated appproxy configs for Traefik mode.
requirements.txt Adds pyinfra and passlib as dependencies.
python.lock Locks new dependencies and transitive requirements.
pyproject.toml Adjusts mypy overrides for pyinfra modules now that pyinfra is included.
changes/10738.feature.md Adds a towncrier fragment (currently appears mismatched to this PR’s scope).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

data["proxy_coordinator"]["enable_traefik"] = True # type: ignore[index]
traefik_table = tomlkit.table()
traefik_etcd_table = tomlkit.table()
traefik_etcd_table["namespace"] = "backend"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Traefik mode, the coordinator Traefik etcd namespace is set to "backend", but the proxy coordinator’s TraefikConfig defaults/assumes the namespace is "traefik". Writing to the wrong namespace will prevent the coordinator from updating/reading Traefik routing state correctly. Please set this to "traefik" (or reuse the same namespace value used for the worker Traefik section).

Suggested change
traefik_etcd_table["namespace"] = "backend"
traefik_etcd_table["namespace"] = "traefik"

Copilot uses AI. Check for mistakes.
Comment on lines +953 to +958
# Add traefik section
traefik_table = tomlkit.table()
traefik_table["api_port"] = 18080
traefik_table["last_used_time_marker_directory"] = "./last_used"
traefik_etcd_table = tomlkit.table()
traefik_etcd_table["namespace"] = "traefik"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker-side Traefik config section you build does not set traefik.frontend_mode, but ProxyWorkerConfig.TraefikConfig requires it (no default) and validates it to decide whether port_proxy or wildcard_domain must be present. Without frontend_mode, loading the generated app-proxy-worker.toml will fail schema validation. Set traefik.frontend_mode explicitly (likely "port" since you configure port_proxy).

Copilot uses AI. Check for mistakes.
# Port proxy sub-section under traefik
port_proxy_table = tomlkit.table()
port_proxy_table["advertised_host"] = public_facing_address
port_proxy_table["bind_port_range"] = [10205, 10300]
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Traefik worker config, the port proxy subsection uses the key bind_port_range, but TraefikPortProxyConfig expects port_range (tuple[int,int]). With the current key name, the worker config will not validate and Traefik port routing won’t be configured. Rename the key to port_range (and keep the value as a 2-item range).

Suggested change
port_proxy_table["bind_port_range"] = [10205, 10300]
port_proxy_table["port_range"] = [10205, 10300]

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +26
_original_fork_exec = subprocess._fork_exec

from gevent import monkey # noqa: E402

monkey.patch_all()

# Restore _fork_exec after monkey patching
subprocess._fork_exec = _original_fork_exec
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapper relies on the private attribute subprocess._fork_exec unconditionally. On some Python implementations/platforms this attribute may not exist (or its semantics may differ), causing an AttributeError before pyinfra starts. Please guard with getattr(subprocess, "_fork_exec", None) / hasattr and only restore when the attribute exists, so non-macOS (or future Python) runs don’t break.

Suggested change
_original_fork_exec = subprocess._fork_exec
from gevent import monkey # noqa: E402
monkey.patch_all()
# Restore _fork_exec after monkey patching
subprocess._fork_exec = _original_fork_exec
_original_fork_exec = getattr(subprocess, "_fork_exec", None)
from gevent import monkey # noqa: E402
monkey.patch_all()
# Restore _fork_exec after monkey patching (if it existed originally)
if _original_fork_exec is not None and hasattr(subprocess, "_fork_exec"):
subprocess._fork_exec = _original_fork_exec

Copilot uses AI. Check for mistakes.
ssh_pubkey = os.getenv("PYINFRA_SSH_PUBKEY", "~/.ssh/id_rsa.pub")
ssh_port = int(os.getenv("PYINFRA_SSH_PORT", "22"))
ssh_password = os.getenv("PYINFRA_SSH_PASSWORD", "")
ssh_strict_host_key_checking = "no"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssh_strict_host_key_checking is hard-coded to "no" and cannot be overridden via env vars, which contradicts the module docstring (“All values can be overridden via environment variables”) and weakens SSH security by default. Please make it configurable via an env var (e.g., PYINFRA_SSH_STRICT_HOST_KEY_CHECKING) and consider defaulting to strict checking in non-dev contexts.

Suggested change
ssh_strict_host_key_checking = "no"
ssh_strict_host_key_checking = os.getenv("PYINFRA_SSH_STRICT_HOST_KEY_CHECKING", "no")

Copilot uses AI. Check for mistakes.
registry_type = os.getenv("PYINFRA_CONTAINER_REGISTRY_TYPE", "harbor2")
registry_scheme = os.getenv("PYINFRA_CONTAINER_REGISTRY_SCHEME", "http")
registry_name = os.getenv("PYINFRA_CONTAINER_REGISTRY_NAME", "bai-repo")
registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registry_port is read from the environment as a string, but several deploy scripts treat it as an int (e.g., assign to self.registry_port: int and pass into templates/config). Please cast it to int here (similar to ssh_port) to avoid subtle type/templating issues.

Suggested change
registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")
registry_port = int(os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080"))

Copilot uses AI. Check for mistakes.
Comment on lines +165 to +169
worker_inference_advertised_hostname=h,
worker_inference_port=10203,
worker_inference_app_port_start=10601,
worker_inference_app_port_end=10700,
webserver_endpoint=f"http://{h}:{p['webserver']}",
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AppProxy inference port and range are hard-coded here even though shared_defaults defines APPPROXY_WORKER_INFERENCE_PORT and APPPROXY_WORKER_INFERENCE_RANGE. This can drift from the shared constants and produce inconsistent inventories. Please use the shared constants for inference port/range the same way interactive/tcp ports are derived.

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +362
# Validate required keys (core services only)
required_keys = [
"halfstack",
"manager",
"webserver",
"storage_proxy",
]

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_validate_service_placement only checks that required keys exist, but does not validate that each required placement list is non-empty. Empty lists will later lead to broken inventories (e.g., HA cluster configs with zero nodes or empty service groups) and harder-to-debug runtime errors. Please validate that required placement values are non-empty lists and (ideally) reference existing mgmt node names.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1 @@
Migrate pyinfra inventory system from backend.ai-installer with dev inventory support
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This news fragment filename/content looks like it belongs to PR/issue 10738 (“Migrate pyinfra inventory system...”), but this PR is BA-5563 (macOS/dev mode + skip_systemd). Please rename/update the fragment to reflect this PR’s change (or drop it if it was already added in #10738) so release notes don’t misattribute the feature.

Copilot uses AI. Check for mistakes.
Yaminyam and others added 2 commits April 2, 2026 17:56
- dict[str, object] → dict[str, Any] for proper indexing
- Add mypy override for tests.unit.install.* (tomlkit access patterns)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL 500~ LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants