Skip to content

feat(BA-5565): Unify AppProxy config generation with tomlkit in-place modification#10746

Open
Yaminyam wants to merge 15 commits intomainfrom
feat/BA-5565-unify-appproxy-config
Open

feat(BA-5565): Unify AppProxy config generation with tomlkit in-place modification#10746
Yaminyam wants to merge 15 commits intomainfrom
feat/BA-5565-unify-appproxy-config

Conversation

@Yaminyam
Copy link
Copy Markdown
Member

@Yaminyam Yaminyam commented Apr 2, 2026

Summary

Extract AppProxy config generation into a shared module (config_gen/appproxy.py) that modifies existing tomlkit documents in-place, preserving comments and structure from auto-generated sample.toml files.

Changes

New: src/ai/backend/install/config_gen/appproxy.py

  • CoordinatorParams / WorkerParams — frozen dataclasses with all config parameters
  • apply_coordinator_config(doc, params) — modifies coordinator tomlkit document in-place
  • apply_worker_config(doc, params) — modifies worker tomlkit document in-place
  • Frontend mode logic (port/wildcard/traefik) centralized in one place

Refactored: context.py:configure_appproxy()

  • ~150 lines of tomlkit manipulation → ~30 lines of param construction + apply_* call
  • Loads sample toml → applies params → writes back (comments preserved)

Why tomlkit in-place (not Jinja2)?

sample.toml files are auto-generated by backend.ai mgr config generate-sample. Using tomlkit to modify them in-place means config schema changes are automatically picked up. Jinja2 templates would need manual sync with every schema change.

Test plan

  • 16 unit tests covering all frontend modes, TLS, traefik, comment preservation, TOML roundtrip
  • Lint and format pass

Related

🤖 Generated with Claude Code

Yaminyam and others added 12 commits April 2, 2026 14:17
- Add TRAEFIK to FrontendMode enum
- Configure coordinator with enable_traefik and traefik.etcd settings
- Configure worker with traefik section (api_port, etcd, port_proxy)
  when frontend_mode is traefik

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate group_data/all.py from backend.ai-installer with enterprise-only
items (graylog, zabbix, license hwinfo) removed. Provides host.data
defaults for SSH, OS, Docker, Python, and container registry config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p 2)

- Migrate InventoryBuilder from backend.ai-installer (1848 → 1524 lines)
- Update imports: ai.backend.pyinfra → ai.backend.install.pyinfra
- Remove enterprise-only logic (license, control_panel, fasttrack, harbor, rtun)
- Keep enterprise config stubs with enabled=False in return dicts
- Fix type inconsistency in group_data: ssh_port, bai_user_id, bai_user_group_id
  now use int() wrapper for os.getenv() calls
- Add missing type annotations for lint compliance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Step 3)

Create DevInventoryBuilder that targets @Local with Docker Compose
halfstack ports matching DevContext.hydrate_install_info(). Provides
all host.data attributes and services dict that deploy scripts expect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add inventory_local.py as pyinfra CLI entry point for local dev
- Fix bai_version default to use actual version instead of "dev"
- Verified: deploy scripts successfully read all host.data attributes
  from DevInventoryBuilder (fails only on macOS subprocess, not inventory)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shared_defaults.py with centralized port/version/credential constants
- Refactor DevInventoryBuilder to use shared_defaults instead of
  hardcoded values
- Constants are shared between DevContext (TUI) and DevInventoryBuilder (pyinfra)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 18 unit tests for DevInventoryBuilder and shared_defaults
- Remove follow_imports="skip" from mypy overrides (pyinfra is now a dependency)
- Remove import-not-found from disabled error codes
- Fix stale import path in package_manager.py
- Refactor APPPROXY_PORTS to typed constants for mypy compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EtcdConfig: hostname/port → advertised_client_ip/advertised_client_port
- ManagerConfig, WebserverConfig, HiveGatewayConfig: remove non-existent hostname field
- StorageProxyConfig: hostname → removed, client_port → port
- Update test to use correct field (advertised_client_port)
- Add changelog for PR #10738

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n macOS

Workaround for gevent/gevent#2169: monkey.patch_all() sets
subprocess._fork_exec to None on macOS + Python 3.13. This wrapper
saves and restores _fork_exec before running pyinfra CLI.

Can be removed once gevent fixes the issue upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tations

- Add pyinfra~=3.7 and passlib~=1.7.4 to requirements.txt (was missing from commit)
- Regenerate python.lock with new dependencies
- Fix dict → dict[str, object] type annotations in tests for mypy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create config_gen/appproxy.py with CoordinatorParams, WorkerParams,
  build_coordinator_config(), build_worker_config()
- Refactor context.py:configure_appproxy() to use shared module
  (~150 lines of tomlkit manipulation → ~30 lines of param construction)
- Frontend mode logic (port/wildcard/traefik) now lives in one place
- Add 16 unit tests covering all frontend modes and edge cases
- Import FrontendMode from types.py (single definition)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change from build_*_config() (returns new dict) to apply_*_config()
(modifies existing tomlkit document in-place). This preserves comments
and structure from sample.toml files, which are auto-generated by
`backend.ai mgr config generate-sample`.

- apply_coordinator_config(doc, params) modifies coordinator toml doc
- apply_worker_config(doc, params) modifies worker toml doc
- context.py loads sample, applies params, writes back
- Tests use sample TOML strings to verify comment preservation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 2, 2026 08:30
@github-actions github-actions bot added the size:XL 500~ LoC label Apr 2, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors AppProxy TOML configuration generation into a shared config_gen/appproxy.py module that mutates tomlkit documents in-place (to preserve comments/structure), and also introduces/extends pyinfra inventory tooling (defaults, dev inventory, and a unified inventory builder) along with associated tests.

Changes:

  • Add shared AppProxy config generator (CoordinatorParams/WorkerParams + apply_* functions) and refactor context.py:configure_appproxy() to use it.
  • Add pyinfra inventory infrastructure (shared defaults, group_data defaults, local/dev inventory, unified InventoryBuilder) and unit tests.
  • Add dependencies (pyinfra, passlib) and adjust mypy configuration for pyinfra modules.

Reviewed changes

Copilot reviewed 16 out of 20 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/unit/install/pyinfra/inventory/test_dev_inventory.py Adds unit tests for DevInventoryBuilder and shared defaults.
tests/unit/install/pyinfra/inventory/BUILD Adds Pants python_tests target for inventory unit tests.
tests/unit/install/config_gen/test_appproxy.py Adds unit tests for shared AppProxy config generation and TOML roundtrip.
tests/unit/install/config_gen/BUILD Adds Pants python_tests target for config_gen unit tests.
src/ai/backend/install/types.py Extends FrontendMode enum with TRAEFIK.
src/ai/backend/install/pyinfra/os_packages/package_manager.py Fixes import path for platform detection utilities.
src/ai/backend/install/pyinfra/inventory/shared_defaults.py Centralizes shared ports/versions/dev defaults for installer + pyinfra.
src/ai/backend/install/pyinfra/inventory/run_local.py Adds local pyinfra runner wrapper with gevent/macOS workaround.
src/ai/backend/install/pyinfra/inventory/inventory_local.py Adds a local inventory entrypoint for pyinfra CLI.
src/ai/backend/install/pyinfra/inventory/group_data/all.py Defines pyinfra host.data defaults via env vars.
src/ai/backend/install/pyinfra/inventory/dev_inventory.py Adds DevInventoryBuilder for local @local deployments.
src/ai/backend/install/pyinfra/inventory/builder.py Adds unified InventoryBuilder for single-node and HA inventories.
src/ai/backend/install/context.py Refactors AppProxy configuration to call shared TOML mutators.
src/ai/backend/install/config_gen/appproxy.py Introduces shared AppProxy TOML in-place mutation logic (incl. Traefik mode).
src/ai/backend/install/pyinfra/inventory/init.py Package init (no functional changes shown).
src/ai/backend/install/config_gen/init.py Package init (no functional changes shown).
requirements.txt Adds pyinfra and passlib runtime dependencies.
python.lock Locks new dependencies and transitive updates.
pyproject.toml Updates mypy overrides for pyinfra modules (removes follow_imports="skip").
changes/10738.feature.md Adds a changelog entry for pyinfra inventory migration (file naming may be inconsistent with this PR).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

enable_traefik: bool = False
etcd_host: str = "127.0.0.1"
etcd_port: int = 8120
etcd_namespace: str = "backend"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoordinatorParams.etcd_namespace defaults to "backend", but Traefik integration in AppProxy expects the etcd namespace to be "traefik" (see pyinfra template src/ai/backend/install/pyinfra/deploy/cores/appproxy/templates/app-proxy-coordinator.toml.j2:63-66 and coordinator config model src/ai/backend/appproxy/coordinator/config.py:169+). With the current default, configure_appproxy() will emit a coordinator TOML that points Traefik at the wrong etcd namespace unless every caller overrides it. Consider changing the default to "traefik" (or renaming the field to make the intent explicit).

Suggested change
etcd_namespace: str = "backend"
etcd_namespace: str = "traefik"

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +257
# Add traefik section
traefik_table = tomlkit.table()
traefik_table["api_port"] = params.traefik_api_port
traefik_table["last_used_time_marker_directory"] = params.traefik_last_used_dir
traefik_etcd_table = tomlkit.table()
traefik_etcd_table["namespace"] = params.traefik_etcd_namespace
traefik_etcd_table["addr"] = _make_inline_table({
"host": params.traefik_etcd_host,
"port": params.traefik_etcd_port,
})
traefik_table["etcd"] = traefik_etcd_table
port_proxy_table = tomlkit.table()
port_proxy_table["advertised_host"] = params.port_proxy_advertised_host
port_proxy_table["bind_port_range"] = [
params.port_proxy_range_start,
params.port_proxy_range_end,
]
traefik_table["port_proxy"] = port_proxy_table
doc["proxy_worker"]["traefik"] = traefik_table # type: ignore[index]
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TRAEFIK mode, the generated TOML keys don’t match the AppProxy worker schema / existing templates: Traefik port proxy uses port_range (not bind_port_range), and the traefik table requires a frontend_mode field (see src/ai/backend/install/pyinfra/deploy/cores/appproxy/templates/app-proxy-worker.toml.j2:74-90 and src/ai/backend/appproxy/worker/config.py TraefikConfig). As-is, the produced config is likely to fail validation at runtime. Update the traefik section generation to follow the expected keys/fields.

Copilot uses AI. Check for mistakes.
# Configure based on frontend_mode
match params.frontend_mode:
case FrontendMode.PORT:
doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index]
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PORT mode ignores port_proxy_range_start / port_proxy_range_end even though they’re part of WorkerParams; only advertised_host is updated. If callers customize the port range, the output TOML will silently keep whatever range is in the input document. Consider setting bind_port_range (and any corresponding advertised range field, if present) from the params in PORT mode as well.

Suggested change
doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index]
port_proxy = doc["proxy_worker"]["port_proxy"] # type: ignore[index]
port_proxy["advertised_host"] = params.port_proxy_advertised_host
# Update bind port range from params
port_proxy["bind_port_range"] = _make_inline_table({
"start": params.port_proxy_range_start,
"end": params.port_proxy_range_end,
})
# Optionally update advertised port range if params provide it
advertised_range_start = getattr(params, "port_proxy_advertised_range_start", None)
advertised_range_end = getattr(params, "port_proxy_advertised_range_end", None)
if advertised_range_start is not None and advertised_range_end is not None:
port_proxy["advertised_port_range"] = _make_inline_table({
"start": advertised_range_start,
"end": advertised_range_end,
})

Copilot uses AI. Check for mistakes.
Comment on lines +210 to +238
# Configure based on frontend_mode
match params.frontend_mode:
case FrontendMode.PORT:
doc["proxy_worker"]["port_proxy"]["advertised_host"] = params.port_proxy_advertised_host # type: ignore[index]
case FrontendMode.WILDCARD:
# Remove port_proxy section
if "port_proxy" in doc["proxy_worker"]: # type: ignore[operator]
del doc["proxy_worker"]["port_proxy"] # type: ignore[union-attr]

# Override api_advertised_addr
doc["proxy_worker"]["api_advertised_addr"] = _make_inline_table( # type: ignore[index]
{"host": params.api_advertised_host, "port": params.wildcard_advertised_port}
)

# Add wildcard_domain section
if params.wildcard_domain:
wildcard_table = tomlkit.table()
wildcard_table["domain"] = params.wildcard_domain
wildcard_table["bind_addr"] = _make_inline_table({
"host": params.wildcard_bind_host,
"port": params.wildcard_bind_port,
})
wildcard_table["advertised_port"] = params.wildcard_advertised_port
doc["proxy_worker"]["wildcard_domain"] = wildcard_table # type: ignore[index]
case FrontendMode.TRAEFIK:
# Remove port_proxy section
if "port_proxy" in doc["proxy_worker"]: # type: ignore[operator]
del doc["proxy_worker"]["port_proxy"] # type: ignore[union-attr]

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_worker_config() doesn’t clean up mode-specific sections when switching modes or when optional params are unset (e.g., WILDCARD mode with wildcard_domain=None leaves any existing proxy_worker.wildcard_domain table untouched; switching back to PORT leaves any existing proxy_worker.traefik table untouched). Since this module claims to modify existing TOML documents in-place, this can produce configs with conflicting/stale sections. Consider explicitly deleting wildcard_domain / traefik tables when they’re not applicable to the selected mode, and re-creating port_proxy when returning to PORT mode.

Copilot uses AI. Check for mistakes.
Comment on lines +165 to +168
worker_inference_advertised_hostname=h,
worker_inference_port=10203,
worker_inference_app_port_start=10601,
worker_inference_app_port_end=10700,
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevInventoryBuilder hard-codes the inference worker port and port range (10203 / 10601-10700) instead of using the shared constants already defined in shared_defaults.py (APPPROXY_WORKER_INFERENCE_PORT / APPPROXY_WORKER_INFERENCE_RANGE). This undermines the goal of centralizing magic numbers and can drift if defaults change. Prefer wiring these fields from the shared_defaults constants and/or DevInventoryBuilder.PORTS.

Copilot uses AI. Check for mistakes.
registry_port = os.getenv("PYINFRA_CONTAINER_REGISTRY_PORT", "7080")
registry_username = os.getenv("PYINFRA_CONTAINER_REGISTRY_USERNAME", "bai")
registry_projects = os.getenv("PYINFRA_CONTAINER_REGISTRY_PROJECTS", "bai,bai-user")
registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3")
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registry_password defaults to a non-empty hardcoded value. This value is consumed by deploy scripts (e.g. pyinfra/deploy/cores/manager/deploy.py reads host.data.registry_password) and could accidentally be used in non-dev deployments, weakening security by providing a predictable default credential. Consider defaulting to an empty string and emitting a warning/error when it’s not explicitly set via env/inventory (similar to the PYINFRA_SUDO_PASSWORD warning).

Suggested change
registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "lY0B=op3")
registry_password = os.getenv("PYINFRA_CONTAINER_REGISTRY_PASSWORD", "")
if not registry_password:
logger.warning(
"PYINFRA_CONTAINER_REGISTRY_PASSWORD is not set; please configure a secure "
"registry password via environment variable or inventory."
)

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +26
_original_fork_exec = subprocess._fork_exec

from gevent import monkey # noqa: E402

monkey.patch_all()

# Restore _fork_exec after monkey patching
subprocess._fork_exec = _original_fork_exec
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script unconditionally accesses the private subprocess._fork_exec attribute. On platforms where it doesn’t exist (e.g., Windows) or if Python internals change, this will crash before running pyinfra. Consider guarding with hasattr(subprocess, "_fork_exec") and/or restricting the workaround to affected platforms/versions (macOS + Python 3.13), falling back to a normal cli() run otherwise.

Suggested change
_original_fork_exec = subprocess._fork_exec
from gevent import monkey # noqa: E402
monkey.patch_all()
# Restore _fork_exec after monkey patching
subprocess._fork_exec = _original_fork_exec
needs_original_fork_exec_workaround = (
sys.platform == "darwin"
and sys.version_info >= (3, 13)
and hasattr(subprocess, "_fork_exec")
)
if needs_original_fork_exec_workaround:
_original_fork_exec = subprocess._fork_exec
from gevent import monkey # noqa: E402
monkey.patch_all()
# Restore _fork_exec after monkey patching, if we saved it
if needs_original_fork_exec_workaround:
subprocess._fork_exec = _original_fork_exec

Copilot uses AI. Check for mistakes.
@@ -0,0 +1 @@
Migrate pyinfra inventory system from backend.ai-installer with dev inventory support
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changelog entry filename changes/10738.feature.md appears to reference a different PR/issue number than this PR (BA-5565). If your project expects one changeset file per PR, this will misattribute release notes. Consider renaming the changes file to match the current PR number / convention used in the repo.

Copilot uses AI. Check for mistakes.
Yaminyam and others added 3 commits April 2, 2026 17:52
- Remove type: ignore comments from config_gen/appproxy.py (handled by mypy overrides)
- Add index/operator to mypy disable_error_code for ai.backend.install.*
- Add mypy override for tests.unit.install.* (tomlkit dict-like access)
- Fix dict[str, object] → dict[str, Any] in test_dev_inventory.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove all unused type: ignore comments (now handled by mypy overrides)
- Fix str | None → str arg-type for appproxy secrets (add `or ""` fallback)
- Fix dict → dict[str, object] type param in config_gen/agent.py
- Fix ConsoleRenderable → str in context.py:462
- Auto-format with ruff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL 500~ LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants