
refactor(infra-local): simplify + unify into one Python compose package (~50% smaller) #796

Merged
DorianZheng merged 7 commits into main from chore/infra-local-purge-dead-config
Jun 16, 2026


@DorianZheng DorianZheng commented Jun 15, 2026

Member

What

Simplifies apps/infra-local (the BoxLite-based local dev stack) and unifies its two-language orchestrator into one Python package. ~4,500 → ~2,190 lines (~50%), one entry point (python -m compose), flat structure.

Commits

  1. simplify — drop dead config (the BOXLITE_*_HOST_PORT knobs never moved a bound port; pg_url/tcp_port/Severity.WARN; dead configs/minio/init.sh), dedupe the SDK import shim, delete the stale CONNECTIONS.md (445 lines restating the code), rewrite the README, and remove the infra-local test suite (not in CI; mostly asserted dataclass defaults) — preserving the one unique capability as a native-gated SDK test.
  2. port L2 to Python + rename + flatten: the 4 native host processes (API/runner/proxy/dashboard) move from ~800 lines of bash (scripts/) into compose/native.py (subprocess supervision: detached daemons, pidfiles, killpg teardown). Package renamed boxlite_local → compose. scripts/ + configs/ deleted; configs/api.env → api.env (root). CLI is 7 verbs (up/down/status/logs/restart/reset/nuke); Makefile ~19 → 9 thin aliases.
  3. adversarial fixes — a 2-round adversarial-review loop found + fixed an export KEY=val parser bug in the .env reader; the SDK capability test was rewritten (alpine busybox lacks httpd) to a python:3-alpine http.server box and verified live.

Final structure

4 root files (api.env, Makefile, README.md, pyproject.toml) + the compose/ package (config / services / orchestrator [L1] / native [L2] / doctor / _sdk / __main__). No scripts/, configs/, or tests/.

Verification

  • python -m compose --help + all 7 subcommands; py_compile + import all modules.
  • SDK volume-port test passes live (pytest -m integration → 1 passed, 30s): RW-volume persistence + host-port mapping.
  • 2-round adversarial review terminated at ZERO_FINDINGS; CLAUDE.md auditor PASS on every commit; pre-push integration suite green.

⚠️ Reviewer note — live-stack validation needed before merge

compose/native.py is a faithful behavior-port of the deleted bash, but process supervision is runtime-only behavior. It compiles/imports/parses and the adversarial review cleared it against the bash spec, but a live make up && make status (then make down / make restart COMPONENTS=runner) on an Apple Silicon Mac is the final proof the L2 daemons spawn/reap correctly. Please run that before merging.

No Cargo/Rust/core-SDK source changes; scope is apps/infra-local/ + one sdks/python/tests/ file.

Summary by CodeRabbit

Release Notes

  • Refactor

    • Reworked local infra orchestration to use python -m compose, with Makefile aliases for up, down, status, logs, restart, reset, and nuke.
    • Updated local persistent state directory to .apps-local/.
  • Documentation

    • Refreshed apps/infra-local README for the new compose workflow, commands, and endpoints.
    • Removed legacy connection-reference documentation.
  • Tests

    • Added an SDK integration test for read-write volume persistence and host port forwarding across restarts.
    • Removed legacy infra-local integration/unit test suites.

…s (~50%)

apps/infra-local was ~4,500 lines for an 11-service local stack. This keeps
every service and behavior identical while removing avoidable ceremony:
24 files, +437/-2703. No Cargo/Rust/core-SDK source changes.

Code (boxlite_local/, 10 files -> 8):
- doctor.py: drop the Severity/DoctorCheck/DoctorReport/DoctorError(report)/
  format_report framework for plain functions returning failure strings; shrink
  the lsof parser. Keep a one-line DoctorError(RuntimeError).
- Delete types.py: ServiceSpec/HealthCheck move to services.py (their only
  consumers); the Doctor* types are gone with the doctor rewrite.
- Delete execwrap.py: exec_collect inlined into orchestrator.py.
- Centralize the dual-layout SDK import (boxlite vs boxlite.boxlite) in _sdk.py
  (was copy-pasted across orchestrator/doctor).
- Drop dead config: the BOXLITE_*_HOST_PORT env overrides never moved a bound
  port (ServiceSpec.ports are literals), the pg_url property, and the YAGNI
  HealthCheck.tcp_port / Severity.WARN. Delete configs/minio/init.sh (a drifted
  dead duplicate of the inline _MINIO_INIT_SCRIPT in services.py).
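
A centralized dual-layout import of this kind might look roughly like the sketch below. This is not the contents of `_sdk.py`: the helper name `import_first`, the candidate order, and the caching are assumptions for illustration. The PR only states that the two layouts are `boxlite` and `boxlite.boxlite` and that the import is lazy.

```python
import importlib
from functools import lru_cache
from types import ModuleType

# The two module paths come from the commit message; trying them in this
# order is an assumption made for the sketch.
SDK_CANDIDATES = ("boxlite", "boxlite.boxlite")


def import_first(candidates: tuple[str, ...]) -> ModuleType:
    """Return the first importable module among `candidates` (hypothetical helper)."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no SDK layout could be imported: " + "; ".join(errors))


@lru_cache(maxsize=1)
def import_sdk() -> ModuleType:
    """Import the SDK on first use only, so `--help` works without it installed."""
    return import_first(SDK_CANDIDATES)
```

Callers such as the orchestrator and doctor would then call `import_sdk()` at the point of use instead of repeating the try/except themselves.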

Tests — delete apps/infra-local/tests/ (~1000 lines), behavior-preserving:
- The suite is not in CI and mostly asserted dataclass defaults. The real E2E
  (scripts/test/e2e) tests the REST path, which by design forbids host bind
  mounts + host ports (cases/test_volume_readonly.py), so these can't move
  there. Dev-stack validation is now `make stack-up` + the apps `npm run
  e2e:local`.
- Preserve the one genuinely-unique, nowhere-else-covered capability as a
  native-gated SDK test: sdks/python/tests/test_volume_port_persistence.py — a
  long-lived box with a read-write host volume that persists + a host-reachable
  mapped port (the shape infra-local relies on, e.g. postgres 25432:5432 over a
  writable volume). Mirrors the existing test_readonly_volume_remount.py.

Docs (1212 -> 103):
- Delete CONNECTIONS.md (it restated config.py/services.py). README rewritten to
  ~100 lines: what-it-is, quick start, make targets, endpoint+credential table,
  how to validate, top-6 troubleshooting. Also corrected the stale "Docker
  Desktop required" prereq (the runtime is Hypervisor.framework + libkrun).

Scripts:
- stack-up.sh: factor the four identical component prechecks + health-waits into
  prestart()/await_up() helpers; each launch invocation is unchanged.

Verification: py_compile + import all modules; `python -m boxlite_local --help`
+ doctor fast paths; bash -n on all scripts; new SDK test pytest --collect-only
(import + marker + collection; full native run is gated). End-to-end remains
`make stack-up && make stack-status`.
…, flatten

Unifies the two-language orchestrator into one Python package and slims the
command + directory surface. Builds on the prior simplification commit; folds in
the cli.py -> __main__.py merge from that step.

Rename boxlite_local/ -> compose/ (`python -m compose`). The old name was
redundant (the whole repo is boxlite) and didn't match its infra-local/ dir.
Internal imports are relative, so only pyproject/Makefile/README referenced it.

Port L2 shell -> compose/native.py: the four native host processes (API, runner,
proxy, dashboard) are supervised via subprocess instead of ~800 lines of bash.
Daemons spawn detached (start_new_session) and stop via SIGTERM->SIGKILL on the
process group -- killpg reaps the nx/go grandchildren, replacing the bash
pkill-by-name sweep. Reuses orchestrator._http_probe / doctor._lsof_owner /
InfraConfig. Deletes scripts/ (9 files).
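
The detached-spawn and process-group teardown described above can be sketched as follows. This is a minimal POSIX-only illustration, not the code in compose/native.py: the function names and the grace period are hypothetical, and the real supervisor works from pidfiles rather than holding a `Popen` handle.

```python
import os
import signal
import subprocess


def spawn_detached(argv: list[str]) -> subprocess.Popen:
    """Start a process in its own session, so it leads a new process group."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid(); its pgid equals its pid
    )


def terminate_group(proc: subprocess.Popen, grace: float = 5.0) -> None:
    """SIGTERM the whole group, wait up to `grace` seconds, then SIGKILL survivors."""
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return  # nothing left in the group
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # leader ignored SIGTERM; fall through to SIGKILL
    try:
        # Sweeps the leader (if still alive) and any grandchildren in the group.
        os.killpg(pgid, signal.SIGKILL)
    except (ProcessLookupError, PermissionError):
        pass  # group already empty
    proc.wait()
```

Signalling the group rather than the single pid is what makes a by-name `pkill` sweep unnecessary: children that stay in the leader's process group receive the same signal.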

Flatten: configs/api.env -> api.env (root); scripts/ + configs/ gone. The tree
is now four root files + the compose/ package.

CLI: one unified `python -m compose` over L1 (boxes) + L2 (procs), 7 verbs --
up / down / status / logs / restart / reset / nuke. `up` is self-contained
(install, build-if-missing, preflight, seed), so build/seed/doctor/migrate are
not separate commands. `restart <name>` bounces an L2 proc OR recreates an L1
box (folds the old `rebuild`). `nuke` is the full teardown; `reset` is data-only.

Makefile: ~19 targets -> 9 thin aliases (one per verb + help/install).

Verification: py_compile + import all modules; `python -m compose --help` + every
subcommand. Process supervision is runtime-only behavior I cannot exercise here
(no live stack), so native.py is a faithful behavior-port of the deleted scripts
and `make up && make status` must be run live to validate end-to-end before
merge. No automated test added for native.py: the suite was removed earlier (per
the repo convention of not unit-testing process/glue code), and the one unique
SDK capability is already pinned by sdks/python/tests/test_volume_port_persistence.py.
Adversarial review found compose/native.py's apps/.env parser stored
`export KEY=val` lines under the literal key "export KEY", silently dropping
the var from the API process env — a divergence from the bash it ports
(`set -a && . ./.env && set +a`, which strips the `export` keyword). Latent in
the checked-in template (no export lines) but live for dev edits to apps/.env.
Strip the keyword (and only the keyword: `exportFOO=` stays `exportFOO`).

Verified against real `bash -c 'set -a && . file && set +a && env'`
(revert -> diverges on an exported var, restore -> matches).
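
The fix can be illustrated with a small parser sketch. This is an assumption-laden approximation, not the reader in compose/native.py: the function name `parse_env` and the quote handling are hypothetical, and real shell sourcing does far more (expansion, escapes) than this covers.

```python
import re

# Match the `export` keyword only when it is followed by whitespace,
# so a key that merely starts with "export" (e.g. exportFOO) is untouched.
_EXPORT_PREFIX = re.compile(r"^export\s+")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=val lines, accepting an optional leading `export` keyword."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        line = _EXPORT_PREFIX.sub("", line, count=1)
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Simplification: drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key] = value
    return env
```

Without the prefix strip, `export KEY=val` would be stored under the key `"export KEY"`, which is the bug described above.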
… busybox httpd

The new test_volume_port_persistence integration test failed on its first real
execution (the pre-push hook): `alpine:latest`'s base busybox omits the `httpd`
applet, so `httpd -p 8000 -h /data` was command-not-found. Run the in-box server
as the box's own long-lived foreground cmd instead — python:3-alpine with
`python3 -m http.server` over the mounted volume (mirrors how infra-local runs
each service as the box's main process), rather than backgrounding a daemon via
exec (whose lifetime is the exec session, not the box).

Verified live: `pytest tests/test_volume_port_persistence.py -m integration` →
1 passed (30s). Proves RW-volume write-through + persistence-across-restart +
host-port reachability against a real microVM.
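
The host-side half of such a check is a poll-until-ready GET against the mapped port. The sketch below shows one way to write it with the standard library; it is not the test's `_get_when_ready` (whose body is not shown in this PR), and the timeout and poll interval are arbitrary choices.

```python
import http.client
import time


def get_when_ready(port: int, path: str, timeout: float = 30.0) -> str:
    """Poll http://127.0.0.1:<port><path> until it returns 200; return the body."""
    deadline = time.monotonic() + timeout
    last_error: object = None
    while time.monotonic() < deadline:
        try:
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
            try:
                conn.request("GET", path)
                resp = conn.getresponse()
                body = resp.read().decode()
                if resp.status == 200:
                    return body
                last_error = f"HTTP {resp.status}"
            finally:
                conn.close()
        except (OSError, http.client.HTTPException) as exc:
            last_error = exc  # server not listening yet, or mid-startup
        time.sleep(0.25)
    raise TimeoutError(f"GET {path} on port {port} not ready: {last_error}")
```

Polling is needed because the box's server starts after the microVM boots, so the first few connection attempts are expected to be refused.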
@DorianZheng DorianZheng requested a review from a team as a code owner June 15, 2026 16:35
@coderabbitai

coderabbitai Bot commented Jun 15, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8883d9a5-4e88-4031-8d15-482d1be0c02c

📥 Commits

Reviewing files that changed from the base of the PR and between 1c4458a and 1b3d3c7.

📒 Files selected for processing (1)
  • sdks/python/tests/test_volume_port_persistence.py

📝 Walkthrough

Walkthrough

Replaces the apps/infra-local orchestration layer: deletes the boxlite_local Python package, all scripts/stack-*.sh bash scripts, legacy tests, and configs/minio/init.sh; introduces a new compose Python package with services.py type contracts, updated config.py/orchestrator.py, a new doctor.py, and a comprehensive native.py L2 supervisor wired through a __main__.py CLI and slimmed Makefile. A new SDK-level integration test for RW volume and host-port persistence is added separately.

Changes

infra-local orchestrator rewrite

| Layer | File(s) | Summary |
| --- | --- | --- |
| Package rename, pyproject, and deleted legacy code | apps/infra-local/pyproject.toml, deleted: apps/infra-local/boxlite_local/*, apps/infra-local/scripts/stack-*.sh, apps/infra-local/scripts/_stack-common.sh, apps/infra-local/configs/minio/init.sh, apps/infra-local/tests/** | Renames package from boxlite_local to infra-local, switches discovery to compose*, removes test extras. All boxlite_local/* modules, scripts/stack-*.sh, scripts/_stack-common.sh, configs/minio/init.sh, and all former test modules are deleted. |
| compose package type contracts and config | apps/infra-local/compose/services.py, apps/infra-local/compose/_sdk.py, apps/infra-local/compose/config.py | Defines self-contained HealthCheck/ServiceSpec dataclasses in services.py; introduces lazy import_sdk() in _sdk.py; updates InfraConfig to drop env-driven port fields and pg_url, standardize credential fields, and restrict load() to credential/identity env overrides only. |
| orchestrator.py refactor and new doctor.py | apps/infra-local/compose/orchestrator.py, apps/infra-local/compose/doctor.py | Inlines exec_collect into orchestrator.py, routes SDK type acquisition through import_sdk(), removes ps() and the tcp_port health-check branch, sources types from .services. New doctor.py adds DoctorError, lsof-based port-ownership checks, SDK/runtime probes, and an async doctor() aggregator. |
| native.py: L2 native process supervisor and stack commands | apps/infra-local/compose/native.py | Adds the full L2 layer: four-component configuration with fixed ports and health specs; detached-subprocess lifecycle with pidfiles and SIGTERM→SIGKILL+pkill fallback; L1 asyncio gating and psql postgres utilities; all public stack commands (up, down, status, logs, restart, reset, nuke, seed) with Go binary auto-build, .env seeding, soft/hard DB reset, migration execution, and seed verification. |
| compose CLI entry point and Makefile thin aliases | apps/infra-local/compose/__init__.py, apps/infra-local/compose/__main__.py, apps/infra-local/Makefile | Adds __version__, argparse subcommand CLI (up, down, status, logs, restart, reset, nuke), InfraConfig/ensure_home_env bootstrapping, DoctorError handling, and sys.exit(main()) wiring. Makefile becomes thin aliases over python -m compose. |
| README and docs update | apps/infra-local/README.md, deleted: apps/infra-local/CONNECTIONS.md | Removes CONNECTIONS.md; rewrites README.md to cover the new compose-driven stack, updated L1/L2 component lists, fixed-port endpoint table, new command interface, refreshed troubleshooting, and updated Layout section. |

SDK volume and port persistence integration test

| Layer | File(s) | Summary |
| --- | --- | --- |
| test_volume_port_persistence: RW volume and host-port test | sdks/python/tests/test_volume_port_persistence.py | Adds TestVolumePortPersistence.test_rw_volume_persists_and_port_is_reachable with helpers _free_host_port, _serve_cmd, _get_when_ready; starts two sequential boxes sharing a host RW volume and verifies the marker file is reachable through the host-mapped port after restart. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • boxlite-ai/boxlite#595: Introduced the original boxlite_local-based stack orchestration that this PR entirely replaces with the new compose Python package.
  • boxlite-ai/boxlite#790: Touches the same apps/infra-local Makefile and BOXLITE_CLI/BOXLITE_HOME conventions that this PR removes and supersedes.
  • boxlite-ai/boxlite#789: Modifies apps/infra-local state-directory conventions and reset behavior that directly overlap with this PR's Makefile rewrite and <repo>/.apps-local/ standardization.

Suggested reviewers

  • law-chain-hot

🐇 A rabbit rewrites the stack with glee,
Shell scripts gone, pure Python sets things free!
make up now calls compose with flair,
L1 boxes hum, L2 floats in air.
The markers persist, the ports hold true —
Hop hop hooray, the infra's brand new! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main purpose of the changeset: refactoring infra-local to simplify and consolidate into a single Python package with approximately 50% reduction in code. It is concise and clearly communicates the primary change. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment thread apps/infra-local/compose/native.py Fixed
Comment thread apps/infra-local/compose/native.py Fixed
Comment thread apps/infra-local/compose/native.py Fixed
Comment thread apps/infra-local/compose/native.py Fixed
Comment thread apps/infra-local/compose/native.py Fixed

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdks/python/tests/test_volume_port_persistence.py (1)

95-136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guarantee temp-directory cleanup on all failure paths in lines 95-136.

shutil.rmtree(host_dir, ignore_errors=True) only runs in the second box’s finally; if an exception occurs earlier (for example during second box creation), the temp directory is leaked.

Suggested fix (outer cleanup finally)
 import http.client
+import contextlib
 import os
@@
     def test_rw_volume_persists_and_port_is_reachable(self, runtime):
         host_dir = tempfile.mkdtemp(prefix="bl_vol_port_")
         host_port = _free_host_port()
+        try:
+            # ── Box 1: write through the RW volume, then serve it on a mapped port ──
+            box = _box(write_marker=True)
+            try:
+                box.start()
+                assert _get_when_ready(host_port, "/marker.txt") == MARKER
+                with open(os.path.join(host_dir, "marker.txt")) as f:
+                    assert f.read() == MARKER, "RW volume did not write through to host"
+            finally:
+                with contextlib.suppress(Exception):
+                    box.stop()
 
-        # ── Box 2: a fresh box on the same host volume serves the persisted data ──
-        box2 = _box(write_marker=False)
-        try:
-            box2.start()
-            assert _get_when_ready(host_port, "/marker.txt") == MARKER, (
-                "volume data did not persist across a box restart"
-            )
-        finally:
-            box2.stop()
+            # ── Box 2: a fresh box on the same host volume serves the persisted data ──
+            box2 = _box(write_marker=False)
+            try:
+                box2.start()
+                assert _get_when_ready(host_port, "/marker.txt") == MARKER, (
+                    "volume data did not persist across a box restart"
+                )
+            finally:
+                with contextlib.suppress(Exception):
+                    box2.stop()
+        finally:
             shutil.rmtree(host_dir, ignore_errors=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdks/python/tests/test_volume_port_persistence.py` around lines 95 - 136, The
temporary directory cleanup with shutil.rmtree is only executed in the finally
block of the second box, which means if any exception occurs earlier in the test
(during first box operations, second box creation, or other assertions), the
temp directory leaks. Add an outer try/finally block that wraps all the test
logic after host_dir is created to guarantee that shutil.rmtree(host_dir,
ignore_errors=True) executes on all failure paths in the
test_rw_volume_persists_and_port_is_reachable method.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/infra-local/compose/native.py`:
- Around line 491-494: The seed function call at line 491 (and also at line 501)
does not check the return code, allowing the up command to exit successfully
even when seed initialization fails. Capture the return value from seed(cfg,
no_bounce=True) and check if it indicates failure; if it does, propagate that
failure by exiting the up command with an appropriate non-zero status code
instead of continuing to success.
- Around line 627-633: The hard-reset logic calls _psql and _migrate functions
that can fail without raising exceptions, yet continues to report success with
the ok() and warn() messages regardless of whether these operations actually
succeeded. Modify the code to capture and check the return codes (or exceptions)
from both the _psql call (which drops and recreates the schema) and the _migrate
call, and only proceed to the ok() success message if both operations complete
successfully. If either _psql or _migrate fails, the function should raise an
error or return a failure status to prevent silent partial failures. Apply this
pattern to all affected locations in the hard-reset path.
- Around line 289-299: In the start_component() function, when a component fails
to become healthy (the healthy variable is False), add cleanup code before
logging the error and returning False. Stop and kill the spawned process p and
remove its associated pidfile to prevent stale daemons and ensure consistent
retries. The cleanup should happen in the else branch where the timeout error is
logged, after the unhealthy condition is detected but before the return False
statement.
- Around line 591-607: The restart function ignores the return value from
start_component(p, table[name]) when restarting L2 components in the loop over
l2, which means the command returns success even if a component fails to become
healthy. Capture the return value from the start_component call within the for
loop iterating over l2, check whether it indicates failure (start_component
returns False when a component never becomes healthy), and if so, return a
non-zero exit code immediately instead of continuing and returning 0 at the end
of the function.

In `@apps/infra-local/README.md`:
- Line 101: The opening fence in the code block lacks a language identifier,
which triggers markdownlint MD040. Add "text" as the language identifier to the
opening triple backticks (change ``` to ```text) to properly tag the fenced code
block that displays the directory structure and file descriptions.

In `@sdks/python/tests/test_volume_port_persistence.py`:
- Around line 53-59: The _free_host_port() function at line 53-59 has a TOCTOU
race condition: it finds a free port and closes the socket, allowing another
process to claim the port before actual binding occurs. Implement retry logic to
handle this: when binding to the returned port fails in the code at line 97-111
(where the port is actually used), catch the bind error and call
_free_host_port() again to get a new port, then retry the binding. Repeat this
until binding succeeds. Alternatively, modify _free_host_port() to keep the
socket open and return both the socket and port number, passing the open socket
to the binding code to ensure the port stays reserved until the actual binding
occurs.

---

Outside diff comments:
In `@sdks/python/tests/test_volume_port_persistence.py`:
- Around line 95-136: The temporary directory cleanup with shutil.rmtree is only
executed in the finally block of the second box, which means if any exception
occurs earlier in the test (during first box operations, second box creation, or
other assertions), the temp directory leaks. Add an outer try/finally block that
wraps all the test logic after host_dir is created to guarantee that
shutil.rmtree(host_dir, ignore_errors=True) executes on all failure paths in the
test_rw_volume_persists_and_port_is_reachable method.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 52f2032e-e67c-440f-bde9-a5b1daf1403e

📥 Commits

Reviewing files that changed from the base of the PR and between 757692a and da00474.

📒 Files selected for processing (39)
  • apps/infra-local/CONNECTIONS.md
  • apps/infra-local/Makefile
  • apps/infra-local/README.md
  • apps/infra-local/api.env
  • apps/infra-local/boxlite_local/__init__.py
  • apps/infra-local/boxlite_local/__main__.py
  • apps/infra-local/boxlite_local/cli.py
  • apps/infra-local/boxlite_local/doctor.py
  • apps/infra-local/boxlite_local/execwrap.py
  • apps/infra-local/boxlite_local/types.py
  • apps/infra-local/compose/__init__.py
  • apps/infra-local/compose/__main__.py
  • apps/infra-local/compose/_sdk.py
  • apps/infra-local/compose/config.py
  • apps/infra-local/compose/doctor.py
  • apps/infra-local/compose/native.py
  • apps/infra-local/compose/orchestrator.py
  • apps/infra-local/compose/services.py
  • apps/infra-local/configs/minio/init.sh
  • apps/infra-local/pyproject.toml
  • apps/infra-local/scripts/_stack-common.sh
  • apps/infra-local/scripts/seed-init-data.sh
  • apps/infra-local/scripts/stack-build.sh
  • apps/infra-local/scripts/stack-down.sh
  • apps/infra-local/scripts/stack-logs.sh
  • apps/infra-local/scripts/stack-reset.sh
  • apps/infra-local/scripts/stack-restart.sh
  • apps/infra-local/scripts/stack-status.sh
  • apps/infra-local/scripts/stack-up.sh
  • apps/infra-local/tests/__init__.py
  • apps/infra-local/tests/integration/__init__.py
  • apps/infra-local/tests/integration/test_e2e_full.py
  • apps/infra-local/tests/integration/test_multi_service.py
  • apps/infra-local/tests/unit/__init__.py
  • apps/infra-local/tests/unit/test_config.py
  • apps/infra-local/tests/unit/test_doctor_lsof.py
  • apps/infra-local/tests/unit/test_orchestrator.py
  • apps/infra-local/tests/unit/test_topo.py
  • sdks/python/tests/test_volume_port_persistence.py
💤 Files with no reviewable changes (23)
  • apps/infra-local/CONNECTIONS.md
  • apps/infra-local/scripts/stack-restart.sh
  • apps/infra-local/boxlite_local/__init__.py
  • apps/infra-local/scripts/stack-build.sh
  • apps/infra-local/configs/minio/init.sh
  • apps/infra-local/scripts/_stack-common.sh
  • apps/infra-local/scripts/stack-up.sh
  • apps/infra-local/boxlite_local/execwrap.py
  • apps/infra-local/scripts/seed-init-data.sh
  • apps/infra-local/boxlite_local/types.py
  • apps/infra-local/scripts/stack-status.sh
  • apps/infra-local/tests/unit/test_topo.py
  • apps/infra-local/boxlite_local/__main__.py
  • apps/infra-local/scripts/stack-down.sh
  • apps/infra-local/tests/integration/test_multi_service.py
  • apps/infra-local/scripts/stack-reset.sh
  • apps/infra-local/scripts/stack-logs.sh
  • apps/infra-local/boxlite_local/doctor.py
  • apps/infra-local/tests/unit/test_doctor_lsof.py
  • apps/infra-local/tests/unit/test_config.py
  • apps/infra-local/tests/unit/test_orchestrator.py
  • apps/infra-local/tests/integration/test_e2e_full.py
  • apps/infra-local/boxlite_local/cli.py

Comment thread apps/infra-local/compose/native.py
Comment thread apps/infra-local/compose/native.py
Comment thread apps/infra-local/compose/native.py Outdated
Comment thread apps/infra-local/compose/native.py
Comment thread apps/infra-local/README.md Outdated
Comment thread sdks/python/tests/test_volume_port_persistence.py
CodeRabbit + CodeQL findings on the compose port — each verified against the code
before applying:

native.py:
- start_component now stops the failed component on health-timeout: it was
  leaving a stale daemon + pidfile that a retry would treat as "already running",
  skipping a broken component. [CodeRabbit Major]
- up and restart propagate failures into the exit code — they returned 0 even
  when a component never became healthy or a hard seed failure occurred.
  [CodeRabbit Major]
- reset --hard now checks the `DROP SCHEMA` psql + `_migrate` exit codes and
  aborts on failure (restores the deleted bash's `set -e` fail-fast; it was
  printing "complete" over a broken partial schema). `_migrate` returns its code.
  [CodeRabbit Major]
- logs() handles `os.execvp` failure (returns 1) and no longer falls through
  implicitly, satisfying its `-> int` contract. [CodeQL]
- explanatory comments on the intentional best-effort `except` blocks
  (_kill_port_listeners / _terminate_group / _seed_api_env / _ensure_installed).
  [CodeQL]
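
The first two native.py fixes (clean up after a health timeout, and fold component failures into the exit code) can be sketched together. This is a simplified illustration with hypothetical signatures, not the merged code: it goes straight to SIGKILL for brevity, whereas the port described in this PR uses SIGTERM then SIGKILL.

```python
import os
import signal
import subprocess
import time
from pathlib import Path
from typing import Callable


def start_component(argv: list[str], pidfile: Path,
                    is_healthy: Callable[[], bool], timeout: float = 30.0) -> bool:
    """Spawn a daemon; on health timeout, stop it and remove its pidfile."""
    proc = subprocess.Popen(argv, start_new_session=True)
    pidfile.write_text(str(proc.pid))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and proc.poll() is None:
        if is_healthy():
            return True
        time.sleep(0.1)
    # Unhealthy or dead: leave no stale daemon or pidfile behind, so a retry
    # does not mistake the broken component for "already running".
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except (ProcessLookupError, PermissionError):
        pass  # process group already gone
    proc.wait()
    pidfile.unlink(missing_ok=True)
    return False


def up(starters: dict[str, Callable[[], bool]]) -> int:
    """Start every component and return a non-zero exit code if any failed."""
    failed = [name for name, start in starters.items() if not start()]
    return 1 if failed else 0
```

Returning a boolean from the starter and converting it to an exit code in one place keeps the verbs honest: a caller such as `make up` then fails when any component never became healthy.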

README: language tag on the layout fenced block (markdownlint MD040). [CodeRabbit]

sdks/python volume-port test: pick a fresh host port and retry the box start, to
close the `_free_host_port()` TOCTOU window (port released before the box binds
it). Re-verified live: `pytest -m integration` -> 1 passed.
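
A retry of this kind might be structured as below. This is a sketch under stated assumptions rather than the test's actual code: `start_with_port_retry` is a hypothetical name, and treating a lost port race as an `OSError` is a guess, since the exception the SDK raises on a bind conflict is not shown in this PR.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")


def free_host_port() -> int:
    """Ask the kernel for an unused port. It is released on return (the TOCTOU gap)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def start_with_port_retry(start: Callable[[int], T], attempts: int = 3) -> tuple[int, T]:
    """Call `start(port)` with a fresh port each attempt until one succeeds."""
    last_error: Exception | None = None
    for _ in range(attempts):
        port = free_host_port()
        try:
            return port, start(port)
        except OSError as exc:  # assumed failure type for a port that was taken
            last_error = exc
    raise RuntimeError(f"no usable host port after {attempts} attempts") from last_error
```

Retrying narrows the race instead of eliminating it; the alternative CodeRabbit suggested, holding the socket open until the bind, is not possible when the binding happens inside another process.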
Comment thread sdks/python/tests/test_volume_port_persistence.py

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/infra-local/compose/native.py (1)

688-690: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Unchecked restart() return value may lead to unnecessary 60s wait.

When seed() bounces the API via restart(cfg, ["api"]), the return value is ignored. If the API fails to restart (e.g., port conflict, binary crash), seed() will wait the full 60s before timing out at line 698 rather than failing fast.

Consider checking the return value and failing early:

Proposed fix
         if _component_pid(p, "api") is not None:
             log("restarting api so it re-runs its initialize* cycle...")
-            restart(cfg, ["api"])
+            if restart(cfg, ["api"]) != 0:
+                err("api failed to restart — cannot complete seed")
+                return 1
         else:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/compose/native.py` around lines 688 - 690, The `restart(cfg,
["api"])` call on line 690 ignores its return value, which causes unnecessary
delays when the API fails to restart. Check the return value of the `restart()`
function call and add error handling to fail early if the restart operation
fails. This will prevent the code from waiting the full 60 seconds before timing
out at line 698 when the API encounters issues like port conflicts or crashes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6cffec55-a4d9-4ef3-979c-79033b6f8df2

📥 Commits

Reviewing files that changed from the base of the PR and between da00474 and 1c4458a.

📒 Files selected for processing (3)
  • apps/infra-local/README.md
  • apps/infra-local/compose/native.py
  • sdks/python/tests/test_volume_port_persistence.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/infra-local/README.md
  • sdks/python/tests/test_volume_port_persistence.py

DorianZheng and others added 2 commits June 16, 2026 10:47
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: dorianzheng <8065637+DorianZheng@users.noreply.github.com>
Signed-off-by: dorianzheng <8065637+DorianZheng@users.noreply.github.com>
@DorianZheng DorianZheng merged commit cf69a05 into main Jun 16, 2026
14 of 16 checks passed
@DorianZheng DorianZheng deleted the chore/infra-local-purge-dead-config branch June 16, 2026 02:48