Skip to content

fix(bridge): enforce exclusive socket ownership#187

Merged
steipete merged 2 commits into
mainfrom
codex/fix-bridge-socket-ownership
Jun 12, 2026
Merged

fix(bridge): enforce exclusive socket ownership#187
steipete merged 2 commits into
mainfrom
codex/fix-bridge-socket-ownership

Conversation

@steipete

@steipete steipete commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • give every Bridge listener exclusive lease-backed ownership and atomically publish/remove its UNIX socket
  • bound connect/read/write work with nonblocking deadlines and drain accepted clients before releasing ownership
  • separate Peekaboo.app (bridge.sock) from the reusable daemon (daemon.sock), migrate legacy daemons replacement-first while preserving lifecycle settings, and keep custom sockets isolated
  • preserve the default healthy Peekaboo.app GUI fallback before daemon auto-start while still migrating legacy on-demand daemons
  • stop MCP from hosting a Bridge listener and make daemon stop/migration PID-conditional
  • document the ownership and migration contracts and add regression coverage for stale sockets, contention, cleanup, draining, routing, and rollback

Root cause

The app, reusable daemon, and MCP process could contend for one socket without an ownership lease. A failed connection was enough to treat a path as stale, blocking Bridge I/O could strand accepted clients, and daemon migration could retire the serving process before a replacement was proven healthy. Together those paths allowed ungranted processes to replace the granted app host and left shutdown/startup races wedged.

Impact

Bridge hosts can no longer replace a live owner. The reusable daemon has its own socket, abandoned clients are deadline-bound, legacy migration preserves mode/poll/idle settings and keeps the old daemon serving if replacement fails, healthy Peekaboo.app GUI hosting remains the automatic default fallback, and embedded MCP remains process-local.

Validation

  • pnpm run format
  • pnpm run lint (0 serious findings)
  • pnpm run lint:docs
  • pnpm run test:safe (507 tests, 68 suites)
  • swift test --package-path Apps/CLI --filter CommandRuntimeInjectionTests (28 tests)
  • swift test --package-path Core/PeekabooCore --no-parallel (918 tests, 132 suites; expected automation skips)
  • pnpm run build:cli
  • ./scripts/build-mac-debug.sh
  • live daemon start/status/conditional-stop on an isolated socket
  • live second-host contention rejection while the first host remained usable
  • live healthy Peekaboo.app GUI fallback with the production-team-signed exact new CLI and no daemon socket created
  • live legacy bridge.sock to daemon.sock migration after the fallback refactor, preserving mode, 425 ms poll interval, and 47.5 s idle timeout
  • live replacement-start failure proving the legacy daemon remained serving
  • live custom-socket isolation with legacy and custom daemons concurrently healthy
  • live MCP stdio lifecycle proving no socket listener was published
  • live abandoned-client drain/restart: replacement waited 11 seconds for lease release, then started with a new PID
  • final autoreview: clean, no accepted/actionable findings (0.91 confidence)

Live Evidence

Healthy Peekaboo.app fallback, selected before daemon auto-start:

ASSERT signed_cli_team=Y5PE65HELJ app_pid=5184 explicit_ready=yes daemon_socket_exists=no
[2026-06-12T06:47:36.261Z] DEBUG: Runtime host: remote gui via /tmp/.../Library/Application Support/Peekaboo/bridge.sock (build 3.4.2 (1))
LIVE_APP_FALLBACK_OK

Legacy on-demand daemon still migrates after the fallback refactor:

ASSERT old_pid=5486 new_pid=5513 legacy_running=false mode=auto poll=425 idle=47.5
[2026-06-12T06:47:46.742Z] DEBUG: Runtime host: remote onDemand via /tmp/.../Library/Application Support/Peekaboo/daemon.sock (build 3.1.1 (3.1.1))
LIVE_MIGRATION_AFTER_FALLBACK_OK

A second listener is rejected while the owner remains usable:

Error: Bridge socket is already owned by another host
ASSERT secondary_rc=1 primary_pid=59301 still_running=true

An abandoned client delays replacement until the lease is released:

ASSERT old_pid=62962 new_pid=63179 elapsed=11s running=true socket=/tmp/.../daemon.sock

Embedded MCP publishes no Bridge socket:

ASSERT stdout_bytes=0 stderr_bytes=0 sockets=0

Fixes #184

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codex review: needs maintainer review before merge. Reviewed June 12, 2026, 3:10 AM ET / 07:10 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model internal, reasoning high; reviewed against b873daf790ed.

Label changes

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

What I checked:

  • failure reason: retryable codex transport failure (capacity)
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stderr: U-spin here looks like the same family)\n- Screen Recording permission check fails on macOS 26 when spawned by Node.js #75 — Screen Recording permission check failures when spawned by another process (different mechanism; here the grant is intact and the socket ownership is the problem)\n\nHappy to provide more sanitized logs, lsof snapshots of socket ownership during a wedge, or to test patched builds.".
  • codex stdout: No stdout captured.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026

@steipete steipete left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the valid default-app fallback finding in b1c37377:

  • runtime selection now prefers the reusable daemon, then a healthy .gui Peekaboo.app on the default bridge.sock, before daemon auto-start
  • legacy .onDemand hosts still migrate replacement-first to daemon.sock
  • custom daemon sockets remain isolated and do not inherit the app fallback
  • diagnostics and permission probing use the same candidate classification
  • added focused regression tests and raw live outputs to the PR body

The changelog entries remain intentionally: this is user-visible behavior, and repository maintainer policy requires the maintainer merge entry plus contributor credit.

@clawsweeper re-review

@steipete

Copy link
Copy Markdown
Collaborator Author

Addressed the valid default-app fallback finding in b1c37377: daemon-first selection now preserves a healthy .gui Peekaboo.app fallback, legacy .onDemand hosts still migrate, custom sockets remain isolated, and diagnostics/permission probing share the same policy. Focused tests and raw live outputs are in the PR body. Changelog entries remain intentionally under the repository's maintainer policy for user-visible fixes and contributor credit.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
@steipete

Copy link
Copy Markdown
Collaborator Author

Retrying after the targeted review failed with a retryable Codex capacity/transport error. The exact-head local autoreview completed cleanly with no actionable findings at 0.91 confidence, and the PR body contains the live evidence requested by the prior review.

@clawsweeper re-review

@steipete

Copy link
Copy Markdown
Collaborator Author

Maintainer review note: two targeted ClawSweeper re-review attempts on b1c37377 failed before code review with the same retryable Codex capacity/transport error. They produced no code finding. The exact-head local autoreview completed successfully with no actionable findings (0.91 confidence), the prior valid fallback finding is fixed and live-tested, and the requested raw evidence is in the PR body. I will land only after all required CI jobs are green.

@steipete steipete merged commit d50472e into main Jun 12, 2026
5 checks passed
@steipete steipete deleted the codex/fix-bridge-socket-ownership branch June 12, 2026 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item.

Projects

None yet

1 participant