feat(agent2agent): add /a2a slash command + self-respawning watch chain by elijahr · Pull Request #287 · axiomantic/spellbook

elijahr · 2026-05-08T09:35:00Z

What does this PR do?

Replaces the prior LLM-side polling loop in agent2agent (which burned ~72k tokens in 7 minutes of idle waiting) with a self-respawning background Task agent that delivers messages to idle sessions with zero new tokens during the idle window. Adds a /a2a slash command with 8 user-facing subcommands and a 6-phase orchestration of the watch chain.

Related issue

Checklist

Tests pass locally
Documentation updated (if applicable)

What this ships

Self-respawning bg Task agent loop (watch + chain pattern). The bg Task blocks on a single Bash invocation; fswatch wakes it on inbox changes; the parent re-arms silently after each cycle.
Renamed helper subcommands: listen -> open, unlisten -> close (breaking for any pre-T1 caller; failure mode is a clear argparse error).
New helper subcommands: drain (atomic batch claim from pending/ -> inbox), watch (lockfile + RECOVER + WATCH_RECYCLE + fswatch/poll), _open_state (slash-internal write/clear/read/alive).
Hook backstop (_bg_agent_alive) for orphaned-chain detection in SessionStart + UserPromptSubmit hooks. FAIL-SAFE-DEAD with a 90s mtime window.
/a2a slash command with 8 user-facing subcommands (open, close, send, check, read, peek, names, bound-name) + 6-phase orchestration of the watch chain.
SKILL.md architecture documentation, including the silent-idle cost model.

Why

The prior agent2agent iteration consumed ~72k tokens in 7 minutes of LLM-side polling. The new watch chain delivers messages to idle sessions with zero new tokens during the idle window (validated by T7 manual e2e A9 ship gate over 540s). The bg Task agent blocks on a single Bash invocation; fswatch wakes it on inbox changes; parent re-arms silently each cycle.

Validation

113 automated tests pass in the targeted suite — helper unit tests, hook tests, slash command structural tests, and concurrency invariant tests with disjoint+union assertions on atomic batch claim.
T7 manual e2e (separate session exercising the live chain): A9 ship gate PASSED — 5-minute idle window in tester session showed 0 new tokens, 0 tool uses, 0 transcript lines. All 6 scenarios passed: normal delivery, concurrent senders, idle silence, orphan detection, idempotent close, lockfile mutex.

Known limitations

A9 idle-token gate is manual-only. Automated coverage of a 540s idle path is structurally infeasible inside a 30s pytest timeout. Future: CI smoke that runs watch --max-elapsed=2 and asserts a single WATCH_RECYCLE elapsed=2s line.
Phase D verbatim prompt LLM-honor is empirical. Tests assert byte-for-byte match between commands/a2a.md Phase D and a fixture, but cannot verify that the bg Task agent's LLM actually executes the prompt as written. T7 confirmed it does on the tester's harness.
Re-arm bookkeeping is a prose contract in commands/a2a.md Phase F. The orchestrator must call _open_state write with the new bg agent's id every cycle. The hook orphan-detection backstop catches the failure mode (false-positive orphan after 90s) but does not prevent it.
Lockfile mutex on networked / cloud-synced filesystems (NFS, SMB, iCloud) may not honor flock semantics. No detection or warning today; default ~/.local/share/agent2agent/ is local. Future: detect and warn.
AGENT2AGENT_DIR env propagation to bg Task when the harness runs the bg Task in a clean shell is unvalidated for non-default bus paths.
Rename is breaking for any pre-T1 caller of listen / unlisten (now open / close). Failure mode is graceful (clear argparse error). Acceptable for an internal tool.

Implementation notes

fswatch -0 -l 0.1 inbox/ (long-running stream, NUL-delimited, 100ms event-coalescing latency).
Polling backstop default 500ms; fallback when fswatch is missing.
WATCH_RECYCLE default 540s (forced by Bash 600000ms ceiling — recycle every 9 min to avoid tool-timeout).
Lockfile inbox/.watcher.lock path persists across cycles; mutex enforced by fcntl.flock(LOCK_EX|LOCK_NB) + kernel fd cleanup.
_open_state record at <bus>/.open/<session-id> (no .json extension).
12 commits, zero new third-party imports, stdlib only.

Hard rename of helper subcommands cmd_listen->cmd_open and cmd_unlisten->cmd_close in preparation for the watch-chain feature. No behavior change. Test names, SKILL.md prose, and the hook test helper updated to match. The user-vocabulary trigger phrase "listen for messages" is preserved in slash command frontmatter for muscle memory.

Three comments missed in 6366183 because the original rename grep gate only checked skills/ and tests/, not hooks/. The hook's section header and inline comment referenced `listen <name>` for the bind-via-helper contract, and one test docstring still said "never listened". No behavior change.

cmd_open now creates pending/ alongside inbox/processed/sent/ so the forthcoming watch-chain can stage messages there briefly between hook-receive and parent-drain without duplicating delivery. The new cmd_drain subcommand atomically moves a batch (or all batches, or the oldest batch) from pending/<batch-id>/ to processed/ via os.replace and emits the bodies as JSON. Malformed JSON files are still moved to processed/ but reported with an error/raw_path entry so they cannot poison subsequent drains. Drain is idempotent: missing pending dir, missing batch, and empty batch all return count=0 exit 0.

Add the cmd_watch subcommand scoped to lockfile + RECOVER + WATCH_RECYCLE behavior only. Acquires inbox/.watcher.lock via fcntl.flock so only one watcher per name runs at a time; emits PENDING_BATCH for any pre-existing non-empty pending/<batch>/ on entry; otherwise sleeps in poll-interval increments until --max-elapsed and exits with WATCH_RECYCLE. T3b will add fswatch wakeup, 500ms polling backstop, and atomic batch claim.

- C1: drop unlink in _cleanup (flock+unlink race; kernel fd-cleanup is the actual mutex source). Tests updated to assert acquirability, not path absence. - I1: defensive try/except FileNotFoundError around RECOVER inner iterdir for concurrent drain. - I2: new test_watch_kill_releases_lockfile_via_kernel exercises kernel auto-release via SIGKILL. - I3: replace lockfile.exists() readiness with pid-content readiness; closes a small TOCTOU in the concurrent-attempt test.

T3a left WAIT as a budget-only sleep loop; this completes T3b by spawning fswatch (when available) for low-latency wake, running a poll_interval backstop in the same loop so a missing or wedged fswatch cannot stall delivery, and atomically claiming up to --max-batch inbox files into pending/<batch-id>/ before emitting PENDING_BATCH and exiting 0. Spurious fswatch wakes (filtered dotfiles) and concurrent-reader-wins-all races (every candidate vanishes mid-claim) re-enter WAIT without emitting a zero-count batch. fswatch subprocess is reaped via terminate->wait->kill in a finally clause on every exit path. Adds 5 subprocess-driven tests (2 fswatch-gated via shutil.which); 48/48 helper tests pass.

Adds the slash-command-internal `_open_state {write|clear|read|alive}` subcommand that persists per-session open-state at `<bus>/.open/<sid>` (name, agent_id, started_at, output_file). `alive` mirrors the hook's `_bg_agent_alive` mtime probe (0 = fresh, 1 = stale/missing transcript, 2 = FAIL-SAFE-DEAD when state file is missing or malformed) so the slash command and the orphan-detection hook agree on liveness.

Add _bg_agent_alive and _agent2agent_check_orphaned_chain to spellbook_hook.py. Liveness mirrors cmd__open_state alive byte-for-byte (FAIL-SAFE-DEAD: missing/stale transcript or absent state -> dead). The SessionStart handler now runs the orphan check BEFORE the source != "compact" early return so cold starts and resumes also surface the re-arm hint; the compact-source path appends the hint after the existing recovery directive. UserPromptSubmit appends the hint after the notify line. Hook reads only .open/<sid> JSON metadata and stats the transcript path; it never touches inbox bodies.

Adds commands/a2a.md, the operator-facing slash interface to the agent2agent inter-session message bus. The command mirrors helper subcommands (check, send, peek, read, names, bound-name) and orchestrates the watch-chain (open, close): open binds an inbox name and dispatches a backgrounded Task agent running the helper's watch subcommand with the load-bearing Phase D prompt template verbatim, persists chain state via _open_state write, and silently re-dispatches on every completion (PENDING_BATCH or WATCH_RECYCLE). close is idempotent. Adds smoke tests under tests/test_skills/ asserting frontmatter parses, all six phases (A-F) are present, the verbatim Phase D prompt matches byte-for-byte, and every public subcommand has its own section.

- watch: print holder's pid (read from lockfile) instead of os.getpid() of the losing watcher. The holder writes its pid AFTER flock; reading it gives an operator a real kill target. Falls back to "unknown" on read failure or empty lockfile (race window between holder's truncate and write). - close: differentiate stdout — emit "closed '<name>'" only when an inbox or session binding was actually released; emit "not bound to '<name>'" on no-op. Both branches exit 0 (idempotent contract intact). - commands/a2a.md: document both close-stdout lines. - tests: strengthen test_watch_concurrent_attempt_blocked_by_lockfile to assert reported pid == holder pid (regression guard). Add test_close_idempotent_reports_not_bound and test_close_unknown_name_reports_not_bound. Both bugs surfaced during T7 manual e2e testing.

…iliation SKILL.md gains an "Architecture: watch chain vs hook-receive" section, a "Watch-Chain (Idle Delivery)" section explaining the slash command, the fswatch dependency, the compaction limitation and the open-state record, plus a Silent-Idle Cost Model subsection with per-cycle/hour/day token estimates and a `/a2a close` recommendation for overnight idle. Quick Reference adds protocol-internal `watch`, `drain`, and `_open_state` rows. Phase D portability: replace the hardcoded developer-machine path in commands/a2a.md with `$SPELLBOOK_DIR/...` so the bg Task agent's Bash command resolves correctly for any operator (the harness substitutes per ~/.claude/CLAUDE.md). Test constant + impl plan §Task 6 prompt template updated to match. Lockfile-deletion errata reconciled in design §3.1 and impl plan §Task 3a tests: the lockfile path persists post-exit; mutual exclusion is via `fcntl.flock` + kernel fd cleanup, not file deletion. The `_bg_agent_alive` docstring (and helper / plan / test mirror) drop the misleading "byte-for-byte parity" claim; the hook returns `bool` while the helper CLI op returns exit codes 0/1/2. Both remain FAIL-SAFE-DEAD. Plan §Task 4 pseudocode notes that the `open-state written/cleared` print lines are suppressed in the final implementation (silence preferred for slash dispatch). Frontmatter convention verified: `description`-only matches mode.md / handoff.md. Tests: +3 SKILL.md structural smoke tests (test_skill_md_documents_watch_chain, test_skill_md_silent_idle_cost_model_cites_token_numbers, test_skill_md_protocol_internal_subcommands_marked) and +1 docstring pin (TestBgAgentAlive::test_docstring_does_not_overclaim_byte_for_byte_parity). test_a2a_phase_d_prompt_is_verbatim updated to match the portable template. 112 tests pass (108 baseline + 4 new).

…overclaim - SKILL.md: fswatch -1 -> fswatch -0 -l 0.1 (actual long-running invocation, not one-shot) - SKILL.md: _open_state shared liveness contract framing - helper for slash, hook inline for perf/reliability (no shell-out) - New pin test guards against future fswatch flag drift between helper source and SKILL.md prose

axiomantic-momus · 2026-05-08T09:35:44Z

❌ Momus review failed — phase phase2 failed: provider error: 402 This request requires more credits, or fewer max_tokens. You requested up to 8192 tokens, but can only afford 7532. To increase, visit https://openrouter.ai/settings/keys and create a key with a h...

run log

gemini-code-assist

Code Review

This pull request implements a 'watch chain' architecture for the agent2agent skill, facilitating idle message delivery via a background task. Significant changes include the /a2a slash command specification, new watch and drain subcommands in the helper script, and hook-level logic to detect orphaned watch chains. The review feedback correctly identifies an opportunity to refactor the _handle_session_start hook to consolidate duplicated fallback logic. However, the PR fails to include the mandatory version bump and changelog entry required by the repository style guide.

gemini-code-assist · 2026-05-08T09:39:24Z

    project_path = data.get("cwd", "")
    if not project_path:
-        return _fallback_directive()
+        result = _fallback_directive()
+        if orphan_hint:
+            result["hookSpecificOutput"]["additionalContext"] += (
+                "\n\n" + orphan_hint
+            )
+        return result

    # Load workflow state
    ws = _mcp_call("workflow_state_load", {
        "project_path": project_path,
        "max_age_hours": 24,
    })
    if not ws or not ws.get("found"):
-        return _fallback_directive()
+        result = _fallback_directive()
+        if orphan_hint:
+            result["hookSpecificOutput"]["additionalContext"] += (
+                "\n\n" + orphan_hint
+            )
+        return result

    state = ws.get("state", {})


To improve maintainability and reduce code duplication, you can refactor this block to consolidate the fallback logic. The current implementation repeats the code for appending orphan_hint to the fallback directive.

project_path = data.get("cwd", "") ws = None if project_path: # Load workflow state ws = _mcp_call("workflow_state_load", { "project_path": project_path, "max_age_hours": 24, }) if not ws or not ws.get("found"): result = _fallback_directive() if orphan_hint: result["hookSpecificOutput"]["additionalContext"] += ( "\n\n" + orphan_hint ) return result state = ws.get("state", {})

Two PR-#287 follow-ups bundled: 1. hooks/spellbook_hook.py _handle_session_start: collapse the duplicate "_fallback_directive() + orphan_hint" branches into a single guard. Per gemini-code-assist review feedback. Pure refactor; no behavior change. 2. tests/unit/test_stint_hooks.py: gate-error JSON now lives on stderr per Claude Code hook protocol (commit 324cab5 on main routed gate block messages to sys.stderr but only updated tests/test_security/* — this parallel test in tests/unit/ was missed). Read proc.stderr to match.

elijahr · 2026-05-08T14:49:27Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the /a2a slash command and a background "watch chain" architecture for the agent-to-agent message bus, facilitating near real-time message delivery while sessions are idle. The implementation includes new helper subcommands for watching and draining inboxes, as well as hook-level backstops to detect and recover from dropped watch chains. Feedback primarily addresses high-severity style guide violations: the PR lacks the mandatory minor version bump and CHANGELOG entry, and several tests must be refactored to use the tripwire mocking framework instead of monkeypatch.setattr for function replacement.

gemini-code-assist · 2026-05-08T14:52:52Z

+---
+description: "agent2agent inter-session message bus. Triggers: '/a2a', 'open inbox', 'close inbox', 'listen for messages', 'listen as', 'send a message to session', 'check inbox', 'reply to that session', 'inter-agent chat', 'inter-agent messaging', 'agent2agent', 'a2a', 'agent bus', 'message another session', 'tell session Y to', 'ask session Y'. Subcommands: open, close, send, check, read, peek, names, bound-name."


This PR introduces a significant new feature (the /a2a slash command and background watch chain) but is missing the required version bump in the .version file and a corresponding entry in CHANGELOG.md. Per the repository style guide (lines 7-11, 18), new features require a minor version bump (0.X.0). Please add these to comply with the project's release process.

References

Every PR must include a version bump in the .version file and a corresponding CHANGELOG.md entry. Missing these is a high severity issue. ^(link)

gemini-code-assist · 2026-05-08T14:52:52Z

+        # Force MCP failure so we hit the _fallback_directive branch.
+        # _mcp_call returns None when MCP is unreachable.
+        monkeypatch.setattr(spellbook_hook, "_mcp_call", lambda *a, **k: None)
+
+        missing_transcript = tmp_path / "dead.output"
+        _seed_open_state(
+            tmp_path, "sess-orphan-compact", name="alice",
+            agent_id="agent-x", output_file=missing_transcript,
+        )
+        result = spellbook_hook._handle_session_start({
+            "session_id": "sess-orphan-compact",
+            "source": "compact",
+            "cwd": str(tmp_path),
+        })


This test uses monkeypatch.setattr to mock _mcp_call, which is forbidden by the repository style guide (lines 67-71, 109). Tripwire is the only acceptable framework for mocking functions in this repository. Please rewrite this using tripwire's three-step flow: register the mock, execute within the with tripwire: sandbox, and assert the call afterwards.

m = tripwire.mock("spellbook_hook:_mcp_call") m.returns(None) missing_transcript = tmp_path / "dead.output" _seed_open_state( tmp_path, "sess-orphan-compact", name="alice", agent_id="agent-x", output_file=missing_transcript, ) with tripwire: result = spellbook_hook._handle_session_start({ "session_id": "sess-orphan-compact", "source": "compact", "cwd": str(tmp_path), }) m.assert_call()

References

tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. ^(link)

gemini-code-assist · 2026-05-08T14:52:53Z

+        monkeypatch.setattr(
+            spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None
+        )
+        monkeypatch.setattr(
+            spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None
+        )
+        monkeypatch.setattr(
+            spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None
+        )
+
+        outputs = spellbook_hook._handle_user_prompt_submit({
+            "session_id": "sess-ups-orphan",
+            "prompt": "hello",
+            "cwd": str(tmp_path),
+        })


Multiple uses of monkeypatch.setattr for mocking are present here, violating the repository style guide (lines 67-71, 109). These should be converted to use tripwire.mock with the standard three-step flow.

m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt") m1.returns(None) m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt") m2.returns(None) m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt") m3.returns(None) with tripwire: outputs = spellbook_hook._handle_user_prompt_submit({ "session_id": "sess-ups-orphan", "prompt": "hello", "cwd": str(tmp_path), }) m1.assert_call() m2.assert_call() m3.assert_call()

References

tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. ^(link)

gemini-code-assist · 2026-05-08T14:52:53Z

+        monkeypatch.setattr(
+            spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None
+        )
+        monkeypatch.setattr(
+            spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None
+        )
+        monkeypatch.setattr(
+            spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None
+        )
+
+        outputs = spellbook_hook._handle_user_prompt_submit({
+            "session_id": "sess-ups-clean",
+            "prompt": "hello",
+            "cwd": str(tmp_path),
+        })


Another set of monkeypatch.setattr violations. Please convert to tripwire.mock.

Suggested change

monkeypatch.setattr(

spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None

)

monkeypatch.setattr(

spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None

)

monkeypatch.setattr(

spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None

)

outputs = spellbook_hook._handle_user_prompt_submit({

"session_id": "sess-ups-clean",

"prompt": "hello",

"cwd": str(tmp_path),

})

m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt")

m1.returns(None)

m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt")

m2.returns(None)

m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt")

m3.returns(None)

with tripwire:

outputs = spellbook_hook._handle_user_prompt_submit({

"session_id": "sess-ups-clean",

"prompt": "hello",

"cwd": str(tmp_path),

})

m1.assert_call()

m2.assert_call()

m3.assert_call()

References

tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. ^(link)

gemini-code-assist · 2026-05-08T14:52:53Z

+from datetime import datetime, timedelta, timezone
 from pathlib import Path

 import pytest


The repository style guide (lines 27-31) requires using tripwire as the only acceptable mocking framework. Please add the import here so it can be used in the tests below.

Suggested change

import pytest

import pytest

import tripwire

References

tripwire is the ONLY acceptable framework for mocking. ^(link)

gemini-code-assist · 2026-05-08T14:52:53Z

+    monkeypatch.setattr(a2a.os, "replace", flaky_replace)
+
+    with pytest.raises(OSError):
+        _run(a2a, "drain", "alice", "batch-x")


This test uses monkeypatch.setattr to replace a function (os.replace), which is forbidden by the repository style guide (lines 67-71, 109). Please use tripwire.mock with the three-step flow.

Suggested change

monkeypatch.setattr(a2a.os, "replace", flaky_replace)

with pytest.raises(OSError):

_run(a2a, "drain", "alice", "batch-x")

m = tripwire.mock("os:replace")

m.calls(flaky_replace)

with tripwire:

with pytest.raises(OSError):

_run(a2a, "drain", "alice", "batch-x")

m.assert_call()

References

tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. ^(link)

gemini-code-assist · 2026-05-08T14:52:53Z

+    assert pending_files.isdisjoint(processed_files)
+
+    # Restore real os.replace and re-run; remaining messages must drain cleanly.
+    monkeypatch.setattr(a2a.os, "replace", real_replace)


Manual restoration of the original function is not required when using tripwire, as the mock is automatically scoped to the with tripwire: block.

Suggested change

monkeypatch.setattr(a2a.os, "replace", real_replace)

# tripwire handles restoration automatically

References

tripwire is the ONLY acceptable framework for mocking. ^(link)

…+ version) Bundled fixes from gemini-code-assist's second review pass: - tests/test_skills/test_agent2agent_helper.py: skip the entire module on Windows (`pytest.skip(..., allow_module_level=True)`) so collection doesn't fail with `ModuleNotFoundError: fcntl`. The agent2agent helper uses `fcntl.flock` for the watch lockfile and is fundamentally POSIX-only. - tests/test_hooks/test_agent2agent_hook.py + tests/test_skills/ test_agent2agent_helper.py: convert all `monkeypatch.setattr` function replacements to `tripwire.mock(...)` per the repository style guide (the styleguide forbids monkeypatch.setattr for function/method replacement). Uses `dirty_equals.IsInstance` matchers for type-tolerant assertions. - .version: 0.63.2 -> 0.64.0 (minor bump per styleguide rule that new features / new commands warrant a minor bump). - CHANGELOG.md: add a 0.64.0 entry describing the watch-chain feature (helper subcommands, /a2a slash command, hook backstop, lockfile mutex, A9 zero-idle-tokens validation), the orphan-hint refactor, the test_stint_hooks stderr fix, and the tripwire conversion.

elijahr · 2026-05-08T15:21:32Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an 'Agent2Agent watch-chain' architecture to enable near-real-time message delivery for idle sessions, replacing the previous polling-based approach. The changes include a new /a2a slash command, internal helper subcommands for managing the watch chain, and hook-based orphan detection to handle process death or compaction. The PR also updates documentation, refactors test mocks to use the required tripwire framework, and bumps the version to 0.64.0. I have reviewed the changes and agree with the reviewer's feedback regarding the need to guard fcntl and signal imports to ensure Windows compatibility.

gemini-code-assist · 2026-05-08T15:27:19Z

+import fcntl
 import json
 import os
 import re
+import select
 import shutil
+import signal


The fcntl module and signal.SIGTERM are not available on Windows. Importing fcntl at the top level will cause the script to crash on Windows, even if the functionality is not used. Please guard these imports and usages to ensure Windows compatibility.

The integration TestAgent2AgentHook class spawns the agent2agent helper as a subprocess, and `test_hook_helper_constants_in_sync` imports the helper module. Both fail on Windows because the helper imports `fcntl` at module level (POSIX-only). The fast unit tests (TestAgent2AgentHookFast) remain unskipped — they stub the subprocess call so they don't need the helper to actually run.

Per gemini-code-assist's CRITICAL finding: importing fcntl at the top level crashes on Windows even when the watch subcommand isn't invoked. Wrap the import in a try/except with a fallback to None so the helper loads cleanly on Windows; cmd_watch performs an explicit runtime check and returns a clear error message ("watch: not supported on this platform") rather than NameError'ing on the first fcntl.flock call. Other subcommands (open, close, send, read, peek, drain, etc.) are filesystem-only and now work on Windows. The watch subcommand and the test_skills/test_agent2agent_helper.py module remain POSIX-only. signal.SIGTERM is available on Windows (verified in CPython 3.9+) so no guard is needed there.

elijahr · 2026-05-08T15:34:05Z

/gemini review

gemini-code-assist

Code Review

This pull request implements the Agent2Agent watch-chain architecture, enabling near-real-time message delivery for idle sessions with zero token cost. It introduces the /a2a slash command for orchestration, adds background file-watching capabilities to the agent2agent.py helper, and implements hook-based liveness checks to detect orphaned chains. Feedback focuses on correcting a non-executable Bash command in the slash command specification and increasing the liveness threshold from 90 seconds to 600 seconds to prevent false-positive "dropped chain" detections during the 540-second idle watch window.

gemini-code-assist · 2026-05-08T15:39:22Z

+   (skip any that produce empty stdout):
+   - `basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)"` — project basename
+   - `git branch --show-current 2>/dev/null` — current branch (skip if detached)
+   - `stint_check().top.name` — top stint name from MCP (skip if empty)


The entry stint_check().top.name is not a valid Bash command, which breaks the consistency of this list of one-liners. This should be described as a tool call (e.g., "Call the stint_check tool and extract the top stint name") or replaced with a proper shell command if one is available via the spellbook CLI.

gemini-code-assist · 2026-05-08T15:39:22Z

+        age = time.time() - op.stat().st_mtime
+    except OSError:
+        return False
+    return age < 90.0


The 90-second liveness threshold is likely too short and will cause false-positive orphan detections during idle periods. The background watch agent can be silent for up to 540 seconds (the WATCH_RECYCLE budget) while waiting for messages. Since the transcript file is only updated when the process produces output, its mtime will not change during this window. To avoid false positives, this threshold should be increased to exceed the maximum watch duration (e.g., 600 seconds).

Suggested change

return age < 90.0

return age < 600.0

gemini-code-assist · 2026-05-08T15:39:22Z

+            age = time.time() - op_path.stat().st_mtime
+        except OSError:
+            return 1
+        return 0 if age < 90.0 else 1


The 90-second liveness threshold here should be synchronized with the one in the hook and increased to avoid false positives during the 540-second idle watch window.

Suggested change

return 0 if age < 90.0 else 1

return 0 if age < 600.0 else 1

… in /a2a Two gemini-flagged issues from PR-#287 review #4: 1. CORRECTNESS BUG (medium-priority): the 90s mtime liveness threshold in `_bg_agent_alive` and the parallel `cmd__open_state alive` exit-code probe is shorter than the watch agent's 540s WATCH_RECYCLE budget. While blocked on `watch`, the bg Task agent emits no stdout, so its transcript mtime does not advance during a normal idle window — the A9 zero-idle-tokens validation specifically confirms this. The 90s threshold would therefore false-positive DEAD anywhere from 90s into a normal idle window onward. Bump to 600s (60s grace margin over 540s for the inter-cycle re-dispatch). Tests adjusted accordingly: - test_returns_false_when_transcript_stale: stale_age 120s → 700s - test_open_state_alive_stale_transcript_returns_1: 120s → 700s - docstrings + SKILL.md prose updated 90s → 600s 2. DOC FIX (medium-priority): `commands/a2a.md` Phase B-1 listed `stint_check().top.name` alongside actual Bash one-liners. That pseudocode isn't executable; reword the bullet so it's clearly an MCP tool call, not a shell command, and present the bash + non-bash bullets uniformly.

elijahr · 2026-05-08T15:46:48Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a near-real-time "watch-chain" architecture for the agent2agent skill, replacing the previous polling-based approach with an OS-level file-watching system to enable idle delivery with zero token cost. The changes include a new /a2a slash command, specialized watch and drain subcommands, and hook-based liveness probes to detect orphaned background agents. Feedback identifies several race conditions in the file system logic within agent2agent.py, recommending additional try/except blocks to handle FileNotFoundError during concurrent directory iterations and ValueError when accessing process pipes during termination.

gemini-code-assist · 2026-05-08T15:52:05Z

+        target_dirs = sorted(
+            (d for d in pending_root.iterdir() if d.is_dir()),
+            key=lambda p: p.name,
+        )
+    else:
+        candidates = sorted(
+            (d for d in pending_root.iterdir() if d.is_dir()),
+            key=lambda p: p.name,
+        )
+        target_dirs = [candidates[0]] if candidates else []


The iterdir() calls in both the all and default branches are not protected against the pending_root directory being deleted concurrently. This can lead to an unhandled FileNotFoundError. Consider wrapping these in try/except blocks, similar to the implementation in cmd_watch.

References

Silent except Exception: pass blocks should log the exception, but specific expected errors like FileNotFoundError in race-prone IO should be handled gracefully. ^(link)

gemini-code-assist · 2026-05-08T15:52:05Z

+    for d in target_dirs:
+        messages = sorted(
+            (m for m in d.iterdir() if not m.name.startswith(".")),
+            key=lambda p: p.name,
+        )


Iterating over a batch directory that might have been removed (e.g., by a concurrent drain --all from another session) will raise FileNotFoundError. This loop should be wrapped in a try/except block to skip missing directories gracefully.

gemini-code-assist · 2026-05-08T15:52:06Z

+                    try:
+                        os.read(fswatch_proc.stdout.fileno(), 4096)
+                    except OSError:
+                        pass


While OSError is caught, fswatch_proc.stdout.fileno() can raise a ValueError if the file object has been closed (e.g., if the process exited and was reaped just before this call). Adding ValueError to the caught exceptions would prevent a potential crash in the watch loop.

References

Silent except blocks should log, but here we are intentionally draining a pipe where errors are expected during process termination. ^(link)

Three medium-priority defensive-coding fixes from gemini's PR-#287 review #5: 1. cmd_drain pending_root.iterdir(): catch FileNotFoundError (the dir can vanish if another session calls close() concurrently). Treat as "no pending batches". 2. cmd_drain per-batch d.iterdir(): catch FileNotFoundError (a parallel drain --all from another session can remove a batch dir mid-iteration). Skip the missing batch and continue. 3. cmd_watch fswatch stdout.fileno() drain: also catch ValueError. A racing reaper can close the pipe between poll() and fileno(); the resulting ValueError used to crash the watch loop. The drain refactor consolidates the three iterdir() callsites (default, --all, --batch-id) into a single try-protected block whose result is filtered to the requested mode.

elijahr · 2026-05-08T15:58:08Z

/gemini review

gemini-code-assist

Code Review

This pull request implements the Agent2Agent watch-chain architecture, providing near-real-time message delivery for idle sessions without polling. It introduces the /a2a slash command, adds watch, drain, and _open_state subcommands to the helper script, and integrates liveness and orphan-chain detection into the spellbook hooks. Documentation and tests have been updated to reflect the transition from listen/unlisten to open/close terminology, and the version is bumped to 0.64.0. I have no feedback to provide.

elijahr added 12 commits May 7, 2026 19:30

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

elijahr added 2 commits May 8, 2026 09:46

Merge remote-tracking branch 'origin/main' into feat/a2a-watch-chain

4950086

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

elijahr added 2 commits May 8, 2026 10:32

gemini-code-assist Bot reviewed May 8, 2026

View reviewed changes

elijahr merged commit d9e2a96 into main May 8, 2026
8 checks passed

elijahr deleted the feat/a2a-watch-chain branch May 26, 2026 07:30

		---
		description: "agent2agent inter-session message bus. Triggers: '/a2a', 'open inbox', 'close inbox', 'listen for messages', 'listen as', 'send a message to session', 'check inbox', 'reply to that session', 'inter-agent chat', 'inter-agent messaging', 'agent2agent', 'a2a', 'agent bus', 'message another session', 'tell session Y to', 'ask session Y'. Subcommands: open, close, send, check, read, peek, names, bound-name."

	monkeypatch.setattr(a2a.os, "replace", real_replace)
	# tripwire handles restoration automatically

Conversation

elijahr commented May 8, 2026

What does this PR do?

Related issue

Checklist

What this ships

Why

Validation

Known limitations

Implementation notes

Uh oh!

axiomantic-momus Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

elijahr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

elijahr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

elijahr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

elijahr commented May 8, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 8, 2026

Choose a reason for hiding this comment

axiomantic-momus Bot commented May 8, 2026 •

edited

Loading