feat(agent2agent): add /a2a slash command + self-respawning watch chain#287
Conversation
Hard rename of helper subcommands cmd_listen->cmd_open and cmd_unlisten->cmd_close in preparation for the watch-chain feature. No behavior change. Test names, SKILL.md prose, and the hook test helper updated to match. The user-vocabulary trigger phrase "listen for messages" is preserved in slash command frontmatter for muscle memory.
Three comments missed in 6366183 because the original rename grep gate only checked skills/ and tests/, not hooks/. The hook's section header and inline comment referenced `listen <name>` for the bind-via-helper contract, and one test docstring still said "never listened". No behavior change.
cmd_open now creates pending/ alongside inbox/processed/sent/ so the forthcoming watch-chain can stage messages there briefly between hook-receive and parent-drain without duplicating delivery. The new cmd_drain subcommand atomically moves a batch (or all batches, or the oldest batch) from pending/<batch-id>/ to processed/ via os.replace and emits the bodies as JSON. Malformed JSON files are still moved to processed/ but reported with an error/raw_path entry so they cannot poison subsequent drains. Drain is idempotent: missing pending dir, missing batch, and empty batch all return count=0 exit 0.
Add the cmd_watch subcommand scoped to lockfile + RECOVER + WATCH_RECYCLE behavior only. Acquires inbox/.watcher.lock via fcntl.flock so only one watcher per name runs at a time; emits PENDING_BATCH for any pre-existing non-empty pending/<batch>/ on entry; otherwise sleeps in poll-interval increments until --max-elapsed and exits with WATCH_RECYCLE. T3b will add fswatch wakeup, 500ms polling backstop, and atomic batch claim.
- C1: drop unlink in _cleanup (flock+unlink race; kernel fd-cleanup is the actual mutex source). Tests updated to assert acquirability, not path absence. - I1: defensive try/except FileNotFoundError around RECOVER inner iterdir for concurrent drain. - I2: new test_watch_kill_releases_lockfile_via_kernel exercises kernel auto-release via SIGKILL. - I3: replace lockfile.exists() readiness with pid-content readiness; closes a small TOCTOU in the concurrent-attempt test.
T3a left WAIT as a budget-only sleep loop; this completes T3b by spawning fswatch (when available) for low-latency wake, running a poll_interval backstop in the same loop so a missing or wedged fswatch cannot stall delivery, and atomically claiming up to --max-batch inbox files into pending/<batch-id>/ before emitting PENDING_BATCH and exiting 0. Spurious fswatch wakes (filtered dotfiles) and concurrent-reader-wins-all races (every candidate vanishes mid-claim) re-enter WAIT without emitting a zero-count batch. fswatch subprocess is reaped via terminate->wait->kill in a finally clause on every exit path. Adds 5 subprocess-driven tests (2 fswatch-gated via shutil.which); 48/48 helper tests pass.
Adds the slash-command-internal `_open_state {write|clear|read|alive}`
subcommand that persists per-session open-state at `<bus>/.open/<sid>`
(name, agent_id, started_at, output_file). `alive` mirrors the hook's
`_bg_agent_alive` mtime probe (0 = fresh, 1 = stale/missing transcript,
2 = FAIL-SAFE-DEAD when state file is missing or malformed) so the
slash command and the orphan-detection hook agree on liveness.
Add _bg_agent_alive and _agent2agent_check_orphaned_chain to spellbook_hook.py. Liveness mirrors cmd__open_state alive byte-for-byte (FAIL-SAFE-DEAD: missing/stale transcript or absent state -> dead). The SessionStart handler now runs the orphan check BEFORE the source != "compact" early return so cold starts and resumes also surface the re-arm hint; the compact-source path appends the hint after the existing recovery directive. UserPromptSubmit appends the hint after the notify line. Hook reads only .open/<sid> JSON metadata and stats the transcript path; it never touches inbox bodies.
Adds commands/a2a.md, the operator-facing slash interface to the agent2agent inter-session message bus. The command mirrors helper subcommands (check, send, peek, read, names, bound-name) and orchestrates the watch-chain (open, close): open binds an inbox name and dispatches a backgrounded Task agent running the helper's watch subcommand with the load-bearing Phase D prompt template verbatim, persists chain state via _open_state write, and silently re-dispatches on every completion (PENDING_BATCH or WATCH_RECYCLE). close is idempotent. Adds smoke tests under tests/test_skills/ asserting frontmatter parses, all six phases (A-F) are present, the verbatim Phase D prompt matches byte-for-byte, and every public subcommand has its own section.
- watch: print holder's pid (read from lockfile) instead of os.getpid() of the losing watcher. The holder writes its pid AFTER flock; reading it gives an operator a real kill target. Falls back to "unknown" on read failure or empty lockfile (race window between holder's truncate and write). - close: differentiate stdout — emit "closed '<name>'" only when an inbox or session binding was actually released; emit "not bound to '<name>'" on no-op. Both branches exit 0 (idempotent contract intact). - commands/a2a.md: document both close-stdout lines. - tests: strengthen test_watch_concurrent_attempt_blocked_by_lockfile to assert reported pid == holder pid (regression guard). Add test_close_idempotent_reports_not_bound and test_close_unknown_name_reports_not_bound. Both bugs surfaced during T7 manual e2e testing.
…iliation SKILL.md gains an "Architecture: watch chain vs hook-receive" section, a "Watch-Chain (Idle Delivery)" section explaining the slash command, the fswatch dependency, the compaction limitation and the open-state record, plus a Silent-Idle Cost Model subsection with per-cycle/hour/day token estimates and a `/a2a close` recommendation for overnight idle. Quick Reference adds protocol-internal `watch`, `drain`, and `_open_state` rows. Phase D portability: replace the hardcoded developer-machine path in commands/a2a.md with `$SPELLBOOK_DIR/...` so the bg Task agent's Bash command resolves correctly for any operator (the harness substitutes per ~/.claude/CLAUDE.md). Test constant + impl plan §Task 6 prompt template updated to match. Lockfile-deletion errata reconciled in design §3.1 and impl plan §Task 3a tests: the lockfile path persists post-exit; mutual exclusion is via `fcntl.flock` + kernel fd cleanup, not file deletion. The `_bg_agent_alive` docstring (and helper / plan / test mirror) drop the misleading "byte-for-byte parity" claim; the hook returns `bool` while the helper CLI op returns exit codes 0/1/2. Both remain FAIL-SAFE-DEAD. Plan §Task 4 pseudocode notes that the `open-state written/cleared` print lines are suppressed in the final implementation (silence preferred for slash dispatch). Frontmatter convention verified: `description`-only matches mode.md / handoff.md. Tests: +3 SKILL.md structural smoke tests (test_skill_md_documents_watch_chain, test_skill_md_silent_idle_cost_model_cites_token_numbers, test_skill_md_protocol_internal_subcommands_marked) and +1 docstring pin (TestBgAgentAlive::test_docstring_does_not_overclaim_byte_for_byte_parity). test_a2a_phase_d_prompt_is_verbatim updated to match the portable template. 112 tests pass (108 baseline + 4 new).
…overclaim - SKILL.md: fswatch -1 -> fswatch -0 -l 0.1 (actual long-running invocation, not one-shot) - SKILL.md: _open_state shared liveness contract framing - helper for slash, hook inline for perf/reliability (no shell-out) - New pin test guards against future fswatch flag drift between helper source and SKILL.md prose
|
❌ Momus review failed — phase phase2 failed: provider error: 402 This request requires more credits, or fewer max_tokens. You requested up to 8192 tokens, but can only afford 7532. To increase, visit https://openrouter.ai/settings/keys and create a key with a h... |
There was a problem hiding this comment.
Code Review
This pull request implements a 'watch chain' architecture for the agent2agent skill, facilitating idle message delivery via a background task. Significant changes include the /a2a slash command specification, new watch and drain subcommands in the helper script, and hook-level logic to detect orphaned watch chains. The review feedback correctly identifies an opportunity to refactor the _handle_session_start hook to consolidate duplicated fallback logic. However, the PR fails to include the mandatory version bump and changelog entry required by the repository style guide.
| project_path = data.get("cwd", "") | ||
| if not project_path: | ||
| return _fallback_directive() | ||
| result = _fallback_directive() | ||
| if orphan_hint: | ||
| result["hookSpecificOutput"]["additionalContext"] += ( | ||
| "\n\n" + orphan_hint | ||
| ) | ||
| return result | ||
|
|
||
| # Load workflow state | ||
| ws = _mcp_call("workflow_state_load", { | ||
| "project_path": project_path, | ||
| "max_age_hours": 24, | ||
| }) | ||
| if not ws or not ws.get("found"): | ||
| return _fallback_directive() | ||
| result = _fallback_directive() | ||
| if orphan_hint: | ||
| result["hookSpecificOutput"]["additionalContext"] += ( | ||
| "\n\n" + orphan_hint | ||
| ) | ||
| return result | ||
|
|
||
| state = ws.get("state", {}) |
There was a problem hiding this comment.
To improve maintainability and reduce code duplication, you can refactor this block to consolidate the fallback logic. The current implementation repeats the code for appending orphan_hint to the fallback directive.
project_path = data.get("cwd", "")
ws = None
if project_path:
# Load workflow state
ws = _mcp_call("workflow_state_load", {
"project_path": project_path,
"max_age_hours": 24,
})
if not ws or not ws.get("found"):
result = _fallback_directive()
if orphan_hint:
result["hookSpecificOutput"]["additionalContext"] += (
"\n\n" + orphan_hint
)
return result
state = ws.get("state", {})Two PR-#287 follow-ups bundled: 1. hooks/spellbook_hook.py _handle_session_start: collapse the duplicate "_fallback_directive() + orphan_hint" branches into a single guard. Per gemini-code-assist review feedback. Pure refactor; no behavior change. 2. tests/unit/test_stint_hooks.py: gate-error JSON now lives on stderr per Claude Code hook protocol (commit 324cab5 on main routed gate block messages to sys.stderr but only updated tests/test_security/* — this parallel test in tests/unit/ was missed). Read proc.stderr to match.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the /a2a slash command and a background "watch chain" architecture for the agent-to-agent message bus, facilitating near real-time message delivery while sessions are idle. The implementation includes new helper subcommands for watching and draining inboxes, as well as hook-level backstops to detect and recover from dropped watch chains. Feedback primarily addresses high-severity style guide violations: the PR lacks the mandatory minor version bump and CHANGELOG entry, and several tests must be refactored to use the tripwire mocking framework instead of monkeypatch.setattr for function replacement.
| --- | ||
| description: "agent2agent inter-session message bus. Triggers: '/a2a', 'open inbox', 'close inbox', 'listen for messages', 'listen as', 'send a message to session', 'check inbox', 'reply to that session', 'inter-agent chat', 'inter-agent messaging', 'agent2agent', 'a2a', 'agent bus', 'message another session', 'tell session Y to', 'ask session Y'. Subcommands: open, close, send, check, read, peek, names, bound-name." |
There was a problem hiding this comment.
This PR introduces a significant new feature (the /a2a slash command and background watch chain) but is missing the required version bump in the .version file and a corresponding entry in CHANGELOG.md. Per the repository style guide (lines 7-11, 18), new features require a minor version bump (0.X.0). Please add these to comply with the project's release process.
References
- Every PR must include a version bump in the .version file and a corresponding CHANGELOG.md entry. Missing these is a high severity issue. (link)
| # Force MCP failure so we hit the _fallback_directive branch. | ||
| # _mcp_call returns None when MCP is unreachable. | ||
| monkeypatch.setattr(spellbook_hook, "_mcp_call", lambda *a, **k: None) | ||
|
|
||
| missing_transcript = tmp_path / "dead.output" | ||
| _seed_open_state( | ||
| tmp_path, "sess-orphan-compact", name="alice", | ||
| agent_id="agent-x", output_file=missing_transcript, | ||
| ) | ||
| result = spellbook_hook._handle_session_start({ | ||
| "session_id": "sess-orphan-compact", | ||
| "source": "compact", | ||
| "cwd": str(tmp_path), | ||
| }) |
There was a problem hiding this comment.
This test uses monkeypatch.setattr to mock _mcp_call, which is forbidden by the repository style guide (lines 67-71, 109). Tripwire is the only acceptable framework for mocking functions in this repository. Please rewrite this using tripwire's three-step flow: register the mock, execute within the with tripwire: sandbox, and assert the call afterwards.
m = tripwire.mock("spellbook_hook:_mcp_call")
m.returns(None)
missing_transcript = tmp_path / "dead.output"
_seed_open_state(
tmp_path, "sess-orphan-compact", name="alice",
agent_id="agent-x", output_file=missing_transcript,
)
with tripwire:
result = spellbook_hook._handle_session_start({
"session_id": "sess-orphan-compact",
"source": "compact",
"cwd": str(tmp_path),
})
m.assert_call()References
- tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)
| monkeypatch.setattr( | ||
| spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None | ||
| ) | ||
| monkeypatch.setattr( | ||
| spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None | ||
| ) | ||
| monkeypatch.setattr( | ||
| spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None | ||
| ) | ||
|
|
||
| outputs = spellbook_hook._handle_user_prompt_submit({ | ||
| "session_id": "sess-ups-orphan", | ||
| "prompt": "hello", | ||
| "cwd": str(tmp_path), | ||
| }) |
There was a problem hiding this comment.
Multiple uses of monkeypatch.setattr for mocking are present here, violating the repository style guide (lines 67-71, 109). These should be converted to use tripwire.mock with the standard three-step flow.
m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt")
m1.returns(None)
m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt")
m2.returns(None)
m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt")
m3.returns(None)
with tripwire:
outputs = spellbook_hook._handle_user_prompt_submit({
"session_id": "sess-ups-orphan",
"prompt": "hello",
"cwd": str(tmp_path),
})
m1.assert_call()
m2.assert_call()
m3.assert_call()References
- tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)
| monkeypatch.setattr( | ||
| spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None | ||
| ) | ||
| monkeypatch.setattr( | ||
| spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None | ||
| ) | ||
| monkeypatch.setattr( | ||
| spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None | ||
| ) | ||
|
|
||
| outputs = spellbook_hook._handle_user_prompt_submit({ | ||
| "session_id": "sess-ups-clean", | ||
| "prompt": "hello", | ||
| "cwd": str(tmp_path), | ||
| }) |
There was a problem hiding this comment.
Another set of monkeypatch.setattr violations. Please convert to tripwire.mock.
| monkeypatch.setattr( | |
| spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None | |
| ) | |
| monkeypatch.setattr( | |
| spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None | |
| ) | |
| monkeypatch.setattr( | |
| spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None | |
| ) | |
| outputs = spellbook_hook._handle_user_prompt_submit({ | |
| "session_id": "sess-ups-clean", | |
| "prompt": "hello", | |
| "cwd": str(tmp_path), | |
| }) | |
| m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt") | |
| m1.returns(None) | |
| m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt") | |
| m2.returns(None) | |
| m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt") | |
| m3.returns(None) | |
| with tripwire: | |
| outputs = spellbook_hook._handle_user_prompt_submit({ | |
| "session_id": "sess-ups-clean", | |
| "prompt": "hello", | |
| "cwd": str(tmp_path), | |
| }) | |
| m1.assert_call() | |
| m2.assert_call() | |
| m3.assert_call() |
References
- tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)
| from datetime import datetime, timedelta, timezone | ||
| from pathlib import Path | ||
|
|
||
| import pytest |
There was a problem hiding this comment.
The repository style guide (lines 27-31) requires using tripwire as the only acceptable mocking framework. Please add the import here so it can be used in the tests below.
| import pytest | |
| import pytest | |
| import tripwire |
References
- tripwire is the ONLY acceptable framework for mocking. (link)
| monkeypatch.setattr(a2a.os, "replace", flaky_replace) | ||
|
|
||
| with pytest.raises(OSError): | ||
| _run(a2a, "drain", "alice", "batch-x") |
There was a problem hiding this comment.
This test uses monkeypatch.setattr to replace a function (os.replace), which is forbidden by the repository style guide (lines 67-71, 109). Please use tripwire.mock with the three-step flow.
| monkeypatch.setattr(a2a.os, "replace", flaky_replace) | |
| with pytest.raises(OSError): | |
| _run(a2a, "drain", "alice", "batch-x") | |
| m = tripwire.mock("os:replace") | |
| m.calls(flaky_replace) | |
| with tripwire: | |
| with pytest.raises(OSError): | |
| _run(a2a, "drain", "alice", "batch-x") | |
| m.assert_call() |
References
- tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)
| assert pending_files.isdisjoint(processed_files) | ||
|
|
||
| # Restore real os.replace and re-run; remaining messages must drain cleanly. | ||
| monkeypatch.setattr(a2a.os, "replace", real_replace) |
There was a problem hiding this comment.
Manual restoration of the original function is not required when using tripwire, as the mock is automatically scoped to the with tripwire: block.
| monkeypatch.setattr(a2a.os, "replace", real_replace) | |
| # tripwire handles restoration automatically |
References
- tripwire is the ONLY acceptable framework for mocking. (link)
…+ version) Bundled fixes from gemini-code-assist's second review pass: - tests/test_skills/test_agent2agent_helper.py: skip the entire module on Windows (`pytest.skip(..., allow_module_level=True)`) so collection doesn't fail with `ModuleNotFoundError: fcntl`. The agent2agent helper uses `fcntl.flock` for the watch lockfile and is fundamentally POSIX-only. - tests/test_hooks/test_agent2agent_hook.py + tests/test_skills/ test_agent2agent_helper.py: convert all `monkeypatch.setattr` function replacements to `tripwire.mock(...)` per the repository style guide (the styleguide forbids monkeypatch.setattr for function/method replacement). Uses `dirty_equals.IsInstance` matchers for type-tolerant assertions. - .version: 0.63.2 -> 0.64.0 (minor bump per styleguide rule that new features / new commands warrant a minor bump). - CHANGELOG.md: add a 0.64.0 entry describing the watch-chain feature (helper subcommands, /a2a slash command, hook backstop, lockfile mutex, A9 zero-idle-tokens validation), the orphan-hint refactor, the test_stint_hooks stderr fix, and the tripwire conversion.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces an 'Agent2Agent watch-chain' architecture to enable near-real-time message delivery for idle sessions, replacing the previous polling-based approach. The changes include a new /a2a slash command, internal helper subcommands for managing the watch chain, and hook-based orphan detection to handle process death or compaction. The PR also updates documentation, refactors test mocks to use the required tripwire framework, and bumps the version to 0.64.0. I have reviewed the changes and agree with the reviewer's feedback regarding the need to guard fcntl and signal imports to ensure Windows compatibility.
| import fcntl | ||
| import json | ||
| import os | ||
| import re | ||
| import select | ||
| import shutil | ||
| import signal |
The integration TestAgent2AgentHook class spawns the agent2agent helper as a subprocess, and `test_hook_helper_constants_in_sync` imports the helper module. Both fail on Windows because the helper imports `fcntl` at module level (POSIX-only). The fast unit tests (TestAgent2AgentHookFast) remain unskipped — they stub the subprocess call so they don't need the helper to actually run.
Per gemini-code-assist's CRITICAL finding: importing fcntl at the top
level crashes on Windows even when the watch subcommand isn't invoked.
Wrap the import in a try/except with a fallback to None so the helper
loads cleanly on Windows; cmd_watch performs an explicit runtime check
and returns a clear error message ("watch: not supported on this
platform") rather than NameError'ing on the first fcntl.flock call.
Other subcommands (open, close, send, read, peek, drain, etc.) are
filesystem-only and now work on Windows. The watch subcommand and the
test_skills/test_agent2agent_helper.py module remain POSIX-only.
signal.SIGTERM is available on Windows (verified in CPython 3.9+) so
no guard is needed there.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements the Agent2Agent watch-chain architecture, enabling near-real-time message delivery for idle sessions with zero token cost. It introduces the /a2a slash command for orchestration, adds background file-watching capabilities to the agent2agent.py helper, and implements hook-based liveness checks to detect orphaned chains. Feedback focuses on correcting a non-executable Bash command in the slash command specification and increasing the liveness threshold from 90 seconds to 600 seconds to prevent false-positive "dropped chain" detections during the 540-second idle watch window.
| (skip any that produce empty stdout): | ||
| - `basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)"` — project basename | ||
| - `git branch --show-current 2>/dev/null` — current branch (skip if detached) | ||
| - `stint_check().top.name` — top stint name from MCP (skip if empty) |
There was a problem hiding this comment.
The entry stint_check().top.name is not a valid Bash command, which breaks the consistency of this list of one-liners. This should be described as a tool call (e.g., "Call the stint_check tool and extract the top stint name") or replaced with a proper shell command if one is available via the spellbook CLI.
| age = time.time() - op.stat().st_mtime | ||
| except OSError: | ||
| return False | ||
| return age < 90.0 |
There was a problem hiding this comment.
The 90-second liveness threshold is likely too short and will cause false-positive orphan detections during idle periods. The background watch agent can be silent for up to 540 seconds (the WATCH_RECYCLE budget) while waiting for messages. Since the transcript file is only updated when the process produces output, its mtime will not change during this window. To avoid false positives, this threshold should be increased to exceed the maximum watch duration (e.g., 600 seconds).
| return age < 90.0 | |
| return age < 600.0 |
| age = time.time() - op_path.stat().st_mtime | ||
| except OSError: | ||
| return 1 | ||
| return 0 if age < 90.0 else 1 |
… in /a2a Two gemini-flagged issues from PR-#287 review #4: 1. CORRECTNESS BUG (medium-priority): the 90s mtime liveness threshold in `_bg_agent_alive` and the parallel `cmd__open_state alive` exit-code probe is shorter than the watch agent's 540s WATCH_RECYCLE budget. While blocked on `watch`, the bg Task agent emits no stdout, so its transcript mtime does not advance during a normal idle window — the A9 zero-idle-tokens validation specifically confirms this. The 90s threshold would therefore false-positive DEAD anywhere from 90s into a normal idle window onward. Bump to 600s (60s grace margin over 540s for the inter-cycle re-dispatch). Tests adjusted accordingly: - test_returns_false_when_transcript_stale: stale_age 120s → 700s - test_open_state_alive_stale_transcript_returns_1: 120s → 700s - docstrings + SKILL.md prose updated 90s → 600s 2. DOC FIX (medium-priority): `commands/a2a.md` Phase B-1 listed `stint_check().top.name` alongside actual Bash one-liners. That pseudocode isn't executable; reword the bullet so it's clearly an MCP tool call, not a shell command, and present the bash + non-bash bullets uniformly.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a near-real-time "watch-chain" architecture for the agent2agent skill, replacing the previous polling-based approach with an OS-level file-watching system to enable idle delivery with zero token cost. The changes include a new /a2a slash command, specialized watch and drain subcommands, and hook-based liveness probes to detect orphaned background agents. Feedback identifies several race conditions in the file system logic within agent2agent.py, recommending additional try/except blocks to handle FileNotFoundError during concurrent directory iterations and ValueError when accessing process pipes during termination.
| target_dirs = sorted( | ||
| (d for d in pending_root.iterdir() if d.is_dir()), | ||
| key=lambda p: p.name, | ||
| ) | ||
| else: | ||
| candidates = sorted( | ||
| (d for d in pending_root.iterdir() if d.is_dir()), | ||
| key=lambda p: p.name, | ||
| ) | ||
| target_dirs = [candidates[0]] if candidates else [] |
There was a problem hiding this comment.
The iterdir() calls in both the all and default branches are not protected against the pending_root directory being deleted concurrently. This can lead to an unhandled FileNotFoundError. Consider wrapping these in try/except blocks, similar to the implementation in cmd_watch.
References
- Silent except Exception: pass blocks should log the exception, but specific expected errors like FileNotFoundError in race-prone IO should be handled gracefully. (link)
| for d in target_dirs: | ||
| messages = sorted( | ||
| (m for m in d.iterdir() if not m.name.startswith(".")), | ||
| key=lambda p: p.name, | ||
| ) |
| try: | ||
| os.read(fswatch_proc.stdout.fileno(), 4096) | ||
| except OSError: | ||
| pass |
There was a problem hiding this comment.
While OSError is caught, fswatch_proc.stdout.fileno() can raise a ValueError if the file object has been closed (e.g., if the process exited and was reaped just before this call). Adding ValueError to the caught exceptions would prevent a potential crash in the watch loop.
References
- Silent except blocks should log, but here we are intentionally draining a pipe where errors are expected during process termination. (link)
Three medium-priority defensive-coding fixes from gemini's PR-#287 review #5: 1. cmd_drain pending_root.iterdir(): catch FileNotFoundError (the dir can vanish if another session calls close() concurrently). Treat as "no pending batches". 2. cmd_drain per-batch d.iterdir(): catch FileNotFoundError (a parallel drain --all from another session can remove a batch dir mid-iteration). Skip the missing batch and continue. 3. cmd_watch fswatch stdout.fileno() drain: also catch ValueError. A racing reaper can close the pipe between poll() and fileno(); the resulting ValueError used to crash the watch loop. The drain refactor consolidates the three iterdir() callsites (default, --all, --batch-id) into a single try-protected block whose result is filtered to the requested mode.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements the Agent2Agent watch-chain architecture, providing near-real-time message delivery for idle sessions without polling. It introduces the /a2a slash command, adds watch, drain, and _open_state subcommands to the helper script, and integrates liveness and orphan-chain detection into the spellbook hooks. Documentation and tests have been updated to reflect the transition from listen/unlisten to open/close terminology, and the version is bumped to 0.64.0. I have no feedback to provide.
What does this PR do?
Replaces the prior LLM-side polling loop in agent2agent (which burned ~72k tokens in 7 minutes of idle waiting) with a self-respawning background Task agent that delivers messages to idle sessions with zero new tokens during the idle window. Adds a
/a2aslash command with 8 user-facing subcommands and a 6-phase orchestration of the watch chain.Related issue
Checklist
What this ships
watch+ chain pattern). The bg Task blocks on a single Bash invocation;fswatchwakes it on inbox changes; the parent re-arms silently after each cycle.listen->open,unlisten->close(breaking for any pre-T1 caller; failure mode is a clear argparse error).drain(atomic batch claim frompending/-> inbox),watch(lockfile + RECOVER + WATCH_RECYCLE + fswatch/poll),_open_state(slash-internal write/clear/read/alive)._bg_agent_alive) for orphaned-chain detection in SessionStart + UserPromptSubmit hooks. FAIL-SAFE-DEAD with a 90s mtime window./a2aslash command with 8 user-facing subcommands (open,close,send,check,read,peek,names,bound-name) + 6-phase orchestration of the watch chain.Why
The prior agent2agent iteration consumed ~72k tokens in 7 minutes of LLM-side polling. The new watch chain delivers messages to idle sessions with zero new tokens during the idle window (validated by T7 manual e2e A9 ship gate over 540s). The bg Task agent blocks on a single Bash invocation; fswatch wakes it on inbox changes; parent re-arms silently each cycle.
Validation
Known limitations
watch --max-elapsed=2and asserts a singleWATCH_RECYCLE elapsed=2sline.commands/a2a.mdPhase D and a fixture, but cannot verify that the bg Task agent's LLM actually executes the prompt as written. T7 confirmed it does on the tester's harness.commands/a2a.mdPhase F. The orchestrator must call_open_state writewith the new bg agent's id every cycle. The hook orphan-detection backstop catches the failure mode (false-positive orphan after 90s) but does not prevent it.~/.local/share/agent2agent/is local. Future: detect and warn.AGENT2AGENT_DIRenv propagation to bg Task when the harness runs the bg Task in a clean shell is unvalidated for non-default bus paths.listen/unlisten(nowopen/close). Failure mode is graceful (clear argparse error). Acceptable for an internal tool.Implementation notes
fswatch -0 -l 0.1 inbox/(long-running stream, NUL-delimited, 100ms event-coalescing latency).inbox/.watcher.lockpath persists across cycles; mutex enforced byfcntl.flock(LOCK_EX|LOCK_NB)+ kernel fd cleanup._open_staterecord at<bus>/.open/<session-id>(no.jsonextension).