Skip to content

feat(agent2agent): add /a2a slash command + self-respawning watch chain#287

Merged
elijahr merged 19 commits into
mainfrom
feat/a2a-watch-chain
May 8, 2026
Merged

feat(agent2agent): add /a2a slash command + self-respawning watch chain#287
elijahr merged 19 commits into
mainfrom
feat/a2a-watch-chain

Conversation

@elijahr

@elijahr elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Replaces the prior LLM-side polling loop in agent2agent (which burned ~72k tokens in 7 minutes of idle waiting) with a self-respawning background Task agent that delivers messages to idle sessions with zero new tokens during the idle window. Adds a /a2a slash command with 8 user-facing subcommands and a 6-phase orchestration of the watch chain.

Related issue

Checklist

  • Tests pass locally
  • Documentation updated (if applicable)

What this ships

  • Self-respawning bg Task agent loop (watch + chain pattern). The bg Task blocks on a single Bash invocation; fswatch wakes it on inbox changes; the parent re-arms silently after each cycle.
  • Renamed helper subcommands: listen -> open, unlisten -> close (breaking for any pre-T1 caller; failure mode is a clear argparse error).
  • New helper subcommands: drain (atomic batch claim from pending/ -> inbox), watch (lockfile + RECOVER + WATCH_RECYCLE + fswatch/poll), _open_state (slash-internal write/clear/read/alive).
  • Hook backstop (_bg_agent_alive) for orphaned-chain detection in SessionStart + UserPromptSubmit hooks. FAIL-SAFE-DEAD with a 90s mtime window.
  • /a2a slash command with 8 user-facing subcommands (open, close, send, check, read, peek, names, bound-name) + 6-phase orchestration of the watch chain.
  • SKILL.md architecture documentation, including the silent-idle cost model.

Why

The prior agent2agent iteration consumed ~72k tokens in 7 minutes of LLM-side polling. The new watch chain delivers messages to idle sessions with zero new tokens during the idle window (validated by T7 manual e2e A9 ship gate over 540s). The bg Task agent blocks on a single Bash invocation; fswatch wakes it on inbox changes; parent re-arms silently each cycle.

Validation

  • 113 automated tests pass in the targeted suite — helper unit tests, hook tests, slash command structural tests, and concurrency invariant tests with disjoint+union assertions on atomic batch claim.
  • T7 manual e2e (separate session exercising the live chain): A9 ship gate PASSED — 5-minute idle window in tester session showed 0 new tokens, 0 tool uses, 0 transcript lines. All 6 scenarios passed: normal delivery, concurrent senders, idle silence, orphan detection, idempotent close, lockfile mutex.

Known limitations

  1. A9 idle-token gate is manual-only. Automated coverage of a 540s idle path is structurally infeasible inside a 30s pytest timeout. Future: CI smoke that runs watch --max-elapsed=2 and asserts a single WATCH_RECYCLE elapsed=2s line.
  2. Phase D verbatim prompt LLM-honor is empirical. Tests assert byte-for-byte match between commands/a2a.md Phase D and a fixture, but cannot verify that the bg Task agent's LLM actually executes the prompt as written. T7 confirmed it does on the tester's harness.
  3. Re-arm bookkeeping is a prose contract in commands/a2a.md Phase F. The orchestrator must call _open_state write with the new bg agent's id every cycle. The hook orphan-detection backstop catches the failure mode (false-positive orphan after 90s) but does not prevent it.
  4. Lockfile mutex on networked / cloud-synced filesystems (NFS, SMB, iCloud) may not honor flock semantics. No detection or warning today; default ~/.local/share/agent2agent/ is local. Future: detect and warn.
  5. AGENT2AGENT_DIR env propagation to bg Task when the harness runs the bg Task in a clean shell is unvalidated for non-default bus paths.
  6. Rename is breaking for any pre-T1 caller of listen / unlisten (now open / close). Failure mode is graceful (clear argparse error). Acceptable for an internal tool.

Implementation notes

  • fswatch -0 -l 0.1 inbox/ (long-running stream, NUL-delimited, 100ms event-coalescing latency).
  • Polling backstop default 500ms; fallback when fswatch is missing.
  • WATCH_RECYCLE default 540s (forced by Bash 600000ms ceiling — recycle every 9 min to avoid tool-timeout).
  • Lockfile inbox/.watcher.lock path persists across cycles; mutex enforced by fcntl.flock(LOCK_EX|LOCK_NB) + kernel fd cleanup.
  • _open_state record at <bus>/.open/<session-id> (no .json extension).
  • 12 commits, zero new third-party imports, stdlib only.

elijahr added 12 commits May 7, 2026 19:30
Hard rename of helper subcommands cmd_listen->cmd_open and
cmd_unlisten->cmd_close in preparation for the watch-chain feature.
No behavior change. Test names, SKILL.md prose, and the hook test
helper updated to match. The user-vocabulary trigger phrase
"listen for messages" is preserved in slash command frontmatter
for muscle memory.
Three comments missed in 6366183 because the original rename grep gate
only checked skills/ and tests/, not hooks/. The hook's section header
and inline comment referenced `listen <name>` for the bind-via-helper
contract, and one test docstring still said "never listened".

No behavior change.
cmd_open now creates pending/ alongside inbox/processed/sent/ so the
forthcoming watch-chain can stage messages there briefly between
hook-receive and parent-drain without duplicating delivery. The new
cmd_drain subcommand atomically moves a batch (or all batches, or the
oldest batch) from pending/<batch-id>/ to processed/ via os.replace
and emits the bodies as JSON. Malformed JSON files are still moved to
processed/ but reported with an error/raw_path entry so they cannot
poison subsequent drains. Drain is idempotent: missing pending dir,
missing batch, and empty batch all return count=0 exit 0.
Add the cmd_watch subcommand scoped to lockfile + RECOVER + WATCH_RECYCLE
behavior only. Acquires inbox/.watcher.lock via fcntl.flock so only one
watcher per name runs at a time; emits PENDING_BATCH for any pre-existing
non-empty pending/<batch>/ on entry; otherwise sleeps in poll-interval
increments until --max-elapsed and exits with WATCH_RECYCLE. T3b will
add fswatch wakeup, 500ms polling backstop, and atomic batch claim.
- C1: drop unlink in _cleanup (flock+unlink race; kernel fd-cleanup
  is the actual mutex source). Tests updated to assert acquirability,
  not path absence.
- I1: defensive try/except FileNotFoundError around RECOVER inner
  iterdir for concurrent drain.
- I2: new test_watch_kill_releases_lockfile_via_kernel exercises
  kernel auto-release via SIGKILL.
- I3: replace lockfile.exists() readiness with pid-content readiness;
  closes a small TOCTOU in the concurrent-attempt test.
T3a left WAIT as a budget-only sleep loop; this completes T3b by spawning
fswatch (when available) for low-latency wake, running a poll_interval
backstop in the same loop so a missing or wedged fswatch cannot stall
delivery, and atomically claiming up to --max-batch inbox files into
pending/<batch-id>/ before emitting PENDING_BATCH and exiting 0. Spurious
fswatch wakes (filtered dotfiles) and concurrent-reader-wins-all races
(every candidate vanishes mid-claim) re-enter WAIT without emitting a
zero-count batch. fswatch subprocess is reaped via terminate->wait->kill
in a finally clause on every exit path. Adds 5 subprocess-driven tests
(2 fswatch-gated via shutil.which); 48/48 helper tests pass.
Adds the slash-command-internal `_open_state {write|clear|read|alive}`
subcommand that persists per-session open-state at `<bus>/.open/<sid>`
(name, agent_id, started_at, output_file). `alive` mirrors the hook's
`_bg_agent_alive` mtime probe (0 = fresh, 1 = stale/missing transcript,
2 = FAIL-SAFE-DEAD when state file is missing or malformed) so the
slash command and the orphan-detection hook agree on liveness.
Add _bg_agent_alive and _agent2agent_check_orphaned_chain to
spellbook_hook.py. Liveness mirrors cmd__open_state alive byte-for-byte
(FAIL-SAFE-DEAD: missing/stale transcript or absent state -> dead). The
SessionStart handler now runs the orphan check BEFORE the
source != "compact" early return so cold starts and resumes also
surface the re-arm hint; the compact-source path appends the hint after
the existing recovery directive. UserPromptSubmit appends the hint after
the notify line. Hook reads only .open/<sid> JSON metadata and stats
the transcript path; it never touches inbox bodies.
Adds commands/a2a.md, the operator-facing slash interface to the
agent2agent inter-session message bus. The command mirrors helper
subcommands (check, send, peek, read, names, bound-name) and
orchestrates the watch-chain (open, close): open binds an inbox
name and dispatches a backgrounded Task agent running the helper's
watch subcommand with the load-bearing Phase D prompt template
verbatim, persists chain state via _open_state write, and silently
re-dispatches on every completion (PENDING_BATCH or WATCH_RECYCLE).
close is idempotent. Adds smoke tests under tests/test_skills/
asserting frontmatter parses, all six phases (A-F) are present,
the verbatim Phase D prompt matches byte-for-byte, and every
public subcommand has its own section.
- watch: print holder's pid (read from lockfile) instead of os.getpid()
  of the losing watcher. The holder writes its pid AFTER flock; reading
  it gives an operator a real kill target. Falls back to "unknown" on
  read failure or empty lockfile (race window between holder's truncate
  and write).
- close: differentiate stdout — emit "closed '<name>'" only when an
  inbox or session binding was actually released; emit "not bound to
  '<name>'" on no-op. Both branches exit 0 (idempotent contract intact).
- commands/a2a.md: document both close-stdout lines.
- tests: strengthen test_watch_concurrent_attempt_blocked_by_lockfile
  to assert reported pid == holder pid (regression guard). Add
  test_close_idempotent_reports_not_bound and
  test_close_unknown_name_reports_not_bound.

Both bugs surfaced during T7 manual e2e testing.
…iliation

SKILL.md gains an "Architecture: watch chain vs hook-receive" section, a
"Watch-Chain (Idle Delivery)" section explaining the slash command, the
fswatch dependency, the compaction limitation and the open-state record,
plus a Silent-Idle Cost Model subsection with per-cycle/hour/day token
estimates and a `/a2a close` recommendation for overnight idle. Quick
Reference adds protocol-internal `watch`, `drain`, and `_open_state` rows.

Phase D portability: replace the hardcoded developer-machine path in
commands/a2a.md with `$SPELLBOOK_DIR/...` so the bg Task agent's Bash
command resolves correctly for any operator (the harness substitutes per
~/.claude/CLAUDE.md). Test constant + impl plan §Task 6 prompt template
updated to match.

Lockfile-deletion errata reconciled in design §3.1 and impl plan §Task
3a tests: the lockfile path persists post-exit; mutual exclusion is via
`fcntl.flock` + kernel fd cleanup, not file deletion. The
`_bg_agent_alive` docstring (and helper / plan / test mirror) drop the
misleading "byte-for-byte parity" claim; the hook returns `bool` while
the helper CLI op returns exit codes 0/1/2. Both remain FAIL-SAFE-DEAD.
Plan §Task 4 pseudocode notes that the `open-state written/cleared`
print lines are suppressed in the final implementation (silence
preferred for slash dispatch). Frontmatter convention verified:
`description`-only matches mode.md / handoff.md.

Tests: +3 SKILL.md structural smoke tests
(test_skill_md_documents_watch_chain,
test_skill_md_silent_idle_cost_model_cites_token_numbers,
test_skill_md_protocol_internal_subcommands_marked) and +1 docstring
pin
(TestBgAgentAlive::test_docstring_does_not_overclaim_byte_for_byte_parity).
test_a2a_phase_d_prompt_is_verbatim updated to match the portable
template. 112 tests pass (108 baseline + 4 new).
…overclaim

- SKILL.md: fswatch -1 -> fswatch -0 -l 0.1 (actual long-running
  invocation, not one-shot)
- SKILL.md: _open_state shared liveness contract framing -
  helper for slash, hook inline for perf/reliability (no shell-out)
- New pin test guards against future fswatch flag drift between
  helper source and SKILL.md prose
@axiomantic-momus

axiomantic-momus Bot commented May 8, 2026

Copy link
Copy Markdown

Momus review failed — phase phase2 failed: provider error: 402 This request requires more credits, or fewer max_tokens. You requested up to 8192 tokens, but can only afford 7532. To increase, visit https://openrouter.ai/settings/keys and create a key with a h...

run log

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a 'watch chain' architecture for the agent2agent skill, facilitating idle message delivery via a background task. Significant changes include the /a2a slash command specification, new watch and drain subcommands in the helper script, and hook-level logic to detect orphaned watch chains. The review feedback correctly identifies an opportunity to refactor the _handle_session_start hook to consolidate duplicated fallback logic. However, the PR fails to include the mandatory version bump and changelog entry required by the repository style guide.

Comment thread hooks/spellbook_hook.py
Comment on lines 2332 to 2354
project_path = data.get("cwd", "")
if not project_path:
return _fallback_directive()
result = _fallback_directive()
if orphan_hint:
result["hookSpecificOutput"]["additionalContext"] += (
"\n\n" + orphan_hint
)
return result

# Load workflow state
ws = _mcp_call("workflow_state_load", {
"project_path": project_path,
"max_age_hours": 24,
})
if not ws or not ws.get("found"):
return _fallback_directive()
result = _fallback_directive()
if orphan_hint:
result["hookSpecificOutput"]["additionalContext"] += (
"\n\n" + orphan_hint
)
return result

state = ws.get("state", {})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve maintainability and reduce code duplication, you can refactor this block to consolidate the fallback logic. The current implementation repeats the code for appending orphan_hint to the fallback directive.

    project_path = data.get("cwd", "")
    ws = None
    if project_path:
        # Load workflow state
        ws = _mcp_call("workflow_state_load", {
            "project_path": project_path,
            "max_age_hours": 24,
        })

    if not ws or not ws.get("found"):
        result = _fallback_directive()
        if orphan_hint:
            result["hookSpecificOutput"]["additionalContext"] += (
                "\n\n" + orphan_hint
            )
        return result

    state = ws.get("state", {})

elijahr added 2 commits May 8, 2026 09:46
Two PR-#287 follow-ups bundled:

1. hooks/spellbook_hook.py _handle_session_start: collapse the duplicate
   "_fallback_directive() + orphan_hint" branches into a single guard.
   Per gemini-code-assist review feedback. Pure refactor; no behavior
   change.

2. tests/unit/test_stint_hooks.py: gate-error JSON now lives on stderr
   per Claude Code hook protocol (commit 324cab5 on main routed gate
   block messages to sys.stderr but only updated tests/test_security/*
   — this parallel test in tests/unit/ was missed). Read proc.stderr to
   match.
@elijahr

elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the /a2a slash command and a background "watch chain" architecture for the agent-to-agent message bus, facilitating near real-time message delivery while sessions are idle. The implementation includes new helper subcommands for watching and draining inboxes, as well as hook-level backstops to detect and recover from dropped watch chains. Feedback primarily addresses high-severity style guide violations: the PR lacks the mandatory minor version bump and CHANGELOG entry, and several tests must be refactored to use the tripwire mocking framework instead of monkeypatch.setattr for function replacement.

Comment thread commands/a2a.md
Comment on lines +1 to +2
---
description: "agent2agent inter-session message bus. Triggers: '/a2a', 'open inbox', 'close inbox', 'listen for messages', 'listen as', 'send a message to session', 'check inbox', 'reply to that session', 'inter-agent chat', 'inter-agent messaging', 'agent2agent', 'a2a', 'agent bus', 'message another session', 'tell session Y to', 'ask session Y'. Subcommands: open, close, send, check, read, peek, names, bound-name."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This PR introduces a significant new feature (the /a2a slash command and background watch chain) but is missing the required version bump in the .version file and a corresponding entry in CHANGELOG.md. Per the repository style guide (lines 7-11, 18), new features require a minor version bump (0.X.0). Please add these to comply with the project's release process.

References
  1. Every PR must include a version bump in the .version file and a corresponding CHANGELOG.md entry. Missing these is a high severity issue. (link)

Comment on lines +637 to +650
# Force MCP failure so we hit the _fallback_directive branch.
# _mcp_call returns None when MCP is unreachable.
monkeypatch.setattr(spellbook_hook, "_mcp_call", lambda *a, **k: None)

missing_transcript = tmp_path / "dead.output"
_seed_open_state(
tmp_path, "sess-orphan-compact", name="alice",
agent_id="agent-x", output_file=missing_transcript,
)
result = spellbook_hook._handle_session_start({
"session_id": "sess-orphan-compact",
"source": "compact",
"cwd": str(tmp_path),
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test uses monkeypatch.setattr to mock _mcp_call, which is forbidden by the repository style guide (lines 67-71, 109). Tripwire is the only acceptable framework for mocking functions in this repository. Please rewrite this using tripwire's three-step flow: register the mock, execute within the with tripwire: sandbox, and assert the call afterwards.

        m = tripwire.mock("spellbook_hook:_mcp_call")
        m.returns(None)

        missing_transcript = tmp_path / "dead.output"
        _seed_open_state(
            tmp_path, "sess-orphan-compact", name="alice",
            agent_id="agent-x", output_file=missing_transcript,
        )
        with tripwire:
            result = spellbook_hook._handle_session_start({
                "session_id": "sess-orphan-compact",
                "source": "compact",
                "cwd": str(tmp_path),
            })
        m.assert_call()
References
  1. tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)

Comment on lines +707 to +721
monkeypatch.setattr(
spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None
)

outputs = spellbook_hook._handle_user_prompt_submit({
"session_id": "sess-ups-orphan",
"prompt": "hello",
"cwd": str(tmp_path),
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Multiple uses of monkeypatch.setattr for mocking are present here, violating the repository style guide (lines 67-71, 109). These should be converted to use tripwire.mock with the standard three-step flow.

        m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt")
        m1.returns(None)
        m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt")
        m2.returns(None)
        m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt")
        m3.returns(None)

        with tripwire:
            outputs = spellbook_hook._handle_user_prompt_submit({
                "session_id": "sess-ups-orphan",
                "prompt": "hello",
                "cwd": str(tmp_path),
            })
        m1.assert_call()
        m2.assert_call()
        m3.assert_call()
References
  1. tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)

Comment on lines +731 to +745
monkeypatch.setattr(
spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None
)

outputs = spellbook_hook._handle_user_prompt_submit({
"session_id": "sess-ups-clean",
"prompt": "hello",
"cwd": str(tmp_path),
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Another set of monkeypatch.setattr violations. Please convert to tripwire.mock.

Suggested change
monkeypatch.setattr(
spellbook_hook, "_memory_recall_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_agent2agent_notify_for_prompt", lambda *a, **k: None
)
monkeypatch.setattr(
spellbook_hook, "_memory_autostore_for_prompt", lambda *a, **k: None
)
outputs = spellbook_hook._handle_user_prompt_submit({
"session_id": "sess-ups-clean",
"prompt": "hello",
"cwd": str(tmp_path),
})
m1 = tripwire.mock("spellbook_hook:_memory_recall_for_prompt")
m1.returns(None)
m2 = tripwire.mock("spellbook_hook:_agent2agent_notify_for_prompt")
m2.returns(None)
m3 = tripwire.mock("spellbook_hook:_memory_autostore_for_prompt")
m3.returns(None)
with tripwire:
outputs = spellbook_hook._handle_user_prompt_submit({
"session_id": "sess-ups-clean",
"prompt": "hello",
"cwd": str(tmp_path),
})
m1.assert_call()
m2.assert_call()
m3.assert_call()
References
  1. tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)

from datetime import datetime, timedelta, timezone
from pathlib import Path

import pytest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The repository style guide (lines 27-31) requires using tripwire as the only acceptable mocking framework. Please add the import here so it can be used in the tests below.

Suggested change
import pytest
import pytest
import tripwire
References
  1. tripwire is the ONLY acceptable framework for mocking. (link)

Comment on lines +561 to +564
monkeypatch.setattr(a2a.os, "replace", flaky_replace)

with pytest.raises(OSError):
_run(a2a, "drain", "alice", "batch-x")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test uses monkeypatch.setattr to replace a function (os.replace), which is forbidden by the repository style guide (lines 67-71, 109). Please use tripwire.mock with the three-step flow.

Suggested change
monkeypatch.setattr(a2a.os, "replace", flaky_replace)
with pytest.raises(OSError):
_run(a2a, "drain", "alice", "batch-x")
m = tripwire.mock("os:replace")
m.calls(flaky_replace)
with tripwire:
with pytest.raises(OSError):
_run(a2a, "drain", "alice", "batch-x")
m.assert_call()
References
  1. tripwire is the ONLY acceptable framework for mocking. monkeypatch.setattr is forbidden for replacing functions. (link)

assert pending_files.isdisjoint(processed_files)

# Restore real os.replace and re-run; remaining messages must drain cleanly.
monkeypatch.setattr(a2a.os, "replace", real_replace)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Manual restoration of the original function is not required when using tripwire, as the mock is automatically scoped to the with tripwire: block.

Suggested change
monkeypatch.setattr(a2a.os, "replace", real_replace)
# tripwire handles restoration automatically
References
  1. tripwire is the ONLY acceptable framework for mocking. (link)

…+ version)

Bundled fixes from gemini-code-assist's second review pass:

- tests/test_skills/test_agent2agent_helper.py: skip the entire module
  on Windows (`pytest.skip(..., allow_module_level=True)`) so collection
  doesn't fail with `ModuleNotFoundError: fcntl`. The agent2agent helper
  uses `fcntl.flock` for the watch lockfile and is fundamentally
  POSIX-only.
- tests/test_hooks/test_agent2agent_hook.py + tests/test_skills/
  test_agent2agent_helper.py: convert all `monkeypatch.setattr` function
  replacements to `tripwire.mock(...)` per the repository style guide
  (the styleguide forbids monkeypatch.setattr for function/method
  replacement). Uses `dirty_equals.IsInstance` matchers for
  type-tolerant assertions.
- .version: 0.63.2 -> 0.64.0 (minor bump per styleguide rule that new
  features / new commands warrant a minor bump).
- CHANGELOG.md: add a 0.64.0 entry describing the watch-chain feature
  (helper subcommands, /a2a slash command, hook backstop, lockfile
  mutex, A9 zero-idle-tokens validation), the orphan-hint refactor,
  the test_stint_hooks stderr fix, and the tripwire conversion.
@elijahr

elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an 'Agent2Agent watch-chain' architecture to enable near-real-time message delivery for idle sessions, replacing the previous polling-based approach. The changes include a new /a2a slash command, internal helper subcommands for managing the watch chain, and hook-based orphan detection to handle process death or compaction. The PR also updates documentation, refactors test mocks to use the required tripwire framework, and bumps the version to 0.64.0. I have reviewed the changes and agree with the reviewer's feedback regarding the need to guard fcntl and signal imports to ensure Windows compatibility.

Comment on lines +24 to +30
import fcntl
import json
import os
import re
import select
import shutil
import signal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The fcntl module and signal.SIGTERM are not available on Windows. Importing fcntl at the top level will cause the script to crash on Windows, even if the functionality is not used. Please guard these imports and usages to ensure Windows compatibility.

elijahr added 2 commits May 8, 2026 10:32
The integration TestAgent2AgentHook class spawns the agent2agent helper
as a subprocess, and `test_hook_helper_constants_in_sync` imports the
helper module. Both fail on Windows because the helper imports `fcntl`
at module level (POSIX-only). The fast unit tests (TestAgent2AgentHookFast)
remain unskipped — they stub the subprocess call so they don't need the
helper to actually run.
Per gemini-code-assist's CRITICAL finding: importing fcntl at the top
level crashes on Windows even when the watch subcommand isn't invoked.
Wrap the import in a try/except with a fallback to None so the helper
loads cleanly on Windows; cmd_watch performs an explicit runtime check
and returns a clear error message ("watch: not supported on this
platform") rather than NameError'ing on the first fcntl.flock call.

Other subcommands (open, close, send, read, peek, drain, etc.) are
filesystem-only and now work on Windows. The watch subcommand and the
test_skills/test_agent2agent_helper.py module remain POSIX-only.

signal.SIGTERM is available on Windows (verified in CPython 3.9+) so
no guard is needed there.
@elijahr

elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Agent2Agent watch-chain architecture, enabling near-real-time message delivery for idle sessions with zero token cost. It introduces the /a2a slash command for orchestration, adds background file-watching capabilities to the agent2agent.py helper, and implements hook-based liveness checks to detect orphaned chains. Feedback focuses on correcting a non-executable Bash command in the slash command specification and increasing the liveness threshold from 90 seconds to 600 seconds to prevent false-positive "dropped chain" detections during the 540-second idle watch window.

Comment thread commands/a2a.md Outdated
(skip any that produce empty stdout):
- `basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)"` — project basename
- `git branch --show-current 2>/dev/null` — current branch (skip if detached)
- `stint_check().top.name` — top stint name from MCP (skip if empty)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The entry stint_check().top.name is not a valid Bash command, which breaks the consistency of this list of one-liners. This should be described as a tool call (e.g., "Call the stint_check tool and extract the top stint name") or replaced with a proper shell command if one is available via the spellbook CLI.

Comment thread hooks/spellbook_hook.py Outdated
age = time.time() - op.stat().st_mtime
except OSError:
return False
return age < 90.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The 90-second liveness threshold is likely too short and will cause false-positive orphan detections during idle periods. The background watch agent can be silent for up to 540 seconds (the WATCH_RECYCLE budget) while waiting for messages. Since the transcript file is only updated when the process produces output, its mtime will not change during this window. To avoid false positives, this threshold should be increased to exceed the maximum watch duration (e.g., 600 seconds).

Suggested change
return age < 90.0
return age < 600.0

age = time.time() - op_path.stat().st_mtime
except OSError:
return 1
return 0 if age < 90.0 else 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The 90-second liveness threshold here should be synchronized with the one in the hook and increased to avoid false positives during the 540-second idle watch window.

Suggested change
return 0 if age < 90.0 else 1
return 0 if age < 600.0 else 1

… in /a2a

Two gemini-flagged issues from PR-#287 review #4:

1. CORRECTNESS BUG (medium-priority): the 90s mtime liveness threshold
   in `_bg_agent_alive` and the parallel `cmd__open_state alive` exit-code
   probe is shorter than the watch agent's 540s WATCH_RECYCLE budget.
   While blocked on `watch`, the bg Task agent emits no stdout, so its
   transcript mtime does not advance during a normal idle window — the
   A9 zero-idle-tokens validation specifically confirms this. The 90s
   threshold would therefore false-positive DEAD anywhere from 90s into
   a normal idle window onward. Bump to 600s (60s grace margin over 540s
   for the inter-cycle re-dispatch). Tests adjusted accordingly:
   - test_returns_false_when_transcript_stale: stale_age 120s → 700s
   - test_open_state_alive_stale_transcript_returns_1: 120s → 700s
   - docstrings + SKILL.md prose updated 90s → 600s

2. DOC FIX (medium-priority): `commands/a2a.md` Phase B-1 listed
   `stint_check().top.name` alongside actual Bash one-liners. That
   pseudocode isn't executable; reword the bullet so it's clearly an
   MCP tool call, not a shell command, and present the bash + non-bash
   bullets uniformly.
@elijahr

elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a near-real-time "watch-chain" architecture for the agent2agent skill, replacing the previous polling-based approach with an OS-level file-watching system to enable idle delivery with zero token cost. The changes include a new /a2a slash command, specialized watch and drain subcommands, and hook-based liveness probes to detect orphaned background agents. Feedback identifies several race conditions in the file system logic within agent2agent.py, recommending additional try/except blocks to handle FileNotFoundError during concurrent directory iterations and ValueError when accessing process pipes during termination.

Comment on lines +499 to +508
target_dirs = sorted(
(d for d in pending_root.iterdir() if d.is_dir()),
key=lambda p: p.name,
)
else:
candidates = sorted(
(d for d in pending_root.iterdir() if d.is_dir()),
key=lambda p: p.name,
)
target_dirs = [candidates[0]] if candidates else []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The iterdir() calls in both the all and default branches are not protected against the pending_root directory being deleted concurrently. This can lead to an unhandled FileNotFoundError. Consider wrapping these in try/except blocks, similar to the implementation in cmd_watch.

References
  1. Silent except Exception: pass blocks should log the exception, but specific expected errors like FileNotFoundError in race-prone IO should be handled gracefully. (link)

Comment on lines +511 to +515
for d in target_dirs:
messages = sorted(
(m for m in d.iterdir() if not m.name.startswith(".")),
key=lambda p: p.name,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Iterating over a batch directory that might have been removed (e.g., by a concurrent drain --all from another session) will raise FileNotFoundError. This loop should be wrapped in a try/except block to skip missing directories gracefully.

Comment on lines +770 to +773
try:
os.read(fswatch_proc.stdout.fileno(), 4096)
except OSError:
pass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While OSError is caught, fswatch_proc.stdout.fileno() can raise a ValueError if the file object has been closed (e.g., if the process exited and was reaped just before this call). Adding ValueError to the caught exceptions would prevent a potential crash in the watch loop.

References
  1. Silent except blocks should log, but here we are intentionally draining a pipe where errors are expected during process termination. (link)

Three medium-priority defensive-coding fixes from gemini's PR-#287
review #5:

1. cmd_drain pending_root.iterdir(): catch FileNotFoundError (the dir
   can vanish if another session calls close() concurrently). Treat
   as "no pending batches".
2. cmd_drain per-batch d.iterdir(): catch FileNotFoundError (a parallel
   drain --all from another session can remove a batch dir mid-iteration).
   Skip the missing batch and continue.
3. cmd_watch fswatch stdout.fileno() drain: also catch ValueError. A
   racing reaper can close the pipe between poll() and fileno(); the
   resulting ValueError used to crash the watch loop.

The drain refactor consolidates the three iterdir() callsites (default,
--all, --batch-id) into a single try-protected block whose result is
filtered to the requested mode.
@elijahr

elijahr commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Agent2Agent watch-chain architecture, providing near-real-time message delivery for idle sessions without polling. It introduces the /a2a slash command, adds watch, drain, and _open_state subcommands to the helper script, and integrates liveness and orphan-chain detection into the spellbook hooks. Documentation and tests have been updated to reflect the transition from listen/unlisten to open/close terminology, and the version is bumped to 0.64.0. I have no feedback to provide.

@elijahr elijahr merged commit d9e2a96 into main May 8, 2026
8 checks passed
@elijahr elijahr deleted the feat/a2a-watch-chain branch May 26, 2026 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant