Skip to content

Commit be81446

Browse files
authored
feat(agents+installer): narrowing-role subagents (Phase 5) + spellbook-cco fork wiring (Phase 7) (#284)
* docs: regenerate stale skill and command pages Output of scripts/generate_docs.py against the current source-of-truth in skills/ and commands/. No source changes; just catches up the docs mirror with drift accumulated since the last regen. * feat(installer): add install_agents component for symlink discovery New installer/components/agents.py mirrors create_skill_symlinks / create_command_symlinks with two divergences: skip+warn on user-authored target files (don't clobber), and verify_source narrowing on uninstall. Tests cover install (idempotence, broken-symlink replace, user-file preservation, dry-run, mixed states, non-md filter, source resolution) and uninstall (spellbook-only narrowing, user-file preservation, missing-dir, idempotence). 17 cases total. Wiring into ClaudeCodeInstaller and the cleanup list lands in a separate commit (Task A2). * refactor(installer): align install_agents action labels with ui.py vocabulary Code review and fact-check on commit 3d88d0f2 surfaced three findings: - "replaced" action label was not recognized by installer/ui.py (which knows installed/upgraded/created/skipped/failed/unchanged) and fell through to a generic icon. Renamed to "upgraded". - Comment on the broken-symlink uninstall branch over-broadened the check to "raw target lies inside agents dir" when the code actually checks the raw target's parent dir; corrected. - Test name "test_non_md_files_in_source_ignored" only covered one of three behaviors the body verified; renamed (or split) for clarity. * feat(installer): wire install_agents into ClaudeCodeInstaller Adds the agents step to ClaudeCodeInstaller.install() (after scripts, before CLAUDE.md) and the symmetric removal to .uninstall(). Adds a narrowed install-time pre-cleanup pass that purges stale agent symlinks (renamed/removed source files, including those pointing into a prior worktree's agents/) without clobbering currently-valid entries, so re-install remains idempotent. Integration tests verify: install creates resolvable symlinks at the target, dry-run is side-effect-free, the install step's InstallResult reports counts, uninstall preserves user-authored files, re-install is idempotent, and stale symlinks from a prior worktree are purged. * refactor(installer): tighten agents cleanup heuristic and aggregation message Code review and fact-check on commit 65253279 surfaced four findings: - The install-time cleanup pre-pass marked an entry stale when its basename was not in the current spellbook source set, which would unlink user-authored alias symlinks pointing to valid agent files under non-canonical names. Switched to a resolved-target-path set so aliases pointing to valid current sources are preserved. - The broken-symlink fallback heuristic relied on a "spellbook" substring match, which produced false positives for broken symlinks pointing at a *different* spellbook installation. Tightened to compare the broken symlink's parent dir against the current installation's agents/ dir via exact resolved-path equality. - The aggregation message dropped the failed count, leaving the user with success=False and no breakdown when a partial failure occurred. Now includes "{N} failed" when nonzero. - Reworded the uninstall-side comment block to accurately describe what cleanup_spellbook_symlinks does (it also removes any broken symlink unconditionally, not just spellbook-substring matches). * feat(agents): add canonical implementer.md and schema validation test Adds the canonical narrowing-role agent (implementer.md) as the seed for the remaining 8 new agents. tools: frontmatter narrows to Edit, Write, Read, Grep, Glob, Bash — Bash is gated by the L4 PreToolUse hook for tier-based authorization. Body uses the 5-section schema (Purpose, Tools, Output Schema, Guardrails, Constraints). Schema test parametrizes over the 9 expected new agents (validating whichever currently exist) and the 7 existing agents (byte-identical snapshot). The snapshot file at tests/test_security/agent_snapshots.json guards against accidental modification of the existing 7 during WI-5 work. code-reviewer.md and justice-resolver.md are exempt from the tools-presence check pending a separate cleanup task. * refactor(agents): describe current bash gate, harden schema test Fact-check on commit 508665c2 found that implementer.md described the WI-6 L4 tier classifier as the bash gate, but that infrastructure doesn't exist on main yet (lives on security-architecture-phase-3). Rewrote the description, body, guardrails, and constraints to describe the spellbook PreToolUse bash gate that exists today — pattern-based denial of dangerous commands. WI-6 can later expand the language when tier-based narrowing lands on main. Code review additionally suggested: - Body section ordering test now matches headings as full lines, immunizing against future agent files that include literal '## X' inside fenced code blocks. - Replaced opaque "F3" reference in TOOLS_EXEMPT_EXISTING comment with inline rationale. - Clarified the test_tools_exempt docstring to enumerate the three steps required when removing the exemption (regenerate snapshot, delete entry, delete test). * feat(agents): add 8 narrowing-role subagents Author the 8 remaining narrowing-role subagent files in agents/, each conforming to the canonical 5-section schema established by agents/implementer.md (Task B). Each file's `tools:` frontmatter is a narrowing list — the agent has access to those tools and only those tools, never more — and matches the canonical mapping in tests/test_security/test_agent_frontmatter.py. - web-researcher: WebFetch, WebSearch, Read. Quarantined web research returning structured findings. Flagged in Constraints as requiring WI-8 (devcontainer) before being safe to dispatch in production. - git-committer: Bash, Read. Local git only — read, status, diff, log, add, commit, branch, fetch, worktree. Never pushes. - git-pusher: Bash, Read. Only `git push`. Operator confirmation required for every push. - pr-creator: Bash, Read. Only `gh pr create`, `gh pr edit`, `gh pr view`, `gh pr diff`, `gh pr list`. No merge or ready-mark. - pr-merger: Bash, Read. Only `gh pr merge`, `gh pr ready`. Operator confirmation required for every merge and ready-mark. - jira-reader: Read (frontmatter); Atlassian MCP read tools at runtime. Read-only Jira inspection, no mutations. - jira-mutator: Read (frontmatter); Atlassian MCP read+write tools at runtime. Operator confirmation required for state transitions. - test-runner: Bash, Read, Grep. Test commands only (pytest, npm test, etc.); no source edits, no git side effects. The schema test parametrizes over `_existing_new_agents()` so all 9 narrowing-role agents (the canonical implementer plus these 8) now validate against the canonical contract: 55 passed, 1 skipped (regen mode). Full security suite remains green: 389 passed. * refactor(agents): align bash-gate framing with current gate scope Two groups of fixes from code review + fact-check on the 8 narrowing-role agent files added in the prior commit. Group A — gate-attribution accuracy. Five files (git-committer, git-pusher, pr-creator, pr-merger, test-runner) over-claimed that the spellbook PreToolUse bash gate blocks specific git/gh subcommand patterns (e.g. `git push`, `git reset --hard`, `gh pr merge`, `gh pr ready`, `--force` to protected branches, `gh pr merge --admin`, "any git side effects"). The actual gate blocks generic escalation/destructive shell patterns, not per-agent subcommand allow-lists. Softened the gate-claim language to match the canonical seed in agents/implementer.md ("destructive shell idioms, exfiltration shapes") and reframed the guardrails so operator confirmation and agent role separation are the primary enforcement, with the bash gate as defense-in-depth. The role-contract MUST NOT bullets stay — only the false gate attribution is removed. Group B — reviewer wording fixes: * jira-mutator description: "expected" → "REQUIRED" for state transitions, aligning with sibling git-pusher / pr-merger phrasing and the existing body Guardrail. * git-pusher Output Schema: renamed `remote` → `remote_refspec` and rewrote the description to clarify the `<remote>/<ref>` format with examples that include nested refs. Updated the `required` array. * git-pusher upstream-verify guardrail: rewrote to handle the first-push case (no upstream yet) without ambiguity. * test-runner Tools: dropped the `cat` example; `Read` handles file content and `ls`/`find` cover read-only Bash verbs. Added a sentence explaining why `Glob` is omitted (broader than the test-runner's scoping discipline; `find` invocations inherit the bash-gate's scoping constraints). * jira-reader / jira-mutator descriptions: appended the narrowing-list + MCP runtime-discovery semantics so the declarable `tools: Read` is no longer surprising. Verifications: - tests/test_security/test_agent_frontmatter.py: 55 passed, 1 skipped - tests/test_security/ full suite: 389 passed, 1 skipped - tests/installer/ full suite: 61 passed - Forbidden term scan (L4 / tier classifier / tier-based / capability escalation / WI-4): zero matches. - Forbidden claim scan (gate also blocks / gate blocks gh pr / gate blocks git push|reset|checkout|stash): zero matches. agents/implementer.md is the canonical seed and was not modified. * test(hooks): isolate Stop-hook tests from real cache and worker-LLM gate `TestStopHookHarvest::test_stop_hook_harvests_single_candidate` (and sibling tests in the same class) was order-dependent in two ways: 1. `STOP_HARVEST_CACHE_PATH` resolves at import time to `~/.local/spellbook/cache/last-stop-harvest.json`. The class's methods that exercise the success path call `_record_stop_harvest` on the real on-disk cache. Subsequent runs whose `tmp_path/ session.jsonl` collided with a cached entry got short-circuited at the idempotency check, leaving `mock_http.calls == []` and tripping the assertion. Reproduced deterministically by running with `--basetemp=<previously-cached-pytest-N-dir>`. 2. `_handle_stop` calls `feature_enabled("transcript_harvest")` from `spellbook.worker_llm.config`, which reads the user's real `spellbook.json`. If any test (or env override) flipped that flag on, the hook took the worker-LLM branch and never POSTed to `/api/memory/unconsolidated`. Add an `autouse` fixture on `TestStopHookHarvest` that: - rebinds `STOP_HARVEST_CACHE_PATH` to a per-test path under `tmp_path` (no leakage to or from `~/.local`); - forces `feature_enabled` to return False so the regex path is taken regardless of user config or leaked monkeypatches. Tests that already monkeypatch `STOP_HARVEST_CACHE_PATH` themselves (idempotent / records-sha / retries-on-failure / network-failure) still override the autouse stub via `monkeypatch.setattr` ordering; no change to their behavior. No production code changes. Full suite: 4564 passed, 132 skipped. * test(agents): close green-mirage gaps in WI-5 schema and uninstall coverage Address findings F1-F4 from the green-mirage audit on the WI-5 surface. F1: Add `test_all_expected_new_agents_exist` — set-equality assertion that every key in EXPECTED_NEW_AGENTS has a matching file in agents/. Without this, `_existing_new_agents()` would silently shrink its parametrize input if an expected agent file were deleted, and the per-agent schema tests would still pass on the smaller set. F4: Remove `test_canonical_implementer_exists`. F1's set-equality check strictly subsumes it (implementer.md is in EXPECTED_NEW_AGENTS). F2: Strip fenced code blocks before scanning for required headings in `test_new_agent_has_required_body_sections_in_order`. The prior "\n{heading}\n" substring scan would still false-match a heading literal- embedded in a fenced block; the docstring overclaimed immunity. New helper `_strip_fenced_blocks` (DOTALL ``` ... ``` removal) plus updated comment make the assertion match its stated intent. F3: Add two tests in `TestUninstallAgents` covering the previously unreached broken-symlink branch in `uninstall_agents`: - `test_uninstall_removes_broken_symlink_into_spellbook`: positive case, parent-dir resolves into $SPELLBOOK/agents/ -> link is removed. - `test_uninstall_preserves_broken_symlink_pointing_elsewhere`: negative case, parent-dir is OUTSIDE the spellbook -> link is preserved and its readlink target is unchanged. Verification: - tests/test_security/test_agent_frontmatter.py + tests/installer/test_agents_symlink.py: 82 passed, 1 skipped (snapshot regen). - Full suite: 4566 passed, 132 skipped, 531 deselected. * docs(changelog): add Phase 5 narrowing-role subagents entry Document the user-facing surface introduced by WI-5: nine narrowing-role agents in agents/ and the per-config-dir symlink installer that lands them under $CLAUDE_CONFIG_DIR/agents/. Covers the narrowing-vs-granting semantics so users understand the tools: frontmatter contract. * docs(claude): record PR review bot info * test(installer): register posix_only/windows_only pytest marks Add two new pytest marks for platform-conditional skipping: - posix_only: auto-skipped on Windows (sys.platform.startswith('win')) - windows_only: auto-skipped on POSIX Wire them into tests/conftest.py's pytest_collection_modifyitems hook alongside the existing requires_memory_tools and docker handling. Foundational scaffolding for WI-7 (security architecture phase 7); subsequent platform-dispatched installer tests rely on these marks. * test(installer): polish posix_only/windows_only mark scaffolding Address review nits: prose marker descriptions, module docstring explaining placement under tests/installer/, clearer comment on _FakeConfig pytest plugin shape, and simplified _patch_platform fixture docstring. * feat(sandbox): pin cco at SHA 9744b9f for L5 Linux sandbox Pin nikvdp/cco at commit 9744b9f (2026-04-30) inside scripts/spellbook-sandbox per Sec 9.3 audit (2026-05-06). Linux bwrap backend audited PASS: kernel-managed /proc in unshared PID namespace, --cap-drop ALL, --die-with-parent, seccomp TIOCSTI/TIOCLINUX filter. macOS L5 documented as intentionally absent; macOS sessions rely on L4 PreToolUse hooks (shipped) and L6 devcontainer (WI-8 planned) instead of cco-based isolation, since cco's sandbox-exec profile is write-protect-only ((allow default) baseline, no process-fork/exec deny, no DYLD scrubbing). Pinning mechanism: a CCO_PINNED_SHA constant near the top of the script plus a runtime gate that compares the short SHA reported by 'cco --version' (cco resolves --version from 'git rev-parse --short HEAD' of its install dir at ~/.local/share/cco). The gate aborts on mismatch unless the operator sets SPELLBOOK_SANDBOX_SKIP_CCO_PIN=1 (audited downgrade). Tests in tests/installer/test_aliases.py (posix_only): - pin SHA parsed from a structured header line == 9744b9f - Sec 9.3 audit doc filename present - macOS L5 rationale anchor phrase present in script or sidecar - executable bit preserved - --help smoke test (cco-not-found stderr is exact-matched) Part of WI-7 security architecture phase 7. * test(sandbox): tighten cco SHA pin tests + improve error guidance Address review findings from Task 2 of WI-7: - H1: integration test now branches on three exit conditions (success, cco-absent, SHA-mismatch); robust on developer machines with cco installed at a non-pinned SHA - M1: new posix_only test simulates fake cco shim with bad/empty/ mismatched --version output, asserts fail-closed exit - M2: SHA-mismatch error message now includes cco install URL - L1: documented legitimate use case for SPELLBOOK_SANDBOX_SKIP_CCO_PIN - L2: moved posix_only mark from module-level to per-test, allowing static-content checks to run on all platforms * docs(sandbox): correct WI attribution for L4 components bashlex AST is WI-6a's deliverable; secret-path denylist is WI-6d. Parenthetical now lists both, matching impl plan attribution. * feat(installer): add install_aliases_windows() stub for Q-O deferral Add a documented stub for Windows alias install that returns a noop dict matching install_aliases()'s return shape. Windows alias install and sandbox path are deferred to a later WI (Q-O); cco lacks a Windows backend and spellbook-sandbox is POSIX-only. The stub satisfies the API contract for the upcoming claude_code.py platform dispatch (Task 4 of WI-7) without committing to Windows behavior that hasn't been audited. Tests at tests/installer/test_aliases_windows.py marked windows_only; skipped on POSIX dev machines; exercised when a Windows CI is added. * fix(installer): polish Windows alias stub per review Address Phase 4.5 review findings on Task 3/5: - M1: tests now monkeypatch Path.home and snapshot it too, catching any regression that writes to the user's actual rc files - M2: skipped_reason switched to human-readable prose, matching the existing skip-path string style - L1 (pre-existing): install_aliases() docstring now correctly types rc_path as str | None (matches actual return on skip paths) - L2: stub docstring now sketches the production implementation scaffolding (PowerShell $PROFILE, marker reuse, return shape) - N1: stub log message surfaces dry_run for operator visibility * feat(installer): platform-aware Claude Code alias dispatch Add a dispatch helper in installer/platforms/claude_code.py that routes alias install based on get_platform(): - LINUX -> install_aliases() (wraps `claude` -> spellbook-sandbox) - MACOS -> documented noop; cco sandbox-exec is insufficient per Sec 9.3 - WINDOWS -> install_aliases_windows() (Q-O stub from Task 3) Unknown platforms raise NotImplementedError to surface future enum additions explicitly. The existing install.py:1148 interactive offer remains as a separate opt-in path; this is the platform-install-time path that runs unconditionally during Claude Code platform install. * fix(installer): harden Claude Code alias dispatch per review Address review findings on commit 24ba6861 (Phase 4.5 review of WI-7 Task 4/5): - F1: dispatch call in ClaudeCodeInstaller.install() is now wrapped in try/except, so a bad rc-file does not abort the security-critical hooks install step that follows - F2: LINUX branch now checks shutil.which("cco") and noops with a clear skipped_reason if cco is missing, matching the legacy install.py:1126 gating pattern - F3: new behavioral test verifies the dispatch is actually called from install() and produces an "aliases" InstallResult in the returned list - F4: new test pins the F1 contract -- _install_claude_code_aliases raising does NOT abort install(); subsequent components still run (CLAUDE.md, hooks) - F5/F6: comments clarify the success=True+action=skipped convention and document the unknown-platform test's scope (control-flow vs contract assertion) The existing LINUX dispatch test now stubs shutil.which so the cco gate is deterministic on developer machines. A new test_dispatch_linux_skips_install_aliases_when_cco_missing covers the F2 negative path with caplog assertion on the operator-facing message. * docs(readme): document Windows alias + sandbox deferral (WI-7 Q-O) Add a Windows subsection under Sandboxed Usage that records the Phase 7 deferral of Windows alias install and sandbox path (open question Q-O). Documents the current installer noop behavior, the cmd.exe doskey limitation, and points Windows users at WSL2 + the Linux install path for full sandboxing. * fix(docs): tighten Windows TBD note per review - Direct WSL2 readers to install cco inside the WSL distro before running the installer (L1) - Tighten the direct-invoke bullet now that the WSL2 bullet covers cco placement (N1) - Replace "documented noop" with "intentional noop" for clarity (N2) * test: mark pre-existing test_aliases.py POSIX-only These tests assume POSIX rc-file paths and "\n" line endings; mark them POSIX-only so Windows runners skip them via the conftest auto-skip hook. * chore: add ruff to dev deps + minimal config Adds `ruff>=0.6.0` to the dev dependency group and a minimal `[tool.ruff]` configuration block (line-length=100, target-version=py311, default lint rules, `__init__.py` F401 carve-out, and `versions/` Alembic revisions excluded). Also fixes the `bigfoot[http]` / `python-tripwire[http]` extras: those extras were dropped in python-tripwire 0.21+, leaving the test suite unable to import `tripwire.plugins.http` (the `bigfoot.http.mock_response` shim used in tests/test_worker_llm/conftest.py). Drops the `[http]` markers and adds explicit `requests>=2.31` since the http plugin needs it. Without this, `uv run pytest` fails on first worker_llm test. target-version is py311 (not py310 to match `requires-python`) because the codebase uses 3.11 syntax (`except*` in spellbook/admin/routes/ws.py). That mismatch is pre-existing and out of scope for this commit; ruff's target-version follows the actual code. * style: ruff auto-fix + manual cleanup (pre-existing violations) Resolves all 506 ruff lint violations on the branch so that `uv run ruff check .` exits 0. Approach (per plan): 1. `ruff check --fix` (safe-fix only) — 394 violations auto-fixed (mostly F401 unused imports, F541 empty f-strings). 2. Manual fixes for the 111 remaining (F841 with unsafe-fix, plus E402, E731, E702, E741, E712, F821). Notable manual carve-outs: - installer/compat.py: restored re-exported names that ruff removed as F401 (`UnsupportedPlatformError`, `ServiceManager`, `get_data_dir`). These are part of the documented re-export contract; added explicit `# noqa: F401` markers. - tests/test_spellbook_mcp/test_server_no_skill_tools.py: restored `from spellbook import skill_ops` inside `with pytest.raises(ImportError):` — the import IS the test. - spellbook/sessions/watcher.py: restored `import asyncio` (F821 — the auto-fix removed it but lines 275/278 use `asyncio.get_running_loop()` / `asyncio.run()`). - E402 sites with `logger = logging.getLogger(__name__)` between imports carry a `# noqa: E402` with rationale. Test status: - `uv run ruff check .` → All checks passed! - `uv run pytest -x` → 4508 passed, 135 skipped, 531 deselected. * fix(installer): broaden F1 test assertion + fix audit doc path reference - F1 contract test now also asserts `mcp` component runs after alias dispatch raises (per Phase 4.6.4 fact-check finding on test_install_does_not_abort_when_dispatch_raises) - Drop non-existent `docs/security/audits/sec_9_3_result.md` path from _install_claude_code_aliases docstring; the script header carries the canonical path * test(installer): update test_aliases.py for spellbook-cco fork (Task 7) Flips the audit-pinned SHA constant to the elijahr/cco fork tip (d7044ef) and adds EXPECTED_VANILLA_CCO_SHA (9744b9f) as the fallback constant for SPELLBOOK_USE_VANILLA_CCO=1 rollback coverage. Updates MACOS_RATIONALE_PHRASE to match the revised header phrasing landed by Task 6 in scripts/spellbook-sandbox. Renames the fail-closed-gate shim from cco to spellbook-cco so the three SHA-mismatch parametrize cases drive the gating binary the script now actually checks. Updates --help-runs-cleanly to look for the new error shape. Adds a new truth-table-row test (test_sandbox_both_skip_vars_set_to_one_warn_and_skip) covering the SKIP_PIN=1 + SKIP_CCO_PIN=1 case, which the existing class did not exercise: new var wins skip precedence (gate skipped), legacy var still triggers DEPRECATION because operator is using the deprecated name. Consolidates the Task-6-local EXPECTED_FORK_SHA / EXPECTED_VANILLA_SHA constants to alias the module-level constants so the SHA values live in exactly one place. * feat(installer): add spellbook_cco component for fork-pinned cco wrapper Introduce installer/components/spellbook_cco.py: clones the elijahr/cco fork at audit-pinned SHA d7044ef into ~/.local/spellbook/cco, writes ~/.local/bin/spellbook-cco with a managed-tag header, and verifies the pin via two-step check (git rev-parse + cco --version awk-parse). The wrapper exposes the hardened fork's macOS SBPL profile (DYLD scrub + file-read denies + scoped process-exec deny + mach-priv-task-port deny) to the rest of spellbook. emit_rollback_warning() is exported as the single source of truth for the SPELLBOOK_USE_VANILLA_CCO=1 rollback warning. SPELLBOOK_INSTALLER_SKIP_FORK_PIN=1 bypasses pin verification with a stderr WARNING. * test(installer): add tests for spellbook_cco component Adds 22 tests covering Tier-0 constants, Tier-1 mocked-shape paths, and Tier-2 paths exercised against a file:// bare-repo fixture whose stub emits the fixture's actual head_sha. Tests cover dry-run shape, real clone-and-verify, rollback on each pin-verify step in isolation, idempotency, namespaced wrapper tag, untagged-wrapper warning, dirty/wrong-remote aborts, PATH warnings, SKIP_FORK_PIN bypass, vanilla-rollback env var, Windows shape-noop, uninstall happy path, untagged-preserve, idempotency, dry-run, and a darwin smoke. * feat(installer): wire spellbook_cco install/uninstall into Claude Code platform Adds a once-globally install_spellbook_cco hook to ClaudeCodeInstaller.install() that runs before the per-config-dir loop, and a mirroring uninstall_spellbook_cco call in uninstall(). Merges the Linux and macOS branches in _install_claude_code_aliases into a shared POSIX path that gates on shutil.which(sandbox_binary), where sandbox_binary defaults to 'spellbook-cco' (or 'cco' under SPELLBOOK_USE_VANILLA_CCO=1). Updates the L5 macOS rationale comment block to cite the elijahr/cco fork's hardened SBPL profile, and rewrites all 'cco not on PATH' diagnostics to point at re-running install.py. The rollback WARNING is emitted via the imported emit_rollback_warning() so installer-side warnings stay byte-equal across call sites. * refactor(installer): tui detects spellbook-cco binary on PATH Replaces shutil.which('cco') in the TUI install offer with shutil.which(sandbox_binary), where sandbox_binary defaults to 'spellbook-cco' (or 'cco' under SPELLBOOK_USE_VANILLA_CCO=1). Updates the rationale string to reference the audit-pinned elijahr/cco fork and removes the inline nikvdp/cco install URL in favor of pointing operators at install.py. Emits the canonical rollback WARNING via emit_rollback_warning() under the env-var override. * refactor(installer): install.py offers spellbook-cco aliases Replaces shutil.which('cco') in the post-install alias offer with shutil.which(sandbox_binary), and extracts the offer block into _offer_sandbox_aliases() so the TUI and CLI flows share a single implementation. Under SPELLBOOK_USE_VANILLA_CCO=1 the offer falls back to gating on 'cco' and emits the canonical rollback WARNING via emit_rollback_warning(). Adds tests/installer/test_cco_entry_points.py with 7 tests covering both tui.py and install.py wiring: default-path detection, env-override routing, WARNING emission on env override, and WARNING-absence regression guards on the default path. * refactor(scripts): spellbook-sandbox gates on spellbook-cco fork pin Rewrites scripts/spellbook-sandbox to gate on the elijahr/cco fork's spellbook-cco wrapper and verify the d7044ef pin via 'spellbook-cco --version'. SPELLBOOK_USE_VANILLA_CCO=1 falls back to vanilla 'cco' at the prior 9744b9f pin and emits a script-prefixed WARNING that the SBPL profile is bypassed. Adds the SPELLBOOK_SANDBOX_SKIP_PIN env var; SPELLBOOK_SANDBOX_SKIP_CCO_PIN remains accepted for one release with a DEPRECATION warning per a documented truth table. The truth-table predicates use POSIX ${VAR+x} to distinguish unset-vs-empty so set-but-empty does not fall through to the legacy var. * docs(readme): describe spellbook-cco onboarding instead of vanilla cco install.py now writes ~/.local/bin/spellbook-cco automatically (audit-pinned elijahr/cco fork at d7044ef); the wrapper is the only spellbook-supported entry point. Vanilla nikvdp/cco is retained solely for the post-deploy rollback path via SPELLBOOK_USE_VANILLA_CCO=1. References docs/security.md for the full threat model and hardening details. * test(installer): assert spellbook-cco which() interaction in alias-dispatch tests WI-7's `_install_claude_code_aliases` calls `shutil.which("spellbook-cco")` to gate the per-dir alias install on the wrapper's presence. Tripwire's SubprocessPlugin always intercepts `shutil.which`; without an explicit `assert_which` after the sandbox closes, the strict verifier raises `UnassertedInteractionsError` at teardown when the binary is absent from PATH. This is the case in CI (ubuntu-latest and the non-dev macOS runner shape) but NOT on the macOS dev environment, where `spellbook-cco` IS on PATH during local runs -- masking the failure during pre-merge verification. The dispatcher only runs on POSIX (LINUX/MACOS); Windows takes the `install_aliases_windows` branch which does not call `shutil.which` for the cco wrapper. Hence the new helper / inline assertion is gated on `sys.platform != "win32"`, mirroring the existing `_assert_powershell_which_if_windows` pattern. Tests fixed (all in alias-dispatch sandboxes): - tests/installer/test_claude_code_wires_default_mode_and_permissions.py: - test_install_emits_default_mode_result - test_install_emits_permissions_result - test_install_writes_acceptedits_default_mode - test_uninstall_emits_default_mode_and_permissions_results - test_uninstall_clears_managed_default_mode_from_settings - tests/installer/test_l2_derivation.py: - test_claude_code_installer_uses_derived_deny The assertion is exact-equality on both `name` and `returns`: any change to the dispatcher's gating call (different binary name, dropped call, duplicate call, real-path return) would fail the assertion. * fix(tests): pin LF line endings for agent snapshot test The WI-5 byte-identity snapshot test (tests/test_security/test_agent_frontmatter.py:: test_existing_agent_byte_identical_to_snapshot) computes SHA-256 over the raw bytes of each existing agent file and compares against a committed snapshot generated on macOS (LF line endings). On Windows CI, git's default checkout converts LF to CRLF, so Path(...).read_bytes() returns CRLF-encoded bytes whose SHA does not match the committed LF-based snapshot. All 7 existing-agent snapshot parameter cases fail on Windows for this reason. Verified locally: the same file yields different SHAs under LF vs CRLF. Pin LF on the agent markdown files and on the snapshot file itself via .gitattributes. This makes byte-identity platform-agnostic by construction without changing the snapshot's semantics: "byte-identical to the committed snapshot" continues to mean exactly that, regardless of host OS or git autocrlf configuration. Cannot directly verify Windows behavior from macOS; the fix is mechanical (git's documented gitattributes contract) and the local test suite continues to pass (55 passed, 1 skipped). * docs(security): rename cco section to spellbook-cco and document rollback Rename '## Sandboxing with cco (macOS)' to '## Sandboxing with spellbook-cco (Linux + macOS)' so the auto-anchor matches the README link target ('#sandboxing-with-spellbook-cco-linux--macos'). Replace 'cco' with 'spellbook-cco' across the section body and CLI examples where the reference is to spellbook's own invocation; preserve 'cco' in the upstream nikvdp/cco attribution and in vanilla-fallback prose. Replace the now-stale 'curl | bash' install snippet with a one-liner noting that the spellbook installer handles spellbook-cco. Add a new '### Rolling back to vanilla cco' subsection covering the SPELLBOOK_USE_VANILLA_CCO=1 env override: when to use it (catastrophic post-deploy breakage), the SBPL/DYLD hardening that the override bypasses, and how to remove the override afterward. The runtime WARNING wording remains owned by installer/components/spellbook_cco.py:_WARNING_USE_VANILLA_CCO; this doc paraphrases the *why* rather than duplicating the warning bytes. * docs(changelog): add spellbook-cco fork integration entry Document the user-facing surface introduced by Task 13 of the spellbook-cco integration: new installer/components/spellbook_cco.py component, hardened SBPL profile (DYLD scrub + dylib-path denies + scoped process-exec + mach-priv-task-port deny), pin-bypass and rename of SPELLBOOK_SANDBOX_SKIP_PIN env var, macOS feature parity via install_spellbook_cco chained before install_aliases, and SPELLBOOK_USE_VANILLA_CCO=1 rollback path. Placed under the [0.57.0] banner adjacent to the prior Phase 5 narrowing-role subagents entry to keep all security architecture Phase 5 work grouped on this branch. * test(installer): mark spellbook_cco install tests posix_only The install_spellbook_cco() function short-circuits to {"action": "skipped", "skipped_reason": "spellbook-cco unavailable on Windows"} when os.name == "nt". Eleven tests in tests/installer/test_spellbook_cco.py drive the install path expecting the POSIX result shape (action="installed", a clone on disk, a wrapper file with shebang, etc.) and so fail on Windows CI under the short-circuit. Mark those eleven tests with @pytest.mark.posix_only so the existing conftest.py skip rule excludes them on Windows. The Windows shape is already covered indirectly by the constant tests at the top of the file (which import the module and read its public attributes), and the early-return is a one-line constant that does not need its own test. Tests deliberately left unmarked: the Tier-0 constant assertions (test_pinned_sha_constant et al.), the dry-run short-circuits (test_install_dry_run_does_not_clone, test_uninstall_dry_run_does_not_remove) which return before the Windows check, and all uninstall tests since uninstall_spellbook_cco has no Windows short-circuit. * test(installer): tighten green-mirage assertions on canonical strings Phase 4.6.3 green-mirage audit surfaced 6 IMPORTANT findings on the spellbook-cco fork integration test suite — all substring-on-known- canonical patterns where production code is correct but the assertions were too lax to defend against silent prose drift. - Import _WARNING_USE_VANILLA_CCO / _WARNING_PATH_NOT_SET / _WARNING_SKIP_FORK_PIN as the chokepoint and assert full stderr equality (or k * canonical for double-emit codepaths) instead of fragmented "WARNING:" + "VAR=1" substring pairs. - Replace partial InstallResult field-by-field assertions in chain- wiring tests with full-record equality on the dataclass. - Add a shared events list to test_install_chains_install_spellbook_cco to verify that install_spellbook_cco runs strictly before the per-dir alias dispatcher, matching the docstring contract. - Tighten lax message substrings ("cco" / "spellbook-cco") to canonical skipped_reason strings produced by _install_claude_code_aliases. No production code changes. 63 tests pass, ruff clean. * fix: remove dead expressions left by ruff F841 cleanup Three call-statements whose return values were dropped during the branch-wide `ruff --fix` pass (commit 983fe62b) but whose expressions remained as orphan statements with no side effect: - spellbook/admin/routes/fractal.py:73 — `max(1, math.ceil(total / per_page))` was originally `total_pages = ...`. The variable is unused (pagination is computed inside build_list_response), so the line is pure dead. Also drop the now-unused `import math`. - spellbook/cli/commands/session.py:119 — `getattr(args, "json", False)` was originally `json_mode = ...`. The variable is unused. - spellbook/mcp/routes.py:208 — `bool(body.get("is_primary", False))` was originally a temp assignment. The value is unused. Flagged by gemini-code-assist review on PR 284. Not the asyncio.to_thread call in routes/config.py:688 — that one is the actual write side effect, not dead. * test(installer): convert monkeypatch.setattr to tripwire in alias + cco entry-point tests Address gemini-code-assist HIGH findings on PR #284: replace monkeypatch.setattr / monkeypatch.setitem call sites in three tests/installer files with the canonical tripwire patterns: - tripwire.mock module-attribute mocks (get_platform, install_aliases, install_aliases_windows, install_spellbook_cco, uninstall_spellbook_cco, _install_claude_code_aliases, install_aliases module attrs). - tripwire.subprocess.mock_which / assert_which for shutil.which gates. - tripwire.mock for pathlib Path.home patching. Tests covered: - tests/installer/test_aliases.py (50 monkeypatch sites, 32 tests). - tests/installer/test_aliases_windows.py (2 sites, Path.home mocking). - tests/installer/test_cco_entry_points.py (9 sites). All tests pass locally. monkeypatch.setenv and monkeypatch.delenv are preserved per the AGENTS.md allowed-monkeypatch list (env vars only). Per AGENTS.md Testing with Tripwire canonical section: tripwire is the only acceptable mocking framework; monkeypatch.setattr on module attributes, class methods, or functions is forbidden. * fix(tests): missing WebSocketDisconnect import + align marker-description assertions CI failures from prior commit f5785298: - tests/admin/test_ws.py used WebSocketDisconnect without importing it - tests/installer/test_marks.py asserted old marker descriptions; pyproject.toml carries clearer text via a prior commit on main * test(diagram): convert test_diagram_update.py from unittest.mock to tripwire Addresses gemini HIGH finding at tests/unit/test_diagram_update.py:270 plus the rest of the file's unittest.mock.patch and mock.AsyncMock sites that gemini's single anchor finding stood in for. SUT refactor: - scripts/generate_diagrams.py: extract _get_repo_root() callable wrapping the bare REPO_ROOT module constant. Tripwire cannot intercept module- level non-callable attribute reads, so the indirection layer is what makes get_source_diff and the PATCH metadata path testable through the strict verifier. - main_async() now accepts an optional argv parameter so tests can drive argparse without monkeypatching sys.argv (also forbidden under the tripwire-only mocking policy). Test rewrite: - subprocess.run mocked via tripwire.subprocess.mock_run / assert_run (requires @pytest.mark.allow("subprocess")). - Module-level async/sync functions mocked via tripwire.mock("module:attr") with .returns()/.calls(coro). - builtins.input mocked via tripwire.mock("builtins:input").calls(...). - Asserting a function is NOT called is done by leaving it unmocked; tripwire's strict verifier raises UnmockedInteractionError if invoked. This is stricter than the old .assert_not_called() pattern. - Multi-mock assertion blocks wrapped in tripwire.in_any_order() since the discover/classify/generate fan-out has non-deterministic ordering. * test(memory): convert test_memory_auto_store.py from monkeypatch to tripwire Addresses gemini HIGH findings at tests/test_hooks/test_memory_auto_store.py:338 (STOP_HARVEST_CACHE_PATH constant) and :347 (function monkeypatches), plus all other monkeypatch.setattr sites in the file. SUT refactor (hooks/spellbook_hook.py): - Extract _get_stop_harvest_cache_path() callable wrapping the bare STOP_HARVEST_CACHE_PATH module constant. Tripwire cannot intercept module-level non-callable attribute reads, so the indirection layer makes the Stop-hook idempotency cache testable through the strict verifier. - Convert datetime.now(timezone.utc) in _emit_hook_error to use the existing _utcnow() helper. Test for oversized prompt can now freeze time via tripwire.mock('spellbook_hook:_utcnow') instead of replacing the module-level datetime class. Test rewrite: - _CountingMock + _make_counting helper: wraps tripwire.mock(...) with a fire counter and pre-stacks a FIFO budget of repeatable .calls(fn) configs (50 entries with required(False)). Required because the SUT invokes the same mock multiple times per test, and tripwire's FIFO semantics consume one entry per call. - mock_http controller exposes .track(counting_mock) and .drain(). drain() satisfies tripwire's strict 'every interaction must be asserted' contract by issuing one wildcard assert_call per recorded call, wrapped in in_any_order(). Content assertions on controller.calls remain in each test — strength is unchanged. - Autouse stop_hook_cache_path fixture (renamed from _isolate_stop_hook_state) registers both the cache-path and feature_enabled mocks via _make_counting, auto-tracks them on mock_http if present, and returns the per-test cache Path so tests that need to inspect the cache file have a stable handle. - Tests that needed monkeypatch.setattr(Path, 'home', ...) now use the stop_hook_cache_path fixture directly. Tests that previously monkeypatched _http_post to a flaky callable now use mock_http.set_handler() — a per-call hook on the controller that composes cleanly with the existing tripwire mock without re- registering the import-site mock mid-test. * test(cco): convert test_spellbook_cco.py from monkeypatch to tripwire The 32 monkeypatch.setattr sites in tests/installer/test_spellbook_cco.py replaced module-level Path/str constants and the _verify_pin helper, which tripwire cannot intercept directly. Two-part fix: 1. SUT: install_spellbook_cco / uninstall_spellbook_cco / _write_wrapper / _clone_or_fetch / _verify_pin gain keyword-only override args (repo_url, pinned_sha, wrapper_dir, wrapper_path, verify_pin_fn). Production callers leave them None and the SUT falls back to the existing module-level getters (_get_repo_url / _get_pinned_sha / _get_wrapper_dir / _get_wrapper_path); tests pass explicit values. 2. Tests: drop all monkeypatch.setattr on module constants. Tier 1 tests (subprocess plumbing mocked) use tripwire.subprocess.mock_run inside a sandbox; Tier 2 tests (real file:// fixture repo) run outside the sandbox with real subprocess and pass redirected paths directly to the SUT via the new keyword args. 22 tests pass. Production behaviour is byte-for-byte unchanged when callers do not pass the new keyword args. * refactor(agents): extract stale-symlink cleanup; handle relative targets Two MEDIUM gemini findings addressed in one commit: 1. installer/components/agents.py:235 -- the uninstall_agents broken- symlink path previously only honoured ABSOLUTE raw targets via raw_target.is_absolute(); relative symlink targets were silently skipped. Fix: resolve the raw target against entry.parent for the relative case so cleanup is robust to both ln-s styles. 2. installer/platforms/claude_code.py:412 -- the ~50-line install-time stale-symlink cleanup duplicated logic from uninstall_agents. Extracted to a new reusable helper cleanup_stale_agent_symlinks in installer.components.agents. The helper is the single source of truth for the staleness heuristic (resolved-target set + broken-link parent-dir match with strict=True/False fallback). claude_code's install path now calls the helper instead of inlining the loop. The helper deliberately differs from uninstall_agents in semantics: uninstall removes ALL spellbook-resolving symlinks; cleanup_stale only removes symlinks that point at spellbook agents that no longer exist (renamed / moved source files, worktree-switch stale). * fix(tests): correct tripwire required(False) ordering in Windows aliases tests `tripwire.mock(...).returns(value).required(False)` does NOT mark the already-registered entry as optional; `.required(False)` is a STICKY flag that applies to SUBSEQUENT .returns() / .calls() / .raises() calls on the same proxy. The two Windows-aliases tests had the flag AFTER the value-stacking call, so the lone FIFO entry inherited the default required=True. When ``install_aliases_windows`` did not invoke Path.home (the SUT's dry-run path short-circuits before the call), tripwire raised UnusedMocksError at teardown -- only visible on the Windows CI runner where these tests are not skipped. Fix: ``home_mock.required(False).returns(fake_home)``. The sticky flag is set before the FIFO push so the entry inherits required=False and no longer trips UnusedMocksError when the SUT short-circuits. * fix: address gemini MEDIUM findings (logging, type import, nested sandbox) Three findings from gemini's 2026-05-18 review: 1. hooks/spellbook_hook.py:1549 -- silent `except OSError: pass` in `_record_stop_harvest` swallowed disk-full / permission errors. Logger.warning now surfaces them; the operation remains best-effort (lossy cache is recoverable on the next harvest because dedup catches the re-emitted candidates). 2. installer/components/spellbook_cco.py -- `Callable[[Path, str], tuple[bool, str]] | None` annotation referenced an unimported name. Although `from __future__ import annotations` defers evaluation, static analysis still flags the undefined name. Added `from collections.abc import Callable` and dropped the string-literal quoting on the annotation. 3. tests/test_hooks/test_memory_auto_store.py:884 -- redundant nested `with tripwire:` inside an outer `with tripwire:` block. The outer sandbox is sufficient; the inner one was a leftover from the monkeypatch -> tripwire conversion. * fix(tests): access required() via __call__ MethodProxy on Windows tests The previous fix mis-targeted `.required(False)`: invoking it as `home_mock.required(False)` routes through `_BaseMock.__getattr__`, which constructs a fresh MethodProxy for a (non-existent) "required" method. Calling that proxy runs the standard MethodProxy.__call__ path, which asserts sandbox active and raises SandboxNotActiveError when the test sets up mocks BEFORE entering `with tripwire:`. The correct API (per other test_workflow_state_tools.py and test_spellbook_mcp/test_spawn_session.py usages) is to access the __call__ MethodProxy explicitly: home_mock.__call__.required(False).returns(fake_home) This goes directly to MethodProxy.required (line 89 of _mock_plugin.py), which sets the sticky flag without invoking __call__ -- and so doesn't need an active sandbox. * fix: gemini cycle-3 findings (stub class, silent errors, mid-file import) Four findings from gemini's 2026-05-18T03:41 review: 1. tests/unit/test_diagram_update.py:79 (HIGH) -- the _StubAgentClient class and its _stub_client_returns / _stub_client_raises factories were hand-rolled stand-ins for the SDK AgentClient, forbidden per AGENTS.md. Replaced with `types.SimpleNamespace(run=async_callable)` helpers (_agent_client_returning / _raising / _capturing). The SimpleNamespace is a stdlib data carrier, not a class that exists only to stand in for a real dependency. The two inline _CapturingClient classes in TestPatchDiagram and the classify- prompt test are removed in favour of the same shared helper. 2. installer/components/spellbook_cco.py:530 (MEDIUM) -- the `shutil.rmtree(..., ignore_errors=True)` rollback silently swallowed permission/IO failures. Switched to `shutil.rmtree(..., onerror= _onerror)` with a logger.warning callback so disk-full or chmod issues are observable. The rollback remains best-effort: the SUT still returns the canonical pin-verification skipped-reason regardless of removal success. 3. spellbook/sessions/skill_analyzer.py:418 (MEDIUM) -- the broad `except Exception: continue` block in skill-usage aggregation silently dropped malformed sessions. Added module-level logger and `logger.warning("Skipping malformed session at %s: %s", path, exc)` so corrupted transcripts surface in operator logs rather than vanishing from the aggregate. 4. tests/unit/test_stint_tools.py:124 (MEDIUM) -- the mid-file import of spellbook.coordination.stint at line 116 was deferred behind a schema-test block from a much earlier code structure. With the autouse `isolated_db` fixture handling init_db, the deferral is no longer necessary. Moved the import to the file header alongside spellbook.core.db. * fix: gemini cycle-4 findings (hardcoded audit path, memory test mocks) Two real findings from gemini's cycle-4 review: 1. installer/components/spellbook_cco.py:185 -- hardcoded `Users-eek-Development-spellbook` in the wrapper template's Audit comment baked the developer's project path into every installation. Replaced with runtime-computed project-encoded path derived from the installer module's repo root (`Path(__file__).resolve().parent.parent.parent`). New helpers (with tripwire-callable indirection): - _get_spellbook_repo_root() - _encode_project_path() _wrapper_template, _write_wrapper, install_spellbook_cco gain a keyword-only `spellbook_repo_root` override so tests pin the expected encoding deterministically. Production callers leave it None and fall back to _get_spellbook_repo_root(). The duplicated encoding helper (instead of importing from spellbook.memory.claude_memory._encode_project) is intentional: the installer component runs in the bootstrap path before the spellbook venv is guaranteed importable. 2. tests/test_cli/test_memory_cmd.py:69, :91 -- two unflagged `monkeypatch.setattr` sites converted to tripwire.mock per AGENTS.md "Testing with Tripwire" policy. Three other cycle-4 findings rejected as hallucinations with inline rationale on the gemini threads: - _utcnow "missing" (defined at hooks/spellbook_hook.py:73) - Platform/get_platform "wrong module" (canonical site is spellbook.core.compat) - tripwire.in_any_order "doesn't exist" (it does; tripwire.unordered does not) Test coverage: 230 passed, 4 skipped on installer/ + memory_cmd. * fix: gemini cycle-5 findings (wrapper quoting, silent errors, tripwire assert) Four real findings from gemini's cycle-5 review (CHANGELOG nag deferred per release-manager policy): 1. installer/components/spellbook_cco.py:224 -- the `exec` line in the wrapper template left `install_root` unquoted, so an install_root path containing spaces (or other shell-metasignificant chars) would fail at runtime. Quoted the `${install_root}/cco` argument and updated EXPECTED_WRAPPER_TEMPLATE in tests/installer/test_spellbook_cco.py to match. 2. installer/components/agents.py:240 -- broken-symlink readlink() OSError silently swallowed; replaced with logger.warning() noting the failing path + exception, per style guide. 3. installer/components/agents.py:354 -- stale-symlink unlink() OSError silently swallowed; same pattern, replaced with logger.warning(). Added module-level `logger = logging.getLogger(__name__)`. 4. tests/test_cli/test_memory_cmd.py -- tripwire.mock for get_db_path was registered but never asserted and the SUT call site was not wrapped in `with tripwire:`. Both tests now: - wrap args.func(args) in `with tripwire:` - drain the timeline post-sandbox via `tripwire.in_any_order()` - search-path additionally drains the qmd/serena shutil.which() probes that do_memory_recall triggers (export-path takes the no-DB short-circuit before any subprocess calls). * fix: gemini cycle-6 findings (logging, dup nav, phase label, rmtree) Five real findings from gemini's cycle-6 review (CHANGELOG version-bump nag deferred per release-manager policy): 1. CHANGELOG.md -- the spellbook_cco entry was labeled "security architecture Phase 5" but matches the "Phase 7" wiring from the PR title. Relabeled to Phase 7. 2. mkdocs.yml:222 -- duplicate `commands/canvas.md` entry in the nav (already declared on line 220). Removed. 3. installer/platforms/claude_code.py:421 -- the wide `except Exception` that catches install_spellbook_cco failures recorded the message in the InstallResult but did not log. Added logger.exception() so the traceback is observable in debug logs alongside the stringified result. 4. installer/platforms/claude_code.py:461 -- same pattern for the alias-install dispatch wrapper. Added logger.exception(). 5. installer/components/spellbook_cco.py:684 -- uninstall path used shutil.rmtree(ignore_errors=True), silently swallowing all permission/IO failures. Replaced with the same onerror logger callback used elsewhere in this module (the install-path pin- verification rollback). Per-path failures now surface as logger.warning(); the uninstall path remains best-effort. Test updates: - tests/installer/test_aliases.py::test_install_does_not_abort_when_dispatch_raises drains the new claude_code.py logger.exception ERROR entry via tripwire.log.assert_log. - tests/installer/test_claude_code_wires_default_mode_and_permissions.py -- _assert_install_mocks and _assert_install_and_uninstall_mocks call a new _drain_cco_install_exception_log helper that drains the optional ERROR log emitted when the unmocked install_spellbook_cco subprocess call fails. required=False with a fallback for tripwire builds without the kwarg. Test coverage: 222 passed, 4 skipped on installer/ + 8 on memory_cmd. * docs(changelog): restore lost space in `SB-BASH-* rules` Gemini cycle-7 caught a stray missing space between `SB-BASH-*` and `rules` in the "Bash policy unified across Claude and Gemini paths" entry. Cosmetic regression; restored. * test(memory): replace _CountingMock helper with strict tripwire registrations Removes the _CountingMock / _make_counting hand-rolled wrapper that pre-stacked 50 .required(False).calls(_wrapped) entries per fixture. That pattern was an optional-mock cargo-cult: it satisfied tripwire's "every intercepted call must be asserted" contract but destroyed the strict-verification signal by suppressing both UnusedMocksError (extra registrations) and UnmockedInteractionError (over-call beyond the budget). The replacement uses a small _MockBuilder that: - Wraps tripwire.mock(path) bound to a default side-effect callable. - Exposes .expect(n) which registers exactly n strict .calls(fn) entries. - Tracks itself in a per-test registry; an autouse finalizer wildcard-asserts exactly n recorded interactions after the sandbox exits to satisfy tripwire's intercepted-call contract. Every test in this file now declares the SUT's exact mock call count via .expect(n). Tripwire's FIFO enforces the count strictly: - SUT calls greater than registered: UnmockedInteractionError on the extra call. - SUT calls less than registered: UnusedMocksError at teardown. The real test signal (POST URL/payload shape) is still asserted via the existing mock_http.calls equality checks; the strict count adds quantitative discipline back on top. mock_http.drain() is gone; the controller exposes the same .calls / .responses / .set_response / .set_handler surface plus .expect(n) and a duck-typed .verify_all() that the autouse finalizer invokes. Verified by intentionally over-counting (UnusedMocksError fires) and under- counting (UnmockedInteractionError fires); both raise as expected. Per-test call accounting is documented inline at the top of each TestClass. * fix: gemini cycle-8 findings (changelog typo, Path/utf-8/PATH/silent except) - CHANGELOG: restore missing space in "SB-BASH-* rules" on line 356. The prior cycle-7 commit afa9dd87 that claimed this fix landed as an empty commit, so the typo persisted. This commit applies it for real. - installer/components/spellbook_cco.py: drop redundant str() wrapping around Path objects in subprocess.run argv lists. subprocess accepts path-like objects since 3.6. Updates the tripwire mock_run command= registrations in tests/installer/test_spellbook_cco.py to match. - installer/components/spellbook_cco.py: add encoding="utf-8" to all Path.write_text / Path.read_text calls (the managed-clone marker and the spellbook-cco wrapper). - installer/components/spellbook_cco.py: resolve each PATH entry (expanduser + resolve) before comparing against the wrapper dir so the on-PATH check handles trailing slashes, "~", and relative entries the same way as the resolved wrapper dir. - installer/components/agents.py: replace two silent "except: pass" blocks in _prune_stale_agent_symlinks with logger.debug(..., exc_info= True) calls. Debug-level so normal worktree teardowns don't spam WARN, but the resolve failures are still auditable. * docs(changelog): backtick `SB-BASH-*` so the missing-space fix sticks The cycle-7 commit afa9dd87 lost the space ("SB-BASH-*rules" -> intended "SB-BASH-* rules") and the cycle-8 attempt landed as an empty commit because the legacy pre-commit markdownlint --fix hook quietly reformats the bare `SB-BASH-* ` token (the `*` becomes ambiguous emphasis to markdownlint MD037 and the space is stripped). Wrap the identifier in backticks so it is a code span, which: - makes the missing-space fix durable against markdownlint --fix - typographically marks `SB-BASH-*` as a rule-id family, matching how BASH-* / EXF-* are referenced two lines below. * docs(changelog): backtick `BASH-*` / `EXF-*` so markdownlint stops eating the space Side-effect cleanup from the previous commit: markdownlint --fix collapsed `BASH-* / EXF-*` to `BASH-*/ EXF-*` (the same MD037 emphasis detection that ate the SB-BASH-* space). Wrap each identifier in backticks so the slash separator and surrounding whitespace are stable. --------- Co-authored-by: elijahr <153711+elijahr@users.noreply.github.com>
1 parent 94d9da8 commit be81446

242 files changed

Lines changed: 9810 additions & 1058 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitattributes

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
# Pin line endings on files where byte-identity is assumed by tests.
2+
# See tests/test_security/test_agent_frontmatter.py for the snapshot test.
3+
agents/*.md text eol=lf
4+
tests/test_security/agent_snapshots.json text eol=lf

CHANGELOG.md

Lines changed: 70 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -361,10 +361,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
361361
- **Bash policy unified across Claude and Gemini paths.** Renamed
362362
`hooks/gemini-policy.toml` to `hooks/bash-policy.toml`. Added a TOML
363363
loader to `spellbook/gates/rules.py` so the Claude path picks up the
364-
supplemental SB-BASH-* rules previously only consumed by the Gemini
364+
supplemental `SB-BASH-*` rules previously only consumed by the Gemini
365365
installer. Old filename is preserved as a migration alias for one
366366
release. SB-BASH-001..009 ship as additional defense-in-depth findings
367-
on top of the existing BASH-* / EXF-* regex set.
367+
on top of the existing `BASH-*` / `EXF-*` regex set.
368368
- **PR dance command** (`commands/pr-dance.md`) now explicitly refuses
369369
to merge under any session-level autonomy directive ("yolo", "do the
370370
PR dance autonomously", "just land it", "go go go"). Autonomy scopes
@@ -576,6 +576,74 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
576576
(one reviewer guardrail mentioning bigfoot is retained so PRs
577577
reintroducing the old name are flagged).
578578

579+
### Added
580+
581+
- **Narrowing-role subagents (security architecture Phase 5).** Nine
582+
new agents in `agents/` define narrowed tool surfaces for common
583+
development verbs: `implementer` for worktree edits, `web-researcher`
584+
(requires Phase 8 devcontainer to be safe in production),
585+
`git-committer` and `git-pusher` for split local/remote git, separate
586+
`pr-creator`/`pr-merger` for `gh pr` operations, `jira-reader`/
587+
`jira-mutator` for Atlassian MCP read vs. write, and `test-runner`
588+
for scoped test invocations. Each agent's `tools:` frontmatter is a
589+
*narrowing* list — `(parent_tools ∩ frontmatter_tools)` — and never
590+
grants capabilities the parent does not already hold. Agents are
591+
discovered via `$CLAUDE_CONFIG_DIR/agents/`; spellbook's installer
592+
now creates per-config-dir symlinks back to `$SPELLBOOK_DIR/agents/`,
593+
with idempotent install/uninstall that preserves user-authored agent
594+
files. Schema-validation tests guard the canonical 5-section body
595+
contract (`Purpose` / `Tools` / `Output Schema` / `Guardrails` /
596+
`Constraints`) and SHA-256-snapshot the existing 7 agents to catch
597+
unintended modification.
598+
599+
- **`installer/components/spellbook_cco.py` component (security
600+
architecture Phase 7).** New installer component clones the
601+
`elijahr/cco` fork at audit-pinned SHA `d7044ef` to
602+
`~/.local/spellbook/cco/` and writes the
603+
`~/.local/bin/spellbook-cco` wrapper. Pin verification is enforced
604+
by default; `SPELLBOOK_INSTALLER_SKIP_FORK_PIN=1` bypasses it with
605+
a stderr `WARNING` line so accidental drift is loud.
606+
- **macOS L5 sandbox layer ships via the fork's hardened SBPL
607+
profile.** The `spellbook-cco` wrapper applies the fork's profile,
608+
which adds DYLD environment scrub, file-read denies for
609+
user-writable dylib paths (preventing `DYLD_INSERT_LIBRARIES`-style
610+
injection), scoped `process-exec` deny+re-allow, and
611+
`mach-priv-task-port` deny. macOS now reaches feature parity with
612+
the Linux sandbox path rather than no-opping.
613+
- **`SPELLBOOK_SANDBOX_SKIP_PIN=1` escape hatch.** Renamed from
614+
`SPELLBOOK_SANDBOX_SKIP_CCO_PIN=1`; the old name is still accepted
615+
for one release with a stderr `DEPRECATION` warning, then removed.
616+
617+
### Changed
618+
619+
- **`installer/platforms/claude_code.py` macOS path.** No longer
620+
no-ops; chains `install_spellbook_cco` then `install_aliases` so
621+
macOS users get the same hardened sandbox surface as Linux users.
622+
- **`installer/tui.py` and `install.py` detection.** The TUI / CLI
623+
installer now probes for `spellbook-cco` on `PATH` instead of
624+
vanilla `cco`, so re-runs of the installer correctly identify
625+
prior spellbook-managed installs.
626+
- **`scripts/spellbook-sandbox` gating.** The sandbox launcher gates
627+
on `spellbook-cco` (not vanilla `cco`) and verifies the audit-pinned
628+
`d7044ef` SHA by parsing `spellbook-cco --version` output. A version
629+
mismatch fails closed with a remediation hint.
630+
- **`docs/security.md` retitled.** Section is now "Sandboxing with
631+
spellbook-cco (Linux + macOS)". Manual install instructions removed
632+
in favor of `install.py`, which is now the single source of truth
633+
for sandbox setup on both platforms.
634+
- **`README.md` cco onboarding rewritten.** Instructions reference
635+
`spellbook-cco` rather than vanilla `cco`, and point at `install.py`
636+
instead of the upstream `nikvdp/cco` repo for installation.
637+
638+
### Rollback
639+
640+
- **`SPELLBOOK_USE_VANILLA_CCO=1` falls back to vanilla
641+
`nikvdp/cco`.** If the hardened fork breaks something on your
642+
system, set this env var to revert to the upstream vanilla `cco`
643+
binary. **This bypasses the hardened SBPL profile and DYLD scrub**,
644+
so only use it long enough to recover, then unset the env var and
645+
re-run `install.py` so the spellbook-managed sandbox is restored.
646+
579647
## [0.56.0] - 2026-04-30
580648

581649
### Added

CLAUDE.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
11
# Spellbook Development
22

33
Read and follow the instructions in [AGENTS.md](./AGENTS.md).
4+
5+
### PR Review Bot
6+
- Bot username: gemini-code-assist[bot]
7+
- Re-review comment: /gemini review
8+
- Auto-reviews on PR creation: yes

README.md

Lines changed: 52 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,8 @@
3636
- [Quick Install](#quick-install)
3737
- [Windows Quickstart](#windows-quickstart)
3838
- [Sandboxed Usage](#sandboxed-usage)
39+
- [Rolling back to vanilla cco](#rolling-back-to-vanilla-cco)
40+
- [Windows: alias install + sandbox path TBD](#windows-alias-install--sandbox-path-tbd)
3941
- [What Spellbook Does](#what-spellbook-does)
4042
- [The orchestrator pattern](#the-orchestrator-pattern)
4143
- [Epistemic rigor](#epistemic-rigor)
@@ -50,7 +52,7 @@
5052
- [What's Included](#whats-included)
5153
- [Skills (57 total)](#skills-57-total)
5254
- [Commands (96 total)](#commands-96-total)
53-
- [Agents (7 total)](#agents-7-total)
55+
- [Agents (16 total)](#agents-16-total)
5456
- [Creative Modes](#creative-modes)
5557
- [Platform Support](#platform-support)
5658
- [AI Coding Assistants](#ai-coding-assistants)
@@ -108,19 +110,38 @@ irm https://raw.githubusercontent.com/axiomantic/spellbook/main/bootstrap.ps1 |
108110

109111
## Sandboxed Usage
110112

111-
Spellbook recommends running AI coding assistants inside [nikvdp/cco](https://github.com/nikvdp/cco)'s sandbox. The `spellbook-sandbox` launcher wraps `cco` with the right read-only allowances so spellbook's skills, hooks, and daemon auth work inside the sandbox while `$HOME` stays hidden.
113+
Spellbook recommends running AI coding assistants inside a `cco` sandbox. `install.py` writes `~/.local/bin/spellbook-cco` automatically — a wrapper around the audit-pinned [elijahr/cco](https://github.com/elijahr/cco) hardened fork (SHA `d7044ef`). The `spellbook-sandbox` launcher invokes `spellbook-cco` with the right read-only allowances so spellbook's skills, hooks, and daemon auth work inside the sandbox while `$HOME` stays hidden. You do not need to install upstream `cco` yourself.
112114

113-
```bash
114-
# Install cco first: https://github.com/nikvdp/cco#quick-start
115+
The `spellbook-cco` wrapper is the only spellbook-supported entry point. Vanilla `cco` exists solely for the post-deploy rollback path described below.
115116

116-
# Launch sandboxed
117+
```bash
118+
# install.py already wrote ~/.local/bin/spellbook-cco; just launch:
117119
spellbook-sandbox # Claude Code (default)
118120
spellbook-sandbox opencode # OpenCode CLI
119121
spellbook-sandbox opencode serve # OpenCode server (for desktop/web app)
120122
spellbook-sandbox codex # Codex
121123
```
122124

123-
The spellbook installer can set up `claude` and `opencode` shell aliases that point to `spellbook-sandbox` automatically. See [docs/security.md](docs/security.md#sandboxing-with-cco-macos) for the full threat model, `--safe` mode details, and OpenCode desktop app integration.
125+
The spellbook installer can set up `claude` and `opencode` shell aliases that point to `spellbook-sandbox` automatically. See [docs/security.md](docs/security.md#sandboxing-with-spellbook-cco-linux--macos) for the full threat model, `--safe` mode details, and OpenCode desktop app integration.
126+
127+
### Rolling back to vanilla cco
128+
129+
If the hardened fork misbehaves on your machine, set `SPELLBOOK_USE_VANILLA_CCO=1` in your shell rc and re-run `install.py`. The installer skips the `spellbook-cco` wrapper, the alias installer gates on `which cco` instead, and `spellbook-sandbox` routes through vanilla [nikvdp/cco](https://github.com/nikvdp/cco) at its legacy pin. Unset the variable and re-run `install.py` to return to the supported path.
130+
131+
### Windows: alias install + sandbox path TBD
132+
133+
Windows native sandboxing and alias installation are deferred to a later work item (open question Q-O). They are not in scope for the current security architecture phase.
134+
135+
Current behavior on Windows:
136+
137+
- The installer's `install_aliases_windows()` is an intentional noop. It returns `skipped_reason="Windows alias install is deferred to a later work item (Q-O)"` and does not modify any PowerShell `$PROFILE`.
138+
- `spellbook-cco` is not written on Windows native; spellbook does not currently sandbox Claude Code (or any other harness) on Windows.
139+
- `cmd.exe` is a known limitation: `doskey` macros do not persist across cmd sessions without an `AutoRun` registry edit, so cmd users are explicitly not served by the alias installer.
140+
141+
Windows users have two options until the Windows path lands:
142+
143+
- Invoke `spellbook-sandbox` directly, only meaningful if `spellbook-cco` (or vanilla `cco` under `SPELLBOOK_USE_VANILLA_CCO=1`) is already on `PATH`.
144+
- Use **WSL2 + the Linux install path** for full sandboxing. This is the recommended option. Run `install.py` inside the WSL Linux distro and the installer will write `spellbook-cco` there automatically.
124145

125146
## What Spellbook Does
126147

@@ -270,6 +291,9 @@ Reusable workflows for structured development:
270291
[testing-strategy]: https://axiomantic.github.io/spellbook/latest/skills/testing-strategy/
271292
[opportunity-awareness]: https://axiomantic.github.io/spellbook/latest/skills/opportunity-awareness/
272293
[branch-context]: https://axiomantic.github.io/spellbook/latest/skills/branch-context/
294+
[agent2agent]: https://axiomantic.github.io/spellbook/latest/skills/agent2agent/
295+
[canvas]: https://axiomantic.github.io/spellbook/latest/skills/canvas/
296+
[permissions-from-transcripts]: https://axiomantic.github.io/spellbook/latest/skills/permissions-from-transcripts/
273297
[distilling-prs]: https://axiomantic.github.io/spellbook/latest/skills/distilling-prs/
274298
[creating-issues-and-pull-requests]: https://axiomantic.github.io/spellbook/latest/skills/creating-issues-and-pull-requests/
275299
[autonomous-roundtable]: https://axiomantic.github.io/spellbook/latest/skills/autonomous-roundtable/
@@ -495,8 +519,11 @@ Reusable workflows for structured development:
495519
[/sharpen-audit]: https://axiomantic.github.io/spellbook/latest/commands/sharpen-audit/
496520
[/sharpen-improve]: https://axiomantic.github.io/spellbook/latest/commands/sharpen-improve/
497521
[/write-readme]: https://axiomantic.github.io/spellbook/latest/commands/write-readme/
522+
[/a2a]: https://axiomantic.github.io/spellbook/latest/commands/a2a/
523+
[/canvas]: https://axiomantic.github.io/spellbook/latest/commands/canvas/
524+
[/crystallize-consolidate]: https://axiomantic.github.io/spellbook/latest/commands/crystallize-consolidate/
498525

499-
### Agents (7 total)
526+
### Agents (16 total)
500527

501528
| Agent | Description |
502529
|-------|-------------|
@@ -507,6 +534,15 @@ Reusable workflows for structured development:
507534
| [justice-resolver] | Tarot: Conflict synthesizer |
508535
| [lovers-integrator] | Tarot: Integration specialist |
509536
| [queen-affective] | Tarot: Emotional state monitor |
537+
| [git-committer] | Narrowing-role: git commit operations |
538+
| [git-pusher] | Narrowing-role: git push operations |
539+
| [implementer] | Narrowing-role: code implementation |
540+
| [jira-mutator] | Narrowing-role: JIRA write operations |
541+
| [jira-reader] | Narrowing-role: JIRA read operations |
542+
| [pr-creator] | Narrowing-role: PR creation |
543+
| [pr-merger] | Narrowing-role: PR merge operations |
544+
| [test-runner] | Narrowing-role: test execution |
545+
| [web-researcher] | Narrowing-role: web research |
510546

511547
*† Derived from [superpowers](https://github.com/obra/superpowers)*
512548

@@ -517,6 +553,15 @@ Reusable workflows for structured development:
517553
[justice-resolver]: https://axiomantic.github.io/spellbook/latest/agents/justice-resolver/
518554
[lovers-integrator]: https://axiomantic.github.io/spellbook/latest/agents/lovers-integrator/
519555
[queen-affective]: https://axiomantic.github.io/spellbook/latest/agents/queen-affective/
556+
[git-committer]: https://axiomantic.github.io/spellbook/latest/agents/git-committer/
557+
[git-pusher]: https://axiomantic.github.io/spellbook/latest/agents/git-pusher/
558+
[implementer]: https://axiomantic.github.io/spellbook/latest/agents/implementer/
559+
[jira-mutator]: https://axiomantic.github.io/spellbook/latest/agents/jira-mutator/
560+
[jira-reader]: https://axiomantic.github.io/spellbook/latest/agents/jira-reader/
561+
[pr-creator]: https://axiomantic.github.io/spellbook/latest/agents/pr-creator/
562+
[pr-merger]: https://axiomantic.github.io/spellbook/latest/agents/pr-merger/
563+
[test-runner]: https://axiomantic.github.io/spellbook/latest/agents/test-runner/
564+
[web-researcher]: https://axiomantic.github.io/spellbook/latest/agents/web-researcher/
520565

521566
## Creative Modes
522567

agents/git-committer.md

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
---
2+
name: git-committer
3+
description: Use for local git operations only — read, status, diff, log, add, commit, branch, fetch, and worktree. Does NOT push. Bash invocations pass through the spellbook PreToolUse bash gate, which blocks dangerous patterns and surfaces denials to the operator.
4+
tools: Bash, Read
5+
model: inherit
6+
---
7+
8+
## Purpose
9+
10+
Carry out local git work the parent dispatches: stage files, write
11+
commits, inspect history, manage branches and worktrees, and fetch from
12+
remotes. The agent narrows the parent's tool set to a local-only git
13+
surface; it never pushes to a remote, never opens or merges pull
14+
requests, and never expands the parent's capabilities. Push, PR, and
15+
merge operations are the responsibility of separate, scoped agents.
16+
17+
## Tools
18+
19+
`Bash` is the primary tool for git operations: `git status`, `git diff`,
20+
`git log`, `git show`, `git add`, `git commit`, `git branch`,
21+
`git checkout` (for branch switching, never `--`), `git fetch`,
22+
`git worktree`. Every Bash invocation passes through the spellbook
23+
PreToolUse bash gate, which blocks dangerous patterns (destructive
24+
shell idioms, exfiltration shapes) and may deny commands that match.
25+
`Read` opens files the parent points at — diffs, commit message
26+
templates, lockfiles. Conspicuously absent:
27+
`Edit`, `Write`, `Grep`, `Glob` — this agent does not modify source
28+
files, only stages and commits changes already on disk. The `tools:`
29+
frontmatter is a narrowing list — the agent has access to these tools
30+
and only these tools, never more.
31+
32+
## Output Schema
33+
34+
```json
35+
{
36+
"$schema": "http://json-schema.org/draft-07/schema#",
37+
"title": "GitCommitterResult",
38+
"type": "object",
39+
"required": ["commit_sha", "branch", "files_committed", "notes"],
40+
"properties": {
41+
"commit_sha": {
42+
"type": ["string", "null"],
43+
"description": "SHA of the commit produced by this run, or null if no commit was made."
44+
},
45+
"branch": {
46+
"type": "string",
47+
"description": "Branch the commit landed on (or current branch if no commit was made)."
48+
},
49+
"files_committed": {
50+
"type": "array",
51+
"items": {"type": "string"},
52+
"description": "Absolute paths of files included in the commit (empty if no commit was made)."
53+
},
54+
"notes": {
55+
"type": "string",
56+
"description": "Free-text notes: deviations, follow-up work, hook denials, or unresolved questions."
57+
}
58+
}
59+
}
60+
```
61+
62+
## Guardrails
63+
64+
- MUST verify the working directory and current branch before any git
65+
invocation; reject the dispatch if either does not match what the
66+
parent specified.
67+
- MUST NOT run `git push`, `git reset --hard`, `git checkout --`,
68+
`git stash drop`, `git rebase`, or any other destructive or
69+
remote-mutating git operation. Operator confirmation is the primary
70+
enforcement; the spellbook bash gate provides defense-in-depth for
71+
generic dangerous patterns but does not enforce per-agent
72+
subcommand allow-lists.
73+
- MUST follow project conventions for commit messages: no AI-attribution
74+
trailers, no GitHub issue numbers, no `--no-verify`, no `--amend`
75+
without explicit operator authorization.
76+
- MUST surface spellbook bash-gate denials to the operator verbatim and
77+
ask how to proceed; never paper over a denial with an alternative
78+
command shape.
79+
- MUST stage only the files the parent named or that fall within the
80+
parent-specified scope; never run `git add -A` or `git add .` to
81+
blanket-stage the working tree.
82+
83+
## Constraints
84+
85+
- `tools:` is a narrowing surface over the parent's toolset — the agent
86+
has Bash and Read, and only those, and cannot escalate.
87+
- Operates in a worktree or the current working directory; does NOT
88+
create new branches or worktrees unless explicitly dispatched to do so.
89+
- Bash invocations pass through the spellbook PreToolUse bash gate; ask
90+
the operator if a command is denied. The agent cannot escalate past a
91+
denial.
92+
- Scope is bounded by the parent's dispatch prompt; out-of-scope work is
93+
reported in `notes`, not silently executed.

0 commit comments

Comments
 (0)