feat(agents+installer): narrowing-role subagents (Phase 5) + spellbook-cco fork wiring (Phase 7)#284
feat(agents+installer): narrowing-role subagents (Phase 5) + spellbook-cco fork wiring (Phase 7)#284elijahr wants to merge 48 commits into
Conversation
Step 5's stop-and-report-merge-ready is non-negotiable. Make it explicit that 'do the PR dance autonomously', 'yolo', 'just land it', and similar phrasings authorize the iteration loop only -- never merge, tag-push, or branch deletion.
Output of scripts/generate_docs.py against the current source-of-truth in skills/ and commands/. No source changes; just catches up the docs mirror with drift accumulated since the last regen.
Add two new pytest marks for platform-conditional skipping:
- posix_only: auto-skipped on Windows (sys.platform.startswith('win'))
- windows_only: auto-skipped on POSIX
Wire them into tests/conftest.py's pytest_collection_modifyitems hook
alongside the existing requires_memory_tools and docker handling.
Foundational scaffolding for WI-7 (security architecture phase 7);
subsequent platform-dispatched installer tests rely on these marks.
Address review nits: prose marker descriptions, module docstring explaining placement under tests/installer/, clearer comment on _FakeConfig pytest plugin shape, and simplified _patch_platform fixture docstring.
Pin nikvdp/cco at commit 9744b9f (2026-04-30) inside scripts/spellbook-sandbox per Sec 9.3 audit (2026-05-06). Linux bwrap backend audited PASS: kernel-managed /proc in unshared PID namespace, --cap-drop ALL, --die-with-parent, seccomp TIOCSTI/TIOCLINUX filter. macOS L5 documented as intentionally absent; macOS sessions rely on L4 PreToolUse hooks (shipped) and L6 devcontainer (WI-8 planned) instead of cco-based isolation, since cco's sandbox-exec profile is write-protect-only ((allow default) baseline, no process-fork/exec deny, no DYLD scrubbing). Pinning mechanism: a CCO_PINNED_SHA constant near the top of the script plus a runtime gate that compares the short SHA reported by 'cco --version' (cco resolves --version from 'git rev-parse --short HEAD' of its install dir at ~/.local/share/cco). The gate aborts on mismatch unless the operator sets SPELLBOOK_SANDBOX_SKIP_CCO_PIN=1 (audited downgrade). Tests in tests/installer/test_aliases.py (posix_only): - pin SHA parsed from a structured header line == 9744b9f - Sec 9.3 audit doc filename present - macOS L5 rationale anchor phrase present in script or sidecar - executable bit preserved - --help smoke test (cco-not-found stderr is exact-matched) Part of WI-7 security architecture phase 7.
Address review findings from Task 2 of WI-7: - H1: integration test now branches on three exit conditions (success, cco-absent, SHA-mismatch); robust on developer machines with cco installed at a non-pinned SHA - M1: new posix_only test simulates fake cco shim with bad/empty/ mismatched --version output, asserts fail-closed exit - M2: SHA-mismatch error message now includes cco install URL - L1: documented legitimate use case for SPELLBOOK_SANDBOX_SKIP_CCO_PIN - L2: moved posix_only mark from module-level to per-test, allowing static-content checks to run on all platforms
There was a problem hiding this comment.
Code Review
This pull request implements Phase 5 of the security architecture by introducing nine narrowing-role subagents and a new installer component to manage agent symlinks. It also significantly updates the instruction crystallization system with new rule-preservation logic, a consolidation command, and enhanced verification phases. Other changes include a mandatory pre-dispatch ritual for development tasks, refined task complexity ceilings for renames, and updated skill triggers. Review feedback identifies a missing minor version bump in the .version file, a violation of the mocking style guide regarding the use of monkeypatch in tests, and opportunities to improve symlink robustness and installer code organization.
| The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), | ||
| and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). | ||
|
|
||
| ## [Unreleased] |
There was a problem hiding this comment.
The version bump in the .version file is missing from this pull request. Per the Repository Style Guide (line 7), every PR must include a version bump following semantic versioning. Given the scope of these changes (new agents and installer components), a minor version bump is required.
References
- Every PR must include a version bump in the .version file following semantic versioning. (link)
| monkeypatch.setattr( | ||
| spellbook_hook, "STOP_HARVEST_CACHE_PATH", per_test_cache, | ||
| ) | ||
| # (2) Force the worker-LLM gate to off so the regex path is taken, | ||
| # regardless of the user's spellbook.json or any leaked | ||
| # monkeypatch from another test module. | ||
| try: | ||
| from spellbook.worker_llm import config as _wl_config | ||
|
|
||
| monkeypatch.setattr( | ||
| _wl_config, "feature_enabled", lambda _name: False, | ||
| ) |
There was a problem hiding this comment.
monkeypatch.setattr used to replace a function or module attribute -- tripwire is required for mocking. Convert to tripwire.mock("<module>:<attr>").returns(...) plus with tripwire: sandbox plus .assert_call(...). monkeypatch is only allowed for env vars, cwd, and sys.path.
# (1) Per-test cache file, never the real ~/.local path.
per_test_cache = tmp_path / "_stop_harvest_cache.json"
m_cache = tripwire.mock.object(spellbook_hook, "STOP_HARVEST_CACHE_PATH")
m_cache.returns(per_test_cache)
# (2) Force the worker-LLM gate to off so the regex path is taken.
m_feat = tripwire.mock("spellbook.worker_llm.config:feature_enabled")
m_feat.returns(False)
with tripwire:
yieldReferences
- monkeypatch.setattr used to replace functions, methods, class attributes, or module-level callables is NOT allowed. tripwire is the ONLY acceptable framework. (link)
| raw_target = Path(entry.readlink()) | ||
| except OSError: | ||
| continue | ||
| if raw_target.is_absolute() and _resolve_in(raw_target.parent, agents_source_dir): |
There was a problem hiding this comment.
The check raw_target.is_absolute() will skip any relative symlinks. While the installer typically creates absolute links, handling relative links by resolving them against the entry's parent would make the cleanup more robust against manually created or legacy links.
| if raw_target.is_absolute() and _resolve_in(raw_target.parent, agents_source_dir): | |
| if not raw_target.is_absolute(): | |
| raw_target = (entry.parent / raw_target) | |
| if _resolve_in(raw_target.parent, agents_source_dir): |
| self._step("Cleaning up stale agent symlinks") | ||
| agents_source_dir = self.spellbook_dir / "agents" | ||
| agents_target_dir = self.config_dir / "agents" | ||
| # Build the set of CURRENT source targets (resolved paths), not | ||
| # basenames. A symlink in the target dir is "stale" only when its | ||
| # resolved target is no longer one of the current spellbook agent | ||
| # files. Using the resolved-path set (rather than basenames) | ||
| # preserves user-authored aliases that point at valid spellbook | ||
| # agents under non-canonical names: a symlink at | ||
| # ``$CLAUDE_CONFIG_DIR/agents/myalias.md`` whose resolved target is | ||
| # ``$SPELLBOOK_DIR/agents/alpha.md`` IS valid and must not be | ||
| # unlinked just because its basename does not appear in the source | ||
| # set. | ||
| current_source_targets = ( | ||
| {p.resolve() for p in agents_source_dir.glob("*.md") if p.is_file()} | ||
| if agents_source_dir.exists() | ||
| else set() | ||
| ) | ||
| if agents_target_dir.exists(): | ||
| for entry in sorted(agents_target_dir.glob("*.md")): | ||
| if not entry.is_symlink(): | ||
| continue | ||
| # Determine if this symlink should be treated as stale: | ||
| # - Resolves cleanly: stale iff its resolved target is NOT | ||
| # in the current source-target set (covers both renamed | ||
| # and removed source files, and aliases pointing at a | ||
| # no-longer-current source). | ||
| # - Broken (resolve fails): stale iff the raw target's | ||
| # parent resolves to this spellbook's agents/ dir | ||
| # exactly, OR the raw target's parent (string-resolved) | ||
| # equals our source agents dir even when the leaf is | ||
| # missing (worktree-switch stale). | ||
| stale = False | ||
| try: | ||
| resolved = entry.resolve(strict=True) | ||
| stale = resolved not in current_source_targets | ||
| except (OSError, RuntimeError): | ||
| # Broken symlink. Two stale-cleanup paths: | ||
| # (a) raw target's parent (resolved) is the current | ||
| # spellbook agents dir -- same-worktree stale. | ||
| # (b) raw target's parent, resolved with strict=False | ||
| # (which tolerates missing leaves), equals this | ||
| # spellbook's agents dir -- worktree-switch stale | ||
| # where the prior worktree dir no longer exists. | ||
| try: | ||
| raw_target = Path(entry.readlink()) | ||
| except OSError: | ||
| raw_target = None | ||
| if raw_target is not None and raw_target.is_absolute(): | ||
| try: | ||
| raw_target.parent.resolve().relative_to( | ||
| agents_source_dir.resolve() | ||
| ) | ||
| stale = True | ||
| except (ValueError, OSError): | ||
| pass | ||
| if not stale: | ||
| # Tightened heuristic: exact-equality of the | ||
| # broken link's parent dir against THIS | ||
| # spellbook's agents/ dir (string-resolved with | ||
| # strict=False so missing leaves don't error). | ||
| # This avoids false positives where the broken | ||
| # symlink points into a DIFFERENT spellbook | ||
| # installation -- substring matching on | ||
| # "spellbook" used to remove those, which is | ||
| # not our cleanup's responsibility. | ||
| try: | ||
| raw_parent_resolved = raw_target.parent.resolve( | ||
| strict=False | ||
| ) | ||
| if raw_parent_resolved == agents_source_dir.resolve(): | ||
| stale = True | ||
| except OSError: | ||
| pass | ||
| if stale and not self.dry_run: | ||
| try: | ||
| entry.unlink() | ||
| except OSError: | ||
| pass |
There was a problem hiding this comment.
The logic for cleaning up stale agent symlinks is quite extensive and partially duplicates logic in uninstall_agents. Consider refactoring this into a reusable function within installer.components.agents (e.g., cleanup_stale_agents) to improve maintainability and keep the platform-specific file focused.
bashlex AST is WI-6a's deliverable; secret-path denylist is WI-6d. Parenthetical now lists both, matching impl plan attribution.
Add a documented stub for Windows alias install that returns a noop dict matching install_aliases()'s return shape. Windows alias install and sandbox path are deferred to a later WI (Q-O); cco lacks a Windows backend and spellbook-sandbox is POSIX-only. The stub satisfies the API contract for the upcoming claude_code.py platform dispatch (Task 4 of WI-7) without committing to Windows behavior that hasn't been audited. Tests at tests/installer/test_aliases_windows.py marked windows_only; skipped on POSIX dev machines; exercised when a Windows CI is added.
Address Phase 4.5 review findings on Task 3/5: - M1: tests now monkeypatch Path.home and snapshot it too, catching any regression that writes to the user's actual rc files - M2: skipped_reason switched to human-readable prose, matching the existing skip-path string style - L1 (pre-existing): install_aliases() docstring now correctly types rc_path as str | None (matches actual return on skip paths) - L2: stub docstring now sketches the production implementation scaffolding (PowerShell $PROFILE, marker reuse, return shape) - N1: stub log message surfaces dry_run for operator visibility
Add a dispatch helper in installer/platforms/claude_code.py that routes alias install based on get_platform(): - LINUX -> install_aliases() (wraps `claude` -> spellbook-sandbox) - MACOS -> documented noop; cco sandbox-exec is insufficient per Sec 9.3 - WINDOWS -> install_aliases_windows() (Q-O stub from Task 3) Unknown platforms raise NotImplementedError to surface future enum additions explicitly. The existing install.py:1148 interactive offer remains as a separate opt-in path; this is the platform-install-time path that runs unconditionally during Claude Code platform install.
Address review findings on commit 24ba686 (Phase 4.5 review of WI-7 Task 4/5): - F1: dispatch call in ClaudeCodeInstaller.install() is now wrapped in try/except, so a bad rc-file does not abort the security-critical hooks install step that follows - F2: LINUX branch now checks shutil.which("cco") and noops with a clear skipped_reason if cco is missing, matching the legacy install.py:1126 gating pattern - F3: new behavioral test verifies the dispatch is actually called from install() and produces an "aliases" InstallResult in the returned list - F4: new test pins the F1 contract -- _install_claude_code_aliases raising does NOT abort install(); subsequent components still run (CLAUDE.md, hooks) - F5/F6: comments clarify the success=True+action=skipped convention and document the unknown-platform test's scope (control-flow vs contract assertion) The existing LINUX dispatch test now stubs shutil.which so the cco gate is deterministic on developer machines. A new test_dispatch_linux_skips_install_aliases_when_cco_missing covers the F2 negative path with caplog assertion on the operator-facing message.
Add a Windows subsection under Sandboxed Usage that records the Phase 7 deferral of Windows alias install and sandbox path (open question Q-O). Documents the current installer noop behavior, the cmd.exe doskey limitation, and points Windows users at WSL2 + the Linux install path for full sandboxing.
- Direct WSL2 readers to install cco inside the WSL distro before running the installer (L1) - Tighten the direct-invoke bullet now that the WSL2 bullet covers cco placement (N1) - Replace "documented noop" with "intentional noop" for clarity (N2)
These tests assume POSIX rc-file paths and "\n" line endings; mark them POSIX-only so Windows runners skip them via the conftest auto-skip hook.
Adds `ruff>=0.6.0` to the dev dependency group and a minimal `[tool.ruff]` configuration block (line-length=100, target-version=py311, default lint rules, `__init__.py` F401 carve-out, and `versions/` Alembic revisions excluded). Also fixes the `bigfoot[http]` / `python-tripwire[http]` extras: those extras were dropped in python-tripwire 0.21+, leaving the test suite unable to import `tripwire.plugins.http` (the `bigfoot.http.mock_response` shim used in tests/test_worker_llm/conftest.py). Drops the `[http]` markers and adds explicit `requests>=2.31` since the http plugin needs it. Without this, `uv run pytest` fails on first worker_llm test. target-version is py311 (not py310 to match `requires-python`) because the codebase uses 3.11 syntax (`except*` in spellbook/admin/routes/ws.py). That mismatch is pre-existing and out of scope for this commit; ruff's target-version follows the actual code.
Resolves all 506 ruff lint violations on the branch so that `uv run ruff check .` exits 0. Approach (per plan): 1. `ruff check --fix` (safe-fix only) — 394 violations auto-fixed (mostly F401 unused imports, F541 empty f-strings). 2. Manual fixes for the 111 remaining (F841 with unsafe-fix, plus E402, E731, E702, E741, E712, F821). Notable manual carve-outs: - installer/compat.py: restored re-exported names that ruff removed as F401 (`UnsupportedPlatformError`, `ServiceManager`, `get_data_dir`). These are part of the documented re-export contract; added explicit `# noqa: F401` markers. - tests/test_spellbook_mcp/test_server_no_skill_tools.py: restored `from spellbook import skill_ops` inside `with pytest.raises(ImportError):` — the import IS the test. - spellbook/sessions/watcher.py: restored `import asyncio` (F821 — the auto-fix removed it but lines 275/278 use `asyncio.get_running_loop()` / `asyncio.run()`). - E402 sites with `logger = logging.getLogger(__name__)` between imports carry a `# noqa: E402` with rationale. Test status: - `uv run ruff check .` → All checks passed! - `uv run pytest -x` → 4508 passed, 135 skipped, 531 deselected.
Output of scripts/generate_docs.py against the current source-of-truth in skills/ and commands/. No source changes; just catches up the docs mirror with drift accumulated since the last regen.
New installer/components/agents.py mirrors create_skill_symlinks / create_command_symlinks with two divergences: skip+warn on user-authored target files (don't clobber), and verify_source narrowing on uninstall. Tests cover install (idempotence, broken-symlink replace, user-file preservation, dry-run, mixed states, non-md filter, source resolution) and uninstall (spellbook-only narrowing, user-file preservation, missing-dir, idempotence). 17 cases total. Wiring into ClaudeCodeInstaller and the cleanup list lands in a separate commit (Task A2).
…cabulary Code review and fact-check on commit 3d88d0f surfaced three findings: - "replaced" action label was not recognized by installer/ui.py (which knows installed/upgraded/created/skipped/failed/unchanged) and fell through to a generic icon. Renamed to "upgraded". - Comment on the broken-symlink uninstall branch over-broadened the check to "raw target lies inside agents dir" when the code actually checks the raw target's parent dir; corrected. - Test name "test_non_md_files_in_source_ignored" only covered one of three behaviors the body verified; renamed (or split) for clarity.
Adds the agents step to ClaudeCodeInstaller.install() (after scripts, before CLAUDE.md) and the symmetric removal to .uninstall(). Adds a narrowed install-time pre-cleanup pass that purges stale agent symlinks (renamed/removed source files, including those pointing into a prior worktree's agents/) without clobbering currently-valid entries, so re-install remains idempotent. Integration tests verify: install creates resolvable symlinks at the target, dry-run is side-effect-free, the install step's InstallResult reports counts, uninstall preserves user-authored files, re-install is idempotent, and stale symlinks from a prior worktree are purged.
… message Code review and fact-check on commit 6525327 surfaced four findings: - The install-time cleanup pre-pass marked an entry stale when its basename was not in the current spellbook source set, which would unlink user-authored alias symlinks pointing to valid agent files under non-canonical names. Switched to a resolved-target-path set so aliases pointing to valid current sources are preserved. - The broken-symlink fallback heuristic relied on a "spellbook" substring match, which produced false positives for broken symlinks pointing at a *different* spellbook installation. Tightened to compare the broken symlink's parent dir against the current installation's agents/ dir via exact resolved-path equality. - The aggregation message dropped the failed count, leaving the user with success=False and no breakdown when a partial failure occurred. Now includes "{N} failed" when nonzero. - Reworded the uninstall-side comment block to accurately describe what cleanup_spellbook_symlinks does (it also removes any broken symlink unconditionally, not just spellbook-substring matches).
Adds the canonical narrowing-role agent (implementer.md) as the seed for the remaining 8 new agents. tools: frontmatter narrows to Edit, Write, Read, Grep, Glob, Bash — Bash is gated by the L4 PreToolUse hook for tier-based authorization. Body uses the 5-section schema (Purpose, Tools, Output Schema, Guardrails, Constraints). Schema test parametrizes over the 9 expected new agents (validating whichever currently exist) and the 7 existing agents (byte-identical snapshot). The snapshot file at tests/test_security/agent_snapshots.json guards against accidental modification of the existing 7 during WI-5 work. code-reviewer.md and justice-resolver.md are exempt from the tools-presence check pending a separate cleanup task.
Fact-check on commit 508665c found that implementer.md described the WI-6 L4 tier classifier as the bash gate, but that infrastructure doesn't exist on main yet (lives on security-architecture-phase-3). Rewrote the description, body, guardrails, and constraints to describe the spellbook PreToolUse bash gate that exists today — pattern-based denial of dangerous commands. WI-6 can later expand the language when tier-based narrowing lands on main. Code review additionally suggested: - Body section ordering test now matches headings as full lines, immunizing against future agent files that include literal '## X' inside fenced code blocks. - Replaced opaque "F3" reference in TOOLS_EXEMPT_EXISTING comment with inline rationale. - Clarified the test_tools_exempt docstring to enumerate the three steps required when removing the exemption (regenerate snapshot, delete entry, delete test).
Author the 8 remaining narrowing-role subagent files in agents/, each conforming to the canonical 5-section schema established by agents/implementer.md (Task B). Each file's `tools:` frontmatter is a narrowing list — the agent has access to those tools and only those tools, never more — and matches the canonical mapping in tests/test_security/test_agent_frontmatter.py. - web-researcher: WebFetch, WebSearch, Read. Quarantined web research returning structured findings. Flagged in Constraints as requiring WI-8 (devcontainer) before being safe to dispatch in production. - git-committer: Bash, Read. Local git only — read, status, diff, log, add, commit, branch, fetch, worktree. Never pushes. - git-pusher: Bash, Read. Only `git push`. Operator confirmation required for every push. - pr-creator: Bash, Read. Only `gh pr create`, `gh pr edit`, `gh pr view`, `gh pr diff`, `gh pr list`. No merge or ready-mark. - pr-merger: Bash, Read. Only `gh pr merge`, `gh pr ready`. Operator confirmation required for every merge and ready-mark. - jira-reader: Read (frontmatter); Atlassian MCP read tools at runtime. Read-only Jira inspection, no mutations. - jira-mutator: Read (frontmatter); Atlassian MCP read+write tools at runtime. Operator confirmation required for state transitions. - test-runner: Bash, Read, Grep. Test commands only (pytest, npm test, etc.); no source edits, no git side effects. The schema test parametrizes over `_existing_new_agents()` so all 9 narrowing-role agents (the canonical implementer plus these 8) now validate against the canonical contract: 55 passed, 1 skipped (regen mode). Full security suite remains green: 389 passed.
Two groups of fixes from code review + fact-check on the 8 narrowing-role
agent files added in the prior commit.
Group A — gate-attribution accuracy. Five files (git-committer, git-pusher,
pr-creator, pr-merger, test-runner) over-claimed that the spellbook
PreToolUse bash gate blocks specific git/gh subcommand patterns
(e.g. `git push`, `git reset --hard`, `gh pr merge`, `gh pr ready`,
`--force` to protected branches, `gh pr merge --admin`, "any git side
effects"). The actual gate blocks generic escalation/destructive shell
patterns, not per-agent subcommand allow-lists. Softened the gate-claim
language to match the canonical seed in agents/implementer.md
("destructive shell idioms, exfiltration shapes") and reframed the
guardrails so operator confirmation and agent role separation are
the primary enforcement, with the bash gate as defense-in-depth. The
role-contract MUST NOT bullets stay — only the false gate attribution
is removed.
Group B — reviewer wording fixes:
* jira-mutator description: "expected" → "REQUIRED" for state
transitions, aligning with sibling git-pusher / pr-merger phrasing
and the existing body Guardrail.
* git-pusher Output Schema: renamed `remote` → `remote_refspec` and
rewrote the description to clarify the `<remote>/<ref>` format with
examples that include nested refs. Updated the `required` array.
* git-pusher upstream-verify guardrail: rewrote to handle the
first-push case (no upstream yet) without ambiguity.
* test-runner Tools: dropped the `cat` example; `Read` handles file
content and `ls`/`find` cover read-only Bash verbs. Added a
sentence explaining why `Glob` is omitted (broader than the
test-runner's scoping discipline; `find` invocations inherit the
bash-gate's scoping constraints).
* jira-reader / jira-mutator descriptions: appended the
narrowing-list + MCP runtime-discovery semantics so the
declarable `tools: Read` is no longer surprising.
Verifications:
- tests/test_security/test_agent_frontmatter.py: 55 passed, 1 skipped
- tests/test_security/ full suite: 389 passed, 1 skipped
- tests/installer/ full suite: 61 passed
- Forbidden term scan (L4 / tier classifier / tier-based / capability
escalation / WI-4): zero matches.
- Forbidden claim scan (gate also blocks / gate blocks gh pr / gate
blocks git push|reset|checkout|stash): zero matches.
agents/implementer.md is the canonical seed and was not modified.
`TestStopHookHarvest::test_stop_hook_harvests_single_candidate` (and
sibling tests in the same class) was order-dependent in two ways:
1. `STOP_HARVEST_CACHE_PATH` resolves at import time to
`~/.local/spellbook/cache/last-stop-harvest.json`. The class's
methods that exercise the success path call `_record_stop_harvest`
on the real on-disk cache. Subsequent runs whose `tmp_path/
session.jsonl` collided with a cached entry got short-circuited at
the idempotency check, leaving `mock_http.calls == []` and tripping
the assertion. Reproduced deterministically by running with
`--basetemp=<previously-cached-pytest-N-dir>`.
2. `_handle_stop` calls `feature_enabled("transcript_harvest")` from
`spellbook.worker_llm.config`, which reads the user's real
`spellbook.json`. If any test (or env override) flipped that flag
on, the hook took the worker-LLM branch and never POSTed to
`/api/memory/unconsolidated`.
Add an `autouse` fixture on `TestStopHookHarvest` that:
- rebinds `STOP_HARVEST_CACHE_PATH` to a per-test path under
`tmp_path` (no leakage to or from `~/.local`);
- forces `feature_enabled` to return False so the regex path is
taken regardless of user config or leaked monkeypatches.
Tests that already monkeypatch `STOP_HARVEST_CACHE_PATH` themselves
(idempotent / records-sha / retries-on-failure / network-failure)
still override the autouse stub via `monkeypatch.setattr` ordering;
no change to their behavior.
No production code changes. Full suite: 4564 passed, 132 skipped.
…verage
Address findings F1-F4 from the green-mirage audit on the WI-5 surface.
F1: Add `test_all_expected_new_agents_exist` — set-equality assertion that
every key in EXPECTED_NEW_AGENTS has a matching file in agents/. Without
this, `_existing_new_agents()` would silently shrink its parametrize input
if an expected agent file were deleted, and the per-agent schema tests
would still pass on the smaller set.
F4: Remove `test_canonical_implementer_exists`. F1's set-equality check
strictly subsumes it (implementer.md is in EXPECTED_NEW_AGENTS).
F2: Strip fenced code blocks before scanning for required headings in
`test_new_agent_has_required_body_sections_in_order`. The prior
"\n{heading}\n" substring scan would still false-match a heading literal-
embedded in a fenced block; the docstring overclaimed immunity. New
helper `_strip_fenced_blocks` (DOTALL ``` ... ``` removal) plus updated
comment make the assertion match its stated intent.
F3: Add two tests in `TestUninstallAgents` covering the previously
unreached broken-symlink branch in `uninstall_agents`:
- `test_uninstall_removes_broken_symlink_into_spellbook`: positive case,
parent-dir resolves into $SPELLBOOK/agents/ -> link is removed.
- `test_uninstall_preserves_broken_symlink_pointing_elsewhere`: negative
case, parent-dir is OUTSIDE the spellbook -> link is preserved and
its readlink target is unchanged.
Verification:
- tests/test_security/test_agent_frontmatter.py + tests/installer/test_agents_symlink.py: 82 passed, 1 skipped (snapshot regen).
- Full suite: 4566 passed, 132 skipped, 531 deselected.
Document the user-facing surface introduced by WI-5: nine narrowing-role agents in agents/ and the per-config-dir symlink installer that lands them under $CLAUDE_CONFIG_DIR/agents/. Covers the narrowing-vs-granting semantics so users understand the tools: frontmatter contract.
- F1 contract test now also asserts `mcp` component runs after alias dispatch raises (per Phase 4.6.4 fact-check finding on test_install_does_not_abort_when_dispatch_raises) - Drop non-existent `docs/security/audits/sec_9_3_result.md` path from _install_claude_code_aliases docstring; the script header carries the canonical path
de6bfa3 to
4fa0f8c
Compare
…re-phase-5 Combine WI-7 (cco L5 sandbox wiring + ruff cleanup bundle) with WI-5 (narrowing-role subagents) into a single security-architecture PR (#284). WI-7 highlights: - Pin cco at SHA 9744b9f in scripts/spellbook-sandbox (Linux L5 PASS per Sec 9.3 audit) - Platform-aware Claude Code alias dispatch (LINUX/MACOS/WINDOWS) - macOS L5 documented noop (Sec 9.3 sandbox-exec audit FAIL) - Windows alias install + sandbox path TBD (Q-O deferral) with stub + tests - posix_only / windows_only pytest mark registration + auto-skip hook - Bundled ruff dev dep + 506-violation cleanup
Flips the audit-pinned SHA constant to the elijahr/cco fork tip (d7044ef) and adds EXPECTED_VANILLA_CCO_SHA (9744b9f) as the fallback constant for SPELLBOOK_USE_VANILLA_CCO=1 rollback coverage. Updates MACOS_RATIONALE_PHRASE to match the revised header phrasing landed by Task 6 in scripts/spellbook-sandbox. Renames the fail-closed-gate shim from cco to spellbook-cco so the three SHA-mismatch parametrize cases drive the gating binary the script now actually checks. Updates --help-runs-cleanly to look for the new error shape. Adds a new truth-table-row test (test_sandbox_both_skip_vars_set_to_one_warn_and_skip) covering the SKIP_PIN=1 + SKIP_CCO_PIN=1 case, which the existing class did not exercise: new var wins skip precedence (gate skipped), legacy var still triggers DEPRECATION because operator is using the deprecated name. Consolidates the Task-6-local EXPECTED_FORK_SHA / EXPECTED_VANILLA_SHA constants to alias the module-level constants so the SHA values live in exactly one place.
Introduce installer/components/spellbook_cco.py: clones the elijahr/cco fork at audit-pinned SHA d7044ef into ~/.local/spellbook/cco, writes ~/.local/bin/spellbook-cco with a managed-tag header, and verifies the pin via two-step check (git rev-parse + cco --version awk-parse). The wrapper exposes the hardened fork's macOS SBPL profile (DYLD scrub + file-read denies + scoped process-exec deny + mach-priv-task-port deny) to the rest of spellbook. emit_rollback_warning() is exported as the single source of truth for the SPELLBOOK_USE_VANILLA_CCO=1 rollback warning. SPELLBOOK_INSTALLER_SKIP_FORK_PIN=1 bypasses pin verification with a stderr WARNING.
Adds 22 tests covering Tier-0 constants, Tier-1 mocked-shape paths, and Tier-2 paths exercised against a file:// bare-repo fixture whose stub emits the fixture's actual head_sha. Tests cover dry-run shape, real clone-and-verify, rollback on each pin-verify step in isolation, idempotency, namespaced wrapper tag, untagged-wrapper warning, dirty/wrong-remote aborts, PATH warnings, SKIP_FORK_PIN bypass, vanilla-rollback env var, Windows shape-noop, uninstall happy path, untagged-preserve, idempotency, dry-run, and a darwin smoke.
…e platform Adds a once-globally install_spellbook_cco hook to ClaudeCodeInstaller.install() that runs before the per-config-dir loop, and a mirroring uninstall_spellbook_cco call in uninstall(). Merges the Linux and macOS branches in _install_claude_code_aliases into a shared POSIX path that gates on shutil.which(sandbox_binary), where sandbox_binary defaults to 'spellbook-cco' (or 'cco' under SPELLBOOK_USE_VANILLA_CCO=1). Updates the L5 macOS rationale comment block to cite the elijahr/cco fork's hardened SBPL profile, and rewrites all 'cco not on PATH' diagnostics to point at re-running install.py. The rollback WARNING is emitted via the imported emit_rollback_warning() so installer-side warnings stay byte-equal across call sites.
Replaces shutil.which('cco') in the TUI install offer with shutil.which(sandbox_binary), where sandbox_binary defaults to 'spellbook-cco' (or 'cco' under SPELLBOOK_USE_VANILLA_CCO=1). Updates the rationale string to reference the audit-pinned elijahr/cco fork and removes the inline nikvdp/cco install URL in favor of pointing operators at install.py. Emits the canonical rollback WARNING via emit_rollback_warning() under the env-var override.
Replaces shutil.which('cco') in the post-install alias offer with shutil.which(sandbox_binary), and extracts the offer block into _offer_sandbox_aliases() so the TUI and CLI flows share a single implementation. Under SPELLBOOK_USE_VANILLA_CCO=1 the offer falls back to gating on 'cco' and emits the canonical rollback WARNING via emit_rollback_warning().
Adds tests/installer/test_cco_entry_points.py with 7 tests covering both tui.py and install.py wiring: default-path detection, env-override routing, WARNING emission on env override, and WARNING-absence regression guards on the default path.
Rewrites scripts/spellbook-sandbox to gate on the elijahr/cco fork's spellbook-cco wrapper and verify the d7044ef pin via 'spellbook-cco --version'. SPELLBOOK_USE_VANILLA_CCO=1 falls back to vanilla 'cco' at the prior 9744b9f pin and emits a script-prefixed WARNING that the SBPL profile is bypassed.
Adds the SPELLBOOK_SANDBOX_SKIP_PIN env var; SPELLBOOK_SANDBOX_SKIP_CCO_PIN remains accepted for one release with a DEPRECATION warning per a documented truth table. The truth-table predicates use POSIX ${VAR+x} to distinguish unset-vs-empty so set-but-empty does not fall through to the legacy var.
install.py now writes ~/.local/bin/spellbook-cco automatically (audit-pinned elijahr/cco fork at d7044ef); the wrapper is the only spellbook-supported entry point. Vanilla nikvdp/cco is retained solely for the post-deploy rollback path via SPELLBOOK_USE_VANILLA_CCO=1. References docs/security.md for the full threat model and hardening details.
…spatch tests
WI-7's `_install_claude_code_aliases` calls `shutil.which("spellbook-cco")`
to gate the per-dir alias install on the wrapper's presence. Tripwire's
SubprocessPlugin always intercepts `shutil.which`; without an explicit
`assert_which` after the sandbox closes, the strict verifier raises
`UnassertedInteractionsError` at teardown when the binary is absent from
PATH.
This is the case in CI (ubuntu-latest and the non-dev macOS runner shape)
but NOT on the macOS dev environment, where `spellbook-cco` IS on PATH
during local runs -- masking the failure during pre-merge verification.
The dispatcher only runs on POSIX (LINUX/MACOS); Windows takes the
`install_aliases_windows` branch which does not call `shutil.which` for
the cco wrapper. Hence the new helper / inline assertion is gated on
`sys.platform != "win32"`, mirroring the existing
`_assert_powershell_which_if_windows` pattern.
Tests fixed (all in alias-dispatch sandboxes):
- tests/installer/test_claude_code_wires_default_mode_and_permissions.py:
- test_install_emits_default_mode_result
- test_install_emits_permissions_result
- test_install_writes_acceptedits_default_mode
- test_uninstall_emits_default_mode_and_permissions_results
- test_uninstall_clears_managed_default_mode_from_settings
- tests/installer/test_l2_derivation.py:
- test_claude_code_installer_uses_derived_deny
The assertion is exact-equality on both `name` and `returns`: any change
to the dispatcher's gating call (different binary name, dropped call,
duplicate call, real-path return) would fail the assertion.
The WI-5 byte-identity snapshot test (tests/test_security/test_agent_frontmatter.py:: test_existing_agent_byte_identical_to_snapshot) computes SHA-256 over the raw bytes of each existing agent file and compares against a committed snapshot generated on macOS (LF line endings). On Windows CI, git's default checkout converts LF to CRLF, so Path(...).read_bytes() returns CRLF-encoded bytes whose SHA does not match the committed LF-based snapshot. All 7 existing-agent snapshot parameter cases fail on Windows for this reason. Verified locally: the same file yields different SHAs under LF vs CRLF. Pin LF on the agent markdown files and on the snapshot file itself via .gitattributes. This makes byte-identity platform-agnostic by construction without changing the snapshot's semantics: "byte-identical to the committed snapshot" continues to mean exactly that, regardless of host OS or git autocrlf configuration. Cannot directly verify Windows behavior from macOS; the fix is mechanical (git's documented gitattributes contract) and the local test suite continues to pass (55 passed, 1 skipped).
…back
Rename '## Sandboxing with cco (macOS)' to '## Sandboxing with spellbook-cco (Linux + macOS)' so the auto-anchor matches the README link target ('#sandboxing-with-spellbook-cco-linux--macos'). Replace 'cco' with 'spellbook-cco' across the section body and CLI examples where the reference is to spellbook's own invocation; preserve 'cco' in the upstream nikvdp/cco attribution and in vanilla-fallback prose. Replace the now-stale 'curl | bash' install snippet with a one-liner noting that the spellbook installer handles spellbook-cco.
Add a new '### Rolling back to vanilla cco' subsection covering the SPELLBOOK_USE_VANILLA_CCO=1 env override: when to use it (catastrophic post-deploy breakage), the SBPL/DYLD hardening that the override bypasses, and how to remove the override afterward. The runtime WARNING wording remains owned by installer/components/spellbook_cco.py:_WARNING_USE_VANILLA_CCO; this doc paraphrases the *why* rather than duplicating the warning bytes.
Document the user-facing surface introduced by Task 13 of the spellbook-cco integration: new installer/components/spellbook_cco.py component, hardened SBPL profile (DYLD scrub + dylib-path denies + scoped process-exec + mach-priv-task-port deny), pin-bypass and rename of SPELLBOOK_SANDBOX_SKIP_PIN env var, macOS feature parity via install_spellbook_cco chained before install_aliases, and SPELLBOOK_USE_VANILLA_CCO=1 rollback path. Placed under the [0.57.0] banner adjacent to the prior Phase 5 narrowing-role subagents entry to keep all security architecture Phase 5 work grouped on this branch.
The install_spellbook_cco() function short-circuits to
{"action": "skipped", "skipped_reason": "spellbook-cco unavailable on
Windows"} when os.name == "nt". Eleven tests in
tests/installer/test_spellbook_cco.py drive the install path expecting
the POSIX result shape (action="installed", a clone on disk, a wrapper
file with shebang, etc.) and so fail on Windows CI under the
short-circuit.
Mark those eleven tests with @pytest.mark.posix_only so the existing
conftest.py skip rule excludes them on Windows. The Windows shape is
already covered indirectly by the constant tests at the top of the
file (which import the module and read its public attributes), and
the early-return is a one-line constant that does not need its own
test.
Tests deliberately left unmarked: the Tier-0 constant assertions
(test_pinned_sha_constant et al.), the dry-run short-circuits
(test_install_dry_run_does_not_clone, test_uninstall_dry_run_does_not_remove)
which return before the Windows check, and all uninstall tests since
uninstall_spellbook_cco has no Windows short-circuit.
…e duplicates Both Phase 5 entries (prior narrowing-role subagents from 4fa0f8c and the just-landed spellbook-cco fork integration from 6e00b2e) were appended under the released [0.57.0] - 2026-04-30 banner. The work landed in May 2026, well after [0.57.0] shipped and after seven subsequent banners (0.58.0 through 0.63.0) cut. Entries belong under [Unreleased] until the next version is cut. Per Keep-a-Changelog spec, each subsection appears at most once per release. Collapsed the duplicate ### Added and ### Changed subsections that resulted from stacking two separate entries under one banner. Dropped the non-standard ### Rollback heading; folded the SPELLBOOK_USE_VANILLA_CCO=1 escape-hatch bullet into ### Changed since it documents an operator-facing rollback knob shipped alongside the fork integration. Split the SKIP_PIN rename across two subsections: the new name SPELLBOOK_SANDBOX_SKIP_PIN is documented under ### Changed (it replaces the previous variable), and the legacy SPELLBOOK_SANDBOX_SKIP_CCO_PIN deprecation is documented under ### Deprecated with the one-release removal timeline preserved. [0.57.0] retains its original pre-Phase-5 content unchanged: PR Review Bot details (### Added), README/docs harness-augmentation positioning (### Changed), and the bigfoot -> tripwire migration (### Removed).
…-phase-5 # Conflicts: # CHANGELOG.md
Phase 4.6.3 green-mirage audit surfaced 6 IMPORTANT findings on the
spellbook-cco fork integration test suite — all substring-on-known-
canonical patterns where production code is correct but the assertions
were too lax to defend against silent prose drift.
- Import _WARNING_USE_VANILLA_CCO / _WARNING_PATH_NOT_SET /
_WARNING_SKIP_FORK_PIN as the chokepoint and assert full stderr
equality (or k * canonical for double-emit codepaths) instead of
fragmented "WARNING:" + "VAR=1" substring pairs.
- Replace partial InstallResult field-by-field assertions in chain-
wiring tests with full-record equality on the dataclass.
- Add a shared events list to test_install_chains_install_spellbook_cco
to verify that install_spellbook_cco runs strictly before the per-dir
alias dispatcher, matching the docstring contract.
- Tighten lax message substrings ("cco" / "spellbook-cco") to canonical
skipped_reason strings produced by _install_claude_code_aliases.
No production code changes. 63 tests pass, ruff clean.
|
Test-tightening commit |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements Phase 5 of the security architecture, introducing nine narrowing-role subagents and integrating a hardened spellbook-cco sandbox fork for Linux and macOS. Key changes include a rewrite of the spellbook-sandbox launcher, new installer components for agent and fork management, and extensive cleanup of unused imports and expressions across the codebase. Feedback identifies a high-severity omission of the required version bump in .version and multiple violations of the repository's strict mocking policy, which forbids unittest.mock and monkeypatch.setattr in favor of the tripwire framework. Additionally, several dead expressions were found in the admin and CLI routes where calculated results are discarded.
| The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), | ||
| and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). | ||
|
|
||
| ## [Unreleased] |
There was a problem hiding this comment.
The version bump in the .version file is missing from this pull request. Every PR must include a version bump following semantic versioning. Given the scope of changes (new agents and installer components), a minor bump (0.X.0) is likely required.
References
- Every PR must include a version bump in the .version file following semantic versioning. Flag as high severity if missing. (link)
| with ( | ||
| mock.patch("generate_diagrams.get_source_diff", return_value=the_diff), | ||
| mock.patch("generate_diagrams.get_agent_client", return_value=client) as mock_get_client, | ||
| mock.patch("generate_diagrams.get_agent_client", return_value=client), |
There was a problem hiding this comment.
This uses unittest.mock.patch, which is strictly forbidden in this repository. All mocking must be performed using the tripwire framework. Rewrite this test using tripwire's three-step flow: register the mock before the sandbox, execute the code under review inside a with tripwire: block, and assert the interaction after the sandbox.
References
- Flag as high severity any use of unittest.mock in any form. tripwire is the ONLY acceptable framework. (link)
| f"got args=({sb_dir!r}, dry_run={dry_run!r})" | ||
| ) | ||
|
|
||
| monkeypatch.setattr(platform_mod, "get_platform", lambda: Platform.LINUX) |
There was a problem hiding this comment.
monkeypatch.setattr is used here to replace a function (get_platform), which is forbidden for mocking purposes. Tripwire is the required framework for mocking functions. Convert this to use tripwire.mock("spellbook.core.compat:get_platform").returns(...).
References
- monkeypatch.setattr used to replace functions, methods, class attributes, or module-level callables is NOT allowed. tripwire is required. (link)
| location without writing to ``~/.local/bin``. | ||
| """ | ||
| wrapper_dir, wrapper_path = _empty_wrapper_dir(tmp_path) | ||
| monkeypatch.setattr(spellbook_cco, "SPELLBOOK_CCO_WRAPPER_DIR", wrapper_dir) |
There was a problem hiding this comment.
monkeypatch.setattr is used to replace module-level constants for mocking, which is forbidden. Use tripwire.mock.object or tripwire.mock to manage these dependencies within the tripwire sandbox.
References
- monkeypatch.setattr used to replace functions, methods, class attributes, or module-level callables is NOT allowed. (link)
| monkeypatch.setattr( | ||
| spellbook_hook, "STOP_HARVEST_CACHE_PATH", per_test_cache, | ||
| ) |
There was a problem hiding this comment.
Replacing module-level attributes like STOP_HARVEST_CACHE_PATH using monkeypatch.setattr is forbidden for mocking. Please refactor this to use tripwire to manage the mock state.
References
- monkeypatch.setattr used to replace functions, methods, class attributes, or module-level callables is NOT allowed. (link)
| which_calls.append(name) | ||
| return "/usr/local/bin/spellbook-cco" if name == "spellbook-cco" else None | ||
|
|
||
| monkeypatch.setattr(tui.shutil, "which", which_router) |
There was a problem hiding this comment.
monkeypatch.setattr is used to replace shutil.which, which is forbidden. Use the tripwire.subprocess domain plugin (e.g., tripwire.subprocess.mock_which(...)) to handle this interaction.
References
- monkeypatch.setattr used to replace functions, methods, class attributes, or module-level callables is NOT allowed. (link)
| count_result = await session.execute(count_query) | ||
| total = count_result.scalar_one() | ||
| pages = max(1, math.ceil(total / per_page)) | ||
| max(1, math.ceil(total / per_page)) |
| ) | ||
|
|
||
| result = await asyncio.to_thread(set_config_value, key, body.value) | ||
| await asyncio.to_thread(set_config_value, key, body.value) |
| def _run_export(args: argparse.Namespace) -> None: | ||
| """Execute ``spellbook session export SESSION_ID``.""" | ||
| json_mode = getattr(args, "json", False) | ||
| getattr(args, "json", False) |
| filename = str(body.get("filename", "")) | ||
| content = str(body["content"]) | ||
| is_primary = bool(body.get("is_primary", False)) | ||
| bool(body.get("is_primary", False)) |
What does this PR do?
Ships Work Item 5 of the security architecture: 9 narrowing-role subagent definitions in
agents/plus a symlink-based discovery installer that bridges$SPELLBOOK_DIR/agents/to$CLAUDE_CONFIG_DIR/agents/, since Claude Code 2.1.x discovers agents only from the latter.Related issue
Checklist
What this branch ships
agents/:implementer,web-researcher,git-committer,git-pusher,pr-creator,pr-merger,jira-reader,jira-mutator,test-runner. Each declares a narrowingtools:list — the effective tool set is(parent_tools ∩ frontmatter_tools)— so a subagent can never gain a capability the parent does not already hold.installer/components/agents.py): idempotentinstall_agents/uninstall_agentsthat symlink$SPELLBOOK_DIR/agents/*.mdinto$CLAUDE_CONFIG_DIR/agents/. Wired intoinstaller/platforms/claude_code.pyper-config-dir loop. User-authored agent files at the target path are preserved; only spellbook-pointing symlinks (including broken ones) are removed on uninstall.tests/test_security/test_agent_frontmatter.py): enforces the canonical 5-section body contract (Purpose / Tools / Output Schema / Guardrails / Constraints) and SHA-256-snapshots the existing 7 agents to catch unintended modification.tests/installer/test_agents_symlink.py): all idempotence branches (unchanged / upgraded / skipped-user-symlink / skipped-user-regular-file / installed) and uninstall safety (only removes spellbook-pointing symlinks; preserves user files).Why the scope was reframed
Two pre-implementation verification spikes (Sec 9.1 and 9.2) corrected the original framing:
tools:in agent frontmatter is a narrowing list, not a capability granter. The original "capability escalation" framing was dead on arrival; this branch ships narrowing-role agents instead.$SPELLBOOK_DIR/agents/— discovery is rooted at$CLAUDE_CONFIG_DIR/agents/. Hence the symlink installer step.Drive-by fix (test isolation)
tests/test_hooks/test_memory_auto_store.pyhad a latent test-order bug:STOP_HARVEST_CACHE_PATHwas bound to a real~/.local/spellbook/cache/...file at import time, which let pytest tmp-dir reuse silently short-circuit_handle_stopvia the idempotency cache. Added anautouse=Trueclass fixture that rebinds the cache path per-test and forcesfeature_enabled("transcript_harvest")to False, immunizing the regex-path tests from worker-LLM config leakage.Deferred / out of scope
claudesmoke test in a freshCLAUDE_CONFIG_DIR=$(mktemp -d)— operator post-merge verification.web-researcheris authored with the correcttools:list but explicitly flagged in its body as requires WI-8 (devcontainer) before being safe to dispatch in production.code-reviewerandjustice-resolveragents intotools:-frontmatter compliance — separate cleanup task per the brief.Verification
tests/test_security/test_agent_frontmatter.py+tests/installer/test_agents_symlink.py.tools:mapping for all 9 new agents (parametrized overEXPECTED_NEW_AGENTS); byte-snapshot test guards the 7 existing agents.L4,tier classifier,tier-based,capability escalation,WI-4) and forbidden-claim scans (no over-attribution of git/gh subcommand blocks to the bash gate) both clean.