Commit a6eef7c

feat(skills): permissions-from-transcripts (WI-3) (#282)

* docs: regenerate crystallize-consolidate command page

* pr-dance: harden no-merge stop against session-level autonomy directives

  Step 5's stop-and-report-merge-ready is non-negotiable. Make it explicit that 'do the PR dance autonomously', 'yolo', 'just land it', and similar phrasings authorize the iteration loop only -- never merge, tag-push, or branch deletion.

* docs: regenerate stale skill and command pages

  Output of scripts/generate_docs.py against the current source-of-truth in skills/ and commands/. No source changes; just catches up the docs mirror with drift accumulated since the last regen.

* chore: drop em-dash prohibition rules

  Per user direction, the "no em-dashes" rule is being retired across skills, commands, and rule lists. Descriptive references (writing guides, AI-tone-detection notes) are preserved.

* feat(gates): extract transcript analyzer to library

  Move classification, bucketing, JSONL extraction, and rendering out of scripts/analyze_yolo_transcripts.py into spellbook/gates/transcript_analyzer.py so the same logic backs both the CLI script and the upcoming permissions-from-transcripts skill. The script becomes a thin CLI wrapper that re-exports library names for backward compatibility.

  Tighten classification breadth (deferred from WI-2 review):
  - Add per-subcommand 3-word runners for gh pr / gh run / gh issue / gh repo, splitting read-only forms (view, list, diff, status, checks, watch) into SEARCH_INSPECT and mutating forms (create, edit, review, ready, close, merge, reopen, rerun, cancel, delete) into MUTATING.
  - Add 4-word runners for acli jira workitem (view/list as read-only, transition/edit as mutating).
  - Leave gh api UNCLASSIFIED with a code comment: it can issue arbitrary HTTP methods (POST/PATCH/DELETE) so blanket-allowing it is unsafe.
  - Leave kubectl/docker/aws/gcloud unclassified at the runner level; their surface area is too sprawling to classify safely without per-subcommand audits.
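
A minimal sketch of the per-subcommand runner idea above, assuming a classifier that widens its lookup key from one word to three (gh) or four (acli jira workitem) words before consulting the category tables. The table contents, `resolve_key()`, and `classify()` are illustrative assumptions, not the actual `spellbook.gates.transcript_analyzer` API; only the category names and the `gh api` stance come from the commit message.

```python
import shlex

# Runners whose category depends on the words after the prefix.
# Value = how many leading words form the lookup key.
MULTI_WORD_RUNNERS = {
    ("gh", "pr"): 3,
    ("gh", "run"): 3,
    ("gh", "issue"): 3,
    ("gh", "repo"): 3,
    ("acli", "jira"): 4,  # acli jira workitem <verb>
}

# Illustrative subsets of the two categories named in the commit message.
SEARCH_INSPECT = {
    "gh pr view", "gh pr list", "gh pr diff", "gh pr status", "gh pr checks",
    "gh run view", "gh run list", "gh run watch",
    "gh issue view", "gh issue list",
    "acli jira workitem view", "acli jira workitem list",
}
MUTATING = {
    "gh pr create", "gh pr edit", "gh pr review", "gh pr ready",
    "gh pr close", "gh pr merge", "gh pr reopen",
    "gh run rerun", "gh run cancel", "gh run delete",
    "gh issue create", "gh issue edit", "gh issue close",
    "acli jira workitem transition", "acli jira workitem edit",
}


def resolve_key(command: str) -> str:
    """Widen the key to N words for multi-word runners, else use the first word."""
    words = shlex.split(command)
    width = MULTI_WORD_RUNNERS.get(tuple(words[:2]), 1)
    return " ".join(words[:width])


def classify(command: str) -> str:
    key = resolve_key(command)
    if key in SEARCH_INSPECT:
        return "search_inspect"
    if key in MUTATING:
        return "mutating"
    # `gh api` can send any HTTP method, so it is deliberately left unclassified.
    return "unclassified"
```

Because `gh api` is not a multi-word runner, its key collapses to `gh`, which appears in neither table, so it falls through as unclassified instead of being blanket-allowed.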

* feat(skills): permissions-from-transcripts

  Add a re-runnable spellbook skill at skills/permissions-from-transcripts/SKILL.md that wraps the YOLO transcript analyzer for LLM-driven permission seeding. The skill points operators at the CLI script and documents the dry-run-first workflow, the rejected_mutating invariant, and the args (--days, --include-mutating, --dry-run, --config-dir, --output).

  Add tests/test_skills/test_permissions_from_transcripts.py covering:
  - SKILL.md exists and its YAML frontmatter parses with the expected name and description trigger phrasings.
  - --dry-run skips writing the proposal JSON; non-dry-run writes it.
  - The script and skill share the SAME classification objects from spellbook.gates.transcript_analyzer (no duplicated CATEGORY_* constants in the script; static AST guard plus identity asserts).
  - Per-subcommand classification regressions for the Step 3.5 tightening: gh pr view -> search_inspect, gh pr create -> mutating, plus matching pairs for gh run, gh issue, and acli jira workitem.

* feat(gates): Read tool secret-path denylist

  Add a Phase 6d denylist that blocks the Read tool from fetching well-known secrets (SSH keys, AWS credentials, ~/.netrc, ~/.config/op, 1Password app data, browser credential stores, .env*, *.pem, *.key, id_rsa*, id_ed25519*).
  - New module spellbook/gates/secret_paths.py with structured rules (HomeSubpath, EnvSubpath, BasenameGlob) and check_secret_path() that expands ~, resolves symlinks (Path.resolve(strict=False)), and matches against the denylist.
  - Wires a new tool_name == "Read" branch into check_tool_input via _check_read_path() that appends a CRITICAL finding with rule_id READ-SECRET-NNN on match. Existing dispatch branches (Bash, spawn_claude_session, workflow_state_save, default) are untouched.
  - Parametrized test suite covering POSIX home-relative paths, tilde-expansion equivalence, project-relative globs, Windows APPDATA/LOCALAPPDATA additions, the symlink-bypass case, and negative controls (non-secret paths must remain safe).

  Symlink semantics follow Surfaced Assumption #2: comparison runs against the resolved path so a benign-looking link into ~/.aws/ still denies.

* refactor(gates): unify bash policy across Claude and Gemini paths

  Renames hooks/gemini-policy.toml -> hooks/bash-policy.toml. Adds a supplemental TOML loader to gates/rules.py that merges into the existing DANGEROUS_BASH_PATTERNS and EXFILTRATION_RULES lists at module import time. Python rules in rules.py remain the source of truth; the TOML contributes shared rules across both Claude (hook gate) and Gemini (policy engine) install paths. The Gemini installer's POLICY_SOURCE points at the renamed file and removes any stale legacy artifact at the install destination. Test files referencing the old filename string are updated.

* feat(gates): bashlex AST parser for compound commands

  Walks bashlex AST and emits deny findings for compound commands, command substitution, dangerous redirects, env-prefix escapes, shell-out flags, and direct shell invocation. Unknown AST node types fail-closed with audit-log entries.

* fix(tests): relax exact-equality assertions for layered findings

  After WI-6c merged supplemental SB-BASH-* rules into DANGEROUS_BASH_PATTERNS and WI-6a added the bashlex parser layer, the same dangerous command can surface multiple findings (one per layer). Switch test_dangerous_bash_command_is_blocked, test_sudo_is_blocked, and test_blocked_output_is_valid_json to containment assertions that confirm the original layer still fires while tolerating additional findings from defense-in-depth layers.
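
The Read-tool denylist described above resolves the requested path before matching, so a symlink is judged by its target. A minimal sketch, assuming simplified `HomeSubpath` and `BasenameGlob` rules (the class names come from the commit message; their fields, the rule subset, and this `check_secret_path()` signature with its `home` override are assumptions) and omitting the Windows `EnvSubpath` rules:

```python
import fnmatch
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class HomeSubpath:
    """Deny the home-relative path itself and everything beneath it."""
    subpath: str


@dataclass(frozen=True)
class BasenameGlob:
    """Deny any file whose name (not full path) matches the glob."""
    pattern: str


# Hypothetical subset of the denylist; the real module has more rules.
RULES: list = [
    HomeSubpath(".ssh"),
    HomeSubpath(".aws"),
    HomeSubpath(".netrc"),
    HomeSubpath(".config/op"),
    BasenameGlob(".env"),
    BasenameGlob(".env.*"),
    BasenameGlob("*.pem"),
    BasenameGlob("*.key"),
    BasenameGlob("id_rsa*"),
    BasenameGlob("id_ed25519*"),
]


def check_secret_path(file_path: str, home: Optional[Path] = None) -> bool:
    """Return True if file_path, after ~ expansion and symlink resolution, is denied."""
    home_dir = (home or Path.home()).resolve()
    # Resolve symlinks so a benign-looking link into ~/.aws/ is judged by its target.
    resolved = Path(os.path.expanduser(file_path)).resolve(strict=False)
    for rule in RULES:
        if isinstance(rule, HomeSubpath):
            root = home_dir / rule.subpath
            if resolved == root or root in resolved.parents:
                return True
        elif fnmatch.fnmatch(resolved.name, rule.pattern):
            return True
    return False
```

Resolving the home directory as well keeps both sides of the comparison in the same form on systems where the temp or home directory itself sits behind a symlink (for example `/var` on macOS).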

* feat(gates): reversibility tier classifier and tiers.toml

  Add the reversibility-tier classifier (WI-6b sub-phase a):
  - spellbook/gates/tiers.py: TierRecord dataclass with hand-rolled schema validator (TOML allows unknown keys; tomllib does not reject them, so validation lives at the application layer). load_tiers() rejects unknown keys, missing required fields, and non-{T0,T1,T2,T3} tiers. classify_tool_call() returns the highest matching tier (deny > ask > loud-allow > silent-allow > unclassified) so a T3 cannot be diluted by an overlapping T0. tier_to_verdict() maps to allow/ask/deny.
  - spellbook/gates/tiers.toml: 20 seed records covering the four tiers and capability + MCP tools per design Sec 6.4.
  - tests/gates/test_tiers.py: schema validation, classification, and tier->verdict tests (27 cases passing).

  Sub-phase (b) projection and (c) installer derivation arrive in follow-up commits on this branch.

* feat(installer): derive L2 permissions from tier projection

  Wire the WI-6b L2 deny derivation into the Claude Code installer so the seeded T3 tier records project into settings.json permissions.deny at install time:
  - installer/components/permissions.derive_managed_deny(spellbook_dir): reads spellbook/gates/tiers.toml and returns the flat list of T3 projections via spellbook.gates.tiers.derive_l2_deny_list. Imports the gates layer lazily so the installer hot path does not pull in bashlex.
  - installer/platforms/claude_code.ClaudeCodeInstaller.install(): replaces the prior deny=None placeholder with deny=derive_managed_deny(...). Idempotency, ownership tracking, and conflict handling are preserved by the existing managed-permissions-state machinery in install_permissions.
  - tests/installer/test_l2_derivation.py: 7 cases covering unit-level projection, settings.json integration, install idempotency, T3-record removal on re-install, unprojectable-record graceful skip, and the end-to-end ClaudeCodeInstaller call site.
  - spellbook/gates/tiers.py: demote the missing-tiers.toml log from warning to debug. In normal operation tiers.toml ships in-tree; a missing file in test fixtures and partial installs is benign and noisy at warning level (also tripped tripwire's LoggingPlugin in the installer-wiring tests).

* feat(gates): wire tier classifier into PreToolUse hook

  Wire the WI-6b tier classifier into spellbook.gates.check.check_tool_input so the in-process gate produces TIER-DENY (T3) and TIER-ASK (T2) findings alongside the existing layers.

  Layer order for the Bash branch (defense in depth, cheapest first):
  1. L4 bashlex AST parser: compound commands, command sub, redirects, env-prefix, shell-out, direct shell, wrapper-strip.
  2. L3 tier classifier (NEW): emits TIER-DENY for T3 records and TIER-ASK for T2. Reuses spellbook/gates/tiers.toml so the in-process policy is the runtime mirror of the L2 settings.json deny list.
  3. L2 DANGEROUS_BASH_PATTERNS regex: kept for defense in depth.
  4. EXFILTRATION_RULES regex.

  Non-Bash tools (capability tools and MCP) also run through the tier classifier so denies on tools like mcp__github__delete_* take effect even when the L2 settings.json rules are still loading.

  The classifier reads tiers.toml once via lru_cache; the daemon-resident hook avoids re-reading on every Bash call. Errors loading the seed file fall back to an empty record set so a malformed seed cannot brick the gate.

  Test updates:
  - tests/test_security/test_check.py: add 5 cases for tier classifier integration covering T0/T2/T3 emission, unclassified fall-through, and parser-then-tier ordering.
  - tests/test_security/test_hooks_windows.py: relax two strict-equality assertions that were predicated on the prior single-finding output; the gate now legitimately emits multiple findings (tier + regex) for inputs that match more than one layer.
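
The "highest matching tier wins" rule from the tier classifier bullets can be sketched as below. This assumes plain literal-prefix records and a three-record table invented for illustration; `classify_tool_call()` and `tier_to_verdict()` borrow their names from the commit message, but these signatures are simplified assumptions rather than the `spellbook/gates/tiers.py` API, and the records are not the shipped seed.

```python
from dataclasses import dataclass
from typing import Optional

# Higher rank wins: T3 (deny) > T2 (ask) > T1 (loud-allow) > T0 (silent-allow).
TIER_RANK = {"T0": 0, "T1": 1, "T2": 2, "T3": 3}
TIER_VERDICT = {"T0": "allow", "T1": "allow", "T2": "ask", "T3": "deny"}


@dataclass(frozen=True)
class TierRecord:
    tool: str
    prefix: str  # literal command prefix for Bash records
    tier: str


# Hypothetical records for illustration only; not the shipped tiers.toml seed.
RECORDS = [
    TierRecord("Bash", "git status", "T0"),
    TierRecord("Bash", "git push", "T2"),
    TierRecord("Bash", "git push --force", "T3"),
]


def _matches(record: TierRecord, tool: str, command: str) -> bool:
    # Match on whole words so "git pushy" does not match the "git push" prefix.
    return record.tool == tool and (
        command == record.prefix or command.startswith(record.prefix + " ")
    )


def classify_tool_call(tool: str, command: str) -> Optional[str]:
    """Return the highest matching tier, or None when no record matches."""
    tiers = [r.tier for r in RECORDS if _matches(r, tool, command)]
    return max(tiers, key=TIER_RANK.__getitem__, default=None)


def tier_to_verdict(tier: Optional[str]) -> Optional[str]:
    """Map a tier to allow/ask/deny; None (unclassified) falls through."""
    return TIER_VERDICT[tier] if tier is not None else None
```

Because `max()` ranks by tier, `git push --force origin main` lands on T3 even though the T2 `git push` record also matches, which is why an overlapping lower-tier record cannot dilute a T3 match.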

* release: 0.62.0

  WI-3 permissions-from-transcripts skill, WI-6a bashlex AST parser, WI-6b reversibility tier classifier with L2 derivation, WI-6c bash policy unified across Claude and Gemini paths, WI-6d Read tool secret-path denylist.

* fix(gates): close path-bypass + duration + audit-lock review findings

  Addresses five findings from a second-pass code review on the bash gate.

  Security-high (path bypass via absolute shell paths):
  - bash_parser._classify_command now normalizes the command head with os.path.basename before matching against _DIRECT_SHELL_COMMANDS, _SHELL_BINS_NEED_DASH_C, _WRAPPER_COMMANDS, and _WRAPPER_ALWAYS_FLAG. Without this, /bin/sh -c "..." and /usr/bin/bash -c "..." silently bypassed DIRECT-SHELL detection.
  - bash_parser._detect_shellout's xargs branch now basenames each token before the shell-bin check, so xargs /bin/sh -c is caught the same as xargs sh -c.
  - The wrapper-stripping branch also basenames the wrapped head so /usr/bin/timeout 5 /bin/sh -c "..." cannot bypass DIRECT-SHELL via the wrapper path either.

  Medium (timeout duration suffixes):
  - _strip_wrapper_args previously only stripped 's' and 'm' suffixes, so timeout 1h rm -rf / and timeout 2d rm -rf / were not recognized as wrappers, and the wrapped dangerous command was missed. The duration parser now strips the full smhd suffix set per the timeout(1) spec.

  Medium (blocking audit lock in hot gate path):
  - _append_audit previously took the audit-log lock with blocking=True, letting any stalled lock-holder hang the security gate. The function now uses non-blocking acquisition with one short-backoff retry; on second failure it drops the audit entry with a stderr warning. The security verdict is independent of audit-log success.

  Medium (alternation regex non-nesting):
  - tiers._ALTERNATION_RE only handles flat alternation groups; the seed tiers.toml uses only such groups (e.g. (master|main)).
    Added a comment block documenting the limitation and the upgrade path if nested patterns are ever needed.

  Tests:
  - Extended REJECT_CASES with absolute-path bypass attempts (sh, bash, xargs sh-out, timeout sh-out) and timeout h/d wrapper cases.
  - Added a parametrized negative test confirming all four duration suffixes (s/m/h/d) still allow the wrapper around a safe inner command (git status).

  Lock-contention behaviour is not exercised by a deterministic test; exercising it would require multi-process orchestration that is flaky in CI. The change is small, the tested code path uses the same LockHeldError contract as elsewhere in CrossPlatformLock, and any regression would surface as a stderr warning rather than a wrong verdict.

* test(gates,installer): convert monkeypatch.setattr to tripwire mocks

  Replace ``monkeypatch.setattr`` for module-level state in two integration test files with ``tripwire.mock`` of the corresponding callable getter, per ``.gemini/styleguide.md``.

  Add a ``_audit_log_path()`` getter in ``spellbook/gates/bash_parser.py`` so the bash-parser tests can mock the audit-log path the same way; the constant ``_AUDIT_LOG_PATH`` stays as the default value the getter returns.

  For the L2-derivation tests, queue the exact number of returns each test will consume (5 calls per ``install_permissions``; 9 for the end-to-end ClaudeCodeInstaller path) and assert each interaction in any order, since ``tripwire`` requires every recorded call inside a sandbox to be verified. ``test_unprojectable_record_warns_does_not_fail`` drops the state mock entirely because ``derive_managed_deny`` only reads ``tiers.toml``.

* fix(gates): observe silent excepts and replace pop(0) with indexed iteration

  The bash-parser had three ``except Exception: pass`` blocks that swallowed audit-log diagnostic failures without leaving any trace.
  Replace them with ``logger.warning`` calls that record the failing operation and exception type/message, while keeping the swallow semantics: the gate verdict must still be independent of audit-log success.

  Also replace the ``list.pop(0)`` consume-from-front pattern in ``_strip_wrapper_args`` with a ``collections.deque`` and ``popleft()`` so the helper stays O(n) on long wrapper invocations like ``env A=1 B=2 C=3 ... cmd`` instead of degrading to O(n^2).

* test(installer): assert powershell probe on Windows in claude_code installer test

  The L2 derivation end-to-end test wraps inst.install() in a tripwire sandbox. On Windows, hooks.install_hooks calls shutil.which('powershell') as a PowerShell-availability probe; tripwire's SubprocessPlugin intercepts that call and requires every interception to be asserted explicitly. Add a platform-conditional assert_which so the Windows job stops failing with UnassertedInteractionsError.

* fix(gates): cycle-4 review nits -- tomllib shim + import + constant cleanups

  - spellbook/gates/check.py: drop redundant function-level Path import (already imported at top of module)
  - spellbook/gates/tiers.py: add tomllib shim for Python <3.11 (project supports 3.10) matching the pattern in spellbook/gates/rules.py
  - spellbook/gates/tiers.py: collapse the double assignment of _REQUIRED_KEYS to a single explicit frozenset; keep the rationale as a comment instead of computing-then-overriding

* fix(gates): cycle-5 security hardening -- bash_parser env, redirects, walker, shell-out

  Gemini cycle-5 findings on the bashlex L4 gate. Each fix is paired with a regression test in tests/gates/test_bash_parser.py.

  H1: _ENV_PREFIX_DENY missed shell-startup-sourced and language-library-path env vars. Added BASH_ENV (non-interactive bash), ENV (sh/dash/ksh), PYTHONPATH, PERL5LIB, RUBYLIB. All five are env-prefix injection vectors that hijack interpreter startup or module loading.

  H2: _walk recursion gaps allowed CMDSUB bypass.
  The walker did not recurse into RedirectNode.output, AssignmentNode.parts, or top-level WordNode.parts, so command substitutions buried in those positions were silently dropped. The walker also did not enter CompoundNode.list, which silently skipped the entire body of every if/for/while/until block. Both gaps are closed; the walker now classifies CMDSUB inside redirect targets, assignment values, and word fragments. Control-flow constructs (if/for/while/until/case) now emit BASH-PARSER-COMPOUND so the operator must split or opt in via the env escape hatch.

  H3: git config keys are case-insensitive. The shell-out detector for `git -c core.pager=...` matched the original-cased KEY, so `git -c Core.Pager=evil log` slipped through. The detector now lower-cases only the KEY half (the VALUE may legitimately be case-sensitive: paths, commands, etc.).

  M1: added /proc/ and /var/spool/cron/ to _REDIRECT_DENY_PREFIXES. Writes to /proc/sys/* reconfigure the running kernel; writes to /var/spool/cron/* install jobs that run as the file owner.

  M2: added chgrp and setfacl to _DANGEROUS_BARE_COMMANDS so wrapper concealment (timeout 5 chgrp staff /etc) is blocked.

  M3: added tcsh and csh to _DIRECT_SHELL_COMMANDS and _SHELL_BINS_NEED_DASH_C.

  M4: added until and case to _KNOWN_NODE_KINDS so they no longer trigger the UNKNOWN-NODE fail-closed path. case still fails closed via PARSE-ERROR (bashlex does not implement the case pattern token); until now produces COMPOUND with full body recursion.

  M5: find -ok and -okdir are now treated as shell-out. They are interactive variants of -exec, but autonomous agents auto-confirm, so the threat model is identical.

  M6: vim/vi shell-out via the + startup-command flag and --cmd flag. Existing -c '!cmd' detection preserved.

* fix(transcript_analyzer): mutating git subcommands no longer short-circuit safe

  Gemini cycle-5 H4.
  The transcript classifier collapses ``git worktree`` and ``git branch`` to the 2-word first-token via MULTI_WORD_RUNNERS, then matches the bare 2-word key against READ_ONLY_SAFE, which silently classifies every subcommand and flag combination as read-only-safe. That is correct for ``git worktree list`` and ``git branch`` (no args), but ``git worktree add /tmp/x`` mutates the repo state and ``git branch -d feature`` deletes a local branch.

  The first-token resolver now special-cases these two runners:
  * ``git worktree <verb>`` where verb is add/remove/move/prune/repair/unlock/lock: produces a 3-word key (``git worktree add``, etc.) that classifies as local_git_mutation.
  * ``git worktree list`` (and any flag-only invocation): still resolves to the 2-word safe key.
  * ``git branch -d/-D/-m/-M/-c/-C`` (and the long forms --delete/--move/--copy): collapsed to the canonical mutating key ``git branch -d`` so we don't have to enumerate every flag spelling in the classification table.
  * ``git branch <newname>`` (a non-flag positional): also mapped to ``git branch -d`` since it would create a branch.
  * ``git branch`` (no args) and ``git branch --list/-a/-v/...``: resolve to the safe 2-word key.

  LOCAL_GIT_MUTATION grew the corresponding set of canonical keys.

  Tests: 23 new parametrized cases in test_permissions_from_transcripts.py cover the safe-vs-mutating boundary for both runners.

* fix(gates): cycle-6 hardening -- bg-command bypass, redirect traversal, regex tier records

  Three security findings from Gemini cycle 6:

  F1 (HIGH): bashlex parses `ls & pwd` as a ListNode whose operators == ["&"] but contains TWO command parts. The "single bg command" short-circuit was counting operators only, letting the trailing command slip through as if the whole expression were one backgrounded call. Tighten the predicate to require exactly one command part before short-circuiting; otherwise emit BASH-PARSER-COMPOUND so the chained command is blocked.

  F2 (HIGH): the redirect-target deny check used `startswith` on the raw target, so traversal targets like `/tmp/../etc/passwd` and `~/../../etc/shadow` bypassed the prefix list. Build a candidate set {raw, expanduser+normpath, expanduser+resolve} and check the deny list against each. The lexical `normpath` candidate is portable across OSes where the deny prefix itself is a symlink (macOS `/etc` → `/private/etc`, which `Path.resolve()` rewrites and would otherwise miss); the resolved candidate is a belt-and-suspenders catch; the raw candidate preserves matching for logical paths like `/dev/tcp/...` that don't survive `resolve()`.

  F5 (MEDIUM): `_expand_alternations` returns [] for patterns it cannot project to a literal prefix (regex classes, quantifiers, escapes). A T3 record with such a pattern would silently fail to match at hook-time AND fail to project to an L2 deny string, leaving it broken in both layers without any operator signal. Drop unprojectable Bash records at parse time with a loud warning, mirroring the existing "not projectable" warning path used by `tier_record_to_deny_pattern`.

* test(gates): replace hand-rolled stubs and unmocked I/O with tripwire mocks

  Two test-quality findings from Gemini cycle 6:

  F3 (HIGH): `test_unknown_node_kind_denies_and_writes_audit_log` and `test_unknown_node_allowed_via_env_escape_hatch` exercised `_classify_node` inside `with tripwire:` while `_append_audit` performed real file I/O (the audit log); that file write was not pre-authorized by the sandbox. Mock `_append_audit` itself via tripwire and capture the record dict via `.calls(fn)` instead. The behavior under test (fail-closed deny + audit-record emission with the correct verdict / reason / layer / node_type) is asserted from the captured records; the file-I/O behavior of `_append_audit` is covered by the parser path independently.

  F4 (HIGH): `_SyntheticUnknownNode` was a hand-rolled stub class, which the project style guide forbids.
  Replace with a real bashlex node parsed from a benign command (`echo hello`) whose `.kind` attribute is mutated to an unknown sentinel value. bashlex nodes are plain Python objects (no `__slots__`), so the mutation is supported. The node retains its real `.parts` / `.pos`, so the parser sees a structurally-valid node and the only deviation under test is the unknown `kind`.

  Also adds reject-list rows for the cycle-6 F1 / F2 attack patterns and a new `test_load_tiers_drops_unprojectable_bash_regex_pattern_with_warning` that locks the F5 "drop with warning" loader behavior in place, both at the loader and via the classifier, so a dropped pattern cannot silently match a command at hook-time.

* fix(gates): redirect-target prefix match works on Windows

  os.path.normpath and Path.resolve emit backslashes on Windows while the deny prefixes are POSIX-style ("/etc/", "/proc/", ...). The cycle-6 fix worked on macOS/Linux, but the three new traversal regression tests (redirect_etc_traversal, redirect_etc_tilde_traversal, redirect_proc_traversal) failed on Windows because no candidate started with a forward-slash prefix. Fold each path candidate to forward slashes before matching. Shell redirection is a POSIX construct regardless of host OS, so normalizing slash direction is a property of the gate itself, not the platform.

* fix(gates): strip drive letter so tilde-traversal redirect matches on Windows

  The previous fix folded backslashes to forward slashes but left drive letters intact. ~/../../etc/shadow on Windows expands to a real home dir, then normpath collapses it to C:\etc\shadow → after the slash fold, C:/etc/shadow, which does not start with /etc/, so the prefix match missed. Also emit a drive-stripped candidate via os.path.splitdrive so the deny prefix match catches tilde-traversal regardless of where the home dir ends up after expansion.
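
The cycle-6 and Windows redirect fixes above build several spellings of a redirect target and deny if any of them starts with a deny prefix. A minimal sketch of that candidate set, assuming only the four prefixes shown (taken from the commit message) and hypothetical helper names `redirect_candidates()` and `is_denied_redirect()`; it is not the `bash_parser` implementation:

```python
import os
from pathlib import Path

# Subset of deny prefixes named in the commit message.
REDIRECT_DENY_PREFIXES = ("/etc/", "/proc/", "/var/spool/cron/", "/dev/tcp/")


def _fold(path: str) -> set:
    """Fold backslashes to '/' and add a drive-stripped variant (C:/etc -> /etc)."""
    _drive, rest = os.path.splitdrive(path)
    return {path.replace("\\", "/"), rest.replace("\\", "/")}


def redirect_candidates(target: str) -> set:
    expanded = os.path.expanduser(target)
    candidates = _fold(target)  # raw: keeps logical paths such as /dev/tcp/...
    candidates |= _fold(os.path.normpath(expanded))  # lexical: collapses ../
    candidates |= _fold(str(Path(expanded).resolve()))  # resolved: follows symlinks
    return candidates


def is_denied_redirect(target: str) -> bool:
    return any(
        candidate.startswith(prefix)
        for candidate in redirect_candidates(target)
        for prefix in REDIRECT_DENY_PREFIXES
    )
```

On macOS, `Path.resolve()` turns `/tmp/../etc/passwd` into a path under `/private`, so the lexical `normpath` candidate is the one that catches it; on Windows, the drive-stripped variant is what lets `C:\etc\shadow` compare against `/etc/`.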

* fix(gates): cycle-7 hardening -- timeout/env wrapper flag-with-arg bypass, env-prefix expansion, bare-dir redirect

  F1 (SECURITY-HIGH): timeout wrapper flag-with-arg bypass. The pre-fix _strip_wrapper_args used a flag-blind drop-leading-dashes loop, which let timeout flags that take a SEPARATE argv slot (-s SIGNAL, -k DURATION) push the dangerous head past the scan. timeout -s KILL 5 rm -rf / was consumed as flag -s, positional KILL (head), positional 5, leaving rm -rf / missed entirely. Fix introduces an explicit per-flag table for timeout (and env, F2) that consumes flag+arg pairs correctly.

  F2 (SECURITY-HIGH): env wrapper flag-with-arg bypass. Same shape as F1. env -u/-C/-S take separate args; -i/--ignore-environment/-0/--null are no-arg; KEY=VALUE pairs are env-prefixes (consumed but not flags). The remaining argv is the wrapped command.

  F3 (MEDIUM): redirect denylist trailing-slash gap. _REDIRECT_DENY_PREFIXES entries end with /, so /etc/ matched /etc/passwd but not /etc itself. Fix matches the bare-directory form via candidate == prefix.rstrip("/").

  F5 (MEDIUM): expand _ENV_PREFIX_DENY with NODE_PATH, PYTHONINSPECT, PYTHONBREAKPOINT, JAVA_TOOL_OPTIONS, NODE_OPTIONS; each loads attacker-controlled code at process start (or post-execution for PYTHONINSPECT).

  Tests: REJECT_CASES rows for each of F1/F2/F3/F5 plus ALLOWED_CASES rows proving legitimate timeout -s KILL / env -u / env FOO=bar wrappers still pass through to the wrapped command's existing classification.

* fix(transcript_analyzer): git branch --list with pattern is read-only

  F4 (MEDIUM): the cycle-5 fix made `git branch` mutating whenever any non-flag positional appeared. But `git branch --list 'feat/*'` and the short form `git branch -l 'feat/*'` are read-only: the positional is a glob filter passed to the listing, NOT a new branch name to create. Fix: when --list or -l is present in the rest, the remaining positionals are filter patterns and the command is read_only_safe.
  Without --list/-l, a non-flag positional still implies branch creation (or branch-from-ref), both of which mutate.

  Tests added in test_permissions_from_transcripts.py:
  - git branch --list 'feat/*' -> read_only_safe
  - git branch -l 'feat/*' -> read_only_safe
  - git branch --list feat/foo -> read_only_safe

  The existing rows for `git branch newname` (mutating), `git branch -d feat` (mutating), and `git branch` (read-only) continue to pass.

* fix(gates,transcript_analyzer): cycle-8 -- narrow /usr/ deny + git branch read-only flag args

* fix(installer,gates): observe silent except blocks per styleguide

* fix(gates): cycle-9 -- top-level imports, logger over print, tighter projection check

  - spellbook/gates/bash_parser.py: hoist 'import bashlex' to module top with try/except → bashlex=None sentinel; the parse_and_check guard preserves the fail-closed behavior when the dep is missing.
  - spellbook/gates/rules.py: drop function-level 'import sys' and print(file=sys.stderr); use module-level logger.warning instead.
  - spellbook/gates/tiers.py: tighten _is_projectable_bash_pattern to reject expansions that still contain '(' or ')'. _expand_alternations returns the original pattern verbatim for nested groups / single-choice forms; those are not valid literal prefixes for the L2 deny matcher and must be skipped at load time, same as regex-class patterns.

---------

Co-authored-by: elijahr <153711+elijahr@users.noreply.github.com>
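
The timeout/env wrapper fixes across these commits (basename normalization, the s/m/h/d duration suffixes, the deque-based consumer, and the per-flag argument tables) combine into a small argv unwrapper. A minimal sketch, assuming flag tables built from the short flags named in the commit message plus their GNU coreutils long spellings (the long forms are an assumption here); `unwrap()` is a hypothetical name and does not reproduce `_strip_wrapper_args` exactly:

```python
import os
import re
from collections import deque

# Flags whose argument occupies the NEXT argv slot.
FLAGS_WITH_ARG = {
    "timeout": {"-s", "--signal", "-k", "--kill-after"},
    "env": {"-u", "--unset", "-C", "--chdir", "-S", "--split-string"},
}
DURATION_RE = re.compile(r"^\d+(\.\d+)?[smhd]?$")  # timeout(1) accepts s/m/h/d
ENV_ASSIGN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def unwrap(argv: list) -> list:
    """Strip one leading timeout/env wrapper and return the wrapped argv."""
    if not argv:
        return argv
    words = deque(argv)  # popleft() is O(1), unlike list.pop(0)
    head = os.path.basename(words.popleft())  # /usr/bin/timeout -> timeout
    if head not in FLAGS_WITH_ARG:
        return argv
    takes_arg = FLAGS_WITH_ARG[head]
    while words and words[0].startswith("-"):
        flag = words.popleft()
        if flag in takes_arg and words:
            words.popleft()  # the flag's separate argument, e.g. KILL in -s KILL
    if head == "timeout":
        if words and DURATION_RE.match(words[0]):
            words.popleft()  # the DURATION positional
    else:
        while words and ENV_ASSIGN_RE.match(words[0]):
            words.popleft()  # KEY=VALUE prefixes belong to env, not the command
    return list(words)
```

Because the loop consumes `-s KILL` as a pair, `5` is correctly read as the duration and `rm -rf /` surfaces as the wrapped command, which is the F1 bypass the cycle-7 fix closes.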
1 parent 16c0e08 commit a6eef7c

44 files changed

Lines changed: 6473 additions & 640 deletions


.version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.61.0
+0.62.0

AGENTS.spellbook.md

Lines changed: 0 additions & 1 deletion
@@ -306,7 +306,6 @@ Load `enforcing-code-quality` skill for full standards and checklist.
 - Be direct and professional in documentation, README, and comments
 - Make every word count
 - No chummy or silly tone
-- Never use em-dashes in copy, comments, or messages

 ## Testing

CHANGELOG.md

Lines changed: 66 additions & 0 deletions
@@ -5,6 +5,72 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.62.0] - 2026-05-06
+
+### Added
+
+- **`permissions-from-transcripts` skill** (`skills/permissions-from-transcripts/SKILL.md`).
+Re-runnable workflow that wraps the WI-2 YOLO transcript analyzer.
+Backed by a shared library at `spellbook/gates/transcript_analyzer.py`
+so the skill, the script, and any future caller share one classification
+pipeline. The script `scripts/analyze_yolo_transcripts.py` is now a
+thin CLI wrapper.
+- **bashlex AST parser** (`spellbook/gates/bash_parser.py`). Walks a
+bashlex parse tree and emits deny findings for compound commands,
+command substitution, dangerous redirects (e.g. `> /dev/tcp/...`,
+`>> /etc/...`, `> /root/.ssh/authorized_keys`), env-prefix escapes
+(`GIT_PAGER=...`, `GIT_EXTERNAL_DIFF=...`, `PAGER=...`), shell-out
+flags (`find -exec`, `xargs sh -c`, `git -c core.pager=`,
+`git -c alias.X=!`), direct shell invocation (`eval`, `sh -c`,
+`bash -c`), and wrapper-stripping bypasses (`timeout`, `nohup`,
+`npx`, `mise`, `docker run`). Unknown AST node types fail closed
+with audit-log entries under `~/.local/spellbook/logs/audit.jsonl`.
+Wired into the PreToolUse hook before the legacy regex layers.
+Adds a `bashlex>=0.18` runtime dependency.
+- **Reversibility tier classifier and L2 permissions derivation**
+(`spellbook/gates/tiers.py`, `spellbook/gates/tiers.toml`).
+Maps each tool call to a reversibility tier (T0 silent / T1 loud /
+T2 ask / T3 deny / T_UNCLASSIFIED) using a TOML-seeded record table.
+The same tier projection produces an L2 deny list installed into
+`settings.json` by the installer; the in-process gate is the runtime
+mirror of that policy. Emits TIER-DENY (CRITICAL) and TIER-ASK
+(HIGH) findings; T_UNCLASSIFIED falls through to the regex layer.
+Initial seed covers 20 records spanning bash literals, alternation
+expansions, MCP tool patterns, and capability tools.
+- **Read-tool secret-path denylist** (`spellbook/gates/secret_paths.py`).
+Blocks reads of `~/.ssh/*`, `~/.aws/*`, `~/.config/op/*`, `~/.netrc`,
+macOS browser credential stores, Windows %APPDATA%/1Password and
+Chrome user data, plus basename globs `.env`, `.env.*`, `*.pem`,
+`*.key`, `id_rsa[.*]`, `id_ed25519[.*]`. Resolves the file_path
+through symlinks before matching so a symlink at a "safe" path that
+points into `~/.ssh` is still denied.
+
+### Changed
+
+- **Bash policy unified across Claude and Gemini paths.** Renamed
+`hooks/gemini-policy.toml` to `hooks/bash-policy.toml`. Added a TOML
+loader to `spellbook/gates/rules.py` so the Claude path picks up the
+supplemental SB-BASH-* rules previously only consumed by the Gemini
+installer. Old filename is preserved as a migration alias for one
+release. SB-BASH-001..009 ship as additional defense-in-depth findings
+on top of the existing BASH-* / EXF-* regex set.
+- **PR dance command** (`commands/pr-dance.md`) now explicitly refuses
+to merge under any session-level autonomy directive ("yolo", "do the
+PR dance autonomously", "just land it", "go go go"). Autonomy scopes
+commit / push / comment / re-request-review only. Tag-push and
+branch deletion are also out of scope until explicitly requested.
+- **`crystallize-consolidate` documentation page** added under
+`docs/commands/` (`scripts/generate_docs.py` output that had not been
+committed). Stale skill and command doc pages regenerated in the
+same pass.
+
+### Removed
+
+- **No-em-dashes prohibition rule.** Removed from `AGENTS.spellbook.md`
+(and from the global `~/.claude/CLAUDE.md` outside the repo).
+Descriptive references to em-dashes in writing guides and AI-tone
+detection notes are preserved.
+
 ## [0.61.0] - 2026-05-06

 ### Added
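
The env-prefix escapes listed in the changelog entry come down to checking the `NAME=value` words that precede the command word. A minimal sketch over an already-split argv, assuming a deny set assembled from the variables named in the changelog and the commit message; `find_env_prefix_escapes()` is a hypothetical helper, and the real check runs on the bashlex AST rather than a flat list:

```python
import re

# Variables named in the changelog entry and the cycle-5 / cycle-7 commits.
ENV_PREFIX_DENY = frozenset({
    "GIT_PAGER", "GIT_EXTERNAL_DIFF", "PAGER",
    "BASH_ENV", "ENV", "PYTHONPATH", "PERL5LIB", "RUBYLIB",
    "NODE_PATH", "NODE_OPTIONS", "PYTHONINSPECT", "PYTHONBREAKPOINT",
    "JAVA_TOOL_OPTIONS",
})
ASSIGNMENT_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=")


def find_env_prefix_escapes(argv: list) -> list:
    """Return denied variable names assigned before the command word."""
    hits = []
    for word in argv:
        match = ASSIGNMENT_RE.match(word)
        if match is None:
            break  # the first non-assignment word is the command itself
        if match.group(1) in ENV_PREFIX_DENY:
            hits.append(match.group(1))
    return hits
```

Stopping at the first non-assignment word matters: `git log PAGER=less` passes `PAGER=less` as an argument to git rather than setting it in git's environment, so it is not an env-prefix escape.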

commands/pr-dance.md

Lines changed: 13 additions & 2 deletions
@@ -15,7 +15,7 @@ PR Shepherd. Your reputation depends on PRs that reach merge-ready state without
 1. **Bot config is user-specified, not assumed**: Never guess which bot to use. Read config or ask.
 2. **Fix locally first**: Prefer `act` for CI failures over blind push-and-pray. Not all workflows are reproducible locally, but standard test/lint failures save significant round-trip time.
 3. **Address ALL findings per cycle**: Partial fixes waste a review cycle. Batch all bot findings before pushing.
-4. **Never merge automatically**: Report merge-ready status. User decides when to merge.
+4. **Never merge automatically**: Report merge-ready status. User decides when to merge. This rule is absolute and is NOT overridden by any session-level autonomy directive — including but not limited to "do the PR dance autonomously", "yolo", "just land it", "pick it up where we left off", "stop asking", "go go go", or any equivalent phrasing. "Autonomous" scopes commit / push / comment / iterate / re-request-review. It does not scope merge, tag-push, branch deletion, or any other destructive or visible-to-others action listed in the global git-safety rules. If the user wants you to merge, they will say "merge it" (or equivalent imperative) AFTER you report merge-ready status. Until then: stop at Step 5.
 5. **Version bump before first cycle**: If the project uses semver (pyproject.toml, package.json), ensure version bump + CHANGELOG entry exist before the first review cycle. This prevents wasting a cycle on a finding the bot will always flag.

 ## Inputs
@@ -157,6 +157,14 @@ Ready to merge when you are.

 STOP. Do not merge. Report status and let the user decide.

+<CRITICAL>
+This stop point is non-negotiable. Even if the user said "do the PR dance autonomously", "yolo", "just land it", or anything similar at the start of the session, you stop here. Those phrases authorize the iteration loop (commit / push / comment / re-request-review). They do NOT authorize merge.
+
+If you find yourself constructing an argument that "the user clearly wants this merged" or "they said autonomous" or "the dance includes the merge" — STOP. That is the rationalization the global git-safety rule warns about. The user will type "merge it" when they are ready. Until then, the answer is always "ready when you are".
+
+Tag-push (`git push origin vX.Y.Z`) and branch deletion are also out of scope until explicitly requested.
+</CRITICAL>
+
 **If either condition not met:** Return to the appropriate step (Step 3 for CI, Step 4 for bot).

 ## Output
@@ -169,7 +177,10 @@ PR Dance complete.
 ```

 <FORBIDDEN>
-- Merging the PR (user decides when to merge)
+- Merging the PR (user decides when to merge — no session-level autonomy directive overrides this)
+- Pushing tags (`git push origin vX.Y.Z`) or creating GitHub releases without explicit request
+- Deleting the feature branch after merge without explicit request
+- Treating "do the PR dance autonomously" / "yolo" / "just land it" as merge authorization
 - Hardcoding or assuming a bot name without reading config
 - Pushing without addressing ALL bot findings in the current cycle
 - Skipping `act` for reproducible CI failures and going straight to blind push
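The autonomy rule added in these hunks behaves like a small decision table: a directive such as "yolo" covers the iteration loop and nothing else. The Python below is a minimal sketch of that table under stated assumptions, not code from the spellbook repository. `Action`, `ITERATION_LOOP`, and `may_perform` are hypothetical names, and the sketch assumes an autonomy directive is already in effect for the session.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical set of actions an agent might take during a PR dance."""
    COMMIT = "commit"
    PUSH = "push"
    COMMENT = "comment"
    ITERATE = "iterate"
    REREQUEST_REVIEW = "re-request-review"
    MERGE = "merge"
    TAG_PUSH = "tag-push"
    DELETE_BRANCH = "delete-branch"


# The only actions a session-level autonomy directive covers.
ITERATION_LOOP = frozenset({
    Action.COMMIT,
    Action.PUSH,
    Action.COMMENT,
    Action.ITERATE,
    Action.REREQUEST_REVIEW,
})


def may_perform(action: Action, explicit_requests: set, merge_ready_reported: bool) -> bool:
    """Decide whether `action` is allowed, assuming an autonomy directive is in effect.

    `explicit_requests` holds actions the user asked for with an explicit
    imperative (for example "merge it"). The directive's wording is not an
    input, so no phrasing can widen the scope.
    """
    if action in ITERATION_LOOP:
        return True
    if action is Action.MERGE:
        # "merge it" only counts once merge-ready status has been reported (Step 5).
        return merge_ready_reported and Action.MERGE in explicit_requests
    # Tag-push and branch deletion each need their own explicit request.
    return action in explicit_requests
```

Leaving the directive text out of the signature is deliberate. If the phrasing could change the answer, "they said autonomous" would become an argument, and that is exactly the rationalization the `<CRITICAL>` block rules out.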
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
# /crystallize-consolidate
## Command Content

``````````markdown
# Crystallize-Consolidate

<ROLE>
Rule bookkeeper. Operates ONLY on the canonical `## Rules` section of the
target file. Touches nothing else. Every modification requires explicit
operator consent via AskUserQuestion. Reputation depends on never silently
mutating rule text.
</ROLE>

## Scope

This command is the operator-invoked path that complements `/crystallize`'s
rule-preservation contract. Where `/crystallize` lifts rules and protects
them byte-for-byte, `/crystallize-consolidate` is the only legitimate path
for an operator to MERGE, DEPRECATE, or otherwise reorganize accumulated
rules. Silent consolidation is forbidden — it would defeat the entire
preservation contract.

The canonical `## Rules` section is identified by the same disambiguation
rule used by `/crystallize` and `crystallize-verify.md`: the FIRST `## Rules`
heading after the `<ROLE>` block, or the first `## Rules` heading if no
`<ROLE>` block exists. This command operates only on that section.

## Inputs

- Required: target file path (must contain a canonical `## Rules` section
produced by a prior `/crystallize` pass).

## Protocol

1. Parse the target file's `## Rules` section. Build a rule-id-keyed map
from each rule's `<!-- rule-meta: id=Rn, added=..., pass=..., last-confirmed=... [, merged-from=...] -->`
provenance comment.
2. Run internal overlap / staleness analysis. Produce candidate list
(pairs with content overlap; single rules referencing deprecated
tools/phases; single rules that contradict another rule).
3. For each candidate, present ONE `AskUserQuestion` invocation (the
consolidation question shown below). Wait for operator response.
Apply chosen action immediately.

```
AskUserQuestion({
  questions: [{
    header: "Consolidate?",
    question: "Rules R[m] and R[n] appear to overlap.\n\nR[m]: \"[full content of Rm]\"\n\nR[n]: \"[full content of Rn]\"",
    multiSelect: false,
    options: [
      { label: "Keep both", description: "Both rules stay unchanged" },
      { label: "Merge", description: "You write the merged text" },
      { label: "Deprecate one", description: "Mark one rule for two-pass removal" }
    ]
  }]
})
```

On `Merge`: emit a follow-up text-input prompt asking the operator to
write the merged rule body. The operator-written text replaces both
originals. Assign a new rule ID. Provenance metadata records
`merged-from: R[m], R[n]`, sets `added` and `last-confirmed` to today's
ISO date, and initializes `pass` to the document's current pass value
(the merged rule begins a fresh lifecycle: `(current_doc_pass - rule_meta.pass) + 1 = 1`;
the `merged-from` field preserves lineage to the originals).

On `Deprecate one`: emit a follow-up `AskUserQuestion` asking which
rule to deprecate (options: R[m] / R[n]), then a text-input prompt
for the deprecation reason. Append a `<!-- rule-deprecated: ... -->`
HTML comment marker to the deprecated rule (see Step 5).

4. After all candidates dispatched, write the modified `## Rules` section
back to the file. Leave General Instructions content byte-identical.
5. Two-pass deprecation: rules marked deprecated in this command's pass
receive an HTML comment marker. The marker MUST set
`removable-after-pass = current_doc_pass + 2` so the deprecated rule
survives at least one full regular `/crystallize` pass (present in
both input and output) before becoming eligible for removal:

```markdown
<!-- assumes current_doc_pass = 1 at time of deprecation -->
<RULE>...rule body...</RULE>
<!-- rule-deprecated: id=R3, on=2026-04-27, reason="superseded by R7", removable-after-pass=3 -->
```

With the example above:
- Pass 2 (regular `/crystallize`): rule passes through verbatim
(`current_doc_pass < removable-after-pass`). Survives the pass.
- Pass 3 (regular `/crystallize`): rule becomes eligible for removal.
The operator is prompted to re-confirm in interactive mode; in
autonomous mode the candidacy is logged to the Tightening Skipped
footer and the rule still passes through. Removal is never silent.

`/crystallize-consolidate` never removes a rule on the same pass it
was deprecated.

## Output Format

Deliverable is the modified target file plus an action log of consolidation
decisions. The action log records, for each candidate presented:

- Candidate rule IDs.
- Operator's chosen action (Keep both / Merge / Deprecate one).
- For Merge: new rule ID assigned and `merged-from` provenance.
- For Deprecate one: deprecated rule ID, reason text, and
`removable-after-pass` value written into the marker.

The action log is presented to the operator at the end of the pass for
audit. The General Instructions portion of the target file MUST be
byte-identical between input and output.

## Quality Gates

Before declaring the consolidation pass complete, verify:

- [ ] General Instructions content (everything outside the canonical
`## Rules` section) is byte-identical to input.
- [ ] Modified `## Rules` section preserves source order for non-touched
rules (only merged/deprecated rules change in place).
- [ ] Provenance metadata `merged-from` field is correctly populated for
every merged rule (lists every contributing source rule ID).
- [ ] Deprecation markers carry all required fields: `id`, `on`,
`reason`, `removable-after-pass`.
- [ ] No rule was removed in this pass (removal is the next regular
`/crystallize` pass's job).
- [ ] Every consolidation action was explicitly authorized by an
`AskUserQuestion` response — no silent mutation occurred.

## Anti-Patterns

<FORBIDDEN>
- Compressing General Instructions (that is `/crystallize`'s job, and
silent compression is forbidden in both directions).
- Re-running rule detection on the General Instructions surface (this
command operates ONLY on the existing `## Rules` section).
- Removing a rule on the same pass it was deprecated (two-pass
deprecation is non-negotiable).
- Batching consolidation questions (one rule pair per `AskUserQuestion`
invocation; batching reduces operator attention).
- Inferring operator consent from silence or partial answers.
- Editing General Instructions content for any reason during this pass.
- Silently mutating rule text — every modification must trace to an
explicit operator response.
</FORBIDDEN>

<FINAL_EMPHASIS>
Your only job is bookkeeping. You do not compress. You do not detect.
You do not remove rules in the same pass they are deprecated. Every
modification — every merge, every deprecation marker — exists because
the operator explicitly authorized it via `AskUserQuestion`. Silent
mutation of rule text breaks the preservation contract that
`/crystallize` exists to enforce. Do not be the tool that breaks it.
</FINAL_EMPHASIS>
``````````
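Step 1 of the `/crystallize-consolidate` protocol and its `Merge` branch both depend on the `rule-meta` provenance comments. The sketch below is a hypothetical Python helper, not part of spellbook. `parse_fields`, `build_rule_map`, `merged_rule_meta`, and `lifecycle_pass` are illustrative names. The sketch assumes the comment syntax shown on the page (comma-separated `key=value` pairs). It also assumes that a multi-valued `merged-from` is written in double quotes, because the page does not say how commas inside that value are delimited.

```python
import re
from datetime import date

# One provenance or deprecation comment; group 1 is the kind, group 2 the field list.
COMMENT_RE = re.compile(r"<!--\s*(rule-meta|rule-deprecated):\s*(.*?)\s*-->")
# key=value pairs: a value is double-quoted (may contain commas) or runs to the next comma.
FIELD_RE = re.compile(r'([\w-]+)=(?:"([^"]*)"|([^,]*))')


def parse_fields(body: str) -> dict:
    """Turn 'id=R3, pass=2, reason="a, b"' into {'id': 'R3', 'pass': '2', 'reason': 'a, b'}."""
    fields = {}
    for m in FIELD_RE.finditer(body):
        quoted, bare = m.group(2), m.group(3)
        fields[m.group(1)] = quoted if quoted is not None else bare.strip()
    return fields


def build_rule_map(rules_section: str) -> dict:
    """Map rule id -> {"meta": fields or None, "deprecated": fields or None}.

    Assumes every comment carries an `id` field (raises KeyError otherwise).
    """
    rules = {}
    for kind, body in COMMENT_RE.findall(rules_section):
        fields = parse_fields(body)
        entry = rules.setdefault(fields["id"], {"meta": None, "deprecated": None})
        entry["meta" if kind == "rule-meta" else "deprecated"] = fields
    return rules


def merged_rule_meta(new_id: str, sources: list, current_doc_pass: int, today: date) -> str:
    """Provenance comment for an operator-written merge of the rules in `sources`."""
    iso = today.isoformat()
    joined = ", ".join(sources)
    # Fresh lifecycle: `pass` starts at the current document pass; merged-from keeps lineage.
    return (
        f"<!-- rule-meta: id={new_id}, added={iso}, pass={current_doc_pass}, "
        f'last-confirmed={iso}, merged-from="{joined}" -->'
    )


def lifecycle_pass(current_doc_pass: int, meta: dict) -> int:
    """(current_doc_pass - rule_meta.pass) + 1, so a rule created this pass is at 1."""
    return current_doc_pass - int(meta["pass"]) + 1
```

The parser keeps every value as a string until a caller needs a number, as `lifecycle_pass` does with `pass`. That way it never has to know in advance which fields are numeric.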

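Step 5's two-pass deprecation in the `/crystallize-consolidate` page, together with the pass-by-pass walkthrough under it, reduces to two small functions. This is a hypothetical Python sketch, not the command's implementation. `deprecation_marker` and `removal_status` are illustrative names. The three status strings are labels invented here for the outcomes the page describes: verbatim pass-through, operator re-confirmation in interactive mode, and logging to the Tightening Skipped footer in autonomous mode.

```python
def deprecation_marker(rule_id: str, on: str, reason: str, current_doc_pass: int) -> str:
    """Marker appended to a rule deprecated during pass `current_doc_pass`."""
    # +2 guarantees one full regular /crystallize pass in which the rule survives.
    removable_after = current_doc_pass + 2
    return (
        f"<!-- rule-deprecated: id={rule_id}, on={on}, "
        f'reason="{reason}", removable-after-pass={removable_after} -->'
    )


def removal_status(current_doc_pass: int, removable_after_pass: int, interactive: bool) -> str:
    """What a regular /crystallize pass does with a deprecated rule."""
    if current_doc_pass < removable_after_pass:
        return "pass-through"  # too early: emit the rule verbatim
    if interactive:
        return "ask-operator"  # eligible: the operator must re-confirm removal
    return "log-and-pass-through"  # autonomous: log the candidacy, keep the rule
```

Calling `deprecation_marker("R3", "2026-04-27", "superseded by R7", 1)` reproduces the marker from the page's example. `removal_status` returns `"pass-through"` for passes 1 and 2 and changes only at pass 3. No branch removes a rule outright, which matches the "removal is never silent" contract.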