feat(gates): opt-in to re-enable BASH-PARSER-COMPOUND deny by elijahr · Pull Request #288 · axiomantic/spellbook

elijahr · 2026-05-08T16:06:57Z

What does this PR do?

Adds an opt-in to re-enable BASH-PARSER-COMPOUND deny findings, restoring the pre-0.63.2 policy for operators with stricter threat models. Default behavior is unchanged: with neither opt-in active, the bash gate still allows benign compound commands like ls | head and wc -l file && ls.

Two opt-in paths:

- Environment variable SPELLBOOK_BASH_DENY_COMPOUND=1 (truthy values: 1, true, yes, case-insensitive, whitespace-tolerant).
- Passing security_mode="paranoid" to spellbook.gates.bash_parser.parse_and_check (call-site control).

Either path re-emits BASH-PARSER-COMPOUND for list/pipeline nodes AND for if/for/while/until/case/function control-flow constructs.

Implementation notes

- New _compound_deny_enabled(security_mode) helper centralizes the truthy-env-var parse and the paranoid-mode check.
- security_mode is threaded through private _walk, _classify_node, and _classify_compound. Public parse_and_check signature is unchanged.
- function added to the control-flow kind set in _classify_node for symmetry with the other control-flow kinds.
- CHANGELOG [Unreleased] section documents the opt-in; the 0.63.2 entry gets a back-reference framing the prior change as a deliberate policy relaxation.
- Version is intentionally not bumped on this branch; the [Unreleased] section will be promoted in the next chore(release): commit.
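
The opt-in check described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function name mirrors _compound_deny_enabled, but the `environ` parameter and the `_TRUTHY` constant are hypothetical additions so the example can run without touching the real process environment.

```python
import os
from typing import Mapping, Optional

# Assumed constant: the truthy spellings listed in the PR description.
_TRUTHY = frozenset({"1", "true", "yes"})


def compound_deny_enabled(
    security_mode: str = "standard",
    environ: Optional[Mapping[str, str]] = None,  # hypothetical test seam
) -> bool:
    """Return True when either opt-in path is active."""
    env = os.environ if environ is None else environ
    raw = env.get("SPELLBOOK_BASH_DENY_COMPOUND", "")
    # Case-insensitive, whitespace-tolerant comparison.
    env_opt_in = raw.strip().lower() in _TRUTHY
    return env_opt_in or security_mode == "paranoid"


if __name__ == "__main__":
    print(compound_deny_enabled(environ={}))  # neither opt-in active
    print(compound_deny_enabled(environ={"SPELLBOOK_BASH_DENY_COMPOUND": " TRUE "}))
    print(compound_deny_enabled("paranoid", environ={}))
```

Keeping both checks in one helper means every call site answers the same question the same way, which is the point of centralizing the parse.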

Tests

- 7 new tests in TestCompoundDenyOptIn cover env-var truthy values, case/whitespace tolerance, paranoid mode, control-flow node coverage, and default-off behavior.
- New autouse fixture _scrub_compound_deny_env prevents env-var pollution from breaking unrelated tests.
- Full suite: 132 passed in tests/gates/test_bash_parser.py (verified both with and without SPELLBOOK_BASH_DENY_COMPOUND=1 set in the parent process).
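
The env-scrubbing idea behind _scrub_compound_deny_env can be illustrated without pytest. The sketch below is a stdlib analogue, not the fixture from the PR: `scrubbed_compound_deny_env` is a hypothetical name, and it uses `unittest.mock.patch.dict` where the real fixture presumably relies on pytest's own machinery.

```python
import os
from contextlib import contextmanager
from unittest import mock

ENV_VAR = "SPELLBOOK_BASH_DENY_COMPOUND"


@contextmanager
def scrubbed_compound_deny_env():
    """Temporarily remove the opt-in variable; restore the environment on exit."""
    # patch.dict snapshots os.environ and restores it when the block ends.
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_VAR, None)
        yield
```

Inside the `with` block the variable is guaranteed absent, so a test of default-off behavior gives the same result whether or not the parent process exported it.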

Related issue

None.

Checklist

Tests pass locally
Documentation updated (if applicable)

The PreToolUse hook collapsed T2 (TIER-ASK) findings into the same exit-2 deny path as T3 (TIER-DENY), so commands like git push and gh pr merge got silently blocked instead of surfacing Claude Code's yellow permission prompt. check_tool_input now also returns a verdict field with values allow, ask, or deny. This is a non-breaking superset: the existing safe boolean is unchanged for the 7 callers that read it. _gate_bash, _gate_spawn, and _gate_state_sanitize short-circuit on verdict == ask to emit hookSpecificOutput.permissionDecision = ask and exit 0, letting the harness render its native permission prompt. Mixed TIER-ASK plus non-ask findings still resolve to deny (deny-wins invariant). 12 new tests cover pure-T2, pure-T3, mixed, and safe input cases plus the deny-wins invariant.
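
The allow / ask / deny resolution and the deny-wins invariant can be sketched as below. This is an illustration under assumptions, not the project's _compute_verdict: the `TIER-ASK` rule_id prefix, the `MEDIUM` severity used in the sample data, and the function names are hypothetical, and the JSON field names other than permissionDecision reflect my understanding of Claude Code's PreToolUse hook output and should be checked against its documentation.

```python
import json
from typing import Dict, List

ASK_PREFIX = "TIER-ASK"  # assumed rule_id prefix for T2 findings


def compute_verdict(findings: List[Dict[str, str]]) -> str:
    """Collapse findings into allow / ask / deny, with deny winning."""
    blocking = [f for f in findings if f.get("severity") != "LOW"]
    if not blocking:
        return "allow"
    # Only a pure set of ask-tier findings yields "ask"; anything else denies.
    if all(f["rule_id"].startswith(ASK_PREFIX) for f in blocking):
        return "ask"
    return "deny"


def ask_payload(findings: List[Dict[str, str]]) -> str:
    """Build the JSON a hook could print before exiting 0 on an 'ask' verdict."""
    reason = "; ".join(f["message"] for f in findings)
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "ask",
                "permissionDecisionReason": reason,
            }
        }
    )


if __name__ == "__main__":
    push = {"rule_id": "TIER-ASK-EXAMPLE", "severity": "MEDIUM", "message": "git push"}
    compound = {"rule_id": "BASH-PARSER-COMPOUND", "severity": "CRITICAL", "message": "compound"}
    print(compute_verdict([push]))            # ask
    print(compute_verdict([push, compound]))  # deny
```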

The 0.63.2 compound-allow change deliberately widened the bash gate's allowlist so the L4 parser stopped emitting a CRITICAL deny on every pipe, double-ampersand, double-pipe, semicolon, and control-flow construct. Operators with stricter threat models can now restore the pre-0.63.2 policy via either of two opt-ins:

- Environment variable SPELLBOOK_BASH_DENY_COMPOUND=1 (truthy values: 1, true, yes, case-insensitive, whitespace-tolerant).
- Passing security_mode="paranoid" to spellbook.gates.bash_parser.parse_and_check (call-site control).

Either path re-emits BASH-PARSER-COMPOUND for list/pipeline nodes AND for if/for/while/until/case/function control-flow constructs. With neither opt-in active, default behavior is unchanged from 0.63.2.

Implementation:

- New _compound_deny_enabled(security_mode) helper centralizes the truthy-env-var parse and the paranoid-mode check.
- security_mode is threaded through private _walk, _classify_node, and _classify_compound. Public parse_and_check signature unchanged.
- function added to the control-flow kind set in _classify_node for symmetry with the other control-flow kinds.
- CHANGELOG Unreleased section documents the opt-in; the 0.63.2 entry gets a back-reference framing the prior change as a deliberate policy relaxation.

Tests:

- 7 new tests in TestCompoundDenyOptIn cover env-var truthy values, case/whitespace tolerance, paranoid mode, control-flow node coverage, and default-off behavior.
- New autouse fixture _scrub_compound_deny_env prevents env-var pollution from breaking unrelated tests.
- Full suite: 132 passed (with and without the env var set in the parent process).

axiomantic-momus · 2026-05-08T16:07:39Z

❌ Momus review failed — phase phase2 failed: provider error: 402 This request requires more credits, or fewer max_tokens. You requested up to 8192 tokens, but can only afford 5957. To increase, visit https://openrouter.ai/settings/keys and create a key with a h...

gemini-code-assist

Code Review

This pull request introduces an opt-in mechanism to re-enable BASH-PARSER-COMPOUND deny findings for Bash control-flow constructs and pipelines, controllable via the SPELLBOOK_BASH_DENY_COMPOUND environment variable or a security_mode="paranoid" parameter. Feedback correctly identifies that the current default security_mode="paranoid" in parse_and_check inadvertently enables these denials by default, contradicting the PR's stated intent. Additionally, improvements were suggested for the operator extraction logic in _classify_compound to correctly handle pipe nodes and deduplicate operators in finding messages. Finally, the PR is missing a required version bump in the .version file, which is a high-severity violation of the repository style guide.

gemini-code-assist · 2026-05-08T16:11:30Z

    findings: list[dict] = []
    for tree in trees:
-        findings.extend(_walk(tree, command))
+        findings.extend(_walk(tree, command, security_mode))


The parse_and_check function (at line 362) defaults security_mode to "paranoid". Since _compound_deny_enabled now returns True for "paranoid", this change effectively enables compound-command denial by default for direct callers of this function, which contradicts the PR description's claim that default behavior is unchanged. Additionally, the docstring at line 367 is now inaccurate as it states the modes are treated equally. Consider changing the default to "standard" and updating the docstring.

gemini-code-assist · 2026-05-08T16:11:30Z

+    operators = [
+        getattr(p, "op", None)
+        for p in parts
+        if getattr(p, "kind", None) == "operator"
+    ]
+    op_text = ", ".join(op for op in operators if op) or "|"


The current implementation only extracts operators from operator nodes, which means pipeline nodes (using pipe nodes in bashlex) always fall back to the default |. Additionally, the finding message can contain redundant operators if the same one is used multiple times (e.g., Compound command (&&, &&) is not allowed). Deduplicating while preserving order and explicitly handling pipe nodes would be more robust.

    parts = getattr(node, "parts", ()) or ()
    operators = [
        getattr(p, "op", "|") if getattr(p, "kind", None) == "pipe" else getattr(p, "op", None)
        for p in parts
        if getattr(p, "kind", None) in {"operator", "pipe"}
    ]
    op_text = ", ".join(dict.fromkeys(op for op in operators if op)) or "|"
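
To see the effect of the suggested change in isolation, the snippet can be wrapped in a function and fed stand-in nodes. This is a sketch only: `operator_text`, `_cmd`, and `_op` are hypothetical names, and the `SimpleNamespace` objects merely imitate the `kind` / `op` / `parts` attributes the suggestion reads; they are not real bashlex nodes.

```python
from types import SimpleNamespace


def operator_text(node) -> str:
    """Render a compound node's operators once each, in first-seen order."""
    parts = getattr(node, "parts", ()) or ()
    operators = [
        getattr(p, "op", "|") if getattr(p, "kind", None) == "pipe" else getattr(p, "op", None)
        for p in parts
        if getattr(p, "kind", None) in {"operator", "pipe"}
    ]
    # dict.fromkeys keeps insertion order, so it dedupes without reordering.
    return ", ".join(dict.fromkeys(op for op in operators if op)) or "|"


def _cmd():
    return SimpleNamespace(kind="command")


def _op(op):
    return SimpleNamespace(kind="operator", op=op)


# Stand-ins for `a && b && c || d` and `a | b`.
list_node = SimpleNamespace(
    kind="list",
    parts=[_cmd(), _op("&&"), _cmd(), _op("&&"), _cmd(), _op("||"), _cmd()],
)
pipeline_node = SimpleNamespace(
    kind="pipeline",
    parts=[_cmd(), SimpleNamespace(kind="pipe"), _cmd()],
)

print(operator_text(list_node))      # &&, ||
print(operator_text(pipeline_node))  # |
```

The repeated `&&` collapses to a single entry, and the pipeline reports `|` because a pipe node was found rather than because the fallback fired.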

Remove redundant str() wraps on rule_id values; rule_ids are always string literals constructed inside the gate code. Aligns with the existing pattern elsewhere in the codebase (e.g. tests use f["rule_id"].startswith(...) directly).

- hooks/spellbook_hook.py:363 (_emit_ask_and_exit TIER-ASK filter)
- spellbook/gates/check.py:250 (_compute_verdict TIER-ASK check)

Skipped the suggestion to remove the `if non_low` guard in _compute_verdict: the guard preserves correct behavior when the function is called directly with safe=False but an empty findings list (e.g. defensive callers, future tests). The redundancy gemini identified is real but the defensive check is cheap and intentional.

…string

- tests/unit/test_stint_hooks.py: read proc.stderr instead of proc.stdout for gate-error JSON (matches the stderr routing introduced in 324cab5 but missed by that commit)
- hooks/spellbook_hook.py: docstring at _gate_bash referenced stdout where the implementation already routes to stderr

…llow default

The tests imported by the fix-t2-ask-prompt merge assumed BASH-PARSER-COMPOUND fires as CRITICAL on `git push ... && ...` style compounds: true before 0.63.2, but compound deny is now opt-in. The tests' real intent is to exercise the verdict computation with a CRITICAL finding alongside a T2 ask finding, regardless of which layer produces the CRITICAL.

The merge also added a `verdict` field to `check_tool_input()` results without updating strict-equality assertions in `test_hooks_windows.py`, breaking 5 tests there.

Updated:

- test_check.py::test_critical_bashlex_finding_is_deny: opt into `SPELLBOOK_BASH_DENY_COMPOUND=1` via monkeypatch so the bashlex layer can produce a CRITICAL alongside the T2 TIER-ASK.
- test_hooks.py::test_pure_t3_still_exits_2: switch error-JSON read from `proc.stdout` to `proc.stderr` (post-324cab5b stderr routing).
- test_hooks.py::test_mixed_t2_and_critical_exits_2: same stderr switch, plus opt into compound deny via the `_run_bash_gate` `env_overrides` parameter.
- test_hooks_windows.py: extend 5 strict-equality assertions (test_safe_bash_command_is_allowed, test_safe_spawn_prompt_is_allowed, test_injection_prompt_is_blocked, test_safe_workflow_state_is_allowed, test_injected_workflow_state_is_blocked) to include the new `verdict` field with the expected `allow` / `deny` value.

# Conflicts:
#   CHANGELOG.md
#   tests/unit/test_stint_hooks.py

elijahr · 2026-05-08T16:43:40Z

@gemini-code-assist @axiomantic-momus please re-review.

New commits on this branch since the last pass (push c3325d0d..11a10c3f):

- Merge of fix-t2-ask-prompt (the branch for PR #285, "fix(hooks): emit permissionDecision "ask" for T2 findings": T2 ask-verdict + verdict field plumbing)
- Gemini's suggestions from PR #285 applied here (str() removals on rule_id)
- CI fixes: tests/unit/test_stint_hooks.py switched to proc.stderr (per Claude Code hook protocol); docstring on _gate_bash updated to match
- Reconciled fix-t2-ask-prompt's verdict tests with the new compound-allow default introduced earlier on this branch (tests now opt into compound-deny via env var or use a non-compound CRITICAL trigger, depending on intent)
- Merge of origin/main (resolved CHANGELOG stacking and the duplicate stint-hook stderr fix)

/ai-review

gemini-code-assist

Code Review

This pull request introduces an opt-in mechanism to re-enable BASH-PARSER-COMPOUND security findings using either the SPELLBOOK_BASH_DENY_COMPOUND environment variable or a "paranoid" security mode. It also implements a new "ask" verdict across the security gates, enabling interactive permission prompts for specific actions like git push. Feedback suggests refactoring the repetitive result-handling logic within the hook gate functions into a shared helper to improve maintainability and reduce redundancy.

gemini-code-assist · 2026-05-08T16:48:18Z

+    if result.get("verdict") == "ask":
+        _emit_ask_and_exit(result["findings"])


This new logic, combined with the following if not result["safe"] block, creates a pattern that is duplicated across _gate_bash, _gate_spawn, and _gate_state_sanitize. To improve maintainability and avoid redundancy, consider extracting the entire logical block into a shared helper function.

For example, you could create a helper like this:

    def _handle_check_result(result: dict) -> None:
        """Process the result from check_tool_input, exiting if necessary."""
        if result.get("verdict") == "ask":
            _emit_ask_and_exit(result["findings"])
        if not result["safe"]:
            reasons = "; ".join(f["message"] for f in result["findings"])
            print(json.dumps({"error": f"Security check failed: {reasons}"}), file=sys.stderr)
            sys.exit(2)

Then, _gate_bash and the other gate functions can be simplified to call this helper:

    def _gate_bash(data: dict) -> None:
        # ... (try/except and tool_input check)
        result = check_tool_input("Bash", tool_input)
        _handle_check_result(result)

- HIGH: change parse_and_check default security_mode from "paranoid" to "standard" so direct callers get the same default behavior as the public check_tool_input API (compound allowed by default). Update docstring to reflect the new mode semantics.
- MEDIUM: fix _classify_compound operator extraction to also inspect pipeline `pipe` nodes (previously pipelines always fell back to the literal "|" default) and deduplicate duplicate operators in the finding message.
- MEDIUM: extract _handle_check_result helper from _gate_bash / _gate_spawn / _gate_state_sanitize to reduce duplication of the ask/deny result handling.

elijahr · 2026-05-08T17:06:14Z

@gemini-code-assist @axiomantic-momus please re-review.

Addressed all three findings from the previous gemini pass (commit f5bb726c):

- HIGH parse_and_check default mode → changed from "paranoid" to "standard" so direct callers get the same default as the public check_tool_input API. Docstring updated.
- MEDIUM _classify_compound operator extraction → now also inspects kind == "pipe" nodes (pipelines previously always defaulted to |) and dedupes via dict.fromkeys (no more (&&, &&)). Two new tests added.
- MEDIUM Extracted _handle_check_result helper from _gate_bash / _gate_spawn / _gate_state_sanitize.

Local suite: 915 passing across tests/test_security/, tests/gates/test_bash_parser.py, tests/unit/test_stint_hooks.py.

/ai-review

gemini-code-assist

Code Review

This pull request introduces a 'verdict' system for security gates, allowing the harness to distinguish between allowing, denying, or asking for user permission (TIER-ASK). It refactors the bash parser to support an opt-in for compound command restrictions and updates the hook logic to handle these new states. A review comment suggests improving error message clarity by filtering out low-severity findings. Additionally, the PR is missing the mandatory version bump in the .version file as specified in the repository style guide.

gemini-code-assist · 2026-05-08T17:16:40Z

+    if result.get("verdict") == "ask":
+        _emit_ask_and_exit(result["findings"])
+    if not result["safe"]:
+        reasons = "; ".join(f["message"] for f in result["findings"])


The reasons string is constructed from the messages of all findings in the result, which can include LOW severity findings. Since a block is triggered only by non-LOW severity findings, including messages from LOW severity findings could add noise to the error message.

To make the error message more precise, consider filtering for non-LOW severity findings when building the reasons string.

Suggested change

    -        reasons = "; ".join(f["message"] for f in result["findings"])
    +        reasons = "; ".join(f["message"] for f in result["findings"] if f.get("severity") != "LOW")

Three real failure modes hit during this session prompted explicit guidance: a T2 ask gate that subagents cannot relay to the operator, bashlex parse errors on inline heredocs, and high-entropy detection on command-line payloads. Documents the layered pipeline, common block messages with mapped responses, the subagent escape hatch (surface to orchestrator, do not retry), and the heredoc-to-file workaround pattern. The content lands in AGENTS.spellbook.md so it propagates into every user's global CLAUDE.md, making the guidance authoritative for all spellbook-augmented harnesses (Claude Code, OpenCode, Codex, Gemini CLI, ForgeCode).

The _handle_check_result helper built the reasons string from all findings, but a block is only triggered by non-LOW severities. LOW findings (e.g., advisory rule_ids that should never block) added noise to the user-facing error message. Filter them out. Addresses gemini-code-assist review on PR #288.
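
A small runnable illustration of that filter follows. It is a sketch rather than the project's code: `blocking_reasons` and the `EXAMPLE-ADVISORY` rule_id are hypothetical, and the finding dicts only assume the `severity` and `message` keys seen in the review snippets.

```python
from typing import Dict, List


def blocking_reasons(findings: List[Dict[str, str]]) -> str:
    """Join only the messages of findings that can actually block."""
    return "; ".join(f["message"] for f in findings if f.get("severity") != "LOW")


findings = [
    {"rule_id": "EXAMPLE-ADVISORY", "severity": "LOW", "message": "advisory note"},
    {
        "rule_id": "BASH-PARSER-COMPOUND",
        "severity": "CRITICAL",
        "message": "Compound command (&&) is not allowed",
    },
]

print(blocking_reasons(findings))  # Compound command (&&) is not allowed
```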

elijahr · 2026-05-08T17:25:52Z

@gemini-code-assist @axiomantic-momus please re-review.

Addressed the MEDIUM finding from your last pass on 593698cb (commit 14218241):

MEDIUM _handle_check_result was joining ALL finding messages into the gate's error reason, including LOW-severity ones that should not surface as errors. Now filtered to severity != LOW (matches the existing safe flag's "above LOW" logic in check.py).

Note on the version-bump question raised in the last review: deferring that decision to the merge step — happy to bump if you (or the operator) consider it a hard block on this PR.

Local suite: 781 passing across tests/test_security/, tests/gates/test_bash_parser.py, tests/unit/test_stint_hooks.py.

/ai-review

gemini-code-assist · 2026-05-08T17:26:30Z

Thanks for the update, @elijahr. I've reviewed the changes in commit 14218241 and the updated test suite. The logic in _handle_check_result now correctly filters out LOW severity findings, ensuring that only relevant security issues surface as gate errors. This aligns with the existing behavior in check.py and addresses the previous concern. The implementation looks solid and the test coverage is comprehensive. I have no further findings on this PR.

elijahr added 2 commits May 7, 2026 16:48

gemini-code-assist Bot reviewed May 8, 2026

elijahr added 5 commits May 8, 2026 11:25

merge: fix-t2-ask-prompt — T2 ask-verdict + verdict field plumbing

1b794f0

merge: main

11a10c3

gemini-code-assist Bot reviewed May 8, 2026

elijahr added 2 commits May 8, 2026 12:18

elijahr merged commit bb40e5d into main May 8, 2026
8 checks passed

elijahr deleted the gates/compound-deny-opt-in branch May 8, 2026 17:44

elijahr merged 10 commits into main from gates/compound-deny-opt-in
