| 1 | +# block/goose #8842 — feat: lifecycle hooks system |
| 2 | + |
| 3 | +- **Repo**: block/goose |
| 4 | +- **PR**: #8842 |
| 5 | +- **Author**: michaelneale (Michael Neale) |
| 6 | +- **Head SHA**: 83c0dcfebf305c8843e4263c716aa0475a5ef401 |
| 7 | +- **Base**: main |
| 8 | +- **Size**: +550 / −10 — bulk in new `crates/goose/src/hooks.rs` |
| 9 | + (+~444), with 3 emit sites added across `crates/goose/src/agents/agent.rs` |
| 10 | + (+~80) and a sprinkle of glue in the agent constructor and reply loop. |
| 11 | + |
| 12 | +## What it changes |
| 13 | + |
| 14 | +Adds a config-driven lifecycle hooks subsystem with 6 events: |
| 15 | +`before_tool_call`, `after_tool_call`, `on_session_start`, |
| 16 | +`on_session_end`, `before_reply`, `after_reply` |
| 17 | +(`crates/goose/src/hooks.rs:14-25`). Each `HookEntry` pairs an |
| 18 | +optional `matcher` regex (matched against tool name) with one or |
| 19 | +more `HookHandler`s, each of which is either a shell `command` |
| 20 | +that receives the JSON `HookContext` on stdin, or an HTTP `url` |
| 21 | +that receives it as a POST body, with a `timeout` (default 10s). |
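To make the shape described above concrete, here is a minimal std-only sketch of the data model. The type names `HookEntry` and `HookHandler` and the 10s default come from the PR as described; the exact fields, the `as_key` helper, the `from_command` constructor, and the use of `Duration` for the timeout are assumptions for illustration, not the PR's actual definitions (which presumably derive serde traits).

```rust
use std::time::Duration;

/// Default handler timeout described in the review (10 seconds).
pub const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);

/// The six lifecycle events named in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum HookEvent {
    BeforeToolCall,
    AfterToolCall,
    OnSessionStart,
    OnSessionEnd,
    BeforeReply,
    AfterReply,
}

impl HookEvent {
    /// Hypothetical helper: the snake_case key as it would appear in config.
    pub fn as_key(self) -> &'static str {
        match self {
            HookEvent::BeforeToolCall => "before_tool_call",
            HookEvent::AfterToolCall => "after_tool_call",
            HookEvent::OnSessionStart => "on_session_start",
            HookEvent::OnSessionEnd => "on_session_end",
            HookEvent::BeforeReply => "before_reply",
            HookEvent::AfterReply => "after_reply",
        }
    }
}

/// Assumed handler shape: a shell command or an HTTP URL, plus a timeout.
#[derive(Debug, Clone, PartialEq)]
pub struct HookHandler {
    pub command: Option<String>,
    pub url: Option<String>,
    pub timeout: Duration,
}

impl HookHandler {
    /// Hypothetical constructor for a shell-command handler with the default timeout.
    pub fn from_command(cmd: &str) -> Self {
        HookHandler {
            command: Some(cmd.to_string()),
            url: None,
            timeout: DEFAULT_TIMEOUT,
        }
    }
}

/// An optional tool-name matcher paired with one or more handlers.
#[derive(Debug, Clone, PartialEq)]
pub struct HookEntry {
    pub matcher: Option<String>,
    pub handlers: Vec<HookHandler>,
}
```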
| 22 | + |
| 23 | +Hooks are loaded from goose's `config.yaml` via |
| 24 | +`HookManager::from_config()` (`hooks.rs:107-116`). The agent |
| 25 | +constructor (`agent.rs:251-255`) instantiates a manager, and three |
| 26 | +emit sites are wired: |
| 27 | + |
| 28 | +- `before_tool_call` (`agent.rs:594-624`): synchronously gates the |
| 29 | + tool dispatch — a hook returning `block: true` short-circuits |
| 30 | + with `INVALID_REQUEST` and the supplied reason. |
| 31 | +- `after_tool_call` (`agent.rs:692-708`): described as |
| 32 | + fire-and-forget, although the emit is awaited (see Concerns). |
| 33 | +- `before_reply` (`agent.rs:1118-1135`): emitted at the top of the |
| 34 | + reply loop with the user message text in `HookContext.message`. |
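The `before_tool_call` gate described in the list above can be sketched roughly as follows. `HookDecision` is named in the PR, but its fields here, the `ToolError` stand-in, and the `gate_tool_call` function are hypothetical; the real code returns the agent's own error type with an `INVALID_REQUEST` code.

```rust
/// Assumed shape of a hook's verdict: block or not, with an optional reason.
#[derive(Debug, Clone, PartialEq)]
pub struct HookDecision {
    pub block: bool,
    pub reason: Option<String>,
}

/// Stand-in for the agent's error type; only the code and message matter here.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolError {
    pub code: &'static str,
    pub message: String,
}

/// Hypothetical gate: turn a blocking decision into an error before dispatch.
pub fn gate_tool_call(tool_name: &str, decision: &HookDecision) -> Result<(), ToolError> {
    if decision.block {
        let reason = decision.reason.as_deref().unwrap_or("no reason given");
        return Err(ToolError {
            code: "INVALID_REQUEST",
            // The reason travels in the message, so the model sees why the call was denied.
            message: format!("tool call `{}` blocked by hook: {}", tool_name, reason),
        });
    }
    Ok(())
}
```

Because the reason is embedded in the error message, the model receives it as the tool-call result, which is the guardrail behavior the review describes.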
| 35 | + |
| 36 | +## Strengths |
| 37 | + |
| 38 | +- Event enum is exhaustive and uses `serde(rename_all = "snake_case")` |
| 39 | + (`hooks.rs:16`), so the YAML keys are predictable. |
| 40 | +- The `HookDecision` short-circuit at `hooks.rs:218-221` returns |
| 41 | + on the *first* `block: true` hook — important so that an |
| 42 | + early-deny rule can't be silently overridden by a later |
| 43 | + permissive rule. That matches the principle of least surprise |
| 44 | + for an authorization layer. |
| 45 | +- `has_hooks()` gate at `agent.rs:594, 694, 1119` avoids the |
| 46 | + serialization cost of building `HookContext` when no hooks are |
| 47 | + registered for that event. With the hash-map check at |
| 48 | + `hooks.rs:163-167` returning `false` on empty/missing entries, |
| 49 | + the no-hooks code path is essentially free. |
| 50 | +- A matcher regex that fails to compile is logged and its hook is |
| 51 | + *skipped* rather than causing a panic (`hooks.rs:194-200`). Right |
| 52 | + call: one bad regex shouldn't take down all the other hooks for |
| 53 | + that event. |
| 54 | +- Timeout default of 10s (`hooks.rs:217`) is sane for a synchronous |
| 55 | + `before_tool_call` gate; long enough for a network-bound policy |
| 56 | + check, short enough that a stuck script can't hang the agent. |
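Two of the strengths above, the cheap `has_hooks` gate and the first-block-wins short-circuit, are easy to illustrate together. This is a simplified sketch, not the PR's code: hooks are modeled as plain closures instead of shell or HTTP handlers, events are keyed by string, and the `Decision` enum and `register` method are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified verdict type (hypothetical; the PR uses `HookDecision`).
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Allow,
    Block(String),
}

/// A hook is modeled as a closure from tool name to decision.
pub type Hook = Box<dyn Fn(&str) -> Decision>;

#[derive(Default)]
pub struct HookManager {
    hooks: HashMap<&'static str, Vec<Hook>>,
}

impl HookManager {
    pub fn register(&mut self, event: &'static str, hook: Hook) {
        self.hooks.entry(event).or_default().push(hook);
    }

    /// Returns false for both a missing key and an empty list, so callers
    /// can skip building any hook context when nothing is registered.
    pub fn has_hooks(&self, event: &str) -> bool {
        self.hooks.get(event).map_or(false, |hooks| !hooks.is_empty())
    }

    /// Runs hooks in registration order and returns on the first block;
    /// hooks after the blocking one are never invoked.
    pub fn emit(&self, event: &str, tool_name: &str) -> Decision {
        for hook in self.hooks.get(event).into_iter().flatten() {
            if let Decision::Block(reason) = hook(tool_name) {
                return Decision::Block(reason);
            }
        }
        Decision::Allow
    }
}
```

The early `return` is what guarantees that a later permissive hook cannot override an earlier deny: it is simply never run.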
| 57 | + |
| 58 | +## Concerns / asks |
| 59 | + |
| 60 | +- **`after_tool_call` is documented as "fire-and-forget" but |
| 61 | + `agent.rs:707-708` actually `await`s it.** That means a slow |
| 62 | + after-hook still blocks the reply loop just like a before-hook. |
| 63 | + Either: |
| 64 | + 1. Spawn it via `tokio::spawn` so the loop returns immediately |
| 65 | + (matching the docstring), or |
| 66 | + 2. Update the docstring and the comment at `:693` to say |
| 67 | + "synchronously emitted; respects timeout". |
| 68 | + As-is, the contract and the implementation disagree: a |
| 69 | + misconfigured after-hook with a 30s timeout can stall the chat |
| 70 | + for up to 30s after every tool call. |
| 71 | +- `tool_result: None` at every emit site (`agent.rs:601, 700, |
| 72 | + 1126`) — the `HookContext` *has* a `tool_result` field but |
| 73 | + no caller populates it. For `after_tool_call` this is the most |
| 74 | + valuable field a hook would want (e.g. "log the result of every |
| 75 | + shell command", "block on PII in tool output"). The result is |
| 76 | + available right there at `agent.rs:692` (just after the |
| 77 | + `WAITING_TOOL_END` debug log). Wire it through. |
| 78 | +- The `before_tool_call` block at `agent.rs:614-622` returns |
| 79 | + `INVALID_REQUEST` with the reason in the error message. That |
| 80 | + means the *model* sees the block reason in its tool-call result, |
| 81 | + which is the right behavior for a guardrail — but the reason is |
| 82 | + also the only thing logged. For audit purposes (a hook denied a |
| 83 | + tool call), worth emitting a structured tracing event with the |
| 84 | + `tool_name`, `session_id`, and `reason` fields rather than just |
| 85 | + the existing `tracing::info!` line at `:611`. |
| 86 | +- `Config::global()` at `hooks.rs:121` is a process-wide singleton. |
| 87 | + For a long-running `goose serve` whose config file gets edited |
| 88 | + at runtime, the hooks won't reload — `HookManager::from_config()` |
| 89 | + is only called once in the agent constructor. Worth either |
| 90 | + documenting "edit config, restart goose serve" or wiring a |
| 91 | + config-watcher reload path. |
| 92 | +- There are two handler types (shell `command` and HTTP `url`), but |
| 93 | + `HookHandler` (`hooks.rs:53-60`) models both as `Option<String>`, |
| 94 | + so nothing prevents a config from setting *both* `command` and |
| 95 | + `url`, or *neither*. Convert to a tagged enum, or at least |
| 96 | + validate at load time and surface a config error; otherwise |
| 97 | + ambiguous configs fail silently or behave inconsistently. |
| 98 | +- The `matcher` regex at `hooks.rs:189-199` is recompiled on |
| 99 | + every emit. For a high-frequency tool-call workload, that's |
| 100 | + measurable. Compile once at config-load time and store the |
| 101 | + `Regex` in `HookEntry`. |
| 102 | +- No tests in this PR. For a feature that: |
| 103 | + - executes user-supplied shell commands, |
| 104 | + - can block tool dispatch, |
| 105 | + - has a non-obvious "first block wins" precedence rule, |
| 106 | + - has a regex-matching subsystem with skip-on-error semantics, |
| 107 | + the absence of any unit tests is a real gap. At minimum: |
| 108 | + - block-on-first-deny precedence, |
| 109 | + - regex-skip-on-invalid behavior, |
| 110 | + - timeout enforcement, |
| 111 | + - per-event `has_hooks` gating. |
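For the `command`/`url` ambiguity raised in the list above, one possible shape for the fix is a load-time conversion from the permissive config struct into an enum that cannot represent the invalid states. All names here (`RawHandler`, `ConfigError`, `validate_handler`) are hypothetical, and a real version would likely express this through serde rather than a hand-written function.

```rust
/// Validated handler: exactly one transport, enforced by the type.
#[derive(Debug, Clone, PartialEq)]
pub enum HookHandler {
    Command(String),
    Url(String),
}

/// Raw config shape as described in the review: both fields optional.
#[derive(Debug, Clone, Default)]
pub struct RawHandler {
    pub command: Option<String>,
    pub url: Option<String>,
}

/// Hypothetical config error surfaced at load time.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigError {
    BothSet,
    NeitherSet,
}

/// Rejects ambiguous handlers instead of letting them fail silently later.
pub fn validate_handler(raw: RawHandler) -> Result<HookHandler, ConfigError> {
    match (raw.command, raw.url) {
        (Some(cmd), None) => Ok(HookHandler::Command(cmd)),
        (None, Some(url)) => Ok(HookHandler::Url(url)),
        (Some(_), Some(_)) => Err(ConfigError::BothSet),
        (None, None) => Err(ConfigError::NeitherSet),
    }
}
```

After this step the execution path only ever matches on `HookHandler::Command` or `HookHandler::Url`, so the "both" and "neither" cases need no handling downstream.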
| 112 | + |
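The difference between the two options for `after_tool_call` (detach the hook versus wait for it) can be shown with a small std-only sketch. It uses `std::thread::spawn` as a stand-in for the `tokio::spawn` the review suggests, and the function names are hypothetical; the point is only the calling contract, not the async runtime.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Synchronous emit: the caller does not continue until the hook finishes.
pub fn emit_blocking<F: FnOnce()>(hook: F) {
    hook();
}

/// Detached emit: the hook runs on another thread and the caller returns
/// immediately. The receiver is signaled once when the hook completes.
pub fn emit_detached<F>(hook: F) -> Receiver<()>
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        hook();
        // The caller may have dropped the receiver; ignore that case.
        let _ = done_tx.send(());
    });
    done_rx
}
```

With the detached form a slow hook no longer stalls the reply loop, but nothing observes its failure or timeout unless the completion signal is checked, so error reporting would need to happen inside the spawned task.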
| 113 | +## Verdict |
| 114 | + |
| 115 | +**request-changes** — the design and the integration points are |
| 116 | +sound, but four concrete issues need addressing before merge: |
| 117 | +(1) the `after_tool_call` docstring/impl disagreement, |
| 118 | +(2) `tool_result` never populated, |
| 119 | +(3) `command`/`url` mutual-exclusivity not enforced, |
| 120 | +(4) zero tests for a security-sensitive subsystem. |
| 121 | + |
| 122 | +The matcher recompilation, structured audit logging, and |
| 123 | +config-watcher reload can be follow-ups, but tests for the |
| 124 | +deny-precedence path and the regex-skip behavior should land |
| 125 | +with this PR — those are the cases an operator would assume |
| 126 | +Just Work, and they're cheap to cover. |
| 127 | + |
| 128 | +## What I learned |
| 129 | + |
| 130 | +A "lifecycle hooks" feature looks like a UX feature but is |
| 131 | +actually a security/policy feature: any hook with `block: true` |
| 132 | +is implicitly an authorization layer. That changes the bar for |
| 133 | +review — config malformation, error handling on hook failure, |
| 134 | +audit logging, and ordering semantics all become |
| 135 | +correctness-critical, not nice-to-have. The same pattern recurs |
| 136 | +in MCP servers, agent permissions, and webhook-style policy |
| 137 | +plugins; the temptation to "just call user code at lifecycle |
| 138 | +points" almost always evolves into a guardrail/audit subsystem, |
| 139 | +and it's cheaper to design for that on day one. |