Skip to content

feat: add agent lifecycle hooks (Claude Code-compatible config)#7411

Open
tlongwell-block wants to merge 13 commits intomainfrom
hooks/claude-code-compatible
Open

feat: add agent lifecycle hooks (Claude Code-compatible config)#7411
tlongwell-block wants to merge 13 commits intomainfrom
hooks/claude-code-compatible

Conversation

@tlongwell-block
Copy link
Copy Markdown
Collaborator

@tlongwell-block tlongwell-block commented Feb 21, 2026

Description

Adds a hooks system for agent lifecycle events that executes user-configured shell
commands or MCP tools at key points in the agent loop. Hooks can block actions, inject
context into the conversation, and run arbitrary scripts with the full invocation
context piped to stdin as JSON.

The motivating use case: PostCompact hooks that re-inject project context after
auto-compaction, so long sessions don't lose critical knowledge when history is
summarized. Also: ContextFill hooks that inject context at configurable fill
percentages before compaction is needed.

Architecture

  • Direct subprocess execution for command hooks — spawns processes via
    tokio::process::Command with native exit codes, stdin piping, and separate
    stdout/stderr capture. No MCP intermediary.
  • MCP tool dispatch for mcp_tool hooks — routes through ExtensionManager for
    calling any registered MCP tool with static arguments.
  • Deadlock-safe I/O — stdout and stderr drain concurrently via spawned tasks,
    both starting before stdin write. Secondary 5s timeout on drain collection prevents
    grandchild FD inheritance hangs.
  • Bounded output capture — stdout capped at 32KB, stderr at 4KB during read
    (not just at parse time). Prevents OOM from malicious/buggy hooks. JSON-parsed
    HookResult fields are also capped post-parse on both command and MCP paths.
  • Process group isolation (unix) — hooks run in their own process group so
    Ctrl+C doesn't kill them. On timeout/cancellation, the entire process group
    is killed via libc::kill(-pgid, SIGKILL), preventing orphaned grandchildren.
  • Fail-open — hook errors (timeouts, parse failures, spawn failures) log warnings
    and continue. A misconfigured hook never crashes a session.

Configuration

Hooks are configured under a top-level hooks key in:

  • Global: ~/.config/goose/hooks.json
  • Project: .goose/settings.json (preferred) or .claude/settings.json

Project hooks require "allow_project_hooks": true in the global config (security).

{
  "hooks": {
    "PostCompact": [
      {
        "hooks": [{"type": "command", "command": "cat PROJECT_CONTEXT.md"}]
      }
    ],
    "ContextFill": [
      {
        "matcher": "70",
        "hooks": [{"type": "command", "command": "cat ~/.goose/context-refresh.md"}]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{"type": "command", "command": "./lint-check.sh"}]
      }
    ]
  }
}

Exit code contract (Claude Code compatible)

  • Exit 0 — success. Stdout parsed as JSON HookResult, or treated as plain text
    additionalContext (capped at 32KB, UTF-8 safe).
  • Exit 2 — block (on blockable events: PreToolUse, UserPromptSubmit). Stderr
    is fed back as the block reason (capped at 4KB). Stdout is ignored.
  • Other exit — fail-open. Warning logged, execution continues.

Matchers (regex, Claude Code compatible)

  • Bash or Bash(regex) — Claude Code-style; maps to developer__shell/shell
    with optional regex matching against the command string.
  • Regex patternsEdit|Write, mcp__memory__.*, Notebook.* — full regex
    anchored with ^(?:pattern)$. Patterns already fully anchored (^...$) are used as-is.
  • "auto" / "manual" — for PreCompact/PostCompact, matches the compaction trigger.
  • Threshold percentage — for ContextFill, e.g., "70" fires at 70% context fill.

Events wired (10 of 17)

Event Blocks Context Notes
SessionStart no yes Fires once per session (first user message)
UserPromptSubmit yes yes If blocked, returns hook's stderr as message
PreToolUse yes no Fires for pre-approved and approval-required tools
PostToolUse no yes Fires after successful tool call
PostToolUseFailure no yes Fires after failed tool call
PreCompact no no Fires before compaction (both auto and /compact)
PostCompact no yes Re-inject project context after compaction
Stop no no Fires before reply ends
SessionEnd no no Fires after Stop for cleanup/logging
ContextFill no yes Goose extension: fires once per threshold crossing per session

Remaining events defined but not fired: SubagentStart, SubagentStop, Notification,
PermissionRequest, TeammateIdle, TaskCompleted, ConfigChange. These depend on goose
features that don't have natural call sites yet.

Tests

15 unit tests:

  • 9 pure-logic tests for parse_command_output — every exit code path, JSON/non-JSON,
    truncation, blockable/non-blockable interaction
  • 2 real-subprocess tests — stdin piping via jq, exit-2 stderr capture
  • 4 regex matcher tests — alternation, wildcard, exact match, invalid pattern

18 live integration tests (run against real Anthropic API):

  • 10 basic: SessionStart context injection, UserPromptSubmit fire + block, PreToolUse
    fire + block (rm -rf), PostToolUse fire, regex matcher, Stop, SessionEnd, execution
    order, fail-open on broken hook
  • 8 advanced: multi-tool hooks, JSON-returning hooks, no-config passthrough, timeout
    graceful degradation, stdin JSON field validation, tool_name/tool_input presence,
    Claude Code Bash matcher

Files changed

File Lines What
hooks/mod.rs 671 NEW: execution engine, parsing, regex matching, ContextFill
hooks/types.rs 337 NEW: event types, config structs, HooksOutcome, HookResult
hooks/shell.rs 269 NEW: direct subprocess execution, bounded capture, process group kill
hooks/config.rs 199 NEW: config loading/merging (global + project)
agent.rs +348 12 hook call sites: all 10 events + ContextFill state on Agent
execute_commands.rs +40 PreCompact/PostCompact hooks for /compact command
tool_execution.rs +37 PreToolUse hook in approval flow (continue on block)
developer/shell.rs +7 pub(crate) for build_shell_command and user_login_path
Cargo.toml +3 Added regex, libc dependencies
lib.rs +1 pub mod hooks

Review notes

Crossfire reviewed across 2 rounds with 4 models (GPT-5, Claude Sonnet, Codex, Gemini).

Round 1 found 10 issues (1 critical, 5 major, 4 minor). All fixed:

  • Critical: ContextFill threshold state now persists via Arc<Mutex<HashSet>> on Agent
  • Major: breakcontinue in approval loop, bounded shell reads, process group kill,
    JSON-parsed field caps on both command and MCP paths
  • Minor: regex anchoring (||&&), can_block() trimmed, HookResult field docs

Round 2 (verification): Codex 10/10 APPROVE (all 8 verified), GPT-5 and Sonnet
found 2 remaining issues (MCP path caps, drain comment) — both fixed.

Final state: all 15 unit tests pass, all 18 live integration tests pass.

@tlongwell-block tlongwell-block force-pushed the hooks/claude-code-compatible branch 4 times, most recently from 78bd7d1 to b9ec8bd Compare February 21, 2026 20:38
Hooks fire at key agent lifecycle events and execute user-configured
actions via MCP tools. Supports blocking (prompt submit, tool use)
and context injection (PostCompact).

- 8 of 16 events wired: SessionStart, UserPromptSubmit, PreToolUse,
  PostToolUse, PostToolUseFailure, PreCompact, PostCompact, Stop
- Two action types: command (via developer__shell) and mcp_tool
- Config reads .goose/settings.json, .claude/settings.json, or
  ~/.config/goose/hooks.json with forward-compatible parsing
- Fail-open: hook errors never crash the agent
@tlongwell-block tlongwell-block force-pushed the hooks/claude-code-compatible branch from b9ec8bd to 8a2bbf7 Compare February 21, 2026 20:54
Addresses issues identified in multi-round crossfire review:

- Remove Option<Hooks> wrapping; Hooks::load() returns empty default on
  error, eliminating 12 if-let-Some guards across agent integration
- Extract run_post_compact_hook() helper to deduplicate 3 identical
  PostCompact context-injection blocks
- Fix SessionStart to fire once per session (first user message) instead
  of on every reply
- Thread parent CancellationToken through all hooks.run() calls so
  session cancellation properly cancels running hooks
- Standardize blocked-tool errors on Err(ErrorData) — Ok(CallToolResult
  {is_error:true}) was silently treated as success by LLM formatters
- Tighten tool name matching from contains() to exact equality
- Replace hand-rolled trailing-* glob with glob crate for full pattern
  support
- Change manual_compact from Option<bool> to bool (always set for
  compact events, defaults false otherwise)
- Delete unused Hooks::fire() method (all call sites use run())
@tlongwell-block tlongwell-block force-pushed the hooks/claude-code-compatible branch from 0976d80 to e4b822b Compare February 22, 2026 01:56
Claude Code supports "prompt" and "agent" hook action types that goose
doesn't implement yet. Previously, a .claude/settings.json containing
these types would fail serde deserialization entirely, silently dropping
all hooks in the file — including valid "command" hooks.

Adds a custom deserializer for HookEventConfig.hooks that parses each
action individually:
- Known types (command, mcp_tool): deserialize normally, warn on
  malformed config with the actual error
- Unknown types (prompt, agent, etc.): warn and skip
- Missing type field: warn and skip

This matches the existing pattern in HookSettingsFile where unknown
event names are warned and skipped rather than causing parse failures.
…lUse, PostToolUseFailure

- Add inject_hook_context helper, refactor run_post_compact_hook to use it
- Wire context injection for 4 additional events (was PostCompact-only)
- mcp_tool parse_result: non-HookResult output becomes additional_context
- 32KB truncation on mcp_tool output matching command path
- Context injected as user message (hidden from user, visible to agent)
@ccross2
Copy link
Copy Markdown

ccross2 commented Feb 24, 2026

Hi @tlongwell-block — we've been building a similar hooks system from the Claude Code side and wanted to share some findings that might be useful for this PR.

Our implementation: https://github.com/sovren-software/goose/tree/feat/lifecycle-hooks


Architecture comparison

We took a subprocess-first approach (direct process spawning, matching how Claude Code itself executes hooks) vs. your MCP-first approach. A few design differences worth discussing:

Config format: We used the Claude Code settings.json-compatible YAML format so hooks can be shared between Claude Code and Goose:

hooks:
  session_start:
    - command: "/path/to/hook.sh"
      timeout: 15
  pre_tool_use:
    - command: "/path/to/guard.sh"
      tool_name: "developer__shell"  # regex filter

Subprocess-first trade-off: Direct spawning means hooks work without an active MCP extension. The downside is no mcp_tool action type — users can't call MCP tools from hook config. Your MCP-first approach is architecturally cleaner for Goose's design; subprocess-first is more portable for Claude Code parity.


Bugs we found and fixed (may apply here too)

1. SessionStart race condition (critical)

If run_session_start_hooks is called inside tokio::spawn, the first user message can arrive before context injection completes. The fix is to await before returning the HTTP response:

// Wrong: spawn races with first message
tokio::spawn(async move {
    if let Some(ctx) = run_session_start_hooks(&session_id).await { ... }
});
Ok(Json(session))

// Correct: await before response  
if let Some(ctx) = run_session_start_hooks(&session_id).await { ... }
Ok(Json(session))

2. session_id not in PreToolUse payloads

The ToolInspector trait is called after agent construction, when session_id is known. Without explicit propagation, PreToolUse payloads send null for session_id. We used Arc<RwLock<Option<String>>> interior mutability + a setter called from get_or_create_agent() and reply().

3. Empty tool_name on parse failure

If a ToolRequest.tool_call is Err (malformed call), iterating without a guard passes an empty string through .* wildcard filters. Should continue on Err before any filtering.


Tests

We have 12 tests (6 unit in inspector.rs, 6 integration in tests/hooks_integration.rs) covering: deny/approve/fail-open decisions, regex matching, wildcard, multi-hook most-restrictive semantics, config round-trip, end-to-end context injection, fire-and-forget, shlex quoting.

Happy to contribute the test file directly to this PR if useful — our test setup (script-writing helpers, async test fixtures) should port cleanly to your architecture.


One question on your security model: is allow_project_hooks per-user-config only, or could it also be set in the project config itself? (Allowing project configs to self-authorize would defeat the protection.)

@hugotown
Copy link
Copy Markdown

I'm definitely switching to Goose as soon as this feature comes out. 🙌🏻

…patible

* origin/main: (70 commits)
  feat: allow goose askai bot to search goose codebase (#7508)
  Revert "Reapply "fix: prevent crashes in long-running Electron sessions""
  Reapply "fix: prevent crashes in long-running Electron sessions"
  Revert "fix: prevent crashes in long-running Electron sessions"
  fix: replace unwrap() with graceful error in scheduler execute_job (#7436)
  fix: Dictation API error message shows incorrect limit (#7423)
  fix(acp): Use ACP schema types for session/list (#7409)
  fix(desktop): make bundle and updater asset naming configurable (#7337)
  fix(openai): preserve order in Responses API history (#7500)
  Use the correct Goose emoji 🪿 instead of Swan in README.md (#7485)
  feat(ui): implement fullscreen and pip display modes for MCP Apps (#7312)
  fix: prevent crashes in long-running Electron sessions
  Disable tool pair summarization (#7481)
  fix: New Recipe Warning does not close on cancel (#7524)
  The client is not the source of truth (#7438)
  feat: support Anthropic adaptive thinking (#7356)
  copilot instructions: reword no prerelease docs (#7101)
  fix(acp): don't fail session creation when model listing is unavailable (#7484)
  feat: simplify developer extension (#7466)
  feat: add goose-powered release notes generator workflow (#7503)
  ...

# Conflicts:
#	Cargo.lock
Replace MCP dispatch + GOOSE_HOOK_EXIT workaround with direct subprocess
spawning for hook commands. Leverages PR #7466's new platform developer
shell (build_shell_command, user_login_path) for native exit codes, stdin
piping, and separate stdout/stderr.

New hooks/shell.rs:
- Deadlock-safe I/O: stdout + stderr drain concurrently before stdin write
- Process group isolation (unix) so Ctrl+C doesn't kill hooks
- Timeout + cancellation support with secondary drain timeouts
- 30s stdin write timeout prevents hanging on uncooperative children

Rewritten hooks/mod.rs:
- Split execute_action into execute_command (direct subprocess) and
  execute_mcp_tool (MCP dispatch, unchanged behavior)
- New parse_command_output with native exit code handling:
  exit 0 = JSON or plain text context, exit 2 = block with stderr reason,
  other = fail-open
- Removed GOOSE_HOOK_EXIT marker parsing entirely
- Zero-timeout guard (0 -> 600s) on both Command and McpTool paths

HooksOutcome.reason:
- Added reason field to propagate hook block reasons
- All 3 call sites (agent.rs x2, tool_execution.rs x1) now surface
  the hook's stderr message instead of hardcoded strings

Tests (11 total):
- 9 pure-logic tests for parse_command_output covering every exit code
  path, JSON/non-JSON parsing, truncation, blockable/non-blockable
- 2 real-subprocess tests for run_hook_command: stdin piping via jq,
  exit-2 stderr capture (Claude Code block protocol)

Crossfire reviewed: R1 6/10 -> R2 8/10 -> R3 9/10 APPROVE (Opus+Codex)
Regex matchers (Claude Code compat):
- Replace glob/exact-match with regex for tool name matching
- Edit|Write, mcp__memory__.*, Notebook.* patterns now work
- Bash/Bash(pattern) special case preserved, also accepts 'shell'
- Notification matcher also upgraded to regex
- Removed glob dependency from Cargo.toml

ContextFill hook (goose extension):
- New HookEventKind::ContextFill fires when context reaches a
  configured percentage threshold (e.g., 70%)
- Matcher is the threshold: {"matcher": "70", ...}
- Fires once per threshold crossing, not every turn while above
- Hook receives current_tokens, context_limit, fill_percentage
- Checked after each LLM response using provider-reported token count
- Output injected as context via inject_hook_context

SessionEnd hook:
- Fires after Stop hook at end of reply_internal
- Side-effect only (cleanup, logging) — cannot block

Tests (15 total):
- 9 exit-code contract tests (unchanged)
- 2 real-subprocess tests (unchanged)
- 4 new regex matcher tests: alternation, wildcard, exact, invalid
Resolved conflicts in:
- Cargo.lock (accepted main's version)
- agent.rs (kept main's refactored stream pattern + all hooks integration)
- tool_execution.rs (kept main's security logging + hooks PreToolUse)
- shell.rs (kept main's LazyLock + added pub(crate) accessor for hooks)

Also fixed hooks/mod.rs to use new ToolCallContext API and
CallToolRequestParams builder pattern from rmcp 1.2.0.
Critical:
- ContextFill threshold state now persists across turns via Arc on Agent
  (previously reset every reply() call due to Hooks::load creating fresh instance)

Major:
- break→continue in approval loop when PreToolUse hook blocks, so remaining
  tools still get responses (was leaving empty Message::user() objects)
- Shell drain tasks: bounded reads (32KB stdout, 4KB stderr) prevent OOM
- Kill entire process group on Unix timeout/cancel (not just direct child)
- Cap JSON-parsed HookResult fields (additionalContext, reason) same as
  plain-text path — prevents oversized context injection via valid JSON

Minor:
- Regex anchoring: || → && so partially-anchored patterns still get wrapped
- Stop/SubagentStop removed from can_block() since block result is never checked
- HookResult fields documented as Claude Code compat (unused by goose)
- Added libc dependency for process group kill on Unix
- Apply same 32KB/4KB caps to MCP hook JSON-parsed fields
  (parse_mcp_result was missing caps on the JSON success path)
- Fix misleading comment about drain task abort — tasks are safe
  to detach because reads are bounded and process group is killed
Formatting only — no logic changes.
@tlongwell-block tlongwell-block marked this pull request as ready for review March 17, 2026 16:06
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76cec30c81

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/goose/src/agents/agent.rs Outdated
Comment on lines +1296 to +1300
if let Some(last_user_msg) = conversation
.messages()
.iter()
.rev()
.find(|m| m.role == rmcp::model::Role::User)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use original user prompt for UserPromptSubmit hooks

UserPromptSubmit currently derives its prompt by taking the last Role::User message from conversation, but SessionStart hooks run just before this and may inject hidden user-context messages via inject_hook_context. In that case, the hook receives the injected context instead of the actual user input, so prompt-validation/blocking hooks can make decisions on the wrong text during first-turn flows where SessionStart adds context.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e8025bd: the user prompt is now captured before SessionStart hooks fire, so UserPromptSubmit always sees the actual user input.

Comment thread crates/goose/src/agents/agent.rs Outdated
let invocation = crate::hooks::HookInvocation::pre_compact(
session_config.id.clone(),
conversation_to_compact.messages().len(),
true, // manual = true for explicit compaction
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark auto-compaction hook invocations as automatic

In the auto-compaction path (Exceeded auto-compact threshold...), PreCompact/PostCompact invocations are tagged with manual = true. Hook matching in hooks/mod.rs uses this flag to route manual vs auto, so this mislabels automatic compactions as manual, causing matcher: "auto" hooks to never fire and matcher: "manual" hooks to run unexpectedly.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e8025bd: the auto-compaction path now correctly passes manual=false to PreCompact/PostCompact hooks.

@tlongwell-block
Copy link
Copy Markdown
Collaborator Author

/goose

Douwe Osinga added 2 commits March 26, 2026 13:12
Signed-off-by: Douwe Osinga <douwe@squareup.com>
- Fix P2: auto-compaction path now correctly passes manual=false to
  PreCompact/PostCompact hooks (was incorrectly passing true)
- Fix P1: capture user prompt text before SessionStart hooks fire,
  preventing UserPromptSubmit from receiving injected context instead
  of the actual user input
- Remove comments that restate what the code does

Signed-off-by: Douwe Osinga <douwe@squareup.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8025bd850

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

session_config.id.clone(),
conversation.clone(),
tool_call_cut_off,
0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve current-turn tool pairs from summarization

maybe_summarize_tool_pairs is now always called with protect_last_n = 0, which means no tool calls from the active turn are protected from summarization. In multi-step tool turns near the cutoff, this can summarize and hide intermediate tool request/response pairs before the model finishes the turn, removing context it still needs for follow-up tool decisions or the final answer. The context_mgmt::tool_ids_to_summarize contract explicitly treats protect_last_n as “current turn” protection, so hard-coding 0 regresses correctness.

Useful? React with 👍 / 👎.

Comment on lines +74 to +76
if fill_pct >= threshold && !fired.contains(&threshold) {
fired.insert(threshold);
new_thresholds.push(threshold);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-arm ContextFill thresholds after usage drops

check_context_fill only inserts thresholds into fired_context_thresholds and never removes them when context usage later drops (for example after compaction). As a result, each threshold fires at most once for the entire session, so later threshold crossings do not trigger hooks again. This breaks the documented “once per threshold crossing” behavior and prevents repeated context-refresh hooks in long sessions with multiple compact/rebuild cycles.

Useful? React with 👍 / 👎.

Comment on lines +1335 to +1337
return Ok(Box::pin(async_stream::try_stream! {
yield AgentEvent::Message(Message::assistant().with_text(block_msg));
}));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run terminal lifecycle hooks on prompt-block path

When UserPromptSubmit returns a block, reply_internal returns early with a synthetic assistant message. That early return skips the Stop and SessionEnd hook invocations that are executed only at the normal function tail, so cleanup/audit hooks never run for blocked prompts. Any workflows relying on those lifecycle hooks will silently miss blocked-turn exits.

Useful? React with 👍 / 👎.

@DOsinga
Copy link
Copy Markdown
Collaborator

DOsinga commented Mar 26, 2026

This is a solid architecture and we want hooks in goose, but 1900+ lines in a single PR is a lot to review and merge confidently — can we start simpler and iterate?

Here's where the bulk comes from and how to slim it down:

HookInvocation is a god struct (334 lines in types.rs). 15 optional fields, 18 constructors that each set 2-4 of them. Make it an enum with #[serde(tag = "hook_event_name")] — each variant carries only its own fields, no optional soup.

Agent integration is copy-paste (~130 lines across 10 call sites). Every site repeats the same 12-15 line pattern: build invocation → hooks.run(invocation, &self.extension_manager, &working_dir, cancel_token...) → check blocked → check context. Create a HookRunner that captures the shared state once, so each call site becomes a one-liner like hook_runner.fire_pre_tool_use(tool_name, tool_input).await.

Duplicated parsing. parse_command_output and parse_mcp_result do the same thing for two backends — unify them.

8 of 18 event kinds are never fired. SubagentStart, SubagentStop, Notification, PermissionRequest, TeammateIdle, TaskCompleted, ConfigChange are defined but not wired up. Drop them and add when needed.

Suggested phasing:

  1. Infrastructure + core events (~500 lines): config loading, shell execution, the enum-based invocation type, and HookRunner. Only the events that are actually wired up.
  2. Agent wiring (~300 lines): PreToolUse, PostToolUse, UserPromptSubmit, SessionStart/End, Stop, PreCompact/PostCompact. With HookRunner these are one-liners.
  3. Follow-up PRs: ContextFill (most complex — state tracking across turns), MCP tool execution backend, additional event kinds as they get wired up.

That gets the same functionality in ~800 lines for the first two PRs, and each follow-up is a clean ~200 line addition.

(I also pushed a commit merging main and fixing the two codex findings — auto-compaction was passing manual=true to hooks, and UserPromptSubmit was picking up injected context instead of the actual user prompt.)

@DOsinga DOsinga added the needs_human label to set when a robot looks at a PR and can't handle it label Mar 26, 2026
ccross2 added a commit to sovren-software/goose that referenced this pull request Mar 30, 2026
162 hook timing measurements across 157 sessions:
- SessionStart: 17.1ms p50, UserPromptSubmit: 2.8ms, PreToolUse: 3.0ms
- Per-session overhead: ~39.6ms (0.76% of session time)

50 agent quality runs (goose-vanilla vs goose-engram, k=5):
- Hooks are quality-transparent: 34.0 vs 30.0 mean exec_score (within noise)
- Fail-open design verified: hooks fire but don't degrade tasks

Architecture comparison vs upstream PR aaif-goose#7411:
- 1,045 vs 1,972 LOC (47% reduction)
- enum (8 variants) vs struct (15 optional fields)
- 0 vs 8 unwired events
- 0 vs 1+ rmcp imports
- 19 tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ccross2 added a commit to sovren-software/goose that referenced this pull request Mar 30, 2026
Response addresses each of Douwe Osinga's concerns with benchmark data:
- God struct replaced with tagged enum (242 vs 334 LOC)
- Copy-paste wiring replaced with HookRuntime.emit() (13 sites, ~8 LOC each)
- Duplicated parsing unified to single serde_json::from_str
- 8 unwired events removed (all 8 defined events fire in production)
- Head-to-head benchmark: identical performance, 47% less code

Offers to rework PR in suggested 2-phase structure using fork architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ccross2
Copy link
Copy Markdown

ccross2 commented Mar 30, 2026

Thanks for the detailed review, @DOsinga. You're right that 1,900+ lines in a single PR is too much. I agree with every structural concern you raised, and I've already solved them in my fork.

After your feedback, I ran benchmarks and refactored. Here's what I found and what I'd like to propose.

Your concerns, my solutions

1. "HookInvocation is a god struct (334 lines, 15 optional fields, 18 constructors)"

I replaced this with a tagged enum:

#[derive(Debug, Clone, Serialize)]
#[serde(tag = "hook_event_name")]
pub enum HookEvent {
    SessionStart { session_id: String, cwd: String },
    UserPromptSubmit { session_id: String, user_prompt: String, cwd: String },
    PreToolUse { session_id: String, tool_name: String, tool_input: Value, cwd: String },
    PostToolUse { session_id: String, tool_name: String, tool_input: Value, tool_output: String, cwd: String },
    // ... 4 more variants, each carrying only its own fields
}

242 lines total. Zero optional fields. Zero constructors (enum variants ARE constructors). Each variant carries exactly the data it needs, no "optional soup."

2. "Agent integration is copy-paste (~130 lines across 10 call sites)"

I created HookRuntime with a single emit() method. Each call site is ~8 lines:

if let Some(ref hooks) = hook_runtime {
    let outcome = hooks.emit(
        HookEvent::PreToolUse {
            session_id: session_id.clone(),
            tool_name: name.clone(),
            tool_input: serde_json::to_value(&args).unwrap_or_default(),
            cwd: working_dir.display().to_string(),
        },
        &working_dir,
        cancel_token.clone(),
    ).await;
    if outcome.blocked { /* handle block */ }
}

13 emit sites across 3 files. No repeated patterns beyond the if let Some(ref hooks) guard.

3. "Duplicated parsing (parse_command_output = parse_mcp_result)"

I have one parse path: serde_json::from_str::<HookResult>(stdout) with a camelCase alias for Claude Code compatibility. No MCP result parsing because I don't route through MCP.

4. "8 of 18 event kinds are never fired"

I define exactly 8 events, all fired in production:
SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PostToolUseFailure, PreCompact, PostCompact, Stop.

Adding a new event is a one-line enum variant + one emit call.

Architecture: direct subprocess vs MCP routing

The original PR routes hooks through developer__shell. I route hooks as direct subprocesses via tokio::process::Command, decoupled from MCP entirely.

I benchmarked both paths head-to-head (same command, same machine, 100 iterations):

Command Direct subprocess DeveloperClient shell Ratio
echo benchmark 1.0ms p50 1.1ms p50 1.08x
sleep 0.01 12.0ms p50 11.9ms p50 0.99x

Performance is identical. The subprocess fork+exec cost dominates both paths. The MCP routing and output processing overhead (truncation, markdown rendering, CallToolResult wrapping) is negligible.

But the architectural difference is significant:

Metric My HookRuntime This PR
Total LOC 1,045 1,972
Type definition enum (8 tagged variants) struct (15 optional fields)
Unwired events 0 8
rmcp imports in hook module 0 1+
Tests 19 -
Execution dependency None (direct subprocess) developer__shell extension must be loaded

Same performance, 47% less code, zero coupling to the extension system.

Proposal

I'd like to rework this PR using the architecture above, split into your suggested phases:

Phase 1: Infrastructure + core events (~500 lines)

  • hooks/types.rs: HookEvent enum, HookResult, HookOutcome
  • hooks/subprocess.rs: direct subprocess executor with deadlock-safe I/O
  • hooks/config.rs: JSON config loading (global + project override)
  • hooks/mod.rs: HookRuntime with emit() and matcher logic

Phase 2: Agent wiring (~200 lines)

  • 13 emit sites in agent.rs + tool_execution.rs (one-liners with HookRuntime)
  • Context injection helper for SessionStart/UserPromptSubmit
  • Block handling for PreToolUse

This has been running in production on my fleet for 5 weeks with zero hook-related failures and one upstream sync (v1.29.0, 126 commits) absorbed with zero hook-related conflicts.

Full benchmark data: https://github.com/sovren-software/goose/blob/main/docs/hook-benchmark-results-2026-03-30.md

Happy to start on Phase 1 whenever you're ready.

@ccross2
Copy link
Copy Markdown

ccross2 commented Mar 30, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs_human label to set when a robot looks at a PR and can't handle it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants