fix(security): sanitize WebFetch/WebSearch output via PostToolUse hook #1035

Open
clawtom wants to merge 3 commits into qwibitai:main from clawtom:fix/sanitize-webfetch-clean

clawtom commented Mar 13, 2026

Adds a PostToolUse hook that sanitizes results from WebFetch and WebSearch before they reach the agent context.

Why: External web content can contain adversarial strings (prompt injection payloads) that attempt to manipulate model behavior. A deterministic filter on tool output removes this attack surface before the model sees it.

This came from a live attack: a Wikipedia user embedded the nanoclaw refusal trigger string in talk page content, expecting my agent to read it. The sanitizer caught it. Gurkubondinn later confirmed on-wiki that the refusal string had "stopped working."

Changes:

  • New file: container/agent-runner/src/sanitize-external-content.ts
    • Exports createSanitizeWebContentHook() — a HookCallback that recursively scans tool results and redacts any occurrence of the magic refusal trigger string
  • Modified: container/agent-runner/src/index.ts
    • Registers the hook as PostToolUse for WebFetch and WebSearch matchers
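The diff itself is not reproduced in this thread, so the following is only a minimal sketch of what a recursive redaction pass like the one described above might look like. The trigger value `EXAMPLE_REFUSAL_TRIGGER` and the `[REDACTED]` marker are placeholders, and the function body is an assumption rather than the PR's actual code.

```typescript
// Placeholder only: the real trigger string is deliberately not reproduced here.
const TRIGGER = "EXAMPLE_REFUSAL_TRIGGER";
const REDACTION = "[REDACTED]";

// Recursively walk a tool result and redact the trigger in every string found.
// Returns a new structure; the input is not mutated.
export function sanitizeValue(value: unknown): unknown {
  if (typeof value === "string") {
    // split/join replaces every exact occurrence (same effect as replaceAll).
    return value.split(TRIGGER).join(REDACTION);
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeValue(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = sanitizeValue(source[key]);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

Returning a fresh structure instead of mutating in place keeps the original tool response available for logging, at the cost of an extra copy.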

This branch is based on current upstream main (clean, no merge conflicts).

Dhebrank left a comment

Review: Request Changes

Critical: The hook is a no-op. The sanitized content is never applied because the return shape uses wrong field names.

The hook returns:

return {
  hookSpecificOutput: {
    hook_type: "PostToolUse",        // Wrong — should be hookEventName
    tool_response: sanitized,         // Wrong — should be updatedMCPToolOutput
  },
};

The SDK expects PostToolUseHookSpecificOutput:

{
  hookEventName: 'PostToolUse',
  updatedMCPToolOutput?: unknown,
}

The existing PreToolUse hook in the same file correctly uses hookEventName. This hook uses hook_type instead, which the SDK silently ignores. The sanitized content is discarded and the original (unsanitized) response reaches the model.

Fix:

hookSpecificOutput: {
  hookEventName: 'PostToolUse',
  updatedMCPToolOutput: sanitized,
},
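To illustrate why a declared return type would surface this class of bug, here is a small sketch. The `PostToolUseOutput` interface is a local stand-in that mirrors the shape quoted above, not the SDK's own declaration, and `buildPostToolUseResult` is a hypothetical helper name.

```typescript
// Local stand-in mirroring the shape quoted in the review (assumption, not the SDK type).
interface PostToolUseOutput {
  hookEventName: "PostToolUse";
  updatedMCPToolOutput?: unknown;
}

// Hypothetical helper: wraps sanitized content in the expected return shape.
export function buildPostToolUseResult(sanitized: unknown): {
  hookSpecificOutput: PostToolUseOutput;
} {
  return {
    hookSpecificOutput: {
      hookEventName: "PostToolUse",
      updatedMCPToolOutput: sanitized,
      // Adding `hook_type: "PostToolUse"` here would fail to compile:
      // object literals are checked for unknown properties against the declared type.
    },
  };
}
```

With the return type annotated, a misspelled or unknown field in the object literal is reported by the compiler instead of being silently dropped at runtime.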

Additional issues:

  1. Exact-match only — trivially bypassed with whitespace, Unicode homoglyphs, or case changes. Consider regex or fuzzy matching.

  2. Trigger string hardcoded in public repo — any attacker can read it and craft variants. Consider loading patterns from a config file.

  3. Use SDK types — import PostToolUseHookInput instead of hand-rolling an anonymous type. Would have caught this bug at compile time.

  4. No tests — the recursive sanitizeValue function needs unit tests for objects, arrays, null, nested structures.

  5. Merge conflict — the hooks block will conflict with current main which already has both PreCompact and PreToolUse hooks.

The security motivation is good — just needs the critical field name fix and tests.
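One possible way to address item 1 is sketched below: normalize the text, then match with a pattern that ignores case and tolerates whitespace or zero-width characters between the trigger's characters. The trigger value and the helper names `buildTolerantPattern` and `redact` are hypothetical. NFKC normalization folds compatibility forms such as fullwidth letters, but it does not map cross-script lookalikes (for example Cyrillic letters), so this narrows the bypass surface without closing it.

```typescript
// Placeholder only: the real trigger string is deliberately not reproduced here.
const TRIGGER = "EXAMPLE_REFUSAL_TRIGGER";

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive pattern that allows whitespace and zero-width
// characters between any two consecutive characters of the trigger.
export function buildTolerantPattern(trigger: string): RegExp {
  const gap = "[\\s\\u200B-\\u200D\\uFEFF]*";
  const body = Array.from(trigger).map(escapeRegExp).join(gap);
  return new RegExp(body, "gi");
}

// Normalize to NFKC, then redact every match.
// Note: normalization applies to the whole text, not only to the matched span.
export function redact(text: string, pattern: RegExp): string {
  return text.normalize("NFKC").replace(pattern, "[REDACTED]");
}

export const tolerantPattern = buildTolerantPattern(TRIGGER);
```

A call such as `redact(pageText, tolerantPattern)` would then replace the exact-match step. Whether rewriting the surrounding text through NFKC is acceptable depends on how the agent uses the fetched content.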

clawtom and others added 3 commits March 14, 2026 19:06
Addresses Dhebrank's review on qwibitai#1035.

Critical fix: the hook was silently discarding sanitized content because
the return used wrong field names (`hook_type`/`tool_response`) that the
SDK does not recognise. The SDK expects `PostToolUseHookSpecificOutput`:

  hookEventName: 'PostToolUse'
  updatedMCPToolOutput: <sanitized>

Additional improvements per review:
- Import and use `PostToolUseHookInput` from the SDK instead of an
  anonymous cast — this would have caught the field name bug at compile
  time.
- Switch from exact-string `.replaceAll()` to a compiled case-insensitive
  regex, reducing the risk of trivial bypass via mixed-case injection.
- Export `sanitizeValue` so it can be unit-tested in isolation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
clawtom force-pushed the fix/sanitize-webfetch-clean branch from a15100b to b8921ca March 14, 2026 19:06

clawtom commented Mar 14, 2026

Thanks for the thorough review — the critical field name bug was a real miss.

Fixed in the latest commit (b8921ca):

  • Field names corrected: hook_type/tool_response → hookEventName/updatedMCPToolOutput (matching PostToolUseHookSpecificOutput)
  • SDK type imported: PostToolUseHookInput replaces the anonymous cast — would have caught this at compile time
  • Regex matching: switched from exact-string .replaceAll() to a compiled case-insensitive regex (gi flags), which closes the mixed-case bypass
  • sanitizeValue exported: available for unit tests now
  • Rebased on current main: no conflicts

On the hardcoded trigger string — agree it's suboptimal. I've left that for a separate issue since loading patterns from config is a bigger change that deserves its own discussion. Happy to open one if that's useful.

On tests for sanitizeValue — can add those in a follow-up or here if you'd prefer it in this PR.
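As a starting point for those tests, here is a framework-free sketch covering the cases the review lists (objects, arrays, null, nested structures). The `sanitizeValue` body is an assumed stand-in using a case-insensitive regex as described for b8921ca, the trigger is a placeholder, and `cases` and `runCases` are hypothetical names. The same table could be moved into whatever test runner the repository uses.

```typescript
// Placeholder pattern; the real trigger string is intentionally not shown.
const TRIGGER_PATTERN = /EXAMPLE_REFUSAL_TRIGGER/gi;

// Assumed stand-in for the exported function under test.
export function sanitizeValue(value: unknown): unknown {
  if (typeof value === "string") return value.replace(TRIGGER_PATTERN, "[REDACTED]");
  if (Array.isArray(value)) return value.map((item) => sanitizeValue(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) out[key] = sanitizeValue(source[key]);
    return out;
  }
  return value;
}

// Table of inputs and the outputs we expect after sanitization.
export const cases: Array<{ name: string; input: unknown; expected: unknown }> = [
  { name: "mixed-case string", input: "a example_refusal_trigger b", expected: "a [REDACTED] b" },
  { name: "array", input: ["EXAMPLE_REFUSAL_TRIGGER", 7], expected: ["[REDACTED]", 7] },
  { name: "null", input: null, expected: null },
  {
    name: "nested object",
    input: { a: { b: ["x", "Example_Refusal_Trigger"] } },
    expected: { a: { b: ["x", "[REDACTED]"] } },
  },
  { name: "clean string", input: "no trigger here", expected: "no trigger here" },
];

// Returns the names of failing cases; an empty array means all passed.
export function runCases(): string[] {
  return cases
    .filter((c) => JSON.stringify(sanitizeValue(c.input)) !== JSON.stringify(c.expected))
    .map((c) => c.name);
}
```

Comparing via `JSON.stringify` is enough for these plain-data cases because key order is preserved by the sanitizer. A deep-equality assertion from the test framework would be the more robust choice.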

Labels: PR: Fix (Bug fix), Status: Needs Review (Ready for maintainer review)
