fix(guardrails): scan Anthropic tool_result content blocks by Anai-Guo · Pull Request #29594 · BerriAI/litellm

Anai-Guo · 2026-06-03T16:12:41Z

Summary

The Generic Guardrail API's _extract_input_text_and_images() (in litellm/llms/anthropic/chat/guardrail_translation/handler.py) only reads the "text" key when iterating list content blocks. Anthropic tool_result blocks carry their text under "content" (not "text"), so tool outputs were silently skipped and never added to texts_to_check.

This means content returned by tools — file reads, API responses, database rows, etc. — bypassed all guardrails built on the Generic Guardrail API. In agentic workflows (Cursor, coding agents, MCP file reads) this is a PII/secret-leak gap: a tool result containing an API key or PII would never be scanned.

Change

In the list-content branch, after the existing text extraction, also handle tool_result blocks:

content as a string → appended directly.
content as a list of blocks → each text block's text is appended.

Both forms are part of the Anthropic tool_result spec. Images and other block types are untouched. 18 lines added, no existing behavior changed.

Test plan

Guardrail with a tool_result (string content) now scans the tool output.
Guardrail with a tool_result whose content is a list of text blocks scans each block.
Plain text / image blocks behave exactly as before (no regression).

🤖 Generated with Claude Code

The Generic Guardrail API's _extract_input_text_and_images only read the "text" key from list content blocks, so Anthropic tool_result blocks (whose text lives under "content") were silently skipped. Tool outputs such as file reads and API responses bypassed all guardrail scanning, a PII/secret-leak gap in agentic workflows. Also handles the list form of tool_result content (blocks of type text).

codspeed-hq · 2026-06-03T16:15:10Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

_{Comparing Anai-Guo:fix/guardrail-tool-result-redaction (2a4847e) with main (5be0797)}

greptile-apps · 2026-06-03T16:20:30Z

Greptile Summary

This PR fixes a gap where Anthropic tool_result content blocks were silently skipped by the Generic Guardrail API's _extract_input_text_and_images(), allowing tool outputs (file reads, API responses, etc.) to bypass all guardrail scanning.

Adds tool_result handling in the list-content branch: both string content and list-of-text-blocks content are now extracted and appended to texts_to_check.
The extraction logic is correct, but the write-back in _apply_guardrail_responses_to_input always writes guardrailed text to block[\"text\"] — for tool_result blocks the actual field is block[\"content\"], so any guardrail that modifies or redacts content (rather than just blocking) will silently leave the original sensitive data in the message payload unchanged.
No automated tests were added for the new code path, leaving the string-content case, the list-of-blocks case, and the regression guard for plain text/images without coverage.

Confidence Score: 3/5

The scanning fix is a genuine improvement, but a write-back mismatch means redacting guardrails will not actually update tool_result content — a silent correctness failure on the modified code path.

Extraction of tool_result text for scanning is correct and closes a real gap. However, _apply_guardrail_responses_to_input unconditionally writes the guardrailed text to ["text"] on the content-list item, while tool_result blocks store their text under ["content"]. For any guardrail that rewrites content (PII redaction, etc.) the fix leaves the original sensitive text intact and adds a spurious key — a present, observable data-integrity defect on the newly added code path. No tests were added to catch this or future regressions.

litellm/llms/anthropic/chat/guardrail_translation/handler.py — both the new extraction logic and the existing _apply_guardrail_responses_to_input write-back need attention.

Important Files Changed

Filename	Overview
litellm/llms/anthropic/chat/guardrail_translation/handler.py	Adds `tool_result` content extraction to guardrail scanning; extraction logic is correct, but the write-back path in `_apply_guardrail_responses_to_input` writes to `["text"]` instead of `["content"]`, so redacting guardrails would silently fail to update the actual `tool_result` payload.

_{Reviews (1): Last reviewed commit: "fix(guardrails): scan Anthropic tool_res..." | Re-trigger Greptile}

greptile-apps · 2026-06-03T16:20:34Z

+                if content_item.get("type") == "tool_result":
+                    tool_result_content = content_item.get("content")
+                    if isinstance(tool_result_content, str):
+                        texts_to_check.append(tool_result_content)
+                        task_mappings.append((msg_idx, int(content_idx)))
+                    elif isinstance(tool_result_content, list):
+                        for block in tool_result_content:
+                            if isinstance(block, dict):
+                                block_text = block.get("text")
+                                if block_text is not None:
+                                    texts_to_check.append(block_text)
+                                    task_mappings.append(
+                                        (msg_idx, int(content_idx))
+                                    )


Write-back mismatch for tool_result blocks

The extraction correctly reads from content_item["content"], but the write-back in _apply_guardrail_responses_to_input (lines 307-311) always writes to content_item["text"]. For tool_result blocks, task_mappings stores (msg_idx, content_idx) pointing at the outer tool_result dict — so the write-back executes tool_result_block["text"] = guardrail_response, silently adding a spurious key while leaving tool_result_block["content"] (the actual sensitive text) untouched. Any guardrail that modifies or redacts content (rather than just raising an exception) will appear to succeed but will actually leave the original tool_result content in the payload unchanged.

greptile-apps · 2026-06-03T16:20:34Z

+                if content_item.get("type") == "tool_result":
+                    tool_result_content = content_item.get("content")
+                    if isinstance(tool_result_content, str):
+                        texts_to_check.append(tool_result_content)
+                        task_mappings.append((msg_idx, int(content_idx)))
+                    elif isinstance(tool_result_content, list):
+                        for block in tool_result_content:
+                            if isinstance(block, dict):
+                                block_text = block.get("text")
+                                if block_text is not None:
+                                    texts_to_check.append(block_text)
+                                    task_mappings.append(
+                                        (msg_idx, int(content_idx))
+                                    )


No tests provided for the bug fix

The PR description's test plan checkboxes are all unchecked ([ ]), and there are no new test cases in tests/test_litellm/llms/anthropic/chat/guardrail_translation/test_anthropic_guardrail_handler.py covering tool_result scanning. Without automated tests for the string-content case, the list-of-blocks case, and a regression check on plain text/image blocks, a future refactor of this extraction path has no safety net.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

codecov · 2026-06-03T16:20:41Z

Codecov Report

❌ Patch coverage is 12.50000% with 21 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...ms/anthropic/chat/guardrail_translation/handler.py	12.50%	21 Missing ⚠️

📢 Thoughts on this report? Let us know!

veria-ai · 2026-06-03T16:25:07Z

PR overview

This pull request updates Anthropic chat guardrail translation so tool_result content blocks are included in the guardrail scanning path. The change is focused on handling Anthropic’s tool output representation inside user messages.

There is one open issue remaining: the configured option to skip tool messages is not yet applied to Anthropic tool_result blocks, so tool output may still be sent through guardrail processing when the caller intended to exclude it. This mainly affects deployments that rely on that skip setting, and one prior issue has already been addressed.

Open issues (1)

Medium: Tool-message skip is not applied to Anthropic tool results — litellm/llms/anthropic/chat/guardrail_translation/handler.py:245

Fixed/addressed: 1 · PR risk: 4/10

…rriAI#29593)

Anai-Guo · 2026-06-03T19:15:08Z

Addressed the write-back gap flagged in review: sanitized tool_result text was previously written back to content_item["text"], but Anthropic tool_result text lives under content (string or list of blocks). That meant redactions in sanitize-mode were silently dropped for tool outputs.

The task mapping is now a (msg_idx, content_idx, block_idx) tuple so _apply_guardrail_responses_to_input can route each sanitized segment to the correct location:

tool_result with string content → written back to content
tool_result with list-of-blocks content → written back to content[block_idx]["text"]
regular text blocks → unchanged (["text"])

The output-path mapping is untouched. Net change is +31/-12 in handler.py.

🤖 Generated with Claude Code

veria-ai · 2026-06-03T19:42:36Z

+                # outputs (file reads, API responses) bypass guardrail scanning.
+                # The block index is tracked so sanitized text is written back to
+                # the correct location (see _apply_guardrail_responses_to_input).
+                if content_item.get("type") == "tool_result":


Medium: Tool-message skip is not applied to Anthropic tool results

skip_tool_message_in_guardrail only returns early for messages whose role is tool, but Anthropic tool outputs are tool_result blocks inside a user message. With this extraction path, a caller that configured tool messages to be skipped can still have tool_result["content"] sent to apply_guardrail; skip these blocks before extracting their text when skip_tool_message is true.

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

veria-ai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread litellm/llms/anthropic/chat/guardrail_translation/handler.py Outdated

Anai-Guo added 2 commits June 3, 2026 12:10

style: black-format handler.py (collapse task_mappings.append call)

1600d40

fix(guardrails): write sanitized tool_result text back to content (Be…

2a4847e

…rriAI#29593)

veria-ai Bot reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(guardrails): scan Anthropic tool_result content blocks#29594

fix(guardrails): scan Anthropic tool_result content blocks#29594
Anai-Guo wants to merge 3 commits into
BerriAI:mainfrom
Anai-Guo:fix/guardrail-tool-result-redaction

Anai-Guo commented Jun 3, 2026

Uh oh!

codspeed-hq Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 3, 2026

Important Files Changed

Uh oh!

greptile-apps Bot Jun 3, 2026

Uh oh!

greptile-apps Bot Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

veria-ai Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Anai-Guo commented Jun 3, 2026

Uh oh!

veria-ai Bot Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Anai-Guo commented Jun 3, 2026

Summary

Change

Test plan

Uh oh!

codspeed-hq Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

veria-ai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Open issues (1)

Uh oh!

Anai-Guo commented Jun 3, 2026

Uh oh!

veria-ai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codspeed-hq Bot commented Jun 3, 2026 •

edited

Loading

codecov Bot commented Jun 3, 2026 •

edited

Loading

veria-ai Bot commented Jun 3, 2026 •

edited

Loading