Skip to content

fix(bedrock_guardrails): use Bedrock OUTPUT source for apply_guardrail when scanning model responses#26144

Open
shivamrawat1 wants to merge 2 commits intolitellm_internal_stagingfrom
litellm_post_call_non_streaming
Open

fix(bedrock_guardrails): use Bedrock OUTPUT source for apply_guardrail when scanning model responses#26144
shivamrawat1 wants to merge 2 commits intolitellm_internal_stagingfrom
litellm_post_call_non_streaming

Conversation

@shivamrawat1
Copy link
Copy Markdown
Collaborator

Cause
BedrockGuardrail.apply_guardrail always called make_bedrock_api_request with source="INPUT", even when input_type="response" (post-call / model output). Bedrock guardrails often apply different policies for input vs output (e.g. PII/name rules only on output). Sending assistant text as INPUT led to action=NONE and no block.

Non-streaming completions go through unified_guardrail → OpenAIChatCompletionsHandler.process_output_response → apply_guardrail(..., input_type="response"), so they hit this bug. Streaming worked because that path already used source="OUTPUT" on the Bedrock call.

Fix
Map input_type to the Bedrock source: "request" → INPUT (messages), "response" → OUTPUT. For the OUTPUT path, build a synthetic ModelResponse whose choices carry the text to scan, and call make_bedrock_api_request(source="OUTPUT", response=synthetic_response, ...) so Bedrock evaluates output policies and blocks consistently with streaming.

… scans

BedrockGuardrail.apply_guardrail hardcoded source="INPUT" regardless of the
input_type parameter. On the non-streaming post-call path (unified_guardrail
-> OpenAIChatCompletionsHandler.process_output_response -> apply_guardrail),
the model response text was sent to Bedrock as INPUT, so guardrail policies
configured for Output (e.g. PII/NAME blocking) returned action=NONE and the
response passed through unblocked. The streaming path was unaffected because
it calls make_bedrock_api_request(source="OUTPUT", ...) directly.

Map input_type to the correct Bedrock source ("request" -> INPUT,
"response" -> OUTPUT) and build a synthetic ModelResponse for the OUTPUT
path so _create_bedrock_output_content_request produces the correct payload.

Made-with: Cursor
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 21, 2026

Greptile Summary

This PR fixes BedrockGuardrail.apply_guardrail always passing source="INPUT" to the Bedrock API even when scanning model responses (input_type="response"), causing output-specific guardrail policies (e.g. PII redaction on model output) to return action=NONE silently. The fix maps input_type to the correct Bedrock source and builds a synthetic ModelResponse carrying role="assistant" choices for the OUTPUT path. Two new regression tests directly validate the source routing for both the INPUT and OUTPUT code paths.

Confidence Score: 5/5

Safe to merge — the fix is correct, well-targeted, and covered by new regression tests.

All remaining considerations (the pre-existing experimental_use_latest_role_message_only role-filter limitation for the OUTPUT path) were flagged in a prior review comment. No new P0/P1 issues found; the core fix is logically sound and the two new tests directly validate the INPUT/OUTPUT source routing.

No files require special attention.

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py Core fix: apply_guardrail now routes input_type="response" to Bedrock source="OUTPUT" with a synthetic ModelResponse, and input_type="request" to source="INPUT" with user messages as before.
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_bedrock_guardrails.py Two new mock-only tests added to verify INPUT/OUTPUT source routing; existing tests unchanged. All network calls are properly mocked.

Sequence Diagram

sequenceDiagram
    participant UG as unified_guardrail
    participant AG as apply_guardrail
    participant BR as make_bedrock_api_request

    UG->>AG: apply_guardrail(input_type="request")
    AG->>AG: build mock_messages (role="user")
    AG->>BR: source="INPUT", messages=[...]
    BR-->>AG: BedrockGuardrailResponse
    AG-->>UG: processed inputs

    UG->>AG: apply_guardrail(input_type="response")
    AG->>AG: build mock_messages (role="user")
    AG->>AG: build synthetic ModelResponse (choices with role="assistant")
    AG->>BR: source="OUTPUT", response=synthetic_response
    BR-->>AG: BedrockGuardrailResponse
    AG-->>UG: processed inputs
Loading

Reviews (2): Last reviewed commit: "test(bedrock_guardrails): assert apply_g..." | Re-trigger Greptile

Comment on lines +1576 to +1602
if bedrock_source == "OUTPUT":
# Build a synthetic ModelResponse whose choices carry the
# text(s) to scan, so _create_bedrock_output_content_request
# can produce the correct Bedrock OUTPUT payload.
synthetic_response = ModelResponse(
choices=[
Choices(
index=_idx,
message=Message(
role="assistant",
content=str(_msg.get("content") or ""),
),
finish_reason="stop",
)
for _idx, _msg in enumerate(filtered_messages)
]
)
bedrock_response = await self.make_bedrock_api_request(
source="OUTPUT",
response=synthetic_response,
request_data=request_data,
)
else:
bedrock_response = await self.make_bedrock_api_request(
source="INPUT",
messages=filtered_messages,
request_data=request_data,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No test coverage for the primary fix path

The existing test for apply_guardrail with input_type="response" (test_bedrock_apply_guardrail_with_only_tool_calls_response) uses an empty texts list, so make_bedrock_api_request is never actually called — it doesn't exercise the new OUTPUT branch. A test that passes non-empty texts with input_type="response" and asserts make_bedrock_api_request is called with source="OUTPUT" and a synthetic ModelResponse would validate the core fix and guard against regressions.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

…PUT source

Add regression tests that mock make_bedrock_api_request and verify
input_type=request uses source=INPUT with user messages, and
input_type=response uses source=OUTPUT with synthetic ModelResponse.

Made-with: Cursor
@shivamrawat1 shivamrawat1 temporarily deployed to integration-postgres April 21, 2026 02:54 — with GitHub Actions Inactive
@shivamrawat1 shivamrawat1 temporarily deployed to integration-postgres April 21, 2026 02:54 — with GitHub Actions Inactive
@shivamrawat1 shivamrawat1 temporarily deployed to integration-postgres April 21, 2026 02:54 — with GitHub Actions Inactive
@shivamrawat1
Copy link
Copy Markdown
Collaborator Author

@greptile review again with new commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant