fix(bedrock_guardrails): use Bedrock OUTPUT source for apply_guardrail when scanning model responses#26144
Conversation
… scans
BedrockGuardrail.apply_guardrail hardcoded source="INPUT" regardless of the
input_type parameter. On the non-streaming post-call path (unified_guardrail
-> OpenAIChatCompletionsHandler.process_output_response -> apply_guardrail),
the model response text was sent to Bedrock as INPUT, so guardrail policies
configured for Output (e.g. PII/NAME blocking) returned action=NONE and the
response passed through unblocked. The streaming path was unaffected because
it calls make_bedrock_api_request(source="OUTPUT", ...) directly.
Map input_type to the correct Bedrock source ("request" -> INPUT,
"response" -> OUTPUT) and build a synthetic ModelResponse for the OUTPUT
path so _create_bedrock_output_content_request produces the correct payload.
Made-with: Cursor
Greptile SummaryThis PR fixes Confidence Score: 5/5Safe to merge — the fix is correct, well-targeted, and covered by new regression tests. All remaining considerations (the pre-existing No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Core fix: apply_guardrail now routes input_type="response" to Bedrock source="OUTPUT" with a synthetic ModelResponse, and input_type="request" to source="INPUT" with user messages as before. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_bedrock_guardrails.py | Two new mock-only tests added to verify INPUT/OUTPUT source routing; existing tests unchanged. All network calls are properly mocked. |
Sequence Diagram
sequenceDiagram
participant UG as unified_guardrail
participant AG as apply_guardrail
participant BR as make_bedrock_api_request
UG->>AG: apply_guardrail(input_type="request")
AG->>AG: build mock_messages (role="user")
AG->>BR: source="INPUT", messages=[...]
BR-->>AG: BedrockGuardrailResponse
AG-->>UG: processed inputs
UG->>AG: apply_guardrail(input_type="response")
AG->>AG: build mock_messages (role="user")
AG->>AG: build synthetic ModelResponse (choices with role="assistant")
AG->>BR: source="OUTPUT", response=synthetic_response
BR-->>AG: BedrockGuardrailResponse
AG-->>UG: processed inputs
Reviews (2): Last reviewed commit: "test(bedrock_guardrails): assert apply_g..." | Re-trigger Greptile
| if bedrock_source == "OUTPUT": | ||
| # Build a synthetic ModelResponse whose choices carry the | ||
| # text(s) to scan, so _create_bedrock_output_content_request | ||
| # can produce the correct Bedrock OUTPUT payload. | ||
| synthetic_response = ModelResponse( | ||
| choices=[ | ||
| Choices( | ||
| index=_idx, | ||
| message=Message( | ||
| role="assistant", | ||
| content=str(_msg.get("content") or ""), | ||
| ), | ||
| finish_reason="stop", | ||
| ) | ||
| for _idx, _msg in enumerate(filtered_messages) | ||
| ] | ||
| ) | ||
| bedrock_response = await self.make_bedrock_api_request( | ||
| source="OUTPUT", | ||
| response=synthetic_response, | ||
| request_data=request_data, | ||
| ) | ||
| else: | ||
| bedrock_response = await self.make_bedrock_api_request( | ||
| source="INPUT", | ||
| messages=filtered_messages, | ||
| request_data=request_data, |
There was a problem hiding this comment.
No test coverage for the primary fix path
The existing test for apply_guardrail with input_type="response" (test_bedrock_apply_guardrail_with_only_tool_calls_response) uses an empty texts list, so make_bedrock_api_request is never actually called — it doesn't exercise the new OUTPUT branch. A test that passes non-empty texts with input_type="response" and asserts make_bedrock_api_request is called with source="OUTPUT" and a synthetic ModelResponse would validate the core fix and guard against regressions.
Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)
…PUT source Add regression tests that mock make_bedrock_api_request and verify input_type=request uses source=INPUT with user messages, and input_type=response uses source=OUTPUT with synthetic ModelResponse. Made-with: Cursor
|
@greptile review again with new commit |
Cause
BedrockGuardrail.apply_guardrail always called make_bedrock_api_request with source="INPUT", even when input_type="response" (post-call / model output). Bedrock guardrails often apply different policies for input vs output (e.g. PII/name rules only on output). Sending assistant text as INPUT led to action=NONE and no block.
Non-streaming completions go through unified_guardrail → OpenAIChatCompletionsHandler.process_output_response → apply_guardrail(..., input_type="response"), so they hit this bug. Streaming worked because that path already used source="OUTPUT" on the Bedrock call.
Fix
Map input_type to the Bedrock source: "request" → INPUT (messages), "response" → OUTPUT. For the OUTPUT path, build a synthetic ModelResponse whose choices carry the text to scan, and call make_bedrock_api_request(source="OUTPUT", response=synthetic_response, ...) so Bedrock evaluates output policies and blocks consistently with streaming.