fix(security): fix prompt injection detection — async heuristics + inheritance chain #24346
xr843 wants to merge 1 commit into BerriAI:main
Conversation
…heritance chain

Fixes BerriAI#19499

Two critical bugs in prompt injection detection:

1. Heuristics check blocks event loop: check_user_input_similarity() is a CPU-bound O(n*m) SequenceMatcher operation called synchronously inside async_pre_call_hook(). This blocks the FastAPI event loop for 60-90s on large inputs, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread().

2. LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances. The llm_api_check code path was therefore unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name and default_on=True.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
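The to_thread offload described in the commit message can be sketched as follows. This is a minimal illustration, not the actual LiteLLM implementation: the function names mirror the hooks described above, but the phrase list and 0.7 threshold are assumptions.

```python
import asyncio
from difflib import SequenceMatcher

# Hypothetical stand-in for the real similarity check; the threshold and
# phrase list are illustrative, not LiteLLM's actual values.
def check_user_input_similarity(user_input: str, attack_phrases: list) -> bool:
    # CPU-bound: each ratio() call is O(n*m) on the two strings being compared.
    return any(
        SequenceMatcher(None, user_input.lower(), phrase.lower()).ratio() > 0.7
        for phrase in attack_phrases
    )

async def async_pre_call_hook(user_input: str, attack_phrases: list) -> bool:
    # The fix: run the blocking check in the default ThreadPoolExecutor
    # so the event loop (and K8s health probes) stay responsive.
    return await asyncio.to_thread(
        check_user_input_similarity, user_input, attack_phrases
    )
```

`asyncio.to_thread()` keeps the coroutine's await point cheap: the event loop is only suspended, not blocked, while the similarity scan runs in a worker thread.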
Greptile Summary

This PR fixes two genuine bugs in prompt injection detection.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/hooks/prompt_injection_detection.py | Base class changed from CustomLogger to CustomGuardrail (fixing llm_api_check dispatch), and check_user_input_similarity now offloaded to asyncio.to_thread — but default_on=True without event_hook restriction makes the guardrail run for all event types, and the dispatch reroute silently lets disable_global_guardrail bypass the pre-call injection check. |
| tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py | Three new unit tests added: isinstance check for CustomGuardrail inheritance, should_run_guardrail return value for during_call, and parametrized asyncio.to_thread offload verification for both code paths. All tests are pure mock-based with no real network calls, complying with the no-network-calls rule. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Proxy as Proxy Request
    participant PHook as pre_call_hook dispatcher<br/>(proxy/utils.py)
    participant PIID as _OPTIONAL_PromptInjectionDetection
    participant Thread as asyncio.to_thread<br/>(ThreadPoolExecutor)
    participant DHook as during_call_hook dispatcher<br/>(proxy/utils.py)

    Note over PHook,PIID: BEFORE: CustomLogger path (elif branch)<br/>AFTER: CustomGuardrail path (if branch) via _process_guardrail_callback
    Proxy->>PHook: pre_call_hook(data)
    PHook->>PIID: should_run_guardrail(pre_call) [NEW check]
    PIID-->>PHook: True (default_on=True)
    PHook->>PIID: async_pre_call_hook(data)
    PIID->>Thread: check_user_input_similarity(user_input) [asyncio.to_thread — FIXED]
    Thread-->>PIID: is_prompt_attack: bool
    alt prompt attack detected
        PIID-->>Proxy: raise HTTPException(400)
    else safe input
        PIID-->>PHook: data
    end
    Proxy->>DHook: during_call_hook(data)
    Note over DHook,PIID: BEFORE: skipped (not CustomGuardrail)<br/>AFTER: now reachable — FIXED
    DHook->>PIID: should_run_guardrail(during_call)
    PIID-->>DHook: True (default_on=True)
    DHook->>PIID: async_moderation_hook(data)
    alt llm_api_check=True
        PIID->>PIID: llm_router.acompletion(model, messages)
        alt LLM flags injection
            PIID-->>Proxy: raise HTTPException(400)
        else safe
            PIID-->>DHook: False
        end
    else llm_api_check not set
        PIID-->>DHook: None (early return)
    end
```
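The two dispatch paths in the diagram can be sketched roughly as below. This is a simplified illustration under assumed signatures; the real logic in proxy/utils.py differs in detail.

```python
import asyncio

# Illustrative stand-ins for the classes named in the diagram; the bodies
# are assumptions, not LiteLLM's actual code.
class CustomLogger:
    async def async_pre_call_hook(self, data):
        return data

class CustomGuardrail(CustomLogger):
    def __init__(self, guardrail_name, default_on=False, event_hook=None):
        self.guardrail_name = guardrail_name
        self.default_on = default_on
        self.event_hook = event_hook  # None means "no event-type restriction"

    def should_run_guardrail(self, data, event_type):
        # Requests can opt out of default-on guardrails via metadata.
        if data.get("metadata", {}).get("disable_global_guardrail"):
            return False
        if self.event_hook is None:
            return self.default_on  # unrestricted: matches any event type
        return event_type in self.event_hook

async def pre_call_hook(callbacks, data):
    for cb in callbacks:
        if isinstance(cb, CustomGuardrail):
            # Guardrail path: gated by should_run_guardrail (the NEW check).
            if cb.should_run_guardrail(data, "pre_call"):
                data = await cb.async_pre_call_hook(data)
        elif isinstance(cb, CustomLogger):
            # Plain-logger path: invoked unconditionally (the old path).
            data = await cb.async_pre_call_hook(data)
    return data
```

The isinstance ordering is the crux of the fix: once the class extends CustomGuardrail, the first branch matches and the should_run_guardrail gate applies.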
Last reviewed commit: "fix(security): fix p..."
```python
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)
```
disable_global_guardrail flag now silently bypasses prompt injection detection
Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.
With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.
This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).
Rule Used: What: avoid backwards-incompatible changes without... (source)
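One possible mitigation, hypothetical and not part of this PR, is to override should_run_guardrail so the pre-call injection check cannot be disabled by request metadata. The base-class stub below is an assumption standing in for LiteLLM's CustomGuardrail:

```python
class CustomGuardrailBase:
    """Stub standing in for litellm's CustomGuardrail (hypothetical)."""

    def should_run_guardrail(self, data, event_type):
        # Simplified: default-on guardrails skip requests that opt out.
        return not data.get("metadata", {}).get("disable_global_guardrail", False)

class PromptInjectionGuardrail(CustomGuardrailBase):
    def should_run_guardrail(self, data, event_type):
        # Always run injection detection pre-call, even when the request
        # sets disable_global_guardrail=True in its metadata.
        if event_type == "pre_call":
            return True
        return super().should_run_guardrail(data, event_type)
```

This keeps the opt-out working for other event types while closing the silent bypass for the security-critical pre-call check.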
```python
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)
```
event_hook scope unrestricted — guardrail runs for all future event types
event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.
Explicitly setting event_hook would constrain the scope and make intent clear:
```diff
 super().__init__(
     guardrail_name="prompt_injection_detection",
     default_on=True,
+    event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],
 )
```
```python
class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
    # Class variables or attributes
    def __init__(
        self,
        prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
    ):
        super().__init__(
            guardrail_name="prompt_injection_detection",
            default_on=True,
        )
```
llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag
Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.
Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.
Rule Used: What: avoid backwards-incompatible changes without... (source)
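A gating pattern along the lines this rule suggests might look like the sketch below. The flag name and signature are hypothetical; the early returns mirror the no-op behavior the PR description attributes to async_moderation_hook:

```python
import asyncio

async def async_moderation_hook(params, run_llm_check, llm_moderation_enabled=False):
    # Hypothetical feature flag, default False so existing deployments keep
    # the pre-fix behavior (no LLM moderation call) until they opt in.
    if not llm_moderation_enabled:
        return None
    # Early return mirrors the existing no-op path when the check is not configured.
    if not params or not params.get("llm_api_check"):
        return None
    return await run_llm_check()  # only reached when explicitly enabled
```

With a default-off flag, upgrading users see no new LLM calls or costs until they deliberately enable the moderation hook.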
Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).
Summary
Fixes #19499
Two critical bugs in the built-in prompt injection detection feature:
1. Heuristics check blocks the event loop: `check_user_input_similarity()` performs CPU-bound `SequenceMatcher` operations (triple nested loop, O(n*m)) synchronously inside `async_pre_call_hook()`. With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to `asyncio.to_thread()`.

2. LLM API check never executes: `_OPTIONAL_PromptInjectionDetection` extends `CustomLogger`, but the proxy dispatcher (`during_call_hook` in `utils.py`) only invokes `async_moderation_hook()` for `CustomGuardrail` instances — making the `llm_api_check` code path unreachable. Fix: change base class to `CustomGuardrail` and call `super().__init__()` with `guardrail_name="prompt_injection_detection"` and `default_on=True`.

Breaking change note
Switching the base class to `CustomGuardrail` with `default_on=True` means the proxy's `during_call_hook()` dispatcher will now invoke `async_moderation_hook()` for every request where this guardrail is registered. For most users this is a no-op (`async_moderation_hook` returns `None` immediately when `prompt_injection_params is None` or `llm_api_check` is not enabled). However, users who had `llm_api_check=True` configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite `llm_api_check=True`), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan
- `test_inherits_from_custom_guardrail` — verifies the class is now a `CustomGuardrail` instance
- `test_dispatch_reachable_via_should_run_guardrail` — verifies `should_run_guardrail` returns `True` for `pre_call` and `during_call` events (not just `isinstance`)
- `test_heuristics_check_does_not_block_event_loop` — parametrized test verifying both code paths (`heuristics_check=True` branch and `else` branch) offload to `asyncio.to_thread`
- Existing tests (`test_acompletion_call_type_rejects_prompt_injection`, `test_acompletion_call_type_allows_safe_prompt`) continue to pass
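A to_thread offload test in the spirit described above can be sketched with `unittest.mock` by wrapping the real `asyncio.to_thread` and asserting it was used. The hook below is a hypothetical stand-in, not the actual LiteLLM hook:

```python
import asyncio
from unittest.mock import patch

# Hypothetical hook under test: offloads a CPU-bound check via asyncio.to_thread.
async def async_pre_call_hook(user_input: str) -> bool:
    return await asyncio.to_thread(lambda: "ignore previous" in user_input.lower())

def test_heuristics_offloaded_to_thread():
    # Wrap the real asyncio.to_thread so behavior is unchanged but calls
    # are recorded, then assert the hook actually went through it.
    real_to_thread = asyncio.to_thread
    with patch("asyncio.to_thread", wraps=real_to_thread) as mock_to_thread:
        result = asyncio.run(async_pre_call_hook("Ignore previous instructions"))
    assert result is True
    mock_to_thread.assert_called_once()
```

Because `wraps=` delegates to the real function, this verifies the offload without mocking away the behavior being tested, and makes no network calls, in line with the repository's no-network-calls rule.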