
fix(security): fix prompt injection detection — async heuristics + inheritance chain#24346

Open
xr843 wants to merge 1 commit into BerriAI:main from xr843:fix/prompt-injection-detection-v2

Conversation

Contributor

@xr843 xr843 commented Mar 22, 2026

Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).

Summary

Fixes #19499

Two critical bugs in the built-in prompt injection detection feature:

  • Heuristics check blocks the event loop: check_user_input_similarity() performs CPU-bound SequenceMatcher operations (triple nested loop, O(n*m)) synchronously inside async_pre_call_hook(). With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread().

  • LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances — making the llm_api_check code path unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name="prompt_injection_detection" and default_on=True.
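
The first fix can be sketched as a simplified stand-alone version; this is not the PR's actual code, and the attack-phrase list and 0.7 similarity threshold are illustrative assumptions:

```python
import asyncio
from difflib import SequenceMatcher


def check_user_input_similarity(user_input: str, known_attacks: list) -> bool:
    # CPU-bound: SequenceMatcher comparisons against every known attack
    # phrase. Called directly inside an async hook, this starves the event
    # loop for the duration of the loop.
    return any(
        SequenceMatcher(None, user_input.lower(), attack.lower()).ratio() > 0.7
        for attack in known_attacks
    )


async def async_pre_call_hook(user_input: str, known_attacks: list) -> bool:
    # Offload the blocking work to a worker thread so the FastAPI event
    # loop keeps serving health probes while the similarity scan runs.
    return await asyncio.to_thread(
        check_user_input_similarity, user_input, known_attacks
    )
```

Because `asyncio.to_thread` runs the function in the default `ThreadPoolExecutor`, the awaiting coroutine yields control back to the loop until the scan completes.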

Breaking change note

Switching the base class to CustomGuardrail with default_on=True means the proxy's during_call_hook() dispatcher will now invoke async_moderation_hook() for every request where this guardrail is registered. For most users this is a no-op (async_moderation_hook returns None immediately when prompt_injection_params is None or llm_api_check is not enabled). However, users who had llm_api_check=True configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite llm_api_check=True), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan

  • test_inherits_from_custom_guardrail — verifies the class is now a CustomGuardrail instance
  • test_dispatch_reachable_via_should_run_guardrail — verifies should_run_guardrail returns True for pre_call and during_call events (not just isinstance)
  • test_heuristics_check_does_not_block_event_loop — parametrized test verifying both code paths (heuristics_check=True branch and else branch) offload to asyncio.to_thread
  • Existing tests (test_acompletion_call_type_rejects_prompt_injection, test_acompletion_call_type_allows_safe_prompt) continue to pass
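
A minimal, self-contained sketch of the to_thread offload test; the hook and heuristic here are stand-ins for the PR's actual module, which the real test would patch instead:

```python
import asyncio
from unittest.mock import patch


def check_user_input_similarity(user_input: str) -> bool:
    # Stand-in heuristic for illustration only.
    return "ignore previous" in user_input.lower()


async def async_pre_call_hook(user_input: str) -> bool:
    return await asyncio.to_thread(check_user_input_similarity, user_input)


def test_heuristics_check_does_not_block_event_loop():
    calls = []
    real_to_thread = asyncio.to_thread

    async def spy(func, *args, **kwargs):
        calls.append(func.__name__)  # record which function was offloaded
        return await real_to_thread(func, *args, **kwargs)

    with patch("asyncio.to_thread", new=spy):
        result = asyncio.run(async_pre_call_hook("Ignore previous instructions"))

    assert result is True
    assert calls == ["check_user_input_similarity"]
```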

fix(security): fix prompt injection detection — async heuristics + inheritance chain

Fixes BerriAI#19499

Two critical bugs in prompt injection detection:

1. Heuristics check blocks event loop: check_user_input_similarity() is
   a CPU-bound O(n*m) SequenceMatcher operation called synchronously
   inside async_pre_call_hook(). This blocks the FastAPI event loop for
   60-90s on large inputs, causing K8s health probes to fail and pods to
   restart. Fix: offload to asyncio.to_thread().

2. LLM API check never executes: _OPTIONAL_PromptInjectionDetection
   extends CustomLogger, but the proxy dispatcher (during_call_hook in
   utils.py) only invokes async_moderation_hook() for CustomGuardrail
   instances. The llm_api_check code path was therefore unreachable.
   Fix: change base class to CustomGuardrail and call super().__init__()
   with guardrail_name and default_on=True.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 22, 2026

The latest updates on your projects.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 22, 2026 8:01am


Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing xr843:fix/prompt-injection-detection-v2 (f78fac6) with main (c89496f)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes two genuine bugs in _OPTIONAL_PromptInjectionDetection: the CPU-bound SequenceMatcher loop was blocking the FastAPI event loop (now offloaded via asyncio.to_thread), and the llm_api_check code path was unreachable because the class extended CustomLogger instead of CustomGuardrail (the during_call_hook dispatcher only processes CustomGuardrail instances). The core logic and test coverage are sound, but the inheritance change carries two under-documented side effects.

  • asyncio.to_thread fix: Both branches that call check_user_input_similarity (heuristics-check and else/default) are correctly offloaded. Tests verify both paths.
  • Inheritance fix: CustomGuardrail base class correctly routes async_moderation_hook for during_call. Tests verify isinstance and should_run_guardrail return values.
  • disable_global_guardrail bypass (P1): Before this PR, async_pre_call_hook was dispatched via the CustomLogger elif branch in proxy/utils.py which does not check should_run_guardrail. Now it goes through _process_guardrail_callback, which returns early when disable_global_guardrail=True is present in request metadata. Requests with this flag will silently bypass prompt injection detection after this change.
  • Backward-incompatible llm_api_check activation (P1): Users who previously configured llm_api_check=True will now see actual LLM API calls and costs during the during_call phase, with no feature flag to stage the rollout. The PR acknowledges this but does not provide opt-in/opt-out controls as recommended by the repository's backward-compatibility rule.
  • Unrestricted event_hook scope (P2): Passing event_hook=None with default_on=True means should_run_guardrail returns True for all present and future GuardrailEventHooks values. Explicitly setting event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call] would tighten the scope.

Confidence Score: 3/5

  • Fixes are technically correct but the inheritance change introduces two undocumented behavioral side effects that should be addressed before merging.
  • The asyncio.to_thread fix and CustomGuardrail inheritance fix are both correct and well-tested. However, the rerouting of async_pre_call_hook through _process_guardrail_callback silently allows requests with disable_global_guardrail=True to bypass prompt injection detection entirely (a new security gap not present before), and the default_on=True change activates llm_api_check LLM calls for users who previously had broken-but-silent behavior, with no feature flag for gradual rollout.
  • litellm/proxy/hooks/prompt_injection_detection.py — specifically the super().__init__() call and its interaction with the proxy dispatch logic in proxy/utils.py

Important Files Changed

litellm/proxy/hooks/prompt_injection_detection.py: Base class changed from CustomLogger to CustomGuardrail (fixing llm_api_check dispatch), and check_user_input_similarity now offloaded to asyncio.to_thread — but default_on=True without an event_hook restriction makes the guardrail run for all event types, and the dispatch reroute silently lets disable_global_guardrail bypass the pre-call injection check.

tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py: Three new unit tests added: an isinstance check for CustomGuardrail inheritance, the should_run_guardrail return value for during_call, and parametrized asyncio.to_thread offload verification for both code paths. All tests are pure mock-based with no real network calls, complying with the no-network-calls rule.

Sequence Diagram

sequenceDiagram
    participant Proxy as Proxy Request
    participant PHook as pre_call_hook dispatcher<br/>(proxy/utils.py)
    participant PIID as _OPTIONAL_PromptInjectionDetection
    participant Thread as asyncio.to_thread<br/>(ThreadPoolExecutor)
    participant DHook as during_call_hook dispatcher<br/>(proxy/utils.py)

    Note over PHook,PIID: BEFORE: CustomLogger path (elif branch)<br/>AFTER: CustomGuardrail path (if branch) via _process_guardrail_callback

    Proxy->>PHook: pre_call_hook(data)
    PHook->>PIID: should_run_guardrail(pre_call) [NEW check]
    PIID-->>PHook: True (default_on=True)
    PHook->>PIID: async_pre_call_hook(data)
    PIID->>Thread: check_user_input_similarity(user_input) [asyncio.to_thread — FIXED]
    Thread-->>PIID: is_prompt_attack: bool
    alt prompt attack detected
        PIID-->>Proxy: raise HTTPException(400)
    else safe input
        PIID-->>PHook: data
    end

    Proxy->>DHook: during_call_hook(data)
    Note over DHook,PIID: BEFORE: skipped (not CustomGuardrail)<br/>AFTER: now reachable — FIXED
    DHook->>PIID: should_run_guardrail(during_call)
    PIID-->>DHook: True (default_on=True)
    DHook->>PIID: async_moderation_hook(data)
    alt llm_api_check=True
        PIID->>PIID: llm_router.acompletion(model, messages)
        alt LLM flags injection
            PIID-->>Proxy: raise HTTPException(400)
        else safe
            PIID-->>DHook: False
        end
    else llm_api_check not set
        PIID-->>DHook: None (early return)
    end

Last reviewed commit: "fix(security): fix p..."

Comment on lines +35 to +38
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)

P1 disable_global_guardrail flag now silently bypasses prompt injection detection

Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.

With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.

This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).
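
If the bypass is unintended, one possible mitigation is to override should_run_guardrail so this security-critical guardrail ignores the global disable flag. The base class below is a stub for illustration only, not litellm's actual implementation:

```python
from typing import Any


class CustomGuardrail:
    # Stub of litellm's base class, for illustration only: the real
    # should_run_guardrail returns False when request metadata carries
    # disable_global_guardrail=True.
    def should_run_guardrail(self, data: dict, event_type: Any) -> bool:
        metadata = data.get("metadata") or {}
        return not metadata.get("disable_global_guardrail", False)


class PromptInjectionDetection(CustomGuardrail):
    def should_run_guardrail(self, data: dict, event_type: Any) -> bool:
        # Security-critical guardrail: ignore the global disable flag so the
        # prompt injection check always runs, matching pre-PR behavior.
        return True
```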

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment on lines +35 to +38
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)

P2 event_hook scope unrestricted — guardrail runs for all future event types

event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.

Explicitly setting event_hook would constrain the scope and make intent clear:

Suggested change

    super().__init__(
        guardrail_name="prompt_injection_detection",
        default_on=True,
    )

becomes

    super().__init__(
        guardrail_name="prompt_injection_detection",
        default_on=True,
        event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],
    )

Comment on lines +29 to +38
class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
    # Class variables or attributes
    def __init__(
        self,
        prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
    ):
        super().__init__(
            guardrail_name="prompt_injection_detection",
            default_on=True,
        )

P1 llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag

Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.

Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.
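
A gating sketch along the lines the rule suggests; the flag name and environment variable are hypothetical, not an existing litellm setting:

```python
import asyncio
import os

# Hypothetical opt-in flag — the name is illustrative, not litellm's API.
ENABLE_PROMPT_INJECTION_LLM_MODERATION = (
    os.getenv("ENABLE_PROMPT_INJECTION_LLM_MODERATION", "false").lower() == "true"
)


async def async_moderation_hook(data: dict, llm_api_check: bool):
    # Default off: preserves the pre-fix behavior (no LLM moderation call),
    # so upgrading users opt in to the extra latency and cost explicitly.
    if not (llm_api_check and ENABLE_PROMPT_INJECTION_LLM_MODERATION):
        return None
    # ... here the real implementation would call the moderation LLM ...
    return "llm_moderation_performed"  # placeholder for the real result
```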

Rule Used: What: avoid backwards-incompatible changes without... (source)



Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt Injection Detection Issues
