
fix(security): fix prompt injection detection — async heuristics + inheritance chain#24346

Open
xr843 wants to merge 1 commit into BerriAI:main from xr843:fix/prompt-injection-detection-v2

Conversation

Contributor

@xr843 xr843 commented Mar 22, 2026

Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).

Summary

Fixes #19499

Two critical bugs in the built-in prompt injection detection feature:

  • Heuristics check blocks the event loop: check_user_input_similarity() performs CPU-bound SequenceMatcher operations (triple nested loop, O(n*m)) synchronously inside async_pre_call_hook(). With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread().

  • LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances — making the llm_api_check code path unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name="prompt_injection_detection" and default_on=True.
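
The first fix can be sketched as a simplified stand-alone version; this is not the PR's actual code, and the attack-phrase list and 0.7 similarity threshold are illustrative assumptions:

```python
import asyncio
from difflib import SequenceMatcher


def check_user_input_similarity(user_input: str, known_attacks: list) -> bool:
    # CPU-bound: SequenceMatcher comparisons against every known attack
    # phrase. Called directly inside an async hook, this starves the event
    # loop for the duration of the loop.
    return any(
        SequenceMatcher(None, user_input.lower(), attack.lower()).ratio() > 0.7
        for attack in known_attacks
    )


async def async_pre_call_hook(user_input: str, known_attacks: list) -> bool:
    # Offload the blocking work to a worker thread so the FastAPI event
    # loop keeps serving health probes while the similarity scan runs.
    return await asyncio.to_thread(
        check_user_input_similarity, user_input, known_attacks
    )
```

Because `asyncio.to_thread` runs the function in the default `ThreadPoolExecutor`, the awaiting coroutine yields control back to the loop until the scan completes.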

Breaking change note

Switching the base class to CustomGuardrail with default_on=True means the proxy's during_call_hook() dispatcher will now invoke async_moderation_hook() for every request where this guardrail is registered. For most users this is a no-op (async_moderation_hook returns None immediately when prompt_injection_params is None or llm_api_check is not enabled). However, users who had llm_api_check=True configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite llm_api_check=True), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan

  • test_inherits_from_custom_guardrail — verifies the class is now a CustomGuardrail instance
  • test_dispatch_reachable_via_should_run_guardrail — verifies should_run_guardrail returns True for pre_call and during_call events (not just isinstance)
  • test_heuristics_check_does_not_block_event_loop — parametrized test verifying both code paths (heuristics_check=True branch and else branch) offload to asyncio.to_thread
  • Existing tests (test_acompletion_call_type_rejects_prompt_injection, test_acompletion_call_type_allows_safe_prompt) continue to pass
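
A minimal, self-contained sketch of the to_thread offload test; the hook and heuristic here are stand-ins for the PR's actual module, which the real test would patch instead:

```python
import asyncio
from unittest.mock import patch


def check_user_input_similarity(user_input: str) -> bool:
    # Stand-in heuristic for illustration only.
    return "ignore previous" in user_input.lower()


async def async_pre_call_hook(user_input: str) -> bool:
    return await asyncio.to_thread(check_user_input_similarity, user_input)


def test_heuristics_check_does_not_block_event_loop():
    calls = []
    real_to_thread = asyncio.to_thread

    async def spy(func, *args, **kwargs):
        calls.append(func.__name__)  # record which function was offloaded
        return await real_to_thread(func, *args, **kwargs)

    with patch("asyncio.to_thread", new=spy):
        result = asyncio.run(async_pre_call_hook("Ignore previous instructions"))

    assert result is True
    assert calls == ["check_user_input_similarity"]
```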

fix(security): fix prompt injection detection — async heuristics + inheritance chain

Fixes BerriAI#19499

Two critical bugs in prompt injection detection:

1. Heuristics check blocks event loop: check_user_input_similarity() is
   a CPU-bound O(n*m) SequenceMatcher operation called synchronously
   inside async_pre_call_hook(). This blocks the FastAPI event loop for
   60-90s on large inputs, causing K8s health probes to fail and pods to
   restart. Fix: offload to asyncio.to_thread().

2. LLM API check never executes: _OPTIONAL_PromptInjectionDetection
   extends CustomLogger, but the proxy dispatcher (during_call_hook in
   utils.py) only invokes async_moderation_hook() for CustomGuardrail
   instances. The llm_api_check code path was therefore unreachable.
   Fix: change base class to CustomGuardrail and call super().__init__()
   with guardrail_name and default_on=True.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 22, 2026

The latest updates on your projects.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 22, 2026 8:01am


Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing xr843:fix/prompt-injection-detection-v2 (f78fac6) with main (c89496f)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes two genuine bugs in _OPTIONAL_PromptInjectionDetection: the CPU-bound SequenceMatcher loop was blocking the FastAPI event loop (now offloaded via asyncio.to_thread), and the llm_api_check code path was unreachable because the class extended CustomLogger instead of CustomGuardrail (the during_call_hook dispatcher only processes CustomGuardrail instances). The core logic and test coverage are sound, but the inheritance change carries two under-documented side effects.

  • asyncio.to_thread fix: Both branches that call check_user_input_similarity (heuristics-check and else/default) are correctly offloaded. Tests verify both paths.
  • Inheritance fix: CustomGuardrail base class correctly routes async_moderation_hook for during_call. Tests verify isinstance and should_run_guardrail return values.
  • disable_global_guardrail bypass (P1): Before this PR, async_pre_call_hook was dispatched via the CustomLogger elif branch in proxy/utils.py which does not check should_run_guardrail. Now it goes through _process_guardrail_callback, which returns early when disable_global_guardrail=True is present in request metadata. Requests with this flag will silently bypass prompt injection detection after this change.
  • Backward-incompatible llm_api_check activation (P1): Users who previously configured llm_api_check=True will now see actual LLM API calls and costs during the during_call phase, with no feature flag to stage the rollout. The PR acknowledges this but does not provide opt-in/opt-out controls as recommended by the repository's backward-compatibility rule.
  • Unrestricted event_hook scope (P2): Passing event_hook=None with default_on=True means should_run_guardrail returns True for all present and future GuardrailEventHooks values. Explicitly setting event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call] would tighten the scope.

Confidence Score: 3/5

  • Fixes are technically correct but the inheritance change introduces two undocumented behavioral side effects that should be addressed before merging.
  • The asyncio.to_thread fix and CustomGuardrail inheritance fix are both correct and well-tested. However, the rerouting of async_pre_call_hook through _process_guardrail_callback silently allows requests with disable_global_guardrail=True to bypass prompt injection detection entirely (a new security gap not present before), and the default_on=True change activates llm_api_check LLM calls for users who previously had broken-but-silent behavior, with no feature flag for gradual rollout.
  • litellm/proxy/hooks/prompt_injection_detection.py — specifically the super().__init__() call and its interaction with the proxy dispatch logic in proxy/utils.py

Important Files Changed

litellm/proxy/hooks/prompt_injection_detection.py: Base class changed from CustomLogger to CustomGuardrail (fixing llm_api_check dispatch), and check_user_input_similarity now offloaded to asyncio.to_thread — but default_on=True without an event_hook restriction makes the guardrail run for all event types, and the dispatch reroute silently lets disable_global_guardrail bypass the pre-call injection check.

tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py: Three new unit tests added: an isinstance check for CustomGuardrail inheritance, the should_run_guardrail return value for during_call, and parametrized asyncio.to_thread offload verification for both code paths. All tests are pure mock-based with no real network calls, complying with the no-network-calls rule.

Sequence Diagram

sequenceDiagram
    participant Proxy as Proxy Request
    participant PHook as pre_call_hook dispatcher<br/>(proxy/utils.py)
    participant PIID as _OPTIONAL_PromptInjectionDetection
    participant Thread as asyncio.to_thread<br/>(ThreadPoolExecutor)
    participant DHook as during_call_hook dispatcher<br/>(proxy/utils.py)

    Note over PHook,PIID: BEFORE: CustomLogger path (elif branch)<br/>AFTER: CustomGuardrail path (if branch) via _process_guardrail_callback

    Proxy->>PHook: pre_call_hook(data)
    PHook->>PIID: should_run_guardrail(pre_call) [NEW check]
    PIID-->>PHook: True (default_on=True)
    PHook->>PIID: async_pre_call_hook(data)
    PIID->>Thread: check_user_input_similarity(user_input) [asyncio.to_thread — FIXED]
    Thread-->>PIID: is_prompt_attack: bool
    alt prompt attack detected
        PIID-->>Proxy: raise HTTPException(400)
    else safe input
        PIID-->>PHook: data
    end

    Proxy->>DHook: during_call_hook(data)
    Note over DHook,PIID: BEFORE: skipped (not CustomGuardrail)<br/>AFTER: now reachable — FIXED
    DHook->>PIID: should_run_guardrail(during_call)
    PIID-->>DHook: True (default_on=True)
    DHook->>PIID: async_moderation_hook(data)
    alt llm_api_check=True
        PIID->>PIID: llm_router.acompletion(model, messages)
        alt LLM flags injection
            PIID-->>Proxy: raise HTTPException(400)
        else safe
            PIID-->>DHook: False
        end
    else llm_api_check not set
        PIID-->>DHook: None (early return)
    end

Last reviewed commit: "fix(security): fix p..."

Comment on lines +35 to +38
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)

P1 disable_global_guardrail flag now silently bypasses prompt injection detection

Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.

With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.

This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).
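
If the bypass is unintended, one possible mitigation is to override should_run_guardrail so this security-critical guardrail ignores the global disable flag. The base class below is a stub for illustration only, not litellm's actual implementation:

```python
from typing import Any


class CustomGuardrail:
    # Stub of litellm's base class, for illustration only: the real
    # should_run_guardrail returns False when request metadata carries
    # disable_global_guardrail=True.
    def should_run_guardrail(self, data: dict, event_type: Any) -> bool:
        metadata = data.get("metadata") or {}
        return not metadata.get("disable_global_guardrail", False)


class PromptInjectionDetection(CustomGuardrail):
    def should_run_guardrail(self, data: dict, event_type: Any) -> bool:
        # Security-critical guardrail: ignore the global disable flag so the
        # prompt injection check always runs, matching pre-PR behavior.
        return True
```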

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment on lines +35 to +38
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)

P2 event_hook scope unrestricted — guardrail runs for all future event types

event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.

Explicitly setting event_hook would constrain the scope and make intent clear:

Suggested change

    super().__init__(
        guardrail_name="prompt_injection_detection",
        default_on=True,
    )

becomes

    super().__init__(
        guardrail_name="prompt_injection_detection",
        default_on=True,
        event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],
    )

Comment on lines +29 to +38
class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
    # Class variables or attributes
    def __init__(
        self,
        prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
    ):
        super().__init__(
            guardrail_name="prompt_injection_detection",
            default_on=True,
        )

P1 llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag

Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.

Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.
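
A gating sketch along the lines the rule suggests; the flag name and environment variable are hypothetical, not an existing litellm setting:

```python
import asyncio
import os

# Hypothetical opt-in flag — the name is illustrative, not litellm's API.
ENABLE_PROMPT_INJECTION_LLM_MODERATION = (
    os.getenv("ENABLE_PROMPT_INJECTION_LLM_MODERATION", "false").lower() == "true"
)


async def async_moderation_hook(data: dict, llm_api_check: bool):
    # Default off: preserves the pre-fix behavior (no LLM moderation call),
    # so upgrading users opt in to the extra latency and cost explicitly.
    if not (llm_api_check and ENABLE_PROMPT_INJECTION_LLM_MODERATION):
        return None
    # ... here the real implementation would call the moderation LLM ...
    return "llm_moderation_performed"  # placeholder for the real result
```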

Rule Used: What: avoid backwards-incompatible changes without... (source)



Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt Injection Detection Issues
