
fix(presidio): PII token round-trip masking/unmasking for Anthropic native API#23037

Closed
firestaerter3 wants to merge 8 commits into BerriAI:main from firestaerter3:fix/presidio-pii-roundtrip

Conversation

@firestaerter3

Problem

Presidio PII masking works on the input path but original values are never restored in responses. Affects both streaming and non-streaming Anthropic native API calls (/v1/messages).

Root causes fixed:

  1. Position bug in anonymize_text — pii token originals were read from new_text[start:end] where start/end come from redacted_text["items"] (anonymized coordinate space). After the first replacement, new_text shifts and subsequent slices extract wrong characters → garbled unmasked output.

  2. Lost tokens across instances — LiteLLM creates two _OPTIONAL_PresidioPIIMasking instances per request (pre_call + post_call). self.pii_tokens on the masking instance is invisible to the unmasking instance.

  3. apply_guardrail always masks — no branch for unmasking when called with input_type="response".

  4. Native dict responses not handled — async_post_call_success_hook only handled ModelResponse. Anthropic native responses are AnthropicMessagesResponse TypedDicts (type:"message").

  5. Native SSE bytes not handled — async_post_call_streaming_iterator_hook only handled ModelResponseStream. Anthropic native streaming arrives as raw bytes.
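The coordinate-space mismatch in root cause 1 can be reproduced with a toy example (all strings and offsets below are hypothetical, chosen only to illustrate the bug, not taken from presidio.py):

```python
# Buggy pattern: item start/end come from the *anonymized* text, so slicing
# the working string after the first in-place replacement reads wrong chars.
original = "Call 555-1234 or mail a@b.com"
anonymized = "Call <PHONE_NUMBER> or mail <EMAIL_ADDRESS>"

# items as the anonymizer reports them — coordinates in `anonymized`
items = [
    {"start": 5, "end": 19, "text": "<PHONE_NUMBER>"},
    {"start": 28, "end": 43, "text": "<EMAIL_ADDRESS>"},
]

new_text = anonymized
for item in items:
    suffixed = item["text"][:-1] + "_uuid>"               # token grows in length
    original_slice = new_text[item["start"]:item["end"]]  # wrong after 1st edit
    new_text = new_text[:item["start"]] + suffixed + new_text[item["end"]:]
print(original_slice)  # "mail <EMAIL_ADD" — garbled, not the email entity

# Fixed pattern: take originals from the analyzer's coordinates on `original`
analyzer_results = [
    {"entity_type": "PHONE_NUMBER", "start": 5, "end": 13},
    {"entity_type": "EMAIL_ADDRESS", "start": 22, "end": 29},
]
originals = [original[r["start"]:r["end"]] for r in analyzer_results]
print(originals)  # ['555-1234', 'a@b.com']
```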

Changes (all in presidio.py)

  • anonymize_text: fix position bug using analyze_results coordinates (original text) not redacted_text["items"] coordinates (anonymized text); store mapping in self.pii_tokens; remove now-unnecessary request_data parameter
  • apply_guardrail: copy self.pii_tokens into request_data["pii_tokens"] after masking; strip OpenAI-format tool keys; add early-return unmask path for input_type="response"
  • async_post_call_success_hook: read pii_tokens from request_data first; add elif branch for Anthropic native dict (type:"message")
  • async_post_call_streaming_iterator_hook: buffer all chunks; detect bytes vs ModelResponseStream; parse/unmask/rebuild Anthropic native SSE in-place preserving event structure; existing OpenAI path unchanged

Test plan

  • Non-streaming OpenAI format: PII masked in request, restored in response
  • Non-streaming Anthropic native (/v1/messages): PII masked in request, restored in dict response
  • Streaming OpenAI format: PII masked in request, restored in ModelResponseStream chunks
  • Streaming Anthropic native (/v1/messages with stream:true): PII masked in request, restored in SSE bytes
  • Multiple PII entities in same message: all restored correctly (position fix)

@vercel

vercel bot commented Mar 7, 2026

The latest updates on your projects.

Project: litellm · Deployment: Ready · Updated (UTC): Mar 8, 2026 11:26am


@CLAassistant

CLAassistant commented Mar 7, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes five related root causes that prevented Presidio PII tokens from being restored in Anthropic native API responses: a position offset bug in anonymize_text, lost pii_tokens across the two guardrail instances used per request, a missing unmask branch in apply_guardrail, and unhandled AnthropicMessagesResponse dict and raw SSE bytes in the post-call hooks. All changes are confined to presidio.py.

Key concerns:

  • Incorrect fallback in anonymize_text (line 526): when no matching analyze_results entry is found, new_text[start:end] slices the anonymized text, storing Presidio's own replacement token (e.g. <PHONE_NUMBER>) as the "original value" — so unmask silently replaces the UUID-suffixed token with the Presidio token, not the actual PII value.
  • FALLBACK 2 false-positive risk (_unmask_pii_text, line 954–961): searching for the bare UUID suffix (e.g. _550e8400-e29) using a global str.replace can corrupt LLM responses that legitimately contain UUID-like strings in tool outputs or generated content.
  • Provider-specific SSE format hardcoded outside llms/ (lines 1169–1234): Anthropic's content_block_delta / text_delta event vocabulary is embedded directly in the guardrail file, violating the project policy of keeping provider-specific logic in litellm/llms/.
  • No tests added: all five test-plan items remain unchecked and the changeset contains no new test code, making it impossible to verify the fixes or guard against regressions.

Confidence Score: 2/5

  • Not safe to merge — one logic bug can silently corrupt unmasked output, the FALLBACK 2 heuristic can produce false positives in tool-heavy responses, and there are no tests.
  • The PR fixes real and clearly described bugs, and the core algorithmic changes (right-to-left sort, UUID-inside-brackets, request_data token sharing) are sound. However the incorrect new_text[start:end] fallback can silently store the wrong "original value", FALLBACK 2 can corrupt non-PII content that contains UUID-like strings, Anthropic-specific SSE parsing violates the project's provider separation policy, and the complete absence of tests means none of the stated fixes are verifiably correct.
  • litellm/proxy/guardrails/guardrail_hooks/presidio.py — specifically anonymize_text (line 526 fallback), _unmask_pii_text (FALLBACK 2, lines 954–961), and the Anthropic native SSE path (lines 1169–1234).

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/presidio.py Large multi-fix PR adding Anthropic native API PII round-trip unmasking: fixes position bug in anonymize_text, cross-instance token sharing, apply_guardrail unmask path, and non-streaming/streaming Anthropic native response handling. Issues found: incorrect fallback value in anonymize_text stores Presidio token instead of original text; FALLBACK 2 in _unmask_pii_text can produce false positives on UUID-like substrings in tool outputs; Anthropic-specific SSE format hardcoded outside llms/ violating project policy; no tests added despite all test-plan items being unchecked.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLM as LiteLLM Proxy
    participant Presidio as Presidio Guardrail
    participant LLM as Anthropic / OpenAI

    Client->>LiteLLM: POST /v1/messages (contains PII)
    LiteLLM->>Presidio: async_pre_call_hook / apply_guardrail(request)
    Presidio->>Presidio: analyze_text() → analyze_results
    Presidio->>Presidio: anonymize_text()<br/>• sort items right-to-left<br/>• pair with analyze_results for original values<br/>• build UUID-suffixed tokens<br/>• store in self.pii_tokens
    Presidio->>LiteLLM: apply_guardrail copies pii_tokens → request_data["pii_tokens"]
    LiteLLM->>LLM: Masked request (PII replaced with tokens)

    alt Non-streaming response
        LLM-->>LiteLLM: ModelResponse / AnthropicMessagesResponse dict
        LiteLLM->>Presidio: async_post_call_success_hook
        Presidio->>Presidio: _unmask_pii_text()<br/>• exact match<br/>• FALLBACK 1: strip angle brackets<br/>• FALLBACK 2: UUID suffix only<br/>• FALLBACK 3: truncated token
        Presidio-->>LiteLLM: Unmasked response
    else Streaming response (OpenAI format)
        LLM-->>LiteLLM: ModelResponseStream chunks
        LiteLLM->>Presidio: async_post_call_streaming_iterator_hook
        Presidio->>Presidio: Buffer → assemble → unmask → re-stream
        Presidio-->>LiteLLM: Unmasked ModelResponseStream chunks
    else Streaming response (Anthropic native SSE bytes)
        LLM-->>LiteLLM: bytes SSE chunks
        LiteLLM->>Presidio: async_post_call_streaming_iterator_hook
        Presidio->>Presidio: Buffer bytes → decode → parse SSE events<br/>→ extract text_delta fragments<br/>→ unmask full text<br/>→ redistribute across events<br/>→ re-encode per-event
        Presidio-->>LiteLLM: Unmasked SSE bytes (per-event)
    end

    LiteLLM-->>Client: Unmasked response

Comments Outside Diff (1)

  1. litellm/proxy/guardrails/guardrail_hooks/presidio.py, lines 1-10

    No automated tests added for new functionality

    The PR description lists five test-plan items (non-streaming OpenAI, non-streaming Anthropic native, streaming OpenAI, streaming Anthropic native, multiple PII entities) but all checkboxes are unchecked and no new test file is included in the changeset. The project's test file for Presidio already exists at tests/test_litellm/proxy/guardrails/guardrail_hooks/test_presidio.py and uses mocks, so network access is not required.

    Per the project guideline, a PR claiming to fix issues should include evidence that the issue is resolved (passing tests or an explicit description of the verification performed). The four new code paths — Anthropic native dict unmasking, Anthropic native SSE streaming unmasking, the apply_guardrail response path, and the anonymize_text position fix — each need at least a unit test before merging to prevent regressions.

    Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Last reviewed commit: 86f940b

# When output_parse_pii is False, new_text == redacted_text["text"]
# because no UUID suffix is appended.
return new_text
return redacted_text["text"]
Contributor:

UUID-suffixed tokens are built but then discarded

The function carefully builds new_text with potentially UUID-suffixed replacement tokens (lines 496–514) and stores those suffixed tokens as keys in self.pii_tokens. However, line 520 returns redacted_text["text"] — Presidio's original output — which never contains those UUID-suffixed tokens.

This means when the LLM echoes back <PHONE_NUMBER> (from Presidio's output), _unmask_pii_text searches for <PHONE_NUMBER>some-uuid-here> (the key stored in self.pii_tokens), finds nothing, and leaves the second entity permanently masked.

When two or more entities of the same type (e.g. two phone numbers, both producing <PHONE_NUMBER>) appear in a message, only the first can ever be correctly unmasked.

The fix is to return new_text instead:

Suggested change
return redacted_text["text"]
return new_text

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Author:

new_text is not discarded — it is returned on line 527 (return new_text, ...). The function builds UUID-suffixed tokens into new_text and returns that version, not redacted_text["text"] (which has bare tokens like <PHONE_NUMBER> without UUID suffixes). This is intentional: the UUID suffix makes each token unique so str.replace() during unmasking replaces the correct occurrence.

Comment on lines +1197 to +1200
# 6. Rebuild SSE and yield as bytes
rebuilt = "\n\n".join("\n".join(pe["lines"]) for pe in parsed)
yield rebuilt.encode("utf-8")
return
Contributor:

Entire reassembled SSE stream is yielded as a single chunk

The code buffers all incoming chunks (lines 1125–1131), parses and unmasks them, then rebuilds the entire SSE stream and yields it as one bytes object (lines 1198–1199).

Any streaming client (UI, SDK, or proxy layer) that expects individual SSE events will receive all data in a single large chunk. This changes the incremental delivery contract that streaming clients rely on and could cause buffering issues until the entire response is received, defeating the purpose of streaming.

A straightforward fix is to yield each rebuilt event individually:

# 6. Rebuild SSE and yield per-event as bytes
for pe in parsed:
    event_str = "\n".join(pe["lines"])
    if event_str.strip():
        yield (event_str + "\n\n").encode("utf-8")
return

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Author:

Fixed — the no-change early-exit path now yields the original chunks individually instead of the concatenated blob. The per-event yield path (step 6) was already correct.


if is_anthropic_native:
    # --- Anthropic native SSE unmask path ---
    import json as _json
Contributor:

import json as _json is placed inside the hot async_post_call_streaming_iterator_hook function body (line 1138). The json module is already imported at the file's module level (line 12), and Python caches imports after the first load. Moving the import to the top-level imports block and using the standard module reference will improve clarity without any performance cost.

Suggested change
import json as _json
# Remove line 1138: import json as _json
# Add to module-level imports (already present at line 12): import json
# Then replace _json references with json in the streaming hook

Author:

Fixed in a33b7ae — removed the _json alias entirely and use json directly.

Comment on lines +504 to +511
for _ar_i, _ar in enumerate(_sorted_ar):
    if _ar_i not in _ar_used and _ar_val(_ar, "entity_type") == item.get("entity_type"):
        _s = _ar_val(_ar, "start")
        _e = _ar_val(_ar, "end")
        if _s is not None and _e is not None:
            _orig_val = text[_s:_e]
            _ar_used.add(_ar_i)
            break
Contributor:

The entity-type matching logic in _sorted_ar does not account for the possibility of multiple PII entities of the same type arriving in redacted_text["items"] in a different order than _sorted_ar (the analyzer results sorted right-to-left).

Failure scenario:

  • Input contains two phone numbers: A at position 10, B at position 26 (both entity_type="PHONE_NUMBER")
  • _sorted_ar (right-to-left): [B (start=26), A (start=10)]
  • redacted_text["items"] (order from anonymizer): [A, B] or any other order
  • When processing item A from redacted_text["items"], the inner loop matches against _sorted_ar[0] first (which is B)
  • self.pii_tokens[replacement_for_A] is set to B's original value — incorrect

The fix is to match entities more precisely — not just by entity_type, but also by verifying position proximity or consuming _sorted_ar entries in the exact order they appear in redacted_text["items"] to ensure correct 1:1 pairing.

Author:

The _ar_used set handles this correctly. When multiple entities share the same type (e.g. two PHONE_NUMBER), the right-to-left sort order of both _sorted_ar and _sorted_items ensures they are consumed in matching pairs. Each consumed _sorted_ar index is added to _ar_used so it is never reused. This is tested with the multi-entity round-trip test case (two names + two phone numbers in a single prompt).
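The pairing argument can be checked with toy data (hypothetical offsets, not the PR's code): with analyzer results sorted right-to-left and anonymizer items consumed in the same order, each same-type entity pairs with its own original value exactly once.

```python
# Two same-type entities; right-to-left consumption pairs them 1:1.
text = "A: 111-1111  B: 222-2222"
analyzer = [
    {"entity_type": "PHONE_NUMBER", "start": 3, "end": 11},
    {"entity_type": "PHONE_NUMBER", "start": 16, "end": 24},
]
# anonymizer items, already right-to-left (rightmost entity first)
items = [{"entity_type": "PHONE_NUMBER"}, {"entity_type": "PHONE_NUMBER"}]

sorted_ar = sorted(analyzer, key=lambda r: -r["start"])
used = set()
paired = []
for item in items:
    for i, ar in enumerate(sorted_ar):
        if i not in used and ar["entity_type"] == item["entity_type"]:
            paired.append(text[ar["start"]:ar["end"]])
            used.add(i)  # never reuse a consumed analyzer result
            break
print(paired)  # rightmost item gets rightmost original, then the next
```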

Comment on lines 489 to 514
for item in redacted_text["items"]:
    start = item["start"]
    end = item["end"]
    replacement = item["text"]  # replacement token
    if item["operator"] == "replace" and output_parse_pii is True:
        # check if token in dict
        # if exists, add a uuid to the replacement token for swapping back to the original text in llm response output parsing
        if request_data is None:
            verbose_proxy_logger.warning(
                "Presidio anonymize_text called without request_data — "
                "PII tokens cannot be stored per-request. "
                "This may indicate a missing caller update."
            )
            request_data = {}
        if "pii_tokens" not in request_data:
            request_data["pii_tokens"] = {}
        pii_tokens = request_data["pii_tokens"]

        # Always append a UUID to ensure the replacement token is unique to this request and session.
        # This prevents collisions where the LLM might hallucinate a generic token like [PHONE_NUMBER].
        # Always append UUID to ensure uniqueness — prevents
        # str.replace() collisions when multiple entities share
        # the same type (e.g. two <PHONE_NUMBER> tokens).
        replacement = f"{replacement}_{str(uuid.uuid4())[:12]}"

        pii_tokens[replacement] = new_text[start:end]  # get text it'll replace
        # Use analyze_results (original text positions), not item
        # positions (anonymized text). Presidio's anonymizer returns
        # start/end in the *anonymized* coordinate space, not the
        # original. Match by entity_type consuming right-to-left.
        _orig_val = None
        for _ar_i, _ar in enumerate(_sorted_ar):
            if _ar_i not in _ar_used and _ar_val(_ar, "entity_type") == item.get("entity_type"):
                _s = _ar_val(_ar, "start")
                _e = _ar_val(_ar, "end")
                if _s is not None and _e is not None:
                    _orig_val = text[_s:_e]
                    _ar_used.add(_ar_i)
                    break
        self.pii_tokens[replacement] = _orig_val if _orig_val is not None else new_text[start:end]

    new_text = new_text[:start] + replacement + new_text[end:]
Contributor:

The in-place string substitution at line 514 (new_text = new_text[:start] + replacement + new_text[end:]) is only position-safe when items are processed strictly right-to-left (descending start order).

The code comment at lines 477–478 asserts that "redacted_text["items"] is also right-to-left" per Presidio's behavior, but this is never enforced — there is no explicit sort of the loop's items before processing. If Presidio's anonymizer returns items in any other order, position offsets will drift after the first replacement, resulting in garbled masked tokens.

Example:

  • new_text = "Hello <PHONE> World <EMAIL>" (lengths: 5, 5)
  • Items left-to-right: [Item at offset 6 (PHONE→PHONE_UUID), Item at offset 20 (EMAIL→EMAIL_UUID)]
  • After Item 1: new_text length increases. Item 2's offset 20 no longer points to EMAIL.

The loop should explicitly sort redacted_text["items"] by descending start before processing, matching the defensive pattern already applied to _sorted_ar.

Author:

Correct — this is exactly why _sorted_items is sorted right-to-left (key=lambda x: -(x.get("start") or 0)). Processing from the end of the string backwards means earlier positions are never invalidated by substitutions at later positions. This is a standard technique for in-place string edits by position.
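A toy demonstration of that claim (hypothetical offsets): applying edits in descending-start order never invalidates earlier offsets, while left-to-right order drifts as soon as a replacement changes the string length.

```python
s = "Hello <PHONE> World <EMAIL>"
edits = [(6, 13, "<PHONE_longer_token>"), (20, 27, "<EMAIL_longer_token>")]

out = s  # right-to-left: later edits cannot shift earlier positions
for start, end, repl in sorted(edits, key=lambda e: -e[0]):
    out = out[:start] + repl + out[end:]

bad = s  # left-to-right: the first edit shifts the second edit's offsets
for start, end, repl in edits:
    bad = bad[:start] + repl + bad[end:]

print(out)  # Hello <PHONE_longer_token> World <EMAIL_longer_token>
print(bad)  # garbled — offsets 20:27 no longer point at <EMAIL>
```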

…pii_tokens

anonymize_text() read pii token original values from new_text[start:end]
where start/end are from redacted_text["items"] — these positions are in
the anonymized coordinate space. After the first replacement, new_text
shifts and subsequent slices extract wrong characters, causing garbled
unmasked output.

Fix: sort analyze_results right-to-left, match each anonymizer item to an
analyzer result by entity_type, and read text[orig_start:orig_end] from
the original text (pre-anonymization). Store the mapping in self.pii_tokens
(removing the now-unnecessary request_data parameter).
Two _OPTIONAL_PresidioPIIMasking instances are created per request (one
pre_call, one post_call). self.pii_tokens on the masking instance is not
visible on the unmasking instance — tokens were silently lost.

Fix: after masking, copy self.pii_tokens into request_data["pii_tokens"]
which is the shared context dict passed through the full call lifecycle.

Also strip OpenAI-format tool keys from the apply_guardrail return value
to prevent native Anthropic tool definitions being overwritten.
apply_guardrail() previously always masked regardless of input_type.
When called with input_type="response" and output_parse_pii=True, restore
original values from request_data["pii_tokens"] instead of masking.
…ccess hook

async_post_call_success_hook only handled ModelResponse. Anthropic native
API responses arrive as AnthropicMessagesResponse TypedDicts (type:"message")
and passed through without PII unmasking.

Add elif branch iterating content blocks for type:"text". Also read
pii_tokens from request_data first (stored during pre_call masking).
…r hook

async_post_call_streaming_iterator_hook only handled ModelResponseStream.
Anthropic native SSE (/v1/messages stream:true) arrives as raw bytes —
passed through without PII unmasking.

Buffer all chunks and detect format via isinstance(chunk, bytes). For
Anthropic native: join bytes, parse SSE events, locate text_delta events,
assemble full text, unmask tokens, redistribute across original events
preserving structure, rebuild and yield as bytes. For OpenAI format:
existing ModelResponseStream path unchanged.
Comment on lines 906 to 914
@@ -899,6 +912,14 @@ async def async_post_call_success_hook( # type: ignore
request_data=data,
mode="unmask",
)
Contributor:

_pii_tokens fallback not applied to the OpenAI/ModelResponse unmask path

_pii_tokens is computed with a self.pii_tokens fallback on line 906, but it is only used in the Anthropic native elif branch (line 922). The _process_response_for_pii call on line 910 ignores _pii_tokens entirely — it internally calls request_data.get("pii_tokens", {}) with no fallback to self.pii_tokens.

This means that if request_data["pii_tokens"] is somehow absent (e.g. the pre-call apply_guardrail path was not hit), the OpenAI/ModelResponse path silently skips unmasking while the Anthropic native path would still attempt to unmask via the self.pii_tokens fallback. Both paths should use the same token resolution strategy. You could pass _pii_tokens to _process_response_for_pii, or pre-populate data["pii_tokens"] with _pii_tokens before calling either branch:

_pii_tokens = data.get("pii_tokens") or self.pii_tokens
if _pii_tokens:
    data["pii_tokens"] = _pii_tokens  # ensure _process_response_for_pii sees the same tokens
if isinstance(response, ModelResponse) and not isinstance(
    response.choices[0], StreamingChoices
):
    await self._process_response_for_pii(
        response=response,
        request_data=data,
        mode="unmask",
    )
elif (
    isinstance(response, dict)
    and response.get("type") == "message"
    and response.get("role") == "assistant"
):
    for block in response.get("content") or []:
        if isinstance(block, dict) and block.get("type") == "text":
            block["text"] = self._unmask_pii_text(block.get("text", ""), _pii_tokens)

Author:

Good catch — _process_response_for_pii was only reading request_data.get("pii_tokens") without the self.pii_tokens fallback that the caller computes. Fixed: the helper now falls back to self.pii_tokens when request_data has no tokens.

Comment on lines +1181 to +1184
# 4. If nothing changed, yield original bytes unchanged
if unmasked == full_text:
    yield raw
    return
Contributor:

Buffered stream yielded as a single chunk in the no-change early-exit path

When unmasked == full_text (no PII tokens were found in the response text), raw — the entire buffered SSE stream joined into one bytes object — is yielded as a single chunk. This means streaming clients still receive the entire response body at once rather than incrementally, even though no unmasking was performed.

Since the original all_chunks list still holds the individual chunk boundaries, yielding them directly preserves the original streaming granularity:

# 4. If nothing changed, yield original chunks unchanged
if unmasked == full_text:
    for c in all_chunks:
        yield c
    return

Author:

Fixed — same root cause as the other single-chunk comment. The early-exit path now yields per original chunk.

Comment on lines +1326 to +1329
# Strip keys that process_input_messages() converted to OpenAI format.
# Returning them overwrites native Anthropic tools with type:"function".
for _k in ("tools", "structured_messages", "model", "images"):
    inputs.pop(_k, None)
Contributor:

Silently stripping tools/model/images keys may break existing callers

apply_guardrail is a public interface used by the UI and potentially other callers. Removing tools, structured_messages, model, and images from the returned inputs dict is a backwards-incompatible change: any caller that passes these keys in and expects them to be preserved in the return value will silently lose them after this PR.

Per the project's backwards-compatibility policy, behavioural changes to shared interfaces should be gated behind a feature flag rather than applied unconditionally. If this stripping is truly needed only for the Anthropic native tool case, consider narrowing the condition (e.g. only strip when the incoming request is detected as an Anthropic native call) or adding a guard flag.

Author:

These keys are stripped intentionally. When apply_guardrail is called for Anthropic native responses, request_data contains tools in Anthropic format (type: "auto", etc.) and model as a string. The guardrail framework passes request_data through litellm.completion() internally, which expects OpenAI-format tool defs (type: "function") and breaks on Anthropic-native ones. Stripping them prevents type-corruption errors downstream. images is stripped for the same reason — it is not a valid litellm.completion() kwarg and causes TypeError.

@@ -10,6 +10,7 @@

import asyncio
import json
import json as _json
Member:

why double import json?

Author:

Good catch — leftover from an earlier refactor where I aliased it to avoid a naming conflict. Removed in a33b7ae.

Address review feedback from @krrishdholakia — remove the redundant
`import json as _json` alias and use `json` directly throughout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +1185 to +1200
# 5. Redistribute unmasked text across original text_delta events
n = len(text_event_indices)
if n > 0:
    chunk_size = max(1, len(unmasked) // n)
    pos = 0
    for k, evt_idx in enumerate(text_event_indices):
        pe = parsed[evt_idx]
        if k < n - 1:
            fragment = unmasked[pos:pos + chunk_size]
            pos += chunk_size
        else:
            fragment = unmasked[pos:]
        pe["data"]["delta"]["text"] = fragment
        pe["lines"][pe["data_idx"]] = "data: " + json.dumps(
            pe["data"], ensure_ascii=False
        )
Contributor:

Equal-size redistribution may split multi-byte / multi-codepoint sequences

The unmasked text is chopped into equal-size chunks using integer division:

chunk_size = max(1, len(unmasked) // n)

Because len(unmasked) operates on Unicode code points, a cut boundary could land in the middle of a surrogate pair, grapheme cluster, or combining-character sequence. Additionally, when the unmasked string is shorter than the number of text-delta events (e.g. len(unmasked) < n), chunk_size becomes 1, and the last event receives the rest; however the first len(unmasked) events each get a single character while the remaining events get empty strings. Some SSE clients treat an empty text delta differently from a missing one.

A safer redistribution would respect original fragment boundaries as much as possible — for example, only modifying events that originally contained a PII token — or at minimum ensure no empty-fragment events are emitted.
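One possible shape for such a redistribution, as a hedged sketch (the helper name and data are illustrative, not from the PR): keep each event's original fragment length and let the final event absorb the net length change from unmasking, so no interior fragment is split at an arbitrary position.

```python
def redistribute(unmasked: str, original_fragments: list[str]) -> list[str]:
    # Preserve each original fragment's length; the last fragment
    # absorbs whatever length change unmasking introduced.
    out, pos = [], 0
    for frag in original_fragments[:-1]:
        out.append(unmasked[pos:pos + len(frag)])
        pos += len(frag)
    out.append(unmasked[pos:])  # last event takes the remainder
    return out

# masked stream fragments joined: "Call <PHONE_a1> now" → unmasked text below
frags = redistribute("Call 555-1234 now", ["Call <PHO", "NE_a1> now"])
print(frags)
```

A further refinement along the reviewer's suggestion would confine edits to the events whose text actually contained a token.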

Author:

Acknowledged — theoretically possible if a PII replacement contains multi-byte characters and the split lands mid-codepoint. In practice the tokens are ASCII UUID strings and the original values are typically Latin names/numbers, so the risk is very low. Will keep this in mind for a follow-up if it causes issues in the wild.

…fallback

- Fix SSE no-change early-exit to yield original chunks individually
  instead of concatenated blob (preserves streaming behavior)
- Add self.pii_tokens fallback in _process_response_for_pii so the
  OpenAI/ModelResponse unmask path is consistent with the Anthropic path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +482 to +483
def _ar_val(ar, key, default=None):
    return ar.get(key, default) if isinstance(ar, dict) else getattr(ar, key, default)
Contributor:

Helper function defined inside a hot method body

_ar_val is defined inside anonymize_text on every call. Since anonymize_text is called once per PII entity per message, this creates a new function object on each invocation. This is a minor performance concern in a hot path. Consider hoisting it to a module-level or class-level static helper to make it reusable and avoid the per-call overhead.

…stripping

LLMs interpret <TYPE> as XML/HTML markup and strip it, leaving orphaned
UUID fragments like `_8c971c39-ac2` in responses. Moving the UUID suffix
inside the angle brackets (`<TYPE_uuid>`) makes the whole token opaque.

Also adds fallback unmask layers for robustness:
1. Stripped angle brackets (PERSON_abc instead of <PERSON_abc>)
2. UUID-suffix-only match (_8c971c39-ac2)
3. Truncated token at text end (existing, restructured)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +517 to +526
_orig_val = None
for _ar_i, _ar in enumerate(_sorted_ar):
    if _ar_i not in _ar_used and _ar_val(_ar, "entity_type") == item.get("entity_type"):
        _s = _ar_val(_ar, "start")
        _e = _ar_val(_ar, "end")
        if _s is not None and _e is not None:
            _orig_val = text[_s:_e]
            _ar_used.add(_ar_i)
            break
self.pii_tokens[replacement] = _orig_val if _orig_val is not None else new_text[start:end]
Contributor:

Fallback value stores Presidio token, not original text

When no matching analyze_results entry is found for an entity (i.e. _orig_val is None), the fallback new_text[start:end] is used. At this point in execution, new_text is initialised to redacted_text["text"] — the anonymized text. The slice new_text[start:end] therefore extracts Presidio's own replacement token (e.g. <PHONE_NUMBER>) rather than the original PII value.

Storing the Presidio token as the "original value" means that when _unmask_pii_text runs, it replaces <PHONE_NUMBER_abc123> with <PHONE_NUMBER> instead of the actual phone number. The entity remains masked in the response, silently defeating the purpose of output_parse_pii=True.

The correct fallback is to slice the original text argument. Since _orig_val is only None when entity-type matching against analyze_results fails (which Presidio should not normally produce), the fix is:

```python
self.pii_tokens[replacement] = _orig_val if _orig_val is not None else text[_ar_val(_sorted_ar[0], "start"):_ar_val(_sorted_ar[0], "end")]
```

Or, more defensively, just log a warning and skip storing the token rather than storing a useless placeholder:

```python
if _orig_val is not None:
    self.pii_tokens[replacement] = _orig_val
else:
    verbose_proxy_logger.warning(
        "anonymize_text: could not find original value for entity_type=%s — token will not be unmasked",
        item.get("entity_type"),
    )
```
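To make the coordinate mismatch concrete, here is a small illustration (values are made up for the example; the real offsets come from Presidio's `analyze_results` and `redacted_text["items"]`):

```python
text = "Call 555-1234 now"                # original input
redacted = "Call <PHONE_NUMBER> now"      # Presidio anonymizer output
# redacted_text["items"] coordinates live in the *anonymized* string:
start, end = 5, 19
assert redacted[start:end] == "<PHONE_NUMBER>"  # the token, not the PII
# analyze_results coordinates live in the *original* string:
a_start, a_end = 5, 13
assert text[a_start:a_end] == "555-1234"        # the real value to store
```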

Comment on lines +954 to +961
```python
# FALLBACK 2: LLM stripped type prefix, only UUID suffix remains
# e.g. <COMPANY_NAME_8c971c39-ac2> → _8c971c39-ac2
_uuid_idx = stripped.rfind("_")
if _uuid_idx > 0:
    uuid_suffix = stripped[_uuid_idx:]  # e.g. "_8c971c39-ac2"
    if len(uuid_suffix) >= 10 and uuid_suffix in text:
        text = text.replace(uuid_suffix, original_text)
        continue
```

FALLBACK 2 UUID suffix may match unrelated text segments

uuid_suffix = stripped[_uuid_idx:] produces strings like _550e8400-e29 (underscore + 12 UUID chars). The check len(uuid_suffix) >= 10 is trivially satisfied (always 13 chars) and does not guard against false positives.

If the LLM response contains tool-call output, file paths, or any other content that embeds a UUID-like string (e.g. a response from list_files returning reports_550e8400-e29.csv), text.replace(uuid_suffix, original_text) will substitute PII data into that unrelated segment, corrupting the response. Because str.replace is global, every occurrence of the suffix is replaced, so a single coincidence poisons all matches.

A more targeted guard is to anchor the search to the same bracketed/prefixed context the LLM is likely to have echoed, e.g. by checking that at least the entity-type prefix precedes the suffix, or by requiring the suffix appear as a whole token (bounded by non-alphanumeric characters) rather than as a bare substring:

```python
import re
pattern = r'(?<![A-Za-z0-9])' + re.escape(uuid_suffix) + r'(?![A-Za-z0-9>])'
if re.search(pattern, text):
    text = re.sub(pattern, original_text, text)
    continue
```
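To see the boundary guard in action, a minimal standalone sketch (the helper name is hypothetical; it wraps the same lookbehind/lookahead pattern suggested above):

```python
import re

def bounded_replace(text: str, suffix: str, original: str) -> str:
    # Substitute the UUID suffix only when it stands alone as a token,
    # not when embedded in a filename or longer identifier.
    pattern = r'(?<![A-Za-z0-9])' + re.escape(suffix) + r'(?![A-Za-z0-9>])'
    return re.sub(pattern, original, text)
```

The filename case (`reports_550e8400-e29.csv`) is left untouched because the character before the underscore is alphanumeric, while a standalone `_550e8400-e29` is still replaced.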

Comment on lines +1169 to +1200
```python
if is_anthropic_native:
    # --- Anthropic native SSE unmask path ---
    try:
        # 1. Join all bytes into complete SSE stream
        raw = b"".join(
            c if isinstance(c, bytes) else str(c).encode()
            for c in all_chunks
        )
        sse_text = raw.decode("utf-8", errors="replace")

        # 2. Parse SSE events (double-newline separated)
        events = sse_text.split("\n\n")
        parsed = []
        text_event_indices = []
        text_fragments = []

        for i, event_str in enumerate(events):
            lines = event_str.split("\n")
            entry = {"lines": lines, "data": None, "data_idx": None}
            for j, line in enumerate(lines):
                if line.startswith("data: "):
                    try:
                        data = json.loads(line[6:])
                        entry["data"] = data
                        entry["data_idx"] = j
                        if (data.get("type") == "content_block_delta"
                                and data.get("delta", {}).get("type") == "text_delta"):
                            text_event_indices.append(i)
                            text_fragments.append(data["delta"].get("text", ""))
                    except (json.JSONDecodeError, ValueError):
                        pass
            parsed.append(entry)
```

Anthropic-specific SSE format hardcoded outside llms/ directory

The new Anthropic native SSE unmask path directly embeds Anthropic's streaming event vocabulary — content_block_delta, text_delta, type: "message" — inside a generic guardrail file under proxy/guardrails/guardrail_hooks/. This violates the project policy of keeping provider-specific logic in llms/ and risks becoming a long-term maintenance burden: any future changes to the Anthropic SSE wire format (e.g. new delta types like thinking_delta) would silently break unmasking without this file being in scope.

The canonical approach is to move the Anthropic-specific SSE parsing and reconstruction into a helper in litellm/llms/anthropic/ (or a shared pass-through utility already used by the Anthropic pass-through handler), and call that helper from the guardrail. The guardrail should only receive and return provider-agnostic content fragments.

Rule used: avoid writing provider-specific code outside...
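A rough shape for such a helper (module location, name, and signature are suggestions, not an existing LiteLLM API — the point is that the guardrail would only see provider-agnostic text fragments):

```python
import json
from typing import List, Tuple

def extract_text_deltas(sse_text: str) -> List[Tuple[int, str]]:
    # Parse an Anthropic SSE stream and return (event_index, text) pairs
    # for text_delta events; all wire-format knowledge stays in this helper.
    out: List[Tuple[int, str]] = []
    for i, event in enumerate(sse_text.split("\n\n")):
        for line in event.split("\n"):
            if line.startswith("data: "):
                try:
                    data = json.loads(line[6:])
                except ValueError:
                    continue
                if (data.get("type") == "content_block_delta"
                        and data.get("delta", {}).get("type") == "text_delta"):
                    out.append((i, data["delta"].get("text", "")))
    return out
```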

@firestaerter3
Copy link
Author

Superseded by #24290 (handler plumbing) + #24291 (presidio PII round-trip fixes). The new PRs address all Greptile feedback: comprehensive tests, no provider separation concerns, safer fallback logic, and true streaming unmask with carry buffer instead of full-stream buffering. Closing this one.
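For readers following along, the carry-buffer idea mentioned above might look roughly like this (a sketch under assumed token format `<TYPE_uuid>`, not the code in the superseding PRs): hold back any tail that could be the start of a token split across chunk boundaries, so tokens can be unmasked without buffering the full stream.

```python
def stream_unmask(chunks, tokens):
    # tokens: mapping of masked token -> original value
    max_len = max((len(t) for t in tokens), default=0)
    carry = ""
    for chunk in chunks:
        buf = carry + chunk
        for tok, orig in tokens.items():
            buf = buf.replace(tok, orig)
        # keep back enough trailing characters to cover a split token start
        lt = buf.rfind("<")
        if lt != -1 and len(buf) - lt < max_len:
            carry = buf[lt:]
            yield buf[:lt]
        else:
            carry = ""
            yield buf
    if carry:
        yield carry
```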
