Skip to content

Commit d91edc9

Browse files
authored
fix: auto-redact tool_calls[*].function.arguments (#588)
* fix: auto-redact tool_calls[*].function.arguments The un-redact path walked message.content / reasoning_content / reasoning but not tool_calls[*].function.arguments, so when a model emitted a tool call whose JSON arguments echoed the user's (now- redacted) PII, the placeholder leaked to the client unchanged. Caught while testing an agentic round-trip on staging: POST /v1/chat/completions x-auto-redact: on messages=[{user, "Send a thank-you to alice.chen@gmail.com ..."}] tools=[{send_email(to, subject, body)}] Provider (claude-sonnet-4-6) correctly received <email1><email2> in its prompt, but the resulting tool_calls[0].function.arguments came back as {"to":"<email1>",...} instead of {"to":"alice.chen@gmail",...}. Fix: - Non-streaming (unredact_chat_response_in_place): walk message.tool_calls[*].function.arguments and apply map.unredact. - Streaming (unredact_chunk_in_place + StreamUnredactStates): add a per-(choice_index, tool_call_index) sliding-tail state map so arguments JSON streamed in fragments handles split placeholders the same way content does. New e2e test auto_redact_unredacts_tool_call_arguments configures the MockProvider to emit a send_email tool call with <email1> in its arguments and asserts the client receives the substituted original. Known limitations (not addressed here; separate follow-ups): - privacy-filter sometimes splits one PII into multiple adjacent spans (alice.chen@gmail.com -> <email1><email2>), and a model that treats placeholders as separate values may issue multiple tool calls (one per placeholder). This is a privacy-filter span-merging concern, not an un-redact bug. - End-of-stream flush for tool_call_arguments tails is not yet emitted as a synthetic chunk (the existing flush covers content fields). Most tool-call args flows complete cleanly without mid- token truncation, so deferring to a follow-up. * fix: JSON-escape replacements + flush tool_call_arguments tail Addresses claude-review on #588: 1. JSON corruption (blocker): tool_calls[*].function.arguments is a JSON-encoded string. map.unredact() did raw substring replacement, so PII originals containing `"`, `\`, control chars, or non-ASCII would break the surrounding JSON. - Add RedactionMap::unredact_json_string which JSON-escapes each replacement via serde_json::to_string + strip outer quotes. - Add StreamUnredact::new_for_json_string variant for the streaming path; routes substitutions through unredact_json_string. - Non-streaming unredact_chat_response_in_place now uses the json variant for tool_calls[*].function.arguments. - Streaming unredact_chunk_in_place creates per-(choice_idx, tc_idx) state via new_for_json_string. Tests: - placeholders::unredact_json_string_{escapes_quotes, escapes_backslash, escapes_newline, safe_for_simple_pii} - stream_unredact::json_string_variant_{escapes_quote_in_replacement, escapes_across_chunk_split, no_op_for_simple_pii} 2. End-of-stream flush gap (blocker): build_flush_chunks drained content/reasoning fields but ignored tool_call_arguments — partial placeholders held in the tail were silently dropped (or could leak as literal `<email1>` if the tail contained an incomplete-but-recognizable shape). - Extend build_flush_chunks with a second pass that drains states.tool_call_arguments and emits a synthetic SSE chunk per (choice_idx, tc_idx) with the held bytes as a tool_calls delta. - Stable ordering: sort by (choice_idx, tc_idx) for deterministic output. - New e2e auto_redact_unredacts_tool_call_arguments_streaming covers the streamed-args reassembly path end-to-end. 3. Nit (collision safety): Streaming chunk handler no longer falls back to tc.index = 0 for indexless tool calls — uses enumerate position instead, so two parallel indexless tool calls in one delta don't collide on (idx, 0). Test counts: - services unit: 43 pass (was 38; +5 JSON-escape tests) - e2e auto_redact: 10 pass (was 9; +1 streaming-args test) - clippy -D warnings: clean - cargo fmt --check: clean * fix: close 3 remaining auto-redact leak paths + add timeout Surfaced by an independent subagent code review of #588. Each is a distinct privacy regression vs. the design goal of "provider never sees raw PII". 1. **Input tool_calls.arguments now redacted.** In an agent loop the user resubmits the assistant's prior tool_call as part of conversation history. `collect_text_fragments` only walked `message.content`, so the JSON arguments string (which often echoes the original PII verbatim) was forwarded to the provider raw on every follow-up turn. - New `TextRef::ToolCallArg { msg_idx, tc_idx }` variant in `apply.rs`; `collect_text_fragments` + `write_back` extended to cover it. The arguments string is redacted as opaque text — our placeholders are pure ASCII (`<emailN>`) so they stay valid JSON. - Unit tests: collect_walks_assistant_tool_call_arguments, write_back_updates_tool_call_arguments. - E2E test: auto_redact_redacts_input_tool_call_arguments — sends a 4-turn history with a tool_call carrying bob@example.com and asserts the provider sees <email1>, not the raw email. 2. **Response message.refusal now un-redacted.** Safety-tuned models may quote our placeholders back ("I can't email <email1> per policy"). Without un-redacting `choice.message.refusal`, the placeholder leaked to the client. - Walk the field in `unredact_chat_response_in_place`. - E2E test: auto_redact_unredacts_refusal_field. - `ChatDelta` has no `refusal` field, so streaming has no corresponding gap. 3. **Privacy-filter error logs no longer leak upstream response body.** `PrivacyClassifyError::HttpError` carries the verbatim response.text() from the privacy-filter. A misbehaving filter that echoes its input in an error response would have routed customer PII straight to application logs via `tracing::warn!(error = %e, …)`. - Added `Self::privacy_classify_error_category(&e)` returning a bounded `&'static str` (`unauthorized`, `rate_limited`, `unavailable`, `server_error`, `client_error`, `http_other`, `request_failed`). - The "all providers failed" final error now hand-formats the status code only, bypassing `sanitize_error_message` which would have re-introduced the body via `Display`. - Demoted "Privacy classify completed successfully" from info to debug — high-volume info log on every redacted request. 4. **15-second wall-clock timeout on the redact step.** Detector retries cascade through `pool.privacy_classify` with each attempt bounded by `completion_timeout()` (default 600s) × multiple providers. Without an outer bound, a hung detector could hold the user's request hostage for tens of minutes before the 503 fires. Auto-redact is in the critical request path; we cap it tightly. - New `REDACT_TIMEOUT = Duration::from_secs(15)` constant. - `redact_messages` now wraps the inner work in `tokio::time::timeout` and maps `Elapsed` to `DetectorUnavailable`. Also includes 8 new adversarial e2e tests in `auto_redact_adversarial.rs` (from a separate subagent's testing pass): empty messages, empty content, PII in system messages, PII in multimodal content-parts arrays, repeated-PII dedup, user-supplied placeholder collision, 512 KB body, 26 MB body rejection. Test counts: - services unit: 47 pass (was 43; +4 collect/write/timeout tests) - e2e auto_redact: 12 pass (was 10; +2 new tests) - e2e auto_redact_adversarial: 8 pass (new file) - clippy -D warnings: clean - cargo fmt --check: clean
1 parent 972de6c commit d91edc9

9 files changed

Lines changed: 1162 additions & 10 deletions

File tree

crates/api/src/routes/completions.rs

Lines changed: 115 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -355,6 +355,12 @@ fn auto_redact_error_response(err: AutoRedactError) -> Response {
355355

356356
/// Walk a non-streaming chat response's text fields and substitute
357357
/// placeholders with their originals. Mutates in place.
358+
///
359+
/// Covers content + reasoning fields and the JSON-encoded
360+
/// `tool_calls[*].function.arguments` string. The latter is critical for
361+
/// agentic flows where the model emits a tool call whose arguments echo
362+
/// the user's PII — without un-redacting them, the placeholder leaks to
363+
/// the client.
358364
fn unredact_chat_response_in_place(
359365
response: &mut inference_providers::ChatCompletionResponse,
360366
map: &RedactionMap,
@@ -369,20 +375,41 @@ fn unredact_chat_response_in_place(
369375
if let Some(reasoning) = &mut choice.message.reasoning {
370376
*reasoning = map.unredact(reasoning);
371377
}
378+
if let Some(refusal) = &mut choice.message.refusal {
379+
// A safety-tuned model may produce a refusal that quotes our
380+
// placeholders back ("I can't email <email1>"). Without
381+
// un-redacting, the placeholder leaks to the client.
382+
*refusal = map.unredact(refusal);
383+
}
384+
if let Some(tool_calls) = &mut choice.message.tool_calls {
385+
for tc in tool_calls {
386+
if let Some(args) = &mut tc.function.arguments {
387+
// arguments is itself a JSON-encoded string. JSON-escape
388+
// each replacement so PII containing `"`, `\`, or
389+
// control chars doesn't corrupt the surrounding JSON.
390+
*args = map.unredact_json_string(args);
391+
}
392+
}
393+
}
372394
}
373395
}
374396

375397
/// Per-choice, per-field streaming un-redact state. For `n > 1`
376398
/// completions the provider may interleave chunks for different choice
377399
/// indices; each choice needs its own sliding tail buffer or split
378-
/// placeholders get cross-contaminated. The three text fields (`content`,
400+
/// placeholders get cross-contaminated. The text fields (`content`,
379401
/// `reasoning_content`, `reasoning`) are kept independent for the same
380402
/// reason — a model may emit them concurrently.
403+
///
404+
/// `tool_call_arguments` is keyed by `(choice_index, tool_call_index)`
405+
/// because a single response can have multiple parallel tool calls, each
406+
/// with its own arguments JSON stream.
381407
#[derive(Default)]
382408
struct StreamUnredactStates {
383409
content: std::collections::HashMap<i64, StreamUnredact>,
384410
reasoning_content: std::collections::HashMap<i64, StreamUnredact>,
385411
reasoning: std::collections::HashMap<i64, StreamUnredact>,
412+
tool_call_arguments: std::collections::HashMap<(i64, i64), StreamUnredact>,
386413
}
387414

388415
fn unredact_field(
@@ -419,6 +446,29 @@ fn unredact_chunk_in_place(
419446
if let Some(r) = &mut delta.reasoning {
420447
unredact_field(&mut states.reasoning, map, idx, r);
421448
}
449+
if let Some(tcs) = &mut delta.tool_calls {
450+
for (pos, tc) in tcs.iter_mut().enumerate() {
451+
// Per-tool-call streaming state keyed by
452+
// (choice_index, tool_call_index). Use the
453+
// delta's index if present; otherwise fall
454+
// back to the *position* in this chunk so two
455+
// parallel indexless tool calls don't collide.
456+
let tc_idx = tc.index.unwrap_or(pos as i64);
457+
if let Some(func) = &mut tc.function {
458+
if let Some(args) = &mut func.arguments {
459+
// arguments is JSON-encoded; substitute
460+
// with JSON-escaped originals.
461+
let s = states
462+
.tool_call_arguments
463+
.entry((idx, tc_idx))
464+
.or_insert_with(|| {
465+
StreamUnredact::new_for_json_string(map.clone())
466+
});
467+
*args = s.process(args);
468+
}
469+
}
470+
}
471+
}
422472
}
423473
}
424474
}
@@ -445,6 +495,9 @@ fn build_flush_chunks(states: &mut StreamUnredactStates, template: &ChunkTemplat
445495
return Vec::new();
446496
};
447497

498+
let mut out: Vec<Bytes> = Vec::new();
499+
500+
// --- Pass 1: content / reasoning_content / reasoning ---
448501
let mut pending: std::collections::BTreeMap<i64, (String, String, String)> =
449502
std::collections::BTreeMap::new();
450503
for (idx, st) in states.content.drain() {
@@ -465,8 +518,6 @@ fn build_flush_chunks(states: &mut StreamUnredactStates, template: &ChunkTemplat
465518
pending.entry(idx).or_default().2 = text;
466519
}
467520
}
468-
469-
let mut out = Vec::with_capacity(pending.len());
470521
for (idx, (content, rc, r)) in pending {
471522
let chunk = inference_providers::models::ChatCompletionChunk {
472523
id: id.clone(),
@@ -501,6 +552,67 @@ fn build_flush_chunks(states: &mut StreamUnredactStates, template: &ChunkTemplat
501552
out.push(Bytes::from(format!("data: {s}\n\n")));
502553
}
503554
}
555+
556+
// --- Pass 2: tool_call_arguments ---
557+
// Each held tail becomes a synthetic tool-call delta with just the
558+
// arguments fragment. The placeholder may be incomplete (the LLM was
559+
// cut off mid-`<email1>`), in which case the literal bytes are emitted
560+
// — visible signal of truncation, but no silent loss of held bytes
561+
// and no leaking placeholder we never minted.
562+
let mut tc_drain: Vec<((i64, i64), String)> = states
563+
.tool_call_arguments
564+
.drain()
565+
.filter_map(|(key, st)| {
566+
let text = st.flush();
567+
if text.is_empty() {
568+
None
569+
} else {
570+
Some((key, text))
571+
}
572+
})
573+
.collect();
574+
// Stable ordering: choice index ascending, then tool_call index.
575+
tc_drain.sort_by_key(|((c, t), _)| (*c, *t));
576+
for ((choice_idx, tc_idx), args) in tc_drain {
577+
let chunk = inference_providers::models::ChatCompletionChunk {
578+
id: id.clone(),
579+
object: "chat.completion.chunk".to_string(),
580+
created: *created,
581+
model: model.clone(),
582+
system_fingerprint: system_fingerprint.clone(),
583+
choices: vec![inference_providers::models::ChatChoice {
584+
index: choice_idx,
585+
delta: Some(inference_providers::models::ChatDelta {
586+
role: None,
587+
content: None,
588+
name: None,
589+
tool_call_id: None,
590+
tool_calls: Some(vec![inference_providers::models::ToolCallDelta {
591+
id: None,
592+
type_: None,
593+
index: Some(tc_idx),
594+
function: Some(inference_providers::models::FunctionCallDelta {
595+
name: None,
596+
arguments: Some(args),
597+
}),
598+
thought_signature: None,
599+
}]),
600+
reasoning_content: None,
601+
reasoning: None,
602+
}),
603+
logprobs: None,
604+
finish_reason: None,
605+
token_ids: None,
606+
}],
607+
usage: None,
608+
prompt_token_ids: None,
609+
modality: None,
610+
};
611+
if let Ok(s) = serde_json::to_string(&chunk) {
612+
out.push(Bytes::from(format!("data: {s}\n\n")));
613+
}
614+
}
615+
504616
out
505617
}
506618

0 commit comments

Comments
 (0)