Forward reasoning controls to Anthropic backend#691
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for forwarding caller-supplied extra fields (such as Anthropic's thinking configuration or reasoning_effort) to the upstream Anthropic API by adding a flattened extra map to the AnthropicRequest struct and implementing corresponding unit tests. The review feedback highlights a potential issue where keys in params.extra that overlap with explicitly defined fields in AnthropicRequest (e.g., model, messages, temperature) could be serialized twice, resulting in duplicate keys in the final JSON payload. To prevent this, it is recommended to explicitly remove these known keys from the extra map before serialization.
The Anthropic backend rebuilt the Messages request from scratch and only copied a fixed set of known fields, dropping everything in `params.extra` (including `reasoning_effort` and Anthropic-native `thinking` objects). The OpenAI-compatible backend already flattens `extra` straight to the upstream provider, so reasoning controls reached OpenAI but were silently dropped for Anthropic/Opus. Make the Anthropic backend a passthrough to match: add a flattened `extra` field to `AnthropicRequest` and populate it from `params.extra`. The upstream provider validates the forwarded fields and returns its own error on unsupported values, so reasoning controls are no longer silently dropped. Typed OpenAI-only sampling params stay out of `extra`, so they are still not forwarded. Fixes #690
099748c to
af179f7
Compare
Code Review — PR #691Scope is small and well-motivated; behavior matches the OpenAI-compatible backend, and the tests cover the happy paths cleanly. A couple of edge cases worth thinking about before merge. ⚠ Field collisions on the flattened
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 099748cbcc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Forwarding the entire `extra` map to Anthropic was unsafe: it would leak internal E2EE keys (x_signing_algo, x_client_pub_key, …) into the upstream request body, forward OpenAI-only fields that Anthropic rejects (max_completion_tokens, presence_penalty/frequency_penalty from the completions->chat translation), and collide with named fields (system, stop_sequences) producing duplicate JSON keys. Replace the blanket clone with an allowlist (`thinking`, `reasoning_effort`) extracted from `extra`. This forwards the reasoning controls we intend, guarantees no collision with named fields, and keeps internal/OpenAI-only fields out of the Anthropic request. Add a test asserting non-allowlisted keys are dropped.
|
Thanks for the reviews — all valid. Addressed in 9f452d7 by switching from a full
An allowlist is preferred over a denylist since |
lloydmak99
left a comment
There was a problem hiding this comment.
Code Review
Four findings from a multi-angle review of the diff, ranked by severity.
🔴 Critical — E2EE keys leak to Anthropic
File: crates/inference_providers/src/external/anthropic/mod.rs (line ~97, extra: params.extra.clone())
The PR comment says "Internal/tracing keys are already stripped upstream in ExternalProvider before the backend runs" — this is only half-true. strip_internal_tracing_keys() removes exactly three tracing keys (x_request_id, x_org_id, x_workspace_id). The five E2EE keys inserted by insert_encryption_headers() in completions.rs are never stripped for external/Anthropic backends:
x_client_pub_keyx_signing_algox_model_pub_keyx_encryption_versionx_encrypt_all_fields
The vLLM provider removes them inside prepare_encryption_headers(), but that code path is not reached here. After this PR, a request that hits an Anthropic model via E2EE routing will forward the client's public key and cipher metadata as top-level JSON fields to Anthropic's servers. There is even an existing vLLM test (~line 2409) that explicitly guards against this exact leak — the same fix is needed here.
Fix: extend strip_internal_tracing_keys() to also remove the five E2EE keys (or extract the vLLM stripping logic into a shared strip_e2ee_keys() helper called for all external backends).
🟡 Medium — Blanket clone forwards OpenAI-only fields, breaking existing callers
File: crates/inference_providers/src/external/anthropic/mod.rs (line ~97)
params.extra.clone() is completely unfiltered. Any OpenAI-only field a caller was previously passing silently (e.g., logprobs, top_logprobs, seed, logit_bias) will now be serialized as a top-level JSON field and forwarded to Anthropic, which rejects unknown fields with a 400. This is a silent breaking change for any caller that routes to Anthropic models while passing OpenAI-style extra params.
The PR acknowledges reasoning_effort → 400 as intentional ("No more silent 200s"), but the same logic applies to every unrecognized field. An allowlist (e.g., only forwarding thinking and reasoning_effort) would avoid this while still achieving the stated goal.
🟡 Medium — thinking + omitted/wrong temperature produces a 400
File: crates/inference_providers/src/external/anthropic/mod.rs (lines 80–84)
Anthropic requires temperature=1.0 for extended thinking. Two cases currently produce a 400 silently:
params.temperature = None—build_requestomitstemperatureentirely (it'sOption<f32>withskip_serializing_if). Anthropic's server-side default is not guaranteed to be 1.0.params.temperature = Some(0.7)— clamped to 0.7 (valid Anthropic range), but Anthropic rejects thethinkingfield when temperature ≠ 1.0.
The PR description says "The default temperature=1.0 is already thinking-compatible" but the default is omission, not 1.0. A simple guard would help:
// If caller supplies a thinking config, coerce temperature to 1.0
let temperature = if extra.contains_key("thinking") {
Some(1.0)
} else {
params.temperature.map(|t| t.clamp(0.0, 1.0))
};🟠 Low — #[serde(flatten)] on HashMap can emit duplicate JSON keys
File: crates/inference_providers/src/external/anthropic/converter.rs (line 91)
When extra contains a key that matches a named struct field (e.g., "stream", "model", "max_tokens"), serde serializes the named field first and then emits the flatten map's entry — producing duplicate JSON keys. Anthropic's parser uses last-wins semantics, meaning a caller could pass {"stream": false} in extra on a streaming call and get a full JSON body back instead of SSE chunks, or pass {"model": "claude-3-haiku"} to route to a cheaper model than the one selected by the backend. For the stream field specifically this causes an immediate client-side parse error; for model it's a silent authorization bypass.
This risk is reduced if finding #2 is addressed with an allowlist, but worth noting independently.
lloydmak99
left a comment
There was a problem hiding this comment.
Updated Review — Following up on 9f452d7
All three of the critical/medium findings are resolved by the allowlist approach. Walking through each:
🔴 E2EE key leak → ✅ Fixed.
extract_passthrough() only returns keys present in ANTHROPIC_PASSTHROUGH_KEYS (thinking, reasoning_effort). All five E2EE keys (x_signing_algo, x_client_pub_key, x_model_pub_key, x_encryption_version, x_encrypt_all_fields) are dropped regardless of what strip_internal_tracing_keys() does. test_build_request_drops_non_allowlisted_extra_keys explicitly asserts four of the five are absent — good belt-and-suspenders.
🟡 OpenAI-only fields causing 400 → ✅ Fixed.
max_completion_tokens, presence_penalty, frequency_penalty, response_format (and any other OpenAI-only field) are silently dropped because they're not on the allowlist, matching the pre-PR behavior. The test covers the extra-map variants of these too, not just the typed-field variants.
🟠 Duplicate JSON key collision → ✅ Fixed.
Neither thinking nor reasoning_effort is a named field on AnthropicRequest, so #[serde(flatten)] on the filtered map can never produce duplicate keys. The comment in converter.rs documents this invariant explicitly.
🟡 thinking + wrong temperature → still unaddressed (intentional).
If a caller passes thinking with an explicit non-1.0 temperature (or omits temperature entirely), Anthropic returns a 400. The PR is consistent about this design choice ("Anthropic validates and returns its own error"), and the failure mode is a clear error rather than silent misbehavior, so I won't block on it. Worth a follow-up to either auto-coerce temperature to 1.0 or document it explicitly in the API surface.
Other observations
extract_passthroughiterates a 2-element slice per request — negligible cost, fine.test_build_request_does_not_leak_openai_only_paramstests the typed-field path whiletest_build_request_drops_non_allowlisted_extra_keyscovers the extra-map path; both are needed and correct.stop_sequencesassertion in the new test (params.stopwins over theextracollision key) is a nice proof that named-field derivation is not affected by extra content.
Approved. The allowlist is the right fix — it directly resolves the security and correctness issues without requiring any changes to strip_internal_tracing_keys() or the broader E2EE stripping pipeline.
Problem
NEAR AI Cloud exposes Anthropic Opus models through the OpenAI-compatible
/v1/chat/completionssurface, but reasoning/thinking controls were silently dropped for them.The OpenAI-compatible backend serializes the whole request (including flattened
extrafields) straight to the upstream provider, soreasoning_effortreaches the provider and is validated upstream. The Anthropic backend instead rebuilt an Anthropic Messages request from scratch and only copied a fixed set of known fields — everything inparams.extra(includingreasoning_effortand an Anthropic-stylethinkingobject) was thrown away. The result: Opus appeared to accept reasoning controls but always ran with default behavior.Change
Make the Anthropic backend a passthrough, mirroring the OpenAI-compatible backend:
converter.rs: add#[serde(flatten)] pub extra: HashMap<String, serde_json::Value>toAnthropicRequest.mod.rs: populate it withextra: params.extra.clone()inbuild_request.The upstream provider validates the forwarded fields and returns its own error on unsupported values, so reasoning controls are no longer silently dropped.
Resulting behavior
thinking: {type, budget_tokens}→ forwarded verbatim; the provider applies extended thinking. The defaulttemperature=1.0is already thinking-compatible, andbuild_requestdropstop_pwhen temperature is set, so the common case works as-is.reasoning_effort→ forwarded; the provider rejects it with a 400, whichmap_provider_errorsurfaces asInvalidParams→ HTTP 400 with the provider's message. No more silent 200s.frequency_penalty, etc.) live in named struct fields, never inextra, so they are still not forwarded — no regression where we'd send the provider a field it rejects.Tests
Added unit tests in
anthropic/mod.rscovering serialization ofbuild_request:thinkingobject is forwardedreasoning_effortis forwardedextraadds no stray fieldscargo test -p inference_providers anthropic→ 25 passed.cargo clippyandcargo fmt --checkclean.Out of scope
reasoning_content/reasoning(streaming-parser change).reasoningconfig on the/v1/responsesAPI, which currently does not forward it to the provider at all.Fixes #690