Forward reasoning controls to Anthropic backend by PierreLeGuen · Pull Request #691 · nearai/cloud-api

PierreLeGuen · 2026-05-29T17:29:29Z

Problem

NEAR AI Cloud exposes Anthropic Opus models through the OpenAI-compatible /v1/chat/completions surface, but reasoning/thinking controls were silently dropped for them.

The OpenAI-compatible backend serializes the whole request (including flattened extra fields) straight to the upstream provider, so reasoning_effort reaches the provider and is validated upstream. The Anthropic backend instead rebuilt an Anthropic Messages request from scratch and only copied a fixed set of known fields — everything in params.extra (including reasoning_effort and an Anthropic-style thinking object) was thrown away. The result: Opus appeared to accept reasoning controls but always ran with default behavior.

Change

Make the Anthropic backend a passthrough, mirroring the OpenAI-compatible backend:

converter.rs: add #[serde(flatten)] pub extra: HashMap<String, serde_json::Value> to AnthropicRequest.
mod.rs: populate it with extra: params.extra.clone() in build_request.

The upstream provider validates the forwarded fields and returns its own error on unsupported values, so reasoning controls are no longer silently dropped.

Resulting behavior

Native thinking: {type, budget_tokens} → forwarded verbatim; the provider applies extended thinking. The default temperature=1.0 is already thinking-compatible, and build_request drops top_p when temperature is set, so the common case works as-is.
reasoning_effort → forwarded; the provider rejects it with a 400, which map_provider_error surfaces as InvalidParams → HTTP 400 with the provider's message. No more silent 200s.
Typed OpenAI-only sampling params (frequency_penalty, etc.) live in named struct fields, never in extra, so they are still not forwarded — no regression where we'd send the provider a field it rejects.

Tests

Added unit tests in anthropic/mod.rs covering serialization of build_request:

native thinking object is forwarded
reasoning_effort is forwarded
typed OpenAI-only params do not leak into the request body
empty extra adds no stray fields

cargo test -p inference_providers anthropic → 25 passed. cargo clippy and cargo fmt --check clean.

Out of scope

Surfacing thinking output as reasoning_content/reasoning (streaming-parser change).
Forwarding the typed reasoning config on the /v1/responses API, which currently does not forward it to the provider at all.

Fixes #690

gemini-code-assist

Code Review

This pull request introduces support for forwarding caller-supplied extra fields (such as Anthropic's thinking configuration or reasoning_effort) to the upstream Anthropic API by adding a flattened extra map to the AnthropicRequest struct and implementing corresponding unit tests. The review feedback highlights a potential issue where keys in params.extra that overlap with explicitly defined fields in AnthropicRequest (e.g., model, messages, temperature) could be serialized twice, resulting in duplicate keys in the final JSON payload. To prevent this, it is recommended to explicitly remove these known keys from the extra map before serialization.

The Anthropic backend rebuilt the Messages request from scratch and only copied a fixed set of known fields, dropping everything in `params.extra` (including `reasoning_effort` and Anthropic-native `thinking` objects). The OpenAI-compatible backend already flattens `extra` straight to the upstream provider, so reasoning controls reached OpenAI but were silently dropped for Anthropic/Opus. Make the Anthropic backend a passthrough to match: add a flattened `extra` field to `AnthropicRequest` and populate it from `params.extra`. The upstream provider validates the forwarded fields and returns its own error on unsupported values, so reasoning controls are no longer silently dropped. Typed OpenAI-only sampling params stay out of `extra`, so they are still not forwarded. Fixes #690

claude · 2026-05-29T17:32:18Z

Code Review — PR #691

Scope is small and well-motivated; behavior matches the OpenAI-compatible backend, and the tests cover the happy paths cleanly. A couple of edge cases worth thinking about before merge.

⚠ Field collisions on the flattened `extra` map

AnthropicRequest already has explicit fields for system, stop_sequences, tools, tool_choice, temperature, top_p, messages, max_tokens, model, stream. With #[serde(flatten)] on extra, any caller-supplied top-level field with the same name that isn't typed in ChatCompletionParams will end up in extra and produce duplicate JSON keys in the body sent to Anthropic.

Concrete cases:

system — ChatCompletionParams has no typed system field, so a top-level system in the user's request flows into extra. It then collides with the system extracted from system-role messages in convert_messages, producing two system keys in the outgoing JSON. Upstream behavior is implementation-defined (typically last-wins, which here depends on serde's non-deterministic HashMap order).
stop_sequences — same shape: not typed in ChatCompletionParams, but populated explicitly here from params.stop. A caller passing both stop and stop_sequences gets duplicate keys.

Suggest one of:

Strip these conflicting keys from extra before assignment in build_request (extra.remove("system"); extra.remove("stop_sequences"); ...), or
Skip serializing the explicit field when extra already contains the same key, or
At minimum add a test that exercises the collision and document the chosen precedence.

⚠ Backward-compatibility: previously-dropped OpenAI fields now 400

The PR description correctly notes reasoning_effort will now surface as a 400. But the change is broader — anything in ChatCompletionParams.extra is forwarded, including OpenAI-only fields not modeled in our typed struct (response_format, service_tier, audio, prediction, web_search_options, etc.). Anthropic rejects unknown fields, so callers who previously could send a portable "OpenAI-shaped" request to a Claude model and have it succeed will now get 400s.

Compare with strip_reasoning_effort_if_unsupported in openai_compatible.rs:85 — there's precedent for selectively stripping fields known to be incompatible with a given upstream. Worth considering whether a small Anthropic-side allowlist (e.g. thinking, reasoning_effort, top-level system, stop_sequences) is safer than full passthrough, or at least calling out the new 400 surface in release notes.

Nits

tests/test_build_request_empty_extra_adds_no_fields is good but consider also asserting key ordering doesn't matter (HashMap iteration is unspecified).
The 4 new tests don't cover the extra + named-field collision case described above.

Privacy/logging

Looks clean — no new log statements, no customer content surfaced. ✓

⚠️ (non-blocking) — Direction is correct, but the field-collision behavior with system/stop_sequences should be addressed (or explicitly documented) before merge given the unbounded passthrough surface.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 099748cbcc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Forwarding the entire `extra` map to Anthropic was unsafe: it would leak internal E2EE keys (x_signing_algo, x_client_pub_key, …) into the upstream request body, forward OpenAI-only fields that Anthropic rejects (max_completion_tokens, presence_penalty/frequency_penalty from the completions->chat translation), and collide with named fields (system, stop_sequences) producing duplicate JSON keys. Replace the blanket clone with an allowlist (`thinking`, `reasoning_effort`) extracted from `extra`. This forwards the reasoning controls we intend, guarantees no collision with named fields, and keeps internal/OpenAI-only fields out of the Anthropic request. Add a test asserting non-allowlisted keys are dropped.

PierreLeGuen · 2026-05-29T17:38:17Z

Thanks for the reviews — all valid. Addressed in 9f452d7 by switching from a full extra passthrough to an allowlist (thinking, reasoning_effort), which fixes every point at once:

Internal E2EE keys (x_signing_algo, x_client_pub_key, x_encryption_version, x_encrypt_all_fields) — inserted into extra by insert_encryption_headers and only x_model_pub_key is stripped downstream. The allowlist keeps all of them out of the Anthropic body.
OpenAI-only fields — max_completion_tokens (chat) and presence_penalty/frequency_penalty (from the /v1/completions→chat translation) live in extra; no longer forwarded, so no new 400s for those.
Duplicate JSON keys — system/stop_sequences are not in the allowlist, so they can never collide with the named fields. Added a test asserting non-allowlisted keys (incl. these and the E2EE keys) are dropped and that stop_sequences comes from params.stop.

An allowlist is preferred over a denylist since extra is an unbounded, growing surface. reasoning_effort is intentionally kept on the list so Anthropic returns its own 400 rather than us silently dropping it.

lloydmak99

Code Review

Four findings from a multi-angle review of the diff, ranked by severity.

🔴 Critical — E2EE keys leak to Anthropic

File: crates/inference_providers/src/external/anthropic/mod.rs (line ~97, extra: params.extra.clone())

The PR comment says "Internal/tracing keys are already stripped upstream in ExternalProvider before the backend runs" — this is only half-true. strip_internal_tracing_keys() removes exactly three tracing keys (x_request_id, x_org_id, x_workspace_id). The five E2EE keys inserted by insert_encryption_headers() in completions.rs are never stripped for external/Anthropic backends:

x_client_pub_key
x_signing_algo
x_model_pub_key
x_encryption_version
x_encrypt_all_fields

The vLLM provider removes them inside prepare_encryption_headers(), but that code path is not reached here. After this PR, a request that hits an Anthropic model via E2EE routing will forward the client's public key and cipher metadata as top-level JSON fields to Anthropic's servers. There is even an existing vLLM test (~line 2409) that explicitly guards against this exact leak — the same fix is needed here.

Fix: extend strip_internal_tracing_keys() to also remove the five E2EE keys (or extract the vLLM stripping logic into a shared strip_e2ee_keys() helper called for all external backends).

🟡 Medium — Blanket clone forwards OpenAI-only fields, breaking existing callers

File: crates/inference_providers/src/external/anthropic/mod.rs (line ~97)

params.extra.clone() is completely unfiltered. Any OpenAI-only field a caller was previously passing silently (e.g., logprobs, top_logprobs, seed, logit_bias) will now be serialized as a top-level JSON field and forwarded to Anthropic, which rejects unknown fields with a 400. This is a silent breaking change for any caller that routes to Anthropic models while passing OpenAI-style extra params.

The PR acknowledges reasoning_effort → 400 as intentional ("No more silent 200s"), but the same logic applies to every unrecognized field. An allowlist (e.g., only forwarding thinking and reasoning_effort) would avoid this while still achieving the stated goal.

🟡 Medium — `thinking` + omitted/wrong `temperature` produces a 400

File: crates/inference_providers/src/external/anthropic/mod.rs (lines 80–84)

Anthropic requires temperature=1.0 for extended thinking. Two cases currently produce a 400 silently:

params.temperature = None — build_request omits temperature entirely (it's Option<f32> with skip_serializing_if). Anthropic's server-side default is not guaranteed to be 1.0.
params.temperature = Some(0.7) — clamped to 0.7 (valid Anthropic range), but Anthropic rejects the thinking field when temperature ≠ 1.0.

The PR description says "The default temperature=1.0 is already thinking-compatible" but the default is omission, not 1.0. A simple guard would help:

// If caller supplies a thinking config, coerce temperature to 1.0
let temperature = if extra.contains_key("thinking") {
    Some(1.0)
} else {
    params.temperature.map(|t| t.clamp(0.0, 1.0))
};

🟠 Low — `#[serde(flatten)]` on HashMap can emit duplicate JSON keys

File: crates/inference_providers/src/external/anthropic/converter.rs (line 91)

When extra contains a key that matches a named struct field (e.g., "stream", "model", "max_tokens"), serde serializes the named field first and then emits the flatten map's entry — producing duplicate JSON keys. Anthropic's parser uses last-wins semantics, meaning a caller could pass {"stream": false} in extra on a streaming call and get a full JSON body back instead of SSE chunks, or pass {"model": "claude-3-haiku"} to route to a cheaper model than the one selected by the backend. For the stream field specifically this causes an immediate client-side parse error; for model it's a silent authorization bypass.

This risk is reduced if finding #2 is addressed with an allowlist, but worth noting independently.

lloydmak99

Updated Review — Following up on `9f452d7`

All three of the critical/medium findings are resolved by the allowlist approach. Walking through each:

🔴 E2EE key leak → ✅ Fixed.
extract_passthrough() only returns keys present in ANTHROPIC_PASSTHROUGH_KEYS (thinking, reasoning_effort). All five E2EE keys (x_signing_algo, x_client_pub_key, x_model_pub_key, x_encryption_version, x_encrypt_all_fields) are dropped regardless of what strip_internal_tracing_keys() does. test_build_request_drops_non_allowlisted_extra_keys explicitly asserts four of the five are absent — good belt-and-suspenders.

🟡 OpenAI-only fields causing 400 → ✅ Fixed.
max_completion_tokens, presence_penalty, frequency_penalty, response_format (and any other OpenAI-only field) are silently dropped because they're not on the allowlist, matching the pre-PR behavior. The test covers the extra-map variants of these too, not just the typed-field variants.

🟠 Duplicate JSON key collision → ✅ Fixed.
Neither thinking nor reasoning_effort is a named field on AnthropicRequest, so #[serde(flatten)] on the filtered map can never produce duplicate keys. The comment in converter.rs documents this invariant explicitly.

🟡 thinking + wrong temperature → still unaddressed (intentional).
If a caller passes thinking with an explicit non-1.0 temperature (or omits temperature entirely), Anthropic returns a 400. The PR is consistent about this design choice ("Anthropic validates and returns its own error"), and the failure mode is a clear error rather than silent misbehavior, so I won't block on it. Worth a follow-up to either auto-coerce temperature to 1.0 or document it explicitly in the API surface.

Other observations

extract_passthrough iterates a 2-element slice per request — negligible cost, fine.
test_build_request_does_not_leak_openai_only_params tests the typed-field path while test_build_request_drops_non_allowlisted_extra_keys covers the extra-map path; both are needed and correct.
stop_sequences assertion in the new test (params.stop wins over the extra collision key) is a nice proof that named-field derivation is not affected by extra content.

Approved. The allowlist is the right fix — it directly resolves the security and correctness issues without requiring any changes to strip_internal_tracing_keys() or the broader E2EE stripping pipeline.

PierreLeGuen temporarily deployed to Cloud API test env May 29, 2026 17:29 — with GitHub Actions Inactive

PierreLeGuen requested a review from lloydmak99 May 29, 2026 17:29

gemini-code-assist Bot reviewed May 29, 2026

View reviewed changes

Comment thread crates/inference_providers/src/external/anthropic/mod.rs Outdated

PierreLeGuen force-pushed the fix/anthropic-reasoning-passthrough branch from 099748c to af179f7 Compare May 29, 2026 17:31

PierreLeGuen temporarily deployed to Cloud API test env May 29, 2026 17:31 — with GitHub Actions Inactive

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

Comment thread crates/inference_providers/src/external/anthropic/mod.rs Outdated

Comment thread crates/inference_providers/src/external/anthropic/converter.rs

Comment thread crates/inference_providers/src/external/anthropic/mod.rs Outdated

PierreLeGuen temporarily deployed to Cloud API test env May 29, 2026 17:38 — with GitHub Actions Inactive

lloydmak99 reviewed May 29, 2026

View reviewed changes

lloydmak99 approved these changes May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Forward reasoning controls to Anthropic backend#691

Forward reasoning controls to Anthropic backend#691
PierreLeGuen wants to merge 2 commits into
mainfrom
fix/anthropic-reasoning-passthrough

PierreLeGuen commented May 29, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

claude Bot commented May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PierreLeGuen commented May 29, 2026

Uh oh!

lloydmak99 left a comment

Uh oh!

lloydmak99 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

PierreLeGuen commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Change

Resulting behavior

Tests

Out of scope

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

claude Bot commented May 29, 2026

Code Review — PR #691

⚠ Field collisions on the flattened extra map

⚠ Backward-compatibility: previously-dropped OpenAI fields now 400

Nits

Privacy/logging

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PierreLeGuen commented May 29, 2026

Uh oh!

lloydmak99 left a comment

Choose a reason for hiding this comment

Code Review

🔴 Critical — E2EE keys leak to Anthropic

🟡 Medium — Blanket clone forwards OpenAI-only fields, breaking existing callers

🟡 Medium — thinking + omitted/wrong temperature produces a 400

🟠 Low — #[serde(flatten)] on HashMap can emit duplicate JSON keys

Uh oh!

lloydmak99 left a comment

Choose a reason for hiding this comment

Updated Review — Following up on 9f452d7

Other observations

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PierreLeGuen commented May 29, 2026 •

edited

Loading

⚠ Field collisions on the flattened `extra` map

🟡 Medium — `thinking` + omitted/wrong `temperature` produces a 400

🟠 Low — `#[serde(flatten)]` on HashMap can emit duplicate JSON keys

Updated Review — Following up on `9f452d7`