fix(openai-shim): restore reasoning support for proxy and localhost providers #1155

Open

JoaoPedroCampanari wants to merge 10 commits into Gitlawb:main from JoaoPedroCampanari:fix/reasoning-content-dead-flag

Conversation

@JoaoPedroCampanari

Fixes reasoning model compatibility issues for OpenAI-compatible localhost/proxy setups.

Previously, providers that require reasoning_content (DeepSeek, MiMo, Moonshot/Kimi, ZAI, etc.) could fail with:

400 OpenAI API error 400: data:{"error":{"code":"400","message":"Param Incorrect","param":"The reasoning_content in
the thinking mode must be passed back to the API.","type":""}}

during multi-turn agent workflows.

This especially affected local proxy/key-router setups such as:

  • localhost OpenAI-compatible endpoints
  • GRouter
  • reverse proxy gateways

Root Cause

Two independent issues caused the failures:

  1. requireReasoningContentOnAssistantMessages was never consumed

The flag existed in provider descriptors and was configured by:

  • DeepSeek
  • Moonshot/Kimi
  • MiMo
  • ZAI
  • kimi-code

However, the OpenAI shim message conversion pipeline never actually used the flag.

As a result, assistant messages could lose reasoning_content during agent execution.

  2. treatAsLocal bypassed model inference

The treatAsLocal code path skipped model-name-based config inference entirely.

This prevented localhost/proxy endpoints from inheriting reasoning-specific behavior even when the routed model was
clearly identifiable, such as:

  • mimo-v2.5-pro
  • deepseek-reasoner
  • other reasoning-capable providers

Changes

src/services/api/openaiShim.ts

Implemented six fixes in convertMessages() (sketched below):

  • Pass requireReasoningContentOnAssistantMessages into the conversion pipeline
  • Attach reasoning_content: '' to assistant messages when required by provider config
  • Remove the toolUses.length > 0 restriction so plain-text assistant messages are also handled
  • Preserve reasoning_content during assistant-message coalescing
  • Add fallback reasoning_content to synthetic assistant messages created during tool/user alternation
  • Add handling for non-array-content assistant messages
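
A minimal sketch of the conversion-side rule, with a simplified message shape; the types and convertMessages() internals here are assumptions, not the shim's actual code:

```typescript
// Simplified, hypothetical shapes for illustration only.
interface AssistantMessage {
  role: 'assistant';
  content: string;
  reasoning_content?: string;
  tool_calls?: unknown[];
}

interface ShimConfig {
  requireReasoningContentOnAssistantMessages?: boolean;
}

// Attach the empty-string fallback to every assistant message when the
// provider config demands reasoning_content, not just to messages that
// carry tool calls (the old toolUses.length > 0 restriction).
function ensureReasoningContent(
  msg: AssistantMessage,
  config: ShimConfig,
): AssistantMessage {
  if (!config.requireReasoningContentOnAssistantMessages) return msg;
  if (msg.reasoning_content !== undefined) return msg;
  return { ...msg, reasoning_content: '' };
}
```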

src/integrations/runtimeMetadata.ts

  • Merge reasoning-related fields from inferRemoteModelOpenAIShimConfig() into the treatAsLocal branch
  • Allows localhost/proxy endpoints to inherit correct reasoning behavior from model-name inference
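
A sketch of that merge, assuming a hypothetical config shape; the actual treatAsLocal branch in runtimeMetadata.ts may differ:

```typescript
interface OpenAIShimConfig {
  maxTokensField?: string;
  preserveReasoningContent?: boolean;
  requireReasoningContentOnAssistantMessages?: boolean;
  reasoningContentFallback?: string;
}

function buildLocalProxyConfig(
  modelName: string,
  inferConfig: (model: string) => OpenAIShimConfig | undefined,
): OpenAIShimConfig {
  const inferred = inferConfig(modelName);
  return {
    // Local defaults still apply for transport fields...
    maxTokensField: 'max_tokens',
    // ...while reasoning-related fields inferred from the routed model
    // name (e.g. "deepseek-reasoner") are merged in instead of dropped.
    preserveReasoningContent: inferred?.preserveReasoningContent,
    requireReasoningContentOnAssistantMessages:
      inferred?.requireReasoningContentOnAssistantMessages,
    reasoningContentFallback: inferred?.reasoningContentFallback,
  };
}
```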

Provider Impact

Fixes reasoning-agent failures for:

  • DeepSeek
  • Moonshot/Kimi
  • MiMo
  • ZAI
  • future providers using requireReasoningContentOnAssistantMessages: true

Also enables reasoning models to work correctly through:

  • localhost OpenAI-compatible endpoints
  • proxy routers
  • API gateways

No behavior changes for providers that do not use the flag.

Tested

  • bun run build ✅
  • bun run smoke ✅
  • bun test src/services/api/openaiShim → 104 pass / 0 fail ✅

Manual validation:

  • MiMo v2.5 Pro via GRouter (localhost:3103) ✅
  • MiMo v2.5 Pro via direct Xiaomi endpoint ✅

Verified:

  • Explore agents
  • background agents
  • multi-turn reasoning flows
  • reasoning-content persistence

Follow-up

Some reasoning providers (Qwen/QwQ/GLM/etc.) are not yet included in inferRemoteModelOpenAIShimConfig().

Once entries are added, the treatAsLocal fix ensures they will automatically work through localhost/proxy setups as
well.

Collaborator

@jatmn left a comment


Findings

  • [P2] Keep non-empty reasoning_content when coalescing assistant messages
    src/services/api/openaiShim.ts:762
    The new coalescing logic overwrites an existing non-empty reasoning_content whenever the following assistant message has the empty-string fallback. For example, a Moonshot/Kimi history with an assistant tool call that includes a thinking block, followed by another assistant text message before the tool result, is converted into one assistant message whose reasoning_content is "" instead of the original thinking text. That still sends a tool-call assistant message without the reasoning continuity these providers require, so the PR's target proxy/reasoning workflow can still hit the 400 this is meant to fix. Please keep the previous non-empty value when the incoming value is only the fallback, or otherwise merge in a way that does not discard the real reasoning text.
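
A minimal sketch of the merge rule being requested here (hypothetical helper, not the shim's actual code):

```typescript
// When coalescing two consecutive assistant messages, never let the
// empty-string fallback clobber real thinking text from the earlier one.
function mergeReasoningContent(
  existing: string | undefined,
  incoming: string | undefined,
): string | undefined {
  if (incoming) return incoming; // real (non-empty) text always wins
  return existing ?? incoming;   // keep earlier thinking over ''
}
```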

@JoaoPedroCampanari
Author

@jatmn I've addressed your feedback in commit 81a8c02 — the coalescing logic now preserves non-empty reasoning_content when the incoming value is an empty fallback. Please re-review.

@jatmn previously approved these changes May 13, 2026
Collaborator

@jatmn left a comment


Thanks for following up on the earlier review. The requested reasoning_content coalescing fix looks addressed now.

No remaining issues here, LGTM.

Collaborator

@techbrewboss left a comment


Review summary

Thanks for the follow-up here. The dead requireReasoningContentOnAssistantMessages path is now wired into the shim, and the prior coalescing issue looks fixed. I found one remaining scope/correctness gap around the advertised local-proxy provider coverage.

Findings

  • src/integrations/runtimeMetadata.ts:218 - Local/proxy inference does not cover advertised MiMo or Z.AI/GLM models.
    Impact: The PR says localhost/proxy setups like GRouter work for models such as mimo-v2.5-pro and also lists Z.AI, but inferRemoteModelOpenAIShimConfig() still only recognizes deepseek, kimi, and moonshot. In a local proxy context, mimo-v2.5-pro, GLM-5.1, and zai/glm-5.1 resolve to only { maxTokensField: "max_tokens" }, so they still do not get preserveReasoningContent, requireReasoningContentOnAssistantMessages, or reasoningContentFallback.
    Suggested fix: Add model-name inference for the advertised providers, with focused tests for local proxy models, or narrow the PR description/provider claims to DeepSeek and Kimi/Moonshot only.
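
A sketch of the kind of model-name inference being suggested; the patterns and returned fields are assumptions drawn from this thread, not the shipped implementation:

```typescript
interface ReasoningConfig {
  preserveReasoningContent: boolean;
  requireReasoningContentOnAssistantMessages: boolean;
  reasoningContentFallback: string;
}

function inferReasoningConfig(model: string): ReasoningConfig | undefined {
  const name = model.toLowerCase();
  // Cover the providers advertised in the PR: MiMo (mimo-v2.5-pro) and
  // GLM/Z.AI (GLM-5.1, zai/glm-5.1), alongside deepseek/kimi/moonshot.
  if (/(^|\/)(mimo|glm)/.test(name)) {
    return {
      preserveReasoningContent: true,
      requireReasoningContentOnAssistantMessages: true,
      reasoningContentFallback: '',
    };
  }
  return undefined;
}
```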

Validation

  • bun test src/services/api/openaiShim.test.ts passed: 93 tests.
  • bun test src/integrations/routeMetadata.test.ts src/integrations/compatibility.test.ts src/services/api/providerConfig.local.test.ts passed: 40 tests.
  • bun run build passed.
  • Manual local-proxy config check showed deepseek-reasoner and kimi-k2.6 infer reasoning config, but mimo-v2.5-pro, GLM-5.1, and zai/glm-5.1 do not.

@JoaoPedroCampanari
Author

@techbrewboss I've addressed your feedback in commit 95f7fee: inferRemoteModelOpenAIShimConfig() now covers MiMo and GLM/Z.AI models in the local/proxy path. Could you re-review?

Changes:

  • mimo → preserveReasoningContent, requireReasoningContentOnAssistantMessages, reasoningContentFallback
  • glm / zai/glm → same reasoning config
  • Tested with mimo-v2.5-pro, GLM-5.1, and zai/glm-5.1 via local proxy — all now infer correct reasoning behavior

@JoaoPedroCampanari requested a review from jatmn May 14, 2026 19:22
Collaborator

@techbrewboss left a comment


Review summary

Thanks for following up on the local/proxy reasoning-content path. The dead requireReasoningContentOnAssistantMessages flag is now wired into the shim and the MiMo/GLM local-proxy inference gap is addressed, but I found one direct-provider regression that should block merge.

Findings

  • src/integrations/runtimeMetadata.ts:166 - MiMo model-name inference overrides the direct Xiaomi MiMo route’s token field.
    Impact: direct https://api.xiaomimimo.com/v1 requests for mimo-v2.5-pro now send max_tokens instead of the descriptor-required max_completion_tokens. This breaks the existing "xiaomi mimo route uses api-key auth header and max_completion_tokens" test and contradicts the PR’s “No behavior changes” claim for direct providers.
    Suggested fix: keep the new MiMo/GLM inference scoped to local/proxy or generic OpenAI-compatible routing, or avoid returning maxTokensField: 'max_tokens' from MiMo model-name inference so the Xiaomi descriptor can continue to win.
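
One way to express the suggested scoping (hypothetical mergeShimConfig helper; the real mergeOpenAIShimConfig signature is not shown in this thread):

```typescript
interface ShimConfig {
  maxTokensField?: 'max_tokens' | 'max_completion_tokens';
  preserveReasoningContent?: boolean;
  requireReasoningContentOnAssistantMessages?: boolean;
  reasoningContentFallback?: string;
}

// Descriptor transport fields win over model-name inference, so the
// direct Xiaomi route keeps max_completion_tokens; gateway routes
// without a descriptor still receive the inferred value.
function mergeShimConfig(
  descriptor: ShimConfig | undefined,
  inferred: ShimConfig | undefined,
): ShimConfig {
  return {
    ...inferred,
    ...descriptor,
    maxTokensField: descriptor?.maxTokensField ?? inferred?.maxTokensField,
  };
}
```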

Validation

  • Fetched PR Gitlawb/openclaude#1155 into an isolated worktree at commit 95f7fee.
  • Ran bun test src/services/api/openaiShim.test.ts src/integrations/routeMetadata.test.ts src/integrations/compatibility.test.ts src/services/api/providerConfig.local.test.ts with the repo’s existing node_modules symlinked into the temporary worktree.
  • Result: 136 pass, 1 fail.
  • Failing test: xiaomi mimo route uses api-key auth header and max_completion_tokens.

@Vasanthdev2004
Collaborator

Blockers

  1. MiMo model-name inference overrides direct Xiaomi route's token field: inferRemoteModelOpenAIShimConfig() now returns maxTokensField: 'max_tokens' for MiMo models, which overrides the descriptor-required max_completion_tokens for the direct Xiaomi endpoint. This breaks the existing test and contradicts the "No behavior changes" claim for direct providers.

Non-Blocking

  • PR went through multiple review rounds — coalescing fix and MiMo/GLM inference were addressed, but the direct-provider regression remains.

Looks Good

  • Fixes real reasoning support for proxy/localhost providers
  • Wires requireReasoningContentOnAssistantMessages into the shim
  • Covers DeepSeek, Kimi/Moonshot, MiMo, GLM/Z.AI
  • 104 tests passing in openaiShim

Verdict: Changes Requested — direct-provider regression must be fixed before merge.

…serve reasoning through local proxies

What changed:
`requireReasoningContentOnAssistantMessages` was defined in descriptors.ts
and set by five vendor configs (deepseek, moonshot, mimo, zai, kimi-code)
but never consumed in the OpenAI shim's message conversion pipeline. This
meant any provider that required reasoning_content on assistant messages
would get a 400 error from the API ("reasoning_content must be passed back
to the API") whenever the flag was the only source of truth.

Additionally, the `treatAsLocal` code path in runtimeMetadata.ts bypassed
model-name-based config inference entirely, so local proxies (e.g. key
routers like grouter) forwarding to remote reasoning models never received
reasoning-related config even when the model name clearly indicated a
reasoning provider.

Why it changed:
- The flag was a dead config — set but never read — causing 400 errors for
  all five vendors that rely on it
- Local proxies that forward to reasoning providers need the same config as
  direct connections; the model name (e.g. "mimo-v2.5-pro") is sufficient
  signal regardless of whether the base URL is localhost or remote
- The empty-string reasoning_content fallback was gated on toolUses.length > 0,
  so plain-text assistant messages without tool calls were missing the field
- The coalescing pass silently dropped reasoning_content when merging
  consecutive assistant messages
- Synthetic assistant messages injected during tool→user gaps and non-array
  content assistant messages never received reasoning_content

User/developer impact:
- Fixes 400 errors for DeepSeek, Moonshot/Kimi, MiMo, ZAI, and any future
  vendor that sets requireReasoningContentOnAssistantMessages: true
- Enables reasoning models to work through local proxies/key routers without
  manual config
- No behavior change for providers that do not set the flag (all defaults
  remain false/undefined)

Checks run:
- bun run build — clean
- bun run smoke — pass
- bun test src/services/api/openaiShim — 104 pass, 0 fail

When coalescing consecutive assistant messages, an empty
reasoning_content fallback ("") would overwrite a non-empty
thinking text from a previous message. Now we only overwrite
when the incoming value is non-empty or the existing value
is already empty/undefined.

inferRemoteModelOpenAIShimConfig() only recognized deepseek,
kimi, and moonshot. When using a local proxy (e.g. GRouter)
with mimo-v2.5-pro or GLM-5.1, the function returned undefined,
so these models got no reasoning config even with the treatAsLocal
merge fix. Now mimo and glm patterns are recognized and return
the same reasoning config their vendor files define.

inferRemoteModelOpenAIShimConfig() for mimo and glm returned
maxTokensField and removeBodyFields, which overrode the Xiaomi
MiMo vendor descriptor's max_completion_tokens when the direct
route was used. Model-name inference should only return reasoning-
related fields (preserveReasoningContent, requireReasoningContent,
reasoningContentFallback, thinkingRequestFormat) so the descriptor
remains the source of truth for transport fields like maxTokensField.

mergeOpenAIShimConfig now checks for a descriptor-level maxTokensField
before applying the model-inferred value. This prevents direct providers
(e.g. Xiaomi with max_completion_tokens) from being overridden by
inference results that default to max_tokens.

Gateway routes without a descriptor still receive the inferred value.

The treatAsLocal code path was dropping removeBodyFields from the
model-inference config. For DeepSeek/Kimi models, inference returns
removeBodyFields: ['store'] which was silently lost for local proxies.

Currently mitigated by the isLocal store-stripping check, but this
ensures future removeBodyFields entries are also preserved.
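
A sketch of the preservation described above (hypothetical helper, following this thread's field names):

```typescript
// Carry inference-provided removeBodyFields (e.g. ['store'] for
// DeepSeek/Kimi) through the treatAsLocal merge instead of dropping them.
function mergeRemoveBodyFields(
  base: string[] | undefined,
  inferred: string[] | undefined,
): string[] {
  return [...new Set([...(base ?? []), ...(inferred ?? [])])];
}
```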

@JoaoPedroCampanari force-pushed the fix/reasoning-content-dead-flag branch from b818715 to 6dc6e84 on May 16, 2026 12:04

github-actions Bot and others added 4 commits May 16, 2026 21:12

MiMo's API expects `max_completion_tokens` (matching its vendor descriptor),
but the model-name inference was not setting maxTokensField. When going
through a localhost proxy (treatAsLocal path), the default `max_tokens`
was applied instead, causing the API to ignore the token limit and
terminate responses prematurely.

Also adds missing removeBodyFields: ['store'] for both MiMo and GLM
to match the pattern used by DeepSeek and Kimi inference configs.

The treatAsLocal code path hardcoded maxTokensField to 'max_tokens',
which broke providers like MiMo that require 'max_completion_tokens'
even when accessed through a local proxy (e.g. GRouter).

Now checks if remoteModelInferredConfig has a maxTokensField before
falling back to 'max_tokens', matching the pattern already used for
reasoning fields.
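
A sketch of that check (hypothetical names; remoteModelInferredConfig follows this commit message's wording):

```typescript
interface InferredConfig {
  maxTokensField?: 'max_tokens' | 'max_completion_tokens';
}

// In the treatAsLocal branch: prefer the model-inferred token field
// (e.g. max_completion_tokens for MiMo behind GRouter) and only fall
// back to the generic default when inference says nothing about it.
function resolveLocalTokenField(
  remoteModelInferredConfig: InferredConfig | undefined,
): string {
  return remoteModelInferredConfig?.maxTokensField ?? 'max_tokens';
}
```
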
Collaborator

@techbrewboss left a comment


Review summary

The reasoning-content/runtime fix looks useful and the focused provider tests pass on the current PR head. I found one remaining scope issue: the branch includes release-please version/changelog output unrelated to this fix.

Findings

  • package.json:3, .release-please-manifest.json:2, CHANGELOG.md:3 - Unrelated release-please version/changelog changes are included.
    Impact: This reasoning-content PR now also bumps the package to 0.12.1 and adds changelog entries for unrelated PRs #1191 and #1204. That mixes release automation output into a provider/runtime fix and can confuse the release process or make this PR appear to publish unrelated work.
    Suggested fix: Drop the release-please commit/files from this branch and keep the PR limited to the shim/runtime changes.

Validation

  • Fetched PR head 1348e0a into an isolated worktree.
  • bun test src/services/api/openaiShim.test.ts src/integrations/routeMetadata.test.ts src/integrations/compatibility.test.ts src/services/api/providerConfig.local.test.ts passed: 141 tests.
  • bun run build passed.

Collaborator

@jatmn left a comment


Thanks for following up on the earlier reviews. The requested reasoning_content coalescing fix still looks addressed, and the direct Xiaomi MiMo token-field regression is covered now. I found one remaining issue below.

Findings

  • [P2] Drop unrelated release-please output from this branch
    package.json:3, .release-please-manifest.json:2, CHANGELOG.md:3
    This reasoning-content fix now also bumps the package version to 0.12.1 and adds changelog entries for unrelated PRs #1191 and #1204. That mixes release automation output into a provider/runtime fix, which can confuse the release process and make this PR appear to publish unrelated work. Please remove the release-please commit/files from this branch and keep the PR scoped to the shim/runtime changes.

@gnanam1990
Collaborator

Confirmed @jatmn's and @techbrewboss's point against the branch — they've both landed on the same single remaining issue. The reasoning-content/runtime fix itself looks addressed (both reviewers agree), but the branch also carries release-please output: package.json bumped to 0.12.1 and CHANGELOG.md / .release-please-manifest.json entries for unrelated PRs #1191 and #1204. That mixes release automation into a provider/runtime fix and can confuse the release process. Dropping the release-please commit/files and keeping this scoped to runtimeMetadata.ts / openaiShim.ts clears it. Thanks — happy to re-review as soon as the branch is trimmed.
