feat: opt-in ShimToolSearch — Minify tool loading for 3P providers#586
feat: opt-in ShimToolSearch — Minify tool loading for 3P providers#586Aeshma-Daeva wants to merge 5 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an opt-in “ShimToolSearch” mode to reduce tool-schema token overhead for OpenAI-compatible third-party providers by (a) minifying tool schemas, (b) heuristically selecting a smaller tool subset per request, and (c) attempting a two-phase “request_tools” protocol for conversational turns.
Changes:
- Added schema minification utilities (truncate descriptions + strip parameter descriptions).
- Added a keyword-based predictor to choose a smaller tool set per turn when enabled via
ENABLE_SHIM_TOOL_SEARCH. - Added a two-phase request path intended to send a meta-tool first and then re-request with only the needed tools.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // Phase 1 request — non-streaming, single meta-tool | ||
| const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools } |
There was a problem hiding this comment.
Phase-1 sets tools to [REQUEST_TOOLS_SCHEMA], but REQUEST_TOOLS_SCHEMA is already in OpenAI tool format ({type,function:{...}}). The request-building path calls convertTools(...), which expects Anthropic-style tool defs ({name, description, input_schema}), so t.name becomes undefined and the resulting OpenAI request will contain an invalid/missing tool name. Define request_tools in the same tool shape as params.tools (or special-case/bypass convertTools for this meta-tool) instead of casting.
| const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools } | |
| const requestToolsPhase1Tool = { | |
| name: 'request_tools', | |
| description: 'Request one or more tools from the available tool directory to continue solving the task.', | |
| input_schema: { | |
| type: 'object', | |
| properties: { | |
| tools: { | |
| type: 'array', | |
| description: 'Names of the tools needed for the next step.', | |
| items: { | |
| type: 'string', | |
| }, | |
| }, | |
| rationale: { | |
| type: 'string', | |
| description: 'Brief explanation of why these tools are needed.', | |
| }, | |
| }, | |
| required: ['tools'], | |
| additionalProperties: false, | |
| }, | |
| } | |
| const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [requestToolsPhase1Tool] as typeof params.tools } |
| // Clone messages, inject directory into first system message | ||
| const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }> | ||
| const sysIdx = phase1Messages.findIndex(m => m.role === 'system') | ||
| if (sysIdx >= 0 && typeof phase1Messages[sysIdx].content === 'string') { | ||
| phase1Messages[sysIdx].content += directoryNote | ||
| } else { | ||
| phase1Messages.unshift({ role: 'system', content: `You are a helpful AI assistant.${directoryNote}` }) | ||
| } | ||
|
|
||
| // Phase 1 request — non-streaming, single meta-tool | ||
| const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools } |
There was a problem hiding this comment.
The tool directory note is injected by adding/unshifting a { role: 'system', content: ... } message, but convertMessages(...) only processes user and assistant roles and ignores system messages (system prompt comes from params.system). As a result, the directory listing never reaches the provider. Append the directory note to params.system (handling string/array forms) or update convertMessages to include system role messages.
| // Clone messages, inject directory into first system message | |
| const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }> | |
| const sysIdx = phase1Messages.findIndex(m => m.role === 'system') | |
| if (sysIdx >= 0 && typeof phase1Messages[sysIdx].content === 'string') { | |
| phase1Messages[sysIdx].content += directoryNote | |
| } else { | |
| phase1Messages.unshift({ role: 'system', content: `You are a helpful AI assistant.${directoryNote}` }) | |
| } | |
| // Phase 1 request — non-streaming, single meta-tool | |
| const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools } | |
| // Clone messages, but send the directory note via params.system because | |
| // provider conversion only forwards system content from that field. | |
| const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }> | |
| const phase1System: ShimCreateParams['system'] = | |
| typeof params.system === 'string' | |
| ? `${params.system}${directoryNote}` | |
| : Array.isArray(params.system) | |
| ? [ | |
| ...params.system, | |
| { type: 'text', text: directoryNote.trimStart() }, | |
| ] as ShimCreateParams['system'] | |
| : `You are a helpful AI assistant.${directoryNote}` | |
| // Phase 1 request — non-streaming, single meta-tool | |
| const phase1Params = { | |
| ...params, | |
| stream: false, | |
| system: phase1System, | |
| messages: phase1Messages, | |
| tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools, | |
| } |
| return new Response(JSON.stringify(phase1Json), { | ||
| status: 200, | ||
| headers: { 'content-type': 'application/json' }, | ||
| }) |
There was a problem hiding this comment.
In the non-streaming conversational path, _shimToolSearchCreate returns a raw Response wrapping the OpenAI JSON (new Response(JSON.stringify(phase1Json), ...)). OpenAIShimMessages.create() otherwise returns an Anthropic-style message object (via _convertNonStreamingResponse(...) / Codex conversion), so enabling ShimToolSearch will change return types and likely break callers. Convert phase1Json to the same return shape as the normal non-streaming path instead of returning a Response.
| // Phase 2 — re-request with the actual tools | ||
| const phase2Params = { ...params, tools: toolSet as unknown as typeof params.tools } | ||
| const response = await this._doRequest(request, phase2Params, options) | ||
|
|
||
| if (params.stream) { | ||
| const isResponsesStream = response.url?.includes('/responses') | ||
| return new OpenAIShimStream( | ||
| (request.transport === 'codex_responses' || isResponsesStream) | ||
| ? codexStreamToAnthropic(response, request.resolvedModel) | ||
| : openaiStreamToAnthropic(response, request.resolvedModel), | ||
| ) | ||
| } | ||
| return response | ||
| } |
There was a problem hiding this comment.
After phase-2, the non-streaming branch returns the raw Response from _doRequest(...). This is inconsistent with the rest of create(), which consumes the response body and returns an Anthropic-format message object (and handles Codex /responses conversion). ShimToolSearch should reuse the same non-streaming conversion logic so callers always get the expected message shape.
| // ShimToolSearch: for conversational turns, skip tools entirely (phase-1 bypass) | ||
| if ( | ||
| isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH) && | ||
| params.tools && (params.tools as unknown[]).length > 0 | ||
| ) { | ||
| const msgs = Array.isArray(params.messages) ? params.messages as unknown[] : [] | ||
| const predicted = predictNeededTools(msgs) | ||
| if (predicted !== null && predicted.size === 0) { | ||
| // Pure conversational — send with request_tools meta-tool only | ||
| return await self._shimToolSearchCreate(request, params, options) | ||
| } | ||
| } |
There was a problem hiding this comment.
The early return into _shimToolSearchCreate(...) bypasses the normal httpResponse = response assignment. That means .withResponse() will return a dummy new Response() when ShimToolSearch is used, losing headers/request id for consumers that call withResponse(). Capture and propagate the final HTTP Response from phase-1/phase-2 (or return an object that includes it) so withResponse() stays accurate.
| } | ||
|
|
||
| const choice = phase1Json.choices?.[0] | ||
| const toolCalls = choice?.message?.tool_calls ?? [] | ||
| const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools') | ||
|
|
||
| if (!requestToolsCall) { | ||
| // Model chose to respond conversationally — return as synthetic stream | ||
| process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n') | ||
| const msg = choice?.message ?? { role: 'assistant', content: '' } |
There was a problem hiding this comment.
_shimToolSearchCreate assumes the phase-1 response is a /chat/completions-style JSON with choices[0].message.tool_calls. But _doOpenAIRequest can transparently fall back to the /responses endpoint (e.g., GitHub Copilot 400 for GPT-5+), which returns a different shape. In that case this JSON parsing/inspection will fail or misbehave. Consider gating ShimToolSearch to known chat-completions providers/transports, or add handling for /responses-format results.
| } | |
| const choice = phase1Json.choices?.[0] | |
| const toolCalls = choice?.message?.tool_calls ?? [] | |
| const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools') | |
| if (!requestToolsCall) { | |
| // Model chose to respond conversationally — return as synthetic stream | |
| process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n') | |
| const msg = choice?.message ?? { role: 'assistant', content: '' } | |
| output?: Array<{ | |
| type?: string | |
| role?: string | |
| content?: string | null | Array<{ | |
| type?: string | |
| text?: string | |
| }> | |
| name?: string | |
| arguments?: string | |
| }> | |
| output_text?: string | null | |
| } | |
| const choice = phase1Json.choices?.[0] | |
| const extractResponsesMessageContent = (content: typeof phase1Json.output extends Array<infer T> ? T['content'] : never): string => { | |
| if (typeof content === 'string') return content | |
| if (!Array.isArray(content)) return '' | |
| return content | |
| .map(part => typeof part?.text === 'string' ? part.text : '') | |
| .join('') | |
| } | |
| const extractPhase1ToolCalls = () => { | |
| if (Array.isArray(choice?.message?.tool_calls)) { | |
| return choice.message.tool_calls | |
| } | |
| if (!Array.isArray(phase1Json.output)) { | |
| return [] as Array<{ function?: { name?: string; arguments?: string } }> | |
| } | |
| return phase1Json.output | |
| .filter(item => item?.type === 'function_call' || item?.type === 'tool_call') | |
| .map(item => ({ | |
| function: { | |
| name: typeof item?.name === 'string' ? item.name : undefined, | |
| arguments: typeof item?.arguments === 'string' ? item.arguments : undefined, | |
| }, | |
| })) | |
| } | |
| const extractPhase1Message = () => { | |
| if (choice?.message) { | |
| return choice.message | |
| } | |
| const outputMessage = Array.isArray(phase1Json.output) | |
| ? phase1Json.output.find(item => item?.type === 'message' && item?.role === 'assistant') | |
| : undefined | |
| if (outputMessage) { | |
| return { | |
| role: outputMessage.role ?? 'assistant', | |
| content: extractResponsesMessageContent(outputMessage.content), | |
| } | |
| } | |
| return { | |
| role: 'assistant', | |
| content: typeof phase1Json.output_text === 'string' ? phase1Json.output_text : '', | |
| } | |
| } | |
| const toolCalls = extractPhase1ToolCalls() | |
| const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools') | |
| if (!requestToolsCall) { | |
| // Model chose to respond conversationally — return as synthetic stream | |
| process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n') | |
| const msg = extractPhase1Message() |
| // Start from essential tools, then add whatever the heuristic predicts | ||
| const wanted = new Set(ESSENTIAL_TOOL_NAMES) | ||
| if (predicted) { | ||
| for (const t of predicted) wanted.add(t) | ||
| } | ||
| const filtered = converted.filter(t => wanted.has(t.function.name)) | ||
| const toolSet = minifyToolSchemas(filtered.length > 0 ? filtered : converted) |
There was a problem hiding this comment.
When predictNeededTools(...) returns null (documented as “uncertain”), the code still filters to ESSENTIAL_TOOL_NAMES only, dropping any other tools explicitly provided in params.tools (e.g., MCP tools or web/task tools). That makes “uncertain” behave like a hard filter and can remove capabilities unexpectedly. Treat null as “don’t filter” (still minify if desired), and only filter when the predictor returns a concrete Set.
| // Start from essential tools, then add whatever the heuristic predicts | |
| const wanted = new Set(ESSENTIAL_TOOL_NAMES) | |
| if (predicted) { | |
| for (const t of predicted) wanted.add(t) | |
| } | |
| const filtered = converted.filter(t => wanted.has(t.function.name)) | |
| const toolSet = minifyToolSchemas(filtered.length > 0 ? filtered : converted) | |
| let selected = converted | |
| if (predicted !== null) { | |
| // Start from essential tools, then add whatever the heuristic predicts | |
| const wanted = new Set(ESSENTIAL_TOOL_NAMES) | |
| for (const t of predicted) wanted.add(t) | |
| const filtered = converted.filter(t => wanted.has(t.function.name)) | |
| selected = filtered.length > 0 ? filtered : converted | |
| } | |
| const toolSet = minifyToolSchemas(selected) |
| const names = toolSet.map(t => t.function.name) | ||
| const totalChars = JSON.stringify(toolSet).length | ||
| process.stderr.write( | ||
| `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`, | ||
| ) |
There was a problem hiding this comment.
ShimToolSearch emits multiple process.stderr.write(...) lines on every request when enabled, including repeated JSON.stringify(toolSet) just to measure size. This can add noise in normal CLI usage and adds avoidable overhead. Prefer the existing debug logging utilities (e.g., logForDebugging) and/or gate these logs behind a debug flag; also avoid serializing the full tool set more than once.
| const names = toolSet.map(t => t.function.name) | |
| const totalChars = JSON.stringify(toolSet).length | |
| process.stderr.write( | |
| `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`, | |
| ) | |
| if (isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH_DEBUG)) { | |
| const names = toolSet.map(t => t.function.name) | |
| const serializedToolSet = JSON.stringify(toolSet) | |
| const totalChars = serializedToolSet.length | |
| process.stderr.write( | |
| `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`, | |
| ) | |
| } |
| // ShimToolSearch: for conversational turns, skip tools entirely (phase-1 bypass) | ||
| if ( | ||
| isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH) && | ||
| params.tools && (params.tools as unknown[]).length > 0 | ||
| ) { | ||
| const msgs = Array.isArray(params.messages) ? params.messages as unknown[] : [] | ||
| const predicted = predictNeededTools(msgs) | ||
| if (predicted !== null && predicted.size === 0) { | ||
| // Pure conversational — send with request_tools meta-tool only | ||
| return await self._shimToolSearchCreate(request, params, options) | ||
| } |
There was a problem hiding this comment.
ShimToolSearch introduces a new control-flow path in create() and new tool filtering/minification behavior, but there are no accompanying tests in openaiShim.test.ts. Since this repo already has extensive shim tests, add coverage for: (1) conversational prediction path (phase-1 meta-tool request + correct return shape), (2) requested-tools phase-2 retry, and (3) predictor null behavior (should not drop provided tools).
|
@copilot apply changes based on the comments in this thread |
gnanam1990
left a comment
There was a problem hiding this comment.
Thanks for the work here. I like the goal, and I agree the tool-schema overhead on OpenAI-compatible 3P providers is a real problem. But I’m not comfortable approving this in the current form yet.
I pulled the current head locally and verified bun run build passes. My hold is not about obvious syntax/build breakage, it’s about scope and confidence for a trust-sensitive shim path.
What gives me pause:
- even though this is gated behind
ENABLE_SHIM_TOOL_SEARCH=1, it still changes request construction insrc/services/api/openaiShim.tsin a pretty deep way - the PR combines several behavior changes at once: schema minification, heuristic tool prediction, and a new two-phase
request_toolsprotocol - this sits in a provider/routing-sensitive path, so build-only is not enough evidence for me
What I’d want before approval:
- focused tests showing flag off preserves the exact current behavior
- focused tests for flag on conversational turns using the phase-1 meta-tool path
- focused tests for flag on tool-requiring turns using the predicted subset path
- explicit fallback/retry coverage for wrong or incomplete prediction cases
- regression coverage for provider-specific edge cases in the OpenAI-compatible shim path
I also think the request_tools meta-tool flow is effectively a new behavior contract with third-party models, not just a low-risk optimization. That can be a valid direction, but it needs stronger validation and a clearer maintainer trust story before merge.
So for now my vote is request changes / hold, not because the idea is bad, but because the blast radius is larger than the current test/story coverage supports.
Vasanthdev2004
left a comment
There was a problem hiding this comment.
The concept is great — reducing 8.7K tool tokens per turn on 3P providers is a real win for low-TPM tiers. And I like the opt-in gate. But there are several real bugs that would break things at runtime, plus some design concerns.
🐛 Blocker 1: Phase-1 non-streaming response format is wrong
In _shimToolSearchCreate, when the model responds conversationally (no request_tools call) in non-streaming mode, you return:
return new Response(JSON.stringify(phase1Json), {
status: 200,
headers: { 'content-type': 'application/json' },
})But create() doesn't just return raw Response objects for non-streaming — it calls _convertNonStreamingResponse() to transform the OpenAI JSON into Anthropic format. Returning raw OpenAI JSON will break any caller expecting an Anthropic-shaped response.
Same issue for phase-2 non-streaming: return response at line ~1469 also returns a raw Response without conversion.
Both paths need to go through _convertNonStreamingResponse() (or convertCodexResponseToAnthropicMessage() for /responses endpoints).
🐛 Blocker 2: httpResponse not set for ShimToolSearch path
The create() method tracks httpResponse for the .withResponse() accessor:
httpResponse = response // normal pathBut _shimToolSearchCreate() returns early before this assignment, so withResponse() will always get new Response() (the fallback). This breaks x-request-id and response header access for any ShimToolSearch request.
🐛 Blocker 3: Phase-1 assumes /chat/completions response shape
Phase-1 does phase1Response.json() and reads choices[0].message.tool_calls. But _doRequest() can route to /responses endpoints (Codex, GitHub models). Those return a completely different response shape. The code needs to handle both response formats, or at least check which transport is in use.
⚠️ Concern 4: predictNeededTools returns null → drops non-essential tools
In the _doOpenAIRequest path (line ~1543), when predictNeededTools returns null (meaning "uncertain"), the code filters to ESSENTIAL_TOOL_NAMES only:
const wanted = new Set(ESSENTIAL_TOOL_NAMES)
if (predicted) { // null → skip this
for (const t of predicted) wanted.add(t)
}But what about tools the caller explicitly requested via params.tools that aren't in ESSENTIAL_TOOL_NAMES? They get silently dropped. The null case should probably fall through to sending all tools (the current behavior), not strip to essentials.
⚠️ Concern 5: No tests
This is a significant new control-flow path with heuristic prediction, two-phase protocol, message injection, and response synthesis. The existing openaiShim.test.ts has extensive coverage — this feature needs at least:
- Phase-1 conversational (no tools requested) → synthetic stream
- Phase-1 requests tools → phase-2 re-request
predictNeededToolsclassification testsminifyToolSchemas/stripParamDescriptionsunit tests- AbortSignal propagation through both phases
💡 Non-blocking:
process.stderr.writeon every request — multiple debug lines per turn includingJSON.stringify(toolSet)is noisy for CLI usage. Consider gating these behind aDEBUG/VERBOSEenv var, or at least only log on phase transitions.- Unrelated change mixed in: The
extra_content?.googleoptional chaining fix (lines 971, 1839) is a real bug fix but unrelated to ShimToolSearch. Would be cleaner as a separate PR, but not blocking. TOOL_DIRECTORYis hardcoded — if new tools are added upstream, this becomes stale. Consider deriving from the actual tool list or at least adding a comment noting this.extractUserQuerystrips XML tags — the regex-based stripping of<system-reminder>and<context>tags is fragile. A single malformed tag could leak into the heuristic.
Verdict: Needs changes — blockers 1-3 will cause runtime failures. Concern 4 is a silent behavior regression. Concern 5 (no tests) is important for a feature this complex. The idea is solid; the implementation needs these fixes.
|
@Vasanthdev2004 I will create pr according to this issue as well with well formatted structure and code but I am working on gemini mistral etc |
|
okay man @FluxLuFFy |
|
@Vasanthdev2004 @gnanam1990 Thank you both for the technical feedback. That was my main goal before doing focused work. This didn |
auriti
left a comment
There was a problem hiding this comment.
Review: ShimToolSearch — lazy tool loading for 3P providers
Great concept — the 90% token reduction is real and addresses a genuine pain point for TPM-constrained providers (Groq free tier, Cerebras, small Ollama setups). The minification approach is clean and composable, and the feature gating is solid (zero risk when flag is off).
However, I found 2 critical issues and 5 major issues that need addressing before this can merge safely.
Critical Issues
C1: 10 of ~19 tools are silently unreachable
TOOL_DIRECTORY lists 9 tools, and ESSENTIAL_TOOL_NAMES is derived from it. When predictNeededTools returns null (no keyword matches — the "uncertain" case), the path in _doOpenAIRequest filters to ESSENTIAL_TOOL_NAMES only. This means tools like WebSearch, WebFetch, Skill, TaskCreate, TaskUpdate, MultiEdit, NotebookEdit are never sent unless the model explicitly requests them via request_tools — but the model can't know they exist because they're not in the directory either.
Example: user says "search the web for React 19 release notes" → predictNeededTools returns null (no keyword match for "web") → only 9 essential tools sent → WebSearch is missing → model can't search.
Fix: Either (a) add all ~19 tools to TOOL_DIRECTORY, or (b) invert the null semantics so uncertain = send all tools minified.
C2: null prediction semantics are inverted
predictNeededTools returns:
Set([])→ conversational (no tools needed)Set([...])→ specific tools predictednull→ uncertain (no keywords matched)
The "uncertain" case should be the safest fallback — send all tools (minified) so nothing is lost. Instead, it currently sends only 9 essential tools, silently dropping capabilities. This is the wrong default for an uncertain prediction.
Fix: When predicted === null, set wanted to all tool names (not just essential), then minify.
Major Issues
M1: No try/catch on Phase 1 JSON.parse
const args = JSON.parse(requestToolsCall.function?.arguments ?? '{}')If the model returns malformed JSON (common with smaller models), this crashes the entire request. The existing catch block is empty but the try scope only covers the parse — requestedNames stays [], which means Phase 2 sends essential + all tools. This is actually fine as a fallback, but the empty catch should at least log a warning.
M2: Phase 1 is always non-streaming
Even when the caller requests stream: true, Phase 1 forces stream: false and waits for the full response. For conversational turns (the most common case when this feature activates), this adds noticeable latency — the user sees nothing until the entire response is generated, then it's wrapped in a synthetic stream.
Consider: for the conversational bypass, could you stream Phase 1 directly and detect request_tools calls in the stream? Or at minimum, document this latency trade-off.
M3: Mixed Phase 1 response (tool_call + content) discards content
If the model returns both a request_tools call AND conversational content in the same message, the content is discarded and Phase 2 starts fresh. The model's initial reasoning/response is lost.
M4: Conversational detection can suppress tools for ambiguous queries
"how does the authentication work in this codebase" → matches ^how as conversational, doesn't contain any exclusion keywords (file|code|test|build|... — note "codebase" ≠ "code"), so it's classified as conversational. But the user likely wants the model to read auth-related files.
The exclusion list should include: codebase, repository, repo, project, source, module, component.
M5: TOOL_DIRECTORY is incomplete
Only 9 of ~19 tools are listed. Even if C1/C2 are fixed, the Phase 1 system prompt injection only shows these 9 tools to the model. The model can't request tools it doesn't know about.
Missing tools that users commonly need: WebSearch, WebFetch, Skill, TaskCreate, TaskUpdate, MultiEdit.
Minor Issues
- stderr logging is fine for an opt-in beta feature, but should be gated behind a
DEBUGorVERBOSEflag before GA - Keyword heuristics are English-only — acceptable for v1 but worth noting
- The independent bugfix (optional chaining on
tc.extra_content?.google) is correct and should be extracted into a separate tiny PR so it can merge independently
What works well
- Feature gating: Perfect —
ENABLE_SHIM_TOOL_SEARCH=1is clean, zero behavior change when off - Minification pipeline:
truncateToolDescription+stripParamDescriptions+minifyToolSchemasis well-composed and reusable - Dependency inference:
Edit → also Bash+Readis smart and prevents common "missing tool" failures - Measured results: 7/7 real API calls with data — this is the right way to validate
Suggested path forward
- Fix C1+C2: when
predicted === null, send ALL tools minified (not just essential) - Expand
TOOL_DIRECTORYto all tools - Add the missing exclusion keywords to conversational detection
- Add unit tests for
predictNeededTools(at least 5-6 cases covering conversational, tool-requiring, ambiguous, and edge cases) - Extract the optional chaining bugfix into a separate 1-line PR
Happy to help with any of these — I've been working on the shim recently (#783) and know the code well.
| }, | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
C2 — Conversational detection exclusion list is too narrow
The exclusion check misses common terms that imply code exploration:
&& !/file|code|function|class|test|build|run|install|create|write|edit|implement|fix|debug/.test(t)
Suggested additions: codebase|repository|repo|project|source|module|component|endpoint|api|database|schema
Example false positive: "how does the authentication work in this codebase" → classified as conversational because "codebase" ≠ "code".
| */ | ||
| function stripParamDescriptions(schema: Record<string, unknown>): Record<string, unknown> { | ||
| const out: Record<string, unknown> = {} | ||
| for (const [k, v] of Object.entries(schema)) { |
There was a problem hiding this comment.
Minor — sentence boundary regex may over-match
The regex [\s\S]{30,}?[.!?](\s|\n|$) uses [\s\S] which matches newlines. A description like:
Execute shell commands.
IMPORTANT: Always check exit codes.
would match at the first . after "commands" (good), but the {30,}? minimum means very short first sentences (< 30 chars) fall through to the word-boundary fallback. This is fine for most tools but worth documenting.
|
|
||
| const isConversational = /^(what|who|why|how|when|where|explain|describe|tell me|is it|can you|do you|list|summarize|overview|difference between|compare|pros and cons)/.test(t.trim()) | ||
| && !/file|code|function|class|test|build|run|install|create|write|edit|implement|fix|debug/.test(t) | ||
|
|
There was a problem hiding this comment.
C1 — TOOL_DIRECTORY is incomplete (9/~19 tools)
Missing tools that users commonly need:
WebSearch/WebFetch— web queriesSkill— slash command executionTaskCreate/TaskUpdate— task managementMultiEdit— batch file editsNotebookEdit— Jupyter notebooks
Since ESSENTIAL_TOOL_NAMES is derived from this directory, these tools are unreachable when prediction returns null.
|
|
||
| // --------------------------------------------------------------------------- | ||
| // Types — minimal subset of Anthropic SDK types we need to produce | ||
| // --------------------------------------------------------------------------- |
There was a problem hiding this comment.
M1 — Empty catch swallows parse errors silently
try {
const args = JSON.parse(requestToolsCall.function?.arguments ?? '{}')
requestedNames = Array.isArray(args.tools) ? args.tools : []
} catch {
requestedNames = []
}The fallback to [] is correct (Phase 2 will send essential + all tools), but a process.stderr.write warning here would help debugging when smaller models return malformed JSON. Silent failures are hard to diagnose.
| for (let i = messages.length - 1; i >= 0; i--) { | ||
| const m = messages[i] as { role?: string; content?: unknown } | ||
| if (m.role !== 'user') continue | ||
| let rawText = '' |
There was a problem hiding this comment.
C2 — null prediction should send ALL tools minified, not just essential
In _doOpenAIRequest, when predicted is null (uncertain):
const wanted = new Set(ESSENTIAL_TOOL_NAMES)
if (predicted) {
for (const t of predicted) wanted.add(t)
}predicted === null means "I don't know what tools are needed" — the safe default should be to send everything (minified), not restrict to 9 essential tools. This silently drops WebSearch, WebFetch, Skill, etc.
Suggested fix:
if (predicted === null) {
// Uncertain — send all tools minified
body.tools = minifyToolSchemas(converted)
} else {
const wanted = new Set([...ESSENTIAL_TOOL_NAMES, ...predicted])
const filtered = converted.filter(t => wanted.has(t.function.name))
body.tools = minifyToolSchemas(filtered.length > 0 ? filtered : converted)
}|
@Vasanthdev2004 @auriti @FluxLuFFy @gnanam1990 The main change is that ShimToolSearch is no longer lazy-first. The safe path
Review feedback addressed:
DeepSeek V4 Pro via NanoGPT — Context Compression Benchmark Mode off (no compression): 10/10 success, 114,543 input tokens, $0.6506 Mode minify: 10/10 success, 33,578 input tokens, $0.2483 Minify cuts token usage by 70.7% and cost by 61.8% with zero loss in success rate. Validated locally: bun test src/services/api/openaiShim.test.ts
bun run build
Highly appreciate all your feedback and attention. I`m open to every discussion. |
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Thanks for the redesign update. The new mode split (off / minify / predict / lazy) is a better direction than the original lazy-first approach, but I cannot treat the current head as merge-reviewable yet.
Review scope: Targeted maintainer triage of the current head state and high-risk surface, not a full code approval review.
Verdict: Needs changes
Blocking issue:
- The PR is currently merge-conflicting/dirty against
mainand has no visible PR checks on the current head. This touchessrc/services/api/openaiShim.ts, which is a high-risk provider-routing/request-shaping surface, so it needs a clean rebase and green CI before we can safely review the behavior.
Please rebase onto latest main, keep the diff focused, and push a head with CI checks. After that, I can do a real current-head review focused on:
- default
offbehavior being byte-for-byte equivalent for existing providers - whether
minifypreserves required tool schema semantics - whether
predicthas safe fallback semantics when uncertain - whether
lazyis clearly experimental and cannot accidentally become default - tests for each mode in
openaiShim.test.ts
Non-blocking direction note:
- The safe
minifymode is the part that looks most mergeable.predictand especiallylazymay still need separate product/trust-model discussion depending on how much they alter tool availability.
4a54906 to
d26c310
Compare
|
@Vasanthdev2004 Current head: Also removed the unrelated optional-chaining fix from this PR so the diff is
Local validation:
|
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Scope: Full review of the current ShimToolSearch head (d26c310) with focus on the OpenAI-compatible shim/tool-routing surface.
Verdict: Approve-ready
What I checked:
- The feature is still opt-in and defaults to
off, so the normal OpenAI-compatible tool path remains unchanged unlessOPENAI_SHIM_TOOL_MODE/legacyENABLE_SHIM_TOOL_SEARCHis set. - The current implementation is much safer than the earlier version:
minify,predict, andlazyare explicit modes; debug logging is gated; malformedrequest_toolsJSON falls back to all tools; and phase 2 now routes through the normal_doOpenAIRequestpath instead of bypassing provider shaping. - Tool schema reduction preserves required structure while stripping prose, and prediction falls back to all tools when uncertain.
- The added tests cover minify mode, uncertain predict fallback, web-tool prediction, lazy phase-1 schema shape, and malformed lazy phase-2 fallback.
Verification I ran locally:
bun test src/services/api/openaiShim.test.tspassed: 92/92.bun run buildpassed. It emitted the existing external-list warnings, but exited successfully and produced the CLI/SDK bundles.
I do not see a remaining blocker on the current head. GitHub is not showing fresh check runs for this older PR from my view, so this is code-approval from my side pending any normal maintainer CI rerun policy.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for following up on the earlier review. The requested rebase/focused-diff issue looks addressed, and the earlier predict fallback/tool-directory concerns are covered by the current code and tests. I found one remaining issue below.
Findings
- [P1] Convert lazy-mode non-streaming responses before returning
src/services/api/openaiShim.ts:1660
The lazyShimToolSearchbranch returns_shimToolSearchCreate()directly fromcreate(), so the normal non-streaming conversion block at lines 1680-1715 never runs. In the common non-streaming SDK path, both lazy phase-1 conversational responses and phase-2 responses therefore resolve to a rawResponseobject/OpenAI JSON instead of an Anthropic-shaped message, andwithResponse()also loses the real HTTP response becausehttpResponseis never assigned on this early return. The new lazy tests only assert the outbound request bodies, so this regression passes today. Please route the helper result back through the same conversion/httpResponsehandling as_doRequest, or have the helper return an already-converted Anthropic message for non-streaming calls and add coverage that assertsmessage.content/withResponse().response.
|
Thanks for catching that bug. I pushed a focused fix in 8afab76 that routes lazy-mode non-streaming results through the shared response conversion path and preserves the real HTTP response for withResponse(). I also extended the lazy tests to assert Anthropic-shaped message.content and withResponse().response/request_id for the phase-2 path.\n\nVerification: bun test src/services/api/openaiShim.test.ts (92/92 passing). |
jatmn
left a comment
There was a problem hiding this comment.
Thanks for following up on the earlier review. The requested lazy non-streaming conversion and withResponse() handling look addressed now: the helper result is routed back through _convertCreateResponse(), and the real phase response is stored for withResponse(). I found one remaining issue below.
Findings
- [P2] Apply ShimToolSearch tool reduction to Responses API requests too
src/services/api/openaiShim.ts:2188
The new minify/predict/lazy logic only assigns the reduced tool set to the chat-completionsbody.toolspath. Whenrequest.transport === 'responses',_doOpenAIRequest()sendsbuildResponsesBody()instead, and that builder reconvertsparams.toolsdirectly at line 2188, ignoringconvertedToolOverrideand theselectShimToolSet()result. As a result,OPENAI_API_FORMAT=responsesstill sends the full original tool schemas forminify/predict, and lazy phase 2 also sends all original tools even after_shimToolSearchCreate()computes a filtered/minifiedtoolSet. This is an opt-in feature, but it makes the advertised token reduction silently not work for Responses-format OpenAI-compatible routes and leaves the new tests blind to that transport. Please thread the selected/minified tool set through the Responses body path as well, with coverage that setsOPENAI_API_FORMAT=responsesand asserts the outboundtoolspayload.
|
Thanks, @jatmn. I pushed 8b1229e to address the latest Responses API gap. What changed for the explicit request:
I also used the same line of thought from your last two reviews to run a more methodical adjacent-failure search: transport parity, shape falsification, fallback ablation, and metadata preservation. That uncovered and covered a few nearby issues:
Verification on the PR-head worktree:
Typecheck note: the repo-wide typecheck still reports existing strictness errors in |
8b1229e to
70f2c22
Compare
jatmn
left a comment
There was a problem hiding this comment.
Thanks for following up on the latest review. The requested Responses API tool-reduction fix looks addressed now, and the earlier lazy non-streaming conversion / withResponse() issue is still covered by the current code and tests.
No issues here, LGTM.
techbrewboss
left a comment
There was a problem hiding this comment.
Review summary
Thanks for continuing to tighten this up. The minify path looks valuable, and the current head fixes the earlier lazy conversion and Responses transport issues. I found one remaining API-contract issue around forced tool_choice interacting with the new tool-selection modes.
Findings
src/services/api/openaiShim.ts:1945/src/services/api/openaiShim.ts:2407- Forcedtool_choicecan name a tool that ShimToolSearch removes.
Impact: Inpredictmode, conversational prompts can produce an empty predicted tool set, soselectShimToolSet()sendstools: []and dropstool_choiceentirely. Inlazymode, the same conversational detection enters phase 1 with onlyrequest_tools, but the original forcedtool_choicecan still name a different tool. For example, withtool_choice: { type: 'tool', name: 'WebSearch' }and promptWhat is 2+2?,predictsends no tools, whilelazysends onlyrequest_toolsplustool_choice: WebSearch, which many OpenAI-compatible providers will reject because the selected tool is absent.
Suggested fix: Whenparams.tool_choiceforces a named tool, always include that tool in the selected/minified tool set, and skip lazy phase-1 indirection unless the forced tool isrequest_tools. Please add coverage forpredictandlazywith a non-core forced tool such asWebSearch.
Validation
I ran bun test src/services/api/openaiShim.test.ts on the PR head locally and it passed: 106/106. I also reproduced the forced-tool mismatch with small inline calls against createOpenAIShimClient().
|
Thanks, @techbrewboss. I pushed What changed for the reported case:
I also used the same methodology from the last review cycle rather than stopping at the exact reproduction:
New focused coverage includes forced non-core Validation on the PR-head worktree:
The original mismatch you reproduced with |
Adds keyword-heuristic tool prediction + schema minification, gated behind ENABLE_SHIM_TOOL_SEARCH=1 (off by default, zero behavioral change without it). When enabled: - Conversational turns: sends only a request_tools meta-tool (phase 1) then re-requests with needed tools if needed (phase 2) - Tool-requiring turns: filters to essential + predicted tools, strips verbose descriptions/param docs, truncates to first-sentence Measured: 63KB → 5KB tool payloads (90%+ reduction) on Cerebras. Single file, no new dependencies.
Keep ShimToolSearch reductions aligned with forced tool_choice payloads across chat and responses transports. Prevent responses requests from re-expanding original tools after predict selects an empty set.
25e9b10 to
7ce40f1
Compare
|
Resolved the merge conflict by rebasing Current PR head: Validation after the rebase:
GitHub now reports the PR as mergeable from my side. |
jatmn
left a comment
There was a problem hiding this comment.
Findings
-
[P1] Preserve the
tool_choice: anycontract when prediction returns no tools
src/services/api/openaiShim.ts:626
selectShimToolSet()returns[]for conversational prompts unless a named tool is forced, andgetForcedToolChoiceName()does not treattype: 'any'as forced. The later request-shaping block only emitstool_choicewhenbody.tools.length > 0, sopredictmode turns Anthropic's "must use a tool" contract into a no-tool request for prompts likeWhat is 2+2?withtool_choice: { type: 'any' }. That is a real behavioral regression for callers that rely on mandatory tool execution. Please bypass reduction fortype: 'any', or at least keep a non-empty tool set so the request still satisfies the contract. -
[P2] Path-based repo/file prompts are still misclassified as conversational
src/services/api/openaiShim.ts:548
The conversational gate only excludes prompts containing generic words likefile,code,module, etc. Prompts such assummarize src/services/api/openaiShim.tsorwhat does docs/advanced-setup.md say about profiles?contain an explicit repository path but none of those keywords, sopredictNeededTools()returns an empty set andpredict/lazysend no tools. That is a concrete failure mode for common code-review and docs-review prompts: the model is asked about a specific file but is not given a way to read it. The heuristic should treat path-like tokens (/,\,.ts,.md, etc.) as file/codebase requests, or fall back to the uncertain/all-tools path instead. -
[P3] The benchmark helper will recursively delete any caller-supplied root path
scripts/benchmarks/shim-tool-minify-live.sh:10
The script doesrm -rf "$ROOT"before validating the target. Passing/tmp,/, or a mistyped project path will wipe it recursively. Since this is a checked-in helper script, it should at least reject empty/root-like paths and enforce an expected benchmark-directory prefix before deleting anything.
|
@jatmn hmm at this point i'm inclining towards removing the 2 other modes (predict, lazy) from production openshim code while keeping them under development. Minify is safe enough to be a viable option without too much destructive behavior on openshim as well as to not have to work simultaneously on 3 modes that affect it. The 2 remaining modes clearly needs more focused work and iterations that were more focused on other parts of the solution. Also that`s just 1(tool_schema) of the token-heavy parts the architecture has. So i'm currently running TokenDash, hindisight and ai-code-monitor with the openclaude's telemetry turned on to map it out. I'd rather focus on an overall solution hypothesis than iterating the local issue. |
Yea rescoping down to a smaller pr would probably be best, |
|
I'll edit this one out to match the pr. @jatmn appreciate the feedback, first PR ever tbh. |
techbrewboss
left a comment
There was a problem hiding this comment.
Review summary
Thanks for continuing to tighten this up. The minify path looks genuinely valuable for OpenAI-compatible providers, but I found a few current-head issues that should block merge: the PR is still dirty against main, and predict mode still breaks some tool-use contracts/common file prompts.
Findings
-
PR state- Current head is merge-conflicting againstmain.
Impact: GitHub reportsmergeable: false/mergeable_state: dirty, with no status check rollup on the current head. This touchessrc/services/api/openaiShim.ts, a high-risk provider/request-shaping path, so it should not merge until CI runs on the actual merge result.
Suggested fix: Rebase onto latestGitlawb/openclaude:main, resolve conflicts, and rerun CI. -
src/services/api/openaiShim.ts:626-tool_choice: { type: "any" }can be silently dropped.
Impact: Inpredictmode, conversational prompts return an empty selected tool set. BecausegetForcedToolChoiceName()only handles named tools and the request code only emitstool_choicewhen tools are non-empty, Anthropic's "must use a tool" contract becomes a no-tool request. I reproduced this withWhat is 2+2?plustool_choice: { type: "any" }, which sent{"tools":[]}.
Suggested fix: Treattype: "any"as a forced non-empty tool requirement: bypass reduction, or send all/minified tools and preserverequired. -
src/services/api/openaiShim.ts:548- Path-based file prompts are still classified as conversational.
Impact: Prompts likesummarize src/services/api/openaiShim.tsstart withsummarizeand do not hit the exclusion regex, sopredictsendstools: []. That breaks common code/docs review requests where the user names a file path but does not say "read file."
Suggested fix: Detect path-like tokens/extensions before conversational classification, or treat path-looking prompts as uncertain so all minified tools are available. -
scripts/benchmarks/shim-tool-minify-live.sh:10- Benchmark script recursively deletes caller-supplied paths.
Impact:ROOT="${1:-/tmp/openclaude-shim-bench}"followed byrm -rf "$ROOT"can wipe an accidental path such as/tmp,$HOME/foo, or a mistyped repo path.
Suggested fix: ValidateROOTis non-empty, not/, under an expected temp prefix, and preferably create/use a freshmktemp -ddirectory.
Validation
bun test src/services/api/openaiShim.test.tspassed: 114/114.bun run buildpassed.bun run security:pr-scanreported one medium finding on the new benchmark script.- Manual request-shaping probes reproduced the
tool_choice: anyand path-prompt issues.
|
closing as abandoned |
What
Adds client-side tool schema compression for OpenAI-compatible 3P providers, controlled via
OPENAI_SHIM_TOOL_MODE. Off by default — zero behavioral change without it.Anthropic's native
tool_reference(ENABLE_TOOL_SELECTION) lazily loads schemas server-side, but 3P providers going throughopenaiShim.tseat ~8,700 tool tokens on every turn. This PR is the open equivalent: provider-agnostic, pure client-side, single file.Why this matters: TPM arithmetic
Every API call through
openaiShim.tssends all 19 tool schemas in the request body — ~8,700 tokens of tool definitions before a single word of conversation or system prompt is counted.On a provider with a 30K TPM limit (e.g. Cerebras free tier), one turn with tools already burns 29% of the TPM budget on schema overhead alone. This compounds across a session:
Over a 50-turn session, 435K tokens go to re-transmitting identical tool JSON. That counts against rate limits, billing, and — on providers with context window pressure — attention budget.
Modes
offminifypredictlazyrequest_toolsprotocolThe safe default upgrade path is
minify: non-destructive, no narrowing, just compression.How it works
Three layers, all inside
src/services/api/openaiShim.ts:predictmode) — keyword heuristic on the user message. "Explain transformers" → no tool narrowing. "Fix the bug in main.ts" → Edit, Read, Bash.predicted === null(uncertain) sends all tools minified.lazymode) — conversational turns send a singlerequest_toolsmeta-tool. Model answers directly or asks for specific schemas in a phase-2 re-request.Measured results
DeepSeek V4 Pro via NanoGPT —
minifymode (10 queries):offminifyminifycuts token usage 70.7% and cost 61.8% with zero loss in success rate.Review feedback addressed (since v1)
request_toolsschema shapeparams.systemTOOL_DIRECTORYpredicted === nullsends all tools minifiedScope
src/services/api/openaiShim.tsOPENAI_SHIM_TOOL_MODE=off: vanillabody.tools = convertedpath completely untouched