feat: opt-in ShimToolSearch — Minify tool loading for 3P providers by Aeshma-Daeva · Pull Request #586 · Gitlawb/openclaude

Aeshma-Daeva · 2026-04-10T18:12:33Z

What

Adds client-side tool schema compression for OpenAI-compatible 3P providers, controlled via OPENAI_SHIM_TOOL_MODE. Off by default — zero behavioral change without it.

Anthropic's native tool_reference (ENABLE_TOOL_SELECTION) lazily loads schemas server-side, but 3P providers going through openaiShim.ts eat ~8,700 tool tokens on every turn. This PR is the open equivalent: provider-agnostic, pure client-side, single file.

Why this matters: TPM arithmetic

Every API call through openaiShim.ts sends all 19 tool schemas in the request body — ~8,700 tokens of tool definitions before a single word of conversation or system prompt is counted.

On a provider with a 30K TPM limit (e.g. Cerebras free tier), one turn with tools already burns 29% of the TPM budget on schema overhead alone. This compounds across a session:

| Turns | Tool tokens sent (cumulative) | % of 30K TPM budget per turn |
|---|---|---|
| 1 | 8,700 | 29% |
| 10 | 87,000 | 29% |
| 50 | 435,000 | 29% |

Over a 50-turn session, 435K tokens go to re-transmitting identical tool JSON. That counts against rate limits, billing, and — on providers with context window pressure — attention budget.
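
The arithmetic above can be reproduced with a small sketch. The figures (8,700 tool tokens per turn, a 30K TPM limit) are taken from this description; the helper names are hypothetical and not part of openaiShim.ts.

```typescript
// Hypothetical helpers for illustration only; figures come from the PR description.
const TOOL_TOKENS_PER_TURN = 8700
const TPM_LIMIT = 30000

/** Cumulative tool-schema tokens sent after `turns` requests. */
function cumulativeToolTokens(turns: number, perTurn: number = TOOL_TOKENS_PER_TURN): number {
  return turns * perTurn
}

/** Share of the per-minute budget that one request spends on tool schemas, as a whole percent. */
function schemaSharePercent(perTurn: number = TOOL_TOKENS_PER_TURN, limit: number = TPM_LIMIT): number {
  return Math.round((perTurn / limit) * 100)
}
```

The per-turn share stays flat because the same schemas are re-sent on every request; only the cumulative total grows.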

Modes

| Mode | Behavior |
|---|---|
| `off` | Current behavior, untouched |
| `minify` | Send all tools, strip verbose descriptions and param prose |
| `predict` | Narrow to predicted tools when confident; fall back to all tools minified |
| `lazy` | Experimental two-phase `request_tools` protocol |

The safe default upgrade path is minify: non-destructive, no narrowing, just compression.
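
As a rough sketch of how the four modes could be read from OPENAI_SHIM_TOOL_MODE: the parser below is hypothetical. Whether the real code trims or lowercases the value, or treats unknown values as `off`, is an assumption made here to keep the default non-destructive.

```typescript
type ShimToolMode = 'off' | 'minify' | 'predict' | 'lazy'

const SHIM_TOOL_MODES: readonly string[] = ['off', 'minify', 'predict', 'lazy']

// Hypothetical parser: missing or unrecognized values fall back to 'off' (assumption),
// so a typo in the env var never changes request construction.
function parseShimToolMode(raw: string | undefined): ShimToolMode {
  const value = (raw ?? '').trim().toLowerCase()
  return SHIM_TOOL_MODES.indexOf(value) !== -1 ? (value as ShimToolMode) : 'off'
}
```

A caller would pass `process.env.OPENAI_SHIM_TOOL_MODE` and branch on the result.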

How it works

Three layers, all inside src/services/api/openaiShim.ts:

Schema minification — strips verbose Anthropic descriptions + param docs. Bash alone: 11.4KB → 0.3KB.
Task-aware prediction (predict mode) — keyword heuristic on the user message. "Explain transformers" → no tool narrowing. "Fix the bug in main.ts" → Edit, Read, Bash. predicted === null (uncertain) sends all tools minified.
Two-phase protocol (lazy mode) — conversational turns send a single request_tools meta-tool. Model answers directly or asks for specific schemas in a phase-2 re-request.
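
To make the first layer concrete, here is a minimal sketch of stripping description prose from a JSON Schema. It is not the code in this PR: the function name and the handling of parameters that are themselves named "description" are assumptions for illustration.

```typescript
type Json = Record<string, unknown>

function isPlainObject(v: unknown): v is Json {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Sketch only: recursively drops `description` keys from a JSON Schema.
// Keys directly under `properties` are parameter names, so they are kept
// even when a parameter happens to be called "description".
function stripDescriptionsSketch(schema: Json, insideProperties = false): Json {
  const out: Json = {}
  for (const key of Object.keys(schema)) {
    const value = schema[key]
    if (key === 'description' && !insideProperties) continue
    if (Array.isArray(value)) {
      out[key] = value.map(item => (isPlainObject(item) ? stripDescriptionsSketch(item) : item))
    } else if (isPlainObject(value)) {
      // Only the direct children of a schema-level `properties` key are parameter names.
      out[key] = stripDescriptionsSketch(value, key === 'properties' && !insideProperties)
    } else {
      out[key] = value
    }
  }
  return out
}
```

Types, enums, and `required` lists survive, so the provider can still validate arguments; only the prose is removed.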

Measured results

DeepSeek V4 Pro via NanoGPT — minify mode (10 queries):

| Mode | Success | Input tokens | Cost |
|---|---|---|---|
| `off` | 10/10 | 114,543 | $0.6506 |
| `minify` | 10/10 | 33,578 | $0.2483 |

minify cuts token usage 70.7% and cost 61.8% with zero loss in success rate.
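
The percentages follow directly from the table. A small sketch of the calculation, using a hypothetical helper that rounds to one decimal place:

```typescript
// Hypothetical helper: relative reduction from `before` to `after`, in percent, one decimal.
function percentReduction(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10
}

const tokenReduction = percentReduction(114543, 33578) // input tokens, off vs. minify
const costReduction = percentReduction(0.6506, 0.2483) // cost in USD, off vs. minify
```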

Review feedback addressed (since v1)

Fixed phase-1 request_tools schema shape
Fixed system prompt injection through params.system
Expanded TOOL_DIRECTORY
Fixed uncertain prediction semantics: predicted === null sends all tools minified
Added web/current-query prediction
Added repo/codebase exclusions to avoid suppressing tools on broad queries
Added malformed phase-1 JSON fallback
Added focused tests
Removed unrelated optional-chaining bugfix from PR scope

Scope

1 file modified: src/services/api/openaiShim.ts
~320 lines added, 0 new dependencies
No provider-specific coupling — works with any OpenAI-compatible endpoint
OPENAI_SHIM_TOOL_MODE=off: vanilla body.tools = converted path completely untouched

Copilot

Pull request overview

Adds an opt-in “ShimToolSearch” mode to reduce tool-schema token overhead for OpenAI-compatible third-party providers by (a) minifying tool schemas, (b) heuristically selecting a smaller tool subset per request, and (c) attempting a two-phase “request_tools” protocol for conversational turns.

Changes:

Added schema minification utilities (truncate descriptions + strip parameter descriptions).
Added a keyword-based predictor to choose a smaller tool set per turn when enabled via ENABLE_SHIM_TOOL_SEARCH.
Added a two-phase request path intended to send a meta-tool first and then re-request with only the needed tools.


Copilot · 2026-04-10T18:18:48Z

+    }
+
+    // Phase 1 request — non-streaming, single meta-tool
+    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools }


Phase-1 sets tools to [REQUEST_TOOLS_SCHEMA], but REQUEST_TOOLS_SCHEMA is already in OpenAI tool format ({type,function:{...}}). The request-building path calls convertTools(...), which expects Anthropic-style tool defs ({name, description, input_schema}), so t.name becomes undefined and the resulting OpenAI request will contain an invalid/missing tool name. Define request_tools in the same tool shape as params.tools (or special-case/bypass convertTools for this meta-tool) instead of casting.

Suggested change

-    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools }
+    const requestToolsPhase1Tool = {
+      name: 'request_tools',
+      description: 'Request one or more tools from the available tool directory to continue solving the task.',
+      input_schema: {
+        type: 'object',
+        properties: {
+          tools: {
+            type: 'array',
+            description: 'Names of the tools needed for the next step.',
+            items: {
+              type: 'string',
+            },
+          },
+          rationale: {
+            type: 'string',
+            description: 'Brief explanation of why these tools are needed.',
+          },
+        },
+        required: ['tools'],
+        additionalProperties: false,
+      },
+    }
+    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [requestToolsPhase1Tool] as typeof params.tools }

Copilot · 2026-04-10T18:18:49Z

+    // Clone messages, inject directory into first system message
+    const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }>
+    const sysIdx = phase1Messages.findIndex(m => m.role === 'system')
+    if (sysIdx >= 0 && typeof phase1Messages[sysIdx].content === 'string') {
+      phase1Messages[sysIdx].content += directoryNote
+    } else {
+      phase1Messages.unshift({ role: 'system', content: `You are a helpful AI assistant.${directoryNote}` })
+    }
+
+    // Phase 1 request — non-streaming, single meta-tool
+    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools }


The tool directory note is injected by adding/unshifting a { role: 'system', content: ... } message, but convertMessages(...) only processes user and assistant roles and ignores system messages (system prompt comes from params.system). As a result, the directory listing never reaches the provider. Append the directory note to params.system (handling string/array forms) or update convertMessages to include system role messages.

Suggested change

-    // Clone messages, inject directory into first system message
-    const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }>
-    const sysIdx = phase1Messages.findIndex(m => m.role === 'system')
-    if (sysIdx >= 0 && typeof phase1Messages[sysIdx].content === 'string') {
-      phase1Messages[sysIdx].content += directoryNote
-    } else {
-      phase1Messages.unshift({ role: 'system', content: `You are a helpful AI assistant.${directoryNote}` })
-    }
-
-    // Phase 1 request — non-streaming, single meta-tool
-    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools }
+    // Clone messages, but send the directory note via params.system because
+    // provider conversion only forwards system content from that field.
+    const phase1Messages = JSON.parse(JSON.stringify(params.messages)) as Array<{ role: string; content: unknown }>
+    const phase1System: ShimCreateParams['system'] =
+      typeof params.system === 'string'
+        ? `${params.system}${directoryNote}`
+        : Array.isArray(params.system)
+          ? [
+              ...params.system,
+              { type: 'text', text: directoryNote.trimStart() },
+            ] as ShimCreateParams['system']
+          : `You are a helpful AI assistant.${directoryNote}`
+
+    // Phase 1 request — non-streaming, single meta-tool
+    const phase1Params = {
+      ...params,
+      stream: false,
+      system: phase1System,
+      messages: phase1Messages,
+      tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools,
+    }

Copilot · 2026-04-10T18:18:49Z

+      return new Response(JSON.stringify(phase1Json), {
+        status: 200,
+        headers: { 'content-type': 'application/json' },
+      })


In the non-streaming conversational path, _shimToolSearchCreate returns a raw Response wrapping the OpenAI JSON (new Response(JSON.stringify(phase1Json), ...)). OpenAIShimMessages.create() otherwise returns an Anthropic-style message object (via _convertNonStreamingResponse(...) / Codex conversion), so enabling ShimToolSearch will change return types and likely break callers. Convert phase1Json to the same return shape as the normal non-streaming path instead of returning a Response.

Copilot · 2026-04-10T18:18:49Z

+    // Phase 2 — re-request with the actual tools
+    const phase2Params = { ...params, tools: toolSet as unknown as typeof params.tools }
+    const response = await this._doRequest(request, phase2Params, options)
+
+    if (params.stream) {
+      const isResponsesStream = response.url?.includes('/responses')
+      return new OpenAIShimStream(
+        (request.transport === 'codex_responses' || isResponsesStream)
+          ? codexStreamToAnthropic(response, request.resolvedModel)
+          : openaiStreamToAnthropic(response, request.resolvedModel),
+      )
+    }
+    return response
+  }


After phase-2, the non-streaming branch returns the raw Response from _doRequest(...). This is inconsistent with the rest of create(), which consumes the response body and returns an Anthropic-format message object (and handles Codex /responses conversion). ShimToolSearch should reuse the same non-streaming conversion logic so callers always get the expected message shape.

Copilot · 2026-04-10T18:18:50Z

+      // ShimToolSearch: for conversational turns, skip tools entirely (phase-1 bypass)
+      if (
+        isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH) &&
+        params.tools && (params.tools as unknown[]).length > 0
+      ) {
+        const msgs = Array.isArray(params.messages) ? params.messages as unknown[] : []
+        const predicted = predictNeededTools(msgs)
+        if (predicted !== null && predicted.size === 0) {
+          // Pure conversational — send with request_tools meta-tool only
+          return await self._shimToolSearchCreate(request, params, options)
+        }
+      }


The early return into _shimToolSearchCreate(...) bypasses the normal httpResponse = response assignment. That means .withResponse() will return a dummy new Response() when ShimToolSearch is used, losing headers/request id for consumers that call withResponse(). Capture and propagate the final HTTP Response from phase-1/phase-2 (or return an object that includes it) so withResponse() stays accurate.

Copilot · 2026-04-10T18:18:50Z

+    }
+
+    const choice = phase1Json.choices?.[0]
+    const toolCalls = choice?.message?.tool_calls ?? []
+    const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools')
+
+    if (!requestToolsCall) {
+      // Model chose to respond conversationally — return as synthetic stream
+      process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n')
+      const msg = choice?.message ?? { role: 'assistant', content: '' }


_shimToolSearchCreate assumes the phase-1 response is a /chat/completions-style JSON with choices[0].message.tool_calls. But _doOpenAIRequest can transparently fall back to the /responses endpoint (e.g., GitHub Copilot 400 for GPT-5+), which returns a different shape. In that case this JSON parsing/inspection will fail or misbehave. Consider gating ShimToolSearch to known chat-completions providers/transports, or add handling for /responses-format results.

Suggested change

-    }
-
-    const choice = phase1Json.choices?.[0]
-    const toolCalls = choice?.message?.tool_calls ?? []
-    const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools')
-
-    if (!requestToolsCall) {
-      // Model chose to respond conversationally — return as synthetic stream
-      process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n')
-      const msg = choice?.message ?? { role: 'assistant', content: '' }
+      output?: Array<{
+        type?: string
+        role?: string
+        content?: string | null | Array<{
+          type?: string
+          text?: string
+        }>
+        name?: string
+        arguments?: string
+      }>
+      output_text?: string | null
+    }
+
+    const choice = phase1Json.choices?.[0]
+    const extractResponsesMessageContent = (content: typeof phase1Json.output extends Array<infer T> ? T['content'] : never): string => {
+      if (typeof content === 'string') return content
+      if (!Array.isArray(content)) return ''
+      return content
+        .map(part => typeof part?.text === 'string' ? part.text : '')
+        .join('')
+    }
+    const extractPhase1ToolCalls = () => {
+      if (Array.isArray(choice?.message?.tool_calls)) {
+        return choice.message.tool_calls
+      }
+      if (!Array.isArray(phase1Json.output)) {
+        return [] as Array<{ function?: { name?: string; arguments?: string } }>
+      }
+      return phase1Json.output
+        .filter(item => item?.type === 'function_call' || item?.type === 'tool_call')
+        .map(item => ({
+          function: {
+            name: typeof item?.name === 'string' ? item.name : undefined,
+            arguments: typeof item?.arguments === 'string' ? item.arguments : undefined,
+          },
+        }))
+    }
+    const extractPhase1Message = () => {
+      if (choice?.message) {
+        return choice.message
+      }
+      const outputMessage = Array.isArray(phase1Json.output)
+        ? phase1Json.output.find(item => item?.type === 'message' && item?.role === 'assistant')
+        : undefined
+      if (outputMessage) {
+        return {
+          role: outputMessage.role ?? 'assistant',
+          content: extractResponsesMessageContent(outputMessage.content),
+        }
+      }
+      return {
+        role: 'assistant',
+        content: typeof phase1Json.output_text === 'string' ? phase1Json.output_text : '',
+      }
+    }
+    const toolCalls = extractPhase1ToolCalls()
+    const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools')
+
+    if (!requestToolsCall) {
+      // Model chose to respond conversationally — return as synthetic stream
+      process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n')
+      const msg = extractPhase1Message()

Copilot · 2026-04-10T18:18:50Z

+          // Start from essential tools, then add whatever the heuristic predicts
+          const wanted = new Set(ESSENTIAL_TOOL_NAMES)
+          if (predicted) {
+            for (const t of predicted) wanted.add(t)
+          }
+          const filtered = converted.filter(t => wanted.has(t.function.name))
+          const toolSet = minifyToolSchemas(filtered.length > 0 ? filtered : converted)


When predictNeededTools(...) returns null (documented as “uncertain”), the code still filters to ESSENTIAL_TOOL_NAMES only, dropping any other tools explicitly provided in params.tools (e.g., MCP tools or web/task tools). That makes “uncertain” behave like a hard filter and can remove capabilities unexpectedly. Treat null as “don’t filter” (still minify if desired), and only filter when the predictor returns a concrete Set.

Suggested change

-          // Start from essential tools, then add whatever the heuristic predicts
-          const wanted = new Set(ESSENTIAL_TOOL_NAMES)
-          if (predicted) {
-            for (const t of predicted) wanted.add(t)
-          }
-          const filtered = converted.filter(t => wanted.has(t.function.name))
-          const toolSet = minifyToolSchemas(filtered.length > 0 ? filtered : converted)
+          let selected = converted
+          if (predicted !== null) {
+            // Start from essential tools, then add whatever the heuristic predicts
+            const wanted = new Set(ESSENTIAL_TOOL_NAMES)
+            for (const t of predicted) wanted.add(t)
+            const filtered = converted.filter(t => wanted.has(t.function.name))
+            selected = filtered.length > 0 ? filtered : converted
+          }
+          const toolSet = minifyToolSchemas(selected)

Copilot · 2026-04-10T18:18:50Z

+          const names = toolSet.map(t => t.function.name)
+          const totalChars = JSON.stringify(toolSet).length
+          process.stderr.write(
+            `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`,
+          )


ShimToolSearch emits multiple process.stderr.write(...) lines on every request when enabled, including repeated JSON.stringify(toolSet) just to measure size. This can add noise in normal CLI usage and adds avoidable overhead. Prefer the existing debug logging utilities (e.g., logForDebugging) and/or gate these logs behind a debug flag; also avoid serializing the full tool set more than once.

Suggested change

-          const names = toolSet.map(t => t.function.name)
-          const totalChars = JSON.stringify(toolSet).length
-          process.stderr.write(
-            `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`,
-          )
+          if (isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH_DEBUG)) {
+            const names = toolSet.map(t => t.function.name)
+            const serializedToolSet = JSON.stringify(toolSet)
+            const totalChars = serializedToolSet.length
+            process.stderr.write(
+              `[ShimToolSearch] ${toolSet.length} tools (${totalChars} chars): ${names.join(', ')}\n`,
+            )
+          }

Copilot · 2026-04-10T18:18:51Z

+      // ShimToolSearch: for conversational turns, skip tools entirely (phase-1 bypass)
+      if (
+        isEnvTruthy(process.env.ENABLE_SHIM_TOOL_SEARCH) &&
+        params.tools && (params.tools as unknown[]).length > 0
+      ) {
+        const msgs = Array.isArray(params.messages) ? params.messages as unknown[] : []
+        const predicted = predictNeededTools(msgs)
+        if (predicted !== null && predicted.size === 0) {
+          // Pure conversational — send with request_tools meta-tool only
+          return await self._shimToolSearchCreate(request, params, options)
+        }


ShimToolSearch introduces a new control-flow path in create() and new tool filtering/minification behavior, but there are no accompanying tests in openaiShim.test.ts. Since this repo already has extensive shim tests, add coverage for: (1) conversational prediction path (phase-1 meta-tool request + correct return shape), (2) requested-tools phase-2 retry, and (3) predictor null behavior (should not drop provided tools).

Aeshma-Daeva · 2026-04-10T18:38:08Z

@copilot apply changes based on the comments in this thread

gnanam1990

Thanks for the work here. I like the goal, and I agree the tool-schema overhead on OpenAI-compatible 3P providers is a real problem. But I’m not comfortable approving this in the current form yet.

I pulled the current head locally and verified bun run build passes. My hold is not about obvious syntax/build breakage, it’s about scope and confidence for a trust-sensitive shim path.

What gives me pause:

even though this is gated behind ENABLE_SHIM_TOOL_SEARCH=1, it still changes request construction in src/services/api/openaiShim.ts in a pretty deep way
the PR combines several behavior changes at once: schema minification, heuristic tool prediction, and a new two-phase request_tools protocol
this sits in a provider/routing-sensitive path, so build-only is not enough evidence for me

What I’d want before approval:

focused tests showing flag off preserves the exact current behavior
focused tests for flag on conversational turns using the phase-1 meta-tool path
focused tests for flag on tool-requiring turns using the predicted subset path
explicit fallback/retry coverage for wrong or incomplete prediction cases
regression coverage for provider-specific edge cases in the OpenAI-compatible shim path

I also think the request_tools meta-tool flow is effectively a new behavior contract with third-party models, not just a low-risk optimization. That can be a valid direction, but it needs stronger validation and a clearer maintainer trust story before merge.

So for now my vote is request changes / hold, not because the idea is bad, but because the blast radius is larger than the current test/story coverage supports.

Vasanthdev2004

The concept is great — reducing 8.7K tool tokens per turn on 3P providers is a real win for low-TPM tiers. And I like the opt-in gate. But there are several real bugs that would break things at runtime, plus some design concerns.

🐛 Blocker 1: Phase-1 non-streaming response format is wrong

In _shimToolSearchCreate, when the model responds conversationally (no request_tools call) in non-streaming mode, you return:

return new Response(JSON.stringify(phase1Json), {
  status: 200,
  headers: { 'content-type': 'application/json' },
})

But create() doesn't just return raw Response objects for non-streaming — it calls _convertNonStreamingResponse() to transform the OpenAI JSON into Anthropic format. Returning raw OpenAI JSON will break any caller expecting an Anthropic-shaped response.

Same issue for phase-2 non-streaming: return response at line ~1469 also returns a raw Response without conversion.

Both paths need to go through _convertNonStreamingResponse() (or convertCodexResponseToAnthropicMessage() for /responses endpoints).

🐛 Blocker 2: `httpResponse` not set for ShimToolSearch path

The create() method tracks httpResponse for the .withResponse() accessor:

httpResponse = response  // normal path

But _shimToolSearchCreate() returns early before this assignment, so withResponse() will always get new Response() (the fallback). This breaks x-request-id and response header access for any ShimToolSearch request.

🐛 Blocker 3: Phase-1 assumes `/chat/completions` response shape

Phase-1 does phase1Response.json() and reads choices[0].message.tool_calls. But _doRequest() can route to /responses endpoints (Codex, GitHub models). Those return a completely different response shape. The code needs to handle both response formats, or at least check which transport is in use.

⚠️ Concern 4: `predictNeededTools` returns `null` → drops non-essential tools

In the _doOpenAIRequest path (line ~1543), when predictNeededTools returns null (meaning "uncertain"), the code filters to ESSENTIAL_TOOL_NAMES only:

const wanted = new Set(ESSENTIAL_TOOL_NAMES)
if (predicted) {  // null → skip this
  for (const t of predicted) wanted.add(t)
}

But what about tools the caller explicitly requested via params.tools that aren't in ESSENTIAL_TOOL_NAMES? They get silently dropped. The null case should probably fall through to sending all tools (the current behavior), not strip to essentials.

⚠️ Concern 5: No tests

This is a significant new control-flow path with heuristic prediction, two-phase protocol, message injection, and response synthesis. The existing openaiShim.test.ts has extensive coverage — this feature needs at least:

Phase-1 conversational (no tools requested) → synthetic stream
Phase-1 requests tools → phase-2 re-request
predictNeededTools classification tests
minifyToolSchemas / stripParamDescriptions unit tests
AbortSignal propagation through both phases

💡 Non-blocking:

process.stderr.write on every request — multiple debug lines per turn including JSON.stringify(toolSet) is noisy for CLI usage. Consider gating these behind a DEBUG/VERBOSE env var, or at least only log on phase transitions.
Unrelated change mixed in: The extra_content?.google optional chaining fix (lines 971, 1839) is a real bug fix but unrelated to ShimToolSearch. Would be cleaner as a separate PR, but not blocking.
TOOL_DIRECTORY is hardcoded — if new tools are added upstream, this becomes stale. Consider deriving from the actual tool list or at least adding a comment noting this.
extractUserQuery strips XML tags — the regex-based stripping of <system-reminder> and <context> tags is fragile. A single malformed tag could leak into the heuristic.

Verdict: Needs changes — blockers 1-3 will cause runtime failures. Concern 4 is a silent behavior regression. Concern 5 (no tests) is important for a feature this complex. The idea is solid; the implementation needs these fixes.

FluxLuFFy · 2026-04-11T15:03:54Z

@Vasanthdev2004 I will also create a PR for this issue, with well-formatted structure and code, but right now I am working on Gemini, Mistral, etc.

Vasanthdev2004 · 2026-04-11T15:05:45Z

okay man @FluxLuFFy

Aeshma-Daeva · 2026-04-11T16:07:13Z

@Vasanthdev2004 @gnanam1990 Thank you both for the technical feedback. Getting it was my main goal before doing focused work. This didn't get the attention it deserved, as I'm juggling several projects. With the feedback I can precisely tackle issues in design, syntax, formatting, and dependencies. The current code is in a primitive experimental phase, so I welcome errors as much as feedback. I'm thinking of ways to implement the flag on and off without the ON flag being too destructive, and without adding too many new functions for routers. Making this solution 3P-agnostic is a complex issue, since each provider takes a different path and hardcoding those paths is bad for long-term architecture. I will sit with everything you both said and work on the solution.

auriti

Review: ShimToolSearch — lazy tool loading for 3P providers

Great concept — the 90% token reduction is real and addresses a genuine pain point for TPM-constrained providers (Groq free tier, Cerebras, small Ollama setups). The minification approach is clean and composable, and the feature gating is solid (zero risk when flag is off).

However, I found 2 critical issues and 5 major issues that need addressing before this can merge safely.

Critical Issues

C1: 10 of ~19 tools are silently unreachable

TOOL_DIRECTORY lists 9 tools, and ESSENTIAL_TOOL_NAMES is derived from it. When predictNeededTools returns null (no keyword matches — the "uncertain" case), the path in _doOpenAIRequest filters to ESSENTIAL_TOOL_NAMES only. This means tools like WebSearch, WebFetch, Skill, TaskCreate, TaskUpdate, MultiEdit, NotebookEdit are never sent unless the model explicitly requests them via request_tools — but the model can't know they exist because they're not in the directory either.

Example: user says "search the web for React 19 release notes" → predictNeededTools returns null (no keyword match for "web") → only 9 essential tools sent → WebSearch is missing → model can't search.

Fix: Either (a) add all ~19 tools to TOOL_DIRECTORY, or (b) invert the null semantics so uncertain = send all tools minified.

C2: null prediction semantics are inverted

predictNeededTools returns:

Set([]) → conversational (no tools needed)
Set([...]) → specific tools predicted
null → uncertain (no keywords matched)

The "uncertain" case should be the safest fallback — send all tools (minified) so nothing is lost. Instead, it currently sends only 9 essential tools, silently dropping capabilities. This is the wrong default for an uncertain prediction.

Fix: When predicted === null, set wanted to all tool names (not just essential), then minify.

Major Issues

M1: Empty catch on Phase 1 JSON.parse

const args = JSON.parse(requestToolsCall.function?.arguments ?? '{}')

If the model returns malformed JSON (common with smaller models), this parse throws. The existing try scope covers only the parse, and its catch block is empty, so requestedNames stays [] and Phase 2 sends essential + all tools. That is actually fine as a fallback, but the empty catch should at least log a warning.

M2: Phase 1 is always non-streaming

Even when the caller requests stream: true, Phase 1 forces stream: false and waits for the full response. For conversational turns (the most common case when this feature activates), this adds noticeable latency — the user sees nothing until the entire response is generated, then it's wrapped in a synthetic stream.

Consider: for the conversational bypass, could you stream Phase 1 directly and detect request_tools calls in the stream? Or at minimum, document this latency trade-off.

M3: Mixed Phase 1 response (tool_call + content) discards content

If the model returns both a request_tools call AND conversational content in the same message, the content is discarded and Phase 2 starts fresh. The model's initial reasoning/response is lost.

M4: Conversational detection can suppress tools for ambiguous queries

"how does the authentication work in this codebase" → matches ^how as conversational, doesn't contain any exclusion keywords (file|code|test|build|... — note "codebase" ≠ "code"), so it's classified as conversational. But the user likely wants the model to read auth-related files.

The exclusion list should include: codebase, repository, repo, project, source, module, component.
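
A sketch of conversational detection with the expanded exclusion list. This is an illustration, not the PR's code: the names are hypothetical, and the lowercasing and whole-word (`\b`) matching are assumptions. Note that with plain substring matching, as in the isConversational hunk quoted elsewhere in this thread, /code/ already matches "codebase"; the extra keywords matter once matching is done on whole words.

```typescript
// Openers that suggest a purely conversational question (taken from the reviewed regex).
const CONVERSATIONAL_OPENERS =
  /^(what|who|why|how|when|where|explain|describe|tell me|is it|can you|do you|list|summarize|overview|difference between|compare|pros and cons)\b/

// Whole-word hints that the user probably needs tools, including the keywords proposed above.
const TOOL_HINTS =
  /\b(file|code|function|class|test|build|run|install|create|write|edit|implement|fix|debug|codebase|repository|repo|project|source|module|component)s?\b/

// Hypothetical helper: true only when the text opens like a question and names no tool hint.
function looksConversational(text: string): boolean {
  const t = text.trim().toLowerCase()
  return CONVERSATIONAL_OPENERS.test(t) && !TOOL_HINTS.test(t)
}
```

With this shape, "how does the authentication work in this codebase" is no longer treated as conversational, while "Explain transformers" still is.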

M5: TOOL_DIRECTORY is incomplete

Only 9 of ~19 tools are listed. Even if C1/C2 are fixed, the Phase 1 system prompt injection only shows these 9 tools to the model. The model can't request tools it doesn't know about.

Missing tools that users commonly need: WebSearch, WebFetch, Skill, TaskCreate, TaskUpdate, MultiEdit.

Minor Issues

stderr logging is fine for an opt-in beta feature, but should be gated behind a DEBUG or VERBOSE flag before GA
Keyword heuristics are English-only — acceptable for v1 but worth noting
The independent bugfix (optional chaining on tc.extra_content?.google) is correct and should be extracted into a separate tiny PR so it can merge independently

What works well

Feature gating: Perfect — ENABLE_SHIM_TOOL_SEARCH=1 is clean, zero behavior change when off
Minification pipeline: truncateToolDescription + stripParamDescriptions + minifyToolSchemas is well-composed and reusable
Dependency inference: Edit → also Bash+Read is smart and prevents common "missing tool" failures
Measured results: 7/7 real API calls with data — this is the right way to validate

Suggested path forward

Fix C1+C2: when predicted === null, send ALL tools minified (not just essential)
Expand TOOL_DIRECTORY to all tools
Add the missing exclusion keywords to conversational detection
Add unit tests for predictNeededTools (at least 5-6 cases covering conversational, tool-requiring, ambiguous, and edge cases)
Extract the optional chaining bugfix into a separate 1-line PR

Happy to help with any of these — I've been working on the shim recently (#783) and know the code well.

auriti · 2026-04-20T08:22:19Z

+  },
+}
+
+/**


auriti · 2026-04-20T08:22:20Z

+ */
+function stripParamDescriptions(schema: Record<string, unknown>): Record<string, unknown> {
+  const out: Record<string, unknown> = {}
+  for (const [k, v] of Object.entries(schema)) {


Minor — sentence boundary regex may over-match

The regex [\s\S]{30,}?[.!?](\s|\n|$) uses [\s\S] which matches newlines. A description like:

Execute shell commands. IMPORTANT: Always check exit codes.

would match at the first . after "commands" (good), but the {30,}? minimum means very short first sentences (< 30 chars) fall through to the word-boundary fallback. This is fine for most tools but worth documenting.
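
To make the behavior concrete, here is a sketch of first-sentence truncation with a word-boundary fallback. The function name and the 120-character cap are assumptions, not values from this PR; the regex is the one quoted above, anchored at the start and with the trailing whitespace moved into a lookahead.

```typescript
// At least 30 characters, lazily extended to the first terminator followed by whitespace or end.
const FIRST_SENTENCE = /^[\s\S]{30,}?[.!?](?=\s|$)/

// Hypothetical helper; `maxChars` is an assumed cap, not taken from the PR.
function truncateDescriptionSketch(description: string, maxChars = 120): string {
  const text = description.trim()
  const sentence = FIRST_SENTENCE.exec(text)
  if (sentence && sentence[0].length <= maxChars) return sentence[0]
  if (text.length <= maxChars) return text
  // Fallback: cut at the last space before the cap so no word is split.
  const cut = text.slice(0, maxChars)
  const lastSpace = cut.lastIndexOf(' ')
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).replace(/\s+$/, '')
}
```

For the example description above, the first sentence is only 23 characters long, so the 30-character minimum skips its period and the match runs to the next terminator, which here is the end of the string.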

auriti · 2026-04-20T08:22:20Z

+
+  const isConversational = /^(what|who|why|how|when|where|explain|describe|tell me|is it|can you|do you|list|summarize|overview|difference between|compare|pros and cons)/.test(t.trim())
+    && !/file|code|function|class|test|build|run|install|create|write|edit|implement|fix|debug/.test(t)
+


C1 — TOOL_DIRECTORY is incomplete (9/~19 tools)

Missing tools that users commonly need:

WebSearch / WebFetch — web queries

Skill — slash command execution

TaskCreate / TaskUpdate — task management

MultiEdit — batch file edits

NotebookEdit — Jupyter notebooks

Since ESSENTIAL_TOOL_NAMES is derived from this directory, these tools are unreachable when prediction returns null.

auriti · 2026-04-20T08:22:20Z

+
 // ---------------------------------------------------------------------------
 // Types — minimal subset of Anthropic SDK types we need to produce
 // ---------------------------------------------------------------------------


M1 — Empty catch swallows parse errors silently

try {
  const args = JSON.parse(requestToolsCall.function?.arguments ?? '{}')
  requestedNames = Array.isArray(args.tools) ? args.tools : []
} catch {
  requestedNames = []
}

The fallback to [] is correct (Phase 2 will send essential + all tools), but a process.stderr.write warning here would help debugging when smaller models return malformed JSON. Silent failures are hard to diagnose.
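
One possible shape for that warning, as a sketch. The helper name is hypothetical, `warn` stands in for whatever debug logger the shim uses, and filtering out non-string entries is an extra assumption beyond what the comment asks for.

```typescript
// Hypothetical helper: parse the request_tools arguments, warn on malformed JSON,
// and fall back to an empty list so phase 2 can still proceed.
function parseRequestedTools(
  rawArguments: string | undefined,
  warn: (message: string) => void,
): string[] {
  try {
    const args: unknown = JSON.parse(rawArguments ?? '{}')
    const tools = (args as { tools?: unknown } | null)?.tools
    return Array.isArray(tools) ? tools.filter((t): t is string => typeof t === 'string') : []
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    warn(`[ShimToolSearch] malformed request_tools arguments: ${reason}`)
    return []
  }
}
```

The fallback value is unchanged; the only difference from the empty catch is that the failure becomes visible.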

auriti · 2026-04-20T08:22:20Z

+  for (let i = messages.length - 1; i >= 0; i--) {
+    const m = messages[i] as { role?: string; content?: unknown }
+    if (m.role !== 'user') continue
+    let rawText = ''


C2 — null prediction should send ALL tools minified, not just essential

In _doOpenAIRequest, when predicted is null (uncertain):

const wanted = new Set(ESSENTIAL_TOOL_NAMES)
if (predicted) {
  for (const t of predicted) wanted.add(t)
}

predicted === null means "I don't know what tools are needed" — the safe default should be to send everything (minified), not restrict to 9 essential tools. This silently drops WebSearch, WebFetch, Skill, etc.

Suggested fix:

if (predicted === null) {
  // Uncertain: send all tools minified
  body.tools = minifyToolSchemas(converted)
} else {
  const wanted = new Set([...ESSENTIAL_TOOL_NAMES, ...predicted])
  const filtered = converted.filter(t => wanted.has(t.function.name))
  body.tools = minifyToolSchemas(filtered.length > 0 ? filtered : converted)
}

Aeshma-Daeva · 2026-05-05T22:14:25Z

@Vasanthdev2004 @auriti @FluxLuFFy @gnanam1990
I pushed a redesign of this PR in 4a54906 based on the review feedback.

The main change is that ShimToolSearch is no longer lazy-first. The safe path
is now non-destructive schema minification:

OPENAI_SHIM_TOOL_MODE=off: current behavior
OPENAI_SHIM_TOOL_MODE=minify: send every available tool, but strip verbose tool/parameter prose
OPENAI_SHIM_TOOL_MODE=predict: narrow only when confident; uncertain sends all tools minified
OPENAI_SHIM_TOOL_MODE=lazy: experimental two-phase request_tools protocol
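As an illustration of how such a mode switch can stay off by default, here is a minimal sketch. `resolveShimToolMode` is a hypothetical name; trimming, lower-casing, and treating unrecognized values as `off` are assumptions of this example, not confirmed behavior of the PR.

```typescript
type ShimToolMode = 'off' | 'minify' | 'predict' | 'lazy'

const SHIM_TOOL_MODES: readonly ShimToolMode[] = ['off', 'minify', 'predict', 'lazy']

// Hypothetical resolver: takes an env-like record so it can be tested
// without touching the real process environment.
function resolveShimToolMode(env: Record<string, string | undefined>): ShimToolMode {
  const raw = (env.OPENAI_SHIM_TOOL_MODE ?? '').trim().toLowerCase()
  // Assumption: anything unset or unrecognized keeps the current behavior.
  return (SHIM_TOOL_MODES as readonly string[]).includes(raw)
    ? (raw as ShimToolMode)
    : 'off'
}
```

Falling back to `off` on a typo means a misconfigured variable can never silently narrow the tool set.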

Review feedback addressed:

Fixed phase-1 request_tools schema shape.
Fixed system prompt injection through params.system.
Expanded TOOL_DIRECTORY.
Fixed uncertain prediction semantics: predicted === null sends all tools minified.
Added web/current-query prediction.
Added repo/codebase exclusions to avoid suppressing tools.
Added malformed phase-1 JSON fallback.
Added focused tests.
Removed the unrelated optional-chaining bugfix from this PR scope.

DeepSeek V4 Pro via NanoGPT — Context Compression Benchmark

Mode off (no compression): 10/10 success, 114,543 input tokens, $0.6506

Mode minify: 10/10 success, 33,578 input tokens, $0.2483

Minify cuts token usage by 70.7% and cost by 61.8% with zero loss in success rate.
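The percentages follow directly from the two benchmark rows. A small sketch of the arithmetic, using only the figures quoted above (`percentReduction` is a hypothetical helper name):

```typescript
// Relative reduction from `before` to `after`, in percent.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100
}

// Figures from the off and minify rows of the benchmark.
const tokenCut = percentReduction(114543, 33578)
const costCut = percentReduction(0.6506, 0.2483)

console.log(`${tokenCut.toFixed(1)}% fewer input tokens`) // 70.7% fewer input tokens
console.log(`${costCut.toFixed(1)}% lower cost`)          // 61.8% lower cost
```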

Validated locally:

bun test src/services/api/openaiShim.test.ts
bun run build

I highly appreciate all your feedback and attention. I'm open to any discussion.

Vasanthdev2004

Thanks for the redesign update. The new mode split (off / minify / predict / lazy) is a better direction than the original lazy-first approach, but I cannot treat the current head as merge-reviewable yet.

Review scope: Targeted maintainer triage of the current head state and high-risk surface, not a full code approval review.

Verdict: Needs changes

Blocking issue:

The PR is currently merge-conflicting/dirty against main and has no visible PR checks on the current head. This touches src/services/api/openaiShim.ts, which is a high-risk provider-routing/request-shaping surface, so it needs a clean rebase and green CI before we can safely review the behavior.

Please rebase onto latest main, keep the diff focused, and push a head with CI checks. After that, I can do a real current-head review focused on:

default off behavior being byte-for-byte equivalent for existing providers
whether minify preserves required tool schema semantics
whether predict has safe fallback semantics when uncertain
whether lazy is clearly experimental and cannot accidentally become default
tests for each mode in openaiShim.test.ts

Non-blocking direction note:

The safe minify mode is the part that looks most mergeable. predict and especially lazy may still need separate product/trust-model discussion depending on how much they alter tool availability.

Aeshma-Daeva · 2026-05-06T09:22:37Z

@Vasanthdev2004
Rebased onto latest main (6af709e) and force-pushed the cleaned branch.

Current head: d26c310

Also removed the unrelated optional-chaining fix from this PR so the diff is
focused on:

src/services/api/openaiShim.ts
src/services/api/openaiShim.test.ts
scripts/benchmarks/shim-tool-minify-live.sh

Local validation:

bun test src/services/api/openaiShim.test.ts -> 92 pass
bun run build -> pass

Vasanthdev2004

Scope: Full review of the current ShimToolSearch head (d26c310) with focus on the OpenAI-compatible shim/tool-routing surface.

Verdict: Approve-ready

What I checked:

The feature is still opt-in and defaults to off, so the normal OpenAI-compatible tool path remains unchanged unless OPENAI_SHIM_TOOL_MODE/legacy ENABLE_SHIM_TOOL_SEARCH is set.
The current implementation is much safer than the earlier version: minify, predict, and lazy are explicit modes; debug logging is gated; malformed request_tools JSON falls back to all tools; and phase 2 now routes through the normal _doOpenAIRequest path instead of bypassing provider shaping.
Tool schema reduction preserves required structure while stripping prose, and prediction falls back to all tools when uncertain.
The added tests cover minify mode, uncertain predict fallback, web-tool prediction, lazy phase-1 schema shape, and malformed lazy phase-2 fallback.
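To make "preserves required structure while stripping prose" concrete, here is a minimal sketch of a recursive description stripper. It is not the PR's `stripParamDescriptions`; the `Json` type, the helper name, and the sample schema are assumptions made for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Hypothetical helper: drops `description` keywords from a JSON Schema while
// leaving type, properties, required, enum, and similar keywords intact.
function stripDescriptions(schema: Json, insidePropertiesMap = false): Json {
  if (Array.isArray(schema)) return schema.map(item => stripDescriptions(item))
  if (schema === null || typeof schema !== 'object') return schema
  const out: { [key: string]: Json } = {}
  for (const [key, value] of Object.entries(schema)) {
    // Inside a `properties` map the keys are parameter names, so a parameter
    // that happens to be called "description" must be kept.
    if (key === 'description' && !insidePropertiesMap) continue
    out[key] = stripDescriptions(value, key === 'properties' && !insidePropertiesMap)
  }
  return out
}

const bashLike: Json = {
  type: 'object',
  description: 'Top-level prose',
  properties: {
    command: { type: 'string', description: 'The command to run' },
    description: { type: 'string', description: 'Human-readable label' },
  },
  required: ['command'],
}

console.log(JSON.stringify(stripDescriptions(bashLike)))
```

The `insidePropertiesMap` flag is the one subtle design point: a naive "delete every `description` key" pass would also delete a parameter named `description` and leave `required` pointing at nothing.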

Verification I ran locally:

bun test src/services/api/openaiShim.test.ts passed: 92/92.
bun run build passed. It emitted the existing external-list warnings, but exited successfully and produced the CLI/SDK bundles.

I do not see a remaining blocker on the current head. GitHub is not showing fresh check runs for this older PR from my view, so this is code-approval from my side pending any normal maintainer CI rerun policy.

jatmn

Thanks for following up on the earlier review. The requested rebase/focused-diff issue looks addressed, and the earlier predict fallback/tool-directory concerns are covered by the current code and tests. I found one remaining issue below.

Findings

[P1] Convert lazy-mode non-streaming responses before returning
src/services/api/openaiShim.ts:1660
The lazy ShimToolSearch branch returns _shimToolSearchCreate() directly from create(), so the normal non-streaming conversion block at lines 1680-1715 never runs. In the common non-streaming SDK path, both lazy phase-1 conversational responses and phase-2 responses therefore resolve to a raw Response object/OpenAI JSON instead of an Anthropic-shaped message, and withResponse() also loses the real HTTP response because httpResponse is never assigned on this early return. The new lazy tests only assert the outbound request bodies, so this regression passes today. Please route the helper result back through the same conversion/httpResponse handling as _doRequest, or have the helper return an already-converted Anthropic message for non-streaming calls and add coverage that asserts message.content/withResponse().response.
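For readers unfamiliar with the two shapes involved, the sketch below shows roughly what "Anthropic-shaped" means for a non-streaming chat-completions result. It is an illustrative, simplified conversion with hypothetical type and function names, not the repository's conversion code; usage accounting and other fields are deliberately omitted.

```typescript
interface OpenAIToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}
interface OpenAIChatCompletion {
  id: string
  model: string
  choices: Array<{
    message: { content: string | null; tool_calls?: OpenAIToolCall[] }
    finish_reason: string | null
  }>
}
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
interface AnthropicMessage {
  id: string
  type: 'message'
  role: 'assistant'
  model: string
  content: AnthropicBlock[]
  stop_reason: 'end_turn' | 'tool_use' | 'max_tokens'
}

// Hypothetical converter: maps the first choice to Anthropic content blocks.
function toAnthropicMessage(completion: OpenAIChatCompletion): AnthropicMessage {
  const choice = completion.choices[0]
  const content: AnthropicBlock[] = []
  if (choice?.message.content) {
    content.push({ type: 'text', text: choice.message.content })
  }
  for (const call of choice?.message.tool_calls ?? []) {
    let input: unknown = {}
    try {
      input = JSON.parse(call.function.arguments || '{}')
    } catch {
      // Assumption: unparseable arguments degrade to an empty input object.
    }
    content.push({ type: 'tool_use', id: call.id, name: call.function.name, input })
  }
  const stop_reason =
    choice?.finish_reason === 'tool_calls' ? 'tool_use'
    : choice?.finish_reason === 'length' ? 'max_tokens'
    : 'end_turn'
  return { id: completion.id, type: 'message', role: 'assistant', model: completion.model, content, stop_reason }
}
```

A test that asserts on `message.content` would fail immediately if a raw provider object leaked through an early return.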

Aeshma-Daeva · 2026-05-13T10:20:27Z

Thanks for catching that bug. I pushed a focused fix in 8afab76 that routes lazy-mode non-streaming results through the shared response conversion path and preserves the real HTTP response for withResponse(). I also extended the lazy tests to assert Anthropic-shaped message.content and withResponse().response/request_id for the phase-2 path.

Verification: bun test src/services/api/openaiShim.test.ts (92/92 passing).

jatmn

Thanks for following up on the earlier review. The requested lazy non-streaming conversion and withResponse() handling look addressed now: the helper result is routed back through _convertCreateResponse(), and the real phase response is stored for withResponse(). I found one remaining issue below.

Findings

[P2] Apply ShimToolSearch tool reduction to Responses API requests too
src/services/api/openaiShim.ts:2188
The new minify/predict/lazy logic only assigns the reduced tool set to the chat-completions body.tools path. When request.transport === 'responses', _doOpenAIRequest() sends buildResponsesBody() instead, and that builder reconverts params.tools directly at line 2188, ignoring convertedToolOverride and the selectShimToolSet() result. As a result, OPENAI_API_FORMAT=responses still sends the full original tool schemas for minify/predict, and lazy phase 2 also sends all original tools even after _shimToolSearchCreate() computes a filtered/minified toolSet. This is an opt-in feature, but it makes the advertised token reduction silently not work for Responses-format OpenAI-compatible routes and leaves the new tests blind to that transport. Please thread the selected/minified tool set through the Responses body path as well, with coverage that sets OPENAI_API_FORMAT=responses and asserts the outbound tools payload.
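The gap comes from the two transports using different tool shapes: chat completions nests the definition under `function`, while the Responses format keeps `name`, `description`, and `parameters` at the top level. A minimal sketch of deriving the Responses tools from the already-selected chat tools, rather than from the raw input, could look like this (type and function names are hypothetical):

```typescript
interface ChatTool {
  type: 'function'
  function: { name: string; description?: string; parameters?: Record<string, unknown> }
}
interface ResponsesTool {
  type: 'function'
  name: string
  description?: string
  parameters?: Record<string, unknown>
}

// Hypothetical adapter: flattens each selected chat-completions tool into the
// Responses shape, so reduction applied earlier is carried over unchanged.
function toResponsesTools(selected: ChatTool[]): ResponsesTool[] {
  return selected.map((tool): ResponsesTool => ({
    type: 'function',
    name: tool.function.name,
    ...(tool.function.description !== undefined ? { description: tool.function.description } : {}),
    ...(tool.function.parameters !== undefined ? { parameters: tool.function.parameters } : {}),
  }))
}
```

Because the adapter only reshapes what it is given, an empty or minified selection stays empty or minified on both transports.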

Aeshma-Daeva · 2026-05-13T16:23:24Z

Thanks, @jatmn. I pushed 8b1229e to address the latest Responses API gap.

What changed for the explicit request:

_doOpenAIRequest() now threads the already selected/minified ShimToolSearch tool set into buildResponsesBody() instead of rebuilding Responses tools from raw params.tools.
minify, predict, and lazy phase 2 now preserve the reduced tool payload when OPENAI_API_FORMAT=responses.
Added Responses-format coverage that asserts the outbound tools payload is reduced/minified.

I also used the same line of thought from your last two reviews to run a more methodical adjacent-failure search: transport parity, shape falsification, fallback ablation, and metadata preservation. That uncovered and covered a few nearby issues:

lazy phase 1 now recognizes Responses output function calls for request_tools, not only chat choices[].message.tool_calls.
Responses tool_choice is now preserved and translated for named tools plus auto / any / none scalar modes.
malformed lazy request_tools JSON is covered for Responses and falls back to all minified tools.
local toolless self-heal on Responses now has coverage proving stale tools and tool_choice are removed on retry.
token audit now handles both chat tool shape and Responses tool shape without throwing.

Verification on the PR-head worktree:

bun test src/services/api/openaiShim.test.ts -> 100 pass, 255 expect
bun run build -> passed; existing external-list warnings only
git diff --cached --check before commit -> passed

Typecheck note: the repo-wide typecheck still reports existing strictness errors in openaiShim.ts / tests unrelated to this change, so I treated build + focused shim tests as the reliable verification signal here.

jatmn

Thanks for following up on the latest review. The requested Responses API tool-reduction fix looks addressed now, and the earlier lazy non-streaming conversion / withResponse() issue is still covered by the current code and tests.

No issues here, LGTM.

techbrewboss

Review summary

Thanks for continuing to tighten this up. The minify path looks valuable, and the current head fixes the earlier lazy conversion and Responses transport issues. I found one remaining API-contract issue around forced tool_choice interacting with the new tool-selection modes.

Findings

src/services/api/openaiShim.ts:1945 / src/services/api/openaiShim.ts:2407 - Forced tool_choice can name a tool that ShimToolSearch removes.
Impact: In predict mode, conversational prompts can produce an empty predicted tool set, so selectShimToolSet() sends tools: [] and drops tool_choice entirely. In lazy mode, the same conversational detection enters phase 1 with only request_tools, but the original forced tool_choice can still name a different tool. For example, with tool_choice: { type: 'tool', name: 'WebSearch' } and prompt What is 2+2?, predict sends no tools, while lazy sends only request_tools plus tool_choice: WebSearch, which many OpenAI-compatible providers will reject because the selected tool is absent.
Suggested fix: When params.tool_choice forces a named tool, always include that tool in the selected/minified tool set, and skip lazy phase-1 indirection unless the forced tool is request_tools. Please add coverage for predict and lazy with a non-core forced tool such as WebSearch.
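A minimal sketch of that suggested fix, under stated assumptions: `selectTools` and `shouldUseLazyPhase1` are hypothetical names, tools are reduced to `{ name }` objects, and `predicted === null` is taken to mean "uncertain, send everything" as described earlier in the thread.

```typescript
interface NamedTool { name: string }
type ToolChoice =
  | { type: 'auto' }
  | { type: 'any' }
  | { type: 'none' }
  | { type: 'tool'; name: string }

// Hypothetical selection: a forced named tool is always part of the result,
// even when prediction produced an empty set for a conversational prompt.
function selectTools<T extends NamedTool>(
  all: T[],
  predicted: ReadonlySet<string> | null,
  toolChoice?: ToolChoice,
): T[] {
  if (predicted === null) return all // uncertain: no narrowing
  const wanted = new Set<string>(predicted)
  if (toolChoice?.type === 'tool') wanted.add(toolChoice.name)
  return all.filter(tool => wanted.has(tool.name))
}

// Hypothetical gate: lazy phase 1 only makes sense when the caller has not
// forced a real tool (forcing request_tools itself is still fine).
function shouldUseLazyPhase1(toolChoice?: ToolChoice): boolean {
  return toolChoice?.type !== 'tool' || toolChoice.name === 'request_tools'
}
```

With `tool_choice: { type: 'tool', name: 'WebSearch' }` and an empty prediction, this returns only the WebSearch schema and skips the request_tools indirection.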

Validation

I ran bun test src/services/api/openaiShim.test.ts on the PR head locally and it passed: 106/106. I also reproduced the forced-tool mismatch with small inline calls against createOpenAIShimClient().

Aeshma-Daeva · 2026-05-14T01:53:44Z

Thanks, @techbrewboss. I pushed 25e9b10 to address the forced tool_choice contract issue you found.

What changed for the reported case:

params.tool_choice: { type: "tool", name: ... } is now treated as part of ShimToolSearch selection, so predict mode keeps the forced tool in the selected/minified set instead of allowing tools: [] for conversational prompts.
lazy mode now skips the phase-1 request_tools indirection when the caller has forced a real tool. It only enters phase 1 when no forced tool exists, or when the forced tool is actually request_tools.
named provider tool_choice is emitted only when the matching tool schema is present in the outgoing payload, avoiding invalid requests where the selected tool is absent.

I also used the same methodology from the last review cycle rather than stopping at the exact reproduction:

transport parity: checked chat-completions and Responses payloads for the same invariant.
ablation: tested conversational prompts where prediction returns an empty set, because that is where reductions are most likely to erase tools.
malformed/missing-schema cases: covered forced names whose schema was not supplied, so the shim drops the invalid named choice instead of fabricating a schema or sending a broken payload.
silent-failure search: found and fixed an adjacent Responses issue where buildResponsesBody() could re-expand raw params.tools after predict intentionally selected no tools.
scalar edge case: covered tool_choice: any with an empty predicted tool set so Responses does not send required without tools.

New focused coverage includes forced non-core WebSearch for predict and lazy, Responses parity, missing forced schema behavior, and the empty-prediction Responses case.

Validation on the PR-head worktree:

bun test src/services/api/openaiShim.test.ts -> 113 pass, 282 expect

The original mismatch you reproduced with tool_choice: WebSearch and What is 2+2? should now be resolved: predict includes WebSearch, and lazy no longer sends request_tools with a stale forced WebSearch choice.

Adds keyword-heuristic tool prediction + schema minification, gated behind ENABLE_SHIM_TOOL_SEARCH=1 (off by default, zero behavioral change without it).

When enabled:
- Conversational turns: sends only a request_tools meta-tool (phase 1) then re-requests with needed tools if needed (phase 2)
- Tool-requiring turns: filters to essential + predicted tools, strips verbose descriptions/param docs, truncates to first-sentence

Measured: 63KB → 5KB tool payloads (90%+ reduction) on Cerebras. Single file, no new dependencies.

Keep ShimToolSearch reductions aligned with forced tool_choice payloads across chat and responses transports. Prevent responses requests from re-expanding original tools after predict selects an empty set.

Aeshma-Daeva · 2026-05-14T01:59:59Z

Resolved the merge conflict by rebasing feat/shim-tool-search onto latest main and keeping both sides of the env setup conflict in openaiShim.test.ts (MIMO_API_KEY from main plus the ShimToolSearch env vars from this PR).

Current PR head: 7ce40f1

Validation after the rebase:

bun test src/services/api/openaiShim.test.ts -> 114 pass, 286 expect

GitHub now reports the PR as mergeable from my side.

jatmn

Findings

[P1] Preserve the tool_choice: any contract when prediction returns no tools
src/services/api/openaiShim.ts:626
selectShimToolSet() returns [] for conversational prompts unless a named tool is forced, and getForcedToolChoiceName() does not treat type: 'any' as forced. The later request-shaping block only emits tool_choice when body.tools.length > 0, so predict mode turns Anthropic's "must use a tool" contract into a no-tool request for prompts like What is 2+2? with tool_choice: { type: 'any' }. That is a real behavioral regression for callers that rely on mandatory tool execution. Please bypass reduction for type: 'any', or at least keep a non-empty tool set so the request still satisfies the contract.
[P2] Path-based repo/file prompts are still misclassified as conversational
src/services/api/openaiShim.ts:548
The conversational gate only excludes prompts containing generic words like file, code, module, etc. Prompts such as summarize src/services/api/openaiShim.ts or what does docs/advanced-setup.md say about profiles? contain an explicit repository path but none of those keywords, so predictNeededTools() returns an empty set and predict / lazy send no tools. That is a concrete failure mode for common code-review and docs-review prompts: the model is asked about a specific file but is not given a way to read it. The heuristic should treat path-like tokens (/, \, .ts, .md, etc.) as file/codebase requests, or fall back to the uncertain/all-tools path instead.
[P3] The benchmark helper will recursively delete any caller-supplied root path
scripts/benchmarks/shim-tool-minify-live.sh:10
The script does rm -rf "$ROOT" before validating the target. Passing /tmp, /, or a mistyped project path will wipe it recursively. Since this is a checked-in helper script, it should at least reject empty/root-like paths and enforce an expected benchmark-directory prefix before deleting anything.
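For the P1 finding, one way to keep the "must use a tool" contract is to let `type: 'any'` override an empty selection. The sketch below is illustrative only: `shapeToolRequest` is a hypothetical name, tools are represented by name for brevity, and the mapping to `'required'` and `{ type: 'function', function: { name } }` follows the chat-completions `tool_choice` format.

```typescript
type ToolChoice =
  | { type: 'auto' }
  | { type: 'any' }
  | { type: 'none' }
  | { type: 'tool'; name: string }
type OpenAIToolChoice =
  | 'auto'
  | 'none'
  | 'required'
  | { type: 'function'; function: { name: string } }

// Hypothetical request shaping for the chat-completions transport.
function shapeToolRequest(
  allTools: string[],
  selected: string[],
  choice?: ToolChoice,
): { tools: string[]; tool_choice: OpenAIToolChoice | undefined } {
  // `any` means some tool must be called, so reduction may never empty the set.
  const tools = choice?.type === 'any' && selected.length === 0 ? allTools : selected
  let tool_choice: OpenAIToolChoice | undefined
  if (tools.length > 0 && choice) {
    if (choice.type === 'any') {
      tool_choice = 'required'
    } else if (choice.type === 'tool') {
      // Only name a tool whose schema is actually in the payload.
      if (tools.includes(choice.name)) {
        tool_choice = { type: 'function', function: { name: choice.name } }
      }
    } else {
      tool_choice = choice.type // 'auto' or 'none'
    }
  }
  return { tools, tool_choice }
}
```

Bypassing reduction for `any` costs tokens on those calls, but it avoids turning a mandatory tool call into a plain text reply.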

Aeshma-Daeva · 2026-05-14T14:30:03Z

@jatmn hmm, at this point I'm inclined to remove the two other modes (predict, lazy) from the production openshim code while keeping them under development. Minify is safe enough to be a viable option without too much destructive behavior on openshim, and it avoids having to work simultaneously on three modes that affect it. The two remaining modes clearly need more focused work and iterations, which so far were focused on other parts of the solution. Also, that's just one (tool_schema) of the token-heavy parts of the architecture. So I'm currently running TokenDash, hindisight and ai-code-monitor with openclaude's telemetry turned on to map it out. I'd rather focus on an overall solution hypothesis than keep iterating on the local issue.

jatmn · 2026-05-14T18:48:40Z

Yeah, rescoping down to a smaller PR would probably be best.
You can either edit this current PR and submit your changes to match the reduced scope,
or we can close this one and you can open fresh PR(s).

Aeshma-Daeva · 2026-05-14T18:57:17Z

I'll edit this one out to match the pr. @jatmn appreciate the feedback, first PR ever tbh.

techbrewboss

Review summary

Thanks for continuing to tighten this up. The minify path looks genuinely valuable for OpenAI-compatible providers, but I found a few current-head issues that should block merge: the PR is still dirty against main, and predict mode still breaks some tool-use contracts/common file prompts.

Findings

PR state - Current head is merge-conflicting against main.
Impact: GitHub reports mergeable: false / mergeable_state: dirty, with no status check rollup on the current head. This touches src/services/api/openaiShim.ts, a high-risk provider/request-shaping path, so it should not merge until CI runs on the actual merge result.
Suggested fix: Rebase onto latest Gitlawb/openclaude:main, resolve conflicts, and rerun CI.
src/services/api/openaiShim.ts:626 - tool_choice: { type: "any" } can be silently dropped.
Impact: In predict mode, conversational prompts return an empty selected tool set. Because getForcedToolChoiceName() only handles named tools and the request code only emits tool_choice when tools are non-empty, Anthropic's "must use a tool" contract becomes a no-tool request. I reproduced this with What is 2+2? plus tool_choice: { type: "any" }, which sent {"tools":[]}.
Suggested fix: Treat type: "any" as a forced non-empty tool requirement: bypass reduction, or send all/minified tools and preserve required.
src/services/api/openaiShim.ts:548 - Path-based file prompts are still classified as conversational.
Impact: Prompts like summarize src/services/api/openaiShim.ts start with summarize and do not hit the exclusion regex, so predict sends tools: []. That breaks common code/docs review requests where the user names a file path but does not say "read file."
Suggested fix: Detect path-like tokens/extensions before conversational classification, or treat path-looking prompts as uncertain so all minified tools are available.
scripts/benchmarks/shim-tool-minify-live.sh:10 - Benchmark script recursively deletes caller-supplied paths.
Impact: ROOT="${1:-/tmp/openclaude-shim-bench}" followed by rm -rf "$ROOT" can wipe an accidental path such as /tmp, $HOME/foo, or a mistyped repo path.
Suggested fix: Validate ROOT is non-empty, not /, under an expected temp prefix, and preferably create/use a fresh mktemp -d directory.
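The path-prompt finding can be illustrated with a small pre-check that runs before the conversational classification. This is a sketch under assumptions: the regexes and the `looksConversational` name are hypothetical, the extension list is arbitrary, and false positives (for example "and/or" or "Node.js") are accepted because they only push the prompt to the safer all-tools path.

```typescript
// Shortened version of the conversational opener list, for illustration.
const CONVERSATIONAL_START = /^(what|who|why|how|when|where|explain|describe|summarize|compare)\b/i

// Assumed heuristic: a token containing a slash or backslash, or a filename
// ending in a common source/doc extension, counts as a file reference.
const PATH_LIKE = /[\w.-]+[\\\/][\w.-]+|\b[\w-]+\.(?:ts|tsx|js|jsx|json|md|py|sh|ya?ml)\b/i

function looksConversational(prompt: string): boolean {
  const text = prompt.trim()
  // Path-like prompts are never conversational, whatever word they start with.
  if (PATH_LIKE.test(text)) return false
  return CONVERSATIONAL_START.test(text)
}

console.log(looksConversational('summarize src/services/api/openaiShim.ts'))                  // false
console.log(looksConversational('what does docs/advanced-setup.md say about profiles?'))      // false
console.log(looksConversational('What is 2+2?'))                                              // true
```

Checking for paths first keeps the keyword exclusion list from having to anticipate every way a user might refer to a file.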

Validation

bun test src/services/api/openaiShim.test.ts passed: 114/114.
bun run build passed.
bun run security:pr-scan reported one medium finding on the new benchmark script.
Manual request-shaping probes reproduced the tool_choice: any and path-prompt issues.

jatmn · 2026-06-16T04:25:56Z

closing as abandoned

Copilot AI review requested due to automatic review settings April 10, 2026 18:12

Copilot started reviewing on behalf of Aeshma-Daeva April 10, 2026 18:13

Copilot AI reviewed Apr 10, 2026

kevincodex1 requested review from Vasanthdev2004, anandh8x, auriti and gnanam1990 April 11, 2026 13:29

gnanam1990 requested changes Apr 11, 2026

Vasanthdev2004 requested changes Apr 11, 2026

auriti suggested changes Apr 20, 2026

Aeshma-Daeva requested review from Vasanthdev2004, auriti and gnanam1990 May 5, 2026 22:15

Vasanthdev2004 requested changes May 6, 2026

Aeshma-Daeva force-pushed the feat/shim-tool-search branch from 4a54906 to d26c310 Compare May 6, 2026 09:18

Aeshma-Daeva changed the title ~~feat: opt-in ShimToolSearch — lazy tool loading for 3P providers~~ feat: opt-in ShimToolSearch — Minify tool loading for 3P providers May 6, 2026

Aeshma-Daeva requested a review from Vasanthdev2004 May 6, 2026 10:59

Vasanthdev2004 previously approved these changes May 7, 2026

jatmn requested changes May 13, 2026

Aeshma-Daeva dismissed Vasanthdev2004’s stale review via 8afab76 May 13, 2026 10:20

Aeshma-Daeva requested a review from jatmn May 13, 2026 10:23

jatmn requested changes May 13, 2026

Aeshma-Daeva force-pushed the feat/shim-tool-search branch from 8b1229e to 70f2c22 Compare May 13, 2026 16:33

jatmn previously approved these changes May 13, 2026

jatmn requested a review from techbrewboss May 13, 2026 17:25

techbrewboss requested changes May 13, 2026

Aeshma-Daeva dismissed jatmn’s stale review via 25e9b10 May 14, 2026 01:53

Aeshma-Daeva added 5 commits May 13, 2026 22:57

fix: harden shim tool schema reduction

4fb324f

fix: convert lazy shim responses

10b420c

fix(shim): apply tool reduction to responses requests

1502ee8

fix(shim): preserve forced tool choices

7ce40f1

Aeshma-Daeva force-pushed the feat/shim-tool-search branch from 25e9b10 to 7ce40f1 Compare May 14, 2026 01:59

Aeshma-Daeva requested a review from techbrewboss May 14, 2026 02:05

jatmn requested changes May 14, 2026

techbrewboss requested changes May 15, 2026

gnanam1990 approved these changes May 25, 2026

jatmn closed this Jun 16, 2026

-    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [REQUEST_TOOLS_SCHEMA] as unknown as typeof params.tools }
+    const requestToolsPhase1Tool = {
+      name: 'request_tools',
+      description: 'Request one or more tools from the available tool directory to continue solving the task.',
+      input_schema: {
+        type: 'object',
+        properties: {
+          tools: {
+            type: 'array',
+            description: 'Names of the tools needed for the next step.',
+            items: {
+              type: 'string',
+            },
+          },
+          rationale: {
+            type: 'string',
+            description: 'Brief explanation of why these tools are needed.',
+          },
+        },
+        required: ['tools'],
+        additionalProperties: false,
+      },
+    }
+    const phase1Params = { ...params, stream: false, messages: phase1Messages, tools: [requestToolsPhase1Tool] as typeof params.tools }

-    }
-    const choice = phase1Json.choices?.[0]
-    const toolCalls = choice?.message?.tool_calls ?? []
-    const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools')
-    if (!requestToolsCall) {
-      // Model chose to respond conversationally — return as synthetic stream
-      process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n')
-      const msg = choice?.message ?? { role: 'assistant', content: '' }
+      output?: Array<{
+        type?: string
+        role?: string
+        content?: string | null | Array<{
+          type?: string
+          text?: string
+        }>
+        name?: string
+        arguments?: string
+      }>
+      output_text?: string | null
+    }
+    const choice = phase1Json.choices?.[0]
+    const extractResponsesMessageContent = (content: typeof phase1Json.output extends Array<infer T> ? T['content'] : never): string => {
+      if (typeof content === 'string') return content
+      if (!Array.isArray(content)) return ''
+      return content
+        .map(part => typeof part?.text === 'string' ? part.text : '')
+        .join('')
+    }
+    const extractPhase1ToolCalls = () => {
+      if (Array.isArray(choice?.message?.tool_calls)) {
+        return choice.message.tool_calls
+      }
+      if (!Array.isArray(phase1Json.output)) {
+        return [] as Array<{ function?: { name?: string; arguments?: string } }>
+      }
+      return phase1Json.output
+        .filter(item => item?.type === 'function_call' || item?.type === 'tool_call')
+        .map(item => ({
+          function: {
+            name: typeof item?.name === 'string' ? item.name : undefined,
+            arguments: typeof item?.arguments === 'string' ? item.arguments : undefined,
+          },
+        }))
+    }
+    const extractPhase1Message = () => {
+      if (choice?.message) {
+        return choice.message
+      }
+      const outputMessage = Array.isArray(phase1Json.output)
+        ? phase1Json.output.find(item => item?.type === 'message' && item?.role === 'assistant')
+        : undefined
+      if (outputMessage) {
+        return {
+          role: outputMessage.role ?? 'assistant',
+          content: extractResponsesMessageContent(outputMessage.content),
+        }
+      }
+      return {
+        role: 'assistant',
+        content: typeof phase1Json.output_text === 'string' ? phase1Json.output_text : '',
+      }
+    }
+    const toolCalls = extractPhase1ToolCalls()
+    const requestToolsCall = toolCalls.find(tc => tc.function?.name === 'request_tools')
+    if (!requestToolsCall) {
+      // Model chose to respond conversationally — return as synthetic stream
+      process.stderr.write('[ShimToolSearch] Phase 1 result: conversational (no tools requested)\n')
+      const msg = extractPhase1Message()

+                },
+              }
+              /**


Vasanthdev2004 left a comment

🐛 Blocker 1: Phase-1 non-streaming response format is wrong

🐛 Blocker 2: httpResponse not set for ShimToolSearch path

🐛 Blocker 3: Phase-1 assumes /chat/completions response shape

⚠️ Concern 4: predictNeededTools returns null → drops non-essential tools

⚠️ Concern 5: No tests

💡 Non-blocking:

auriti left a comment

Review: ShimToolSearch — lazy tool loading for 3P providers

Critical Issues

Major Issues

Minor Issues

What works well

Suggested path forward
