Feature Request: Support defer_loading and tool_search for dynamic tool discovery (parity with Anthropic & OpenAI) #1416
## Summary
Both Anthropic (Claude) and OpenAI (GPT-5.4+) now support deferred tool loading and tool search at the API level, enabling efficient management of large tool sets without consuming context tokens upfront. The Gemini API currently lacks this capability.
## Problem
When building AI agents with many tools (30+), two problems compound:
- Context bloat: All tool schemas must be loaded into the model's context upfront. A typical multi-server MCP setup can consume 50k+ tokens in tool definitions alone before the model does any actual work.
- Tool selection accuracy: the model's ability to pick the correct tool degrades significantly once more than 30-50 tools are available.
The current workaround is client-side pre-filtering (selecting tools before each API call), but this breaks prompt caching because the tools parameter changes between requests, invalidating the prefix cache.
## What Anthropic and OpenAI have
### Anthropic (`tool_search_tool_regex_20251119` / `tool_search_tool_bm25_20251119`)
- Tools marked with `defer_loading: true` are sent in the request but NOT rendered into Claude's context
- A `tool_search` server-side tool lets Claude search deferred tools by regex or BM25
- Matched tools are expanded into full schemas within the same API call (zero extra round-trips)
- The `tools` parameter never changes → prompt cache is preserved
- Supports both server-side and client-side custom search implementations
- Available on Sonnet 4.0+ / Opus 4.0+
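As a concrete sketch, a Messages API request using this mechanism would look roughly like the body below. It is paraphrased from the Anthropic tool-search docs linked in the references; treat the exact type strings and model name as illustrative rather than authoritative.

```python
request_body = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [
        # Server-side search tool: lets the model discover deferred tools.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search"},
        # Regular custom tool, deferred: its schema travels with the request
        # but is not rendered into context until tool_search matches it.
        {
            "name": "create_invoice",
            "description": "Create a billing invoice",
            "input_schema": {"type": "object", "properties": {}},
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "Invoice ACME for $100"}],
}

# Because `tools` is identical on every request, the prompt-cache prefix
# stays valid; only which schemas get expanded changes per conversation.
deferred = [t for t in request_body["tools"] if t.get("defer_loading")]
```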
### OpenAI (`tool_search` + namespaces)
- Same `defer_loading: true` mechanism
- Additionally supports namespaces to group tools — the model only sees the namespace name + description, not the individual function schemas
- Discovered tool schemas are injected at the end of the context window to preserve the cache
- Supports both hosted (server-executed) and client-executed search
- Available on GPT-5.4+
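To make the namespace idea concrete, here is a purely hypothetical illustration; the field names below are not taken from OpenAI's API reference, they only show how grouping shrinks what the model sees up front from N function schemas to one line per namespace.

```python
namespaces = {
    "billing": {
        "description": "Invoicing and payment tools",
        "tools": [
            {"name": "create_invoice", "defer_loading": True},
            {"name": "refund_payment", "defer_loading": True},
        ],
    },
    "email": {
        "description": "Inbox search and send tools",
        "tools": [{"name": "search_email", "defer_loading": True}],
    },
}

# Upfront, the model would see only one summary line per namespace rather
# than every deferred function schema inside it.
visible_to_model = [
    f"{name}: {ns['description']}" for name, ns in namespaces.items()
]
```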
## Proposed feature for Gemini API
Add support for:
- `defer_loading: true` on `functionDeclarations` — tool schema is sent to the API but not rendered into the model's context window
- A built-in `tool_search` tool — the model can search deferred tools by name/description and load them on demand
- Automatic schema expansion — matched tools are injected into the model's context within the same `generateContent` call
This would also help address the related issue of the 100 tool limit (google-gemini/gemini-cli#21823) — with deferred loading, the model only needs to hold a subset in context at any time.
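A hypothetical sketch of what the proposed request could look like: neither `defer_loading` nor a `toolSearch` built-in exists in the Gemini API today, so everything below is the feature being requested, expressed in `generateContent`'s existing camelCase REST style.

```python
proposed_request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Invoice ACME for $100"}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "create_invoice",
                    "description": "Create a billing invoice",
                    "parameters": {"type": "OBJECT", "properties": {}},
                    # Requested flag: schema is uploaded but not rendered
                    # into context until tool_search matches it.
                    "defer_loading": True,
                }
            ]
        },
        # Requested built-in tool, analogous to googleSearch / codeExecution.
        {"toolSearch": {}},
    ],
}

deferred = [
    d
    for tool in proposed_request["tools"]
    for d in tool.get("functionDeclarations", [])
    if d.get("defer_loading")
]
```

Since `tools` would stay constant across requests, this shape keeps the prefix cache intact while letting the model hold only a subset of schemas in context at any time.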
## Why this matters
For multi-provider AI agent frameworks, the lack of this capability in Gemini means:
- Cannot take advantage of tool search optimizations when using Gemini as the provider
- Must fall back to client-side pre-filtering, which breaks prompt caching
- Or must send all tool schemas upfront, consuming context tokens unnecessarily
- Gemini is the only major provider (among Anthropic, OpenAI, Google) without this capability
## References
- Anthropic tool search docs: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/tool-search
- OpenAI tool search docs: https://developers.openai.com/api/docs/guides/tools-tool-search
- Anthropic blog on advanced tool use: https://www.anthropic.com/engineering/advanced-tool-use
- Related Gemini CLI issue (100 tool limit): [Feature Request] Increase MCP tool limit from 100 to 500 google-gemini/gemini-cli#21823
- Related Gemini issue (search + function calling cannot be combined): Error when both search and function calls are in the tools[] config option #435