Feature Request: Support defer_loading and tool_search for dynamic tool discovery (parity with Anthropic & OpenAI) #1416

@leeclouddragon

Description

Summary

Both Anthropic (Claude) and OpenAI (GPT-5.4+) now support deferred tool loading and tool search at the API level, enabling efficient management of large tool sets without consuming context tokens upfront. The Gemini API currently lacks this capability.

Problem

When building AI agents with many tools (30+), two problems compound:

  1. Context bloat: All tool schemas must be loaded into the model's context upfront. A typical multi-server MCP setup can consume 50k+ tokens in tool definitions alone before the model does any actual work.
  2. Tool selection accuracy: A model's ability to pick the correct tool degrades significantly once the number of available tools exceeds roughly 30-50.

The current workaround is client-side pre-filtering (selecting tools before each API call), but this breaks prompt caching because the tools parameter changes between requests, invalidating the prefix cache.
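The pre-filtering workaround above can be sketched as follows. The tool names and the keyword-overlap scoring rule are illustrative inventions, not a real library API; the point is that each turn produces a different `tools` subset, so the serialized prefix differs between requests and the provider's prompt cache misses.

```python
# Minimal sketch of client-side tool pre-filtering (the current workaround).
# Tool names and the scoring heuristic are illustrative, not a real API.
def prefilter_tools(tools, query, limit=10):
    """Rank tools by crude keyword overlap with the query; keep the top few."""
    words = set(query.lower().split())

    def score(tool):
        text = (tool["name"] + " " + tool["description"]).lower()
        return sum(1 for w in words if w in text)

    return sorted(tools, key=score, reverse=True)[:limit]

tools = [
    {"name": "create_invoice", "description": "Create a billing invoice"},
    {"name": "send_email", "description": "Send an email to a recipient"},
    {"name": "query_db", "description": "Run a read-only database query"},
]

# Different user turns select different subsets, so the `tools` parameter
# changes between requests and the cached prompt prefix is invalidated.
turn1 = prefilter_tools(tools, "email the customer", limit=1)
turn2 = prefilter_tools(tools, "invoice for last month", limit=1)
assert turn1 != turn2
```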

What Anthropic and OpenAI have

Anthropic (tool_search_tool_regex_20251119 / tool_search_tool_bm25_20251119)

  • Tools marked with defer_loading: true are sent in the request but NOT rendered into Claude's context
  • A tool_search server-side tool lets Claude search deferred tools by regex or BM25
  • Matched tools are expanded into full schemas within the same API call (zero extra round-trips)
  • The tools parameter never changes → prompt cache is preserved
  • Supports both server-side and client-side custom search implementations
  • Available on Sonnet 4.0+ / Opus 4.0+
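Putting the bullets above together, an Anthropic request using deferred tools would look roughly like the sketch below. This is assembled from the description in this issue, not copied from Anthropic's docs; the model string, schema, and field placement are placeholders.

```python
# Illustrative Anthropic Messages request combining a server-side tool_search
# tool with a deferred custom tool. Field placement is an assumption based on
# the behavior described above; consult the official docs for exact names.
request = {
    "model": "<sonnet-4-or-later-model-id>",  # placeholder
    "max_tokens": 1024,
    "tools": [
        # Server-side search tool that expands deferred tools on demand.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search"},
        # A regular tool, deferred: sent in the request but NOT rendered
        # into the model's context until tool_search surfaces it.
        {
            "name": "create_invoice",
            "description": "Create a billing invoice",
            "input_schema": {"type": "object", "properties": {}},
            "defer_loading": True,
        },
    ],
    "messages": [{"role": "user", "content": "Invoice ACME for March"}],
}
```

Because the `tools` list is identical on every request regardless of which tools the model actually loads, the cached prefix stays valid.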

OpenAI (tool_search + namespaces)

  • Same defer_loading: true mechanism
  • Additionally supports namespaces to group tools — model only sees namespace name + description, not individual function schemas
  • Discovered tool schemas are injected at the end of the context window to preserve cache
  • Supports both hosted (server-executed) and client-executed search
  • Available on GPT-5.4+
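The namespace idea can be illustrated client-side: the model's upfront view collapses to one entry per namespace rather than one entry per function. The dotted naming convention and summary shape below are invented for illustration.

```python
# Sketch of namespace grouping as described above: the model initially sees
# only a namespace name + description, not every function schema.
# The dotted-name convention and summary format are assumptions.
from collections import defaultdict

tools = [
    {"name": "billing.create_invoice", "description": "Create an invoice"},
    {"name": "billing.refund", "description": "Refund a charge"},
    {"name": "mail.send", "description": "Send an email"},
]

def namespace_summary(tools):
    """Collapse tools into the namespace-level entries shown upfront."""
    groups = defaultdict(list)
    for t in tools:
        ns = t["name"].split(".", 1)[0]
        groups[ns].append(t)
    return [{"namespace": ns, "tool_count": len(ts)} for ns, ts in groups.items()]

# Two namespace entries stand in for three full function schemas.
summary = namespace_summary(tools)
assert len(summary) == 2
```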

Proposed feature for Gemini API

Add support for:

  1. defer_loading: true on functionDeclarations — tool schema is sent to the API but not rendered into the model's context window
  2. A built-in tool_search tool — model can search deferred tools by name/description and load them on demand
  3. Automatic schema expansion — matched tools are injected into the model's context within the same generateContent call

This would also help address the related issue of the 100 tool limit (google-gemini/gemini-cli#21823) — with deferred loading, the model only needs to hold a subset in context at any time.
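A hypothetical `generateContent` payload under this proposal might look like the following. To be clear, `defer_loading` and a built-in `tool_search` tool are not real Gemini API fields today; this only sketches the requested API surface, mirroring the existing `functionDeclarations` structure.

```python
# HYPOTHETICAL generateContent request body if Gemini adopted this proposal.
# `defer_loading` and the built-in "tool_search" tool do NOT exist in the
# current Gemini API; they sketch the feature being requested.
request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Invoice ACME for March"}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "create_invoice",
                    "description": "Create a billing invoice",
                    "defer_loading": True,  # proposed field: not in context upfront
                }
            ]
        },
        {"tool_search": {}},  # proposed built-in tool for on-demand discovery
    ],
}
```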

Why this matters

For multi-provider AI agent frameworks, the lack of this capability in Gemini means:

  • Cannot take advantage of tool search optimizations when using Gemini as the provider
  • Must fall back to client-side pre-filtering which breaks prompt caching
  • Or must send all tool schemas upfront, consuming context tokens unnecessarily
  • Gemini is the only major provider (among Anthropic, OpenAI, Google) without this capability

Labels

api:gemini-api · priority: p2 (Moderately-important priority; fix may not be included in next release) · type: feature request ('Nice-to-have' improvement, new feature, or different behavior or design)
