
Tool-registry scaling: resolve scratchpad/memory collision via #688 dynamic loading #800

@kovtcharov

Context

Two PRs currently in flight add a combined 8 new tools to ChatAgent and bring the total tool count into "too many for reliable selection" territory:

After both land, ChatAgent will have ~27 tools simultaneously registered. Per issue #688, ~12K tokens of the 32K context (~37%) will be consumed by tool descriptions alone.
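
The figures above can be checked with quick arithmetic. The snippet below is only a sanity check on the numbers quoted in this issue; the per-tool average is derived from them and is not a measured value.

```python
# All three inputs are the approximate figures quoted above, not measurements.
CONTEXT_TOKENS = 32_000       # context window
TOOL_PROMPT_TOKENS = 12_000   # tokens spent on tool descriptions
TOOL_COUNT = 27               # tools registered after both PRs land

share = TOOL_PROMPT_TOKENS / CONTEXT_TOKENS
per_tool = TOOL_PROMPT_TOKENS / TOOL_COUNT

print(f"share of context: {share:.1%}")            # 37.5%
print(f"average per tool: {per_tool:.0f} tokens")  # 444
```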

The concrete collision

scratchpad.query_data(sql) (from #495) and memory.recall(query) (from #606) answer different questions but are both terminal "query" steps. Without disambiguation the LLM will pick the wrong one when the query is ambiguous:

| User query | Right tool | Why |
|---|---|---|
| "What did I spend on groceries in March?" | query_data | Structured data from session pipeline |
| "What did I learn about FTS5 last week?" | recall | Semantic concept across history |
| "How many research papers mention transformer attention?" | Ambiguous | Depends on where the data is stored |

Why this needs to be fixed as part of #688, not in #495 or #606

Proposed resolution (implements #688 Phase 1)

Introduce a ToolLoader abstraction that gates tool visibility at prompt-generation time, not at registration time. The module-level _TOOL_REGISTRY remains the source of truth for all registered tools; the loader picks which subset appears in the LLM prompt each turn.

Bundles for the current tool set (27 tools):

| Bundle | Always on? | Activation trigger | Tools |
|---|---|---|---|
| core | yes | | read_file (RAG-side), list_files, run_shell_command |
| rag | conditional | indexed docs exist | query_documents, query_specific_file, index_document, list_indexed_documents, search_indexed_chunks, index_directory |
| filesystem | conditional | file/folder keywords, or recent use | browse_directory, tree, file_info, find_files, read_file (FS-side), bookmark |
| scratchpad | conditional | after create_table in session, or keyword match | create_table, insert_data, query_data, list_tables, drop_table |
| browser | conditional | URL pattern or web-search keywords | fetch_page, search_web, download_file |
| memory | conditional | after any remember in session, or keyword match | remember, recall, update_memory, forget, search_past_conversations |
| mcp_<server> | per-server | user enabled the MCP server | dynamic per server |
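
The static bundles above could be written down as plain data, for example as below. This is a sketch: the `Bundle` dataclass and the `bundles_containing` helper are hypothetical, and the `mcp_<server>` bundles are left out because their tools are dynamic per server.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Bundle:
    always_on: bool
    tools: Tuple[str, ...]


# Tool names copied from the bundle list above.
BUNDLES: Dict[str, Bundle] = {
    "core": Bundle(True, ("read_file", "list_files", "run_shell_command")),
    "rag": Bundle(False, (
        "query_documents", "query_specific_file", "index_document",
        "list_indexed_documents", "search_indexed_chunks", "index_directory",
    )),
    "filesystem": Bundle(False, (
        "browse_directory", "tree", "file_info", "find_files", "read_file", "bookmark",
    )),
    "scratchpad": Bundle(False, (
        "create_table", "insert_data", "query_data", "list_tables", "drop_table",
    )),
    "browser": Bundle(False, ("fetch_page", "search_web", "download_file")),
    "memory": Bundle(False, (
        "remember", "recall", "update_memory", "forget", "search_past_conversations",
    )),
}


def bundles_containing(tool: str) -> List[str]:
    """Return every bundle that lists the given tool name."""
    return [name for name, bundle in BUNDLES.items() if tool in bundle.tools]


entries = sum(len(b.tools) for b in BUNDLES.values())
distinct = len({t for b in BUNDLES.values() for t in b.tools})
print(entries, distinct)                 # 28 27
print(bundles_containing("read_file"))   # ['core', 'filesystem']
```

The list has 28 entries but 27 distinct names, because `read_file` appears in both core (RAG-side) and filesystem (FS-side). Whether those count as one tool or two is not stated here, so a loader keyed on tool name alone would need to decide how to treat that overlap.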

Tool routing decision order:

  1. Load always-on bundles (core is always loaded)
  2. If memory's tool_history shows the user used a bundle in the last 24h → keep it warm
  3. If current user message contains activation keywords → activate bundle
  4. If mid-workflow (e.g. just created a scratchpad table) → keep bundle active for rest of session
  5. LLM can explicitly request via meta-tool activate_bundle(name) (phase 2)
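
Steps 1 to 4 of this order can be sketched as a single function that records the first rule that activated each bundle. Everything specific here is an assumption for illustration: the keyword lists, the `STICKY_TOOLS` mapping, the `(bundle, timestamp)` shape of `tool_history`, and the function name `select_bundles`. Step 5 (the `activate_bundle` meta-tool) is phase 2 and is not modelled.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

ALWAYS_ON = ("core",)

# Hypothetical activation keywords; real lists would need tuning against evals.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "scratchpad": ("table", "sql", "spend"),
    "memory": ("remember", "recall", "last week"),
    "browser": ("http://", "https://", "search the web"),
}

# Hypothetical: tools whose use keeps a bundle active for the rest of the session.
STICKY_TOOLS: Dict[str, str] = {"create_table": "scratchpad", "remember": "memory"}


def select_bundles(
    message: str,
    tool_history: List[Tuple[str, datetime]],
    session_tool_calls: Iterable[str],
    now: datetime,
) -> Dict[str, str]:
    """Return {bundle: reason}, where reason is the first rule that fired."""
    active: Dict[str, str] = {}

    # 1. Always-on bundles.
    for bundle in ALWAYS_ON:
        active[bundle] = "always-on"

    # 2. Bundles used in the last 24h stay warm.
    cutoff = now - timedelta(hours=24)
    for bundle, used_at in tool_history:
        if used_at >= cutoff:
            active.setdefault(bundle, "warm")

    # 3. Activation keywords in the current message.
    text = message.lower()
    for bundle, words in KEYWORDS.items():
        if any(word in text for word in words):
            active.setdefault(bundle, "keyword")

    # 4. Mid-workflow: a sticky tool was already called in this session.
    for tool in session_tool_calls:
        bundle = STICKY_TOOLS.get(tool)
        if bundle is not None:
            active.setdefault(bundle, "mid-workflow")

    return active


now = datetime(2025, 1, 2, 12, 0)
history = [("rag", now - timedelta(hours=3)), ("browser", now - timedelta(days=3))]
result = select_bundles(
    "What did I learn about FTS5 last week?", history, ["create_table"], now
)
print(result)
```

In this run core is always on, rag is warm from use three hours earlier, memory is activated by the phrase "last week", scratchpad stays active because `create_table` was called earlier in the session, and browser is dropped because its last use was three days ago.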

Acceptance criteria

  • ChatAgent tool prompt drops from ~12K → ~3-4K tokens in typical sessions (the #688 target)
  • `scratchpad.query_data` and `memory.recall` are not both in the prompt simultaneously unless the user's message / session history justifies it
  • Core tools always available regardless of heuristic failure
  • Bundle selection decisions are logged to memory's `tool_history` so we can eval and tune
  • End-to-end regression test: a prompt that needs scratchpad.query_data picks it (not recall), and vice versa
  • Eval suite passes with no regression in tool-selection accuracy
  • Mid-conversation re-evaluation works (session that starts with file browsing and pivots to web research loads the right bundles at each turn)
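
Two of these criteria, logging bundle decisions and keeping `query_data` and `recall` apart unless justified, lend themselves to small checkable helpers. The sketch below is one possible shape, assuming `tool_history` accepts plain dict records and that "justified" means both owning bundles were activated with a recorded reason; `BundleDecision`, `log_decisions`, and `collision_unjustified` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Set


@dataclass
class BundleDecision:
    turn: int
    bundle: str
    reason: str  # e.g. "always-on", "keyword match"


def log_decisions(tool_history: List[dict], turn: int, active: Dict[str, str]) -> None:
    """Append one record per active bundle so selections can be evaluated later."""
    for bundle, reason in sorted(active.items()):
        tool_history.append(asdict(BundleDecision(turn, bundle, reason)))


def collision_unjustified(prompt_tools: Set[str], active: Dict[str, str]) -> bool:
    """True when both colliding tools are visible but an owning bundle was never activated."""
    both_visible = {"query_data", "recall"} <= prompt_tools
    justified = "scratchpad" in active and "memory" in active
    return both_visible and not justified


tool_history: List[dict] = []
active = {"core": "always-on", "memory": "keyword match"}
log_decisions(tool_history, turn=1, active=active)
print(tool_history)
print(collision_unjustified({"list_files", "query_data", "recall"}, active))
```

A regression test could assert that `collision_unjustified` is false for every turn of a recorded session, and use the logged records to explain any turn where it is not.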

Dependencies

Out of scope

  • The ToolLoader itself. That is #688 Phase 1 ("Dynamic tool loading based on conversation context via memory"); this issue is the coordination tracker ensuring the collision is actually fixed, not left as a known gap.
  • Migration of non-ChatAgent agents (CodeAgent, JiraAgent, etc.) — each owns its own tool set; handle when / if that becomes a problem

References


/cc @kovtcharov-amd @itomek-amd

Metadata

Assignees

No one assigned

    Labels

    agent
    domain:platform (Foundation: Lemonade, providers, runtime, install, packaging)
    enhancement (New feature or request)
    p1 (medium priority)
    performance (Performance-critical changes)
    tech debt
    track:platform (Foundation that both consumer-app and oem-pc tracks consume)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
