
agent_discovery: cache resolved agent location to skip re-discovery on repeat sends #174

Open

weishi-imbue wants to merge 3 commits into main from wz/message-cache-agent-match


@weishi-imbue (Contributor)

What

agent_discovery.send_message ran a full mngr discovery (find_all_agents) on every message. This PR caches each agent's resolved AgentMatch by name and, on repeat messages, sends straight to the known host via mngr's pre-resolved agents_to_message API. The agent lookup now happens once per agent (until its cached entry goes stale) instead of once per message.

How it works

  • First message to an agent: resolve via find_all_agents, cache the location.
  • Repeat messages: send straight to the cached AgentMatch — no discovery.
  • Stale cache (the send reaches no agent because it was destroyed, recreated, or moved hosts): drop the entry and re-resolve, so a stale location self-heals. Staleness is detected via an empty MessageResult.successful_agents, because under ErrorBehavior.CONTINUE errors are collected rather than raised.
  • A name resolving to multiple agents is never cached, preserving the existing "message all matches" behavior.
  • STOPPED agents are still auto-started (is_start_desired=True) on both paths.

The cache is thread-safe (the sync send endpoint runs in FastAPI's threadpool). The resolver/sender are injected as MutableModel callables so the cache + stale-fallback orchestration is unit-tested without monkeypatching the mngr API.

What this does NOT do (yet)

send_message still builds a fresh MngrContext per call (_get_mngr_context → load_config), so a cache hit skips the multi-provider discovery scan but still pays for config loading plus one host-scoped get_agents() call. Reusing a long-lived context (which keeps the provider cache warm and would also fix the per-call provider-instance accumulation in mngr's get_provider_instance) is the natural follow-up.

Tests

  • 6 new unit tests: hit, miss, stale-reresolve, multi-match-not-cached, no-match, and cache get/put/invalidate. agent_discovery_test.py → 12 passed.
  • ty clean; ruff clean; ratchets pass (the lone local test_no_type_errors failure is a pre-existing environment issue that fails identically on main).
  • Caller tests (server_test, welcome_resend_test) → 57 passed.

Draft for iteration.

🤖 Generated with Claude Code

weishi-imbue and others added 2 commits June 16, 2026 16:30
…n repeat sends

send_message resolved its target via a full mngr discovery (find_all_agents)
on every call. Cache each agent's resolved AgentMatch by name and feed it
straight to mngr's pre-resolved agents_to_message API, so repeat messages to
the same agent skip discovery and go to the known host.

On a stale cache hit (the send reaches no agent -- destroyed, recreated, or
moved hosts) the entry is dropped and the agent is re-resolved, preserving
correctness. A name resolving to multiple agents is never cached, preserving
the existing "message all matches" behavior. STOPPED agents are still
auto-started (is_start_desired=True) regardless of cache state.

The resolver/sender are injected as MutableModel callables so the cache and
stale-fallback orchestration is unit-tested without monkeypatching the mngr API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reworks the caching from the previous commit to follow existing precedents
instead of bespoke machinery (per review feedback):

- Cache is now a plain module-level dict (_AGENT_LOCATION_CACHE), like mngr's
  own get_provider_instance _instance_cache, instead of an _AgentMatchCache
  class. No lock: dict get/set/pop are individually atomic and the compound
  race is benign for a best-effort cache (mngr's cache is unlocked for the
  same reason).
- Drops the _AgentResolver/_AgentSender MutableModel __call__ wrappers, which
  only existed to bind mngr_ctx without a closure/partial. The resolve/send
  seam now follows welcome_resend.py's idiom: typed Callable aliases injected
  as defaulted parameters on one orchestration helper (_send_message_to_agent),
  defaulting to plain module-level functions (_resolve_agent/_send_to_agents).
- send_message stays a thin wrapper -- _send_message_to_agent(name, msg,
  mngr_ctx) -- with no global passed in and no wrapper objects constructed.

Tests use plain inline def fakes (inline functions are allowed in test files;
the ratchet excludes them) instead of MutableModel stubs, plus a real-but-cheap
MngrContext fixture (empty MNGR_PROJECT_CONFIG_DIR so no config files load under
mngr's pytest guard). Behavior unchanged; 11 unit tests pass, ty/ruff/ratchets
clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
weishi-imbue marked this pull request as ready for review June 17, 2026 02:43
…ent docstring

The docstring described the prod code's current behavior plus a note that
resolve/send "are injected in tests" -- prod docstrings shouldn't narrate the
test setup. The defaulted Callable params are self-evident from the signature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>