[Feature] Behavioral trust scoring for multi-agent tool calls via MCP Observatory #7714

vdineshk · 2026-05-19T02:31:39Z

Problem

In multi-agent systems, agents delegate to other agents and call external tools (MCP servers, APIs). AutoGen currently treats all registered tools as equally trusted at runtime. There's no mechanism to check whether a tool server is behaving normally right now: a server that passed all checks yesterday could be timing out, returning anomalous data, or drifting behaviorally today.

This matters more in AutoGen than in single-agent frameworks because agent-to-agent delegation amplifies tool failures. When AgentA hands off to AgentB, which calls ToolC, a behavioral anomaly at ToolC cascades through the chain before anyone detects it.

Proposal

Integrate optional runtime behavioral trust scoring using the Dominion Observatory, a behavioral trust registry that tracks 14,800+ MCP servers using anonymized telemetry (latency, success rate, anomaly detection).

The integration would:

Before tool execution: query the Observatory for the tool's current behavioral trust score
Policy gate: block, warn, or log based on a configurable threshold (e.g., block tools scoring below 40/100)
After tool execution: report anonymized telemetry back (server_url, success, latency_ms), with no prompts, arguments, or user data
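
The policy gate in the steps above can be sketched without any AutoGen or Observatory dependency. In this minimal sketch, `PolicyMode`, `GateAction`, and `decide` are hypothetical names chosen for illustration. Letting an unknown score through (while logging it) is an assumption that mirrors the `is not None` check in the wrapper shown in the next section.

```python
from enum import Enum
from typing import Optional


class PolicyMode(Enum):
    """What to do when a tool scores below the threshold (hypothetical names)."""
    BLOCK = "block"
    WARN = "warn"
    LOG = "log"


class GateAction(Enum):
    ALLOW = "allow"
    ALLOW_WITH_WARNING = "allow_with_warning"
    ALLOW_AND_LOG = "allow_and_log"
    DENY = "deny"


def decide(trust_score: Optional[float], threshold: float = 40.0,
           mode: PolicyMode = PolicyMode.BLOCK) -> GateAction:
    """Map a 0-100 trust score to a gate action.

    Assumption: an unknown score (None) is allowed but logged, so servers
    the registry has never seen are not blocked outright.
    """
    if trust_score is None:
        return GateAction.ALLOW_AND_LOG
    if trust_score >= threshold:
        return GateAction.ALLOW
    if mode is PolicyMode.BLOCK:
        return GateAction.DENY
    if mode is PolicyMode.WARN:
        return GateAction.ALLOW_WITH_WARNING
    return GateAction.ALLOW_AND_LOG
```

Keeping the decision a pure function makes the threshold behavior easy to unit test separately from any network call.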

What this looks like in AutoGen

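from autogen_agentchat.agents import AssistantAgent
from autogen_core import CancellationToken
from autogen_core.tools import FunctionTool  # FunctionTool lives in autogen_core
from dominion_observatory import check_trust

MIN_TRUST_SCORE = 40

# Trust-aware tool wrapper.
# Assumes `tool` exposes a `server_url` attribute (e.g. set on an MCP tool wrapper).
async def trust_gated_tool_call(tool, args):
    result = check_trust(tool.server_url)
    # An unknown score (None) passes; only a known low score blocks.
    if result.trust_score is not None and result.trust_score < MIN_TRUST_SCORE:
        raise RuntimeError(
            f"Tool '{tool.name}' blocked: trust score {result.trust_score:.1f} "
            f"below threshold {MIN_TRUST_SCORE} ({tool.server_url})"
        )
    # AutoGen tools take a mapping of arguments plus a cancellation token.
    return await tool.run_json(args, CancellationToken())

# Or as middleware in the agent runtime.
# Proposed API: `tool_call_middleware` and `TrustGateMiddleware` do not exist in AutoGen today.
# web_search, code_exec, data_fetch are FunctionTool instances defined elsewhere.
agent = AssistantAgent(
    name="research_agent",
    model_client=model_client,  # any configured chat completion client
    tools=[web_search, code_exec, data_fetch],
    tool_call_middleware=TrustGateMiddleware(
        min_trust_score=MIN_TRUST_SCORE,
        block_on_low_trust=True,
    ),
)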
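
`TrustGateMiddleware` is the extension this proposal would add, so no implementation is shown above. Here is a minimal sketch of one possible shape, assuming the trust lookup and the telemetry reporter are injected as plain callables (both hypothetical), which lets the gate be exercised without network access or AutoGen itself.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class TrustGateMiddleware:
    """Hypothetical middleware: gate a tool call, then report telemetry."""
    lookup: Callable[[str], Optional[float]]  # server_url -> score, or None if unknown
    report: Callable[[dict], None]            # receives anonymized telemetry
    min_trust_score: float = 40.0
    block_on_low_trust: bool = True

    async def call(self, server_url: str,
                   tool_fn: Callable[[], Awaitable[Any]]) -> Any:
        score = self.lookup(server_url)
        low = score is not None and score < self.min_trust_score
        if low and self.block_on_low_trust:
            raise RuntimeError(
                f"blocked: trust score {score:.1f} below "
                f"{self.min_trust_score} ({server_url})")
        start = time.perf_counter()
        success = False
        try:
            result = await tool_fn()
            success = True
            return result
        finally:
            # Report only outcome and timing, never arguments or outputs.
            self.report({
                "server_url": server_url,
                "success": success,
                "latency_ms": (time.perf_counter() - start) * 1000.0,
            })


async def demo() -> list:
    sent: list = []
    # Stubbed lookup: every server scores 72, which is above the threshold.
    gate = TrustGateMiddleware(lookup=lambda url: 72.0, report=sent.append)

    async def fake_tool() -> str:
        return "ok"

    await gate.call("https://example.com/mcp", fake_tool)
    return sent


telemetry = asyncio.run(demo())
```

The `finally` block means a tool that raises is still reported, with `success` left as `False`. A blocked call is never executed, so this sketch reports nothing for it.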

Why this matters for AutoGen specifically

Multi-agent cascades: Agent teams (GroupChat, Swarm) call tools in chains. A degraded tool midway through a Swarm handoff can corrupt downstream agent state. Runtime trust scoring can catch the degradation before it cascades.
MCP server ecosystem: AutoGen's MCP integration means agents call external servers they don't control. Behavioral trust is the missing signal between "server is reachable" and "server is behaving correctly."
Enterprise compliance: AutoGen targets enterprise workflows. Runtime behavioral audit trails (which tools were called, what their trust scores were, whether any were blocked) can support the record-keeping (logging) requirements that EU AI Act Article 12 places on high-risk AI systems.
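
To make the audit-trail point concrete, here is a minimal sketch of one log record per gated tool call, written as JSON Lines. The function name and field names are illustrative assumptions, not a schema defined by AutoGen, the Observatory, or the EU AI Act.

```python
import io
import json
from datetime import datetime, timezone
from typing import Optional, TextIO


def write_audit_record(out: TextIO, agent: str, tool_name: str,
                       trust_score: Optional[float], threshold: float,
                       blocked: bool) -> dict:
    """Append one JSON line describing a trust-gate decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool_name": tool_name,
        "trust_score": trust_score,
        "threshold": threshold,
        "blocked": blocked,
    }
    out.write(json.dumps(record) + "\n")
    return record


# Usage: an in-memory buffer stands in for a log file.
buffer = io.StringIO()
write_audit_record(buffer, "research_agent", "web_search", 35.0, 40.0, True)
```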

Existing infrastructure

Observatory: live at dominion-observatory.sgdata.workers.dev, tracking 14,800+ servers, 87,000+ interactions
Python SDK: pip install dominion-observatory — check_trust(server_url) returns trust score + anomaly flags
LangChain integration: ObservatoryTrustCallbackHandler — same before/after pattern
TypeScript SDK: @dominion/trust-provider on npm with beforeSettle hook for x402 protocol
Privacy: only anonymized telemetry (server_url, success, latency_ms, tool_name, http_status). No prompts, arguments, outputs, user IDs, or IPs.
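
The privacy item amounts to an allow-list on the telemetry payload. A minimal sketch, assuming the five field names listed above are the complete set; `anonymize` is a hypothetical helper, not part of the published SDK.

```python
ALLOWED_FIELDS = ("server_url", "success", "latency_ms", "tool_name", "http_status")


def anonymize(event: dict) -> dict:
    """Keep only allow-listed telemetry fields and drop everything else."""
    return {key: event[key] for key in ALLOWED_FIELDS if key in event}


raw_event = {
    "server_url": "https://example.com/mcp",
    "tool_name": "web_search",
    "success": True,
    "latency_ms": 182.4,
    "http_status": 200,
    "arguments": {"query": "quarterly revenue"},  # must never leave the process
    "user_id": "u-123",                           # must never leave the process
}
safe_event = anonymize(raw_event)
```

An allow-list is the safer default here: a field added to the event later stays private until someone explicitly permits it.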

Integration points

Cleanest integration options:

Tool call middleware on the Agent runtime — intercepts before/after every tool call
TrustGateMiddleware as an autogen_ext extension
Per-tool trust policy in team/swarm configuration
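
The per-tool option could be as simple as a mapping from tool name to threshold, with a default for everything else. A sketch under that assumption; `TrustPolicy` and its fields are hypothetical and do not correspond to any existing AutoGen configuration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrustPolicy:
    """Hypothetical per-tool thresholds with a team-wide default."""
    default_min_score: float = 40.0
    per_tool_min_score: Dict[str, float] = field(default_factory=dict)

    def threshold_for(self, tool_name: str) -> float:
        return self.per_tool_min_score.get(tool_name, self.default_min_score)


# Usage: hold code execution to a stricter bar than read-only tools.
policy = TrustPolicy(per_tool_min_score={"code_exec": 70.0})
```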

Happy to contribute a PR. The Python SDK and hook pattern are built — it's a matter of wiring into AutoGen's tool execution lifecycle.

References

Dominion Observatory — live behavioral trust data
Observatory Python SDK
x402 Trust-Provider Interface
Related proposals: OpenAI Agents SDK #3454, CrewAI #5789, LangChain #37376

productmakerjason · 2026-05-23T20:55:46Z

This is very close to a failure mode I’m testing around runtime trust gates.

Before an agent follows an external task or calls an external tool, I’m trying to see whether it can verify the basics:

what it fetched
whether it selected a real task
whether it followed the expected schema
whether it stopped safely before claiming anything unverified

I’m collecting a few quick external runs here:

https://the-agents-of-nations.vercel.app/llms.txt

No full review needed. A failed run is useful if it shows where the trust gate should have caught the problem.

ElamOlame31 · 2026-05-28T01:14:27Z

Exactly the problem we solved. AgentGate's behavioral dimension (20% of total trust) tracks request velocity, access pattern anomalies, and cross-session patterns across 24h. The key: score the sequence, not just the individual request. An agent reading 10 files in 5 minutes has a different behavioral score than the same 10 files read over 2 hours.

https://www.tryagentgate.com/
https://github.com/ElamOlame31/agentgate-public
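
As a rough illustration of scoring the sequence and not the single request, the sketch below computes the peak request rate inside a sliding window. This is not AgentGate's algorithm; the function name and the 5-minute window are assumptions chosen to match the example in the comment above.

```python
from typing import Sequence


def peak_rate_per_minute(timestamps: Sequence[float],
                         window_s: float = 300.0) -> float:
    """Highest request count seen in any sliding window, per minute."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most window_s.
        while t - times[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best / (window_s / 60.0)


burst = [i * 30.0 for i in range(10)]    # 10 reads within 5 minutes
spread = [i * 800.0 for i in range(10)]  # the same 10 reads over 2 hours
```

Both sequences contain the same ten requests, yet the burst peaks at ten requests per window and the spread sequence never exceeds one.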
