feat(llmobs): capture pydantic-ai external/MCP tools, fix run_stream name #18528
PROFeNoM wants to merge 8 commits into …
Branch updated from c77dfcb to b07c3c9.
Benchmarks

Benchmark execution time: 2026-06-10 13:41:35

Comparing candidate commit 4b615a3 in the PR branch.

Found 0 performance improvements and 4 performance regressions! Performance is the same for 616 metrics, with 10 unstable metrics.

- scenario:iastaspects-index_aspect
- scenario:iastaspects-title_aspect
- scenario:iastaspectsospath-ospathbasename_aspect
- scenario:span-start
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b07c3c9a5f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex review

💡 Codex Review
Reviewed commit: 2825cd277d
@codex review

💡 Codex Review
Reviewed commit: 25a8d78761
@codex review

💡 Codex Review
Reviewed commit: 5688d79434
@codex review

💡 Codex Review
Reviewed commit: a526d8f503
@codex review

💡 Codex Review
Reviewed commit: eeb53bb4e3
@codex review

💡 Codex Review
Reviewed commit: 8b0c3149ba
@codex review

💡 Codex Review
Reviewed commit: 2360f6ba7d
…am agent name

Extend the pydantic-ai LLMObs integration to record the full agent manifest: statically declared tools, external/MCP-provided tools discovered during a run, and MCP server connection details (URL, command, and args, with credentials scrubbed from URLs and launch args).

Observed tools are attributed per agent run via the ddtrace span parent chain: the agent span seeds an observed-tools dict in its ctx item, and each tool span walks up to its nearest agent ancestor and records there. This keeps attribution correct under concurrency and nested agent-as-tool delegation without any context-local token state.

Also honor pydantic-ai's `infer_name=False` on the `run_stream` path, and re-infer the agent name through our proxy frame when it is left at the default.
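The proxy-frame problem mentioned in the last paragraph can be illustrated with a toy model of frame-walking name inference. This is a simplified sketch, not pydantic-ai's or ddtrace's actual code: `ToyAgent`, `find_binding_name`, and `traced_run_stream` are hypothetical names, and the real inference logic may inspect frames differently. It shows why one extra wrapper frame picks up the wrong binding, and how handing the user's frame through (while respecting `infer_name=False`) avoids that.

```python
import inspect
from types import FrameType
from typing import Optional


def find_binding_name(obj: object, frame: Optional[FrameType]) -> Optional[str]:
    """Return a variable name in `frame` (locals first, then globals) bound to `obj`."""
    if frame is None:
        return None
    for name, value in frame.f_locals.items():
        if value is obj:
            return name
    for name, value in frame.f_globals.items():
        if value is obj:
            return name
    return None


class ToyAgent:
    """Hypothetical agent that infers its name from whoever called run_stream()."""

    def __init__(self, name: Optional[str] = None, infer_name: bool = True):
        self.name = name
        self.infer_name = infer_name

    def run_stream(self, caller_frame: Optional[FrameType] = None) -> Optional[str]:
        if self.name is None and self.infer_name:
            if caller_frame is None:
                # Default: inspect the direct caller's frame.
                here = inspect.currentframe()  # may be None on non-CPython runtimes
                caller_frame = here.f_back if here is not None else None
            self.name = find_binding_name(self, caller_frame)
        return self.name


def traced_run_stream(agent: ToyAgent) -> Optional[str]:
    """Hypothetical tracing proxy that adds one frame between user code and the agent."""
    # Calling agent.run_stream() with no frame would make the agent inspect
    # this proxy frame, where the only binding is the parameter `agent`.
    # Hand it the user's frame (one level above the proxy) instead.
    here = inspect.currentframe()
    user_frame = here.f_back if here is not None else None
    return agent.run_stream(caller_frame=user_frame)
```

In this toy model, a caller that writes `streamed_agent = ToyAgent()` and calls `traced_run_stream(streamed_agent)` gets the name `streamed_agent`, whereas routing `run_stream()` through a naive one-line wrapper yields the wrapper's parameter name instead.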
…ming

Add coverage for external/MCP tool capture in the agent manifest, credential scrubbing of MCP URLs and launch args, per-run and override toolsets, concurrent and nested-delegation tool attribution, agent entry failure, and `run_stream` name inference (including `infer_name=False`). Add the MCP test server and the pydantic-ai MCP test venv.
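Credential scrubbing of MCP URLs and launch args could look roughly like the following sketch. The secret-name heuristic, the `REDACTED` placeholder, and the helper names `looks_secret`, `scrub_url`, and `scrub_args` are assumptions for illustration; the PR does not show its actual rules. It handles both `--flag value` and `--flag=value` forms, and strips `user:password@` plus secret-looking query parameters from URLs, using only `re` and `urllib.parse` from the standard library.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REDACTED = "REDACTED"
# Hypothetical heuristic: a name segment such as key, api-key, token, secret, password, auth.
_SECRET_NAME = re.compile(
    r"(?:^|[-_])(?:api[-_]?)?(?:key|token|secret|password|passwd|auth)(?:[-_]|$)",
    re.IGNORECASE,
)


def looks_secret(name: str) -> bool:
    return bool(_SECRET_NAME.search(name.lstrip("-")))


def scrub_url(url: str) -> str:
    """Drop user:password@ from the netloc and redact secret-looking query params."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets that urlsplit strips
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    query = urlencode(
        [
            (k, REDACTED if looks_secret(k) else v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
        ]
    )
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))


def scrub_args(args: List[str]) -> List[str]:
    """Redact values of secret-looking flags and credentials embedded in URL args."""
    out: List[str] = []
    redact_next = False
    for arg in args:
        if redact_next and not arg.startswith("-"):
            out.append(REDACTED)  # value of a preceding `--secret-flag`
            redact_next = False
            continue
        redact_next = False
        flag, sep, value = arg.partition("=")
        if flag.startswith("-") and looks_secret(flag):
            if sep:  # --api-key=abc
                out.append(f"{flag}={REDACTED}")
            else:  # --api-key abc: redact the next argument
                out.append(arg)
                redact_next = True
        elif sep and "://" in value:  # --url=https://user:pw@host/...
            out.append(f"{flag}={scrub_url(value)}")
        elif "://" in arg:  # bare URL argument
            out.append(scrub_url(arg))
        else:
            out.append(arg)
    return out
```

Matching on whole name segments means `--api-key` and `access_token` are caught while `--monkey` and `author` are not; a heuristic like this will still miss secrets passed as unlabeled positional arguments.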
Branch updated from 2360f6b to 39e939d.
@codex review

💡 Codex Review
Reviewed commit: 39e939d8e8
…sets

Dynamic, combined, and capability toolsets resolve to their MCP toolset only at run time, so they aren't reachable from the agent's static toolset list and were missing from the manifest's `mcp_servers` (orphaning the tool's `mcp_server_id`). Capture the realized MCP toolset from the observed tool call and merge it into `mcp_servers`. The observed-tool path stashes only the toolset object; scrubbing and formatting run once per run at manifest assembly, not per tool call.
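A sketch of the "stash now, format once" split described above, under stated assumptions: `MCPToolset`, `RunState`, and `format_server` are hypothetical stand-ins, and the `isinstance` check substitutes for whatever duck-typed MCP detection the integration really uses. Each tool call does a single dict insert keyed by object identity; formatting and de-duplication by server id happen only when `mcp_servers()` is called at manifest assembly.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MCPToolset:
    """Hypothetical stand-in for an MCP toolset that is only realized at run time."""
    id: str
    command: Optional[str] = None
    args: List[str] = field(default_factory=list)
    url: Optional[str] = None


def format_server(toolset: MCPToolset) -> Dict[str, Any]:
    """The comparatively expensive step; a real version would also scrub credentials."""
    entry: Dict[str, Any] = {"id": toolset.id}
    if toolset.url is not None:
        entry["url"] = toolset.url
    if toolset.command is not None:
        entry["command"] = toolset.command
        entry["args"] = list(toolset.args)
    return entry


@dataclass
class RunState:
    """Hypothetical per-run accumulator attached to the agent span."""
    static_toolsets: List[MCPToolset]
    # Keyed by object identity: one cheap dict insert per tool call, no formatting.
    observed: Dict[int, MCPToolset] = field(default_factory=dict)

    def on_tool_call(self, toolset: Any) -> None:
        if isinstance(toolset, MCPToolset):
            self.observed.setdefault(id(toolset), toolset)

    def mcp_servers(self) -> List[Dict[str, Any]]:
        """Called once at manifest assembly: merge static and observed, dedupe by server id."""
        servers: Dict[str, Dict[str, Any]] = {}
        for toolset in [*self.static_toolsets, *self.observed.values()]:
            if toolset.id not in servers:
                servers[toolset.id] = format_server(toolset)
        return list(servers.values())
```

Holding a reference to each observed toolset in the dict keeps its `id()` stable for the lifetime of the run.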
@codex review

💡 Codex Review
Reviewed commit: c7c5fe1599
The `__aenter__` span-finish guard fixed a pre-existing span leak (and a stale `_run_stream_active` flag) that also exists on main and is unrelated to tool/MCP capture. Removing it keeps this PR scoped to the feature; the entry-failure leak can be addressed in a separate fix PR.
The Python 3.13 cap wasn't required by any package metadata (pydantic-ai-slim 1.106.0 and mcp both declare `>=3.10` with no upper bound). Dropping it; the full pydantic_ai suite passes on 3.14 (63 passed).
Collapse the MCP test cluster from 8 functions to 4 without losing coverage. The 3 end-to-end runs become one parametrized test (static vs dynamic toolset); the redundant `MCPToolset` live run is dropped, since its only unique path is covered by the wrapper unit test. The 5 `_get_mcp_servers` unit tests become 3 cohesive ones (credential scrubbing, source resolution, wrapper unwrapping). MCP/fastmcp imports stay function-local because the py3.9 venv can't install the mcp package, and collection would otherwise fail.
Keep comments only where the behavior is non-obvious (toolset unwrapping, MCP detection, server resolution, credential scrubbing); drop the rest.
JIRA: MLOB-7609
Description
Extends the Pydantic AI LLM Observability integration so the agent manifest reflects the tools and MCP servers an agent actually uses:
- …`ExternalToolset.tool_defs`) now appear in the manifest.
- …`MCPServer*` classes, the newer `MCPToolset`, and MCP toolsets behind wrappers (`.prefixed()`, `load_mcp_toolsets()`). Secret-looking launch args (e.g. `--api-key <token>`) and credentials in MCP URLs are scrubbed before reaching the manifest.
- `run_stream` agent name inference is fixed (the tracing proxy frame previously broke pydantic-ai's own frame walk).

Observed tool calls are attributed to their agent by walking the span's parent chain to the nearest agent span, so concurrent runs and nested agent-as-tool delegations don't cross-contaminate manifests. This replaces an unused process-global attribution registry (`_running_agents`/`_latest_agent`, populated but never read and never cleaned up) with span-scoped state, fixing latent cross-contamination before any feature relied on it.

What changes in Datadog
Snippets below are from the demo app. In each screenshot, released `ddtrace` is on the left and this branch is on the right.

👉 Link to spans
1. External toolset tools in the manifest

Before: manifest `tools` is empty. After: it lists `lookup_order`.

2. MCP server metadata + called MCP tool

Before: no MCP info. After: the manifest records the `time-mcp` server (id + command/args) and the called `get_current_time` tool. (See the `time_agent` span, not the separate `MCP Client Session` span.)

3. `run_stream` agent name inference

Before: the span falls back to `PydanticAI Agent`. After: the span name is `streamed_agent`.

Testing
New tests in `tests/contrib/pydantic_ai/test_pydantic_ai_llmobs.py`: external toolset capture, end-to-end MCP server (real stdio FastMCP subprocess), `MCPToolset` capture, wrapper-toolset MCP capture, dynamic MCP toolset capture, credential scrubbing, `run_stream` name inference, and concurrent/nested agent tool attribution.

The full `llmobs::pydantic_ai` suite passes locally (pydantic-ai 0.8.1 / 1.0.0 / 1.106.0). MCP-server tests are gated on `>= 1.0.0` (they need `MCPServer.id`); the `MCPToolset` test is gated on `>= 1.97.0`.

Risks
Low. Integration-only. All toolset/MCP attribute reads are duck-typed and guarded, so a missing or renamed attribute never raises in a customer run. No public API change.
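The "duck-typed and guarded" reads can be sketched as follows. The attribute names `.wrapped` (for wrapper toolsets such as a prefixed toolset) and `.toolsets` (for combined toolsets) are assumptions about the toolset shapes, not a confirmed pydantic-ai API, and `safe_getattr`, `iter_leaf_toolsets`, and `looks_like_mcp` are hypothetical helpers. The point is that a missing attribute, or a property that raises, falls back to a default instead of propagating into the customer's run; the depth limit guards against pathological wrapper cycles.

```python
from typing import Any, Iterator


def safe_getattr(obj: Any, name: str, default: Any = None) -> Any:
    """getattr that also swallows errors raised by properties, not just AttributeError."""
    try:
        return getattr(obj, name, default)
    except Exception:
        return default


def iter_leaf_toolsets(toolset: Any, max_depth: int = 10) -> Iterator[Any]:
    """Yield leaf toolsets, unwrapping combined and wrapper toolsets.

    Assumed shapes: combined toolsets expose `.toolsets` (a list or tuple);
    wrappers expose the inner toolset as `.wrapped`.
    """
    if toolset is None or max_depth < 0:
        return
    children = safe_getattr(toolset, "toolsets")
    if isinstance(children, (list, tuple)):
        for child in children:
            yield from iter_leaf_toolsets(child, max_depth - 1)
        return
    inner = safe_getattr(toolset, "wrapped")
    if inner is not None and inner is not toolset:
        yield from iter_leaf_toolsets(inner, max_depth - 1)
        return
    yield toolset


def looks_like_mcp(toolset: Any) -> bool:
    """Hypothetical detection rule: a string id plus either a url or a launch command."""
    if not isinstance(safe_getattr(toolset, "id"), str):
        return False
    return safe_getattr(toolset, "url") is not None or safe_getattr(toolset, "command") is not None
```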
Performance
Feature overhead is within run-to-run noise (~0.2%). The per-tool-call cost is a short walk up the span parent chain plus an accumulator on the agent span; MCP scrubbing/formatting happens once per run at manifest assembly, not per tool call.
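The per-tool-call work mentioned above (a walk up the parent chain plus an accumulator on the agent span) can be modeled in a few lines. The `Span` dataclass here is hypothetical and is not ddtrace's span API; `start_agent_span`, `nearest_agent`, and `record_tool_call` are illustrative names. Because attribution follows parent links rather than a process-global or context-local "current agent", a tool call inside a nested, delegated agent lands on the inner agent, and an unrelated concurrent run never sees it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    """Hypothetical minimal span (not ddtrace's Span): kind, name, parent, ctx."""
    kind: str  # "agent", "llm", "tool", ...
    name: str
    parent: Optional["Span"] = None
    ctx: Dict[str, Any] = field(default_factory=dict)


def start_agent_span(name: str, parent: Optional[Span] = None) -> Span:
    span = Span("agent", name, parent)
    span.ctx["observed_tools"] = {}  # seeded once per agent run
    return span


def nearest_agent(span: Span) -> Optional[Span]:
    """Walk up the parent chain to the closest enclosing agent span."""
    current = span.parent
    while current is not None:
        if current.kind == "agent":
            return current
        current = current.parent
    return None


def record_tool_call(tool_span: Span, tool_def: Dict[str, Any]) -> None:
    agent = nearest_agent(tool_span)
    if agent is not None:  # a tool called outside any agent run is simply not attributed
        agent.ctx["observed_tools"].setdefault(tool_span.name, tool_def)
```

The cost per tool call is proportional to the number of spans between the tool span and its nearest agent ancestor; no shared state is touched outside that agent span's own ctx.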
Benchmarked before vs. after in the same venv (pydantic-ai 1.106.0, LLMObs enabled): 3 function tools plus a real MCP stdio toolset, with `TestModel` (no network) calling every tool on each run.

Benchmark script and method
Dropped into `tests/contrib/pydantic_ai/` and run via the harness: `scripts/run-tests --venv <py3.12 pydantic-ai-slim 1.106.0> -- -s -- -s -k test_bench`. It raises `AssertionError` at the end so the harness surfaces the numbers (stdout is hidden on pass).

"Before" = the three integration files at merge-base (`c641709`); "After" = this branch.

The MCP connection is opened once (`async with agent`) and reused across all runs, so the measurement isolates per-run instrumentation cost, not MCP setup. `TestModel` removes LLM network latency so overhead is not buried under request time.
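The PR text describes the benchmark method but does not include the script itself. A minimal harness in the same spirit might look like this; `median_run_time` and `bench_and_report` are hypothetical names, and median-of-N timing after a warmup is one reasonable choice, not necessarily what the author used. The deliberate `AssertionError` mirrors the trick described above for surfacing numbers when stdout is hidden on pass.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def median_run_time(
    run_once: Callable[[], Awaitable[object]], iterations: int = 50, warmup: int = 5
) -> float:
    """Median wall-clock seconds per call of `run_once`, after a few warmup calls."""
    for _ in range(warmup):
        await run_once()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        await run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def bench_and_report(run_once: Callable[[], Awaitable[object]], iterations: int = 50) -> None:
    """Always fails, on purpose: the AssertionError message is how the numbers surface."""
    median_s = asyncio.run(median_run_time(run_once, iterations=iterations))
    raise AssertionError(f"median per-run time: {median_s * 1e6:.1f} us over {iterations} runs")
```

With a real agent, the `async with agent:` block would need to wrap the `median_run_time` call inside a single coroutine (rather than going through `bench_and_report` as written), so the MCP connection is created once and stays on one event loop.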
Additional Notes

The full resolved tool catalog (via `ToolManager.for_run_step`) is deferred until pydantic-ai 2.0 is stable.