MCP catalog exposure: integration tests and benchmarks #926
Add 6 integration tests that validate the full catalog exposure pipeline (spec §12), and a benchmark script that measures description size and rendering latency at various tool counts (spec §13).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review: MCP catalog exposure (integration tests and benchmarks)
Summary
Well-structured PR adding 6 integration tests and a benchmark script for the MCP catalog exposure feature. All 6 integration tests map directly to the spec §12 requirements, and the benchmark covers the core measurable aspects of §13. Clean implementation with good test isolation.
What's Good
Issues (Must Fix)
None.
Suggestions (Optional)
Security
No concerns. Tests and benchmarks only: no production code changes, no external calls, no credential handling.
Documentation
No updates needed. The PR adds tests and a benchmark script, both of which are self-documenting. The benchmark script includes usage comments and a …
Verdict
Approve. All 6 spec §12 integration tests are correctly implemented, the benchmark covers the core measurable aspects of §13, and code quality is solid. The optional suggestions are minor improvements that can be deferred.
@claude please fix:
Auto-Triage Summary
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix Summary
✅ Issue Fixed: Removed unused …
PR Review: MCP catalog exposure (integration tests and benchmarks)
Summary
Solid PR adding 6 integration tests (spec §12) and a benchmark script (spec §13) for the MCP catalog exposure feature. All §12 integration test requirements are covered, the benchmark is well-structured with CLI options, and the code follows established codebase patterns. No issues found.
What's Good
Issues (Must Fix)
None.
Suggestions (Optional)
Security
No concerns. Test and benchmark code only: no production changes, no external calls, no credential handling.
Documentation
No updates needed. The benchmark script is self-documenting with header comments and …
Verdict
Approve. All 6 spec §12 integration tests are correctly implemented, the benchmark covers the measurable §13 items and clearly documents what is deferred, and code quality matches established codebase patterns. The two suggestions are minor housekeeping items that can be deferred.
Summary
Integration tests (catalog_integration_test.exs):
- catalog/search-tools → catalog/describe-tool → tool/mcp-call pipeline
- catalog/list-tools → catalog/describe-tool → tool/mcp-call
- tool/mcp-call works without catalog builtins
Benchmark script (catalog_bench.exs):
Measures rendered description size (chars) and rendering latency (µs) at 10, 30, 50, 100, and 200 tools, for both inline and lazy modes. Outputs a markdown table with threshold analysis to justify or revise the default values (catalog_inline_max_chars=12000, catalog_inline_max_tools=40).
Run:
mix run mcp_server/bench/catalog_bench.exs [--runs=N] [--out=PATH]
Closes #913
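The threshold analysis described above can be pictured with a small standalone Elixir sketch. The contents of catalog_bench.exs are not shown in this PR page, so everything here is hypothetical: CatalogBenchSketch, render_inline/1 and the synthetic 80-character tool descriptions are illustrative stand-ins rather than the project's renderer, only inline mode is modeled, and treating the two defaults as a joint condition (inline only when both the tool count and the character count are within limits) is an assumption. Timing uses the standard :timer.tc/1.

```elixir
defmodule CatalogBenchSketch do
  # Default thresholds quoted in the PR description.
  @max_chars 12_000
  @max_tools 40

  # Hypothetical stand-in for the real inline renderer: one line per tool.
  def render_inline(tools) do
    Enum.map_join(tools, "\n", fn {name, desc} -> "- #{name}: #{desc}" end)
  end

  # Synthetic tools, each with a fixed 80-character description (assumption).
  def synthetic_tools(n) do
    for i <- 1..n, do: {"tool_#{i}", String.duplicate("x", 80)}
  end

  # Returns {description_chars, median_render_us} for n synthetic tools.
  def measure(n, runs) do
    tools = synthetic_tools(n)

    times =
      for _ <- 1..runs do
        {us, _desc} = :timer.tc(fn -> render_inline(tools) end)
        us
      end

    chars = String.length(render_inline(tools))
    # Middle element of the sorted timings (upper median for even run counts).
    median_us = times |> Enum.sort() |> Enum.at(div(runs, 2))
    {chars, median_us}
  end

  # Assumption: inline mode is acceptable only when both limits hold.
  def inline_ok?(n, chars), do: n <= @max_tools and chars <= @max_chars

  # One markdown table row for n tools.
  def row(n, runs) do
    {chars, us} = measure(n, runs)
    verdict = if inline_ok?(n, chars), do: "yes", else: "no"
    "| #{n} | #{chars} | #{us} | #{verdict} |"
  end
end

IO.puts("| tools | inline chars | median µs | within defaults |")
IO.puts("|---|---|---|---|")

for n <- [10, 30, 50, 100, 200] do
  IO.puts(CatalogBenchSketch.row(n, 20))
end
```

With this synthetic data the 50-tool row already fails the tool-count limit while staying far under 12000 chars, which shows why a real benchmark would report both dimensions separately when deciding whether the defaults should be revised.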
Test plan
- mix test mcp_server/test/ptc_runner_mcp/catalog_integration_test.exs
Fix Automation State
Fix attempts: 2/3