feat: optional OpenLLMetry tracing + per-client docs + SECURITY + ECM test#51
Closed
feat: optional OpenLLMetry tracing + per-client docs + SECURITY + ECM test#51
Conversation
- Add mcp_server/telemetry.py: init_telemetry(), get_tracer(), trace_operation() context manager, traced() decorator, record_tool_error(), _truncate() - Add mcp_server/otel_middleware.py: FastMCP Middleware subclass creating a SpanKind.SERVER span per tool call with mcp.tool.* attributes; sets ERROR status on ok=False results and uncaught exceptions; W3C trace propagation via fastmcp.telemetry.extract_trace_context() - Wire server.py: init_telemetry() + mcp.add_middleware(OtelMiddleware()) - Add config.py constants: OTEL_ENDPOINT, OTEL_SERVICE_NAME, OTEL_EXPORT_BATCH - Add @Traced() to key slow operations: run_simulation, apply_measure, create_measure, create_typical_building, create_bar_building, create_new_building, run_qaqc_checks - Add [telemetry] optional extras group in pyproject.toml: opentelemetry-sdk>=1.27, opentelemetry-exporter-otlp-proto-http>=1.27 - Add ARG OTEL_ENABLED=0 + conditional pip install to Dockerfile - Add tests/test_telemetry.py: 18 unit tests (no OpenStudio, no Docker) covering no-op behavior, provider init, middleware spans, error recording, arg truncation, trace_operation child spans, traced() decorator - Add test_telemetry.py to CI shard 5 Telemetry is fully opt-in: zero overhead when OTEL_EXPORTER_OTLP_ENDPOINT is unset (OTel API returns no-op tracer by default). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…build arg OpenTelemetry SDK and OTLP exporter are development-only tools — no user should be running production servers with bundled observability backends. Moving them into [dev] extras makes this explicit: - pip install -e '.[dev]' → includes telemetry (for dev/test) - pip install openstudio-mcp → no telemetry SDK installed The runtime telemetry.py module remains and is always imported, but falls back gracefully to no-ops when the SDK is absent (_SDK_AVAILABLE=False path). This means a production install still logs a warning if OTEL_EXPORTER_OTLP_ENDPOINT is set but the SDK is missing, rather than silently doing nothing. Dockerfile: removed OTEL_ENABLED ARG and conditional install; dev image always includes the SDK via pip install -e '.[dev]'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use traceloop/openllmetry instead of raw opentelemetry-sdk: pip install opentelemetry-instrumentation-mcp # auto-instruments FastMCP pip install traceloop-sdk # Traceloop.init() + @task() pyproject.toml: - Remove opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http from [dev] - Add traceloop-sdk>=0.49.2 and opentelemetry-instrumentation-mcp>=0.38.0 mcp_server/telemetry.py (rewrite): - init_telemetry() calls McpInstrumentor().instrument() (auto-instruments all FastMCP tool calls) then Traceloop.init() with TRACELOOP_BASE_URL - Traceloop.init() print() calls are redirected to sys.stderr to avoid corrupting the MCP JSON-RPC stdio pipe - traced() delegates to @task() from traceloop.sdk.decorators; no-ops cleanly when SDK absent (_SDK_AVAILABLE=False) - trace_operation() uses raw OTel API tracer (always available via fastmcp); works with any TracerProvider including Traceloop - find_spec() wrapped in try/except so module imports safely when traceloop is not installed mcp_server/otel_middleware.py: deleted - McpInstrumentor().instrument() patches FastMCP directly and replaces this mcp_server/server.py: - Remove OtelMiddleware import and mcp.add_middleware(OtelMiddleware()) call mcp_server/config.py: - Replace OTEL_ENDPOINT / OTEL_EXPORT_BATCH with TRACELOOP_BASE_URL / TRACELOOP_API_KEY tests/test_telemetry.py (rewrite): - 15 unit tests covering all new behavior - Mocking strategy: patch sys.modules with traceloop stubs + parent package mock so find_spec() works in CI where traceloop-sdk is not installed Environment variables (updated): TRACELOOP_BASE_URL OTLP endpoint (replaces OTEL_EXPORTER_OTLP_ENDPOINT) TRACELOOP_API_KEY API key for Traceloop cloud (optional for generic OTLP) OTEL_SERVICE_NAME Service name (unchanged) OTEL_EXPORT_BATCH false -> disable_batch=True (unchanged) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BLOCKING: traced() now checks _TELEMETRY_ENABLED at call time (not _SDK_AVAILABLE at decoration time). Prevents traceloop @task() from printing warnings to stdout when SDK is installed but no endpoint is configured — which would corrupt MCP JSON-RPC pipe. - HIGH: Replace @task() with trace_operation() context manager in traced(). Error marking now happens on the correct span (before it closes), not on the parent span after @task() has already ended it. - HIGH: McpInstrumentor().instrument() only called when endpoint is configured. No longer monkey-patches FastMCP when tracing is disabled. - MEDIUM: init_telemetry() idempotency fixed — returns _TELEMETRY_ENABLED on second call instead of always True. - Added 2 new tests: idempotent-disabled return value, stdout restoration on Traceloop.init() exception. Removed duplicate dict keys in test mocks. Total: 17 tests, all passing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Finding 3 (High): instrumentation ordering - Move FastMCP instantiation and register_all_skills() into main() so McpInstrumentor().instrument() patches FastMCP.__init__ before the instance is created. Server metadata (name/version) now correctly captured on spans. Finding 4 (Medium): redundant dep listing - Remove opentelemetry-instrumentation-mcp from [dev] extras. traceloop-sdk already declares it as a transitive dependency, so listing it separately created version-skew risk. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove TRACELOOP_BASE_URL, TRACELOOP_API_KEY, OTEL_SERVICE_NAME from config.py — they were never imported by any code and were misleading (read at module-import time while telemetry.py reads them live from os.environ at init time). - Add Tracing (OpenLLMetry) section to README with env var table, enabling instructions, and description of which operations are traced. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… testing User-facing documentation for evaluating openstudio-mcp across different MCP clients, measuring token context impact, and tracking performance. SECURITY.md: - Documents allowed path roots (/runs, /inputs, /repo, measures dirs) - Explains path traversal mitigations (is_path_allowed + Path.resolve) - Container isolation model (explicit volume mounts, offline EnergyPlus) - Stdout/MCP transport integrity (SWIG memleak + logger suppression) - Responsible disclosure contact instructions docs/testing/advanced-evaluation-template.md: - 6-section evaluation log for manual MCP client testing sessions - Sections: environment setup, skills orchestration adherence, artifact size limits (token context impact), in-memory session persistence, BEM practitioner workflow walkthroughs, security checks - Scoring rubric for skills adherence (1-5 scale) - Suggested 20-hour evaluation schedule - Cross-references to automated test coverage per section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Finding 1 (High) — telemetry/telemetry.py:
Swap McpInstrumentor().instrument() to run AFTER Traceloop.init() so
FastMCP tool-call spans have a live TracerProvider/exporter when they
are created. Previous order patched FastMCP before the provider was
established, meaning auto-traced spans had no real destination.
Finding 3 (Medium) — SECURITY.md container isolation section:
Remove false claim that the server runs as non-root (Dockerfile has no
USER instruction — it runs as root by default). Add guidance to add a
non-root USER in a derived image for production deployments.
Finding 2 (Medium) — SECURITY.md offline/network section:
Qualify the 'fully offline' claim: server is offline by default, but
when TRACELOOP_BASE_URL is set it exports traces outbound to that
OTLP endpoint.
Finding 4 (Low) — docs/testing/advanced-evaluation-template.md:
- Point test file references to their actual branch
(test/artifact-security-coverage) to avoid confusion
- Soften 'copy_file always succeeds for large files' — it can still
fail on disk space, permissions, or copy constraints
- Clarify /repo allowed-root note: intentional for skill guides but
also exposes server source code; evaluators should document behavior
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add docs/clients/ with setup instructions for each supported MCP host: - index.md — compatibility matrix, canonical Docker config, guide links - claude-desktop.md — config path, JSON, hammer icon verification, file access pattern - claude-code.md — .mcp.json, ToolSearch behavior, measured benchmark results - vs-code-copilot.md — .vscode/mcp.json (servers key), 128-tool limit, agent mode - windsurf.md — mcp_config.json, 100-tool hard cap, curated 80-tool starter set - gemini-cli.md — ~/.gemini/settings.json, includeTools for 100-tool soft limit - cursor.md — 40-tool hard cap makes it incompatible; lists alternatives - token-context-performance.md — measured schema overhead (~15K tokens for 142 tools), per-client context budget table, ToolSearch deferral impact, accuracy vs tool count data, strategies to reduce overhead Update README.md Other MCP Hosts section: - Replace one-liner with links to per-client guides - Add tool limit column to compatibility table - Add warning indicators (✅/⚠️ /❌) for partial/incompatible clients - Link to token context performance doc Data sourced from measured schema sizes and the three-model benchmark sweep (docs/knowledge/tool-discovery-and-llm-testing.md). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verified against actual installed clients on macOS: Docker MCP protocol: - Confirmed container responds correctly to MCP initialize handshake - Server returns 142 tools, correct version, and instructions Claude Desktop: - Config at ~/Library/Application Support/Claude/claude_desktop_config.json - mcpServers key confirmed; already configured and working Claude Code: - Corrected: .mcp.json is gitignored (contains absolute paths); add .mcp.json.example template with placeholder paths for sharing - Corrected: 'claude mcp list' shows user-scope only; project-scope .mcp.json loads at session start, not reflected in mcp list - Add: verify registration with 'claude mcp add --scope project' which prints 'already exists' if correctly configured - Add: .mcp.json.example to repo for team sharing Gemini CLI: - Corrected: homebrew formula has broken @google/gemini-cli-core dep; recommend npm install instead - Add: 'gemini mcp list' verification command (no API call needed; shows Connected status directly) - Confirmed mcpServers key format against ~/.gemini/settings.json VS Code Copilot: - Confirmed 'servers' key (not 'mcpServers') from actual user mcp.json - Existing user config had relative paths for openstudio-mcp; fixed to absolute paths Windsurf: - Confirmed mcpServers key from ~/.codeium/windsurf/mcp_config.json - Added openstudio-mcp to local Windsurf config for testing Added .mcp.json to .gitignore (machine-specific absolute paths) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add .env to .gitignore (security: prevent accidental credential commit) - Fix trace_operation() to yield _NoopSpan on ImportError so it is safe to call unconditionally in production (no opentelemetry-api installed) - Add _NoopSpan class with no-op set_attribute/set_status/record_exception - Add test_trace_operation_noop_span_on_import_error to cover new path - Add docs/examples/20_deep_retrofit_package.md (Example 20: ECM package) - Add tests/test_skill_ecm_package.py (integration test for Example 20) - Add test_skill_ecm_package.py + test_telemetry.py to CI shard 2 / 5 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
create_standard_opaque_material creates wall/roof layers, not glazing. Replaced incorrect tool call sequence with a correct note pointing to SimpleGlazing via create_measure, matching actual SDK behaviour. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Telemetry is not dev tooling — it's a runtime feature for operators who want distributed tracing. Moving it out of [dev] into a dedicated [telemetry] extra makes the intent clear and lets production deployments opt in without pulling test dependencies. pip install 'openstudio-mcp[telemetry]' - pyproject.toml: new [telemetry] extra, remove from [dev] - telemetry.py: update module docstring and install-hint warning message - README.md: update Tracing section with correct install command Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add llama3.2:3b vs gemma3:4b benchmark results section - GSM8K: llama3.2:3b 0.67, gemma3:4b 0.55 - IFEval prompt-level loose: llama3.2:3b 0.630, gemma3:4b 0.750 - Key finding: gemma3:4b does not support native tool calling in Ollama - Recommended CI model: llama3.2:3b (tool calling + 2 GB fits in 16 GB runner) - Add Token Overhead by Scenario section with Ollama prompt_eval_count measurements - Measured with llama3.2:3b: 36 tokens baseline → 11,763 tokens (142 synthetic tools) - Projected real overhead ~29K tokens (204 tokens/tool avg vs 83 tokens/tool synthetic) - Add Jaeger tracing stack (docker-compose.tracing.yml + otel-collector-config.yaml) - Update Dockerfile with optional telemetry install (ARG TELEMETRY=0) - Correct schema size history: names+descriptions-only vs full JSON distinction - Update per-client overhead table with live measurements (117K chars / ~29K tokens) - Add reproduce instructions for benchmark (lm_eval + Ollama integration quirks) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line comments, add CHANGELOG - Add opentelemetry-sdk>=1.20 to [dev] optional-dependencies so the 6 test_telemetry.py tests that import opentelemetry.sdk.trace don't fail on clean CI images (fastmcp only ships opentelemetry-api, not the SDK) - Convert all 18 test_telemetry.py Validates/Regression docstrings to inline # comments per CLAUDE.md convention - Add [Unreleased] CHANGELOG entry covering telemetry, client docs, SECURITY.md, ECM example, and stdout suppression tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- README: fix Docker tracing example to use openstudio-mcp:tracing image (openstudio-mcp:dev lacks traceloop-sdk; using it with TRACELOOP_BASE_URL set silently disables tracing with a warning) - README/docs: expand TRACELOOP_TRACE_CONTENT description to warn that the default (true) exports tool args and outputs to the OTLP backend; recommend false as a safe starting point - CI: add 'Smoke-install telemetry extra' step to build job that verifies pip install openstudio-mcp[telemetry] installs cleanly; catches packaging drift without a full TELEMETRY=1 Docker build - tests: add test_server_main_calls_init_telemetry_before_fastmcp (AST-based) to guard the init_telemetry-before-FastMCP startup ordering contract - tests: add test_traced_decorator_applied_to_promised_operations (AST-based) to guard that @Traced() remains on all 7 operations promised in CHANGELOG - .env.example: commit template with TRACELOOP_* vars and privacy guidance - CHANGELOG: update test count (18→20), add Fixed entries for docs corrections Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
traceloop-sdk>=0.49.2 requires opentelemetry-sdk>=1.38.0,<2.0.0. The previous >=1.20 lower bound could allow pip to resolve an incompatible version in constrained environments, causing install failures when combining [dev] and [telemetry] extras. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collaborator
Author
|
Closing in favour of three focused PRs:
Branch |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Added
pip install 'openstudio-mcp[telemetry]'+TRACELOOP_BASE_URLenv var enables distributed tracing via traceloop-sdk. Zero overhead when unset. Key operations (run_simulation,apply_measure,create_measure,create_*_building,run_qaqc_checks) emit named spans; every FastMCP tool call is auto-instrumented viaMcpInstrumentor.docs/clients/— detailed MCP config examples, tool limits, and performance notes for Claude Code, Claude Desktop, VS Code Copilot, Windsurf, Gemini CLI, and Cursor.docs/clients/token-context-performance.md— benchmark of how each client handles the 142-tool surface and context overhead.docs/examples/20_deep_retrofit_package.md— wall insulation + thermostat + window + PV stack with expected EUI ranges..mcp.json.example: ready-to-use Claude Code MCP config.docker/docker-compose.tracing.yml+docker/otel-collector-config.yamlfor local Jaeger/OTEL collector.test_telemetry.py: 20 unit tests for telemetry module (no Docker required) — includes startup-wiring and decorator-coverage regression tests..env.example: template for telemetry environment variables with privacy guidance.test_stdout_logger_silence.py: integration tests verifying Polyhedron/Space Logger warnings are fully suppressed aftersilence_openstudio_stdout_logger().Fixed
create_standard_opaque_material; now correctly notes that glazing requiresSimpleGlazingauthored viacreate_measure.openstudio-mcp:devtoopenstudio-mcp:tracing(the dev image does not include traceloop-sdk); added build command and explanatory note.TRACELOOP_TRACE_CONTENTdocs: expanded to warn that the default (true) exports tool arguments and outputs to the OTLP backend; recommendsfalseas the safe starting point.opentelemetry-sdkversion constraint: tightened[dev]lower bound from>=1.20to>=1.38.0to matchtraceloop-sdk's actual minimum; prevents pip resolving an incompatible version when both extras are installed.