Skip to content

feat: optional OpenLLMetry tracing + per-client docs + SECURITY + ECM test#50

Closed
anchapin wants to merge 18 commits intoNatLabRockies:developfrom
anchapin:user-testing
Closed

feat: optional OpenLLMetry tracing + per-client docs + SECURITY + ECM test#50
anchapin wants to merge 18 commits intoNatLabRockies:developfrom
anchapin:user-testing

Conversation

@anchapin
Copy link
Copy Markdown
Collaborator

Summary

Added

  • Optional OpenLLMetry tracing: pip install 'openstudio-mcp[telemetry]' + TRACELOOP_BASE_URL env var enables distributed tracing via traceloop-sdk. Zero overhead when unset. Key operations (run_simulation, apply_measure, create_measure, create_*_building, run_qaqc_checks) emit named spans; every FastMCP tool call is auto-instrumented via McpInstrumentor.
  • Per-client setup guides: docs/clients/ — detailed MCP config examples, tool limits, and performance notes for Claude Code, Claude Desktop, VS Code Copilot, Windsurf, Gemini CLI, and Cursor.
  • Token context performance doc: docs/clients/token-context-performance.md — benchmark of how each client handles the 142-tool surface and context overhead.
  • SECURITY.md: disclosure policy and supported versions.
  • ECM package example: docs/examples/20_deep_retrofit_package.md — wall insulation + thermostat + window + PV stack with expected EUI ranges.
  • .mcp.json.example: ready-to-use Claude Code MCP config.
  • Docker tracing stack: docker/docker-compose.tracing.yml + docker/otel-collector-config.yaml for local Jaeger/OTEL collector.
  • test_telemetry.py: 20 unit tests for telemetry module (no Docker required) — includes startup-wiring and decorator-coverage regression tests.
  • .env.example: template for telemetry environment variables with privacy guidance.
  • test_stdout_logger_silence.py: integration tests verifying Polyhedron/Space Logger warnings are fully suppressed after silence_openstudio_stdout_logger().

Fixed

  • ECM package example: window ECM was incorrectly using create_standard_opaque_material; now correctly notes that glazing requires SimpleGlazing authored via create_measure.
  • README tracing Docker example: corrected image tag from openstudio-mcp:dev to openstudio-mcp:tracing (the dev image does not include traceloop-sdk); added build command and explanatory note.
  • TRACELOOP_TRACE_CONTENT docs: expanded to warn that the default (true) exports tool arguments and outputs to the OTLP backend; recommends false as the safe starting point.
  • opentelemetry-sdk version constraint: tightened [dev] lower bound from >=1.20 to >=1.38.0 to match traceloop-sdk's actual minimum; prevents pip resolving an incompatible version when both extras are installed.

anchapin and others added 18 commits April 14, 2026 21:53
- Add mcp_server/telemetry.py: init_telemetry(), get_tracer(), trace_operation()
  context manager, traced() decorator, record_tool_error(), _truncate()
- Add mcp_server/otel_middleware.py: FastMCP Middleware subclass creating a
  SpanKind.SERVER span per tool call with mcp.tool.* attributes; sets ERROR
  status on ok=False results and uncaught exceptions; W3C trace propagation
  via fastmcp.telemetry.extract_trace_context()
- Wire server.py: init_telemetry() + mcp.add_middleware(OtelMiddleware())
- Add config.py constants: OTEL_ENDPOINT, OTEL_SERVICE_NAME, OTEL_EXPORT_BATCH
- Add @Traced() to key slow operations: run_simulation, apply_measure,
  create_measure, create_typical_building, create_bar_building,
  create_new_building, run_qaqc_checks
- Add [telemetry] optional extras group in pyproject.toml:
  opentelemetry-sdk>=1.27, opentelemetry-exporter-otlp-proto-http>=1.27
- Add ARG OTEL_ENABLED=0 + conditional pip install to Dockerfile
- Add tests/test_telemetry.py: 18 unit tests (no OpenStudio, no Docker)
  covering no-op behavior, provider init, middleware spans, error recording,
  arg truncation, trace_operation child spans, traced() decorator
- Add test_telemetry.py to CI shard 5

Telemetry is fully opt-in: zero overhead when OTEL_EXPORTER_OTLP_ENDPOINT
is unset (OTel API returns no-op tracer by default).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…build arg

OpenTelemetry SDK and OTLP exporter are development-only tools — no user
should be running production servers with bundled observability backends.
Moving them into [dev] extras makes this explicit:

- pip install -e '.[dev]'  → includes telemetry (for dev/test)
- pip install openstudio-mcp  → no telemetry SDK installed

The runtime telemetry.py module remains and is always imported, but falls
back gracefully to no-ops when the SDK is absent (_SDK_AVAILABLE=False path).
This means a production install still logs a warning if OTEL_EXPORTER_OTLP_ENDPOINT
is set but the SDK is missing, rather than silently doing nothing.

Dockerfile: removed OTEL_ENABLED ARG and conditional install; dev image
always includes the SDK via pip install -e '.[dev]'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use traceloop/openllmetry instead of raw opentelemetry-sdk:

  pip install opentelemetry-instrumentation-mcp  # auto-instruments FastMCP
  pip install traceloop-sdk                       # Traceloop.init() + @task()

pyproject.toml:
- Remove opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http from [dev]
- Add traceloop-sdk>=0.49.2 and opentelemetry-instrumentation-mcp>=0.38.0

mcp_server/telemetry.py (rewrite):
- init_telemetry() calls McpInstrumentor().instrument() (auto-instruments all
  FastMCP tool calls) then Traceloop.init() with TRACELOOP_BASE_URL
- Traceloop.init() print() calls are redirected to sys.stderr to avoid
  corrupting the MCP JSON-RPC stdio pipe
- traced() delegates to @task() from traceloop.sdk.decorators; no-ops cleanly
  when SDK absent (_SDK_AVAILABLE=False)
- trace_operation() uses raw OTel API tracer (always available via fastmcp);
  works with any TracerProvider including Traceloop
- find_spec() wrapped in try/except so module imports safely when traceloop
  is not installed

mcp_server/otel_middleware.py: deleted
- McpInstrumentor().instrument() patches FastMCP directly and replaces this

mcp_server/server.py:
- Remove OtelMiddleware import and mcp.add_middleware(OtelMiddleware()) call

mcp_server/config.py:
- Replace OTEL_ENDPOINT / OTEL_EXPORT_BATCH with TRACELOOP_BASE_URL / TRACELOOP_API_KEY

tests/test_telemetry.py (rewrite):
- 15 unit tests covering all new behavior
- Mocking strategy: patch sys.modules with traceloop stubs + parent package
  mock so find_spec() works in CI where traceloop-sdk is not installed

Environment variables (updated):
  TRACELOOP_BASE_URL    OTLP endpoint (replaces OTEL_EXPORTER_OTLP_ENDPOINT)
  TRACELOOP_API_KEY     API key for Traceloop cloud (optional for generic OTLP)
  OTEL_SERVICE_NAME     Service name (unchanged)
  OTEL_EXPORT_BATCH     false -> disable_batch=True (unchanged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BLOCKING: traced() now checks _TELEMETRY_ENABLED at call time (not
  _SDK_AVAILABLE at decoration time). Prevents traceloop @task() from
  printing warnings to stdout when SDK is installed but no endpoint is
  configured — which would corrupt MCP JSON-RPC pipe.

- HIGH: Replace @task() with trace_operation() context manager in
  traced(). Error marking now happens on the correct span (before it
  closes), not on the parent span after @task() has already ended it.

- HIGH: McpInstrumentor().instrument() only called when endpoint is
  configured. No longer monkey-patches FastMCP when tracing is disabled.

- MEDIUM: init_telemetry() idempotency fixed — returns _TELEMETRY_ENABLED
  on second call instead of always True.

- Added 2 new tests: idempotent-disabled return value, stdout restoration
  on Traceloop.init() exception. Removed duplicate dict keys in test
  mocks. Total: 17 tests, all passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Finding 3 (High): instrumentation ordering
- Move FastMCP instantiation and register_all_skills() into main() so
  McpInstrumentor().instrument() patches FastMCP.__init__ before the
  instance is created. Server metadata (name/version) now correctly
  captured on spans.

Finding 4 (Medium): redundant dep listing
- Remove opentelemetry-instrumentation-mcp from [dev] extras.
  traceloop-sdk already declares it as a transitive dependency, so
  listing it separately created version-skew risk.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove TRACELOOP_BASE_URL, TRACELOOP_API_KEY, OTEL_SERVICE_NAME from
  config.py — they were never imported by any code and were misleading
  (read at module-import time while telemetry.py reads them live from
  os.environ at init time).

- Add Tracing (OpenLLMetry) section to README with env var table,
  enabling instructions, and description of which operations are traced.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… testing

User-facing documentation for evaluating openstudio-mcp across different
MCP clients, measuring token context impact, and tracking performance.

SECURITY.md:
- Documents allowed path roots (/runs, /inputs, /repo, measures dirs)
- Explains path traversal mitigations (is_path_allowed + Path.resolve)
- Container isolation model (explicit volume mounts, offline EnergyPlus)
- Stdout/MCP transport integrity (SWIG memleak + logger suppression)
- Responsible disclosure contact instructions

docs/testing/advanced-evaluation-template.md:
- 6-section evaluation log for manual MCP client testing sessions
- Sections: environment setup, skills orchestration adherence,
  artifact size limits (token context impact), in-memory session
  persistence, BEM practitioner workflow walkthroughs, security checks
- Scoring rubric for skills adherence (1-5 scale)
- Suggested 20-hour evaluation schedule
- Cross-references to automated test coverage per section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Finding 1 (High) — telemetry/telemetry.py:
  Swap McpInstrumentor().instrument() to run AFTER Traceloop.init() so
  FastMCP tool-call spans have a live TracerProvider/exporter when they
  are created. Previous order patched FastMCP before the provider was
  established, meaning auto-traced spans had no real destination.

Finding 3 (Medium) — SECURITY.md container isolation section:
  Remove false claim that the server runs as non-root (Dockerfile has no
  USER instruction — it runs as root by default). Add guidance to add a
  non-root USER in a derived image for production deployments.

Finding 2 (Medium) — SECURITY.md offline/network section:
  Qualify the 'fully offline' claim: server is offline by default, but
  when TRACELOOP_BASE_URL is set it exports traces outbound to that
  OTLP endpoint.

Finding 4 (Low) — docs/testing/advanced-evaluation-template.md:
  - Point test file references to their actual branch
    (test/artifact-security-coverage) to avoid confusion
  - Soften 'copy_file always succeeds for large files' — it can still
    fail on disk space, permissions, or copy constraints
  - Clarify /repo allowed-root note: intentional for skill guides but
    also exposes server source code; evaluators should document behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add docs/clients/ with setup instructions for each supported MCP host:
- index.md — compatibility matrix, canonical Docker config, guide links
- claude-desktop.md — config path, JSON, hammer icon verification, file access pattern
- claude-code.md — .mcp.json, ToolSearch behavior, measured benchmark results
- vs-code-copilot.md — .vscode/mcp.json (servers key), 128-tool limit, agent mode
- windsurf.md — mcp_config.json, 100-tool hard cap, curated 80-tool starter set
- gemini-cli.md — ~/.gemini/settings.json, includeTools for 100-tool soft limit
- cursor.md — 40-tool hard cap makes it incompatible; lists alternatives
- token-context-performance.md — measured schema overhead (~15K tokens for 142
  tools), per-client context budget table, ToolSearch deferral impact, accuracy
  vs tool count data, strategies to reduce overhead

Update README.md Other MCP Hosts section:
- Replace one-liner with links to per-client guides
- Add tool limit column to compatibility table
- Add warning indicators (✅/⚠️/❌) for partial/incompatible clients
- Link to token context performance doc

Data sourced from measured schema sizes and the three-model benchmark sweep
(docs/knowledge/tool-discovery-and-llm-testing.md).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verified against actual installed clients on macOS:

Docker MCP protocol:
- Confirmed container responds correctly to MCP initialize handshake
- Server returns 142 tools, correct version, and instructions

Claude Desktop:
- Config at ~/Library/Application Support/Claude/claude_desktop_config.json
- mcpServers key confirmed; already configured and working

Claude Code:
- Corrected: .mcp.json is gitignored (contains absolute paths); add
  .mcp.json.example template with placeholder paths for sharing
- Corrected: 'claude mcp list' shows user-scope only; project-scope
  .mcp.json loads at session start, not reflected in mcp list
- Add: verify registration with 'claude mcp add --scope project' which
  prints 'already exists' if correctly configured
- Add: .mcp.json.example to repo for team sharing

Gemini CLI:
- Corrected: homebrew formula has broken @google/gemini-cli-core dep;
  recommend npm install instead
- Add: 'gemini mcp list' verification command (no API call needed;
  shows Connected status directly)
- Confirmed mcpServers key format against ~/.gemini/settings.json

VS Code Copilot:
- Confirmed 'servers' key (not 'mcpServers') from actual user mcp.json
- Existing user config had relative paths for openstudio-mcp; fixed to
  absolute paths

Windsurf:
- Confirmed mcpServers key from ~/.codeium/windsurf/mcp_config.json
- Added openstudio-mcp to local Windsurf config for testing

Added .mcp.json to .gitignore (machine-specific absolute paths)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add .env to .gitignore (security: prevent accidental credential commit)
- Fix trace_operation() to yield _NoopSpan on ImportError so it is safe
  to call unconditionally in production (no opentelemetry-api installed)
- Add _NoopSpan class with no-op set_attribute/set_status/record_exception
- Add test_trace_operation_noop_span_on_import_error to cover new path
- Add docs/examples/20_deep_retrofit_package.md (Example 20: ECM package)
- Add tests/test_skill_ecm_package.py (integration test for Example 20)
- Add test_skill_ecm_package.py + test_telemetry.py to CI shard 2 / 5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
create_standard_opaque_material creates wall/roof layers, not glazing.
Replaced incorrect tool call sequence with a correct note pointing to
SimpleGlazing via create_measure, matching actual SDK behaviour.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Telemetry is not dev tooling — it's a runtime feature for operators
who want distributed tracing. Moving it out of [dev] into a dedicated
[telemetry] extra makes the intent clear and lets production deployments
opt in without pulling test dependencies.

  pip install 'openstudio-mcp[telemetry]'

- pyproject.toml: new [telemetry] extra, remove from [dev]
- telemetry.py: update module docstring and install-hint warning message
- README.md: update Tracing section with correct install command

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add llama3.2:3b vs gemma3:4b benchmark results section
  - GSM8K: llama3.2:3b 0.67, gemma3:4b 0.55
  - IFEval prompt-level loose: llama3.2:3b 0.630, gemma3:4b 0.750
  - Key finding: gemma3:4b does not support native tool calling in Ollama
  - Recommended CI model: llama3.2:3b (tool calling + 2 GB fits in 16 GB runner)
- Add Token Overhead by Scenario section with Ollama prompt_eval_count measurements
  - Measured with llama3.2:3b: 36 tokens baseline → 11,763 tokens (142 synthetic tools)
  - Projected real overhead ~29K tokens (204 tokens/tool avg vs 83 tokens/tool synthetic)
- Add Jaeger tracing stack (docker-compose.tracing.yml + otel-collector-config.yaml)
- Update Dockerfile with optional telemetry install (ARG TELEMETRY=0)
- Correct schema size history: names+descriptions-only vs full JSON distinction
- Update per-client overhead table with live measurements (117K chars / ~29K tokens)
- Add reproduce instructions for benchmark (lm_eval + Ollama integration quirks)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line comments, add CHANGELOG

- Add opentelemetry-sdk>=1.20 to [dev] optional-dependencies so the 6
  test_telemetry.py tests that import opentelemetry.sdk.trace don't fail
  on clean CI images (fastmcp only ships opentelemetry-api, not the SDK)
- Convert all 18 test_telemetry.py Validates/Regression docstrings to
  inline # comments per CLAUDE.md convention
- Add [Unreleased] CHANGELOG entry covering telemetry, client docs,
  SECURITY.md, ECM example, and stdout suppression tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- README: fix Docker tracing example to use openstudio-mcp:tracing image
  (openstudio-mcp:dev lacks traceloop-sdk; using it with TRACELOOP_BASE_URL
  set silently disables tracing with a warning)
- README/docs: expand TRACELOOP_TRACE_CONTENT description to warn that the
  default (true) exports tool args and outputs to the OTLP backend; recommend
  false as a safe starting point
- CI: add 'Smoke-install telemetry extra' step to build job that verifies
  pip install openstudio-mcp[telemetry] installs cleanly; catches packaging
  drift without a full TELEMETRY=1 Docker build
- tests: add test_server_main_calls_init_telemetry_before_fastmcp (AST-based)
  to guard the init_telemetry-before-FastMCP startup ordering contract
- tests: add test_traced_decorator_applied_to_promised_operations (AST-based)
  to guard that @Traced() remains on all 7 operations promised in CHANGELOG
- .env.example: commit template with TRACELOOP_* vars and privacy guidance
- CHANGELOG: update test count (18→20), add Fixed entries for docs corrections

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
traceloop-sdk>=0.49.2 requires opentelemetry-sdk>=1.38.0,<2.0.0.
The previous >=1.20 lower bound could allow pip to resolve an
incompatible version in constrained environments, causing install
failures when combining [dev] and [telemetry] extras.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anchapin
Copy link
Copy Markdown
Collaborator Author

Closing in favor of a PR from the upstream branch directly.

@anchapin anchapin closed this Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant