Automated conformance testing of GenAI framework instrumentations against the OpenTelemetry Semantic Conventions for Generative AI.
Test App ──OTLP──▶ Weaver registry live-check ──▶ Results JSON
│
▼
Mock LLM Server (no API keys needed)
- A mock LLM server serves deterministic responses for OpenAI, Anthropic, Google, AWS Bedrock, and Cohere APIs — no API keys or network access required.
- A language-specific test app makes LLM calls (chat, streaming, tool calls, embeddings) through the instrumented client library.
- The instrumentation exports telemetry via OTLP to
Weaver
registry live-check, which validates every span, metric, and log against the official semantic conventions registry and reports coverage statistics.
- Python 3.12+ (test runner and mock server)
- uv (for Python dependency installation)
- Node.js 24+ (for JS/TS tests), Java 17+ (for Java tests), .NET 8+ (for .NET tests)
Run the test runner from an activated virtual environment so Python dependency installs stay isolated and local validation is more predictable.
python -m venv .venv
source .venv/bin/activateOn Windows use:
.venv\Scripts\activate# Format: python run_test.py <lang>-<library>-<ecosystem>
python run_test.py python-openai-otelcontrib
python run_test.py js-anthropic-openllmetry
python run_test.py java-openai-otelcontrib
python run_test.py dotnet-extensions-ai-nativeThe test runner automatically:
- Starts the mock LLM server
- Installs shared Python dependencies for the mock server, and for Python tests also installs the shared test support package plus the selected test's requirements via
uv - Downloads the pinned Weaver release on first use if needed, then starts Weaver for OTLP ingestion and validation
- Discovers and runs the test command
- Writes results to
tests/<lang>/<lib>/results/<eco>/ - Updates
tests/<lang>/<lib>/data-<eco>.jsonwith committed span, event, and metric coverage data
The data-<eco>.json files are checked into the repository and CI verifies they are up
to date. Run the relevant test locally before pushing to keep them in sync.
The conformance dashboard is auto-generated by CI and deployed to
GitHub Pages on merge to main.
It shows per-attribute coverage heatmaps for each span type (inference, embeddings,
tool execution, agents, etc.).
Generate locally:
python generate_dashboard.pyThe main dashboard is generated from checked-in data-<eco>.json files.
The details page (details.html) is always generated so dashboard links resolve for
every known test. When local Weaver results are available, it includes detailed
coverage and violation data. Otherwise, it shows placeholder sections with checked-in
instrumentation metadata until local results are generated.