Skip to content

trask/genai-otel-conformance

Repository files navigation

GenAI OpenTelemetry Conformance Tests

Automated conformance testing of GenAI framework instrumentations against the OpenTelemetry Semantic Conventions for Generative AI.

How It Works

Test App  ──OTLP──▶  Weaver registry live-check  ──▶  Results JSON
   │
   ▼
Mock LLM Server (no API keys needed)
  1. A mock LLM server serves deterministic responses for OpenAI, Anthropic, Google, AWS Bedrock, and Cohere APIs — no API keys or network access required.
  2. A language-specific test app makes LLM calls (chat, streaming, tool calls, embeddings) through the instrumented client library.
  3. The instrumentation exports telemetry via OTLP to Weaver registry live-check, which validates every span, metric, and log against the official semantic conventions registry and reports coverage statistics.

Quick Start

Prerequisites

  • Python 3.12+ (test runner and mock server)
  • uv (for Python dependency installation)
  • Node.js 24+ (for JS/TS tests), Java 17+ (for Java tests), .NET 8+ (for .NET tests)

Recommended Local Setup

Run the test runner from an activated virtual environment so Python dependency installs stay isolated and local validation is more predictable.

python -m venv .venv
source .venv/bin/activate

On Windows use:

.venv\Scripts\activate

Running a Test

# Format: python run_test.py <lang>-<library>-<ecosystem>
python run_test.py python-openai-otelcontrib
python run_test.py js-anthropic-openllmetry
python run_test.py java-openai-otelcontrib
python run_test.py dotnet-extensions-ai-native

The test runner automatically:

  1. Starts the mock LLM server
  2. Installs shared Python dependencies for the mock server, and for Python tests also installs the shared test support package plus the selected test's requirements via uv
  3. Downloads the pinned Weaver release on first use if needed, then starts Weaver for OTLP ingestion and validation
  4. Discovers and runs the test command
  5. Writes results to tests/<lang>/<lib>/results/<eco>/
  6. Updates tests/<lang>/<lib>/data-<eco>.json with committed span, event, and metric coverage data

The data-<eco>.json files are checked into the repository and CI verifies they are up to date. Run the relevant test locally before pushing to keep them in sync.

Dashboard

The conformance dashboard is auto-generated by CI and deployed to GitHub Pages on merge to main. It shows per-attribute coverage heatmaps for each span type (inference, embeddings, tool execution, agents, etc.).

Generate locally:

python generate_dashboard.py

The main dashboard is generated from checked-in data-<eco>.json files. The details page (details.html) is always generated so dashboard links resolve for every known test. When local Weaver results are available, it includes detailed coverage and violation data. Otherwise, it shows placeholder sections with checked-in instrumentation metadata until local results are generated.

Releases

No releases published

Packages

 
 
 

Contributors