GenAI OpenTelemetry Conformance Tests

Automated conformance testing of GenAI framework instrumentations against the OpenTelemetry Semantic Conventions for Generative AI.

How It Works

Test App  ──OTLP──▶  Weaver registry live-check  ──▶  Results JSON
   │
   ▼
Mock LLM Server (no API keys needed)

A mock LLM server serves deterministic responses for OpenAI, Anthropic, Google, AWS Bedrock, and Cohere APIs — no API keys or network access required.
A language-specific test app makes LLM calls (chat, streaming, tool calls, embeddings) through the instrumented client library.
The instrumentation exports telemetry via OTLP to Weaver registry live-check, which validates every span, metric, and log against the official semantic conventions registry and reports coverage statistics.

Quick Start

Prerequisites

Python 3.12+ (test runner and mock server)
uv (for Python dependency installation)
Node.js 24+ (for JS/TS tests), Java 17+ (for Java tests), .NET 8+ (for .NET tests)

Recommended Local Setup

Run the test runner from an activated virtual environment so Python dependency installs stay isolated and local validation is more predictable.

python -m venv .venv
source .venv/bin/activate

On Windows use:

.venv\Scripts\activate

Running a Test

# Format: python run_test.py <lang>-<library>-<ecosystem>
python run_test.py python-openai-otelcontrib
python run_test.py js-anthropic-openllmetry
python run_test.py java-openai-otelcontrib
python run_test.py dotnet-extensions-ai-native

The test runner automatically:

Starts the mock LLM server
Installs shared Python dependencies for the mock server, and for Python tests also installs the shared test support package plus the selected test's requirements via uv
Downloads the pinned Weaver release on first use if needed, then starts Weaver for OTLP ingestion and validation
Discovers and runs the test command
Writes results to tests/<lang>/<lib>/results/<eco>/
Updates tests/<lang>/<lib>/data-<eco>.json with committed span, event, and metric coverage data

The data-<eco>.json files are checked into the repository and CI verifies they are up to date. Run the relevant test locally before pushing to keep them in sync.

Dashboard

The conformance dashboard is auto-generated by CI and deployed to GitHub Pages on merge to main. It shows per-attribute coverage heatmaps for each span type (inference, embeddings, tool execution, agents, etc.).

Generate locally:

python generate_dashboard.py

The main dashboard is generated from checked-in data-<eco>.json files. The details page (details.html) is always generated so dashboard links resolve for every known test. When local Weaver results are available, it includes detailed coverage and violation data. Otherwise, it shows placeholder sections with checked-in instrumentation metadata until local results are generated.

Name		Name	Last commit message	Last commit date
Latest commit History 162 Commits
.github		.github
genai_otel_conformance		genai_otel_conformance
templates		templates
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
generate_dashboard.py		generate_dashboard.py
run_test.py		run_test.py
versions.env		versions.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAI OpenTelemetry Conformance Tests

How It Works

Quick Start

Prerequisites

Recommended Local Setup

Running a Test

Dashboard

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GenAI OpenTelemetry Conformance Tests

How It Works

Quick Start

Prerequisites

Recommended Local Setup

Running a Test

Dashboard

About

Resources

License

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages