Changes from 1 commit
42 changes: 42 additions & 0 deletions .github/workflows/ci-release.yml
@@ -163,3 +163,45 @@ jobs:
overwrite: ${{ inputs.overwrite }}
tag: ${{ env.BRANCH }}
repo_token: ${{ secrets.GITHUB_TOKEN }}

- name: Download metrics YAML from latest CI run
# Download the metrics YAML artifact produced by ci-summary-report.yml
# on the main branch. This file documents all Prometheus metrics emitted
# by Jaeger and is consumed by the documentation website.
continue-on-error: true
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
run: |
echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
LATEST_RUN=$(gh run list \
--repo "${GH_REPO}" \
--workflow "CI Orchestrator" \
--branch main \
--status success \
--limit 1 \
--json databaseId \
--jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
Copilot AI · Apr 15, 2026
This downloads the metrics YAML from the latest successful main-branch run, which may not correspond to the commit/tag being released (main could have advanced). To avoid publishing mismatched metrics for a release, consider selecting the CI Orchestrator run for the release commit (e.g., filter by commit SHA) or otherwise ensuring the artifact matches env.BRANCH/the tag.

Suggested change
echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
LATEST_RUN=$(gh run list \
--repo "${GH_REPO}" \
--workflow "CI Orchestrator" \
--branch main \
--status success \
--limit 1 \
--json databaseId \
--jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
if [ "${{ github.event_name }}" = "release" ]; then
TARGET_REF="${{ github.event.release.tag_name }}"
echo "Resolving release tag ${TARGET_REF} to a commit SHA"
TARGET_SHA=$(gh api \
--repo "${GH_REPO}" \
"/repos/${GH_REPO}/commits/${TARGET_REF}" \
--jq '.sha')
else
TARGET_SHA="${GITHUB_SHA}"
fi
if [ -z "$TARGET_SHA" ]; then
echo "::warning::Could not determine the target commit SHA for this release; skipping metrics YAML upload"
exit 0
fi
echo "Downloading jaeger-metrics-yaml artifact from the CI Orchestrator run for commit ${TARGET_SHA}"
LATEST_RUN=$(gh run list \
--repo "${GH_REPO}" \
--workflow "CI Orchestrator" \
--branch main \
--status success \
--limit 100 \
--json databaseId,headSha \
--jq ".[] | select(.headSha == \"${TARGET_SHA}\") | .databaseId" | head -n 1)
if [ -z "$LATEST_RUN" ]; then
echo "::warning::No successful CI Orchestrator run found on main for commit ${TARGET_SHA}; skipping metrics YAML upload"

exit 0
fi
Comment on lines +192 to +217
Copilot AI · Apr 15, 2026
gh run list ... --jq '.[0].databaseId' returns null when there are no matching runs; [ -z "$LATEST_RUN" ] won’t catch that and the next gh run download will fail noisily. Please treat null as “not found” (e.g., check for "null" or use a jq expression that yields an empty string).
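A minimal guard along those lines (a sketch, not code from this PR; the `gh` invocation is replaced by a hard-coded stand-in value):

```shell
# Stand-in for: LATEST_RUN=$(gh run list ... --jq '.[0].databaseId')
# which prints the literal string "null" when the run list is empty.
LATEST_RUN="null"
if [ -z "$LATEST_RUN" ] || [ "$LATEST_RUN" = "null" ]; then
  echo "::warning::no matching CI Orchestrator run found; skipping"
fi
```

Alternatively, jq's alternative operator (`.[0].databaseId // empty`) emits no output at all for a missing value, so the existing `-z` check alone would suffice.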

echo "Using CI Orchestrator run: $LATEST_RUN"
gh run download "$LATEST_RUN" \
--repo "${GH_REPO}" \
--name jaeger-metrics-yaml \
--dir .metrics-export || {
echo "::warning::jaeger-metrics-yaml artifact not found in run $LATEST_RUN; skipping"
exit 0
}
ls -la .metrics-export/
Comment on lines +167 to +226
Copilot AI · Apr 21, 2026
This step uses gh run list / gh run download, which requires the workflow token to have actions: read permission. The job currently sets an explicit permissions: block without actions, so these CLI calls will likely 403 and the metrics YAML will never be downloaded/uploaded. Add actions: read to jobs.publish-release.permissions (or adjust token usage) so the artifact fetch works as intended.
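A sketch of the suggested fix (the job name and the pre-existing permission entries below are assumptions, not taken from the actual workflow file):

```yaml
jobs:
  publish-release:
    permissions:
      contents: write  # assumed pre-existing: upload release assets
      actions: read    # added: lets gh run list / gh run download read workflow runs
```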


- name: Upload metrics YAML as release asset
if: ${{ inputs.dry_run != true }}
uses: svenstaro/upload-release-action@5e35e583720436a2cc5f9682b6f55657101c1ea1 # 2.11.1
continue-on-error: true
with:
file: .metrics-export/jaeger-metrics.yaml
overwrite: ${{ inputs.overwrite }}
tag: ${{ env.BRANCH }}
repo_token: ${{ secrets.GITHUB_TOKEN }}
19 changes: 18 additions & 1 deletion .github/workflows/ci-summary-report.yml
@@ -36,7 +36,7 @@ jobs:
--repo "${{ github.repository }}" --dir .artifacts

- name: Install dependencies
run: python3 -m pip install prometheus-client
run: python3 -m pip install prometheus-client pyyaml

Comment on lines 38 to 40
Copilot AI · Apr 15, 2026
This workflow now installs pyyaml, but .github/workflows/ci-lint-checks.yaml runs python3 -m unittest discover -s scripts/e2e and currently only installs prometheus-client. Since the new unit tests import yaml directly, CI will fail unless pyyaml is also installed there.
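The fix would be a one-word change to the install step in ci-lint-checks.yaml (paraphrased sketch; the step name here is an assumption):

```yaml
- name: Install python deps
  run: python3 -m pip install prometheus-client pyyaml
```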

- name: Compare metrics and generate summary
id: compare-metrics
@@ -45,6 +45,23 @@
shell: bash
run: bash ./scripts/e2e/metrics_summary.sh

- name: Export metrics to YAML for documentation
if: github.ref == 'refs/heads/main'
run: |
python3 ./scripts/e2e/export_metrics_to_yaml.py \
--snapshot-dir .artifacts \
--output .artifacts/jaeger-metrics.yaml || \
echo "::warning::Metrics YAML export failed (non-fatal)"

- name: Upload metrics YAML artifact
if: github.ref == 'refs/heads/main'
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
with:
name: jaeger-metrics-yaml
path: .artifacts/jaeger-metrics.yaml
retention-days: 90
if-no-files-found: ignore

- name: Set up Go for coverage tools
uses: ./.github/actions/setup-go
with:
303 changes: 303 additions & 0 deletions scripts/e2e/export_metrics_to_yaml.py
@@ -0,0 +1,303 @@
# Copyright (c) 2025 The Jaeger Authors.
# SPDX-License-Identifier: Apache-2.0

"""Export Prometheus metrics snapshots to a structured YAML data file.

This script reads raw Prometheus text-format snapshot files (as scraped from
Jaeger's /metrics endpoint by the integration tests) and produces a single
YAML file suitable for consumption by the documentation website.

The output follows a similar pattern to the CLI flags YAML files stored in
``data/cli/{version}/`` in the jaegertracing/documentation repository. The
documentation site can place this output in ``data/metrics/{version}/`` and
render it with a Hugo template, keeping all styling and layout decisions in
the template layer rather than generating HTML or Markdown directly.

Usage::

python3 scripts/e2e/export_metrics_to_yaml.py \\
--snapshot-dir .metrics \\
--output metrics.yaml

The ``--snapshot-dir`` should contain one or more ``metrics_snapshot_*.txt``
files produced by the E2E integration tests.
"""

from __future__ import annotations

import argparse
import os
import re
import sys
from collections import defaultdict
Copilot AI · Apr 15, 2026
defaultdict is imported but never used in this script. Removing unused imports helps keep the script minimal and avoids lint noise if/when static checks are added.

Suggested change
from collections import defaultdict

from typing import Any

# PyYAML is preferred for output (human-readable, block style).
# prometheus_client is used for parsing the Prometheus text format.
try:
import yaml
except ImportError:
yaml = None # type: ignore[assignment]

try:
from prometheus_client.parser import text_string_to_metric_families
except ImportError:
text_string_to_metric_families = None # type: ignore[assignment]


# Labels that carry per-instance identity and should be stripped from the
# documentation output (they add noise without informational value).
_EXCLUDED_LABELS: frozenset[str] = frozenset({
"service_instance_id",
"otel_scope_version",
"otel_scope_schema_url",
})


def parse_snapshot(content: str) -> list[dict[str, Any]]:
"""Parse a Prometheus text-format snapshot into a list of metric dicts.

Each dict has the following structure::

{
"name": "http_server_duration_milliseconds_bucket",
"type": "histogram", # counter | gauge | histogram | summary | untyped
"help": "Duration of HTTP server requests.",
"labels": ["le", "http_method", "http_route"],
}

Metrics are deduplicated by name: the ``labels`` list is the union of all
label keys seen across all samples of that metric. Label *values* are
intentionally omitted -- the documentation page shows which metrics exist
and what dimensions they carry, not the actual runtime values.
yurishkuro marked this conversation as resolved. (Outdated)
"""
if text_string_to_metric_families is None:
raise RuntimeError(
"prometheus_client is required: pip install prometheus-client"
)

metrics_map: dict[str, dict[str, Any]] = {}

for family in text_string_to_metric_families(content):
if family.name in metrics_map:
entry = metrics_map[family.name]
else:
entry = {
"name": family.name,
"type": family.type,
"help": family.documentation or "",
"labels": set(),
}
metrics_map[family.name] = entry

for sample in family.samples:
for label_key in sample.labels:
if label_key not in _EXCLUDED_LABELS:
entry["labels"].add(label_key)

# Convert label sets to sorted lists for deterministic output.
result: list[dict[str, Any]] = []
for entry in metrics_map.values():
result.append({
"name": entry["name"],
"type": entry["type"],
"help": entry["help"],
"labels": sorted(entry["labels"]),
})

result.sort(key=lambda m: m["name"])
return result
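The dedup-and-union behavior this function implements can be illustrated with a stdlib-only toy (the regex parser below is a hypothetical stand-in for prometheus_client, for illustration only):

```python
import re

def toy_parse(text: str) -> dict[str, list[str]]:
    """Toy stand-in for parse_snapshot: union label keys per metric name."""
    metrics: dict[str, set[str]] = {}
    for line in text.splitlines():
        m = re.match(r'(\w+)\{([^}]*)\}\s+\S+', line)
        if not m:
            continue
        # Collect label *keys* only; values are deliberately dropped.
        keys = {kv.split("=")[0] for kv in m.group(2).split(",") if kv}
        metrics.setdefault(m.group(1), set()).update(keys)
    return {name: sorted(ks) for name, ks in metrics.items()}

sample = (
    'http_requests{method="GET",route="/api"} 1\n'
    'http_requests{method="POST",code="500"} 2\n'
)
print(toy_parse(sample))  # {'http_requests': ['code', 'method', 'route']}
```

As in the real function, two samples of the same metric collapse to one entry whose label list is the sorted union of both samples' keys.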


def collect_snapshots(
snapshot_dir: str,
) -> dict[str, list[dict[str, Any]]]:
"""Read all ``metrics_snapshot_*.txt`` files in *snapshot_dir*.

Supports two directory layouts:

1. **Flat**: snapshot files directly in *snapshot_dir*::

snapshot_dir/metrics_snapshot_memory.txt

2. **Artifact subdirectories** (as produced by ``gh run download``)::

snapshot_dir/metrics_snapshot_memory/metrics_snapshot_memory.txt

Returns a dict mapping the backend name (extracted from the filename,
e.g. ``"memory"``, ``"elasticsearch"``) to the parsed metric list.
When duplicate backend names are found (e.g. from matrix variations),
the metrics are merged.
"""
snapshots: dict[str, list[dict[str, Any]]] = {}
file_pattern = re.compile(r"^metrics_snapshot_(.+)\.txt$")

if not os.path.isdir(snapshot_dir):
print(f"Warning: snapshot directory does not exist: {snapshot_dir}",
file=sys.stderr)
return snapshots

snapshot_files: list[tuple[str, str]] = [] # (backend_name, filepath)

for entry in sorted(os.listdir(snapshot_dir)):
entry_path = os.path.join(snapshot_dir, entry)

# Case 1: file directly in snapshot_dir
if os.path.isfile(entry_path):
match = file_pattern.match(entry)
if match:
snapshot_files.append((match.group(1), entry_path))

# Case 2: subdirectory containing the snapshot file
elif os.path.isdir(entry_path):
for sub_entry in sorted(os.listdir(entry_path)):
match = file_pattern.match(sub_entry)
if match:
sub_path = os.path.join(entry_path, sub_entry)
if os.path.isfile(sub_path):
snapshot_files.append((match.group(1), sub_path))

for backend, filepath in snapshot_files:
with open(filepath, "r") as f:
content = f.read()
parsed = parse_snapshot(content)
if backend in snapshots:
# Merge with existing: union of metrics
existing_names = {m["name"] for m in snapshots[backend]}
for metric in parsed:
if metric["name"] not in existing_names:
snapshots[backend].append(metric)
existing_names.add(metric["name"])
yurishkuro marked this conversation as resolved. (Outdated)
snapshots[backend].sort(key=lambda m: m["name"])
else:
snapshots[backend] = parsed

return snapshots


def merge_snapshots(
snapshots: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
"""Merge metrics from multiple backend snapshots into a unified list.

Because different backends exercise different code paths, the full set of
metrics is only available when all snapshots are combined. Merging
unions the label sets and keeps the help/type from the first occurrence.

An extra field ``"sources"`` lists the backend names where each metric
was observed.
"""
merged: dict[str, dict[str, Any]] = {}

for backend, metrics in sorted(snapshots.items()):
for metric in metrics:
name = metric["name"]
if name in merged:
existing = merged[name]
label_set = set(existing["labels"])
label_set.update(metric["labels"])
existing["labels"] = sorted(label_set)
if backend not in existing["sources"]:
existing["sources"].append(backend)
else:
merged[name] = {
"name": metric["name"],
"type": metric["type"],
"help": metric["help"],
"labels": list(metric["labels"]),
"sources": [backend],
}

result = sorted(merged.values(), key=lambda m: m["name"])
return result


def build_yaml_output(
merged_metrics: list[dict[str, Any]],
per_backend: dict[str, list[dict[str, Any]]],
) -> dict[str, Any]:
"""Build the top-level YAML structure.

The output has two sections:

- ``metrics``: the merged list of all metrics across all backends.
- ``backends``: a mapping from backend name to its metric list,
useful for backend-specific documentation pages.

Each metric entry has:

- ``name``: Prometheus metric name.
- ``type``: Prometheus metric type (counter, gauge, histogram, etc.).
- ``help``: the HELP string from the Prometheus exposition.
- ``labels``: sorted list of label keys (excluding instance-specific ones).
- ``sources`` (merged only): list of backend names where this metric
was observed.
"""
backends_summary: dict[str, dict[str, Any]] = {}
for backend_name in sorted(per_backend.keys()):
backend_metrics = per_backend[backend_name]
backends_summary[backend_name] = {
"count": len(backend_metrics),
"metrics": backend_metrics,
}

return {
"total_metrics": len(merged_metrics),
"total_backends": len(per_backend),
"metrics": merged_metrics,
"backends": backends_summary,
}
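For reference, the emitted document then has roughly this shape, in PyYAML block style (the values below are illustrative, not actual Jaeger output):

```yaml
total_metrics: 1
total_backends: 1
metrics:
- name: jaeger_build_info
  type: gauge
  help: Build information.
  labels:
  - version
  sources:
  - memory
backends:
  memory:
    count: 1
    metrics:
    - name: jaeger_build_info
      type: gauge
      help: Build information.
      labels:
      - version
```

Note that `sources` appears only in the merged `metrics` list, matching the docstring above.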


def write_yaml(data: dict[str, Any], output_path: str) -> None:
"""Write data to a YAML file."""
if yaml is None:
raise RuntimeError("PyYAML is required: pip install pyyaml")

with open(output_path, "w") as f:
yaml.dump(
data,
f,
default_flow_style=False,
sort_keys=False,
allow_unicode=True,
)
print(f"Metrics YAML written to {output_path}")


def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(
description="Export Prometheus metrics snapshots to YAML for documentation"
)
parser.add_argument(
"--snapshot-dir",
required=True,
help="Directory containing metrics_snapshot_*.txt files",
)
parser.add_argument(
"--output",
"-o",
default="metrics.yaml",
help="Output YAML file path (default: metrics.yaml)",
)

args = parser.parse_args(argv)

snapshots = collect_snapshots(args.snapshot_dir)
if not snapshots:
print("No metrics snapshot files found.", file=sys.stderr)
return 1

merged = merge_snapshots(snapshots)
output_data = build_yaml_output(merged, snapshots)
write_yaml(output_data, args.output)

print(
f"Exported {output_data['total_metrics']} unique metrics "
f"from {output_data['total_backends']} backend(s)"
)
return 0


if __name__ == "__main__":
sys.exit(main())