
feat: add metrics YAML export for documentation website #8385

Open
yosofbadr wants to merge 3 commits into jaegertracing:main from yosofbadr:feat/metrics-compat-validation


@yosofbadr yosofbadr commented Apr 15, 2026

Summary

Addresses the final task in #6278: implement a way to incorporate the metrics report into the documentation website.

This PR adds the tooling and CI integration to export Prometheus metrics snapshots to a structured YAML data file, following the same pattern as the CLI flags YAML files that were previously used by the documentation website (data/cli/{version}/).

Changes

  • scripts/e2e/export_metrics_to_yaml.py: New Python script that parses raw Prometheus text-format snapshot files (as scraped from Jaeger's /metrics endpoint during E2E integration tests) and produces a single YAML file containing:

    • Metric names, types (counter/gauge/histogram/summary), and HELP strings
    • Label keys (with instance-specific labels like service_instance_id excluded)
    • A merged view across all backends with source tracking
    • Per-backend metric lists for backend-specific documentation
  • scripts/e2e/export_metrics_to_yaml_test.py: 26 unit tests covering parsing, collection (both flat and gh run download subdirectory layouts), merging, YAML serialization, and the full end-to-end pipeline.

  • .github/workflows/ci-summary-report.yml: On main branch CI runs, exports the combined metrics as a YAML artifact (jaeger-metrics-yaml, 90-day retention) so it persists beyond the default 7-day artifact window.

  • .github/workflows/ci-release.yml: During releases, downloads the metrics YAML artifact from the latest successful CI Orchestrator run on main and uploads it as a release asset (jaeger-metrics.yaml). The documentation repo can then fetch this asset during its release process and place it in data/metrics/{version}/ for Hugo template rendering.
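The parsing step described for export_metrics_to_yaml.py can be pictured with a minimal sketch like the one below. It is not the PR's implementation: the function name `parse_snapshot`, the regular expressions, and the contents of `EXCLUDED_LABELS` beyond `service_instance_id` are assumptions made for illustration, and only the standard library is used.

```python
import re

# Labels treated as instance-specific and dropped. Only service_instance_id
# is named in the PR description; the set itself is an assumption.
EXCLUDED_LABELS = {"service_instance_id"}

_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+\S+")
_LABEL_KEY_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="(?:[^"\\]|\\.)*"')
_SUFFIXES = ("_bucket", "_sum", "_count")


def parse_snapshot(text):
    """Return {metric_name: {"type", "help", "labels"}} for one snapshot."""
    metrics = {}

    def entry(name):
        return metrics.setdefault(
            name, {"type": "untyped", "help": "", "labels": set()}
        )

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            parts = line.split(" ", 3)
            if len(parts) >= 3:
                entry(parts[2])["help"] = parts[3] if len(parts) > 3 else ""
        elif line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            if len(parts) == 4:
                entry(parts[2])["type"] = parts[3]
        elif line.startswith("#"):
            continue  # any other comment line
        else:
            match = _SAMPLE_RE.match(line)
            if not match:
                continue
            name, labels = match.group(1), match.group(2) or ""
            # Fold histogram/summary series back onto their declared family.
            for suffix in _SUFFIXES:
                base = name[: -len(suffix)]
                if name.endswith(suffix) and base in metrics:
                    name = base
                    break
            keys = set(_LABEL_KEY_RE.findall(labels))
            entry(name)["labels"] |= keys - EXCLUDED_LABELS

    # Sort names and label keys so repeated runs give identical output.
    return {
        name: {**info, "labels": sorted(info["labels"])}
        for name, info in sorted(metrics.items())
    }
```

Note that this sketch keeps series-level labels such as `le` on histogram buckets; whether the real script filters those out is not stated in the PR.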

How it fits together

E2E tests (scrapeMetrics) -> metrics_snapshot_*.txt artifacts
                                    |
                          ci-summary-report.yml
                                    |
                     export_metrics_to_yaml.py
                                    |
                        jaeger-metrics.yaml artifact
                                    |
                            ci-release.yml
                                    |
                    jaeger-metrics.yaml release asset
                                    |
                    documentation repo release process
                                    |
                     data/metrics/{version}/metrics.yaml
                                    |
                         Hugo template rendering
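
The first hop in this flow, from snapshot artifacts to the export script, depends on locating the metrics_snapshot_*.txt files whether they sit directly in one directory or in per-artifact subdirectories created by gh run download. A rough sketch of that lookup follows. The function name `find_snapshots` and the rule that the backend name is the filename with the `metrics_snapshot_` prefix removed are assumptions for illustration, not details confirmed by the PR.

```python
from pathlib import Path

PREFIX = "metrics_snapshot_"


def find_snapshots(root):
    """Map a backend name to the snapshot files found for it under root."""
    found = {}
    # rglob matches files directly in root as well as files nested in
    # subdirectories, so both layouts are handled by the same loop.
    for path in sorted(Path(root).rglob(f"{PREFIX}*.txt")):
        backend = path.stem[len(PREFIX):]  # hypothetical naming rule
        found.setdefault(backend, []).append(path)
    return found
```

Returning a list per backend, rather than a single path, leaves room for the same backend to appear in more than one artifact directory.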

Test plan

  • All 26 new unit tests pass locally
  • All 35 existing Python tests (compare_metrics, metrics_summary) still pass
  • Verify CI pipeline runs the export step on main branch
  • Verify release workflow downloads and uploads the metrics YAML
  • Verify the YAML output is suitable for Hugo template consumption in the documentation repo

Closes #6278

Add a script and CI pipeline integration to export Prometheus metrics
snapshots to a structured YAML data file suitable for consumption by
the jaegertracing/documentation website.

This addresses the final task in jaegertracing#6278: incorporating the metrics report
into the documentation website. The approach follows the same pattern as
CLI flags (YAML data files rendered by Hugo templates):

- scripts/e2e/export_metrics_to_yaml.py: Parses raw Prometheus
  text-format snapshot files from E2E integration tests and produces a
  single YAML file with metric names, types, help strings, labels, and
  backend sources.

- scripts/e2e/export_metrics_to_yaml_test.py: 26 unit tests covering
  parsing, collection (flat and artifact subdirectory layouts), merging,
  YAML output, and the full pipeline.

- ci-summary-report.yml: On main branch runs, exports the combined
  metrics YAML and uploads it as an artifact (90-day retention) so it
  is available for release asset upload.

- ci-release.yml: Downloads the metrics YAML artifact from the latest
  CI Orchestrator run and uploads it as a release asset
  (jaeger-metrics.yaml), so the documentation repo can fetch it during
  its release process and place it in data/metrics/{version}/.

Signed-off-by: Yosof Badr <yosof@hey.com>
Signed-off-by: Yosof Badr <23705518+YosofBadr@users.noreply.github.com>
@yosofbadr yosofbadr marked this pull request as ready for review April 15, 2026 20:21
@yosofbadr yosofbadr requested a review from a team as a code owner April 15, 2026 20:21
Copilot AI review requested due to automatic review settings April 15, 2026 20:21
@dosubot dosubot Bot added the enhancement label Apr 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds tooling and CI/release integration to export Jaeger’s Prometheus /metrics snapshots (captured during E2E tests) into a structured YAML file intended for consumption by the documentation website.

Changes:

  • Added a Python exporter that parses Prometheus text-format snapshots, merges metrics across backends, and emits a single YAML document.
  • Added unit tests covering parsing, snapshot collection layouts, merging, and YAML serialization.
  • Updated CI Summary Report to generate and retain a jaeger-metrics-yaml artifact on main, and updated the release workflow to attach that YAML as a release asset.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File | Description
scripts/e2e/export_metrics_to_yaml.py | New exporter script: parse snapshots, merge across backends, write deterministic YAML output.
scripts/e2e/export_metrics_to_yaml_test.py | New unittest suite validating parsing/collection/merge/YAML output end-to-end.
.github/workflows/ci-summary-report.yml | Installs YAML dependency and (on main) generates + uploads a long-retention YAML artifact.
.github/workflows/ci-release.yml | Downloads the metrics YAML artifact from CI and uploads it as a release asset.


Comment thread .github/workflows/ci-release.yml Outdated
Comment on lines +176 to +186
echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
LATEST_RUN=$(gh run list \
  --repo "${GH_REPO}" \
  --workflow "CI Orchestrator" \
  --branch main \
  --status success \
  --limit 1 \
  --json databaseId \
  --jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
  echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
Copilot AI Apr 15, 2026

This downloads the metrics YAML from the latest successful main-branch run, which may not correspond to the commit/tag being released (main could have advanced). To avoid publishing mismatched metrics for a release, consider selecting the CI Orchestrator run for the release commit (e.g., filter by commit SHA) or otherwise ensuring the artifact matches env.BRANCH/the tag.

Suggested change
- echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
- LATEST_RUN=$(gh run list \
-   --repo "${GH_REPO}" \
-   --workflow "CI Orchestrator" \
-   --branch main \
-   --status success \
-   --limit 1 \
-   --json databaseId \
-   --jq '.[0].databaseId')
- if [ -z "$LATEST_RUN" ]; then
-   echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
+ if [ "${{ github.event_name }}" = "release" ]; then
+   TARGET_REF="${{ github.event.release.tag_name }}"
+   echo "Resolving release tag ${TARGET_REF} to a commit SHA"
+   TARGET_SHA=$(gh api \
+     --repo "${GH_REPO}" \
+     "/repos/${GH_REPO}/commits/${TARGET_REF}" \
+     --jq '.sha')
+ else
+   TARGET_SHA="${GITHUB_SHA}"
+ fi
+ if [ -z "$TARGET_SHA" ]; then
+   echo "::warning::Could not determine the target commit SHA for this release; skipping metrics YAML upload"
+   exit 0
+ fi
+ echo "Downloading jaeger-metrics-yaml artifact from the CI Orchestrator run for commit ${TARGET_SHA}"
+ LATEST_RUN=$(gh run list \
+   --repo "${GH_REPO}" \
+   --workflow "CI Orchestrator" \
+   --branch main \
+   --status success \
+   --limit 100 \
+   --json databaseId,headSha \
+   --jq ".[] | select(.headSha == \"${TARGET_SHA}\") | .databaseId" | head -n 1)
+ if [ -z "$LATEST_RUN" ]; then
+   echo "::warning::No successful CI Orchestrator run found on main for commit ${TARGET_SHA}; skipping metrics YAML upload"
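
The selection logic in this suggestion, picking the run whose head commit matches the release commit, can be shown in isolation with a small Python sketch. It assumes the input is JSON shaped like the output of gh run list with --json databaseId,headSha; the function name `pick_run_id` is hypothetical and the workflow itself does this in shell with jq.

```python
import json


def pick_run_id(runs_json, target_sha):
    """Return the databaseId of the first run whose headSha matches, or ""."""
    for run in json.loads(runs_json):
        if run.get("headSha") == target_sha:
            return str(run["databaseId"])
    # No run for this commit: the caller decides whether to skip or fall back.
    return ""
```

Returning an empty string for "not found" keeps the result compatible with a shell-style emptiness check.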

Comment thread scripts/e2e/export_metrics_to_yaml.py Outdated
Comment thread scripts/e2e/export_metrics_to_yaml.py Outdated
import os
import re
import sys
from collections import defaultdict
Copilot AI Apr 15, 2026

defaultdict is imported but never used in this script. Removing unused imports helps keep the script minimal and avoids lint noise if/when static checks are added.

Suggested change
- from collections import defaultdict

Comment thread scripts/e2e/export_metrics_to_yaml.py Outdated
Comment on lines 38 to 40
  - name: Install dependencies
-   run: python3 -m pip install prometheus-client
+   run: python3 -m pip install prometheus-client pyyaml

Copilot AI Apr 15, 2026

This workflow now installs pyyaml, but .github/workflows/ci-lint-checks.yaml runs python3 -m unittest discover -s scripts/e2e and currently only installs prometheus-client. Since the new unit tests import yaml directly, CI will fail unless pyyaml is also installed there.

Comment on lines +177 to +188
LATEST_RUN=$(gh run list \
  --repo "${GH_REPO}" \
  --workflow "CI Orchestrator" \
  --branch main \
  --status success \
  --limit 1 \
  --json databaseId \
  --jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
  echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
  exit 0
fi
Copilot AI Apr 15, 2026

gh run list ... --jq '.[0].databaseId' returns null when there are no matching runs; [ -z "$LATEST_RUN" ] won’t catch that and the next gh run download will fail noisily. Please treat null as “not found” (e.g., check for "null" or use a jq expression that yields an empty string).
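
The guard this comment asks for amounts to treating both an empty string and the literal text null as "no run found". A minimal Python illustration of that normalization is below; the helper name `normalize_run_id` is hypothetical, and the workflow would express the same check in shell.

```python
def normalize_run_id(raw):
    """Map the ways "no run" can appear in CLI output to an empty string."""
    value = raw.strip()
    return "" if value in ("", "null") else value
```

With this in place a single emptiness test on the result covers both cases.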

@yurishkuro
Member

Why does any of that need to be in this repository? The metrics are already captured as Prometheus-compatible plain text in workflow artifacts; reading them is a job for the documentation repository.

@github-actions github-actions Bot added the waiting-for-author PR is waiting for author to respond to maintainer's comments label Apr 15, 2026
Signed-off-by: Yosof Badr <23705518+YosofBadr@users.noreply.github.com>
@github-actions github-actions Bot removed the waiting-for-author PR is waiting for author to respond to maintainer's comments label Apr 16, 2026
@Amaan729

Hi! I've opened jaegertracing/documentation#1083 which sets up the documentation side of this — the data/metrics/{version}/ YAML structure, Hugo shortcode for rendering the metrics reference table, and a conversion script. It's designed to receive exactly the YAML format your export script produces. Happy to coordinate and align the two PRs.

- ci-release.yml: resolve release tag to commit SHA and filter the CI
  Orchestrator run by that commit, so the metrics YAML shipped with a
  release matches the release commit (not latest main). Falls back to
  latest-on-main when no per-commit run exists. All user-controlled
  inputs passed via env to avoid GHA injection.
- export_metrics_to_yaml.py: merge per-metric when collect_snapshots
  sees duplicate backend names, so label keys from later snapshots are
  not dropped.
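
The per-metric merge described in the second bullet can be sketched as follows. This is an illustration under assumptions, not the PR's code: the function name `merge_backend_snapshot` and the dictionary shape (type, help, labels per metric) are hypothetical. The point is that a second snapshot for the same backend adds its label keys to the existing entry instead of replacing it.

```python
def merge_backend_snapshot(collected, backend, parsed):
    """Merge parsed metrics into collected[backend] without dropping label keys."""
    existing = collected.setdefault(backend, {})
    for name, info in parsed.items():
        if name not in existing:
            existing[name] = {
                "type": info["type"],
                "help": info["help"],
                "labels": sorted(set(info["labels"])),
            }
            continue
        current = existing[name]
        # Union the label keys so a later snapshot only ever adds keys.
        current["labels"] = sorted(set(current["labels"]) | set(info["labels"]))
        # Keep the first non-empty help string seen for this metric.
        if not current["help"]:
            current["help"] = info["help"]
    return collected
```

Overwriting `collected[backend]` with the later snapshot would be simpler, but it is exactly the behavior that loses label keys when two snapshots share a backend name.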
Copilot AI review requested due to automatic review settings April 21, 2026 19:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.



Comment on lines +167 to +226
- name: Download metrics YAML for release commit
  # Download the metrics YAML artifact produced by ci-summary-report.yml
  # for the *release commit specifically* (not "latest on main"), so the
  # documentation published alongside the release matches the code shipped
  # in that release. Falls back to latest main if no run is found for the
  # release commit (e.g. dry-run from a branch that hasn't hit main yet).
  continue-on-error: true
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    GH_REPO: ${{ github.repository }}
    EVENT_NAME: ${{ github.event_name }}
    RELEASE_TAG: ${{ github.event.release.tag_name }}
    HEAD_SHA: ${{ github.sha }}
  run: |
    if [ "$EVENT_NAME" = "release" ]; then
      echo "Resolving release tag ${RELEASE_TAG} to a commit SHA"
      TARGET_SHA=$(gh api \
        --repo "${GH_REPO}" \
        "/repos/${GH_REPO}/commits/${RELEASE_TAG}" \
        --jq '.sha')
    else
      TARGET_SHA="${HEAD_SHA}"
    fi
    echo "Target commit: ${TARGET_SHA}"

    LATEST_RUN=$(gh run list \
      --repo "${GH_REPO}" \
      --workflow "CI Orchestrator" \
      --commit "${TARGET_SHA}" \
      --status success \
      --limit 1 \
      --json databaseId \
      --jq '.[0].databaseId // ""')

    if [ -z "$LATEST_RUN" ] || [ "$LATEST_RUN" = "null" ]; then
      echo "::warning::No successful CI Orchestrator run found for commit ${TARGET_SHA}; \
    falling back to latest successful run on main"
      LATEST_RUN=$(gh run list \
        --repo "${GH_REPO}" \
        --workflow "CI Orchestrator" \
        --branch main \
        --status success \
        --limit 1 \
        --json databaseId \
        --jq '.[0].databaseId // ""')
    fi

    if [ -z "$LATEST_RUN" ] || [ "$LATEST_RUN" = "null" ]; then
      echo "::warning::No successful CI Orchestrator run found; skipping metrics YAML upload"
      exit 0
    fi
    echo "Using CI Orchestrator run: $LATEST_RUN"
    gh run download "$LATEST_RUN" \
      --repo "${GH_REPO}" \
      --name jaeger-metrics-yaml \
      --dir .metrics-export || {
        echo "::warning::jaeger-metrics-yaml artifact not found in run $LATEST_RUN; skipping"
        exit 0
      }
    ls -la .metrics-export/
Copilot AI Apr 21, 2026

This step uses gh run list / gh run download, which requires the workflow token to have actions: read permission. The job currently sets an explicit permissions: block without actions, so these CLI calls will likely 403 and the metrics YAML will never be downloaded/uploaded. Add actions: read to jobs.publish-release.permissions (or adjust token usage) so the artifact fetch works as intended.

@yurishkuro
Member

Repeating my question: why does any of that need to be in this repository? The metrics are already captured as Prometheus-compatible plain text in workflow artifacts; reading them is a job for the documentation repository.

@github-actions github-actions Bot added the waiting-for-author PR is waiting for author to respond to maintainer's comments label Apr 21, 2026

Labels

enhancement waiting-for-author PR is waiting for author to respond to maintainer's comments


Development

Successfully merging this pull request may close these issues.

Implement framework to validate backwards compatibility of metrics

4 participants