feat: add metrics YAML export for documentation website #8385
yosofbadr wants to merge 3 commits into jaegertracing:main from
Conversation
Add a script and CI pipeline integration to export Prometheus metrics snapshots to a structured YAML data file suitable for consumption by the jaegertracing/documentation website. This addresses the final task in jaegertracing#6278: incorporating the metrics report into the documentation website. The approach follows the same pattern as CLI flags (YAML data files rendered by Hugo templates):

- scripts/e2e/export_metrics_to_yaml.py: Parses raw Prometheus text-format snapshot files from E2E integration tests and produces a single YAML file with metric names, types, help strings, labels, and backend sources.
- scripts/e2e/export_metrics_to_yaml_test.py: 26 unit tests covering parsing, collection (flat and artifact subdirectory layouts), merging, YAML output, and the full pipeline.
- ci-summary-report.yml: On main branch runs, exports the combined metrics YAML and uploads it as an artifact (90-day retention) so it is available for release asset upload.
- ci-release.yml: Downloads the metrics YAML artifact from the latest CI Orchestrator run and uploads it as a release asset (jaeger-metrics.yaml), so the documentation repo can fetch it during its release process and place it in data/metrics/{version}/.

Signed-off-by: Yosof Badr <yosof@hey.com>
Signed-off-by: Yosof Badr <23705518+YosofBadr@users.noreply.github.com>
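A hypothetical fragment of the resulting data file is shown below. The field names and the example metric are illustrative only; the actual schema is whatever export_metrics_to_yaml.py emits.

```yaml
# Illustrative shape only; real keys are defined by export_metrics_to_yaml.py.
metrics:
  - name: jaeger_collector_spans_received_total
    type: counter
    help: Number of spans received by the collector
    labels: [svc, transport]
    backends: [memory, elasticsearch]
```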
Pull request overview
This PR adds tooling and CI/release integration to export Jaeger’s Prometheus /metrics snapshots (captured during E2E tests) into a structured YAML file intended for consumption by the documentation website.
Changes:
- Added a Python exporter that parses Prometheus text-format snapshots, merges metrics across backends, and emits a single YAML document.
- Added unit tests covering parsing, snapshot collection layouts, merging, and YAML serialization.
- Updated CI Summary Report to generate and retain a `jaeger-metrics-yaml` artifact on `main`, and updated the release workflow to attach that YAML as a release asset.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/e2e/export_metrics_to_yaml.py | New exporter script: parse snapshots, merge across backends, write deterministic YAML output. |
| scripts/e2e/export_metrics_to_yaml_test.py | New unittest suite validating parsing/collection/merge/YAML output end-to-end. |
| .github/workflows/ci-summary-report.yml | Installs YAML dependency and (on main) generates + uploads a long-retention YAML artifact. |
| .github/workflows/ci-release.yml | Downloads the metrics YAML artifact from CI and uploads it as a release asset. |
```shell
echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
LATEST_RUN=$(gh run list \
  --repo "${GH_REPO}" \
  --workflow "CI Orchestrator" \
  --branch main \
  --status success \
  --limit 1 \
  --json databaseId \
  --jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
  echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
```
This downloads the metrics YAML from the latest successful main-branch run, which may not correspond to the commit/tag being released (main could have advanced). To avoid publishing mismatched metrics for a release, consider selecting the CI Orchestrator run for the release commit (e.g., filter by commit SHA) or otherwise ensuring the artifact matches env.BRANCH/the tag.
```diff
- echo "Downloading jaeger-metrics-yaml artifact from latest CI Orchestrator run on main"
- LATEST_RUN=$(gh run list \
-   --repo "${GH_REPO}" \
-   --workflow "CI Orchestrator" \
-   --branch main \
-   --status success \
-   --limit 1 \
-   --json databaseId \
-   --jq '.[0].databaseId')
- if [ -z "$LATEST_RUN" ]; then
-   echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
+ if [ "${{ github.event_name }}" = "release" ]; then
+   TARGET_REF="${{ github.event.release.tag_name }}"
+   echo "Resolving release tag ${TARGET_REF} to a commit SHA"
+   TARGET_SHA=$(gh api \
+     --repo "${GH_REPO}" \
+     "/repos/${GH_REPO}/commits/${TARGET_REF}" \
+     --jq '.sha')
+ else
+   TARGET_SHA="${GITHUB_SHA}"
+ fi
+ if [ -z "$TARGET_SHA" ]; then
+   echo "::warning::Could not determine the target commit SHA for this release; skipping metrics YAML upload"
+   exit 0
+ fi
+ echo "Downloading jaeger-metrics-yaml artifact from the CI Orchestrator run for commit ${TARGET_SHA}"
+ LATEST_RUN=$(gh run list \
+   --repo "${GH_REPO}" \
+   --workflow "CI Orchestrator" \
+   --branch main \
+   --status success \
+   --limit 100 \
+   --json databaseId,headSha \
+   --jq ".[] | select(.headSha == \"${TARGET_SHA}\") | .databaseId" | head -n 1)
+ if [ -z "$LATEST_RUN" ]; then
+   echo "::warning::No successful CI Orchestrator run found on main for commit ${TARGET_SHA}; skipping metrics YAML upload"
```
```python
import os
import re
import sys
from collections import defaultdict
```
defaultdict is imported but never used in this script. Removing unused imports helps keep the script minimal and avoids lint noise if/when static checks are added.
```diff
- from collections import defaultdict
```
```diff
  - name: Install dependencies
-   run: python3 -m pip install prometheus-client
+   run: python3 -m pip install prometheus-client pyyaml
```
This workflow now installs pyyaml, but `.github/workflows/ci-lint-checks.yaml` runs `python3 -m unittest discover -s scripts/e2e` and currently only installs prometheus-client. Since the new unit tests import `yaml` directly, CI will fail unless pyyaml is also installed there.
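The fix implied by the comment is a one-line dependency change in ci-lint-checks.yaml; the step name and surrounding layout below are assumed, not copied from the actual workflow:

```yaml
# Hypothetical step in .github/workflows/ci-lint-checks.yaml
- name: Install test dependencies
  run: python3 -m pip install prometheus-client pyyaml
```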
```shell
LATEST_RUN=$(gh run list \
  --repo "${GH_REPO}" \
  --workflow "CI Orchestrator" \
  --branch main \
  --status success \
  --limit 1 \
  --json databaseId \
  --jq '.[0].databaseId')
if [ -z "$LATEST_RUN" ]; then
  echo "::warning::No successful CI Orchestrator run found on main; skipping metrics YAML upload"
  exit 0
fi
```
gh run list ... --jq '.[0].databaseId' returns null when there are no matching runs; [ -z "$LATEST_RUN" ] won’t catch that and the next gh run download will fail noisily. Please treat null as “not found” (e.g., check for "null" or use a jq expression that yields an empty string).
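The distinction the comment points out can be sketched in a few lines; the helper below is illustrative, not part of the PR:

```python
def run_id_or_none(raw: str):
    """Normalize output of `gh run list --jq '.[0].databaseId'`.

    When no runs match, jq prints the literal string "null", which a plain
    empty-string test (`[ -z "$LATEST_RUN" ]`) does not catch. Treat both
    an empty result and "null" as "not found".
    """
    value = raw.strip()
    if not value or value == "null":
        return None
    return value
```

Equivalently, the shell side can sidestep the problem with a jq alternative operator such as `.[0].databaseId // ""`, which the later revision of this PR adopts.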
Why does any of that need to be in this repository? The metrics are already captured as Prometheus-compatible plain text as workflow artifacts, reading them is the job for documentation.
Signed-off-by: Yosof Badr <23705518+YosofBadr@users.noreply.github.com>
Hi! I've opened jaegertracing/documentation#1083 which sets up the documentation side of this — the
- ci-release.yml: resolve release tag to commit SHA and filter the CI Orchestrator run by that commit, so the metrics YAML shipped with a release matches the release commit (not latest main). Falls back to latest-on-main when no per-commit run exists. All user-controlled inputs passed via env to avoid GHA injection.
- export_metrics_to_yaml.py: merge per-metric when collect_snapshots sees duplicate backend names, so label keys from later snapshots are not dropped.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
```yaml
- name: Download metrics YAML for release commit
  # Download the metrics YAML artifact produced by ci-summary-report.yml
  # for the *release commit specifically* (not "latest on main"), so the
  # documentation published alongside the release matches the code shipped
  # in that release. Falls back to latest main if no run is found for the
  # release commit (e.g. dry-run from a branch that hasn't hit main yet).
  continue-on-error: true
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    GH_REPO: ${{ github.repository }}
    EVENT_NAME: ${{ github.event_name }}
    RELEASE_TAG: ${{ github.event.release.tag_name }}
    HEAD_SHA: ${{ github.sha }}
  run: |
    if [ "$EVENT_NAME" = "release" ]; then
      echo "Resolving release tag ${RELEASE_TAG} to a commit SHA"
      TARGET_SHA=$(gh api \
        --repo "${GH_REPO}" \
        "/repos/${GH_REPO}/commits/${RELEASE_TAG}" \
        --jq '.sha')
    else
      TARGET_SHA="${HEAD_SHA}"
    fi
    echo "Target commit: ${TARGET_SHA}"

    LATEST_RUN=$(gh run list \
      --repo "${GH_REPO}" \
      --workflow "CI Orchestrator" \
      --commit "${TARGET_SHA}" \
      --status success \
      --limit 1 \
      --json databaseId \
      --jq '.[0].databaseId // ""')

    if [ -z "$LATEST_RUN" ] || [ "$LATEST_RUN" = "null" ]; then
      echo "::warning::No successful CI Orchestrator run found for commit ${TARGET_SHA}; \
    falling back to latest successful run on main"
      LATEST_RUN=$(gh run list \
        --repo "${GH_REPO}" \
        --workflow "CI Orchestrator" \
        --branch main \
        --status success \
        --limit 1 \
        --json databaseId \
        --jq '.[0].databaseId // ""')
    fi

    if [ -z "$LATEST_RUN" ] || [ "$LATEST_RUN" = "null" ]; then
      echo "::warning::No successful CI Orchestrator run found; skipping metrics YAML upload"
      exit 0
    fi
    echo "Using CI Orchestrator run: $LATEST_RUN"
    gh run download "$LATEST_RUN" \
      --repo "${GH_REPO}" \
      --name jaeger-metrics-yaml \
      --dir .metrics-export || {
      echo "::warning::jaeger-metrics-yaml artifact not found in run $LATEST_RUN; skipping"
      exit 0
    }
    ls -la .metrics-export/
```
This step uses `gh run list` / `gh run download`, which require the workflow token to have `actions: read` permission. The job currently sets an explicit `permissions:` block without `actions`, so these CLI calls will likely 403 and the metrics YAML will never be downloaded/uploaded. Add `actions: read` to `jobs.publish-release.permissions` (or adjust token usage) so the artifact fetch works as intended.
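A minimal sketch of the permissions block the comment asks for. The job name `publish-release` comes from the comment; the `contents: write` entry is assumed (typical for uploading release assets), not quoted from the actual workflow:

```yaml
jobs:
  publish-release:
    permissions:
      contents: write   # upload release assets (assumed existing entry)
      actions: read     # required by `gh run list` / `gh run download`
```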
repeating my question - Why does any of that need to be in this repository? The metrics are already captured as Prometheus-compatible plain text as workflow artifacts, reading them is the job for documentation.
Summary
Addresses the final task in #6278: implement a way to incorporate the metrics report into documentation website.
This PR adds the tooling and CI integration to export Prometheus metrics snapshots to a structured YAML data file, following the same pattern as the CLI flags YAML files that were previously used by the documentation website (data/cli/{version}/).

Changes

- scripts/e2e/export_metrics_to_yaml.py: New Python script that parses raw Prometheus text-format snapshot files (as scraped from Jaeger's /metrics endpoint during E2E integration tests) and produces a single YAML file containing metric names, types, help strings, labels (service_instance_id excluded), and backend sources.
- scripts/e2e/export_metrics_to_yaml_test.py: 26 unit tests covering parsing, collection (both flat and gh run download subdirectory layouts), merging, YAML serialization, and the full end-to-end pipeline.
- .github/workflows/ci-summary-report.yml: On main branch CI runs, exports the combined metrics as a YAML artifact (jaeger-metrics-yaml, 90-day retention) so it persists beyond the default 7-day artifact window.
- .github/workflows/ci-release.yml: During releases, downloads the metrics YAML artifact from the latest successful CI Orchestrator run on main and uploads it as a release asset (jaeger-metrics.yaml). The documentation repo can then fetch this asset during its release process and place it in data/metrics/{version}/ for Hugo template rendering.

How it fits together
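The parsing step described above can be sketched as follows. This is an illustrative simplification, not the script's actual API: function and field names are invented, and it splits label pairs on commas, so it would mishandle label values that themselves contain commas.

```python
"""Illustrative sketch: Prometheus text format -> YAML-friendly dict."""
import re

HELP_RE = re.compile(r"^# HELP (\S+) (.*)$")
TYPE_RE = re.compile(r"^# TYPE (\S+) (\S+)$")
# Sample line: metric_name{label="value",...} 42
SAMPLE_RE = re.compile(r"^(\w+)(?:\{(.*)\})?\s+\S+")

def parse_snapshot(text: str) -> dict:
    """Collect metric type, help string, and label keys per metric name."""
    metrics: dict[str, dict] = {}
    for line in text.splitlines():
        if m := HELP_RE.match(line):
            metrics.setdefault(m.group(1), {"labels": set()})["help"] = m.group(2)
        elif m := TYPE_RE.match(line):
            metrics.setdefault(m.group(1), {"labels": set()})["type"] = m.group(2)
        elif m := SAMPLE_RE.match(line):
            entry = metrics.setdefault(m.group(1), {"labels": set()})
            # Naive label-key extraction; breaks on commas inside values.
            for pair in (m.group(2) or "").split(","):
                if "=" in pair:
                    entry["labels"].add(pair.split("=", 1)[0])
    return metrics
```

From there, a structure like this would be merged across backends and serialized with a YAML library for deterministic output (e.g. sorted keys).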
Test plan
Closes #6278