Address copilot-pull-request-reviewer findings

jpayne3506 · Copilot · jpayne3506 · commit ad56b892c996 · 2026-06-02T20:11:26.000-05:00
Four issues raised by the GitHub Copilot bot review on PR #4440:

1. Per-workflow run-ID disambiguation. Discovery now binds
   GOVULNCHECK_RUN_ID / GOVULNCHECK_RUN_URL / GOVULNCHECK_SOURCE_SHA /
   GOVULNCHECK_SOURCE_BRANCH and BASEIMAGES_* separately, plus
   PRIMARY_SOURCE_SHA / PRIMARY_RUN_URL for downstream sections that need
   a single canonical reference (fix branch name, Fix-PR body). The
   previous single RUN_ID made multi-workflow invocations ambiguous and
   could feed the wrong log into the govulncheck version-banner parser.

2. Govulncheck version detection now reads GOVULNCHECK_RUN_ID explicitly,
   so it parses the right run's log even when both workflows failed in
   the same invocation.

3. Drop the 'gh pr view --json headRefOid' fallback in Fix-mode setup
   step 3. PRIMARY_SOURCE_SHA from Discovery (taken from the workflow
   run's head_sha field) is the failing run's exact head SHA. The PR-view
   fallback returned the PR's CURRENT head, which can differ from the
   failing run after a force-push, directly violating the section's own
   'exact head SHA' guarantee.

4. Govulncheck playbook: replace 'git add -A' with explicit
   allowlist-only staging per touched module. The BPF setup step
   (make bpf-lib + go generate ./...) can regenerate bpf2go output files
   (*_bpfel.go, *_bpfeb.go) under the module package; with git add -A
   those generated files would land in the commit alongside
   go.mod/go.sum, silently breaching the govulncheck allowlist. Track
   FIXED_MODULES across the per-module loop so commit-time staging knows
   which paths to add. Also add 'git checkout -- . && git clean -fd'
   after the commit so the baseimages playbook starts from a clean tree
   (its first_diff check would otherwise see leftover BPF artifacts as
   drift). The baseimages playbook now asserts a clean tree at start as a
   defensive precondition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
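Item 1 leaves the choice between the two runs' SHAs to a comparison of their creation times. A minimal sketch of that comparison, assuming both timestamps are UTC ISO 8601 strings in the same format (for example, values obtained with `gh run view <id> --json createdAt -q .createdAt`). `pick_primary` is a hypothetical helper, not something the patch defines:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<sha> <run-url>" for whichever failed run is
# newer. Arguments, in order:
#   $1 govulncheck createdAt   $2 govulncheck SHA   $3 govulncheck run URL
#   $4 baseimages createdAt    $5 baseimages SHA    $6 baseimages run URL
# An empty SHA means that workflow has no current failure.
pick_primary() {
  local gv_at=$1 gv_sha=$2 gv_url=$3
  local bi_at=$4 bi_sha=$5 bi_url=$6

  if [ -n "$gv_sha" ] && [ -n "$bi_sha" ]; then
    # Same-format UTC ISO 8601 timestamps sort lexicographically, so a
    # string comparison orders them chronologically. Ties go to govulncheck.
    if [[ "$bi_at" > "$gv_at" ]]; then
      printf '%s %s\n' "$bi_sha" "$bi_url"
    else
      printf '%s %s\n' "$gv_sha" "$gv_url"
    fi
  elif [ -n "$gv_sha" ]; then
    printf '%s %s\n' "$gv_sha" "$gv_url"
  elif [ -n "$bi_sha" ]; then
    printf '%s %s\n' "$bi_sha" "$bi_url"
  else
    return 1   # neither workflow has a current failure
  fi
}
```

In the agent flow the result could be read with `read -r PRIMARY_SOURCE_SHA PRIMARY_RUN_URL < <(pick_primary ...)`. The fallback branches match the `${GOVULNCHECK_SOURCE_SHA:-$BASEIMAGES_SOURCE_SHA}` behavior in the patch when only one workflow failed.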
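Item 2 depends on the banner-parsing pipeline in the patch (`grep -oE 'govulncheck@v[0-9][0-9.]*' | head -1 | sed 's/^govulncheck@//'`). A small sketch that wraps the same pipeline in a hypothetical `parse_govulncheck_version` function reading a log on stdin, exercised on a made-up two-line excerpt (illustrative input only, not captured workflow output):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline from the patch: read a run log on
# stdin and print the first govulncheck version found, without the
# "govulncheck@" prefix. Prints nothing if no banner is present.
parse_govulncheck_version() {
  grep -oE 'govulncheck@v[0-9][0-9.]*' | head -1 | sed 's/^govulncheck@//'
}

# Made-up log excerpt used only to exercise the pattern.
sample_log='scan  Install govulncheck  go install golang.org/x/vuln/cmd/govulncheck@v1.1.4
scan  Run govulncheck  govulncheck ./...'

version=$(printf '%s\n' "$sample_log" | parse_govulncheck_version)
echo "$version"
```

Because the character class `[0-9.]*` also matches a sentence-ending period, a line containing `govulncheck@v1.1.4.` would yield `v1.1.4.`. Feeding the wrong run's log (the ambiguity item 1 removes) would therefore return either nothing or an unrelated version.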
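Item 4 restricts staging to each touched module's `go.mod`, `go.sum`, and `vendor/` (when present). A minimal sketch of that path selection as a hypothetical `allowlisted_paths` function. It is kept separate from `git add` so it can be checked without a repository. The module names `cni` and `bpf-prog` are made up:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a work tree and module directories relative to
# it, print (one per line) the only paths the govulncheck playbook may stage
# per module: go.mod, go.sum, and vendor/ when it exists. Generated files
# such as *_bpfel.go or *_bpfeb.go are never printed.
allowlisted_paths() {
  local work_dir=$1; shift
  local mod f
  for mod in "$@"; do
    for f in go.mod go.sum; do
      [ -e "$work_dir/$mod/$f" ] && printf '%s\n' "$mod/$f"
    done
    [ -d "$work_dir/$mod/vendor" ] && printf '%s\n' "$mod/vendor"
  done
  return 0
}

# Demo tree: one module with a vendor directory, one BPF module whose
# regenerated bpf2go outputs must not be staged.
demo=$(mktemp -d)
mkdir -p "$demo/cni/vendor" "$demo/bpf-prog"
touch "$demo/cni/go.mod" "$demo/cni/go.sum" \
      "$demo/bpf-prog/go.mod" "$demo/bpf-prog/go.sum" \
      "$demo/bpf-prog/probe_bpfel.go" "$demo/bpf-prog/probe_bpfeb.go"

mapfile -t paths < <(allowlisted_paths "$demo" cni bpf-prog)
printf '%s\n' "${paths[@]}"
# In the playbook this list would be passed to: git add -- "${paths[@]}"
rm -rf "$demo"
```

This sketch skips paths that do not exist because `git add` aborts with a pathspec error when a named path is missing. The patch's loop instead adds `go.mod` and `go.sum` unconditionally.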
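Fix-mode setup steps 5 and 7 in the patch build the `fix_branch` and `work_dir` suffixes from nested `${var:-default}` expansions. A small sketch of that fallback order, using a hypothetical `run_suffix` wrapper and made-up run IDs:

```bash
#!/usr/bin/env bash
# Same fallback chain as the fix_branch / work_dir names in the patch:
# govulncheck run ID, else baseimages run ID, else the shell's PID ($$).
# ${var:-default} also falls through when var is set but empty, which is the
# state `read` leaves trailing variables in when the input has fewer fields.
run_suffix() {
  printf '%s\n' "${GOVULNCHECK_RUN_ID:-${BASEIMAGES_RUN_ID:-$$}}"
}

GOVULNCHECK_RUN_ID=111 BASEIMAGES_RUN_ID=222 run_suffix
GOVULNCHECK_RUN_ID= BASEIMAGES_RUN_ID=222 run_suffix
GOVULNCHECK_RUN_ID= BASEIMAGES_RUN_ID= run_suffix
```

The suffix therefore follows the govulncheck run whenever one exists, even if the baseimages run is the newer failure chosen for `PRIMARY_*`.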
diff --git a/.github/agents/ci-mx.md b/.github/agents/ci-mx.md
@@ -115,26 +115,56 @@ The flow is: (1) find the workflow's most recent failure regardless of
 branch, then (2) decide whether that failure applies to `$target_branch`
 by reading the target's actual contents.
 
-### 1. Workflow-scoped query
+### 1. Workflow-scoped query (per workflow)
 
 For each in-scope workflow, fetch the most recent failed run regardless
-of which branch it ran on:
+of which branch it ran on. **Bind separate variables per workflow** —
+both can fail in the same invocation, and downstream steps need to
+distinguish them (e.g., the govulncheck version banner is in the
+govulncheck run's log, not the baseimages run's).
 
 ```bash
-# Govulncheck (repeat for baseimages.yaml).
-read RUN_ID run_url SOURCE_SHA SOURCE_BRANCH < <(
+# Govulncheck.
+read GOVULNCHECK_RUN_ID GOVULNCHECK_RUN_URL \
+     GOVULNCHECK_SOURCE_SHA GOVULNCHECK_SOURCE_BRANCH < <(
   gh api "/repos/$GH_OWNER/$GH_REPO/actions/workflows/govulncheck.yaml/runs?per_page=20&status=failure" \
     --jq '.workflow_runs[0] | "\(.id) \(.html_url) \(.head_sha) \(.head_branch)"'
 )
-# When the user supplied an explicit run URL or ID, use it directly
-# (preserving its head_sha and head_branch).
+
+# Baseimages.
+read BASEIMAGES_RUN_ID BASEIMAGES_RUN_URL \
+     BASEIMAGES_SOURCE_SHA BASEIMAGES_SOURCE_BRANCH < <(
+  gh api "/repos/$GH_OWNER/$GH_REPO/actions/workflows/baseimages.yaml/runs?per_page=20&status=failure" \
+    --jq '.workflow_runs[0] | "\(.id) \(.html_url) \(.head_sha) \(.head_branch)"'
+)
+
+# When the user supplied an explicit run URL or ID, populate that
+# workflow's four variables from `gh run view <id> --json databaseId,url,headSha,headBranch`
+# and clear the other workflow's four if its failure was incidental.
 ```
 
-`$SOURCE_BRANCH` / `$SOURCE_SHA` describe **where** the failure was
+`$*_SOURCE_BRANCH` / `$*_SOURCE_SHA` describe **where** each failure was
 first observed — usually a feature branch or a merge-queue ref. They
 are NOT the target the agent fixes; they're the evidence we reason
 from.
 
+For downstream sections that need a single canonical reference (the
+fix-branch name and the Fix-PR body), define:
+
+```bash
+# Most-recent failure SHA across both workflows. Applying the fix at
+# this SHA makes the fix PR target the most recent known broken
+# state.
+if [ -n "$GOVULNCHECK_SOURCE_SHA" ] && [ -n "$BASEIMAGES_SOURCE_SHA" ]; then
+  # Prefer the newer run (compare `gh run view <id> --json createdAt`).
+  PRIMARY_SOURCE_SHA="$GOVULNCHECK_SOURCE_SHA"   # or BASEIMAGES_SOURCE_SHA
+  PRIMARY_RUN_URL="$GOVULNCHECK_RUN_URL"         # or BASEIMAGES_RUN_URL
+else
+  PRIMARY_SOURCE_SHA="${GOVULNCHECK_SOURCE_SHA:-$BASEIMAGES_SOURCE_SHA}"
+  PRIMARY_RUN_URL="${GOVULNCHECK_RUN_URL:-$BASEIMAGES_RUN_URL}"
+fi
+```
+
 ### 2. Tiebreaker: prefer a newer target-scoped signal if it exists
 
 A run scoped directly to `$target_branch` that's **newer** than the
@@ -146,10 +176,14 @@ read TARGET_RUN_ID target_run_conclusion < <(
     --limit 1 --json databaseId,conclusion,createdAt \
     --jq '.[0] | "\(.databaseId) \(.conclusion)"'
 )
-# If TARGET_RUN_ID exists and its createdAt > SOURCE workflow run's createdAt:
-#   - conclusion=success → DEFINITIVE NEGATIVE: workflow is healthy on target.
-#   - conclusion=failure → DEFINITIVE POSITIVE: re-anchor RUN_ID, run_url,
-#     SOURCE_SHA, SOURCE_BRANCH to the target run; proceed.
+# If TARGET_RUN_ID exists and its createdAt > the matching workflow run's
+# createdAt:
+#   - conclusion=success → DEFINITIVE NEGATIVE: workflow is healthy on
+#     target.
+#   - conclusion=failure → DEFINITIVE POSITIVE: re-anchor that
+#     workflow's *_RUN_ID, *_RUN_URL, *_SOURCE_SHA, *_SOURCE_BRANCH
+#     (and PRIMARY_* if applicable) to the target run; proceed.
+# Repeat for baseimages.yaml.
 ```
 
 This also handles fork PRs: when `--branch` returns nothing (fork heads
@@ -176,16 +210,16 @@ reads (base64-decoded). The conclusion bucket per failure is one of:
 
 | Signal on `$target_branch` | Bucket |
 |---|---|
-| Target-scoped run newer than `$RUN_ID` and passed | `does-not-apply` |
+| Target-scoped run newer than the matching `*_RUN_ID` and passed | `does-not-apply` |
 | Target-scoped run newer and also failed | `fixable` (re-anchor to target run) |
-| No newer target-scoped run; target's render-input files (`build/images.mk`, every `*/Dockerfile.tmpl`, every `*/manifests/*` referenced by the renderkit) are **byte-identical** to those on `$SOURCE_SHA` (compare SHAs via `gh api /git/trees/{sha}?recursive=1` or per-file blob SHAs) | `fixable` (inferred positive — same templates + same external images = same render diff) |
-| Render-input files differ between target and `$SOURCE_SHA` | `needs-probe` (drift may or may not produce a diff on target — only an actual render can tell) |
+| No newer target-scoped run; target's render-input files (`build/images.mk`, every `*/Dockerfile.tmpl`, every `*/manifests/*` referenced by the renderkit) are **byte-identical** to those on `$BASEIMAGES_SOURCE_SHA` (compare SHAs via `gh api /git/trees/{sha}?recursive=1` or per-file blob SHAs) | `fixable` (inferred positive — same templates + same external images = same render diff) |
+| Render-input files differ between target and `$BASEIMAGES_SOURCE_SHA` | `needs-probe` (drift may or may not produce a diff on target — only an actual render can tell) |
 
 #### Govulncheck applicability table (per finding)
 
 Each finding identifies a vulnerable module path, vulnerable version
 range, fixed version, package, and **proven call-graph reachability on
-`$SOURCE_SHA`**. For each finding, read `$target_branch`'s
+`$GOVULNCHECK_SOURCE_SHA`**. For each finding, read `$target_branch`'s
 `<matrix-module>/go.mod` and `<matrix-module>/go.sum`:
 
 | Signal on `$target_branch` | Bucket |
@@ -194,7 +228,7 @@ range, fixed version, package, and **proven call-graph reachability on
 | Target-scoped run newer and reports the same finding | `fixable` (re-anchor) |
 | Vulnerable module path is **not** required by target's `go.mod` and is **absent** from target's `go.sum` | `does-not-apply` |
 | Target's `go.sum` resolves the vulnerable module to a version **outside** the vulnerable range | `does-not-apply` |
-| Target's `go.sum` resolves the vulnerable module to a version **inside** the vulnerable range AND the diff between `$SOURCE_SHA` and target's HEAD does **not** touch the affected packages | `fixable` (reachability proven on source carries over to target) |
+| Target's `go.sum` resolves the vulnerable module to a version **inside** the vulnerable range AND the diff between `$GOVULNCHECK_SOURCE_SHA` and target's HEAD does **not** touch the affected packages | `fixable` (reachability proven on source carries over to target) |
 | Same as above, but the source↔target diff **does** touch the affected packages | `needs-probe` (reachability may have changed; fix mode's post-bump re-run will prove or disprove) |
 
 The remaining classification rules still apply to whatever survives as
@@ -220,7 +254,7 @@ version in the first few log lines:
 
 ```bash
 GOVULNCHECK_VERSION=$(
-  gh run view "$RUN_ID" --log-failed \
+  gh run view "$GOVULNCHECK_RUN_ID" --log-failed \
     | grep -oE 'govulncheck@v[0-9][0-9.]*' | head -1 \
     | sed 's/^govulncheck@//'
 )
@@ -234,7 +268,7 @@ isn't present.
 
 Read-only. For every in-scope workflow with a current failure, emit:
 
-- Workflow name, run URL, head SHA (`$SOURCE_BRANCH` + `$SOURCE_SHA`).
+- Workflow name, run URL, head SHA (`$*_SOURCE_BRANCH` + `$*_SOURCE_SHA`).
 - Per failing job, the **applicability bucket** from Discovery:
   - `fixable` — include the exact `go get <module>@<fixed>` command(s)
     or the `make dockerfiles` action that fix mode would run, plus a
@@ -282,15 +316,20 @@ if [ -n "$source_pr_number" ]; then
   fi
 fi
 
-# 3. Require the run's exact head SHA. Do NOT fall back to the branch tip:
-#    the branch may have been force-pushed since the failing run.
-if [ -z "${RUN_HEAD_SHA:-}" ] && [ -n "$source_pr_number" ]; then
-  RUN_HEAD_SHA=$(gh pr view "$source_pr_number" --json headRefOid \
-                 -q .headRefOid)
+# 3. Use the failing run's exact head SHA. Discovery already bound
+#    PRIMARY_SOURCE_SHA from the workflow run's head_sha — that is the
+#    exact commit the workflow ran against. Do NOT fall back to
+#    `gh pr view --json headRefOid` (the PR's CURRENT head), because if
+#    the PR was force-pushed after the failing run, those two SHAs
+#    differ and we would fix the wrong commit.
+target_head_sha="${PRIMARY_SOURCE_SHA:-}"
+if [ -z "$target_head_sha" ] && [ -n "${USER_PROVIDED_RUN_ID:-}" ]; then
+  # User supplied a run URL/ID without Discovery; resolve directly.
+  target_head_sha=$(gh run view "$USER_PROVIDED_RUN_ID" \
+                    --json headSha -q .headSha)
 fi
-[ -n "${RUN_HEAD_SHA:-}" ] || \
-  { echo "stop:input-invalid (no run head SHA)"; exit 1; }
-target_head_sha="$RUN_HEAD_SHA"
+[ -n "$target_head_sha" ] || \
+  { echo "stop:input-invalid (could not resolve failing run's head SHA)"; exit 1; }
 
 # 4. Fetch the ref so the SHA is present locally.
 if [ -n "$source_pr_number" ]; then
@@ -301,15 +340,15 @@ fi
 git cat-file -e "$target_head_sha^{commit}" \
   || { echo "stop:input-invalid (could not fetch $target_head_sha)"; exit 1; }
 
-# 5. Names. Include RUN_ID for collision-avoidance.
+# 5. Names. Include a run ID (govulncheck's, else baseimages') for collision-avoidance.
 short_sha=${target_head_sha:0:8}
-fix_branch="ci-mx/fix-${source_pr_number:-$short_sha}-${RUN_ID:-$$}"
+fix_branch="ci-mx/fix-${source_pr_number:-$short_sha}-${GOVULNCHECK_RUN_ID:-${BASEIMAGES_RUN_ID:-$$}}"
 
 # 6. Capture main_repo BEFORE entering the worktree, so teardown is reliable.
 main_repo="$(git rev-parse --show-toplevel)"
 
 # 7. Worktree as sibling of the repo (not inside .git/).
-work_dir="$(dirname "$main_repo")/ci-mx-work-${RUN_ID:-$$}"
+work_dir="$(dirname "$main_repo")/ci-mx-work-${GOVULNCHECK_RUN_ID:-${BASEIMAGES_RUN_ID:-$$}}"
 
 git worktree add --detach "$work_dir" "$target_head_sha"
 cd "$work_dir"
@@ -348,10 +387,20 @@ Run only for jobs Discovery classified as `fixable`. If **any** job in the
 failing matrix is `stop:*`, do not start the playbook — report all
 blockers and exit. No partial fixes.
 
+Track every module the playbook actually touches in a shell array so
+the commit step can stage allowlisted paths explicitly:
+
+```bash
+FIXED_MODULES=()   # populated per successful module below
+```
+
 For each `fixable` matrix module (in matrix order):
 
 1. `cd "$work_dir/<module>"` (`.` means `$work_dir`).
-2. If the module is BPF, mirror the workflow setup:
+2. If the module is BPF, mirror the workflow setup so `govulncheck`
+   can load the package. These regenerated artifacts (e.g.
+   `*_bpfel.go`, `*_bpfeb.go`) are build-verification side-effects,
+   **not** allowlisted edits — they must not be committed:
    ```bash
    ( cd "$work_dir" && make bpf-lib )
    go generate ./...
@@ -396,20 +445,53 @@ For each `fixable` matrix module (in matrix order):
    ```
    Findings remain → run Cleanup snippet, then
    `stop:unfixable (post-bump findings)`.
-
-After all `fixable` modules complete:
+10. Module succeeded; record it for staging:
+    ```bash
+    module_path="$(realpath --relative-to="$work_dir" .)"
+    FIXED_MODULES+=("$module_path")
+    ```
+
+After all `fixable` modules complete, commit **only the allowlisted
+paths** for each touched module. `git add -A` is **forbidden** here — it
+would sweep in BPF regen output (`*_bpfel.go`, `*_bpfeb.go`) and any
+other build-verification artifacts, violating the govulncheck
+allowlist:
 
 ```bash
 cd "$work_dir"
-git add -A
+for mod in "${FIXED_MODULES[@]}"; do
+  git add "$mod/go.mod" "$mod/go.sum"
+  [ -d "$mod/vendor" ] && git add "$mod/vendor"
+done
 git commit -m "fix(deps): resolve govulncheck findings"
+
+# Reset every other tracked path the BPF setup may have touched, and
+# remove any untracked artifacts it produced, so the baseimages
+# playbook starts from a clean tree.
+git checkout -- .
+git clean -fd
 ```
 
 ## Baseimages playbook (only in `op_mode=fix`)
 
 Assumes Fix-mode setup is done. Runs in the same `$work_dir`; if both
 playbooks run, commits stack on the same `$fix_branch`.
 
+If the govulncheck playbook ran first, it already left the tree clean
+(`git checkout -- .` + `git clean -fd` after its commit), so the
+`first_diff` check below faithfully reflects only `make dockerfiles`'s
+output. **If you skip govulncheck**, assert a clean tree before step
+2 — anything left over from a prior step would corrupt `first_diff`:
+
+```bash
+cd "$work_dir"
+if [ -n "$(git status --porcelain)" ]; then
+  echo "stop:env-broken (baseimages playbook requires a clean tree at start)"
+  # Run the Cleanup snippet, then:
+  exit 1
+fi
+```
+
 1. Preflight: confirm `go` and `skopeo` on `PATH`. Missing → run Cleanup
    snippet, then `stop:env-broken (missing tooling: <name>)`.
 2. From `$work_dir`:
@@ -430,7 +512,10 @@ playbooks run, commits stack on the same `$fix_branch`.
      exit 1
    fi
    ```
-5. Commit:
+5. Commit. `git add -A` is acceptable here because `make dockerfiles`
+   only writes rendered Dockerfile outputs into known paths, and the
+   pre-step tree-clean assertion above guarantees nothing else is in
+   play:
    ```bash
    git add -A
    git commit -m "chore(images): re-render Dockerfiles"
@@ -451,7 +536,7 @@ fix_pr_title="ci-mx: fix CI failures on $target_branch"
 fix_pr_body=$(cat <<EOF
 Automated fix from \`ci-mx\` for CI failures on \`$target_branch\` at \`$target_head_sha\`.
 
-- Failing run: $run_url
+- Failing run: $PRIMARY_RUN_URL
 - Verified by re-running the failing CI check locally per the ci-mx contract.
 
 Scope: govulncheck dependency bumps and/or \`make dockerfiles\` re-render