ci: re-render Dockerfiles #4443

Draft
jpayne3506 wants to merge 1 commit into master from ci-mx/fix-5c65686c-24103108573
Conversation

@jpayne3506
Contributor

PoC for Azure/azure-container-networking#4440 — the ci-mx custom agent definition. This PR demonstrates the agent operating in op_mode=fix against master. It may be closed without merging once the agent PR is reviewed.

Automated fix from ci-mx for CI failures on master at 5c65686c218c2109f433511942512ed8da2f9a86.

Scope: govulncheck dependency bumps and/or make dockerfiles re-render only. Never edits workflow YAML, the matrix, Makefiles, Dockerfile templates, or the Go toolchain version. Never auto-merged.

Blockers surfaced (out of ci-mx scope, human action required)

ci-mx detected 9 govulncheck failures classified as stop:out-of-scope (stdlib) on master. They require a Go toolchain bump and are NOT included in this fix PR:

  • Run govulncheck (.), azure-ip-masq-merger, azure-ipam, azure-iptables-monitor, bpf-prog/ipv6-hp-bpf, cilium-log-collector, dropgz, tools/azure-npm-to-cilium-validator, zapai — all blocked on stdlib crypto/x509 / net/textproto findings fixed in go1.26.4.
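
The stdlib classification described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual logic: the `Finding` shape and the `classify` helper are hypothetical, and the sketch assumes each finding records the module that contains the vulnerable code, with `stdlib` standing for the Go standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified view of one govulncheck finding."""
    module: str         # module holding the vulnerable code; "stdlib" for the standard library
    package: str        # affected package, e.g. "crypto/x509"
    fixed_version: str  # version that contains the fix


def classify(findings):
    """Classify one job's findings (assumed rule, see the lead-in).

    A stdlib finding can only be resolved by a Go toolchain bump, which is
    outside ci-mx's scope, so a single stdlib finding stops the job.
    """
    if not findings:
        return "clean"
    if any(f.module == "stdlib" for f in findings):
        return "stop:out-of-scope"
    return "fixable"
```

Under this assumed rule, a job whose only finding is in `crypto/x509` is reported as a blocker rather than fixed, which matches the nine jobs listed above.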

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jpayne3506 jpayne3506 changed the title from "ci-mx: PoC fix CI failures on master" to "ci: re-render Dockerfiles" on Jun 3, 2026
@jpayne3506 jpayne3506 added the ci (Infra or tooling.), cni (Related to CNI.), and cns (Related to CNS.) labels on Jun 3, 2026
jpayne3506 added a commit that referenced this pull request Jun 3, 2026
Surfaced during the fix-mode PoC against master/v1.6/v1.7. The agent
opened three valid fix PRs (#4442, #4443, #4444) but with generic
'ci-mx: PoC fix CI failures on <branch>' titles and zero labels. The
repo convention is 'ci: <description>' with optional '(release/vX.Y)'
suffix, plus flat labels like ci/cni/cns/cilium/dependencies.

Fix: in the Fix-PR creation section, generate:

- Title: ci: <description> [(release/vX.Y) | for #N [(release/vX.Y)]]
  where <description> is 're-render Dockerfiles', 'resolve govulncheck
  findings', or 're-render Dockerfiles and resolve govulncheck
  findings' depending on which playbooks committed.

- Labels: always 'ci'; add 'dependencies' when govulncheck ran; add
  area labels (cni, cns, cilium) by matching paths in the commit's
  diff against a small known-area map. Filter the final set against
  'gh label list' so labels that don't exist in the repo are skipped
  (no auto-creation).
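
A minimal sketch of the title rule above, for illustration only. The function name `build_title` and its parameters are hypothetical; the sketch assumes the release suffix is passed as a full branch name such as `release/v1.7` and that the source PR number, when present, is placed before the release suffix as the template shows.

```python
def build_title(rendered, vulncheck, release=None, source_pr=None):
    """Build a 'ci: <description>' title from the playbooks that committed.

    rendered / vulncheck: whether the Dockerfile re-render and govulncheck
    playbooks produced a commit (hypothetical flags).
    release: optional branch name such as "release/v1.7".
    source_pr: optional number of the PR the fix was opened for.
    """
    parts = []
    if rendered:
        parts.append("re-render Dockerfiles")
    if vulncheck:
        parts.append("resolve govulncheck findings")
    if not parts:
        raise ValueError("no playbook committed, so there is nothing to title")

    title = "ci: " + " and ".join(parts)
    if source_pr is not None:
        title += f" for #{source_pr}"
    if release:
        title += f" ({release})"
    return title
```

With only the re-render playbook committing against master, `build_title(True, False)` yields `ci: re-render Dockerfiles`, the title this PR was renamed to.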

The three existing PoC PRs were retitled and relabeled to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jpayne3506 added a commit that referenced this pull request Jun 3, 2026
The repo already has an 'Agent-Generated' label that other automation
uses. Adding it to ci-mx's fix-PR creation so reviewers can filter all
agent-authored PRs uniformly.

The three existing PoC PRs (#4442, #4443, #4444) were also relabeled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jpayne3506 jpayne3506 removed the cns (Related to CNS.) and cni (Related to CNI.) labels on Jun 3, 2026
jpayne3506 added a commit that referenced this pull request Jun 3, 2026
Reverting the file-touched -> component-label mapping (cni/cns/cilium)
added in the previous label-generation commit. Component ownership is
something humans should triage; AI guessing at it from path prefixes
is brittle and easy to get wrong (e.g., a render that touches
cni/Dockerfile isn't necessarily a 'cni' change, and refactors that
move files between components would silently mislabel).

Agent now applies only non-component labels:
- ci (always)
- Agent-Generated (always)
- dependencies (only when the govulncheck playbook ran)

The label-existence filter against gh label list is kept (still no
auto-creation).
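
The resulting label rule can be sketched as below. This is an illustration under stated assumptions, not the agent's code: `select_labels` is a hypothetical helper, and `existing_labels` stands for the label names already fetched from the repository (for example from `gh label list`); how they are fetched is left out.

```python
def select_labels(govulncheck_ran, existing_labels):
    """Return the labels to apply to a fix PR.

    Only non-component labels are considered, and any label that does not
    already exist in the repository is skipped rather than created.
    """
    wanted = ["ci", "Agent-Generated"]
    if govulncheck_ran:
        wanted.append("dependencies")

    existing = set(existing_labels)
    return [label for label in wanted if label in existing]
```

For a re-render-only PR in a repository that has all three labels, this returns `ci` and `Agent-Generated`, which is the label set this PR ends up with.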

The three existing PoC PRs (#4442, #4443, #4444) were stripped of
their cni and cns labels.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jpayne3506 added a commit that referenced this pull request Jun 3, 2026
Adds .github/agents/ci-mx.md, a custom Copilot agent (invocable from
PR comments to @copilot, the GitHub Agents tab, and Copilot CLI
sub-agent calls) that resolves CI failures in two narrowly-scoped
workflows: govulncheck.yaml and baseimages.yaml.

Design properties:

- Two operating modes: diagnose (read-only triage, default) and fix.
  Inferred from invocation language.
- Workflow-scoped Discovery with per-failure applicability inference:
  reads the failing workflow's repo-wide state, then reads the target
  branch's actual contents (go.mod, go.sum, render-input tree SHAs)
  via gh api to decide whether the failure applies. No reliance on
  recent branch-scoped runs or nightly triggers.
- Five canonical STOP categories (out-of-scope, unfixable,
  cannot-publish, env-broken, input-invalid) with reason text.
- Never commits to the source PR. Opens a separate ci-mx-owned fix
  PR on a fresh branch, cross-linked from the source PR. Isolated
  worktree at the failing run's exact head SHA (no force-push hazard).
- Strict edit allowlist: go.mod / go.sum / vendor/** for govulncheck;
  only the files that 'make dockerfiles' rewrites for baseimages. Never
  workflow YAML, Makefile, matrix, Dockerfile templates, or Go toolchain.
- Directive guards: go get / tidy that bumps go or toolchain
  directive triggers stop:out-of-scope.
- Allowlist-explicit git add per touched module + clean-tree handoff
  between playbooks (BPF go generate outputs never reach commits).
- Conventional-commits PR titles (ci: <description> [(release/vX.Y)])
  + always-on labels (ci, Agent-Generated, +dependencies for
  govulncheck). Component labels left to human reviewers.
- Duplicate-detection at fix-PR creation: open ci-mx fix PRs targeting
  the same branch trigger a first-encounter STOP that surfaces three
  resolution options (supersede, update, defer) in the assistant
  response, on the existing fix PR, and on the source PR.
- Cleanup snippet runs at every STOP and on the success path
  (releases worktree + local branch ref).
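
As an illustration of the directive guard in the list above, the sketch below compares the `go` and `toolchain` directives of a go.mod before and after a dependency bump. The helper names `directives` and `guard` are hypothetical, and the line-based regular expression is a simplification rather than a full go.mod parser.

```python
import re

# Matches top-level "go 1.x" and "toolchain go1.x" lines. Indented lines
# inside a require block and module paths such as "golang.org/..." do not match.
_DIRECTIVE = re.compile(r"^(go|toolchain)[ \t]+(\S+)", re.MULTILINE)


def directives(go_mod_text):
    """Extract the go and toolchain directives from go.mod contents."""
    return {name: value for name, value in _DIRECTIVE.findall(go_mod_text)}


def guard(before, after):
    """Return 'stop:out-of-scope' if a bump changed either directive, else None."""
    if directives(before) != directives(after):
        return "stop:out-of-scope"
    return None
```

The guard would run after `go get` and `go mod tidy` and before staging, so a bump that silently raises the Go version is stopped instead of committed.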

Validation during authoring:

- Three rubber-duck critique rounds folded into the design.
- Four review findings from copilot-pull-request-reviewer[bot] fixed
  (per-workflow run-ID disambiguation, dropped unsafe PR-headRefOid
  fallback, allowlist-explicit staging, govulncheck version pinning).
- Two PoC iterations against master / release/v1.7 / release/v1.6:
  - Diagnose mode confirmed correct classification across branches,
    including suppressing modules that don't exist on release branches
    via does-not-apply, and STOP:unfixable for non-vuln BPF build
    failures.
  - Fix mode opened three draft PRs (#4442, #4443, #4444) with valid
    baseimages re-renders against the three release trains; govulncheck
    correctly skipped per the all-or-nothing matrix rule when stdlib
    STOPs were present.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

Agent-Generated ci Infra or tooling.
