
feat(ci): M-sec CI/build/gates v1 #3

Merged
devonartis merged 32 commits into develop from feature/ci-msec on Apr 10, 2026

Conversation

@devonartis

Owner

Summary

Implements the M-sec CI/build/gates pipeline per Obsidian KB Decision 015 and the design doc at .plans/designs/2026-04-10-ci-build-gates-msec-design.md. Follows the 31-task plan at .plans/specs/2026-04-10-ci-build-gates-msec-plan.md.

What changes

Workflows (.github/workflows/)

  • ci.yml — 13 parallel gates + dep-review + changelog + gate-parity + gates-passed aggregator
  • codeql.yml — Go SAST with security-extended queries (PR + push + weekly)
  • scorecard.yml — OpenSSF supply-chain Scorecard (weekly + main)
  • nightly.yml — L4 full regression with auto-issue-on-failure
  • contribution-policy.yml — Decision 014 auto-close, pull_request_target with NO PR-branch checkout (documented threat model)

Local gate infrastructure

  • scripts/gates.sh — extended from 4 gates to 13, module mode renamed to full, gosec flipped to blocking
  • scripts/smoke/core-contract.sh — L2.5 smoke (10-step issue/verify/revoke/deny flow with Ed25519 challenge-response)
  • scripts/test-gate-parity.sh — enforces gate list alignment between local and CI
  • .gosec.yml — documented G117/G304/G101 exclusions for a credential broker
  • .golangci.yml — security-aware linter set with mirrored gosec excludes

Governance

  • .github/dependabot.yml — github-actions, gomod, docker (weekly)
  • .github/CODEOWNERS + .github/MAINTAINERS — ownership + contribution-policy allowlist

Dependencies

  • go.mod toolchain go1.25.7 → go1.25.9 — resolves TD-VUL-001..004 (four stdlib CVEs: GO-2026-4947, -4946, -4870, -4601)

Code quality (discovered by new gates, fixed before first CI run)

  • internal/keystore/parseKey — defensive type assertion on priv.Public()
  • internal/mutauth/heartbeat.sweep — log auto-revoke failures instead of dropping
  • cmd/aactl/client — propagate json.Marshal/io.ReadAll errors
  • internal/store/sql_store — documented #nosec G202 on audit SELECT
  • gofmt normalization across 24 drifted files

What this is NOT

  • No release automation, GHCR publish, SLSA provenance (later cycle)
  • No pre-commit hook updates (separate small cycle)
  • No README badges (Task 30, post-merge)

Local verification (on feature/ci-msec)

  • go build ./cmd/broker ./cmd/aactl — OK
  • go test -short -count=1 ./... — 15/15 packages PASS
  • go test -race -count=1 ./... — 15/15 packages PASS
  • golangci-lint run ./... — clean
  • gosec -conf .gosec.yml -exclude=G117,G304,G101 -severity=medium ./... — 0 findings
  • govulncheck ./... — "No vulnerabilities found"
  • actionlint .github/workflows/*.yml — clean
  • ./scripts/test-gate-parity.sh — 13 gates match
  • Live Docker broker + ./scripts/smoke/core-contract.sh: 10/10 PASS (issue → verify → revoke → reject → out-of-scope denied)

PR tasks

  • First CI run completes (this may iterate — SHA-pinned action majors are newer than what the plan assumed)
  • Merge this PR to develop
  • Configure branch protection on develop (Task 27)
  • Merge develop → main via scripts/strip_for_main.sh
  • Configure branch protection on main (Task 29)
  • README badges (Task 30)
  • 7-day observation: first Dependabot PR, first nightly run

Rationale

Decision 015 lays out why M-sec (not generic M) is the right scope for a credential broker, and why CI must exist before the rebrand PR lands. This PR is the safety net that will catch regressions during and after the rebrand.

Related:

  • Obsidian KB Decision 015 — CI/Gates Strategy — Security-First, Rebrand-Resilient
  • Obsidian KB Decision 014 — No External Contributions (enforced by contribution-policy.yml)
  • .plans/designs/2026-04-10-ci-build-gates-msec-design.md
  • .plans/specs/2026-04-10-ci-build-gates-msec-plan.md

Catches up FLOW.md + MEMORY.md with the prior 2026-04-10 session
notes (ADR vs Decision split, obsidian:decision skill build,
branch cleanup, Decision 014 no-external-contributions) that had
been sitting uncommitted on develop.

Adds the M-sec CI/build/gates v1 design doc from the current
brainstorm cycle:
.plans/designs/2026-04-10-ci-build-gates-msec-design.md

The design doc is the implementation-level companion to Obsidian
KB Decision 015 (CI/Gates Strategy — Security-First, Rebrand-
Resilient). It covers:

- Five workflow files: ci.yml, codeql.yml, scorecard.yml,
  nightly.yml, contribution-policy.yml
- gates.sh extension: add contamination, govulncheck, docker-build,
  smoke-l2.5, sbom; flip gosec from warn to blocking
- scripts/smoke/core-contract.sh — L2.5 core contract smoke
  (issue/verify/revoke/deny out-of-scope)
- Pinned action SHAs + Dependabot maintenance
- Rollout plan with branch-protection sequencing

Deferred to later cycles: release automation, pre-commit hooks,
coverage gating, matrix builds.

Adds the task-level implementation plan for the M-sec CI pipeline:
.plans/specs/2026-04-10-ci-build-gates-msec-plan.md

31 tasks across 4 phases:
- Phase A (Tasks 1-11): Local infrastructure — .gosec.yml, .golangci.yml,
  scripts/gates.sh extension, L2.5 smoke script, parity test, local
  verification
- Phase B (Tasks 12-20): GitHub Actions workflows — dependabot config,
  CODEOWNERS/MAINTAINERS, ci.yml, codeql.yml, scorecard.yml, nightly.yml,
  contribution-policy.yml
- Phase C (Tasks 21-25): actionlint, pin action SHAs, push feature branch,
  iterate on CI-only issues, open PR
- Phase D (Tasks 26-31): merge, branch protection on develop, merge to main,
  branch protection on main, README badges, observation period

Each task has exact file paths, complete code/YAML, verification commands,
and commit messages. No placeholders.

Updates FLOW.md with the current-session block capturing:
- Decision 015 reference (Obsidian KB)
- Design + plan artifact paths
- Where to start next session (read order, first action, tool guidance)
- Explicit instructions to not re-brainstorm and not run devflow ceremony
  steps that don't apply to infrastructure cycles

Previous entry was ~70 lines of MEMORY.md-shaped prose (reasoning,
trade-offs, next-session instructions). FLOW.md is decisions and
actions only — the context lives in Decision 015, the design doc,
and the implementation plan.

Now 3 lines of content: decision, action (with paths to design +
plan), status pointing at Task 1 of the plan.

JSON format (gosec does not support YAML). Three globally-excluded
rules with documented rationale for why each is product-incompatible
for a credential broker: G117 (broker API returns tokens by design),
G304 (all file paths come from operator config, not user input),
G101 (every domain identifier trips the credential-name heuristic).

Severity gate: MEDIUM and HIGH block; LOW is advisory. Scope of
exclusions reviewed against code — zero MEDIUM/HIGH findings remain
across 48 files / 7393 lines.

Pre-existing formatting drift — trailing whitespace, blank-line
spacing after doc comments, align-after-annotate in struct literals.
Surfaced by adding gofmt as a blocking gate (M-sec).

Zero behavior change: every edit is a gofmt -w rewrite of files that
were already compiling and passing tests. Verified:
  - go build ./cmd/broker ./cmd/aactl: OK
  - go test -short ./...: 15/15 packages PASS

Before: priv.Public().(ed25519.PublicKey) — unchecked type assertion,
would panic if the stdlib contract ever broke (though it never has).

After: comma-ok form with an explicit error return. Unreachable in
practice — ed25519.PrivateKey.Public() is documented to return
ed25519.PublicKey — but the guard satisfies errcheck's
check-type-assertions rule and makes the invariant explicit for
future readers.

Scope: internal/keystore/parseKey (called at broker startup when
loading the persistent signing key). No behavior change on the
happy path. Keystore unit tests: PASS.
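
The comma-ok guard described above can be sketched as follows. This is a minimal illustration, not the broker's actual parseKey: the helper name publicKeyOf and its error text are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
)

// publicKeyOf is a hypothetical helper illustrating the comma-ok form.
// It is not the real internal/keystore code.
func publicKeyOf(priv ed25519.PrivateKey) (ed25519.PublicKey, error) {
	// Public() returns crypto.PublicKey (an interface), so the concrete
	// type has to be asserted. The comma-ok form turns a would-be panic
	// into an ordinary error.
	pub, ok := priv.Public().(ed25519.PublicKey)
	if !ok {
		return nil, errors.New("keystore: private key did not yield an ed25519 public key")
	}
	return pub, nil
}

func main() {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		fmt.Println("generate key:", err)
		return
	}
	pub, err := publicKeyOf(priv)
	if err != nil {
		fmt.Println("derive public key:", err)
		return
	}
	fmt.Println("public key bytes:", len(pub))
}
```

The error branch is not expected to run with the standard library as documented; it exists so the invariant is stated in code rather than assumed.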

Before: sweep() called h.revSvc.Revoke("agent", id) with the return
assigned to blank, then unconditionally logged 'agent auto-revoked' —
even when the revocation actually failed. A persistent store error
would silently leave a missed-heartbeat agent marked as revoked in
logs while its tokens were still valid.

After: check the error; on failure log 'agent auto-revoke failed'
with err detail; on success log the existing auto-revoked message.
Agent stays tracked on failure so the next sweep retries.

Also fixes a misspell flagged by golangci-lint/misspell in the
doc comment ('cancelled' → 'canceled', Go convention).

Scope: internal/mutauth/heartbeat.go sweep(). Background goroutine
started by StartMonitor. Unit tests for mutauth PASS.
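
A rough sketch of the corrected control flow, under stated assumptions: the revoker interface, fakeRevoker, and the monitor struct are hypothetical stand-ins, Revoke is assumed to return a single error, and log lines are collected in a slice instead of going to the broker's logger.

```go
package main

import (
	"errors"
	"fmt"
)

// revoker is a hypothetical stand-in for the broker's revocation service.
// Assumption: Revoke returns a single error; the real signature may differ.
type revoker interface {
	Revoke(level, target string) error
}

// fakeRevoker fails for one agent ID so both branches can be exercised.
type fakeRevoker struct{ failFor string }

func (f fakeRevoker) Revoke(level, target string) error {
	if target == f.failFor {
		return errors.New("store unavailable")
	}
	return nil
}

// monitor is a simplified, single-goroutine model of the heartbeat monitor.
type monitor struct {
	revSvc  revoker
	tracked map[string]bool // agents still being watched
	logs    []string        // stand-in for structured logging
}

// sweep revokes each expired agent and only reports success when the
// revocation actually succeeded.
func (m *monitor) sweep(expired []string) {
	for _, id := range expired {
		if err := m.revSvc.Revoke("agent", id); err != nil {
			m.logs = append(m.logs, fmt.Sprintf("agent auto-revoke failed: id=%s err=%v", id, err))
			continue // keep the agent tracked so the next sweep retries
		}
		m.logs = append(m.logs, "agent auto-revoked: id="+id)
		delete(m.tracked, id)
	}
}

func main() {
	m := &monitor{
		revSvc:  fakeRevoker{failFor: "agent-b"},
		tracked: map[string]bool{"agent-a": true, "agent-b": true},
	}
	m.sweep([]string{"agent-a", "agent-b"})
	for _, line := range m.logs {
		fmt.Println(line)
	}
	fmt.Println("still tracked:", len(m.tracked))
}
```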

Three errcheck findings in cmd/aactl/client.go flagged by golangci-lint:
- authenticate(): json.Marshal of the auth request body discarded its
  error (practically unreachable for a map[string]string, but the
  pattern sets a bad example elsewhere in the codebase).
- authenticate(): io.ReadAll of the failure-response body discarded
  its error, so a truncated body on a 5xx would produce a misleading
  'auth failed (HTTP 500): ' with no body context.
- doPostWithToken(): io.ReadAll of the response body discarded its
  error — callers would see a success status with an empty body and
  no indication the read had failed.

All three now return wrapped errors. aactl is the operator-facing CLI
so loud failures are always preferable to silent truncation.

Unit tests: cmd/aactl PASS.
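
The pattern behind these three fixes can be sketched as below. The function names, the "secret" request field, and truncatedReader are hypothetical and only illustrate wrapping json.Marshal and io.ReadAll errors; the real client.go is not reproduced here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// truncatedReader simulates a response body that fails mid-read.
type truncatedReader struct{}

func (truncatedReader) Read([]byte) (int, error) {
	return 0, errors.New("connection reset")
}

// encodeAuthRequest returns the marshal error instead of discarding it.
// The "secret" field name is an assumption for illustration.
func encodeAuthRequest(secret string) ([]byte, error) {
	body, err := json.Marshal(map[string]string{"secret": secret})
	if err != nil {
		return nil, fmt.Errorf("marshal auth request: %w", err)
	}
	return body, nil
}

// describeFailure builds the error for a non-2xx response. If the body
// cannot be read, the read error is wrapped rather than silently dropped.
func describeFailure(status int, body io.Reader) error {
	raw, err := io.ReadAll(body)
	if err != nil {
		return fmt.Errorf("auth failed (HTTP %d): reading body: %w", status, err)
	}
	return fmt.Errorf("auth failed (HTTP %d): %s", status, raw)
}

func main() {
	if _, err := encodeAuthRequest("example"); err != nil {
		fmt.Println(err)
	}
	fmt.Println(describeFailure(500, strings.NewReader("store unavailable")))
	fmt.Println(describeFailure(500, truncatedReader{}))
}
```

With the wrapped error, an operator sees that the body read failed instead of an empty message after the status code.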

internal/store/sql_store.go QueryAuditEvents() concatenates fragments
into selectQ to assemble the optional WHERE clause. gosec G202 flags
the concatenation as potential SQL injection.

This is a documented false positive: the `where` value is built
entirely from fixed template strings (see whereClauses above), and
every user-supplied value becomes a bound `?` parameter in queryArgs.
No untrusted text enters the SQL string itself.

Added an inline `#nosec G202` comment with the explicit rationale
so reviewers don't have to re-derive the proof each time.

Also contains gofmt struct-field-alignment fixes for ErrAgent/AppNotFound,
AgentRecord.Scope, and AppRecord that were missed in the prior gofmt
commit (gofmt-only, no behavior change).

Store tests: PASS.
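
The reasoning behind the false-positive call can be illustrated with a small sketch. The table name, column names, and auditFilter type are hypothetical; only the shape (fixed fragments joined into the SQL text, user values passed as bound parameters) follows the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// auditFilter is a hypothetical filter type for illustration.
type auditFilter struct {
	AgentID   string
	EventType string
}

// buildAuditQuery assembles an optional WHERE clause from fixed template
// strings. User-supplied values never enter the SQL text; they are only
// returned as arguments for `?` placeholders.
func buildAuditQuery(f auditFilter) (string, []any) {
	var whereClauses []string
	var queryArgs []any
	if f.AgentID != "" {
		whereClauses = append(whereClauses, "agent_id = ?")
		queryArgs = append(queryArgs, f.AgentID)
	}
	if f.EventType != "" {
		whereClauses = append(whereClauses, "event_type = ?")
		queryArgs = append(queryArgs, f.EventType)
	}
	selectQ := "SELECT id, agent_id, event_type FROM audit_events"
	if len(whereClauses) > 0 {
		// #nosec G202 -- only fixed fragments are concatenated; all
		// user-supplied values are bound parameters in queryArgs.
		selectQ += " WHERE " + strings.Join(whereClauses, " AND ")
	}
	return selectQ, queryArgs
}

func main() {
	q, args := buildAuditQuery(auditFilter{AgentID: "x' OR 1=1 --"})
	fmt.Println(q)
	fmt.Println(len(args), "bound argument(s)")
}
```

Even a hostile AgentID leaves the query text unchanged, which is the property the inline `#nosec` comment records.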

govet's unusedwrite analyzer was flagging every field-write in the
LaunchTokenRecord literal inside TestLaunchTokenRecord_SpecCompliance
because only rec.ConsumedAt is ever read. The flag was technically
correct — and entirely missing the test's purpose.

The test exists precisely so that an exhaustive field literal will
fail to compile if any field is renamed or removed from
store.LaunchTokenRecord. That's the contract: upstream refactor
breaks this test, which is the early-warning signal.

Changes:
- Expanded the doc comment to state the contract explicitly so a
  future reader (or linter-driven refactor) doesn't 'simplify' it.
- Added `_ = rec` with a comment explaining it silences unusedwrite
  on purpose. This is the idiomatic way to tell govet 'I know, it's
  intentional.'
- Incidental gofmt fix on testSecret+"..." string concatenation
  that was drifted.

Admin tests: PASS.
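
A simplified sketch of the contract-by-literal idea, with assumptions called out: launchTokenRecord and its fields are hypothetical stand-ins for store.LaunchTokenRecord, and the check is written as a plain function rather than a testing.T test so it can run standalone.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// launchTokenRecord is a hypothetical stand-in for store.LaunchTokenRecord.
type launchTokenRecord struct {
	Token      string
	Scope      []string
	ExpiresAt  time.Time
	ConsumedAt *time.Time
}

// checkSpecCompliance names every field in a keyed literal. If a field is
// renamed or removed upstream, this function stops compiling, which is the
// early-warning signal described above.
func checkSpecCompliance() error {
	rec := launchTokenRecord{
		Token:      "lt-example",
		Scope:      []string{"read:data"},
		ExpiresAt:  time.Unix(0, 0),
		ConsumedAt: nil,
	}
	// Deliberate: per the commit message, this tells the unusedwrite
	// analyzer that the otherwise-unread field writes are intentional.
	_ = rec
	if rec.ConsumedAt != nil {
		return errors.New("a fresh launch token must not be consumed")
	}
	return nil
}

func main() {
	fmt.Println("spec compliance error:", checkSpecCompliance())
}
```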

Security-aware linter set: errcheck, gosec, govet, ineffassign,
staticcheck, unused, gosimple, bodyclose, misspell, gofmt, goimports.

Tuning decisions documented inline:
- govet fieldalignment DISABLED: stylistic (struct memory layout),
  not a correctness class. Would force churn across every public DTO
  for sub-word savings that don't matter in an HTTP broker.
- govet shadow DISABLED: triggers on idiomatic nested `if err := ...`
  inside functions that already have an outer err. Not a bug class
  we've seen.
- gosec excludes: G117/G304/G101 mirrored from .gosec.yml (same
  rationale — broker API returns tokens, paths come from operator
  config, domain identifiers trip credential-name heuristic).
- gosec + errcheck suppressed in _test.go: weak-random and
  unchecked-setup-helper patterns are standard test-code practice.

First clean run: `golangci-lint run ./...` exits 0 (after the
lint-fix commits earlier in this branch). Build and unit tests
remain green (15/15 packages PASS).

First run of govulncheck on the M-sec baseline flagged 4 vulnerabilities,
all in the Go standard library, all fixable by bumping go.mod's
`toolchain` directive from go1.25.7 to go1.25.9:

  TD-VUL-001  GO-2026-4947   crypto/x509
  TD-VUL-002  GO-2026-4946   crypto/x509
  TD-VUL-003  GO-2026-4870   crypto/tls  (TLS 1.3 KeyUpdate DoS)
  TD-VUL-004  GO-2026-4601   net/url     (IPv6 host literal parsing)

Not fixing now: the branch is pre-merge, the fix is a one-line bump,
and doing it at Task 23 (right before the first push to origin) avoids
Dependabot opening a competing toolchain bump PR during rollout.

Consequence: local `./scripts/gates.sh full` will show govulncheck as
RED on this branch until Task 23. CI will be clean from the first push
because Task 23 lands before then.

New blocking gates (task mode):
  - contamination grep (enterprise refs in core)
  - govulncheck (stdlib/dep vulnerabilities)
  - go-mod-verify (tidy drift + module integrity)
  - format (gofmt -l empty)
  - vet (previously only in lint fallback)

Full mode adds:
  - unit-tests-race (go test -race)
  - docker-build
  - smoke-l2.5 (scripts/smoke/core-contract.sh — added in next task)
  - sbom (syft spdx-json)

Policy changes:
  - gosec flipped from warn_gate to run_gate (BLOCKING) per Decision 015
  - golangci-lint and gosec are now required — no fallback, fail-fast
    if the operator hasn't installed them
  - GOSEC_EXCLUDE=G117,G304,G101 kept in one variable so ci.yml, this
    script, and .golangci.yml all reference the same documented list
  - --list-gates flag added for scripts/test-gate-parity.sh (next task)
  - 'module' renamed to 'full'; 'module' retained as deprecated alias
    (prints a stderr NOTE) so muscle-memory still works
  - Dead references to live_test.sh / live_test_docker.sh removed —
    both scripts no longer exist

Verification: `./scripts/gates.sh task` on this commit: 8 PASS,
1 FAIL (govulncheck — TD-VUL-001..004, scheduled for Task 23).
Build + unit tests + lint + format + contamination + gosec all green.

10-step smoke verifying the credential broker's core contract:
  1. /v1/health 200
  2. admin auth (POST /v1/admin/auth)
  3. launch token creation (POST /v1/admin/launch-tokens)
  4. challenge nonce fetch (GET /v1/challenge)
  5. agent register via Ed25519 challenge-response (POST /v1/register)
  6. JWT structure check (alg=EdDSA, kid, exp>iat, jti)
  7. /v1/token/validate accepts (valid=true)
  8. /v1/revoke level=agent
  9. /v1/token/validate rejects after revoke (valid=false)
 10. out-of-scope requested_scope on register → 4xx (enforcement)
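
Step 6 (the JWT structure check) could look roughly like this in Go. It is a sketch under assumptions: the actual smoke script is shell, the type and function names here are hypothetical, and the code only inspects the decoded header and claims without verifying the signature.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type jwtHeader struct {
	Alg string `json:"alg"`
	Kid string `json:"kid"`
}

type jwtClaims struct {
	Iat int64  `json:"iat"`
	Exp int64  `json:"exp"`
	Jti string `json:"jti"`
}

// decodeSegment base64url-decodes one JWT segment and unmarshals it.
func decodeSegment(seg string, out any) error {
	raw, err := base64.RawURLEncoding.DecodeString(seg)
	if err != nil {
		return fmt.Errorf("decode segment: %w", err)
	}
	return json.Unmarshal(raw, out)
}

// checkStructure mirrors the listed checks: alg=EdDSA, kid present,
// exp later than iat, jti present. It does NOT verify the signature.
func checkStructure(token string) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("token must have three dot-separated segments")
	}
	var h jwtHeader
	if err := decodeSegment(parts[0], &h); err != nil {
		return err
	}
	var c jwtClaims
	if err := decodeSegment(parts[1], &c); err != nil {
		return err
	}
	switch {
	case h.Alg != "EdDSA":
		return fmt.Errorf("alg = %q, want EdDSA", h.Alg)
	case h.Kid == "":
		return errors.New("missing kid")
	case c.Exp <= c.Iat:
		return errors.New("exp must be later than iat")
	case c.Jti == "":
		return errors.New("missing jti")
	}
	return nil
}

// mustEncode builds a demo segment; it panics on marshal failure because
// it is only used to construct sample input.
func mustEncode(v any) string {
	raw, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return base64.RawURLEncoding.EncodeToString(raw)
}

func main() {
	token := mustEncode(jwtHeader{Alg: "EdDSA", Kid: "k1"}) + "." +
		mustEncode(jwtClaims{Iat: 100, Exp: 200, Jti: "j1"}) + ".sig"
	fmt.Println("structure error:", checkStructure(token))
}
```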

Deviation from the plan's original draft:
- Real API shapes — the plan had placeholder endpoints (/v1/agent/verify,
  revoke-by-jti). Actual broker uses /v1/token/validate and
  /v1/revoke {level, target}. Cross-checked against cmd/broker/main.go
  route table and internal/handler/revoke_hdl.go request DTO.
- Challenge-response is real crypto: the broker's /v1/register requires
  launch_token + nonce + Ed25519 public_key + signature(nonce). Pure
  bash can't do Ed25519, so we use python3 + cryptography (same pattern
  as tests/sec-l2b/integration.sh — already an established dependency).
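- jq '.valid // empty' gotcha: jq's // (alternative) operator falls
  through to its right-hand side when the left side is false or null,
  so a literal `false` was being replaced by empty. .valid is therefore
  extracted without // empty. Learned from step 9 failing on the first
  run against a live broker.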
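
For readers more at home in Go than in the python3 + cryptography helper the script uses, the challenge-response idea can be sketched with the standard library. This is an illustration only: the function names are hypothetical, and it assumes the signature is computed over the raw nonce bytes, which the source does not spell out.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newNonce returns 32 random bytes. The size is an assumption; the
// broker's /v1/challenge response format is not shown in the source.
func newNonce() ([]byte, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return nonce, nil
}

// signChallenge is the agent side: sign the nonce with the private key.
func signChallenge(priv ed25519.PrivateKey, nonce []byte) []byte {
	return ed25519.Sign(priv, nonce)
}

// verifyChallenge is the broker side: check the signature against the
// registered public key and the nonce that was issued.
func verifyChallenge(pub ed25519.PublicKey, nonce, sig []byte) bool {
	return ed25519.Verify(pub, nonce, sig)
}

// roundTrip runs one exchange and then tries to reuse the signature
// against a different nonce, which should fail.
func roundTrip() (fresh, replayed bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	nonce, err := newNonce()
	if err != nil {
		return false, false, err
	}
	sig := signChallenge(priv, nonce)
	other, err := newNonce()
	if err != nil {
		return false, false, err
	}
	return verifyChallenge(pub, nonce, sig), verifyChallenge(pub, other, sig), nil
}

func main() {
	fresh, replayed, err := roundTrip()
	if err != nil {
		fmt.Println("round trip:", err)
		return
	}
	fmt.Println("fresh nonce accepted:", fresh)
	fmt.Println("signature reused on another nonce accepted:", replayed)
}
```

Because keys and nonces are generated per run, the bytes on the wire differ every time while the accept/reject outcome stays the same.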

Called by:
  - scripts/gates.sh full (after broker is up via scripts/stack_up.sh)
  - .github/workflows/ci.yml smoke-l2.5 job (Task 14)

Determinism notes: fresh Ed25519 keys and fresh nonces per run — this
is unavoidable for challenge-response. The contract check (what the
script verifies) is deterministic; only the wire values are not.

Verified: ran against a live Dockerized broker on localhost:8090:
  10/10 PASS (agent registered, revoked, rejected, OOS denied 403).

Reads gate IDs from two sources and fails if they diverge:
  A. scripts/gates.sh --list-gates (local source of truth)
  B. .github/workflows/ci.yml GATE_LIST_START/END comment block
     (CI source of truth)

Prevents silent drift: a developer adding a gate locally but
forgetting ci.yml will see this script fail and know to update
both. Conversely, a CI-only gate addition forces a gates.sh update.

Runs as its own gate both locally (in 'full' mode once ci.yml
exists) and in CI (as the gate-parity job in ci.yml — Task 14).

Currently exits 1 because .github/workflows/ci.yml doesn't exist
yet — will be created in Task 14 and immediately exercise this
script for the first time.
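
The comparison itself is simple set logic. The sketch below shows one way to express it in Go; the real check is a shell script, and the comment-block layout assumed here (one `#   <id>` line per gate between the GATE_LIST_START and GATE_LIST_END markers) is a guess, not taken from ci.yml.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseGateBlock extracts gate IDs from the marker-delimited comment
// block. Assumed layout: one "#   <id>" line per gate between markers.
func parseGateBlock(workflow string) []string {
	var ids []string
	inBlock := false
	for _, line := range strings.Split(workflow, "\n") {
		switch {
		case strings.Contains(line, "GATE_LIST_START"):
			inBlock = true
		case strings.Contains(line, "GATE_LIST_END"):
			inBlock = false
		case inBlock:
			id := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "#"))
			if id != "" {
				ids = append(ids, id)
			}
		}
	}
	return ids
}

// diffGates reports IDs present on only one side, sorted for stable output.
func diffGates(local, ci []string) (onlyLocal, onlyCI []string) {
	inCI := make(map[string]bool, len(ci))
	for _, id := range ci {
		inCI[id] = true
	}
	inLocal := make(map[string]bool, len(local))
	for _, id := range local {
		inLocal[id] = true
		if !inCI[id] {
			onlyLocal = append(onlyLocal, id)
		}
	}
	for _, id := range ci {
		if !inLocal[id] {
			onlyCI = append(onlyCI, id)
		}
	}
	sort.Strings(onlyLocal)
	sort.Strings(onlyCI)
	return onlyLocal, onlyCI
}

func main() {
	workflow := "# GATE_LIST_START\n#   build\n#   vet\n#   sbom\n# GATE_LIST_END\n"
	local := []string{"build", "vet", "gosec"} // e.g. output of --list-gates
	onlyLocal, onlyCI := diffGates(local, parseGateBlock(workflow))
	fmt.Println("only in gates.sh:", onlyLocal)
	fmt.Println("only in ci.yml:", onlyCI)
}
```

A parity check would exit non-zero whenever either slice is non-empty.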

syft 1.x renamed the 'packages' subcommand to 'scan'. Running
'syft packages' still works but prints a deprecation warning to
stderr, which is noise in CI output.

Verified: syft scan dir:. -o spdx-json=sbom.spdx.json --quiet
produces an identical 27-package SPDX-2.3 SBOM to the old command.

Also affects the anchore/sbom-action used in ci.yml (Task 14 will
pin the action version that defaults to 'scan').

Covers tasks 1-9 of the M-sec plan: all local infrastructure changes
from feature/ci-msec — the five new/modified configs (.gosec.yml,
.golangci.yml, gates.sh, core-contract.sh, test-gate-parity.sh),
the six lint-fix commits (keystore, heartbeat, aactl client,
sql_store nosec, admin test doc, gofmt normalize), and the tech
debt tracker entry for the stdlib CVE baseline.

Phase B (GitHub Actions workflows) will add its own entry when
those files land.

Three ecosystems, weekly (Monday 06:00 UTC):
  - github-actions: rotates SHA-pinned workflow steps (Task 22 pins
    every action to a 40-char SHA; without Dependabot those go stale)
  - gomod: direct and indirect Go module updates, grouped so PRs are
    reviewable as 'direct deps bump' vs 'indirect deps bump'
  - docker: Dockerfile base image updates, kept ungrouped with a
    lower PR limit because base image bumps often need individual
    testing

All PRs get the 'dependencies' label plus an ecosystem-specific
label for filtering. Commit prefix 'chore(deps)' matches the rest
of the repo's conventional-commits style.

CODEOWNERS:
  Global wildcard pointing at @devonartis. Primarily serves as
  documentation and as the review-required set for branch protection
  (Task 27/29). Per Decision 014, external contributions aren't
  accepted, so CODEOWNERS is not a gatekeeping mechanism for PRs
  from outside — that job belongs to contribution-policy.yml.

MAINTAINERS:
  Allowlist consumed by .github/workflows/contribution-policy.yml
  (Task 18). Workflow reads this file via the GitHub API (not via
  checkout) and exempts listed users from the auto-close policy.
  Anyone not in this file, not a bot, and not a repo collaborator
  with write access gets their PR closed with a templated comment
  pointing to the issues-only policy.

Parallel per-gate jobs (GATE_LIST_START/END matches gates.sh):
  build, vet, lint, format, contamination, unit-tests,
  unit-tests-race, gosec, govulncheck, go-mod-verify,
  docker-build, smoke-l2.5, sbom.

PR-only jobs:
  dep-review  — blocks on 'moderate' severity dep CVEs
  changelog   — requires CHANGELOG.md diff, skippable via label

Always-on:
  gate-parity   — runs scripts/test-gate-parity.sh
  gates-passed  — aggregator job that branch protection will gate on;
                  survives individual gate renames

smoke-l2.5 job details:
  Depends on docker-build so the image exists when the smoke runs.
  Installs python3 cryptography (required by the L2.5 script for
  the Ed25519 challenge-response). Runs scripts/stack_up.sh with
  the known test fixture AA_ADMIN_SECRET, waits up to 30s for
  /v1/health, runs scripts/smoke/core-contract.sh, tears down with
  stack_down.sh in always() so broker doesn't linger on failures.

gosec job:
  Uses securego/gosec@master with the documented exclusions
  (G117,G304,G101) and severity=medium — matches scripts/gates.sh.

Triggers: pull_request and push to develop/main.
Concurrency: cancel-in-progress per ref so superseded runs don't
  pile up on a busy branch.

All action refs use tags at plan time; Task 22 replaces every
@v<N> with a 40-char SHA + version comment before first push.

Runs on PR and push to develop/main, plus weekly scheduled scan
(Monday 07:31 UTC — off-peak, off-round-minute so we don't join
the thundering herd on :00).

Query suites: security-extended + security-and-quality. These
are stricter than the default 'security' suite but appropriate
for a security product.

Results populate the repo's Security tab (via security-events
write permission) and the CodeQL badge (added in Task 30). Will
be listed as a required status check on branch protection alongside
gates-passed.

Runs on push to main, weekly schedule (Tuesday 03:25 UTC —
staggered from CodeQL to avoid compounding load), and when
branch protection rules change.

Publishes results to:
  - OpenSSF Scorecard badge (added in Task 30)
  - SARIF uploaded to the repo's Security tab
  - 5-day artifact retention for audit trail

persist-credentials: false on checkout so the workflow can't
accidentally push. publish_results: true is required for the
public badge to update.

Informational only — NOT a required check. Scorecard's signal
value shows up once the repo flips public, because it scans
branch protection, code review practices, and publishing hygiene
from an outside-in perspective.

Runs ./scripts/gates.sh regression nightly at 05:17 UTC against
develop (off-peak, off-round-minute). Also triggerable via
workflow_dispatch for ad-hoc catches.

Failure handling:
  - continue-on-error on the test step so we can upload evidence
    before failing the workflow
  - tests/**/evidence/ uploaded as 14-day artifact
  - actions/github-script opens an issue tagged
    'regression/nightly/needs-triage' with commit, branch, run
    URL, and a triage checklist — maintainers see it without
    watching the Actions tab
  - Final 'exit 1' step turns the workflow red after evidence
    capture

Informational gate per Decision 015: does NOT block in-flight
PRs. The 24-hour lag is acceptable because L2.5 core contract
smoke catches the headline regressions on every PR. Nightly is
for the long-tail acceptance stories.

Auto-closes PRs from non-maintainers with a templated comment
pointing to the issues-only contribution policy (Decision 014).

Exemption tiers (checked in order):
  1. Bots — dependabot, github-actions, renovate
  2. Repo collaborators with admin/maintain/write access
  3. Users listed in .github/MAINTAINERS (read via API, not checkout)

Non-exempt authors get:
  - A policy comment explaining why the PR is closed, with links
    to the issues-only policy, bug-report template, and SECURITY.md
  - The PR state set to 'closed'

SECURITY NOTE (critical): this workflow uses pull_request_target,
which runs in the BASE branch context with write permissions
(required to close PRs). It MUST NEVER check out the PR branch —
doing so is the 'pwn-request' attack class where untrusted PR
code runs with write tokens. The workflow only reads PR metadata
via the GitHub API and fetches MAINTAINERS from the BASE ref
(not the PR ref) so a PR can't alter its own allowlist.

Entire threat model documented inline at the top of the file so
future editors have a reason to pause before adding a checkout step.
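
The exemption tiers reduce to a small decision function. The sketch below is hypothetical Go, not the workflow's actual script: the prAuthor type, the `[bot]` login suffixes, and the MAINTAINERS file format (one login per line, `#` comments) are all assumptions. It works purely on metadata, in keeping with the rule that no PR code is checked out or executed.

```go
package main

import (
	"fmt"
	"strings"
)

// prAuthor carries only metadata that would come from the GitHub API.
type prAuthor struct {
	Login      string
	Permission string // e.g. "admin", "maintain", "write", "read", "none"
}

// exemptBots uses the "[bot]" login form as an assumption.
var exemptBots = map[string]bool{
	"dependabot[bot]":     true,
	"github-actions[bot]": true,
	"renovate[bot]":       true,
}

// parseMaintainers reads an allowlist, assuming one login per line with
// optional leading "@" and "#" comment lines.
func parseMaintainers(text string) map[string]bool {
	out := make(map[string]bool)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out[strings.ToLower(strings.TrimPrefix(line, "@"))] = true
	}
	return out
}

// exemption returns the tier that exempts the author, or "" if the PR
// should be closed. Tiers are checked in the documented order.
func exemption(a prAuthor, maintainers map[string]bool) string {
	login := strings.ToLower(a.Login)
	switch {
	case exemptBots[login]:
		return "bot"
	case a.Permission == "admin" || a.Permission == "maintain" || a.Permission == "write":
		return "collaborator"
	case maintainers[login]:
		return "maintainer"
	}
	return ""
}

func main() {
	maintainers := parseMaintainers("# allowlist\n@devonartis\n")
	for _, a := range []prAuthor{
		{Login: "dependabot[bot]", Permission: "none"},
		{Login: "devonartis", Permission: "read"},
		{Login: "stranger", Permission: "read"},
	} {
		fmt.Printf("%-16s exemption=%q\n", a.Login, exemption(a, maintainers))
	}
}
```

The allowlist passed to exemption would have to come from the base ref, so a PR cannot add its own author.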

actionlint caught the issue: GitHub Actions job IDs must start
with a letter or underscore and contain only alphanumerics, -,
or _. The '.' in 'smoke-l2.5' made it invalid — CI would have
rejected the workflow on first push.

Renamed in five places (kept in sync):
  - .github/workflows/ci.yml job ID
  - .github/workflows/ci.yml gates-passed needs list
  - .github/workflows/ci.yml GATE_LIST_START/END block
  - scripts/gates.sh GATES_FULL array
  - scripts/gates.sh smoke-l25 run_gate invocation

Verified: actionlint exits 0 on all workflows,
scripts/test-gate-parity.sh passes (13 gates match between
gates.sh and ci.yml).

The 'L2.5' name is a test-taxonomy reference (unit L1 /
component L2 / integration L2.5 / full E2E L3 etc.) — we
keep the documentation using 'L2.5' but the machine-readable
identifier drops the period. First lesson of the actionlint
gate: yes, we needed it.
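
The job ID rule quoted above is easy to encode as a regular expression. This is an illustrative check written for this note, not how actionlint implements it.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobIDPattern encodes the rule stated above: start with a letter or
// underscore, then only alphanumerics, '-' or '_'.
var jobIDPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_-]*$`)

func validJobID(id string) bool {
	return jobIDPattern.MatchString(id)
}

func main() {
	for _, id := range []string{"smoke-l2.5", "smoke-l25", "_private", "2fast"} {
		fmt.Printf("%-12s valid=%t\n", id, validJobID(id))
	}
}
```

Here smoke-l2.5 is rejected because of the period, while smoke-l25 passes.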

Both files are generated by ./scripts/gates.sh full:
  - coverage.out      — go test -race -coverprofile=coverage.out
  - sbom.spdx.json    — syft scan dir:. -o spdx-json=sbom.spdx.json

They change on every run and should never be committed. Seen as
untracked after the first local 'gates.sh full' invocation on
feature/ci-msec.

Every 'uses:' across the 5 workflow files now references a
40-character commit SHA with an inline version comment. Dependabot
rotates these weekly per .github/dependabot.yml — the SHA pin
plus managed rotation is the recommended discipline for
security-adjacent repos (per Obsidian KB Decision 015).

Also bumped several actions past the plan's placeholder versions
to current stable — the plan was written against v4/v5/v6 refs,
but most of these are now at higher majors:

  actions/checkout                v4 → v6.0.2
  actions/setup-go                v5 → v6.4.0
  actions/upload-artifact         v4 → v7.0.0
  actions/github-script           v7 → v9.0.0
  actions/dependency-review       v4 → v4.9.0
  golangci/golangci-lint-action   v6 → v9.2.0
  codecov/codecov-action          v4 → v6.0.0
  securego/gosec                  master → v2.25.0
  ossf/scorecard-action           v2 → v2.4.3
  anchore/sbom-action             v0 → v0.24.0
  github/codeql-action            v3 → codeql-bundle-v2.25.1

SHAs resolved via `gh api repos/<owner>/<repo>/releases/latest`
and dereferenced through git/refs/tags and git/tags (for
annotated tags). securego/gosec@master was replaced with v2.25.0
— pinning @master was a documented temporary per the original
plan.

Verified: actionlint exits 0 on all 5 workflows post-pin.
test-gate-parity still passes (13 gates).
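
A pin audit along these lines can be sketched in a few lines of Go. The function is hypothetical and deliberately naive (a line-based regex rather than a YAML parser), and the 40-character value in the sample is a placeholder, not a real commit SHA.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// usesRe captures the reference after "uses:" on a step line.
	usesRe = regexp.MustCompile(`(?m)^\s*-?\s*uses:\s*(\S+)`)
	// shaRe matches a full 40-character lowercase hex commit SHA.
	shaRe = regexp.MustCompile(`^[0-9a-f]{40}$`)
)

// unpinnedActions returns every remote action reference that is not
// pinned to a 40-char SHA. Local ("./") and docker:// references are
// skipped because they have no commit ref to pin.
func unpinnedActions(workflow string) []string {
	var out []string
	for _, m := range usesRe.FindAllStringSubmatch(workflow, -1) {
		ref := m[1]
		if strings.HasPrefix(ref, "./") || strings.HasPrefix(ref, "docker://") {
			continue
		}
		at := strings.LastIndex(ref, "@")
		if at < 0 || !shaRe.MatchString(ref[at+1:]) {
			out = append(out, ref)
		}
	}
	return out
}

func main() {
	// Placeholder SHA for illustration only.
	workflow := `steps:
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # v6.0.2
  - uses: securego/gosec@master
  - uses: actions/setup-go@v6
`
	for _, ref := range unpinnedActions(workflow) {
		fmt.Println("not SHA-pinned:", ref)
	}
}
```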

Resolves TD-VUL-001..004 (all 4 Go stdlib CVEs flagged by the
baseline govulncheck run):

  TD-VUL-001  GO-2026-4947   crypto/x509
  TD-VUL-002  GO-2026-4946   crypto/x509
  TD-VUL-003  GO-2026-4870   crypto/tls  (TLS 1.3 KeyUpdate DoS)
  TD-VUL-004  GO-2026-4601   net/url     (IPv6 host literal)

One-line bump to go.mod's toolchain directive. No dependency
changes (go.sum untouched by 'go mod tidy').

Landing this immediately before the first CI push so:
  - govulncheck gate on feature/ci-msec goes green from the first
    CI run instead of failing-then-fixing-then-passing
  - Dependabot's first rotation doesn't open a competing PR
    bumping the toolchain

Verification:
  go build ./cmd/broker ./cmd/aactl    — OK
  go test -short ./...                 — 15/15 packages PASS
  govulncheck ./...                    — 'No vulnerabilities found'

Also expected to resolve the 'go1.25.7 vs go1.25.4' compile error
seen on the unit-tests-race gate during local gates.sh full runs
(the standalone go1.25.7 binary and the go tool's embedded
version diverged). Awaiting race test result.

First CI run on PR #3 flagged two failures:

1. lint (exit 3): golangci-lint-action@v9 requires golangci-lint v2
   with a new config schema. Our .golangci.yml is v1 format. Pinned
   back to v6.5.2 (last major compatible with v1 configs) — when we
   migrate the config to v2, we can bump the action pin again.

2. dep-review (action step failure): actions/dependency-review-action
   requires GitHub Advanced Security on private repos, which this
   repo does not have. Removed the dep-review job with a comment
   explaining the re-enable conditions. Tracked as TD-VUL-005 in
   TECH-DEBT.md — revert the comment when the repo flips public
   (Phase 4) or GHAS is purchased.

Remaining security coverage is still strong: govulncheck (stdlib +
Go module CVEs), gosec (application-layer static analysis),
contamination grep (enterprise refs), CodeQL (SAST, separate
workflow). Only the license/metadata coverage from dep-review is
temporarily out.

Verified: actionlint exit 0. All 14 remaining gates (13 matrix +
analyze) expected to pass next run.

Second CI run still failed on lint even after pinning the action
back to v6.5.2. Root cause: `version: latest` in the job spec
tells golangci-lint-action v6 to install whatever 'latest' resolves
to, which is now golangci-lint v2.x. The v2 binary rejects our v1
config schema.

Fix: pin version to v1.64.8 — the exact version the local
developer gates run against (brew-installed golangci-lint on
macOS, matching the gates.sh reference install command). This is
now the only place outside gates.sh that references a golangci-lint
version; when we migrate the config to v2 we bump both together.

Verified: actionlint exit 0.

Both workflows require GitHub Code Scanning, which is a GHAS feature
on private repos. devonartis/agentauth is private without GHAS, so:
  - codeql-action/analyze fails uploading SARIF → Security tab
  - scorecard-action fails upload-sarif (same underlying endpoint)

Instead of deleting either workflow, both are parked on
`workflow_dispatch` only — the job logic, SHA pins, and comment
history are preserved so re-enabling is a one-line trigger swap when
the repo flips public (Phase 4 of release strategy).

The file header in each explains:
  - why it's disabled
  - the re-enable conditions
  - the tech debt cross-reference

TD-VUL-005/006 consolidated in TECH-DEBT.md with a single fix
sequence for all three GHAS-gated workflows (dep-review + codeql +
scorecard). Remaining security coverage while these are parked:
govulncheck, gosec, contamination — still blocking in ci.yml.

Verified: actionlint exit 0.

Third lint failure on PR #3 made the root cause clear: the
golangci/golangci-lint-action v6.x ships pre-built binaries compiled
against Go ≤1.23. Our go.mod has 'toolchain go1.25.9' (required to
fix the stdlib CVEs — TD-VUL-001..004). golangci-lint v1.64.8
built against 1.23 crashes with exit 3 on the SSA pass when it tries
to parse code compiled by 1.25.

Local developers don't see this because brew-installed golangci-lint
is built with whatever Go the homebrew bottle was compiled against
(currently 1.25.7 in Cellar).

Fix: drop the action entirely. 'go install golangci-lint@v1.64.8'
on the CI runner compiles the linter with the runner's Go — which
is whatever actions/setup-go@v6 resolved from go.mod (1.25.9). The
CI now matches local.

This is actually the approach golangci-lint's own docs recommend
when the pre-built action's Go version is behind: just install via
'go install' from the commit step. The action is a convenience
layer that falls over when your toolchain is ahead of its bundle.

Migration to golangci-lint v2 is still planned — v2's action
bundles a newer Go, but not in this cycle.

@devonartis devonartis merged commit e560301 into develop Apr 10, 2026
16 checks passed
@devonartis devonartis deleted the feature/ci-msec branch April 10, 2026 09:33