feat(ci): M-sec CI/build/gates v1 by devonartis · Pull Request #3 · devonartis/agentwrit

devonartis · 2026-04-10T08:56:32Z

Summary

Implements the M-sec CI/build/gates pipeline per Obsidian KB Decision 015 and the design doc at .plans/designs/2026-04-10-ci-build-gates-msec-design.md. Follows the 31-task plan at .plans/specs/2026-04-10-ci-build-gates-msec-plan.md.

What changes

Workflows (.github/workflows/)

ci.yml — 13 parallel gates + dep-review + changelog + gate-parity + gates-passed aggregator
codeql.yml — Go SAST with security-extended queries (PR + push + weekly)
scorecard.yml — OpenSSF supply-chain Scorecard (weekly + main)
nightly.yml — L4 full regression with auto-issue-on-failure
contribution-policy.yml — Decision 014 auto-close, pull_request_target with NO PR-branch checkout (documented threat model)

Local gate infrastructure

scripts/gates.sh — extended from 4 gates to 13, module → full, gosec flipped to blocking
scripts/smoke/core-contract.sh — L2.5 smoke (10-step issue/verify/revoke/deny flow with Ed25519 challenge-response)
scripts/test-gate-parity.sh — enforces gate list alignment between local and CI
.gosec.yml — documented G117/G304/G101 exclusions for a credential broker
.golangci.yml — security-aware linter set with mirrored gosec excludes

Governance

.github/dependabot.yml — github-actions, gomod, docker (weekly)
.github/CODEOWNERS + .github/MAINTAINERS — ownership + contribution-policy allowlist

Dependencies

go.mod toolchain go1.25.7 → go1.25.9 — resolves TD-VUL-001..004 (four stdlib CVEs: GO-2026-4947, -4946, -4870, -4601)

Code quality (discovered by new gates, fixed before first CI run)

internal/keystore/parseKey — defensive type assertion on priv.Public()
internal/mutauth/heartbeat.sweep — log auto-revoke failures instead of dropping
cmd/aactl/client — propagate json.Marshal/io.ReadAll errors
internal/store/sql_store — documented #nosec G202 on audit SELECT
gofmt normalization across 24 drifted files

What this is NOT

No release automation, GHCR publish, SLSA provenance (later cycle)
No pre-commit hook updates (separate small cycle)
No README badges (Task 30, post-merge)

Local verification (on feature/ci-msec)

go build ./cmd/broker ./cmd/aactl — OK
go test -short -count=1 ./... — 15/15 packages PASS
go test -race -count=1 ./... — 15/15 packages PASS
golangci-lint run ./... — clean
gosec -conf .gosec.yml -exclude=G117,G304,G101 -severity=medium ./... — 0 findings
govulncheck ./... — "No vulnerabilities found"
actionlint .github/workflows/*.yml — clean
./scripts/test-gate-parity.sh — 13 gates match
Live Docker broker + ./scripts/smoke/core-contract.sh — 10/10 PASS (issue → verify → revoke → reject → out-of-scope denied)

PR tasks

First CI run completes (this may iterate — SHA-pinned action majors are newer than what the plan assumed)
Merge this PR to develop
Configure branch protection on develop (Task 27)
Merge develop → main via scripts/strip_for_main.sh
Configure branch protection on main (Task 29)
README badges (Task 30)
7-day observation: first Dependabot PR, first nightly run

Rationale

Decision 015 lays out why M-sec (not generic M) is the right scope for a credential broker, and why CI must exist before the rebrand PR lands. This PR is the safety net that will catch regressions during and after the rebrand.

Obsidian KB Decision 015 — CI/Gates Strategy — Security-First, Rebrand-Resilient
Obsidian KB Decision 014 — No External Contributions (enforced by contribution-policy.yml)
.plans/designs/2026-04-10-ci-build-gates-msec-design.md
.plans/specs/2026-04-10-ci-build-gates-msec-plan.md

Catches up FLOW.md + MEMORY.md with the prior 2026-04-10 session notes (ADR vs Decision split, obsidian:decision skill build, branch cleanup, Decision 014 no-external-contributions) that had been sitting uncommitted on develop.
Adds the M-sec CI/build/gates v1 design doc from the current brainstorm cycle: .plans/designs/2026-04-10-ci-build-gates-msec-design.md
The design doc is the implementation-level companion to Obsidian KB Decision 015 (CI/Gates Strategy: Security-First, Rebrand-Resilient). It covers:
- Five workflow files: ci.yml, codeql.yml, scorecard.yml, nightly.yml, contribution-policy.yml
- gates.sh extension: add contamination, govulncheck, docker-build, smoke-l2.5, sbom; flip gosec from warn to blocking
- scripts/smoke/core-contract.sh: L2.5 core contract smoke (issue/verify/revoke/deny out-of-scope)
- Pinned action SHAs + Dependabot maintenance
- Rollout plan with branch-protection sequencing
Deferred to later cycles: release automation, pre-commit hooks, coverage gating, matrix builds.

Adds the task-level implementation plan for the M-sec CI pipeline: .plans/specs/2026-04-10-ci-build-gates-msec-plan.md
31 tasks across 4 phases:
- Phase A (Tasks 1-11): Local infrastructure (.gosec.yml, .golangci.yml, scripts/gates.sh extension, L2.5 smoke script, parity test, local verification)
- Phase B (Tasks 12-20): GitHub Actions workflows (dependabot config, CODEOWNERS/MAINTAINERS, ci.yml, codeql.yml, scorecard.yml, nightly.yml, contribution-policy.yml)
- Phase C (Tasks 21-25): actionlint, pin action SHAs, push feature branch, iterate on CI-only issues, open PR
- Phase D (Tasks 26-31): merge, branch protection on develop, merge to main, branch protection on main, README badges, observation period
Each task has exact file paths, complete code/YAML, verification commands, and commit messages. No placeholders.
Updates FLOW.md with the current-session block capturing:
- Decision 015 reference (Obsidian KB)
- Design + plan artifact paths
- Where to start next session (read order, first action, tool guidance)
- Explicit instructions to not re-brainstorm and not run devflow ceremony steps that don't apply to infrastructure cycles

Previous entry was ~70 lines of MEMORY.md-shaped prose (reasoning, trade-offs, next-session instructions). FLOW.md is decisions and actions only — the context lives in Decision 015, the design doc, and the implementation plan. Now 3 lines of content: decision, action (with paths to design + plan), status pointing at Task 1 of the plan.

JSON format (gosec does not support YAML). Three globally-excluded rules with documented rationale for why each is product-incompatible for a credential broker: G117 (broker API returns tokens by design), G304 (all file paths come from operator config, not user input), G101 (every domain identifier trips the credential-name heuristic). Severity gate: MEDIUM and HIGH block; LOW is advisory. Scope of exclusions reviewed against code — zero MEDIUM/HIGH findings remain across 48 files / 7393 lines.

Pre-existing formatting drift: trailing whitespace, blank-line spacing after doc comments, align-after-annotate in struct literals. Surfaced by adding gofmt as a blocking gate (M-sec). Zero behavior change: every edit is a gofmt -w rewrite of files that were already compiling and passing tests.
Verified:
- go build ./cmd/broker ./cmd/aactl: OK
- go test -short ./...: 15/15 packages PASS

Before: priv.Public().(ed25519.PublicKey) — unchecked type assertion, would panic if the stdlib contract ever broke (though it never has). After: comma-ok form with an explicit error return. Unreachable in practice — ed25519.PrivateKey.Public() is documented to return ed25519.PublicKey — but the guard satisfies errcheck's check-type-assertions rule and makes the invariant explicit for future readers. Scope: internal/keystore/parseKey (called at broker startup when loading the persistent signing key). No behavior change on the happy path. Keystore unit tests: PASS.

Before: sweep() called h.revSvc.Revoke("agent", id) with the return assigned to blank, then unconditionally logged 'agent auto-revoked' — even when the revocation actually failed. A persistent store error would silently leave a missed-heartbeat agent marked as revoked in logs while its tokens were still valid. After: check the error; on failure log 'agent auto-revoke failed' with err detail; on success log the existing auto-revoked message. Agent stays tracked on failure so the next sweep retries. Also fixes a misspell flagged by golangci-lint/misspell in the doc comment ('cancelled' → 'canceled', Go convention). Scope: internal/mutauth/heartbeat.go sweep(). Background goroutine started by StartMonitor. Unit tests for mutauth PASS.
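A simplified sketch of the corrected control flow is shown here. Only the call Revoke("agent", id) and the two log messages come from the description above; the revoker interface, the monitor struct, and the flakyStore fake are hypothetical scaffolding for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// revoker is a hypothetical interface; the source only shows the call
// h.revSvc.Revoke("agent", id).
type revoker interface {
	Revoke(level, target string) error
}

// monitor is a simplified, hypothetical stand-in for the heartbeat monitor.
type monitor struct {
	revSvc  revoker
	tracked map[string]bool // agents still being watched
	logs    []string        // captured log lines, for illustration only
}

// sweep revokes each missed-heartbeat agent and stops tracking it only when
// the revocation succeeded, so failures are retried on the next sweep.
func (m *monitor) sweep(missed []string) {
	for _, id := range missed {
		if err := m.revSvc.Revoke("agent", id); err != nil {
			m.logs = append(m.logs, fmt.Sprintf("agent auto-revoke failed: id=%s err=%v", id, err))
			continue // keep the agent tracked
		}
		delete(m.tracked, id)
		m.logs = append(m.logs, fmt.Sprintf("agent auto-revoked: id=%s", id))
	}
}

// flakyStore is a test fake that fails for one specific agent ID.
type flakyStore struct{ failFor string }

func (s flakyStore) Revoke(level, target string) error {
	if target == s.failFor {
		return errors.New("store unavailable")
	}
	return nil
}

func main() {
	m := &monitor{
		revSvc:  flakyStore{failFor: "agent-b"},
		tracked: map[string]bool{"agent-a": true, "agent-b": true},
	}
	m.sweep([]string{"agent-a", "agent-b"})
	for _, line := range m.logs {
		fmt.Println(line)
	}
}
```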

Three errcheck findings in cmd/aactl/client.go flagged by golangci-lint:
- authenticate(): json.Marshal of the auth request body discarded its error (practically unreachable for a map[string]string, but the pattern sets a bad example elsewhere in the codebase).
- authenticate(): io.ReadAll of the failure-response body discarded its error, so a truncated body on a 5xx would produce a misleading 'auth failed (HTTP 500): ' with no body context.
- doPostWithToken(): io.ReadAll of the response body discarded its error, so callers would see a success status with an empty body and no indication the read had failed.
All three now return wrapped errors. aactl is the operator-facing CLI so loud failures are always preferable to silent truncation. Unit tests: cmd/aactl PASS.

internal/store/sql_store.go QueryAuditEvents() concatenates fragments into selectQ to assemble the optional WHERE clause. gosec G202 flags the concatenation as potential SQL injection. This is a documented false positive: the `where` value is built entirely from fixed template strings (see whereClauses above), and every user-supplied value becomes a bound `?` parameter in queryArgs. No untrusted text enters the SQL string itself. Added an inline `#nosec G202` comment with the explicit rationale so reviewers don't have to re-derive the proof each time. Also contains gofmt struct-field-alignment fixes for ErrAgent/AppNotFound, AgentRecord.Scope, and AppRecord that were missed in the prior gofmt commit (gofmt-only, no behavior change). Store tests: PASS.

govet's unusedwrite analyzer was flagging every field-write in the LaunchTokenRecord literal inside TestLaunchTokenRecord_SpecCompliance because only rec.ConsumedAt is ever read. The flag was technically correct, and entirely missing the test's purpose. The test exists precisely so that an exhaustive field literal will fail to compile if any field is renamed or removed from store.LaunchTokenRecord. That's the contract: upstream refactor breaks this test, which is the early-warning signal.
Changes:
- Expanded the doc comment to state the contract explicitly so a future reader (or linter-driven refactor) doesn't 'simplify' it.
- Added `_ = rec` with a comment explaining it silences unusedwrite on purpose. This is the idiomatic way to tell govet 'I know, it's intentional.'
- Incidental gofmt fix on testSecret+"..." string concatenation that was drifted.
Admin tests: PASS.

Security-aware linter set: errcheck, gosec, govet, ineffassign, staticcheck, unused, gosimple, bodyclose, misspell, gofmt, goimports.
Tuning decisions documented inline:
- govet fieldalignment DISABLED: stylistic (struct memory layout), not a correctness class. Would force churn across every public DTO for sub-word savings that don't matter in an HTTP broker.
- govet shadow DISABLED: triggers on idiomatic nested `if err := ...` inside functions that already have an outer err. Not a bug class we've seen.
- gosec excludes: G117/G304/G101 mirrored from .gosec.yml (same rationale: broker API returns tokens, paths come from operator config, domain identifiers trip credential-name heuristic).
- gosec + errcheck suppressed in _test.go: weak-random and unchecked-setup-helper patterns are standard test-code practice.
First clean run: `golangci-lint run ./...` exits 0 (after the lint-fix commits earlier in this branch). Build and unit tests remain green (15/15 packages PASS).

First run of govulncheck on the M-sec baseline flagged 4 vulnerabilities, all in the Go standard library, all fixable by bumping go.mod's `toolchain` directive from go1.25.7 to go1.25.9:
- TD-VUL-001 GO-2026-4947 crypto/x509
- TD-VUL-002 GO-2026-4946 crypto/x509
- TD-VUL-003 GO-2026-4870 crypto/tls (TLS 1.3 KeyUpdate DoS)
- TD-VUL-004 GO-2026-4601 net/url (IPv6 host literal parsing)
Not fixing now: the branch is pre-merge, the fix is a one-line bump, and doing it at Task 23 (right before the first push to origin) avoids Dependabot opening a competing toolchain bump PR during rollout.
Consequence: local `./scripts/gates.sh full` will show govulncheck as RED on this branch until Task 23. CI will be clean from the first push because Task 23 lands before then.

New blocking gates (task mode):
- contamination grep (enterprise refs in core)
- govulncheck (stdlib/dep vulnerabilities)
- go-mod-verify (tidy drift + module integrity)
- format (gofmt -l empty)
- vet (previously only in lint fallback)
Full mode adds:
- unit-tests-race (go test -race)
- docker-build
- smoke-l2.5 (scripts/smoke/core-contract.sh, added in next task)
- sbom (syft spdx-json)
Policy changes:
- gosec flipped from warn_gate to run_gate (BLOCKING) per Decision 015
- golangci-lint and gosec are now required: no fallback, fail-fast if the operator hasn't installed them
- GOSEC_EXCLUDE=G117,G304,G101 kept in one variable so ci.yml, this script, and .golangci.yml all reference the same documented list
- --list-gates flag added for scripts/test-gate-parity.sh (next task)
- 'module' renamed to 'full'; 'module' retained as deprecated alias (prints a stderr NOTE) so muscle-memory still works
- Dead references to live_test.sh / live_test_docker.sh removed; both scripts no longer exist
Verification: `./scripts/gates.sh task` on this commit: 8 PASS, 1 FAIL (govulncheck: TD-VUL-001..004, scheduled for Task 23). Build + unit tests + lint + format + contamination + gosec all green.

10-step smoke verifying the credential broker's core contract:
1. /v1/health 200
2. admin auth (POST /v1/admin/auth)
3. launch token creation (POST /v1/admin/launch-tokens)
4. challenge nonce fetch (GET /v1/challenge)
5. agent register via Ed25519 challenge-response (POST /v1/register)
6. JWT structure check (alg=EdDSA, kid, exp>iat, jti)
7. /v1/token/validate accepts (valid=true)
8. /v1/revoke level=agent
9. /v1/token/validate rejects after revoke (valid=false)
10. out-of-scope requested_scope on register → 4xx (enforcement)
Deviation from the plan's original draft:
- Real API shapes: the plan had placeholder endpoints (/v1/agent/verify, revoke-by-jti). Actual broker uses /v1/token/validate and /v1/revoke {level, target}. Cross-checked against cmd/broker/main.go route table and internal/handler/revoke_hdl.go request DTO.
- Challenge-response is real crypto: the broker's /v1/register requires launch_token + nonce + Ed25519 public_key + signature(nonce). Pure bash can't do Ed25519, so we use python3 + cryptography (same pattern as tests/sec-l2b/integration.sh, already an established dependency).
- jq '.valid // empty' gotcha: jq's // operator falls through to its right-hand side when the left side is `false` or `null`, so `.valid // empty` turns a legitimate `false` into no output. .valid is therefore extracted without // empty. Learned from step 9 failing on the first run against a live broker.
Called by:
- scripts/gates.sh full (after broker is up via scripts/stack_up.sh)
- .github/workflows/ci.yml smoke-l2.5 job (Task 14)
Determinism notes: fresh Ed25519 keys and fresh nonces per run; this is unavoidable for challenge-response. The contract check (what the script verifies) is deterministic; only the wire values are not.
Verified: ran against a live Dockerized broker on localhost:8090: 10/10 PASS (agent registered, revoked, rejected, OOS denied 403).

Reads gate IDs from two sources and fails if they diverge:
A. scripts/gates.sh --list-gates (local source of truth)
B. .github/workflows/ci.yml GATE_LIST_START/END comment block (CI source of truth)
Prevents silent drift: a developer adding a gate locally but forgetting ci.yml will see this script fail and know to update both. Conversely, a CI-only gate addition forces a gates.sh update.
Runs as its own gate both locally (in 'full' mode once ci.yml exists) and in CI (as the gate-parity job in ci.yml, Task 14).
Currently exits 1 because .github/workflows/ci.yml doesn't exist yet; it will be created in Task 14 and immediately exercise this script for the first time.
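The parity check amounts to a two-way set difference. The sketch below shows that logic in Go; the real script is bash, and the comment-block layout assumed here (one "# gate-id" line per gate between the markers) is a guess, since the PR text does not show the block itself. The function names ciGates and missing are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// ciGates extracts gate IDs from a GATE_LIST_START/END comment block.
// The "# <gate-id>" line format is an assumption for this sketch.
func ciGates(workflow string) []string {
	var gates []string
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(workflow))
	for sc.Scan() {
		line := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(sc.Text()), "#"))
		switch {
		case line == "GATE_LIST_START":
			inBlock = true
		case line == "GATE_LIST_END":
			inBlock = false
		case inBlock && line != "":
			gates = append(gates, line)
		}
	}
	return gates
}

// missing returns the items of a that do not appear in b, sorted.
func missing(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, g := range b {
		seen[g] = true
	}
	var out []string
	for _, g := range a {
		if !seen[g] {
			out = append(out, g)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	local := []string{"build", "vet", "lint", "sbom"} // stand-in for --list-gates output
	workflow := "# GATE_LIST_START\n# build\n# vet\n# lint\n# GATE_LIST_END\n"
	ci := ciGates(workflow)
	fmt.Println("only local:", missing(local, ci))
	fmt.Println("only CI:", missing(ci, local))
}
```

A non-empty result in either direction is the drift condition that should fail the gate.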

syft 1.x renamed the 'packages' subcommand to 'scan'. Running 'syft packages' still works but prints a deprecation warning to stderr, which is noise in CI output. Verified: syft scan dir:. -o spdx-json=sbom.spdx.json --quiet produces an identical 27-package SPDX-2.3 SBOM to the old command. Also affects the anchore/sbom-action used in ci.yml (Task 14 will pin the action version that defaults to 'scan').

Covers tasks 1-9 of the M-sec plan: all local infrastructure changes from feature/ci-msec — the five new/modified configs (.gosec.yml, .golangci.yml, gates.sh, core-contract.sh, test-gate-parity.sh), the six lint-fix commits (keystore, heartbeat, aactl client, sql_store nosec, admin test doc, gofmt normalize), and the tech debt tracker entry for the stdlib CVE baseline. Phase B (GitHub Actions workflows) will add its own entry when those files land.

Three ecosystems, weekly (Monday 06:00 UTC):
- github-actions: rotates SHA-pinned workflow steps (Task 22 pins every action to a 40-char SHA; without Dependabot those go stale)
- gomod: direct and indirect Go module updates, grouped so PRs are reviewable as 'direct deps bump' vs 'indirect deps bump'
- docker: Dockerfile base image updates, kept ungrouped with a lower PR limit because base image bumps often need individual testing
All PRs get the 'dependencies' label plus an ecosystem-specific label for filtering. Commit prefix 'chore(deps)' matches the rest of the repo's conventional-commits style.

CODEOWNERS: Global wildcard pointing at @devonartis. Primarily serves as documentation and as the review-required set for branch protection (Task 27/29). Per Decision 014, external contributions aren't accepted, so CODEOWNERS is not a gatekeeping mechanism for PRs from outside; that job belongs to contribution-policy.yml.
MAINTAINERS: Allowlist consumed by .github/workflows/contribution-policy.yml (Task 18). Workflow reads this file via the GitHub API (not via checkout) and exempts listed users from the auto-close policy. Anyone not in this file, not a bot, and not a repo collaborator with write access gets their PR closed with a templated comment pointing to the issues-only policy.

Parallel per-gate jobs (GATE_LIST_START/END matches gates.sh): build, vet, lint, format, contamination, unit-tests, unit-tests-race, gosec, govulncheck, go-mod-verify, docker-build, smoke-l2.5, sbom.
PR-only jobs:
- dep-review: blocks on 'moderate' severity dep CVEs
- changelog: requires CHANGELOG.md diff, skippable via label
Always-on:
- gate-parity: runs scripts/test-gate-parity.sh
- gates-passed: aggregator job that branch protection will gate on; survives individual gate renames
smoke-l2.5 job details: Depends on docker-build so the image exists when the smoke runs. Installs python3 cryptography (required by the L2.5 script for the Ed25519 challenge-response). Runs scripts/stack_up.sh with the known test fixture AA_ADMIN_SECRET, waits up to 30s for /v1/health, runs scripts/smoke/core-contract.sh, tears down with stack_down.sh in always() so the broker doesn't linger on failures.
gosec job: Uses securego/gosec@master with the documented exclusions (G117,G304,G101) and severity=medium, matching scripts/gates.sh.
Triggers: pull_request and push to develop/main.
Concurrency: cancel-in-progress per ref so superseded runs don't pile up on a busy branch.
All action refs use tags at plan time; Task 22 replaces every @v<N> with a 40-char SHA + version comment before first push.

Runs on PR and push to develop/main, plus weekly scheduled scan (Monday 07:31 UTC — off-peak, off-round-minute so we don't join the thundering herd on :00). Query suites: security-extended + security-and-quality. These are stricter than the default 'security' suite but appropriate for a security product. Results populate the repo's Security tab (via security-events write permission) and the CodeQL badge (added in Task 30). Will be listed as a required status check on branch protection alongside gates-passed.

Runs on push to main, weekly schedule (Tuesday 03:25 UTC, staggered from CodeQL to avoid compounding load), and when branch protection rules change.
Publishes results to:
- OpenSSF Scorecard badge (added in Task 30)
- SARIF uploaded to the repo's Security tab
- 5-day artifact retention for audit trail
persist-credentials: false on checkout so the workflow can't accidentally push. publish_results: true is required for the public badge to update.
Informational only, NOT a required check. Scorecard's signal value shows up once the repo flips public, because it scans branch protection, code review practices, and publishing hygiene from an outside-in perspective.

Runs ./scripts/gates.sh regression nightly at 05:17 UTC against develop (off-peak, off-round-minute). Also triggerable via workflow_dispatch for ad-hoc catches.
Failure handling:
- continue-on-error on the test step so we can upload evidence before failing the workflow
- tests/**/evidence/ uploaded as 14-day artifact
- actions/github-script opens an issue tagged 'regression/nightly/needs-triage' with commit, branch, run URL, and a triage checklist, so maintainers see it without watching the Actions tab
- Final 'exit 1' step turns the workflow red after evidence capture
Informational gate per Decision 015: does NOT block in-flight PRs. The 24-hour lag is acceptable because L2.5 core contract smoke catches the headline regressions on every PR. Nightly is for the long-tail acceptance stories.

Auto-closes PRs from non-maintainers with a templated comment pointing to the issues-only contribution policy (Decision 014).
Exemption tiers (checked in order):
1. Bots: dependabot, github-actions, renovate
2. Repo collaborators with admin/maintain/write access
3. Users listed in .github/MAINTAINERS (read via API, not checkout)
Non-exempt authors get:
- A policy comment explaining why the PR is closed, with links to the issues-only policy, bug-report template, and SECURITY.md
- The PR state set to 'closed'
SECURITY NOTE (critical): this workflow uses pull_request_target, which runs in the BASE branch context with write permissions (required to close PRs). It MUST NEVER check out the PR branch; doing so is the 'pwn-request' attack class where untrusted PR code runs with write tokens. The workflow only reads PR metadata via the GitHub API and fetches MAINTAINERS from the BASE ref (not the PR ref) so a PR can't alter its own allowlist. Entire threat model documented inline at the top of the file so future editors have a reason to pause before adding a checkout step.
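The ordered exemption check can be sketched as a small decision function. The workflow itself runs in GitHub Actions and reads this data from the GitHub API; the prAuthor type, its fields, and the exemptReason function below are hypothetical and only illustrate the tier ordering described above.

```go
package main

import "fmt"

// prAuthor is a hypothetical view of the PR metadata the workflow reads.
type prAuthor struct {
	Login      string
	IsBot      bool   // assumed to be derived from the author's account type
	Permission string // "admin", "maintain", "write", "read", or ""
}

// exemptReason walks the tiers in order and returns the first match,
// or "" when the author is not exempt and the PR should be closed.
func exemptReason(a prAuthor, maintainers map[string]bool) string {
	switch {
	case a.IsBot:
		return "bot"
	case a.Permission == "admin" || a.Permission == "maintain" || a.Permission == "write":
		return "collaborator"
	case maintainers[a.Login]:
		return "maintainer"
	}
	return ""
}

func main() {
	// The allowlist would come from .github/MAINTAINERS on the BASE ref.
	maintainers := map[string]bool{"devonartis": true}
	authors := []prAuthor{
		{Login: "dependabot[bot]", IsBot: true},
		{Login: "devonartis"},
		{Login: "drive-by-contributor", Permission: "read"},
	}
	for _, a := range authors {
		fmt.Printf("%s -> %q\n", a.Login, exemptReason(a, maintainers))
	}
}
```

Because the allowlist is passed in rather than read from the PR's own tree, the sketch mirrors the property that a PR cannot alter the list it is judged against.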

actionlint caught the issue: GitHub Actions job IDs must start with a letter or underscore and contain only alphanumerics, -, or _. The '.' in 'smoke-l2.5' made it invalid, so CI would have rejected the workflow on first push.
Renamed to 'smoke-l25' in five places (kept in sync):
- .github/workflows/ci.yml job ID
- .github/workflows/ci.yml gates-passed needs list
- .github/workflows/ci.yml GATE_LIST_START/END block
- scripts/gates.sh GATES_FULL array
- scripts/gates.sh smoke-l25 run_gate invocation
Verified: actionlint exits 0 on all workflows, scripts/test-gate-parity.sh passes (13 gates match between gates.sh and ci.yml).
The 'L2.5' name is a test-taxonomy reference (unit L1 / component L2 / integration L2.5 / full E2E L3 etc.). The documentation keeps using 'L2.5', but the machine-readable identifier drops the period.
First lesson of the actionlint gate: yes, we needed it.
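The job ID rule quoted above translates directly into a regular expression. The check below is a sketch of that rule only; it is not how actionlint is implemented, and the names jobIDPattern and validJobID are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// jobIDPattern encodes the rule from the commit message: start with a letter
// or underscore, then only alphanumerics, '-', or '_'.
var jobIDPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_-]*$`)

// validJobID reports whether id satisfies the job ID rule.
func validJobID(id string) bool {
	return jobIDPattern.MatchString(id)
}

func main() {
	for _, id := range []string{"smoke-l2.5", "smoke-l25"} {
		fmt.Printf("%-12s valid=%v\n", id, validJobID(id))
	}
}
```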

Both files are generated by ./scripts/gates.sh full:
- coverage.out: go test -race -coverprofile=coverage.out
- sbom.spdx.json: syft scan dir:. -o spdx-json=sbom.spdx.json
They change on every run and should never be committed. Seen as untracked after the first local 'gates.sh full' invocation on feature/ci-msec.

Every 'uses:' across the 5 workflow files now references a 40-character commit SHA with an inline version comment. Dependabot rotates these weekly per .github/dependabot.yml; the SHA pin plus managed rotation is the recommended discipline for security-adjacent repos (per Obsidian KB Decision 015).
Also bumped several actions past the plan's placeholder versions to current stable. The plan was written against v4/v5/v6 refs, but most of these are now at higher majors:
- actions/checkout v4 → v6.0.2
- actions/setup-go v5 → v6.4.0
- actions/upload-artifact v4 → v7.0.0
- actions/github-script v7 → v9.0.0
- actions/dependency-review v4 → v4.9.0
- golangci/golangci-lint-action v6 → v9.2.0
- codecov/codecov-action v4 → v6.0.0
- securego/gosec master → v2.25.0
- ossf/scorecard-action v2 → v2.4.3
- anchore/sbom-action v0 → v0.24.0
- github/codeql-action v3 → codeql-bundle-v2.25.1
SHAs resolved via `gh api repos/<owner>/<repo>/releases/latest` and dereferenced through git/refs/tags and git/tags (for annotated tags).
securego/gosec@master was replaced with v2.25.0; pinning @master was a documented temporary per the original plan.
Verified: actionlint exits 0 on all 5 workflows post-pin. test-gate-parity still passes (13 gates).

Resolves TD-VUL-001..004 (all 4 Go stdlib CVEs flagged by the baseline govulncheck run):
- TD-VUL-001 GO-2026-4947 crypto/x509
- TD-VUL-002 GO-2026-4946 crypto/x509
- TD-VUL-003 GO-2026-4870 crypto/tls (TLS 1.3 KeyUpdate DoS)
- TD-VUL-004 GO-2026-4601 net/url (IPv6 host literal)
One-line bump to go.mod's toolchain directive. No dependency changes (go.sum untouched by 'go mod tidy').
Landing this immediately before the first CI push so:
- govulncheck gate on feature/ci-msec goes green from the first CI run instead of failing-then-fixing-then-passing
- Dependabot's first rotation doesn't open a competing PR bumping the toolchain
Verification:
- go build ./cmd/broker ./cmd/aactl: OK
- go test -short ./...: 15/15 packages PASS
- govulncheck ./...: 'No vulnerabilities found'
Also expected to resolve the 'go1.25.7 vs go1.25.4' compile error seen on the unit-tests-race gate during local gates.sh full runs (the standalone go1.25.7 binary and the go tool's embedded version diverged). Awaiting race test result.

First CI run on PR #3 flagged two failures:
1. lint (exit 3): golangci-lint-action@v9 requires golangci-lint v2 with a new config schema. Our .golangci.yml is v1 format. Pinned back to v6.5.2 (last major compatible with v1 configs). When we migrate the config to v2, we can bump the action pin again.
2. dep-review (action step failure): actions/dependency-review-action requires GitHub Advanced Security on private repos, which this repo does not have. Removed the dep-review job with a comment explaining the re-enable conditions. Tracked as TD-VUL-005 in TECH-DEBT.md; revert the comment when the repo flips public (Phase 4) or GHAS is purchased.
Remaining security coverage is still strong: govulncheck (stdlib + Go module CVEs), gosec (application-layer static analysis), contamination grep (enterprise refs), CodeQL (SAST, separate workflow). Only the license/metadata coverage from dep-review is temporarily out.
Verified: actionlint exit 0. All 14 remaining gates (13 matrix + analyze) expected to pass next run.

Second CI run still failed on lint even after pinning the action back to v6.5.2. Root cause: `version: latest` in the job spec tells golangci-lint-action v6 to install whatever 'latest' resolves to, which is now golangci-lint v2.x. The v2 binary rejects our v1 config schema. Fix: pin version to v1.64.8 — the exact version the local developer gates run against (brew-installed golangci-lint on macOS, matching the gates.sh reference install command). This is now the only place outside gates.sh that references a golangci-lint version; when we migrate the config to v2 we bump both together. Verified: actionlint exit 0.

Both workflows require GitHub Code Scanning, which is a GHAS feature on private repos. devonartis/agentauth is private without GHAS, so:
- codeql-action/analyze fails uploading SARIF → Security tab
- scorecard-action fails upload-sarif (same underlying endpoint)
Instead of deleting either workflow, both are parked on `workflow_dispatch` only. The job logic, SHA pins, and comment history are preserved so re-enabling is a one-line trigger swap when the repo flips public (Phase 4 of release strategy).
The file header in each explains:
- why it's disabled
- the re-enable conditions
- the tech debt cross-reference
TD-VUL-005/006 consolidated in TECH-DEBT.md with a single fix sequence for all three GHAS-gated workflows (dep-review + codeql + scorecard).
Remaining security coverage while these are parked: govulncheck, gosec, contamination, all still blocking in ci.yml.
Verified: actionlint exit 0.

Third lint failure on PR #3 made the root cause clear: the golangci/golangci-lint-action v6.x ships pre-built binaries compiled against Go ≤1.23. Our go.mod has 'toolchain go1.25.9' (required to fix the stdlib CVEs — TD-VUL-001..004). golangci-lint v1.64.8 built against 1.23 crashes with exit 3 on the SSA pass when it tries to parse code compiled by 1.25. Local developers don't see this because brew-installed golangci-lint is built with whatever Go the homebrew bottle was compiled against (currently 1.25.7 in Cellar). Fix: drop the action entirely. 'go install golangci-lint@v1.64.8' on the CI runner compiles the linter with the runner's Go — which is whatever actions/setup-go@v6 resolved from go.mod (1.25.9). The CI now matches local. This is actually the approach golangci-lint's own docs recommend when the pre-built action's Go version is behind: just install via 'go install' from the commit step. The action is a convenience layer that falls over when your toolchain is ahead of its bundle. Migration to golangci-lint v2 is still planned — v2's action bundles a newer Go — but not in this cycle.

devonartis added 30 commits April 10, 2026 03:51

devonartis added 2 commits April 10, 2026 05:08

devonartis merged commit e560301 into develop Apr 10, 2026
16 checks passed

devonartis deleted the feature/ci-msec branch April 10, 2026 09:33

devonartis merged 32 commits into develop from feature/ci-msec
