## Problem
When agents (Claude, Copilot, etc.) are asked to add tests or validation for a change, they default to patterns that already exist in the repo:
- A Cypress spec gets added to `cypress/e2e/content/` because that's where other specs live
- A Vale rule gets added because Vale is the existing lint layer
- A pre-commit hook gets added because Lefthook is already wired up
This produces local-only validation that runs on developer machines and blocks commits, but doesn't run as an independent PR gate. When the test is wrong, flaky, or skipped (`--no-verify`), regressions ship.
## Recent example
PR for #7089 (Enterprise feedback button). The original fix targeted the wrong variable. A rendered-HTML visual check caught it — but only because a human spot-checked the PR preview. No automated gate existed that would have caught the same class of bug in CI.
## Goals
Agents should design verification for:
- PR gates, not just local hooks — treat CI as the primary gate, local hooks as the early-warning layer. Local checks can be skipped (`--no-verify`); CI cannot.
- Early-warning signals in the dev loop — local hooks should give fast feedback, but never be the sole line of defense.
- Continuous improvement — when a bug class is discovered, the agent should propose a test that prevents the bug class, not just the specific bug. Structural refactors (e.g., data-driven config replacing conditional logic) should be preferred when they make the bug impossible.
- Minimum-viable verification — pick the cheapest reliable test for the assertion. Static grep for static HTML. Cypress for interactive JS. Unit tests for pure functions. Don't reach for the heavy tool out of habit.
- Awareness of CI infrastructure — agents should know which CI system runs (`.github/workflows` vs `.circleci/config.yml`), which jobs already exist, and which workflow is the right home for a new check. They should prefer extending an existing workflow over creating a new one.
## Proposed skill additions
Create `.claude/skills/ci-verification-design/SKILL.md` covering:
### Inventory step
Before adding any test, the agent must list existing CI workflows in `.github/workflows/` and identify which ones run on PRs. Reference existing jobs as templates.
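A minimal sketch of the inventory step (the fixture files below stand in for a real repo; in practice the commands run from the repo root against the actual `.github/workflows/` directory):

```shell
# Self-contained demo: build a throwaway repo layout so the sketch runs anywhere.
repo=$(mktemp -d)
mkdir -p "$repo/.github/workflows"

# Stand-in workflows: one PR-triggered, one push-only.
printf 'on:\n  pull_request:\n' > "$repo/.github/workflows/lint.yml"
printf 'on:\n  push:\n'         > "$repo/.github/workflows/deploy.yml"

# Inventory: list every workflow, then narrow to the ones that gate PRs.
ls "$repo/.github/workflows"
pr_gates=$(grep -l 'pull_request' "$repo/.github/workflows"/*.yml)
echo "PR gates: $pr_gates"
```

The same two commands (`ls` plus `grep -l 'pull_request'`) are enough for an agent to decide whether a new check has an existing PR-triggered home.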
### Test pyramid decision tree
- Does the behavior involve JS execution or user interaction? → Cypress/Playwright
- Does the behavior involve rendered HTML from static inputs? → Node script + `rg`/`grep` on `public/`
- Does the behavior involve pure data transformations or validation? → Unit test or JSON Schema validation
- Does the behavior involve the build process itself? → Shell script invoked by the build job
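For the static-HTML branch of the tree, the check can be as small as a grep over the rendered output. A sketch, with an invented selector and a fixture directory standing in for a real `public/` build:

```shell
# Self-contained demo: fake a rendered site, then run the check against it.
site=$(mktemp -d)
mkdir -p "$site/public/enterprise"
echo '<button data-feedback="enterprise">Feedback</button>' \
  > "$site/public/enterprise/index.html"

# The gate itself: fail fast with a clear message if the element is missing.
if grep -rq 'data-feedback="enterprise"' "$site/public/"; then
  echo "OK: feedback button present in rendered HTML"
else
  echo "FAIL: feedback button missing from rendered HTML" >&2
  exit 1
fi
```

This is the cheapest reliable test for a bug like the one in #7089: no browser, no JS runtime, just an assertion on the built artifact.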
### CI gating checklist
- Does this run on every PR that touches the relevant files?
- Does it fail-fast with a clear error message?
- Is it annotated with `actions/github-script` or equivalent for inline PR annotations?
- Does it have a local equivalent (Lefthook hook) for dev-loop speed?
- Is the CI job named clearly so reviewers understand what it protects?
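Put together, a checklist-conforming gate might look like the sketch below. The workflow name, path filters, and script path are illustrative, not taken from the repo:

```yaml
# .github/workflows/verify-feedback-button.yml (hypothetical)
name: Verify feedback button in rendered HTML   # clearly named for reviewers
on:
  pull_request:
    paths:                      # runs on every PR touching the relevant files
      - "layouts/**"
      - "assets/js/**"
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build site
        run: npm run build      # assumes an npm build; substitute the real build command
      - name: Check rendered output
        run: ./scripts/check-feedback-button.sh   # fails fast with a clear message
```

The same script invoked in the last step can be wired into Lefthook for the local, early-warning equivalent.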
### Refactor-before-test principle
If the bug was caused by a conditional or string-munge pattern, propose a data-driven refactor first. Tests should protect against regression of the refactored architecture, not the broken original.
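One illustration of the data-driven shape (file names, products, and CSV format are all invented for the example): instead of per-product conditionals, drive the check from a table, so adding a product means adding a row, not a branch.

```shell
# Self-contained demo of a data-driven check.
work=$(mktemp -d)
mkdir -p "$work/public/enterprise" "$work/public/cloud"
echo '<button data-feedback="enterprise">' > "$work/public/enterprise/index.html"
echo '<button data-feedback="cloud">'      > "$work/public/cloud/index.html"

# The data: one row per product, replacing one conditional per product.
cat > "$work/expectations.csv" <<'EOF'
enterprise,data-feedback="enterprise"
cloud,data-feedback="cloud"
EOF

# The check: a single loop over the data.
status=0
while IFS=, read -r product selector; do
  grep -q "$selector" "$work/public/$product/index.html" \
    || { echo "FAIL: $product missing $selector" >&2; status=1; }
done < "$work/expectations.csv"
echo "data-driven check status: $status"
```

A regression test against this architecture protects the table-plus-loop shape; targeting the wrong variable in one product's branch is no longer a representable bug.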
## Proposed workflow additions
- Update `.github/copilot-instructions.md` / `CLAUDE.md`: when a task adds verification logic, agents must document both local and CI enforcement paths in the PR description.
- New PR checklist item in `.github/pull_request_template.md`: "Verification added (CI gate / local hook / manual check) — which?"
- Audit existing checks: identify which Lefthook-only checks could or should be promoted to PR gates (e.g., Vale already runs in CI; do Prettier, ShellCheck, and the product config schema validation?)
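The audit can start as a quick loop over the tools in question (the tool names are the ones mentioned above; the fixture directory stands in for the real repo, where the grep would run against the actual workflows):

```shell
# Self-contained demo: a fake workflows dir where only vale runs in CI.
audit=$(mktemp -d)
mkdir -p "$audit/.github/workflows"
echo 'run: vale sync && vale content/' > "$audit/.github/workflows/lint.yml"

# For each locally-enforced tool, check whether any CI workflow mentions it.
for tool in vale prettier shellcheck; do
  if grep -rq "$tool" "$audit/.github/workflows/"; then
    echo "$tool: already a PR gate"
  else
    echo "$tool: Lefthook-only — candidate for promotion"
  fi
done
```

A grep hit is only a first-pass signal (the mention could be in a non-PR workflow), but it narrows the audit to the checks worth inspecting by hand.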
## Acceptance criteria
- `.claude/skills/ci-verification-design/SKILL.md` created with decision tree and inventory guidance
- `CLAUDE.md`/`AGENTS.md` references the new skill
## Related