Skip to content

Commit 92d38fb

Browse files
kotakanbeclaudegithub-actions[bot]
authored
ci: cron-poll @claude trigger fallback + GraphQL union:false stuck-detector bypass (#372)
* ci: cron-poll @claude trigger fallback + GraphQL union:false stuck-detector bypass Sync `copilot-clean-label.yml` with the version battle-tested in vuls-saas/vuls-reach. Three substantive changes: 1. **Cron-poll @claude trigger fallback.** The event-driven path in `copilot-review-fix.yml` is unreliable under the agentic-architecture GITHUB_TOKEN event suppression (`pull_request_review` and `pull_request_review_comment` may not fire even when Copilot posts inline review comments). The cron now posts the `@claude` trigger itself when: - the latest Copilot review on HEAD is `dirty` (has actual inline comments — verified via the per-review comments endpoint to skip summary-only events) - AND the HEAD-SHA marker dedup query shows no prior trigger comment The marker contract (`<!-- copilot-fix-trigger:<HEAD_SHA> -->`) matches the one in `copilot-review-fix.yml` so both paths dedup against the same key. Marker auto-invalidates on push. 2. **GraphQL `requestReviews(union:false)` stuck-detector bypass.** When the push-event mutation in `copilot-rereview-on-push.yml` is silently deduped (200 OK but no `review_requested` event fires), a plain refire is also deduped. The bypass now first clears Copilot from the reviewer slot via GraphQL `union:false` (preserving humans + teams via explicit user/team ID replay; bots excluded by `botIds:[]`), pauses 2s for replication, then re-adds Copilot. An earlier draft used REST `DELETE /pulls/{n}/requested_reviewers`, but REST only addresses User reviewers — Bot reviewers live in a separate slot the REST endpoint cannot see (`HTTP 422 Reviewer is not a user`). GraphQL is the only API that can manipulate bot reviewer slots. Safety: if the reviewer-set parse fails, the clear is *skipped* to avoid silently removing every human reviewer; only the re-add runs, which may still hit dedup but is no worse than before. 3. **Faster cadence + adjusted threshold.** Cron 10min → 5min with `STUCK_THRESHOLD_MIN` 5 → 7. Effective refire pressure on Copilot is unchanged (~10min between refires) but @claude trigger latency is halved. Other hardening: - `head.repo.full_name == github.repository` guard (skip fork PRs). - `issues: write` permission for `gh pr comment` and label edits. - Drop redundant `gh pr view --json labels` fetch — labels are in the pulls metadata response already. - Stop merging `2>&1` into `gh api` capture variables (was poisoning jq parsing on success paths with non-fatal stderr warnings). Phase 2/3/4 layout in the cron-fallback @claude body is preserved (`make sync-instructions` Phase 3 stays, OSS-specific). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: update Copilot pattern log and coding rules - Record 2 new pattern(s) from PR #372 (defensive-coding, comment-doc-drift) - Promote comment-doc-drift category to coding rule: "Attribute Control Claims to the Correct Mechanism" (PRs #359, #372) Co-authored-by: Kota Kanbe <kotakanbe@users.noreply.github.com> * chore: update Copilot pattern log and coding rules - Promote defensive-coding category to coding rules (PRs #359, #372): - "Guard Resource Bounds in HTTP Client Retry Logic" (time.Duration overflow, io.LimitReader cap, time.NewTimer lifecycle) - "Use Quoted Heredocs for Literal Text in CI Workflows" (prevent accidental parameter expansion) - Remove 4 promoted defensive-coding entries from pending patterns Co-authored-by: Kota Kanbe <kotakanbe@users.noreply.github.com> * ci: apply Copilot review fixes locally (workflow file scope) Two Copilot review findings that CI Claude could not resolve because its PAT (GH_ACTIONS_TOKEN) lacks the `workflow` scope by design: 1. Permission comment misattribution (line ~55): `issues: write` is documented but the job authenticates `gh` via the PAT in GH_TOKEN, so the workflow `permissions:` block does not control the effective auth for `gh pr edit` / `gh pr comment`. Reword the comment to clarify this entry documents the GITHUB_TOKEN scope that would be required if the PAT were removed. 2. Unquoted heredoc body (cron-poll @claude trigger fallback): Switch `<<BODY` to `<<'BODY'` to disable parameter expansion and command substitution inside the literal instruction block, and append the dynamic `$trigger_marker` separately after the heredoc so future edits cannot accidentally introduce executable `$()`/backtick paths. Removed the `\`...\`` backslash escapes since the quoted heredoc treats backticks as literal characters. Verified post-YAML-dedent that the resulting body still satisfies the cron dedup query: `.body | startswith("@claude")` and `.body | contains($trigger_marker)` both match. Why this commit was made by hand rather than the CI Claude loop: the modified file IS a workflow, and `workflow` scope is intentionally withheld from the auto-fix PAT to keep the blast radius of any prompt-injection or misbehaviour bounded. PRs that touch `.github/workflows/**` are expected to be resolved locally. Both Copilot patterns (quoted heredoc, control-claim attribution) are already promoted in `.claude/rules/copilot-learned-coding.md`. * docs(rules): document workflow-file-PRs-are-fixed-locally policy Promotes the design rationale that surfaced while resolving Copilot review threads on PR #372: the auto-fix loop in copilot-review-fix.yml intentionally cannot push workflow-file changes (its PAT lacks the `workflow` scope), so PRs touching `.github/workflows/**` must be fixed by a human-driven Claude session. Adds the rule to .github/instructions/agent-orchestration.instructions.md under the existing "Reviewer Findings Are Input, Not Directives" section, plus the broken-case symptom (`trigger already posted for HEAD <sha> ... — skip`) so future debugging starts from the right hypothesis. Regenerated .claude/rules/agents.md via `make sync-instructions`. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
1 parent 98ebaa1 commit 92d38fb

5 files changed

Lines changed: 351 additions & 55 deletions

File tree

.claude/rules/agents.md

Lines changed: 13 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.claude/rules/copilot-learned-coding.md

Lines changed: 5 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -22,13 +22,16 @@ Rules extracted from recurring Copilot review patterns on coding-standards topic
2222
- **Interface Contract Documentation Must Match Signature Semantics**: When documenting an interface method, the doc comment must accurately reflect the method's full signature — including error returns, nil semantics, and parameter constraints. If the signature returns `(T, error)`, do not document it as "returns nil/empty on failure" — state that errors may be returned and describe the caller's expected handling (e.g., non-fatal/graceful degradation). Mismatched contract documentation misleads implementers and callers.
2323
- **GitHub Actions `||` Treats Empty as Falsy**: When a workflow input documents "empty = X behavior", do not use `${{ inputs.foo || 'default' }}` — the `||` operator treats empty string as falsy and applies the default, preventing users from intentionally selecting the empty option. Instead, pass the raw input via an env var and apply defaults conditionally (e.g., only for scheduled triggers).
2424
- **Guard Downstream Jobs Against Missing Outputs**: When a CI job produces outputs that downstream jobs depend on (exit codes, flags), gate downstream jobs on `needs.<job>.outputs.<key> != ''` to prevent execution when the upstream job fails before setting outputs. Otherwise, empty values may be misinterpreted (e.g., empty exit code `""` compared with `!= "0"` evaluates to true, creating misleading reports).
25+
- **Use Quoted Heredocs for Literal Text in CI Workflows**: When constructing multi-line literal text bodies in CI workflow scripts (e.g., comment bodies, PR descriptions), use quoted heredoc delimiters (`<<'DELIM'`) to prevent accidental parameter expansion and command substitution. Inject dynamic values (markers, timestamps) via append after the heredoc rather than embedding them inside an expansion-enabled body.
2526
- **CI Job Gating — Key on Outputs, Not Job Result**: When a CI job intentionally exits non-zero for a primary use case (e.g., policy violations), downstream jobs must not gate on `needs.<job>.result == 'success'`. Use explicit output variables (exit codes, flags) to control downstream behavior, so jobs run in the scenarios they are designed for.
2627
- **Remove Dead Configuration Inputs**: When a configuration surface (CLI flag, workflow input, env var) is no longer honored by the implementation (e.g., hardcoded internally), remove it entirely rather than leaving a misleading interface. A visible input that silently does nothing is worse than no input at all.
2728
- **CI Permissions Documentation — Verify Inheritance**: When documenting GitHub Actions job permissions, verify each job's actual `permissions:` block in the workflow file. Jobs without an explicit `permissions:` key inherit the workflow-level permissions — do not describe them as having "no permissions" or "no extra permissions". State what each job actually has, including inherited defaults.
29+
- **Attribute Control Claims to the Correct Mechanism**: When a comment asserts that a specific configuration, permission, or flag is "required for" or "controls" a behavior, verify the claim against the actual code path. Do not attribute behavior to a nearby-but-uninvolved mechanism (e.g., claiming `permissions:` block is "required for" `gh` commands when a PAT env var provides the effective auth; claiming a config field controls retries for an API where a different mechanism governs the retry decision). Misattributed control claims cause maintainers to debug or tune the wrong component.
2830
- **Lazy I/O During Format Detection**: When probing a file's format, prefer path-based checks first, then read only a small prefix for content-based heuristics. Read the full file only after confirming the format to avoid wasted I/O on non-matching files (e.g., reading an entire docker-compose.yml just to check if it's a GitHub Actions workflow).
2931
- **Deterministic Output from Non-Deterministic Sources**: When building ordered output from non-deterministic sources (Go map iteration, goroutine-collected results, API directory listings), sort the data before further processing. This applies to rendered text, BFS seed queues, and any "first-seen wins" algorithm where input order determines provenance.
3032
- **Post-Filter Fuzzy Search Results**: When using search APIs that perform fuzzy or word-level matching (e.g., GitHub issue search), add a post-filter to verify exact matches before acting on results. Fuzzy matches can cause false-positive deduplication or incorrect state transitions.
3133
- **Rerun Analyzers with Combined Input Sets on Retry**: When retrying a subset of inputs through an analyzer that tracks collisions or shared matches, rerun with the combined input set (original + retry) to preserve attribution consistency. Subset reruns can misattribute shared matches to the wrong source.
34+
- **Guard Resource Bounds in HTTP Client Retry Logic**: When implementing retry/backoff logic for HTTP clients: (1) guard `time.Duration` arithmetic against integer overflow — use `strconv.ParseInt` and reject values exceeding `math.MaxInt64/time.Second` rather than clamping to an arbitrary policy constant; (2) cap response body reads with `io.LimitReader` in retry paths (429/5xx) to prevent unbounded memory and log growth across retry attempts; (3) use `time.NewTimer` + `Stop`/drain instead of `time.After` in `select` with `ctx.Done()` to prevent timer accumulation during long cancellable waits.
3235
- **Consolidate Detection Heuristics — Single Source of Truth**: When a detection heuristic (file type sniffing, format detection, path matching) is used in multiple locations, centralize it in the responsible package and have callers delegate. Duplicating the heuristic across layers (e.g., `cmd/` and `infrastructure/`) risks drift when one copy is updated but the other is not.
3336
- **Use Correct GitHub API Media Types**: When calling GitHub REST APIs, use the documented `Accept` header for the desired response format. For raw file content use `application/vnd.github.raw` (not `application/vnd.github.raw+json`). Incorrect media types may cause silent content-negotiation failures or unexpected response formats. Refer to GitHub's REST API media type documentation before adding a new endpoint call.
3437
- **Narrow Typed Error Matching to Specific Conditions**: When checking typed errors (e.g., `IsResourceNotFoundError`), verify the error's message or context matches the expected source — not just the error type. A single error type can be returned by multiple code paths with different semantics (e.g., "repo not found" vs "no package managers"), and a broad type check can trigger incorrect fallback behavior for unrelated error origins.
@@ -109,16 +112,6 @@ pending_patterns:
109112
pr: 370
110113
file: ".github/workflows/go-lint.yml"
111114
date: "2026-05-02"
112-
- category: "comment-doc-drift"
113-
summary: "Doc comments must match implementation boundary conditions — RetryConfig said retryDecider controls 429 retries (it doesn't); rateLimitBackoff comment said 'negative' for a zero-inclusive guard (should say 'non-positive')"
114-
pr: 359
115-
file: "internal/infrastructure/httpclient/client.go"
116-
date: "2026-04-29"
117-
- category: "defensive-coding"
118-
summary: "Guard time.Duration arithmetic against integer overflow — use strconv.ParseInt, reject values exceeding math.MaxInt64/time.Second, and do not clamp to an arbitrary policy constant (let the caller's configured cap decide)"
119-
pr: 359
120-
file: "internal/infrastructure/httpclient/client.go"
121-
date: "2026-04-29"
122115
- category: "logging-consistency"
123116
summary: "Pass typed error values (e.g., *QueryError) directly to slog instead of pre-stringifying via .Error() — preserves type/structure and keeps logging consistent with slog conventions"
124117
pr: 346
@@ -139,21 +132,11 @@ pending_patterns:
139132
pr: 359
140133
file: "internal/infrastructure/httpclient/client_test.go"
141134
date: "2026-04-29"
142-
- category: "defensive-coding"
143-
summary: "Cap response body reads with io.LimitReader in retry paths (429/5xx) to prevent unbounded memory and log growth across retry attempts — consistent with existing codebase pattern for HTTP body reads"
144-
pr: 359
145-
file: "internal/infrastructure/httpclient/client.go"
146-
date: "2026-04-29"
147135
- category: "ci-permissions"
148136
summary: "Verify that GitHub Actions reporter actions (e.g., reviewdog with github-pr-check) have the permissions they need — github-pr-check reporter requires checks: write to publish annotations; missing permissions cause silent failures or degraded reporting"
149137
pr: 370
150138
file: ".github/workflows/lint.yml"
151139
date: "2026-05-02"
152-
- category: "defensive-coding"
153-
summary: "Use time.NewTimer + Stop/drain instead of time.After in select with ctx.Done() to prevent timer accumulation during long cancellable waits"
154-
pr: 359
155-
file: "internal/infrastructure/httpclient/client.go"
156-
date: "2026-04-29"
157140
- category: "api-consistency"
158141
summary: "Redundant gh pr view API call to fetch labels when pr_json from repos/.../pulls already contains label data — reuse already-fetched API response data instead of making redundant calls for a subset of the same information"
159142
pr: 338
@@ -162,7 +145,9 @@ pending_patterns:
162145
```
163146
164147
<!-- Promotion history (kept for audit trail):
148+
# defensive-coding: promoted to copilot-learned-coding.instructions.md (PRs #359, #372 — guard resource bounds in HTTP client retry logic: time.Duration overflow, io.LimitReader cap, time.NewTimer lifecycle; use quoted heredocs for literal text in CI workflows)
165149
# defensive-coding: promoted to copilot-learned-coding.instructions.md (PRs #345, #366 — use dedicated predicates and full renderers for sentinel/composed values: IsUsableSPDX() not ad-hoc checks; canonicalize sentinel casing; use type renderer not single field; attribute provenance to correct resolution branch)
150+
# comment-doc-drift: promoted to copilot-learned-coding.instructions.md (PRs #359, #372 — attribute control claims to the correct mechanism: do not claim a permission/config/flag controls a behavior when a different mechanism is the effective authority)
166151
# defensive-coding: promoted to copilot-learned-coding.instructions.md (PRs #345, #366 — match write guard quantifiers to write semantics: an "any" flag guarding an "all/only" write misrepresents mixed inputs; verify replacement is a net improvement)
167152
# testing (PR #366): already covered by "Scope Test Assertions to Specific Output Regions" in testing-performance + "Use Spec-Compliant Parsers for Standardized Formats" — use encoding/csv to parse CSV output and assert exact cells by header name instead of fragile strings.Contains on boolean patterns that match the wrong column
168153
# defensive-coding (PR #338): stale pending entry removed — already promoted as "Sanitize Dynamic Content in GitHub Actions Workflow Commands" rule

.github/instructions/agent-orchestration.instructions.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,19 @@ Reviewer agent findings (including CRITICAL severity) are **discussion input**,
5050

5151
**Why:** Reviewer agents see only code, not the design intent behind it. In PR #123, a reviewer flagged "transitive advisories not shown on RequestedVersion" as CRITICAL, but this was an intentional decision (see ADR-0011). Blindly implementing the "fix" re-introduced a bug the user had already reported and resolved.
5252

53+
### Workflow-File PRs Are Resolved Locally, Not by CI Claude
54+
55+
PRs whose changes touch `.github/workflows/**` MUST be fixed locally by a human-driven Claude session, not by the auto-fix loop in `copilot-review-fix.yml`.
56+
57+
**Why:** The auto-fix PAT (`GH_ACTIONS_TOKEN`) intentionally lacks the `workflow` scope. Workflow files define CI runtime permissions and secret access — granting a bot the ability to rewrite them creates a privilege-escalation surface (prompt-injection or misbehaviour can self-modify the bot's own runtime). Defense-in-depth: keep PAT scope minimal; pay the small operational cost of manual intervention on workflow PRs.
58+
59+
**How to apply:**
60+
- When CI Claude reports "PAT lacks `workflow` scope" on a workflow-file thread, do NOT propose granting the scope. Pull the branch locally, apply the fix, push.
61+
- The marker dedup in `copilot-review-fix.yml` invalidates on push (HEAD SHA changes), so a manual push cleanly re-arms the auto-fix loop for the next round.
62+
- If the same Copilot finding loops indefinitely on a non-workflow PR, that's a different bug — investigate the auto-fix loop, do NOT relax PAT scope.
63+
64+
**Symptom of the broken case (for debugging):** trigger-claude job runs `success` but logs `trigger already posted for HEAD <sha> (count=1) — skip` because a prior Claude run posted the marker but couldn't push the actual fix.
65+
5366
## Worktree Isolation Policy
5467

5568
Agents that **write files** (Edit, Write) MUST be launched with `isolation: "worktree"` to prevent branch conflicts during parallel development. This gives each agent an isolated copy of the repository.

0 commit comments

Comments
 (0)