Conversation
…view-fix.yml Two issues caught by Copilot review on the parallel vuls-reach back-port (vuls-saas/vuls-reach#22), addressed there in 821e3da. The same bugs exist verbatim in the OSS copy of `copilot-review-fix.yml` (workflow files were copy-portable across catalog/oss/vuls-reach in the recent unification). Mirror the fix here so OSS doesn't ship the bugs. 1. **Stale "~5 min" cron-fallback claim** (lines 7-11): comment said "covers suppressed cases within ~5 min" but `copilot-clean-label.yml` actually uses `*/30` (cron fires every 30 minutes, worst case ~60 min — marker posted just after a tick stays fresh on the next tick). Updated to match the documented 30–60 min recovery window used elsewhere. 2. **`cancel-in-progress: true` race with label removal** (line 29): the first step on the labeled-PR path is "Remove copilot-review label if present" (so the label acts as a single-shot trigger). With `cancel-in-progress: true`, a new Copilot review/comment event for the same PR cancels the label-triggered run before that first step completes, leaving the label stuck on the PR. Re-adding an already- present label fires no event, so the manual retry path silently breaks. Switched to `cancel-in-progress: false` so queued events run sequentially; the marker-based dedup later in the job (`already_triggered` / `local_in_progress` jq passes) already prevents duplicate `@claude` posts, so queueing is the safe choice. The same bugs exist in catalog (#223 already merged) — follow-up PR will mirror the fix there.
There was a problem hiding this comment.
Pull request overview
Updates the copilot-review-fix.yml GitHub Actions workflow documentation and concurrency behavior to better match how the Copilot-triggered “auto-fix” pipeline actually recovers from suppressed events and to prevent a label-trigger race that can strand the manual retry label on PRs.
Changes:
- Correct the inline docs to reflect that the cron fallback in
copilot-clean-label.ymlruns every 30 minutes (30–60 min recovery window). - Change the per-PR concurrency policy from canceling in-progress runs to queueing them (
cancel-in-progress: false) to avoid canceling label-triggered runs before they remove thecopilot-reviewlabel.
…x.yml Same two issues caught on catalog #225 round 2 (commit 76cb3b9). Mirror to OSS #379 so they ship together. 1. Comment naming: replace `already_triggered` → `already` to match the actual variable name used in this workflow's script (`already_triggered` only appears in copilot-clean-label.yml's cron path). 2. Stale HEAD_SHA risk introduced by `cancel-in-progress: false`: refresh HEAD via `gh pr view --json headRefOid` at runtime and exit early when the payload SHA no longer matches the current PR HEAD.
|
@claude Run |
|
🛑 CI auto-fix declined: this PR touches |
|
Claude finished @kotakanbe's task in 2m 48s —— View job
|
| Finding | Status |
|---|---|
| Cron timing doc fix | ✅ Correct |
cancel-in-progress: false |
✅ Correct |
| Runtime HEAD_SHA refresh | ✅ Correct |
| Stale-event guard blocks manual label path |
The PR is nearly merge-ready — the one remaining item is gating the stale-SHA early exit to auto-event paths only (or using HEAD_SHA=$current_sha and continuing for the manual label path, as Copilot suggested).
Round-2 stale-SHA guard (added to copilot-review-fix.yml in catalog #225 / oss #379 / vuls-reach #23) regressed the manual `copilot-review` label retry path. The flow that breaks: 1. Operator attaches `copilot-review` → label event queues a job (under `cancel-in-progress: false` from round 1). 2. A new push lands before the queued job starts. 3. Job runs; first step removes the label. 4. Stale-SHA guard sees `payload_sha != current_sha` and exits 0. 5. Outcome: label removed, no `@claude` trigger posted, no feedback comment, manual retry path silently broken (re-attaching an absent label fires no event, but the label-add event was the one that just no-op'd). The fix splits the stale-SHA branch by event type: - **Auto event paths** (`pull_request_review` / `pull_request_review_comment`): keep the early-exit. A newer push triggers a fresh auto event that will reach this workflow with the up-to-date HEAD, so skipping this stale-SHA run is correct. - **Manual label path** (`pull_request` event): proceed using `current_sha` instead of exiting. The operator already chose to engage CI, and no newer label-add event will fire (the label-add event has been consumed; re-adding an absent label means re-add must come from the operator). The `@claude` trigger should be posted for the current HEAD so the operator gets the CI run they asked for.
…rect stale-event comment
Two follow-up issues from round-3 Copilot review:
1. **`gh pr view` failure path repeats round-2's manual-label silent
no-op bug**: round-3 split the stale-SHA branch by event type, but
the `gh pr view --json headRefOid` failure path right above kept a
single `exit 0` that applies to all event types. On the manual
label path the earlier step has already removed the
`copilot-review` label, so an exit here is a silent no-op (no
`@claude` trigger, no feedback). Fix: split the failure path the
same way as the stale-SHA branch. Manual label path falls back to
the payload SHA and proceeds (operator gets a CI run for what we
can verify, the marker dedup downstream still skips cleanly if the
payload SHA was stale, and the operator at least sees a real run).
Auto event paths keep the fail-closed `exit 0` (Copilot's next
review or the cron fallback drives the PR forward).
2. **Stale-event comment overstates push semantics**: round-3's
"stale auto event" comment said "a newer push triggers a fresh
review/comment event that will reach this workflow". A bare push
does NOT emit `pull_request_review` / `pull_request_review_comment`
events; only Copilot's review/comment does. Reworded to "a fresh
Copilot review/comment for the newer HEAD will fire its own
event" and added the explicit cron-fallback escape hatch
("otherwise cron fallback handles within ~30–60 min") so the
reader understands the actual recovery path, not an aspirational
one.
|
@claude Run |
|
🛑 CI auto-fix declined: this PR touches |
|
Claude finished @kotakanbe's task in 2m 56s —— View job
|
| Finding | Classification | Status |
|---|---|---|
Cron timing doc (*/30 = 30–60 min) |
✅ Correct | Merged in commit 0ac3cb2 |
cancel-in-progress: false |
✅ Correct | Merged in commit 0ac3cb2 |
| Runtime HEAD_SHA refresh | ✅ Correct | Merged in commit 339f07c |
| Stale-SHA guard preserves manual label path | ✅ Correct | Merged in commit 4c77a4d |
gh pr view failure preserves manual retry |
✅ Correct | Merged in commit 3d526d5 |
| "Fail-closed" comment accuracy (line 135) | Exact patch provided above | |
| "Dedup will skip" comment accuracy (line 145) | Exact patch provided above |
PR is one commit away from merge-ready — apply the two comment corrections above locally and push.
Round-5 Copilot review caught comment-vs-code drift introduced by my
own iterative changes in rounds 2-4:
1. **Header timing comment** was too coarse: "~30–60 min" elided the
distinction between (a) suppressed-event recovery (≤30 min — next
cron tick fires) and (b) local-marker TTL recovery (30–60 min — a
fresh marker may cause the first tick to skip). Reworded to enumerate
both cases explicitly.
2. **HEAD-refresh block comment** said "exit early on mismatch" and
"fail-closed on `gh pr view` error", but rounds 3-4 split both paths
by event type:
- AUTO event paths (`pull_request_review` /
`pull_request_review_comment`): fail-CLOSED + exit-on-mismatch
(cron / next Copilot review drives forward).
- MANUAL label path (`pull_request`): fail-OPEN (payload-SHA
fallback) + proceed-on-mismatch (no newer label-add event
reaches this workflow because the label-add was consumed).
Comment now lays out the four-cell matrix (auto vs manual × error
vs mismatch) so future debugging matches the actual control flow.
3. **Manual-label fallback warning** said the marker dedup "further
down will skip cleanly" — but the manual label path explicitly
BYPASSES `already` / `local_in_progress` dedup (escape-hatch
semantics, see "Manual label path bypasses both suppressions"
later in the file). Rewrote the warning to acknowledge the
bypass and call out the documented trade-off (stale-SHA trigger
may be posted in this rare error path; operator can manually
reset by removing+re-attaching the label).
No behavior change. Comments only.
…hecklist (#380) Add a "Phase B exit criteria — DO NOT MISREAD" preamble to Step 8 that formalizes the four conditions Step 8.2 checks (HEAD_SHA captured / review fetched / review_commit == HEAD_SHA / body matches "no (new) comments"). Calls out the specific operator-mistake mode that recurred during catalog #225 / oss #379 / vuls-reach #23 round-1: confusing "unresolved Copilot thread count == 0" (a Phase C / GraphQL `reviewThreads` metric) with "Phase B converged" — that count can be 0 between rounds while Copilot has not yet re-reviewed the new HEAD. Exiting on the thread-count signal is how round-N regressions slip through and the user has to point out "ループまだ終わってない". Also adds the rule: when relaying state to the user mid-loop, re-run all four conditions every time (a push invalidates condition 3). No code-flow change. Pure prose hardening of the existing Step 8.2 classification table.
Summary
Two Copilot-flagged bugs in
copilot-review-fix.yml(caught on the parallel vuls-reach back-port vuls-saas/vuls-reach#22). The same bugs exist verbatim in this OSS copy and in catalog post-#223 (which is already merged — separate fix PR will follow there).copilot-clean-label.ymlactually uses*/30(cron fires every 30 minutes, worst case ~60 min — marker posted just after a tick stays fresh on the next tick). Updated to match the documented 30–60 min recovery window used elsewhere.cancel-in-progress: truerace with label removal: the first step on the labeled-PR path is "Remove copilot-review label if present" (so the label acts as a single-shot trigger). Withcancel-in-progress: true, a new Copilot review/comment event for the same PR cancels the label-triggered run before that first step completes, leaving the label stuck on the PR. Re-adding an already-present label fires no event, so the manual retry path silently breaks. Switched tocancel-in-progress: falseso queued events run sequentially; the marker-based dedup later in the job (already_triggered/local_in_progressjq passes) already prevents duplicate@claudeposts, so queueing is the safe choice.Test plan
actionlintclean (verified: exit 0).821e3da).🤖 Generated with Claude Code