Skip to content

Commit 202eed9

Browse files
authored
fix(ci): make /security-review fail loudly when the model never runs (#1482)
* fix(ci): make /security-review fail loudly when the model never runs PR #1474 surfaced a silent failure mode: the bundled /security-review skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and fetch-depth: 0 fetches the PR head's history but does not always fetch origin/<base>, so the diff resolves to `fatal: no merge base`. The hook errors, the SDK exits cleanly with num_turns=0 and zero tokens, the inline-comment buffer is empty, and the workflow falsely reports "no high-confidence findings." Two changes, both narrow: 1. Explicitly fetch origin/<base> before invoking the action and verify `git merge-base` resolves. Fail the step if it doesn't. 2. After the action runs, read its execution_file transcript and require num_turns > 0 / output_tokens > 0 / is_error == false. If the model never productively ran, fail the step and have the summary comment say so instead of pretending the review was clean. * style: prettier * fix(ci): also fail closed when model-ran is unknown Per @aidandaly24 review feedback: switch the summary-comment branch from `modelRan === 'false'` to `modelRan !== 'true'` so the 'unknown' case (claude-execution-output.json missing) also routes to the "did not actually analyze" message instead of falling through to "no findings". Both cases mean we couldn't verify the model ran.
1 parent ea2df09 commit 202eed9

1 file changed

Lines changed: 82 additions & 7 deletions

File tree

.github/workflows/pr-security-review.yml

Lines changed: 82 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -167,19 +167,39 @@ jobs:
167167
# the base branch locally too. fetch-depth: 0 grabs the full history.
168168
fetch-depth: 0
169169

170-
- name: Set origin/HEAD for /security-review skill
170+
- name: Prepare base ref for /security-review skill
171171
env:
172172
BASE_REF: ${{ steps.pr.outputs.base_ref }}
173173
run: |
174174
set -euo pipefail
175-
# actions/checkout doesn't set up the remote's symbolic HEAD ref, so
176-
# `git diff origin/HEAD...` (the first command the bundled
177-
# /security-review skill runs) fails with "ambiguous argument
178-
# 'origin/HEAD...': unknown revision". Point origin/HEAD at the PR's
179-
# base branch so the skill resolves the diff against the right ref.
175+
# The bundled /security-review skill's SessionStart hook runs
176+
# `git diff --name-only origin/HEAD...` as its first command. Two
177+
# things have to be true for that to succeed:
178+
# 1. origin/HEAD has to be a valid ref. actions/checkout doesn't
179+
# set up the remote's symbolic HEAD, so we point it at the PR's
180+
# base branch.
181+
# 2. The PR head and origin/<base> have to share a merge base in
182+
# the local clone. actions/checkout@v6 with `ref: <head_sha>`
183+
# fetches the head's history but on fork PRs may NOT fetch
184+
# origin/<base> into the clone — leaving `git diff origin/HEAD...`
185+
# to fail with "no merge base", which silently bombs the skill
186+
# (num_turns=0, model never invoked) and would otherwise look
187+
# like a clean review with zero findings.
188+
# Fetching origin/<base> explicitly closes that gap.
189+
git fetch --no-tags origin "+refs/heads/$BASE_REF:refs/remotes/origin/$BASE_REF"
180190
git remote set-head origin "$BASE_REF"
181191
git symbolic-ref refs/remotes/origin/HEAD
182192
193+
# Sanity: ensure the merge base actually resolves before we hand off
194+
# to the skill. If it doesn't, fail loudly here rather than letting
195+
# the skill silently bail.
196+
if ! git merge-base "origin/$BASE_REF" HEAD >/dev/null; then
197+
echo "::error::No merge base between HEAD and origin/$BASE_REF — /security-review cannot compute a diff."
198+
exit 1
199+
fi
200+
echo "Merge base: $(git merge-base "origin/$BASE_REF" HEAD)"
201+
echo "Files changed: $(git diff --name-only "origin/$BASE_REF...HEAD" | wc -l)"
202+
183203
- name: Configure AWS credentials (OIDC)
184204
uses: aws-actions/configure-aws-credentials@v6
185205
with:
@@ -213,6 +233,50 @@ jobs:
213233
--model us.anthropic.claude-opus-4-7 --max-turns 30 --allowedTools
214234
mcp__github_inline_comment__create_inline_comment
215235
236+
- name: Verify model actually ran
237+
id: model-ran
238+
# The action exits 0 even when the model was never invoked (e.g. a
239+
# SessionStart hook errored before the first turn). Treating that as
240+
# success would let the workflow falsely report "no high-confidence
241+
# findings" — that's exactly what happened on PR #1474, where the
242+
# `/security-review` skill's first `git diff origin/HEAD...` hit a
243+
# "no merge base" error, the SDK still returned `subtype: success`
244+
# with `num_turns: 0` and zero tokens, and the buffer was empty.
245+
#
246+
# The action writes its full SDK transcript to
247+
# ${RUNNER_TEMP}/claude-execution-output.json. We pull the final
248+
# result envelope and require that the model actually took turns and
249+
# spent tokens. If it didn't, mark the run as not-actually-completed
250+
# so the summary comment uses the failure branch instead of pretending
251+
# there were no findings.
252+
if: steps.review.conclusion == 'success' || steps.review.conclusion == 'failure'
253+
env:
254+
# The action exposes its full SDK transcript path as the
255+
# `execution_file` step output (a JSON array of stream events; the
256+
# final element is the result envelope). We fall back to the known
257+
# path under RUNNER_TEMP if for some reason the output is missing.
258+
OUTPUT_JSON:
259+
${{ steps.review.outputs.execution_file || format('{0}/claude-execution-output.json', runner.temp) }}
260+
run: |
261+
set -euo pipefail
262+
if [ ! -s "$OUTPUT_JSON" ]; then
263+
echo "::warning::No claude-execution-output.json found at $OUTPUT_JSON; cannot verify the model ran."
264+
echo "ran=unknown" >> "$GITHUB_OUTPUT"
265+
echo "num_turns=0" >> "$GITHUB_OUTPUT"
266+
exit 0
267+
fi
268+
NUM_TURNS=$(jq -r '.[-1].num_turns // 0' "$OUTPUT_JSON")
269+
IS_ERROR=$(jq -r '.[-1].is_error // false' "$OUTPUT_JSON")
270+
OUTPUT_TOKENS=$(jq -r '.[-1].usage.output_tokens // 0' "$OUTPUT_JSON")
271+
echo "num_turns=$NUM_TURNS, is_error=$IS_ERROR, output_tokens=$OUTPUT_TOKENS"
272+
echo "num_turns=$NUM_TURNS" >> "$GITHUB_OUTPUT"
273+
if [ "$IS_ERROR" = "true" ] || [ "$NUM_TURNS" = "0" ] || [ "$OUTPUT_TOKENS" = "0" ]; then
274+
echo "::error::Claude Code SDK reported success but the model never ran productively (num_turns=$NUM_TURNS, output_tokens=$OUTPUT_TOKENS, is_error=$IS_ERROR). The /security-review skill likely bailed before analysis (e.g. SessionStart hook error). Refusing to report 'no findings'."
275+
echo "ran=false" >> "$GITHUB_OUTPUT"
276+
exit 1
277+
fi
278+
echo "ran=true" >> "$GITHUB_OUTPUT"
279+
216280
- name: Count buffered findings
217281
id: findings
218282
# Only count if the review step actually ran (success or failure - both produce
@@ -240,17 +304,28 @@ jobs:
240304
PR_NUMBER: ${{ steps.pr.outputs.number }}
241305
FINDING_COUNT: ${{ steps.findings.outputs.count }}
242306
REVIEW_CONCLUSION: ${{ steps.review.conclusion }}
307+
MODEL_RAN: ${{ steps.model-ran.outputs.ran }}
308+
NUM_TURNS: ${{ steps.model-ran.outputs.num_turns }}
243309
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
244310
with:
245311
github-token: ${{ steps.app-token.outputs.token }}
246312
script: |
247313
const prNumber = parseInt(process.env.PR_NUMBER, 10);
248314
const count = parseInt(process.env.FINDING_COUNT || '0', 10);
249315
const conclusion = process.env.REVIEW_CONCLUSION || 'skipped';
316+
const modelRan = process.env.MODEL_RAN || 'unknown';
317+
const numTurns = process.env.NUM_TURNS || '0';
250318
const runUrl = process.env.RUN_URL;
251319
252320
let body;
253-
if (conclusion === 'success') {
321+
if (modelRan !== 'true') {
322+
// Two cases land here, both unsafe to report as "no findings":
323+
// - 'false': SDK exited 0 but transcript shows the model never ran productively
324+
// (e.g. /security-review's SessionStart hook errored before the first turn).
325+
// - 'unknown': claude-execution-output.json was missing, so we couldn't verify
326+
// the model ran at all. Treat as not-verified rather than silently green.
327+
body = `**Claude Security Review:** the review did not actually analyze this PR (model took ${numTurns} turn${numTurns === '1' ? '' : 's'} — the skill likely failed during setup). See the [run](${runUrl}) for details; a later push or re-run is needed for a real review.`;
328+
} else if (conclusion === 'success') {
254329
body =
255330
count > 0
256331
? `**Claude Security Review:** posted ${count} inline finding${count === 1 ? '' : 's'} on this PR. ([run](${runUrl}))`

0 commit comments

Comments
 (0)