You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(ci): make /security-review fail loudly when the model never runs (#1482)
* fix(ci): make /security-review fail loudly when the model never runs
PR #1474 surfaced a silent failure mode: the bundled /security-review
skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as
its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and
fetch-depth: 0 fetches the PR head's history but does not always fetch
origin/<base>, so the diff resolves to `fatal: no merge base`. The hook
errors, the SDK exits cleanly with num_turns=0 and zero tokens, the
inline-comment buffer is empty, and the workflow falsely reports
"no high-confidence findings."
Two changes, both narrow:
1. Explicitly fetch origin/<base> before invoking the action and verify
`git merge-base` resolves. Fail the step if it doesn't.
2. After the action runs, read its execution_file transcript and require
num_turns > 0 / output_tokens > 0 / is_error == false. If the model
never productively ran, fail the step and have the summary comment
say so instead of pretending the review was clean.
* style: prettier
* fix(ci): also fail closed when model-ran is unknown
Per @aidandaly24 review feedback: switch the summary-comment branch from
`modelRan === 'false'` to `modelRan !== 'true'` so the 'unknown' case
(claude-execution-output.json missing) also routes to the
"did not actually analyze" message instead of falling through to
"no findings". Both cases mean we couldn't verify the model ran.
if [ "$IS_ERROR" = "true" ] || [ "$NUM_TURNS" = "0" ] || [ "$OUTPUT_TOKENS" = "0" ]; then
274
+
echo "::error::Claude Code SDK reported success but the model never ran productively (num_turns=$NUM_TURNS, output_tokens=$OUTPUT_TOKENS, is_error=$IS_ERROR). The /security-review skill likely bailed before analysis (e.g. SessionStart hook error). Refusing to report 'no findings'."
275
+
echo "ran=false" >> "$GITHUB_OUTPUT"
276
+
exit 1
277
+
fi
278
+
echo "ran=true" >> "$GITHUB_OUTPUT"
279
+
216
280
- name: Count buffered findings
217
281
id: findings
218
282
# Only count if the review step actually ran (success or failure - both produce
// Two cases land here, both unsafe to report as "no findings":
323
+
// - 'false': SDK exited 0 but transcript shows the model never ran productively
324
+
// (e.g. /security-review's SessionStart hook errored before the first turn).
325
+
// - 'unknown': claude-execution-output.json was missing, so we couldn't verify
326
+
// the model ran at all. Treat as not-verified rather than silently green.
327
+
body = `**Claude Security Review:** the review did not actually analyze this PR (model took ${numTurns} turn${numTurns === '1' ? '' : 's'} — the skill likely failed during setup). See the [run](${runUrl}) for details; a later push or re-run is needed for a real review.`;
328
+
} else if (conclusion === 'success') {
254
329
body =
255
330
count > 0
256
331
? `**Claude Security Review:** posted ${count} inline finding${count === 1 ? '' : 's'} on this PR. ([run](${runUrl}))`
0 commit comments