Skip to content

fix(entry-review): harden against malformed model verdicts#71

Merged
recursix merged 1 commit into
mainfrom
fix/entry-review-malformed-verdict
Jun 19, 2026
Merged

fix(entry-review): harden against malformed model verdicts#71
recursix merged 1 commit into
mainfrom
fix/entry-review-malformed-verdict

Conversation

@recursix

Copy link
Copy Markdown
Collaborator

Problem

The entry-review job crashed on PR #70 (waa entry) with:

TypeError: string indices must be integers, not 'str'
  format_verdict_comment → v = verdict["checks"][k]   (entry_review.py:217)

The submit_verdict tool schema requires checks to be an object, but Anthropic tool-use does not hard-validate tool inputs against the schema, so the model occasionally returns checks as a string (or omits keys). Downstream code assumed the documented shape and crashed.

Impact: the crash fails the workflow step → downstream success() is false → both auto-merge and request-review skip (quick-check.yml#L362), leaving the PR with a red check and no actionable feedback. (#70 merged manually as a result.) This is the source of the recent "Entry Pipeline" failure emails.

Fix

Add normalize_verdict() and call it at the call_claude boundary so it always returns a well-formed {verdict, checks, notes} dict:

  • invalid / missing individual check values default to "unverified";
  • an unparseable structure (checks not an object) downgrades the verdict to UNKNOWN — route to manual review, never auto-merge a result we couldn't read;
  • mirrors the existing UNKNOWN fallback already used for transient API errors.

This is a robustness fix only — no security boundary or schema change.

Tests

5 new cases in TestNormalizeVerdict covering the original string-checks crash, missing/invalid values, invalid verdict enum, and non-dict input. Full suite green (28 passed); ruff check + format clean.

🤖 Generated with Claude Code

The entry-review job crashed with `TypeError: string indices must be
integers` when the model returned `checks` as a string instead of the
documented object (Anthropic tool-use does not hard-validate tool inputs
against the schema). The crash failed the workflow step, which skipped
both auto-merge and request-review, leaving the PR with a red check and
no feedback — observed on PR #70 (waa entry).

Add normalize_verdict() at the call_claude boundary so it always returns
a well-formed {verdict, checks, notes} dict: invalid/missing check values
default to "unverified", and an unparseable structure downgrades the
verdict to UNKNOWN (route to manual review, never auto-merge a result we
couldn't read). Mirrors the existing UNKNOWN fallback for API errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com>
@recursix recursix merged commit 7c51f56 into main Jun 19, 2026
12 checks passed
@recursix recursix deleted the fix/entry-review-malformed-verdict branch June 19, 2026 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant