fix(entry-review): harden against malformed model verdicts #71
Merged
The entry-review job crashed with `TypeError: string indices must be integers` when the model returned `checks` as a string instead of the documented object (Anthropic tool-use does not hard-validate tool inputs against the schema). The crash failed the workflow step, which skipped both auto-merge and request-review, leaving the PR with a red check and no feedback. Observed on PR #70 (waa entry).

Add normalize_verdict() at the call_claude boundary so it always returns a well-formed {verdict, checks, notes} dict: invalid/missing check values default to "unverified", and an unparseable structure downgrades the verdict to UNKNOWN (route to manual review, never auto-merge a result we couldn't read). Mirrors the existing UNKNOWN fallback for API errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com>
Problem
The `entry-review` job crashed on PR #70 (waa entry) with: `TypeError: string indices must be integers`

The `submit_verdict` tool schema requires `checks` to be an object, but Anthropic tool-use does not hard-validate tool inputs against the schema, so the model occasionally returns `checks` as a string (or omits keys). Downstream code assumed the documented shape and crashed.

Impact: the crash fails the workflow step → downstream `success()` is false → both auto-merge and request-review skip (quick-check.yml#L362), leaving the PR with a red check and no actionable feedback. (#70 merged manually as a result.) This is the source of the recent "Entry Pipeline" failure emails.

Fix
Add `normalize_verdict()` and call it at the `call_claude` boundary so it always returns a well-formed `{verdict, checks, notes}` dict:
- invalid or missing check values default to `"unverified"`;
- an unparseable structure (e.g. `checks` not an object) downgrades the verdict to `UNKNOWN`: route to manual review, never auto-merge a result we couldn't read;
- this mirrors the `UNKNOWN` fallback already used for transient API errors.

This is a robustness fix only: no security boundary or schema change.
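For readers without the diff open, the normalization described above might look roughly like the sketch below. It is not the merged implementation. The verdict values other than `UNKNOWN`, the check values other than `"unverified"`, the check names in `EXPECTED_CHECKS`, and the choice to map an invalid verdict enum to `UNKNOWN` are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical enums and check names: the PR only confirms the "unverified"
# check default and the UNKNOWN verdict fallback.
VALID_VERDICTS = {"APPROVE", "REJECT", "UNKNOWN"}
VALID_CHECK_VALUES = {"pass", "fail", "unverified"}
EXPECTED_CHECKS = ("license", "source_url", "metadata")


def normalize_verdict(raw: Any) -> dict:
    """Coerce a raw tool-use payload into a {verdict, checks, notes} dict."""
    if not isinstance(raw, dict):
        # Nothing readable at all: route to manual review.
        raw = {}
        verdict = "UNKNOWN"
    else:
        verdict = raw.get("verdict")
        # The isinstance check comes first so unhashable values (lists, dicts)
        # never reach the set membership test.
        if not isinstance(verdict, str) or verdict not in VALID_VERDICTS:
            verdict = "UNKNOWN"  # assumed handling of an invalid enum value

    raw_checks = raw.get("checks")
    if not isinstance(raw_checks, dict):
        # The original crash: `checks` arrived as a string (or was missing),
        # so the structure is unparseable and the verdict is downgraded.
        raw_checks = {}
        verdict = "UNKNOWN"

    checks = {}
    for name in EXPECTED_CHECKS:
        value = raw_checks.get(name)
        if isinstance(value, str) and value in VALID_CHECK_VALUES:
            checks[name] = value
        else:
            checks[name] = "unverified"  # invalid or missing check value

    notes = raw.get("notes")
    if not isinstance(notes, str):
        notes = ""

    return {"verdict": verdict, "checks": checks, "notes": notes}


# The payload shape that triggered the crash on PR #70: `checks` as a string.
print(normalize_verdict({"verdict": "APPROVE", "checks": "all good"}))
```

In this sketch a single bad check value only degrades that check, while a structurally unreadable `checks` field downgrades the whole verdict, so an approval that could not be read in full never reaches auto-merge.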
Tests
5 new cases in `TestNormalizeVerdict` covering the original string-`checks` crash, missing/invalid values, invalid verdict enum, and non-dict input. Full suite green (28 passed); ruff check + format clean.

🤖 Generated with Claude Code