Review agent should use CHANGES_REQUESTED for medium-severity behavior-change and test-integrity findings #2940

Description

@fullsend-ai-retro

What happened

On PR #2761 the review agent identified two medium-severity findings: (1) RETRO_COMMENT losing its default-to-empty semantics due to the migration from shell-based ${RETRO_COMMENT:-} to YAML ${RETRO_COMMENT} (behavior-change category), and (2) scaffold_test.go not updated to verify keys in Env.Runner (test-integrity category). Both findings were submitted as COMMENTED reviews. The human reviewer approved without engaging with either finding. The PR was merged with both issues unresolved. The RETRO_COMMENT finding was independently confirmed by Qodo's review bot.
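The first finding turns on what happens when RETRO_COMMENT is unset. The issue does not say how the harness expands `${RETRO_COMMENT}` in YAML, so the following is only a minimal sketch of the general hazard. It uses Python's `string.Template` as a stand-in expander, and the function names are hypothetical.

```python
from string import Template

# Stand-in for the YAML value after migration (assumption: plain ${VAR} syntax).
PLACEHOLDER = Template("${RETRO_COMMENT}")


def expand_with_default(env: dict) -> str:
    """Mimics shell ${RETRO_COMMENT:-}: an unset variable becomes ''."""
    return env.get("RETRO_COMMENT", "")


def expand_strict(env: dict) -> str:
    """A strict expander: an unset variable is an error (KeyError)."""
    return PLACEHOLDER.substitute(env)


def expand_lenient(env: dict) -> str:
    """A lenient expander: an unset variable leaves the placeholder text in place."""
    return PLACEHOLDER.safe_substitute(env)
```

With the variable set, all three agree. With it unset, only the first returns an empty string, which is the default-to-empty behavior the finding says was lost.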

What could go better

When the review agent posts medium-severity findings as COMMENTED rather than CHANGES_REQUESTED, they blend into the PR timeline and are easily overlooked, especially when a human reviewer is scanning for blocking issues. On this PR, the agent's findings were more substantive than the human review, yet they had no influence on the merge decision. Escalating medium-severity findings in high-confidence categories (behavior-change, correctness, test-integrity) to CHANGES_REQUESTED would surface them in the GitHub review UI's merge readiness indicator. Confidence in this recommendation is medium: it rests on a single PR observation, and the pattern may not generalize. The main risk is false-positive blocking if the medium-severity threshold is too sensitive.

Proposed change

Evaluate the review agent's verdict logic (likely in the review skill or post-script that translates findings into GitHub review status) to determine whether medium-severity findings in behavior-change, correctness, and test-integrity categories should produce a CHANGES_REQUESTED review instead of COMMENTED. Scope narrowly to these high-confidence categories to avoid false-positive blocking on style, documentation-drift, or redundancy findings. Consider a shadow-mode trial where the agent logs what it would have escalated without actually blocking.
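The proposed rule can be sketched as follows. This is not the agent's actual verdict logic, which the issue has not yet located. `Finding`, `should_escalate`, and `review_verdict` are hypothetical names, and the sketch assumes findings carry a severity and a category string matching the labels used above.

```python
from dataclasses import dataclass

# Categories proposed for escalation; style, documentation-drift,
# and redundancy findings are deliberately left out.
ESCALATED_CATEGORIES = frozenset({"behavior-change", "correctness", "test-integrity"})


@dataclass(frozen=True)
class Finding:
    severity: str  # e.g. "low", "medium", "high" (assumed vocabulary)
    category: str  # e.g. "behavior-change", "style"


def should_escalate(finding: Finding) -> bool:
    """True for medium-severity findings in the narrowly scoped categories."""
    return finding.severity == "medium" and finding.category in ESCALATED_CATEGORIES


def review_verdict(findings, current_verdict="COMMENTED", shadow_mode=True):
    """Return (verdict, escalation_candidates).

    current_verdict is whatever the existing logic already decided; only a
    COMMENTED verdict is ever upgraded. In shadow mode the candidates are
    returned for logging but the verdict is left unchanged.
    """
    candidates = [f for f in findings if should_escalate(f)]
    if current_verdict != "COMMENTED":
        return current_verdict, candidates
    if candidates and not shadow_mode:
        return "CHANGES_REQUESTED", candidates
    return "COMMENTED", candidates
```

Defaulting `shadow_mode` to `True` keeps the trial non-blocking: the caller can log the returned candidates to see what would have been escalated before anything changes for authors.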

Validation criteria

On the next 10 PRs where the review agent identifies a medium-severity behavior-change or test-integrity finding, track whether each finding was explicitly acknowledged (fixed, dismissed with rationale, or overridden) before merge. Compare the acknowledgement rate to the pre-change baseline, for which PR #2761 is the only data point so far (0%: neither finding was addressed). Target: 80%+ of medium-severity findings in these categories are acknowledged before merge. Monitor for false-positive blocks that waste author time: if more than 20% of escalations are dismissed as not-actionable, narrow the criteria.
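The two thresholds above can be checked mechanically once outcomes are recorded. The sketch below assumes each escalated finding is logged with a resolution label and a not-actionable flag. `Outcome`, `evaluate_trial`, and the label strings are hypothetical, not an existing schema.

```python
from dataclasses import dataclass

# Resolutions that count as "explicitly acknowledged" per the criteria above.
ACKNOWLEDGED = frozenset({"fixed", "dismissed_with_rationale", "overridden"})


@dataclass(frozen=True)
class Outcome:
    resolution: str  # one of ACKNOWLEDGED, or "ignored"
    not_actionable: bool = False  # dismissed as a false positive


def evaluate_trial(outcomes, ack_target=0.80, noise_limit=0.20):
    """Summarize a trial: acknowledgement rate and false-positive rate."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    total = len(outcomes)
    ack_rate = sum(o.resolution in ACKNOWLEDGED for o in outcomes) / total
    noise_rate = sum(o.not_actionable for o in outcomes) / total
    return {
        "ack_rate": ack_rate,
        "noise_rate": noise_rate,
        "meets_target": ack_rate >= ack_target,
        # More than 20% not-actionable means the criteria should be narrowed.
        "narrow_criteria": noise_rate > noise_limit,
    }
```

Feeding in the baseline from PR #2761, two findings that were both ignored, gives an acknowledgement rate of 0.0, which is the comparison point for the trial.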


Generated by retro agent from #2761

Metadata

Assignees

No one assigned

    Labels

    agent/review: Review agent
    component/harness: Agent harness, config, and skills loading
    feature: Feature-category issue awaiting human prioritization
    priority/medium: Normal priority, plan for next cycle
    ready-for-triage: Retro-filed issue awaiting triage agent
    triaged: Triaged but awaiting human prioritization
    type/feature: New capability request

    Projects

    Status
    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
