Review agent should use CHANGES_REQUESTED for medium-severity behavior-change and test-integrity findings

## What happened

On PR #2761 the review agent identified two medium-severity findings: (1) `RETRO_COMMENT` losing its default-to-empty semantics due to the migration from shell-based `${RETRO_COMMENT:-}` to YAML `${RETRO_COMMENT}` (behavior-change category), and (2) `scaffold_test.go` not updated to verify keys in `Env.Runner` (test-integrity category). Both findings were submitted as COMMENTED reviews. The human reviewer approved without engaging with either finding. The PR was merged with both issues unresolved. The `RETRO_COMMENT` finding was independently confirmed by Qodo's review bot.

## What could go better

When the review agent posts medium-severity findings as COMMENTED rather than CHANGES_REQUESTED, they blend into the PR timeline and are easily overlooked, especially when a human reviewer is scanning for blocking issues. On this PR, the agent's findings were more substantive than the human review, yet they had no influence on the merge decision. Escalating medium-severity findings in high-confidence categories (behavior-change, correctness, test-integrity) to CHANGES_REQUESTED would surface them in the GitHub review UI's merge readiness indicator. Confidence is medium: this is a single-PR observation and the pattern may not generalize. The main risk is false-positive blocking if the medium-severity threshold is too sensitive.
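
For reference, the two outcomes differ only in the `event` value sent when the review is submitted. The sketch below builds (but does not send) a request to GitHub's REST endpoint for creating a pull request review, where the `REQUEST_CHANGES` event produces a CHANGES_REQUESTED review and `COMMENT` produces a COMMENTED one. The helper name, the token handling, and the assumption that the agent posts reviews through this endpoint are illustrative and not taken from the agent's code.

```python
import json
import urllib.request

# GitHub's review-creation API takes an *event*; the review *state* shown in
# the UI is the past-tense form of that event.
EVENT_FOR_STATUS = {
    "COMMENTED": "COMMENT",
    "CHANGES_REQUESTED": "REQUEST_CHANGES",
}


def build_review_request(owner, repo, pull_number, status, body, token):
    """Build (without sending) a POST that creates a pull request review.

    Hypothetical helper: how the review agent actually posts its reviews
    is an assumption here.
    """
    if not body:
        # GitHub requires a body for COMMENT and REQUEST_CHANGES events.
        raise ValueError("review body must not be empty")
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull_number}/reviews"
    payload = {"event": EVENT_FOR_STATUS[status], "body": body}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the review. Whether a CHANGES_REQUESTED review actually prevents merging, rather than only being displayed prominently, depends on the repository's branch protection settings.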

## Proposed change

Evaluate the review agent's verdict logic (likely in the review skill or post-script that translates findings into GitHub review status) to determine whether medium-severity findings in behavior-change, correctness, and test-integrity categories should produce a CHANGES_REQUESTED review instead of COMMENTED. Scope narrowly to these high-confidence categories to avoid false-positive blocking on style, documentation-drift, or redundancy findings. Consider a shadow-mode trial where the agent logs what it would have escalated without actually blocking.
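
One possible shape for the narrowed rule, including the shadow-mode trial, is sketched below. It is a minimal illustration, not the agent's existing verdict logic: the `Finding` type, the function names, and the `baseline_status` parameter (standing in for whatever status the current logic would return) are all hypothetical.

```python
from dataclasses import dataclass

# Categories named in this proposal as high-confidence enough to escalate.
ESCALATED_CATEGORIES = frozenset({"behavior-change", "correctness", "test-integrity"})


@dataclass(frozen=True)
class Finding:
    """Hypothetical representation of one review finding."""
    severity: str  # e.g. "low", "medium", "high"
    category: str  # e.g. "behavior-change", "style"
    summary: str


def escalation_candidates(findings):
    """Return the medium-severity findings in the escalated categories."""
    return [
        f for f in findings
        if f.severity == "medium" and f.category in ESCALATED_CATEGORIES
    ]


def decide_status(findings, baseline_status, shadow_mode=True):
    """Return (review_status, candidates).

    baseline_status is whatever the existing logic would have produced
    (an assumption of this sketch). In shadow mode the status is left
    unchanged and the candidates are only reported, so they can be logged.
    """
    candidates = escalation_candidates(findings)
    if candidates and not shadow_mode:
        return "CHANGES_REQUESTED", candidates
    return baseline_status, candidates
```

Applied to the two findings from PR #2761 with `shadow_mode=True`, this would keep the review at its baseline status while reporting both findings as escalation candidates. A style or documentation-drift finding of the same severity would not be reported.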

## Validation criteria

On the next 10 PRs where the review agent identifies a medium-severity behavior-change or test-integrity finding, track whether each finding was addressed before merge. Compare the resolution rate to the pre-change baseline (where this PR serves as one data point of 0% resolution). Target: 80%+ of medium-severity findings in these categories are explicitly acknowledged (fixed, dismissed with rationale, or overridden) before merge. Monitor for false-positive blocks that waste author time; if more than 20% of escalations are dismissed as not-actionable, narrow the criteria.
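
The two thresholds above can be checked mechanically once outcomes are recorded per finding. The following is a rough sketch under assumed names: the `FindingOutcome` record, the outcome labels, and the `not_actionable` flag are hypothetical bookkeeping, not an existing data format.

```python
from dataclasses import dataclass

# Outcomes that count as "explicitly acknowledged" per the criteria above.
# The label spellings are hypothetical.
ACKNOWLEDGED = frozenset({"fixed", "dismissed_with_rationale", "overridden"})


@dataclass(frozen=True)
class FindingOutcome:
    """Hypothetical record of what happened to one escalated finding."""
    outcome: str                  # an ACKNOWLEDGED label, or e.g. "ignored"
    not_actionable: bool = False  # True if dismissed as not-actionable


def evaluate_trial(outcomes, ack_target=0.80, not_actionable_limit=0.20):
    """Compare recorded outcomes with the 80% and 20% thresholds."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no outcomes recorded yet")
    ack_rate = sum(o.outcome in ACKNOWLEDGED for o in outcomes) / total
    na_rate = sum(o.not_actionable for o in outcomes) / total
    return {
        "acknowledged_rate": ack_rate,
        "not_actionable_rate": na_rate,
        # Target: at least 80% explicitly acknowledged before merge.
        "meets_target": ack_rate >= ack_target,
        # Narrow the criteria only if *more than* 20% were not actionable.
        "narrow_criteria": na_rate > not_actionable_limit,
    }
```

The rates here are computed per finding rather than per PR, matching the wording of the target. If a PR can carry several escalated findings, the sample will contain more than 10 records.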

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/2761_

Issue #2940