Skip to content

ARM API Reviewer agent: add a critic sub-agent to independently validate findings before posting #43302

@ravimeda

Description

@ravimeda

Summary

The ARM API Reviewer agent (.github/agents/arm-api-reviewer.agent.md) performs systematic, rules-based reviews of REST API specification PRs. Today it operates as a single-agent loop: it fetches the PR, applies the instruction-file checklists, classifies findings as NEW vs EXISTING, and presents the report to a human reviewer for posting approval.

In practice we still see a recurring class of issues that the single-agent loop is structurally unable to catch:

  1. False positives: findings that cite a rule the spec does not actually violate, or that misread the JSON/TypeSpec.
  2. Wrong line numbers: line citations that drift after the agent revises its own report, or off-by-one errors from counting against a stale read.
  3. Misclassified NEW vs EXISTING: a finding tagged [NEW] when the same violation is present in the previous API version, or vice versa.
  4. Stale or invented rule citations: a rule ID that no longer exists in the instruction file, or a paraphrase that does not match the rule's normative text.
  5. Missed violations: sections of the changed files (or entire changed files) skipped during the systematic review.
  6. Incomplete previous-version comparison: the suppression-continuity analysis or breaking-change diff was skipped or only partially performed.

These issues erode reviewer trust and create real cost. A single wrong finding posted to a PR forces the spec author to push back, the human reviewer to re-verify, and the agent to be corrected.

This proposal adds a separate critic sub-agent that independently re-validates the reviewer's findings before they are presented to the human for posting. The critic pattern is well-established for high-stakes automated reasoning (independent verification of an LLM's output by a second LLM with a different system prompt and a narrower tool surface). Applying it here addresses the structural gap directly.


Proposal

1. Add a new sub-agent: arm-api-review-critic.agent.md

Create .github/agents/arm-api-review-critic.agent.md alongside the existing reviewer agent.

Tool surface (strictly read-only):

  • read, search, web/fetch, web/githubRepo
  • GitHub MCP read tools only: get_pull_request, list_pull_request_files, get_file_contents, get_review_comments
  • Explicitly excluded: any GitHub MCP tool that creates, updates, resolves, or comments on the PR; any edit tool. The critic must be incapable of mutating the PR or the workspace.

Persona: even more skeptical than the reviewer. The critic's success metric is catching errors in the reviewer's report, not producing review findings of its own. It should assume every finding is wrong until it independently verifies the cited file content, the cited rule text, and the NEW/EXISTING classification.

2. Two-track verdict

The critic returns two independent verdicts on every invocation:

Track Verdict values Semantics
Finding accuracy (blocking) PASS / WARN / FAIL Every finding in the reviewer's report is verifiable. FAIL means at least one finding has a wrong line number, a misquoted or non-existent rule, a wrong NEW vs EXISTING classification, or a misread of the spec.
Coverage quality (advisory) APPROVE / REQUEST EXPANSION / NEEDS DISCUSSION The reviewer applied the full checklist to every changed file; obvious violations were not skipped; the previous-version comparison and suppression-continuity analysis were actually performed.

FAIL is a hard gate. The reviewer cannot present its findings to the human until the FAIL is resolved (either fixed or human-overridden with a recorded justification).

3. Mandatory re-validation steps

For each finding in the reviewer's report, the critic independently:

  1. Re-fetches the cited file from the PR head ref via GitHub MCP. It does not reuse the reviewer's in-context copy.
  2. Re-reads the cited line(s) and confirms the violation text is at that exact line. Catches off-by-one errors and drift after the reviewer's own edits to the report.
  3. Re-reads the cited rule from the relevant instruction file (openapi-review.instructions.md, armapi-review.instructions.md, typespec-review.instructions.md) and confirms the rule's normative text supports the finding. Catches paraphrase drift and invented rule IDs.
  4. Re-fetches the previous-version file from the base branch and verifies the NEW vs EXISTING classification by inspecting the same JSON path, model, or operation in the prior version. Catches misclassification in both directions.
  5. Spot-checks 2 to 3 unflagged sections of the changed files (sampled at random, weighted toward security, LRO, pagination, and provisioning-state areas) to detect missed violations.
  6. Cross-checks readme.md suppressions against the previous version to confirm the suppression-continuity analysis was actually executed.

4. Insert as Step 6.5 in the existing workflow

The reviewer agent's current workflow is:

Step 5 (Cross-File Consistency) then Step 6 (Report Findings) then Step 7 (Post Comments on PR)

Insert:

Step 6.5: Mandatory Critic Review. After producing the structured findings report in Step 6, and before presenting to the human for posting approval, invoke the ARM API Review Critic sub-agent. Pass the PR URL, the full findings report, and the list of files reviewed.

Authorization rule: Step 7 is gated on the critic's most recent verdict being Finding accuracy = PASS (or all FAILs human-overridden) and Coverage quality = APPROVE or NEEDS DISCUSSION. A verdict that predates any revision of the report is stale and does not count. If the reviewer changes any finding, the critic must be re-invoked.

Iteration cap: at most 3 reviewer-to-critic loops before remaining issues are escalated to the human with both the reviewer's and the critic's outputs.

The reviewer's existing constraints (read-only, human-gated PR posting, Step 7 still requires explicit human approval) remain unchanged. The critic adds an additional gate; it does not replace the human one.

5. Critic output schema

A rigid, machine-parseable output structure so the reviewer agent can act on it programmatically:

## ARM API Review Critique

### Verdict
- Finding accuracy:  PASS | WARN | FAIL
- Coverage quality:  APPROVE | REQUEST EXPANSION | NEEDS DISCUSSION

### Dimension scores
| Dimension                       | Rating           | Notes |
|---------------------------------|:----------------:|-------|
| Line number accuracy            | OK / Warn / Fail | ... |
| Rule citation accuracy          | OK / Warn / Fail | ... |
| NEW vs EXISTING classification  | OK / Warn / Fail | ... |
| Previous-version comparison     | OK / Warn / Fail | ... |
| Suppression continuity          | OK / Warn / Fail | ... |
| Severity assignment             | OK / Warn / Fail | ... |
| Checklist coverage              | OK / Warn / Fail | ... |
| Missed violations (spot-check)  | OK / Warn / Fail | ... |
| False-positive risk             | OK / Warn / Fail | ... |
| Fix-suggestion correctness      | OK / Warn / Fail | ... |

### Findings that must be corrected (FAIL)
1. Finding #N cites rule X at line Y. Re-read of <file> shows the violation is at line Z. Off-by-2.
2. Finding #M classified as [NEW]. Previous version <prev-file> line 312 has the same violation. Should be [EXISTING].

### Likely missed violations
1. <file> `$.paths['/foo'].patch.responses` returns a top-level `string` schema. RPC-Patch-V1-04 violation, not flagged.

### Replay validation
- Files re-fetched: <list>
- Rule texts re-verified: <list of rule IDs>
- Previous-version paths checked: <list>
- Random sections spot-checked: <list>

### What was done well
...

6. Override audit trail

When the human overrides a critic FAIL, record the override on the PR itself (this repo has no decisions.md convention). Reuse the existing hidden HTML telemetry marker with a new type field:

<!-- posted-by: arm-api-reviewer-agent | type: critic-override | rule: <rule-id> | reason: <short justification> -->

This keeps the audit trail co-located with the review and queryable via the GitHub API alongside existing reviewer telemetry.

7. Seed list of known false-positive and missed-violation patterns

The critic should embed a curated list of patterns mined from past ARM-review PR comments. These are places where the reviewer agent has historically been wrong. Seed examples:

  • Do not flag missing provisioningState on proxy resources.
  • Verify a property is actually in the response schema, not merely defined.
  • A "breaking" enum change inside x-ms-enum.modelAsString: true is not a breaking change.
  • common-types reference version must match the API version era.
  • Suppressions carried forward from a prior version are acceptable; only newly added suppressions need justification.

This list is intended to grow over time as humans correct the reviewer or the critic, encoding institutional memory directly in the agent.


Why a separate sub-agent, not a self-check step

The reviewer agent already contains self-discipline rules ("No false positives", "re-read the spec element before reporting"). They are insufficient by themselves because:

  • A single LLM context cannot reliably catch its own off-by-one errors against content it has already read into context. The cached reading dominates.
  • Self-check has no incentive structure: the same agent that wants to ship a finding is the one judging whether to ship it.
  • A separate agent with a different system prompt, no edit tools, and a success metric of "catch errors in the other agent's output" creates the missing adversarial pressure.
  • Tool-surface separation also makes the critic safe to run automatically: it cannot post, comment, label, or modify anything regardless of what it concludes.

What stays the same

  • The reviewer agent remains read-only and PR-only.
  • Human approval is still required to post any comments to the PR (Step 7 is unchanged).
  • The critic never posts to the PR; it only returns its verdict to the reviewer.
  • Existing telemetry markers, label rules (ARMChangesRequested, WaitForARMFeedback), and workspace-cleanup rules (Step 9) are unchanged.

Expected impact

  • Lower rate of false-positive comments posted to PRs.
  • Fewer "the line number is wrong" or "this rule does not apply here" pushbacks from spec authors.
  • Better NEW vs EXISTING classification, so authors are clear on what this PR introduced vs. pre-existing technical debt.
  • Higher confidence for the human reviewer when approving the agent's findings. They see both the reviewer's report and the critic's independent validation.
  • An auditable record of every false positive caught (via critic-override telemetry markers) that can be mined to improve the reviewer's instructions over time.

Out of scope for this issue

  • Adding a critic for non-ARM review agents in this repo (could follow the same pattern in a separate issue).
  • Automating the human approval step in Step 7.
  • Changing any of the underlying instruction files (openapi-review.instructions.md, armapi-review.instructions.md, typespec-review.instructions.md).

Next steps if accepted

  1. Draft .github/agents/arm-api-review-critic.agent.md with the persona, tool surface, re-validation steps, and output schema described above.
  2. Patch .github/agents/arm-api-reviewer.agent.md to insert Step 6.5 and the authorization rule into Step 7.
  3. Extend the telemetry marker spec in Step 7.8 to document the type: critic-override variant.
  4. Run the pair against a backlog of recently-reviewed PRs to calibrate the critic's spot-check sampling and seed the false-positive pattern list.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions