Skip to content

feat: add finding verification via lightweight sub-agent checks#95

Merged
waynesun09 merged 4 commits into
mainfrom
finding-verification
May 22, 2026
Merged

feat: add finding verification via lightweight sub-agent checks#95
waynesun09 merged 4 commits into
mainfrom
finding-verification

Conversation

@waynesun09

@waynesun09 waynesun09 commented May 22, 2026

Copy link
Copy Markdown
Owner

Summary

Add a FindingVerifier that spawns parallel, budget-constrained AI sub-agents to verify
code review findings against actual source code, filtering false positives before inline
comment posting.

  • Each finding is independently checked by a lightweight ExecutionEngine-based sub-agent
    that reads actual source files via read_file tools to confirm or reject findings
  • Invalid findings (false positives) are automatically filtered; severity can be adjusted
  • Verifier re-resolves line numbers from code snippets for accurate inline comment placement
  • Feature is off by default, enabled via DELEGATION_VERIFY_FINDINGS=true

Changes

  • New src/cicaddy/delegation/verifier.py (547 lines) — FindingVerifier with parallel
    verification, robust JSON parsing, read_file tool filtering, boundary-sanitized prompts,
    snippet-based line re-resolution, path-separator guards on suffix matching
  • src/cicaddy/delegation/line_resolver.pyget_diff_line_ranges(), is_line_in_diff_ranges(),
    exact-match-first logic, new_count==0 guard, path-separator suffix guards
  • src/cicaddy/delegation/orchestrator.py — threads verify_findings flag and ToolContext
    through to FindingVerifier
  • src/cicaddy/delegation/summarizer.pyFinding dataclass extended with verified /
    verification_reason fields; snippet-first line resolution in resolve_findings()
  • src/cicaddy/delegation/schemas/summarizer_response.schema.json — added verified and
    verification_reason fields (required for additionalProperties: false)
  • src/cicaddy/config/settings.py — new DELEGATION_VERIFY_FINDINGS boolean setting
  • src/cicaddy/agent/base.py — reads setting and passes to orchestrator

Key design decisions

  • Single shared provider: one AI provider instance shared across all concurrent verifications
  • Sub-linear budget scaling: sqrt(num_findings) * 2 fraction to avoid starving individual verifications
  • Tool allowlist: restricted to read_file / read_file_lines only (frozenset)
  • Boundary sanitization: all finding fields and line ranges sanitized via _sanitize_for_boundary
  • Path-separator guards: suffix matching uses '/' + prefix to prevent bar.py matching foobar.py
  • Graceful degradation: verification failures preserve original findings as uncertain
  • Consistent defaults: uses DEFAULT_AI_PROVIDER for provider fallback (matches token budget lookup)

Test plan

  • 95 new tests in test_delegation_verifier.py — parsing, tool filtering, prompt construction,
    parallel execution, provider lifecycle, snippet line resolution, adjusted_line validation
  • 26 new tests in test_line_resolver.py — diff range extraction, hunk validation, line range checks
  • 6 new tests in test_delegation_orchestrator.py — verification flag threading, error handling
  • uv run pytest tests/unit/ -q — 1399 tests pass
  • pre-commit run --files <all changed files> — all checks pass
  • Manual: set DELEGATION_VERIFY_FINDINGS=true and run against a real PR

Summary by CodeRabbit

  • New Features
    • Added optional finding verification to validate extracted findings against the actual codebase using a lightweight AI check.
    • Added new configuration setting to enable/disable finding verification.
    • Findings now include verification status and reasoning when verified.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Adds an optional verification step for delegated findings: a new CoreSettings flag and env binding, orchestrator wiring to forward the flag and tool context, summarizer/line-resolver updates to include verification metadata and hunk-aware mapping, a new FindingVerifier module that runs concurrent read-only sub-agents to validate findings, and extensive tests.

Changes

Finding Verification Feature

Layer / File(s) Summary
Configuration and system wiring
src/cicaddy/config/settings.py, src/cicaddy/agent/base.py
Adds delegation_verify_findings to CoreSettings (env DELEGATION_VERIFY_FINDINGS) and wires the flag into DelegationOrchestrator.execute.
Orchestrator pipeline updates
src/cicaddy/delegation/orchestrator.py
DelegationOrchestrator.execute() and _aggregate_results() gain verify_findings and accept/forward parent_tools, mcp_manager, and local_registry; orchestrator calls new helper to run FindingVerifier when enabled and returns potentially verified findings.
Summarizer and deterministic mapping
src/cicaddy/delegation/summarizer.py
Finding adds verified and verification_reason; SummarizationAgent accepts settings; AI line-resolution prompts and mapping application are tightened to use diff-hunk-aware diff_ranges.
Deterministic line resolution utilities
src/cicaddy/delegation/line_resolver.py
Adds _resolve_one_finding, get_diff_line_ranges, and is_line_in_diff_ranges; resolve_findings now prefers snippet-based resolution when existing_code is present and uses slash-delimited suffix matching.
FindingVerifier engine
src/cicaddy/delegation/verifier.py
New module: ToolContext, VerificationResult, and FindingVerifier that runs concurrent, budget-constrained verification sub-agents, builds strict verification prompts with diff context, restricts allowed tools to read-only file access, parses/validates responses, and applies line/snippet corrections.
Tests
tests/unit/test_delegation_verifier.py, tests/unit/test_delegation_summarizer.py, tests/unit/test_delegation_orchestrator.py, tests/unit/test_line_resolver.py
Adds comprehensive tests for verifier parsing/validation, engine scaling and lifecycle, prompt building, orchestrator verification wiring, line-resolver overrides, and Finding verification-field defaults.

Sequence Diagram(s)

sequenceDiagram
  participant Agent as BaseAIAgent
  participant Orchestrator as DelegationOrchestrator
  participant Summarizer as SummarizationAgent
  participant Verifier as FindingVerifier
  participant SubAgent as Verification Sub-Agent
  participant FileTools as File-Read Tools

  Agent->>Orchestrator: execute(..., verify_findings=True)
  Orchestrator->>Summarizer: construct(settings, parent_tools, mcp_manager, local_registry)
  Summarizer->>Summarizer: extract findings
  Orchestrator->>Verifier: _verify_findings(findings, diff, tool_context)
  Verifier->>SubAgent: initialize provider & create verification engine
  loop per finding (concurrent, bounded)
    Verifier->>SubAgent: run_turn(prompt, allowed_tools)
    SubAgent->>FileTools: read_file / read_file_lines
    FileTools-->>SubAgent: file content
    SubAgent-->>Verifier: verification response (status, reasoning, adjusted_severity, confidence, adjusted_line)
    Verifier->>Verifier: parse & validate response -> update Finding (verified, verification_reason, line)
  end
  Verifier-->>Orchestrator: verified findings
  Orchestrator-->>Agent: aggregated results (maybe verified)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • waynesun09/cicaddy#44: Related delegation execution pipeline changes that this PR builds upon.
  • waynesun09/cicaddy#52: Overlapping summarization and delegation flow changes; shares Finding and orchestrator integration points.
  • waynesun09/cicaddy#54: Prior changes to summarizer line-extraction logic that this PR’s diff-range–aware mapping builds on.

"I nibble at diffs and hop through code lines,
Sub-agents read files while I munch on designs.
Constrained and concurrent, the verifications run,
Filtering findings till the right ones are done.
A rabbit-approved change—swift, precise, and fun!" 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding a finding verification feature using lightweight sub-agent checks. It is concise, specific, and directly related to the primary purpose of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch finding-verification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/cicaddy/delegation/verifier.py (1)

200-206: ⚡ Quick win

Use Google-style docstrings for new methods

Several new method docstrings in this src/ module are short-form; please switch them to Google-style (with sections like Args: / Returns:) for consistency with project standards.

As per coding guidelines, src/**/*.py: "Use type hints and Google-style docstrings in Python code".

Also applies to: 249-256, 292-293, 315-317, 324-329, 370-371

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cicaddy/delegation/verifier.py` around lines 200 - 206, The docstring for
the newly added method _create_verification_engine (and the other short-form
docstrings referenced at the noted ranges) must be converted to Google-style
docstrings: replace the one-line/short-form description with a multi-section
docstring that includes at least an "Args:" section documenting provider:
"BaseProvider", num_findings: int, session_id: str, and a "Returns:" section
documenting the ExecutionEngine return type; keep the existing brief description
as the opening line and update other new methods (lines ~249-256, 292-293,
315-317, 324-329, 370-371) to the same Google-style format to match project
standards and type hints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cicaddy/delegation/verifier.py`:
- Around line 182-185: When asyncio.gather(..., return_exceptions=True) returns
an Exception for a task, the code in verifier.py currently logs and appends the
original finding without populating verification metadata; update the Exception
branch (the block handling "if isinstance(result, Exception)") to set fallback
fields on the finding before appending: set findings[i]['verified'] = False and
set findings[i]['verification_reason'] = str(result) (or a short prefixed string
like f"verification_error: {result}") so verified_findings is always consistent;
keep the existing logger.warning and then append the modified findings[i] to
verified_findings.
- Around line 141-167: The code creates an asyncio.Semaphore with the
caller-provided max_concurrent which can be 0 or negative and cause deadlock;
before creating the semaphore in the verification routine (the function taking
max_concurrent: int = 3 and constructing asyncio.Semaphore(max_concurrent)),
validate or clamp max_concurrent to a positive integer (e.g. raise ValueError if
<= 0 or set max_concurrent = max(1, max_concurrent)) so the semaphore always has
capacity, then proceed to create asyncio.Semaphore(max_concurrent) and continue
initializing the provider as before.

---

Nitpick comments:
In `@src/cicaddy/delegation/verifier.py`:
- Around line 200-206: The docstring for the newly added method
_create_verification_engine (and the other short-form docstrings referenced at
the noted ranges) must be converted to Google-style docstrings: replace the
one-line/short-form description with a multi-section docstring that includes at
least an "Args:" section documenting provider: "BaseProvider", num_findings:
int, session_id: str, and a "Returns:" section documenting the ExecutionEngine
return type; keep the existing brief description as the opening line and update
other new methods (lines ~249-256, 292-293, 315-317, 324-329, 370-371) to the
same Google-style format to match project standards and type hints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e1c2bf0f-17f3-415d-a54a-5b57f271ac1c

📥 Commits

Reviewing files that changed from the base of the PR and between 8bb5aa3 and f6f33da.

📒 Files selected for processing (7)
  • src/cicaddy/agent/base.py
  • src/cicaddy/config/settings.py
  • src/cicaddy/delegation/orchestrator.py
  • src/cicaddy/delegation/summarizer.py
  • src/cicaddy/delegation/verifier.py
  • tests/unit/test_delegation_summarizer.py
  • tests/unit/test_delegation_verifier.py

Comment thread src/cicaddy/delegation/verifier.py
Comment thread src/cicaddy/delegation/verifier.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/cicaddy/delegation/verifier.py (2)

261-265: 💤 Low value

Add explicit type annotation to satisfy static analysis.

dataclasses.replace() returns a generic type that the type checker doesn't recognize as Finding. Adding an explicit annotation addresses the SonarCloud warning about the return type.

Proposed fix
-        result = dataclasses.replace(finding)
+        result: Finding = dataclasses.replace(finding)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cicaddy/delegation/verifier.py` around lines 261 - 265, The call to
dataclasses.replace(finding) returns a generic type causing static-analysis
warnings; update the assignment to include an explicit Finding type annotation
(e.g., result: Finding = dataclasses.replace(finding)) and add or ensure an
import for the Finding type is present in verifier.py so the type checker
recognizes the result.

344-351: 💤 Low value

Remove redundant json.JSONDecodeError from exception tuple.

json.JSONDecodeError is a subclass of ValueError (since Python 3.5), so it's already caught by ValueError. This matches the SonarCloud hint about redundant exception classes.

Proposed fix
-        except (json.JSONDecodeError, ValueError, TypeError):
+        except (ValueError, TypeError):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cicaddy/delegation/verifier.py` around lines 344 - 351, The except clause
in the block that calls extract_json and json.loads (inside verifier.py, around
the code that assigns raw = extract_json(content) and data = json.loads(raw))
redundantly lists json.JSONDecodeError which is a subclass of ValueError; update
the exception tuple to remove json.JSONDecodeError so it only catches ValueError
and TypeError (i.e., except (ValueError, TypeError):) and keep the existing
return of VerificationResult(status="uncertain",
reasoning=content.strip()[:500]).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/test_line_resolver.py`:
- Line 429: The test assigns the second tuple element to an unused name
`unresolved` when calling resolve_findings([finding], SAMPLE_DIFF); replace the
unused binding with `_` (e.g., change `resolved, unresolved =
resolve_findings(...)` to `resolved, _ = resolve_findings(...)`) in the
occurrences around the resolve_findings calls (including the instances at the
lines that correspond to the other two similar cases) so the unused variable
warning (RUF059) is eliminated.

---

Nitpick comments:
In `@src/cicaddy/delegation/verifier.py`:
- Around line 261-265: The call to dataclasses.replace(finding) returns a
generic type causing static-analysis warnings; update the assignment to include
an explicit Finding type annotation (e.g., result: Finding =
dataclasses.replace(finding)) and add or ensure an import for the Finding type
is present in verifier.py so the type checker recognizes the result.
- Around line 344-351: The except clause in the block that calls extract_json
and json.loads (inside verifier.py, around the code that assigns raw =
extract_json(content) and data = json.loads(raw)) redundantly lists
json.JSONDecodeError which is a subclass of ValueError; update the exception
tuple to remove json.JSONDecodeError so it only catches ValueError and TypeError
(i.e., except (ValueError, TypeError):) and keep the existing return of
VerificationResult(status="uncertain", reasoning=content.strip()[:500]).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d4a190f2-74b4-4fa1-a7ae-411fc0d77a45

📥 Commits

Reviewing files that changed from the base of the PR and between f6f33da and 379b9d6.

📒 Files selected for processing (7)
  • src/cicaddy/delegation/line_resolver.py
  • src/cicaddy/delegation/orchestrator.py
  • src/cicaddy/delegation/summarizer.py
  • src/cicaddy/delegation/verifier.py
  • tests/unit/test_delegation_orchestrator.py
  • tests/unit/test_delegation_verifier.py
  • tests/unit/test_line_resolver.py
💤 Files with no reviewable changes (1)
  • src/cicaddy/delegation/summarizer.py

Comment thread tests/unit/test_line_resolver.py Outdated
@waynesun09 waynesun09 force-pushed the finding-verification branch from 8698255 to f408c63 Compare May 22, 2026 17:40
Add a FindingVerifier that spawns parallel, budget-constrained AI
sub-agents to verify code review findings against actual source code.
Each finding is independently checked using read_file tools, filtering
false positives before inline comment posting.

Key design decisions:
- Single shared AI provider across concurrent verifications
- Sub-linear budget scaling (sqrt-based) per finding
- Tool allowlist restricted to read_file/read_file_lines only
- Boundary-sanitized prompts to prevent injection via finding fields
- Snippet-based line re-resolution from verifier output
- Path-separator guards on all suffix matching to prevent false matches
- Graceful degradation: verification failures preserve original findings

Feature is off by default, enabled via DELEGATION_VERIFY_FINDINGS=true.

Signed-off-by: Wayne Sun <gsun@redhat.com>
@waynesun09 waynesun09 force-pushed the finding-verification branch from 5220206 to a2d65a5 Compare May 22, 2026 18:01
- Guard max_concurrent with max(1, ...) to prevent Semaphore deadlock
- Remove redundant json.JSONDecodeError (subclass of ValueError)
- Add explicit type annotation on dataclasses.replace result
- Replace unused unresolved variables with _ in test_line_resolver

Signed-off-by: Wayne Sun <nicksun98@gmail.com>
Signed-off-by: Wayne Sun <gsun@redhat.com>
- Track snippet_resolved flag to prevent adjusted_line being silently
  discarded when snippet resolution fails (verifier.py)
- Add path-separator guard to _find_target_file suffix matching,
  consistent with _find_hunk_ranges_for_file and is_line_in_diff_ranges
  (line_resolver.py)
- Add path-separator guard to _filter_diff_for_files suffix matching
  (summarizer.py)
- Guard provider.shutdown() to only run after successful initialize(),
  preventing misleading shutdown warnings on init failure (verifier.py)
- Pre-compute diff snippets per file to avoid re-parsing full diff for
  every finding during parallel verification (verifier.py)
- Pass orchestrator max_concurrent through to verifier to respect
  configured API rate limits (orchestrator.py)

Signed-off-by: Wayne Sun <nicksun98@gmail.com>
Signed-off-by: Wayne Sun <gsun@redhat.com>
…fety

- Extract _resolve_one_finding() from resolve_findings in line_resolver.py (S3776)
- Extract _validate_line_mapping() from _apply_line_mappings in summarizer.py (S3776)
- Extract _apply_line_correction() and _extract_adjusted_line() in verifier.py (S3776)
- Use cast(Finding, dataclasses.replace()) to satisfy type checker (S5890)
- Convert async mock functions to sync side_effect pattern (S7503)
- Fix unused variable in test_line_resolver.py (S1481)
- Fix mock parameter signatures to match updated _verify_single

Signed-off-by: Wayne Sun <gsun@redhat.com>
@sonarqubecloud

Copy link
Copy Markdown

@waynesun09 waynesun09 merged commit 4b08b59 into main May 22, 2026
7 of 8 checks passed
@waynesun09 waynesun09 deleted the finding-verification branch May 22, 2026 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant