feat(ai): add Contradiction Agent by ConnorYoh · Pull Request #6304 · Stirling-Tools/Stirling-PDF

ConnorYoh · 2026-05-01T17:19:13Z

Summary

A new AI specialist agent that finds textual contradictions across a PDF — arguments, claimed facts, points of view, recommendations — and is invoked as a tool by the existing Review and Question agents using the same two-turn handshake the Math Auditor uses.

The Math Auditor catches numeric inconsistencies; nothing today catches textual ones (e.g. p.2 "the deadline is March 5" vs p.7 "submissions close on April 1"). This closes that gap.

How it works

Two-round flow mirroring the math agent: examine() triages which pages need text/OCR, deliberate() does the work.
Per-page parallel claim extraction under a semaphore (cap 10), then ONE fast-model LLM call canonicalises subjects across the document.
Bucketed detection: claims grouped by canonical subject, with one batched detector LLM call per bucket (cap 5). Buckets larger than 12 claims are chunked with a 2-claim overlap so no claim is silently dropped.
Pre-filter heuristics before the detector: drop identical-quote pairs and same-page same-polarity paraphrases.
Review surface: each Contradiction yields TWO sticky-note CommentSpecs cross-referencing each other across pages.
Question surface: synthesises a prose answer quoting both conflicting passages verbatim.

Architect findings addressed

#	Status
C1 `source_tool` == endpoint path	✅ locked down by `AiWorkflowServiceContradictionTest`
C2 discriminated union round-trip	✅ `test_artifact_union.py`
C3/C5 two semaphores, batched per-bucket	✅ extract=10, detect=5
C4 chunked detection, no silent drops	✅ `test_concurrency.py::test_worst_case_50_claim_bucket_finds_cross_chunk_pair`
C6 subject canonicalisation default-on	✅ with lexical fallback on LLM failure
C7 combined math+contradiction intent	⚠️ v1 limitation — math is dropped; pinned by `test_combined_intent.py`
C8 shared `_throttled`	✅ math agent migrated to `agents/_concurrency.py`
C11 separate `_PairedLocalisedContradiction`	✅ `_LocalisedComment` untouched

Hardening

N4 prompt-injection — every synth/localiser prompt now wraps verdict JSON and user message in <verdict> / <user_message> tags with an explicit untrusted-data preamble. Applied to math and contradiction paths.
N5 Java ClaimPolarity enum mirrors the Python Literal["assert","deny","recommend","reject","neutral"]. Unknown values fail early instead of drifting silently.
N6 pages_examined semantics now reports only pages whose claims were actually checked; blank folios are excluded.

Limitations (documented)

Combined math + contradiction intent on a single prompt drops math silently. Documented in module docstrings of pdf_review.py / pdf_questions.py and pinned by test_combined_intent.py. Revisit when there's real-corpus data on combined-prompt frequency.
Cross-bucket pairs farther apart than the chunk overlap (>10 indices) are not detected. Documented in test_concurrency.py.

Test plan

pytest engine/tests/ — 205/205 pass
./gradlew :proprietary:test — green, coverage targets met
Math-auditor regression suite passes unchanged
Discriminated-union round-trip covers math, contradiction, mixed, and source_tool-omitted payloads
Worst-case 50-claim bucket — cross-chunk pair detected via overlap
Concurrency assertions: extract saturates at exactly 10, detect at exactly 5
Java orchestrator never calls extractTablesAsCsv (verified)

A new specialist agent that detects textual contradictions across a PDF — arguments, claimed facts, points of view, recommendations — and is invoked as a tool by the existing Review and Question agents using the same two-turn handshake the Math Auditor uses. Why - Math Auditor catches numeric inconsistencies; nothing today catches textual ones (e.g. p.2 "the deadline is March 5" vs p.7 "submissions close on April 1"). This closes that gap. How it works - Two-round flow: examine() triages which pages need text/OCR, then deliberate() extracts atomic claims per page in parallel under a semaphore (cap 10), canonicalises subjects via one fast-model LLM call, buckets claims by subject, and runs one batched detector LLM call per bucket (cap 5) to enumerate contradicting pairs. Buckets larger than 12 claims are chunked with overlap so no claim is silently dropped. - Review surface: each Contradiction yields TWO sticky-note CommentSpecs cross-referencing each other across pages. - Question surface: synthesises a prose answer that quotes both conflicting passages verbatim. Architect findings addressed (see plan Section 6) - C1 source_tool == endpoint path (locked down by test). - C2 ToolReportArtifact lifted to discriminated union on source_tool. - C3/C5 two semaphores (extract=10, detect=5); per-bucket batched calls. - C4 chunked detection with overlap; no silent claim drops. - C6 subject canonicalisation default-on with lexical fallback. - C7 v1 limitation: combined math+contradiction intent drops math silently (precedence test pins this in test_combined_intent.py). - C8 _throttled extracted into agents/_concurrency.py and the math agent migrated off its private copy. - C11 separate _PairedLocalisedContradiction model; _LocalisedComment unchanged. Hardening - N4 prompt-injection: untrusted-data preamble + <user_message>/<verdict> delimiters on every synth/localiser prompt (math + contradiction). - N5 Java ClaimPolarity enum mirrors the Python Literal. - N6 pages_examined now reports only pages whose claims were actually checked; blank folios are excluded. Tests - Python 205/205 pass (claim ledger, agent flow, routes, artifact union, review/question resume, concurrency saturation, combined intent precedence). - Java proprietary suite green; coverage targets met.

aikido-pr-checks · 2026-05-01T17:20:12Z

+            "[contradiction-agent] session=%s step 2: extracting claims from %d pages (parallel, max=%d)",
+            evidence.session_id,
+            len(folios_with_text),
+            self._extract_semaphore._value,  # advisory — initial value


Reading asyncio.Semaphore._value (self._extract_semaphore._value) accesses a semaphore's private mutable internals concurrently; avoid reading private attributes or use a thread-safe API (e.g., track capacity separately or omit the volatile value).

Details

✨ AI Reasoning
A singleton ContradictionAgent is created at startup and used concurrently by incoming requests. The code reads self._extract_semaphore._value (a private attribute of asyncio.Semaphore) for logging while other coroutines can be acquiring/releasing the same semaphore, causing a race / inconsistent diagnostic and relying on a private, internal field.

🔧 How do I fix it?
Use locks, concurrent collections, or atomic operations when accessing shared mutable state. Avoid modifying collections during iteration. Use proper synchronization primitives like mutex, lock, or thread-safe data structures.

_{Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.}
_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
_{More info}

aikido-pr-checks · 2026-05-01T17:20:13Z

+    // -----------------------------------------------------------------------
+
+    @SafeVarargs
+    private static List<Integer> union(List<Integer>... lists) {


union() does result.contains(page) inside nested loops causing O(n^2) work; use a Set to dedupe (or collect to a Set then sort) to make it linear-time.

Details

✨ AI Reasoning
The union method builds a deduplicated list by iterating each input list and calling result.contains(page) for each element. This causes quadratic work as result grows. For page lists (document pages) the size can be large enough that O(n^2) allocations and repeated scans are avoidable; using a hashed Set for membership or leveraging a single pass collection-to-Set would make it linear-time and reduce allocations. The defect is localized to the deduplication loop and its membership check, which is executed during requisition fulfilment and could be invoked on many pages per audit.

🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.

_{Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.}
_{Reply @AikidoSec ignore: [REASON] to ignore this issue.}
_{More info}

stirlingbot · 2026-05-01T17:29:01Z

🚀 V2 Auto-Deployment Complete!

Your V2 PR with embedded architecture has been deployed!

🔗 Direct Test URL (non-SSL) http://54.175.155.236:6304

🔐 Secure HTTPS URL: https://6304.ssl.stirlingpdf.cloud

This deployment will be automatically cleaned up when the PR is closed.

🔄 Auto-deployed for approved V2 contributors.

ConnorYoh requested review from Frooodle, Ludy87 and jbrunton96 as code owners May 1, 2026 17:19

dosubot Bot added size:XXL This PR changes 1000+ lines ignoring generated files. enhancement New feature or request labels May 1, 2026

stirlingbot Bot added Java Pull requests that update Java code Test Testing-related issues or pull requests engine labels May 1, 2026

ConnorYoh marked this pull request as draft May 1, 2026 17:19

aikido-pr-checks Bot reviewed May 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ai): add Contradiction Agent#6304

feat(ai): add Contradiction Agent#6304
ConnorYoh wants to merge 1 commit intomainfrom
feat/contradiction-agent

ConnorYoh commented May 1, 2026

Uh oh!

aikido-pr-checks Bot May 1, 2026

Uh oh!

aikido-pr-checks Bot May 1, 2026

Uh oh!

stirlingbot Bot commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ConnorYoh commented May 1, 2026

Summary

How it works

Architect findings addressed

Hardening

Limitations (documented)

Test plan

Uh oh!

aikido-pr-checks Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

aikido-pr-checks Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

stirlingbot Bot commented May 1, 2026

🚀 V2 Auto-Deployment Complete!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant