A new specialist agent that detects textual contradictions across a PDF (arguments, claimed facts, points of view, recommendations) and is invoked as a tool by the existing Review and Question agents using the same two-turn handshake the Math Auditor uses.

Why
- Math Auditor catches numeric inconsistencies; nothing today catches textual ones (e.g. p.2 "the deadline is March 5" vs p.7 "submissions close on April 1"). This closes that gap.

How it works
- Two-round flow: examine() triages which pages need text/OCR, then deliberate() extracts atomic claims per page in parallel under a semaphore (cap 10), canonicalises subjects via one fast-model LLM call, buckets claims by subject, and runs one batched detector LLM call per bucket (cap 5) to enumerate contradicting pairs. Buckets larger than 12 claims are chunked with overlap so no claim is silently dropped.
- Review surface: each Contradiction yields TWO sticky-note CommentSpecs cross-referencing each other across pages.
- Question surface: synthesises a prose answer that quotes both conflicting passages verbatim.

Architect findings addressed (see plan Section 6)
- C1 source_tool == endpoint path (locked down by test).
- C2 ToolReportArtifact lifted to discriminated union on source_tool.
- C3/C5 two semaphores (extract=10, detect=5); per-bucket batched calls.
- C4 chunked detection with overlap; no silent claim drops.
- C6 subject canonicalisation default-on with lexical fallback.
- C7 v1 limitation: combined math+contradiction intent drops math silently (precedence test pins this in test_combined_intent.py).
- C8 _throttled extracted into agents/_concurrency.py and the math agent migrated off its private copy.
- C11 separate _PairedLocalisedContradiction model; _LocalisedComment unchanged.

Hardening
- N4 prompt-injection: untrusted-data preamble + <user_message>/<verdict> delimiters on every synth/localiser prompt (math + contradiction).
- N5 Java ClaimPolarity enum mirrors the Python Literal.
- N6 pages_examined now reports only pages whose claims were actually checked; blank folios are excluded.

Tests
- Python 205/205 pass (claim ledger, agent flow, routes, artifact union, review/question resume, concurrency saturation, combined intent precedence).
- Java proprietary suite green; coverage targets met.
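The "chunked with overlap" step can be illustrated with a minimal sliding-window sketch. The cap of 12 comes from the description above; the function name `chunk_with_overlap`, the overlap of 4, and the sliding-window scheme itself are assumptions for illustration, not the PR's actual implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(
    items: Sequence[T], max_size: int = 12, overlap: int = 4
) -> List[List[T]]:
    """Split items into windows of at most max_size, each sharing
    `overlap` items with the previous window (hypothetical helper)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must satisfy 0 <= overlap < max_size")
    if len(items) <= max_size:
        return [list(items)]  # small buckets go through in one call

    step = max_size - overlap
    chunks: List[List[T]] = []
    start = 0
    while True:
        chunks.append(list(items[start:start + max_size]))
        if start + max_size >= len(items):
            break  # this window reached the end of the bucket
        start += step
    return chunks
```

A plain sliding window guarantees that every claim lands in at least one chunk, which matches the "no claim is silently dropped" wording. It does not guarantee that every pair of claims shares a chunk, so a scheme that must catch arbitrary cross-chunk pairs would need more than this.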
"[contradiction-agent] session=%s step 2: extracting claims from %d pages (parallel, max=%d)",
evidence.session_id,
len(folios_with_text),
self._extract_semaphore._value,  # advisory — initial value
Reading self._extract_semaphore._value relies on a private attribute of asyncio.Semaphore, and that counter changes as other coroutines acquire and release the semaphore, so the logged number is whatever remains at that instant rather than the configured maximum. Avoid reading private attributes: track the capacity separately and log that, or omit the volatile value.
✨ AI Reasoning
A singleton ContradictionAgent is created at startup and used concurrently by incoming requests. The code reads self._extract_semaphore._value (a private attribute of asyncio.Semaphore) for logging while other coroutines can be acquiring/releasing the same semaphore, causing a race / inconsistent diagnostic and relying on a private, internal field.
🔧 How do I fix it?
Use locks, concurrent collections, or atomic operations when accessing shared mutable state. Avoid modifying collections during iteration. Use proper synchronization primitives like mutex, lock, or thread-safe data structures.
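One way to follow the "track capacity separately" suggestion is sketched below. The class name `BoundedRunner` and its methods are hypothetical and are not the PR's `_throttled` helper; the sketch only assumes that the agent wants to log the configured limit while bounding concurrent extraction calls.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger(__name__)


class BoundedRunner:
    """Hypothetical helper: a semaphore plus the capacity it was built with."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # Public and never mutated after construction, so it is safe to log.
        self.capacity = capacity
        self._semaphore = asyncio.Semaphore(capacity)

    async def run(self, fn: Callable[[T], Awaitable[R]], item: T) -> R:
        # At most `capacity` calls to fn are in flight at once.
        async with self._semaphore:
            return await fn(item)

    async def map(
        self, fn: Callable[[T], Awaitable[R]], items: Sequence[T]
    ) -> List[R]:
        logger.info(
            "extracting claims from %d pages (parallel, max=%d)",
            len(items),
            self.capacity,  # configured limit, not the semaphore's internals
        )
        # gather() returns results in the same order as the inputs.
        return list(await asyncio.gather(*(self.run(fn, i) for i in items)))
```

With this shape the log line always reports the configured maximum, regardless of how many permits happen to be held when the message is emitted.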
// -----------------------------------------------------------------------

@SafeVarargs
private static List<Integer> union(List<Integer>... lists) {
union() calls result.contains(page) inside nested loops, which is O(n^2) work; dedupe with a Set instead (linear time), or collect to a Set and then sort (O(n log n)) if sorted output is required.
✨ AI Reasoning
The union method builds a deduplicated list by iterating each input list and calling result.contains(page) for each element. This causes quadratic work as result grows. For page lists (document pages) the size can be large enough that the O(n^2) repeated scans are worth avoiding; using a hashed Set for membership, or collecting to a Set in a single pass, would make deduplication linear-time. The defect is localized to the deduplication loop and its membership check, which is executed during requisition fulfilment and could be invoked on many pages per audit.
🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.
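The method under review is Java; the sketch below shows the same set-based deduplication in Python, to stay in one language with the engine code. Both function names are illustrative, and whether the original union() is meant to preserve first-seen order or return sorted pages is not visible in the hunk, so both variants are shown. In Java, a LinkedHashSet gives the order-preserving behaviour and a TreeSet gives the sorted one.

```python
from typing import Iterable, List


def union(*lists: Iterable[int]) -> List[int]:
    """Order-preserving union: each page appears once, in first-seen order."""
    seen = set()
    result: List[int] = []
    for pages in lists:
        for page in pages:
            if page not in seen:  # O(1) average lookup, unlike list.contains
                seen.add(page)
                result.append(page)
    return result


def sorted_union(*lists: Iterable[int]) -> List[int]:
    """Sorted union: dedupe via a set, then sort once at the end."""
    return sorted(set().union(*lists))
```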
🚀 V2 Auto-Deployment Complete!
Your V2 PR with embedded architecture has been deployed!
🔗 Direct Test URL (non-SSL): http://54.175.155.236:6304
🔐 Secure HTTPS URL: https://6304.ssl.stirlingpdf.cloud
This deployment will be automatically cleaned up when the PR is closed.
🔄 Auto-deployed for approved V2 contributors.
Summary
A new AI specialist agent that finds textual contradictions across a PDF — arguments, claimed facts, points of view, recommendations — and is invoked as a tool by the existing Review and Question agents using the same two-turn handshake the Math Auditor uses.
The Math Auditor catches numeric inconsistencies; nothing today catches textual ones (e.g. p.2 "the deadline is March 5" vs p.7 "submissions close on April 1"). This closes that gap.
How it works
- Two-round flow: examine() triages which pages need text/OCR, deliberate() does the work.
- Review surface: each Contradiction yields TWO sticky-note CommentSpecs cross-referencing each other across pages.

Architect findings addressed
- source_tool == endpoint path: AiWorkflowServiceContradictionTest
- test_artifact_union.py
- test_concurrency.py::test_worst_case_50_claim_bucket_finds_cross_chunk_pair
- test_combined_intent.py
- _throttled: agents/_concurrency.py
- _PairedLocalisedContradiction; _LocalisedComment untouched

Hardening
- Prompt-injection: <verdict>/<user_message> tags with an explicit untrusted-data preamble. Applied to math and contradiction paths.
- Java ClaimPolarity enum mirrors the Python Literal["assert","deny","recommend","reject","neutral"]. Unknown values fail early instead of drifting silently.
- pages_examined semantics: now reports only pages whose claims were actually checked; blank folios are excluded.

Limitations (documented)
- [...] pdf_review.py/pdf_questions.py and pinned by test_combined_intent.py. Revisit when there's real-corpus data on combined-prompt frequency.
- [...] test_concurrency.py.

Test plan
- pytest engine/tests/: 205/205 pass
- ./gradlew :proprietary:test: green, coverage targets met
- [...] source_tool-omitted payloads
- [...] extractTablesAsCsv (verified)
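The "unknown values fail early" behaviour noted under Hardening might look like the following on the Python side. The Literal values are taken from the summary above; the helper name `parse_polarity` and the exact-match rule are assumptions, and the real code may validate differently (for example through a model field).

```python
from typing import Literal, get_args

ClaimPolarity = Literal["assert", "deny", "recommend", "reject", "neutral"]

# Derive the allowed set from the Literal so the two cannot drift apart.
_VALID_POLARITIES = frozenset(get_args(ClaimPolarity))


def parse_polarity(raw: str) -> ClaimPolarity:
    """Return raw unchanged if it is a known polarity, else raise
    (hypothetical helper; exact, case-sensitive match is an assumption)."""
    if raw not in _VALID_POLARITIES:
        raise ValueError(
            f"unknown claim polarity {raw!r}; "
            f"expected one of {sorted(_VALID_POLARITIES)}"
        )
    return raw  # type: ignore[return-value]
```

Rejecting at the boundary means a new polarity added on one side of the Python/Java mirror surfaces as an immediate error rather than as a quietly mislabelled claim.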