Skip to content

Pdf comment agent#6196

Merged
Frooodle merged 38 commits intomainfrom
pdf-comment-agent
May 1, 2026
Merged

Pdf comment agent#6196
Frooodle merged 38 commits intomainfrom
pdf-comment-agent

Conversation

@ConnorYoh
Copy link
Copy Markdown
Member

@ConnorYoh ConnorYoh commented Apr 23, 2026

PDF Comment Agent

AI tool that annotates a PDF with sticky-note comments based on a user prompt. Prototype-gated.

New endpoints

  • POST /api/v1/ai/tools/pdf-comment-agent — composed AI tool. Takes PDF + prompt. Extracts text chunks, calls the engine for review comments, resolves positions, applies via add-comments. Returns annotated PDF + X-Stirling-Tool-Report header with {annotationsApplied, rationale}.
  • POST /api/v1/ai/tools/math-auditor-agent — relocated from /api/v1/ai/. Pure specialist, returns Verdict JSON.
  • POST /api/v1/misc/add-comments — deterministic Java primitive. Takes PDF + JSON comment specs. Reusable from Automate, scripts, any caller.

Namespace convention: /api/v1/ai/tools/* for dispatchable AI tools (on the orchestrator allowlist), /api/v1/ai/* for non-dispatchable AI endpoints (orchestrate, health — recursion-safe), /api/v1/misc/* for deterministic primitives.

Orchestrator changes

  • New delegate_pdf_review — plans review annotations, either via math-auditor + add-comments (math intent) or via pdf-comment-agent (prose intent).
  • delegate_pdf_question + delegate_pdf_review detect math intent and consult math-auditor via a multi-turn plan: resume_with=<capability> + a new ToolReportArtifact carries the Verdict back to the delegate, which projects it to prose (question) or sticky-note specs (review).
  • Dropped the old math_auditor_agent delegate and the AgentTool name→path registry — agent tools now live under /api/v1/ai/tools/* and dispatch through the standard InternalApiClient allowlist.

Test in the UI

Run the prototypes build:

cd frontend && npm run dev -- --mode prototypes
  • Tool picker → Advanced → Document Review → Add AI comments. Upload a PDF, type a prompt, submit. Review pane shows the annotated PDF with clickable sticky-note icons.

  • Chat panel — orchestrator routes:

    • "Review this PDF and flag unclear sentences" → annotated PDF returned.
    • "Is the math in this document correct?" (with testing/ledger/arithmetic_error.pdf) → prose answer in chat.
    • "Review the math and flag errors as comments" (with testing/ledger/mixed_errors.pdf) → annotated PDF with one sticky note per discrepancy.
  • Verify gating: build in proprietary (default) mode — the tool must NOT appear in the picker and the chat orchestrator shouldn't target /api/v1/ai/tools/pdf-comment-agent

Split the bespoke `/api/v1/ai/pdf-comment-agent` into a reusable primitive
plus a composed AI tool, eliminating the name-to-path dispatch plumbing.

New shared primitives (common):
- `PdfAnnotationService` + `AnnotationLocation` / `StickyNoteSpec` records
  hold the PDFBox sticky-note logic. Reusable from any caller: Automate
  workflows, AI agents, scripts, tests.

New deterministic tool (core):
- `POST /api/v1/misc/add-comments` takes PDF + JSON comment specs and
  returns an annotated PDF. Same contract every other `/api/v1/misc/*`
  tool follows — bit-identical output for identical input.

Endpoint relocation (proprietary):
- Moved PDF Comment Agent from `/api/v1/ai/pdf-comment-agent` to
  `/api/v1/misc/pdf-comment-agent` so it passes the primary dispatch
  allowlist regex without any special plumbing. Controller slimmed to
  delegate annotation placement to the shared `PdfAnnotationService`.

Removed plumbing:
- Deleted `AgentTool` registry + tests.
- Reverted `InternalApiClient`'s secondary allowlist branch.
- Reverted `AiWorkflowService.resolveToolPath` translation.
- Dropped `AgentToolId.PDF_COMMENT_AGENT` from the engine's agent-tool
  registry (math-auditor still uses it pending the Phase B refactor).

Python orchestrator:
- `delegate_pdf_review` replaces `delegate_pdf_comment`. Emits a plan
  step targeting `ToolEndpoint.PDF_COMMENT_AGENT` (the standard tool
  path) instead of a bare agent id. System prompt now distinguishes
  "review" (annotated PDF) from "question" (chat answer).
- `tool_models.py` gains `CommentSpec`, `AddCommentsParams`, and
  `PdfCommentAgentParams` with their `ToolEndpoint` entries.

Frontend:
- Tool op config posts to the new `/api/v1/misc/pdf-comment-agent` URL.

Verification:
- Java: `:common:test`, `:proprietary:test`, `:stirling-pdf:test` for
  affected modules — all green.
- Python: ruff + pyright clean, pytest 10/10.
- Live: direct add-comments, direct pdf-comment-agent, and orchestrator
  dispatch ("Add review comments to this PDF...") all return annotated
  PDFs with meaningful sticky notes.
The math auditor is one specialist that produces a Verdict. How that
Verdict is presented to the user (chat answer vs annotated PDF) is the
meta-agent's job, not the specialist's. Before this commit we had two
top-level math delegates and an ``output_mode`` on the math controller —
presentation concerns leaking all the way down.

- **Specialists** produce structured findings. Stateless. No presentation.
  - ``MathAuditorAgent`` → ``Verdict`` JSON. Always.
  - ``PdfCommentAgent`` → ``PdfCommentResponse``.
- **Meta-agents** own presentation and may delegate to specialists.
  - ``delegate_pdf_question`` — math intent → consult math-auditor →
    render Verdict as prose (``outcome=answer``).
  - ``delegate_pdf_review`` — math intent → consult math-auditor →
    project Verdict to ``CommentSpec``s → ``/api/v1/misc/add-comments``.
  - Non-math review still uses the composed ``pdf-comment-agent`` tool.
- **Orchestrator top-level** stays simple: question / review / edit. No
  math-specific branch.

A delegate can emit an ``EditPlanResponse`` with ``resume_with`` set.
``AiWorkflowService`` runs the plan, captures any structured report (JSON
body or ``X-Stirling-Tool-Report`` header), attaches it as a new
``ToolReportArtifact`` on the next ``WorkflowTurnRequest``, and re-enters
the orchestrator. The delegate's ``_run`` method finds the report in its
artifacts and produces its final output.

- New ``ArtifactKind.TOOL_REPORT`` + ``ToolReportArtifact`` contract.
- ``EditPlanResponse.resume_with`` field (optional
  ``SupportedCapability``).
- New ``SupportedCapability.PDF_REVIEW`` for the review resume path.
- New ``stirling.agents.math_presentation`` module with
  ``is_math_intent``, ``extract_math_verdict``, ``verdict_to_prose``,
  ``verdict_to_comment_specs``, ``verdict_to_add_comments_payload``.
- Orchestrator: dropped ``delegate_math_report`` /
  ``delegate_math_review`` / the old ``math_auditor_agent`` delegate.
  ``_run_pdf_question`` / ``_run_pdf_review`` gained math-intent
  routing with multi-turn resume. ``_resume`` handles ``PDF_REVIEW``.
- ``tool_models.py``: slimmed ``MathAuditorAgentParams`` back to just
  ``tolerance`` — no output mode.
- Deleted ``agent_tool_models.py`` — no longer needed.

- ``PdfContentExtractor``: ``ArtifactKind.TOOL_REPORT`` enum value +
  ``ToolReportArtifact`` record alongside ``ExtractedTextArtifact``.
- ``AiWorkflowService.onPlan``: after executing steps, if the
  orchestrator response had ``resumeWith`` set AND we captured a tool
  report, build a follow-up ``WorkflowTurnRequest`` with the report
  attached as a ``ToolReportArtifact`` and return ``Pending`` — the
  existing outer ``while`` loop re-enters the engine.
- ``MathAuditorAgentController``: reverted to a pure specialist.
  Always returns ``Verdict`` JSON. No ``outputMode``, no comment
  projection, no report header.
- Deleted ``MathAuditorOutputMode`` + ``MathAuditorCommentProjector``
  + its test — presentation is the caller's concern now.
- ``AiWorkflowResponse.report`` (``JsonNode``, optional) from the prior
  round kept — it remains useful as a file+metadata channel for tools
  like ``pdf-comment-agent`` that surface ``X-Stirling-Tool-Report``
  alongside a file body.

- Java ``:common:test :proprietary:test :stirling-pdf:test`` green.
- Python ``ruff`` + ``pyright`` + ``pytest`` (122 tests) green.
- Live end-to-end:
  - "Is the math correct?" → ``outcome=answer`` with a prose Verdict
    summary.
  - "Review the math and leave comments" → ``outcome=completed`` with
    annotated PDF, 3 sticky notes per the Verdict's discrepancies.
  - "Review this PDF and comment on unclear wording" → unchanged,
    uses ``pdf-comment-agent``; ``report`` carries annotation counts
    and the agent's rationale.
Previously the tool was registered in the core tool registry and visible
from every build (proprietary/saas/desktop/core). That's wrong for an
experimental AI tool — it should only ship in the prototypes bundle until
it graduates.

## Mechanism

Follow the existing proprietary-overlay stub/shadow pattern (see
`useProprietaryToolRegistry`), one tier deeper:

- `core/types/prototypeToolId.ts` — empty stub (PROTOTYPE_*_TOOL_IDS = []).
- `prototypes/types/prototypeToolId.ts` — real impl listing "pdfCommentAgent".
- `core/data/usePrototypeToolRegistry.tsx` — empty hook.
- `prototypes/data/usePrototypeToolRegistry.tsx` — real hook returning the
  registry entry (icon, component, i18n keys, operation config, etc.).

`@app/*` alias resolution:
- Proprietary/saas/desktop/core builds → prototype stubs are empty. ToolId
  union excludes "pdfCommentAgent"; tool picker doesn't show the entry;
  the tool component isn't imported at all (tree-shakes out of the bundle).
- Prototypes build (`@app/* → prototypes → proprietary → core`) → real
  stubs win. ToolId includes "pdfCommentAgent"; tool picker shows it;
  bundle includes the component.

## Changes

- Moved (git mv, history preserved):
  - `core/tools/PdfCommentAgent.tsx` → `prototypes/tools/PdfCommentAgent.tsx`
  - `core/hooks/tools/pdfCommentAgent/*` → `prototypes/hooks/tools/pdfCommentAgent/*`
- New files:
  - `core/types/prototypeToolId.ts` (stub)
  - `core/data/usePrototypeToolRegistry.tsx` (stub)
  - `prototypes/types/prototypeToolId.ts` (real)
  - `prototypes/data/usePrototypeToolRegistry.tsx` (real)
- Modified:
  - `core/types/toolId.ts` — import PROTOTYPE_* unions, add to
    REGULAR/SUPER/LINK_TOOL_IDS, remove "pdfCommentAgent" from
    CORE_REGULAR_TOOL_IDS.
  - `core/data/toolsTaxonomy.ts` — new `PrototypeToolRegistry` type.
  - `core/data/useTranslatedToolRegistry.tsx` — drop direct imports and
    registry entry for pdfCommentAgent; spread `usePrototypeToolRegistry()`
    alongside the proprietary overlay.
  - `proprietary/utils/creditCosts.ts` — loosen `TOOL_CREDIT_COSTS` type to
    `Partial<Record<ToolId, number>>` so overlays can contribute build-only
    ids without every overlay needing to know the others'. Fallback to
    MEDIUM already handled in `getToolCreditCost`.

## Verification

- `npx tsc -p tsconfig.json --noEmit` (proprietary/default build):
  zero errors mentioning pdfCommentAgent or our new files. The 13
  remaining errors are all pre-existing test-file Axios/Tauri typing
  issues unrelated to this branch.
- `npx tsc -p src/prototypes/tsconfig.json --noEmit` (prototypes build):
  one pre-existing error in `ChatContext.tsx:284`; `"pdfCommentAgent"`
  is in the `ToolId` union as expected.
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines ignoring generated files. enhancement New feature or request labels Apr 23, 2026
@stirlingbot stirlingbot Bot added Java Pull requests that update Java code Front End Issues or pull requests related to front-end development Back End Issues related to back-end development Translation API API-related issues or pull requests Test Testing-related issues or pull requests and removed enhancement New feature or request labels Apr 23, 2026
Comment thread frontend/src/core/hooks/useIndexedDBThumbnail.ts Outdated
ConnorYoh and others added 10 commits April 23, 2026 16:56
Previous round put agent tools under /api/v1/misc/ as a plumbing workaround
— InternalApiClient's dispatch allowlist only permitted a fixed set of tool
namespaces, and /api/v1/ai/ wasn't one of them. Putting AI tools under misc
is semantically wrong.

Carve out /api/v1/ai/tools/ as the explicit sub-namespace for dispatchable
AI tools and extend the allowlist to include it. /api/v1/ai/orchestrate
and /api/v1/ai/health remain off-limits to internal dispatch so a plan
step can't re-enter the orchestrator (recursion risk).

Net namespaces:
- /api/v1/ai/*       AI-facing endpoints (orchestrate, health, ...)
- /api/v1/ai/tools/* Dispatchable AI tools
- /api/v1/misc/*     Deterministic primitives

Changes:
- MathAuditorAgentController @RequestMapping → "/api/v1/ai/tools"
- PdfCommentAgentController @RequestMapping → "/api/v1/ai/tools"
- InternalApiClient ALLOWED_ENDPOINT_PATH regex adds
  "|^/api/v1/ai/tools(/[A-Za-z0-9_-]+)+$" alternation.
- Python ToolEndpoint: MATH_AUDITOR_AGENT + PDF_COMMENT_AGENT values
  updated accordingly.
- Frontend PDF_COMMENT_AGENT_ENDPOINT constant updated.
- Java + Python tests updated; new InternalApiClientTest case asserting
  /api/v1/ai/tools/* is accepted while /api/v1/ai/orchestrate is still
  rejected.

Also in this commit — unrelated collateral fix:
- AddCommentsController + its test updated to match WebResponseUtils.
  pdfFileToWebResponse's new return type (ResponseEntity<Resource>),
  which changed in upstream PR #6160 "Migrate stream to resource for
  stability" (merged via the recent main merge).

Live verification:
- POST /api/v1/ai/tools/pdf-comment-agent — 200 + annotated PDF + report header.
- POST /api/v1/ai/tools/math-auditor-agent — 200 + Verdict JSON.
- Orchestrator "review this PDF" → outcome=completed with annotated PDF.
- Orchestrator "review the math" → outcome=completed with sticky-note PDF.
- Orchestrator "is the math correct?" → outcome=answer with prose summary.
Math-auditor review previously stacked all sticky notes in the right
margin because the Python projector emitted fixed coordinates. Now each
CommentSpec can supply an anchorText hint; the add-comments tool locates
the first matching line on the target page and anchors the icon there,
falling back to the caller's coordinates when no match is found.

The projection helper passes the discrepancy's stated value (or context
when stated is empty) as the anchor, so sticky notes land on the line
that was actually flagged.
- Remove engine/src/trace* log dumps that got committed and gitignore
  trace/trace.* so they stop re-appearing.
- Restore frontend/src/core/hooks/useIndexedDBThumbnail.ts to main —
  the thumbnail-regeneration fix from #6134 was partially reverted here
  by a bad merge; it's unrelated to this PR.
- MathAuditorAgentController javadoc said the endpoint lives under
  /api/v1/misc/; it was moved to /api/v1/ai/tools/ in c87f24f.
- AiToolInputValidator: new shared validator enforcing the PDF content
  type and a 50 MB upper bound on uploads. Spring's multipart limit is
  tuned for the regular tools (2 GB), which is far too permissive once
  a payload feeds into an LLM token budget. Both PdfCommentAgent and
  MathAuditorAgent now route uploads through it, replacing the
  duplicated inline content-type checks.
- PdfAnnotationService: drop specs with non-positive width/height
  (previously produced invisible/zero-size annotations) and with text
  over 100 000 chars (pathological / accidental document-dump). Matches
  the existing "skip and log, never throw for one bad spec" convention
  so every caller benefits, not just AI tools.
- AiEngineClient: surface network IOException from httpClient.send as
  ResponseStatusException — HttpTimeoutException becomes 504 GATEWAY_TIMEOUT,
  everything else becomes 503 SERVICE_UNAVAILABLE with the original cause
  chained. Previously these leaked out as raw IOException and every
  caller (PdfCommentAgent, MathAuditorAgent, AiWorkflowService,
  AiEngineController) bubbled them up as an unstructured 500. Adds a
  package-private constructor that accepts an HttpClient for testing.
- pdf_comment agent: JSON-encode request.user_message when building the
  prompt rather than inlining it between literal triple quotes. The old
  format was trivially breakable by a user message containing `"""`
  followed by fake chunk records, which would spoof structure. Chunks
  were already JSON-encoded; this aligns the user-prompt path.
- New AiToolResponseHeaders class holds the X-Stirling-Tool-Report
  constant; both PdfCommentAgentController (producer) and
  AiWorkflowService (consumer) now reference it, dropping the
  self-acknowledged "must stay in sync" duplication.
- pdfCommentAgentOperationConfig.ts now sanitises the filename pulled
  from the Content-Disposition header, rejecting blank/whitespace
  values and anything containing path separators before returning it.
  Hostile or buggy backend can no longer steer the browser save into
  a parent directory; falls back to <inputName>-commented.pdf.
- math_presentation: add tests for extract_math_verdict — the resume-turn
  entry point — covering happy-path round-trip from ToolReportArtifact
  back to Verdict, empty artifacts (first turn), non-math reports being
  ignored, and graceful degradation on malformed reports. Also cover
  verdict_to_prose for the pdf_question math path (clean verdict,
  errors + warnings rendering).
- pdf-comment-agent routes: add test that an agent-layer exception
  surfaces as HTTP 500, so Java's AiEngineClient can map it to 502
  instead of mis-applying a spurious success.
- AddCommentsController: cover the empty-array case — callers with zero
  comments to place get a 200 + unmodified PDF instead of having to
  special-case the no-op.
@stirlingbot stirlingbot Bot added the engine label Apr 27, 2026
…nguage

Addresses PR comment #7. The math auditor is the one finding issues;
turning the structured Verdict into prose or sticky-note text is the
consumer's job, and the consumer should answer in the same language the
user asked in. No English glue should leak from the engine.

- math_presentation.py: dropped verdict_to_prose,
  verdict_to_add_comments_payload, verdict_to_comment_specs,
  _comment_body, _severity_label, _comment_subject,
  _discrepancy_one_liner, _anchor_text_for. Module is now just the
  language-agnostic helpers (is_math_intent, extract_math_verdict).
- PdfQuestionAgent: small math-synth LLM (fast model) takes the
  Verdict + the user's original question and answers in the same
  language. Drops verdict_to_prose entirely.
- PdfReviewAgent: localiser LLM produces sticky-note subject + text
  per Discrepancy, in the user's language. Python combines that with
  deterministic placement geometry (page, x/y stacking, anchor_text)
  to build the CommentSpec list. No English in the response shape.
- EditPlan summaries on first turn / pre-tool turn are emptied so no
  English UI hint leaks; frontend handles its own loading state.
- Tests: math_presentation tests slimmed to the helpers that remain;
  the placement-geometry and camelCase-serialisation tests moved to
  the new test_pdf_review.py with the localiser LLM mocked.
The is_math_intent regex was English-only and over-eager — "review the
invoices for ambiguous wording" matched on "invoices" and routed prose
review through the math auditor. Both problems disappear when the
orchestrator's existing top-level LLM (already classifying the request
in any language) emits the math-intent decision alongside the routing.

- Orchestrator: delegate_pdf_question and delegate_pdf_review now take
  consult_math_auditor: bool, threaded through to _run_pdf_question and
  _run_pdf_review (with a False default so the resume-turn fast-path
  still works). System prompt updated with a language-neutral rule for
  the flag.
- PdfQuestionAgent.orchestrate / PdfReviewAgent.orchestrate take the
  flag and use it on the first turn. Resume turns are still detected
  by Verdict-artifact presence and are unaffected.
- math_presentation.py: dropped is_math_intent and _MATH_KEYWORDS.
  Module is now just extract_math_verdict.
- Both consumer system prompts now require verbatim quoting of any
  stated/expected numeric values from the Verdict — guards against the
  LLM paraphrasing $215,000 as "about $215k".
- Tests: dropped the keyword test; added flag-driven first-turn tests
  for both delegates and prompt-pinning tests for the verbatim rule.
…riant

Closes the loop on PR comment #9. Previously the question delegate's
return type leaked the union PdfQuestionResponse | EditPlanResponse
because the math-intent first turn needed to ask the caller to run a
side-quest. J's suggestion was to embed the plan as a nullable member
of the question response — a single, clean type with optional pre-work
attached.

- contracts/pdf_questions.py: PdfQuestionAnswerResponse gains
  edit_plan: EditPlanResponse | None. Outcome stays ANSWER on this
  turn; answer is empty string until the resume turn produces it.
- agents/pdf_questions.py: orchestrate() returns
  PdfQuestionAnswerResponse(answer="", edit_plan=...) on the math
  first turn instead of bare EditPlanResponse. Return type collapses
  to PdfQuestionResponse.
- agents/orchestrator.py: delegate_pdf_question and _run_pdf_question
  return type drop the | EditPlanResponse — single signature now.
- Java side:
  - New AiWorkflowEditPlan POJO mirrors the engine's EditPlanResponse
    shape (summary, rationale, steps, resumeWith).
  - AiWorkflowResponse gains a nested editPlan field.
  - AiWorkflowService.advance() splits ANSWER into its own branch and
    dispatches to onAnswer, which checks editPlan and runs the embedded
    plan if present. onPlan is refactored to delegate to a runPlan
    helper that takes the plan fields directly so both call sites share
    the iteration + resume logic.
- Tests updated to assert on PdfQuestionAnswerResponse with
  edit_plan set rather than a bare EditPlanResponse return.
Reverts the orchestrator-flag approach. The orchestrator no longer knows
what specialists exist; the question and review delegates each ask their
own tiny classifier LLM whether the prompt needs the math auditor.
Decision stays language-agnostic (LLM, not regex) and the classifier code
is shared so both delegates use the same prompt and shape.

- math_presentation.py: new MathIntentClassifier — small LLM (fast model)
  with a structured bool output. Reused by both consumers.
- pdf_questions.py / pdf_review.py: drop the consult_math_auditor
  parameter from orchestrate(); each instantiates the classifier in
  __init__ and calls it on the first turn (resume turns short-circuit
  via the Verdict-artifact check, so no LLM call there).
- orchestrator.py: delegate_pdf_question / delegate_pdf_review and the
  matching _run_* methods drop the flag parameter. System prompt drops
  the math-routing clause (it's no longer the orchestrator's concern).
- Tests updated to mock the classifier where the flag used to be passed.
  New assertions confirm resume turns don't call the classifier.

Cost: one extra small-model LLM call per first turn of Q&A or Review.
Acceptable given cleaner separation — the orchestrator stays a pure
router and each consumer owns its own intent decision.
…heck

ChatContext stamps ToolOperation.toolId="ai-workflow" on files coming
back from the AI orchestrator (introduced in #6116). The ToolId union
didn't include it, so the assignment was a latent type error — never
caught because typecheck:prototypes wasn't in typecheck:all until the
recent +1 review-point fix added it.

Add "ai-workflow" to PROTOTYPE_SUPER_TOOL_IDS so it satisfies ToolId.
Keeps it out of any other build's union (proprietary/core/saas/desktop)
and out of the prototype tool registry's user-facing entries — it's a
generic marker for orchestrator-produced files, not a pickable tool.

Follow-up worth doing later: surface the terminal tool's endpoint path
on AiWorkflowResponse so we can record the actual tool that produced
the file rather than this generic marker.
Comment thread engine/scripts/generate_tool_models.py
Comment thread engine/src/stirling/agents/pdf_comment/agent.py Outdated
Comment thread engine/src/stirling/agents/math_presentation.py Outdated
Comment thread engine/src/stirling/agents/orchestrator.py Outdated
Comment thread engine/src/stirling/agents/orchestrator.py
Comment thread engine/src/stirling/agents/pdf_questions.py
Comment thread engine/src/stirling/api/app.py
Comment thread engine/src/stirling/contracts/common.py Outdated
Comment thread engine/tests/agents/test_math_presentation.py
ConnorYoh and others added 7 commits April 28, 2026 16:20
PR comment from James — test directories aren't packages and shouldn't
need __init__.py files. Removed the empty ones; switched pytest to the
modern importlib import mode so duplicate basenames (e.g. test_routes.py
in two different test folders) still collect cleanly without needing
package markers. Added ``tests`` to pythonpath so shared conftest helpers
keep importing.
PR comment from James: the "If the user is asking a question about the PDF
contents (and does NOT want comments written onto the document) use
delegate_pdf_question instead" sentence read like an over-correction to
something I'd hit in development and added too much weight. The prior
sentence already covers when to use delegate_pdf_question.
PR comment from James — the bare ``except Exception`` with a noqa was
covering more than necessary. ``model_validate`` only raises
pydantic.ValidationError on a malformed payload, so catch that and let
anything else propagate.
PR comment from James — Pretty handles model dumping internally and
lazily, so calling ``response.model_dump()`` upfront does the work even
when the log level is above DEBUG.
PR comment from James — only some of the subagents had been slimmed in
the previous round, which read inconsistently. Push the same pattern
through PdfEditAgent and UserSpecAgent: each gets an
``orchestrate(OrchestratorRequest)`` adapter that builds the typed
request the inner pipeline expects, so the orchestrator's _run_* methods
collapse to one-liners. ``handle()`` / ``draft()`` keep their typed
request signatures so the direct API routes don't have to change.

Will fold these into whatever shared agent interface lands in #6163 once
that's settled.
PR comment from James — the previous ``report: dict[str, Any]`` meant any
JSON could pass through as a tool report and the meta-agent would only
discover problems when it tried to read it. Pin the shape:

- ToolReportArtifact is now ``MathAuditorToolReportArtifact`` with
  ``report: Verdict``. Pydantic validates the payload on receipt — a
  malformed Verdict is rejected before extract_math_verdict ever sees it,
  rather than being silently swallowed.
- ToolReportArtifact stays as a type alias so callers don't have to know
  there's only one variant today; lifts into a discriminated union when
  another consumer-side report appears.
- extract_math_verdict shrinks accordingly — no more model_validate,
  no try/except, just an isinstance check on the typed artifact.
- PdfCommentReport added in contracts/pdf_comments.py for the report the
  pdf-comment-agent surfaces alongside the annotated PDF (annotations
  applied, instructions received, rationale). Lands as the top-level
  AiWorkflowResponse.report on COMPLETED — no resume, so it never re-
  enters the orchestrator as an artifact.
- Tests updated: construct artifacts with the typed model directly;
  the old "graceful degradation on malformed JSON" test becomes a
  "validation rejects malformed payload at receipt" test.
jbrunton96
jbrunton96 previously approved these changes Apr 30, 2026
Comment thread engine/src/stirling/contracts/common.py
Co-authored-by: James Brunton <jbrunton96@gmail.com>
@ConnorYoh ConnorYoh added this pull request to the merge queue Apr 30, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 30, 2026
@jbrunton96 jbrunton96 enabled auto-merge May 1, 2026 08:22
@jbrunton96 jbrunton96 disabled auto-merge May 1, 2026 08:23
@jbrunton96 jbrunton96 enabled auto-merge May 1, 2026 08:24
@Frooodle Frooodle disabled auto-merge May 1, 2026 09:19
@Frooodle Frooodle merged commit 86774d5 into main May 1, 2026
33 of 35 checks passed
@Frooodle Frooodle deleted the pdf-comment-agent branch May 1, 2026 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API API-related issues or pull requests Back End Issues related to back-end development engine Front End Issues or pull requests related to front-end development Java Pull requests that update Java code size:XXL This PR changes 1000+ lines ignoring generated files. Test Testing-related issues or pull requests Translation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants