fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577) by rqd6f4g6zn-bit · Pull Request #626 · 666ghj/MiroFish

rqd6f4g6zn-bit · 2026-05-17T06:23:27Z

Fix 4 open bugs: robust LLM JSON parsing + sanitize report sections + chat history dedup

Summary

This PR addresses four open bugs that all stem from how the backend handles slightly-non-conforming LLM outputs (trailing text, raw tool_call JSON, duplicate user messages). Net change: backend is more defensive, no behavioral regressions on conforming outputs.

Issues fixed

#624 / #622 / #601 — `Unexpected non-whitespace character after JSON at position N` + 500 on `/api/graph/ontology/generate`

Root cause: chat_json() in backend/app/utils/llm_client.py runs strict json.loads() after stripping Markdown fences. Many LLMs (qwen-plus, gemma, ollama-served models) append trailing prose after the JSON block even with response_format=json_object — strict parsing fails with the reported error.

Fix: Extracted parsing into _parse_llm_json() with a 5-stage strategy:

Strip Markdown fences (existing)
Strict json.loads (fast path)
json.JSONDecoder().raw_decode() — parses JSON prefix, ignores trailing text (logs a warning)
Balanced-brace extraction (handles leading prose + embedded JSON)
Strip control chars + retry

Falls through to a ValueError with a clear preview if everything fails. Logs visibility into which strategy succeeded so we can monitor which LLMs misbehave.

Tests: 8 unit-test cases covering clean JSON, fenced JSON, trailing text, fences+trailing, leading prose, nested-objects+trailing, control chars, and the exact failure mode at position 10243 — all pass.

#599 — Section content is raw unexecuted `tool_call` JSON

Root cause: In backend/app/services/report_agent.py::write_section(), three code paths can return a final_answer that consists entirely of an unexecuted tool_call JSON (e.g. {"name":"quick_search","parameters":{...}}):

Final Answer path (line ~1392) — when LLM prefixes raw JSON with Final Answer:
"No prefix" fallback (line ~1491) — when LLM has used enough tools but emits a tool_call instead of prose
Force-finalize path (line ~1519) — when max iterations hit and LLM still emits a tool_call

All three paths now go through new _sanitize_section_content() which detects the leak pattern (full content parses as JSON and name matches VALID_TOOL_NAMES) and replaces with a clear fallback message instead of leaking the raw artifact.

#577 — Report-Agent chat repeats first answer regardless of follow-up question

Likely root cause: Frontend can (under some flows) include the just-sent user message inside the chat_history array. Backend then appends the same message again at the end → LLM sees duplicate-trailing user message and ignores the latest one, returning the prior answer style.

Fix: In report_agent.py::chat(), defensively:

Only keep {role, content} from history items (defensive against extra fields)
Skip entries that match the current message
Added debug log of constructed messages array for future diagnostics

This is defensive — even if the frontend filters correctly, this prevents regressions from future call sites.

Files changed

backend/app/utils/llm_client.py       +112 -10  (new _parse_llm_json with 5 strategies)
backend/app/services/report_agent.py   +50  -10  (new _sanitize_section_content + 3 call sites; chat() history dedup)

Verification

All unit tests for _parse_llm_json pass (8/8 cases covering each strategy)
Backend starts cleanly: uv run python run.py → all routes return correct validation errors on empty bodies (400/404, no more 500s on validation failures)
No new dependencies
No changes to public API contracts

Notes for maintainers

The JSON parser now emits warnings via logger when fallback strategies are used. Monitor these — high frequency from a specific model suggests adjusting that model's system prompt.
_sanitize_section_content() returns a German-language fallback message ("_(Hinweis: Für diesen Abschnitt..."). If the project wants Chinese/English, localize via the i18n system in app/utils/locale.py.
For Bug for the last step among the five steps! #577 a fuller fix would be on the frontend (filter logic in Step5Interaction.vue::sendToReportAgent) — but the backend defensive fix is independent and prevents regressions.

…y dedup Closes 666ghj#624, 666ghj#622, 666ghj#601, 666ghj#599, 666ghj#577 ## LLM JSON parsing (666ghj#624 / 666ghj#622 / 666ghj#601) - New `_parse_llm_json()` in llm_client.py with 5-stage fallback: 1. Strip markdown fences (existing) 2. Strict json.loads (fast path) 3. json.JSONDecoder.raw_decode (handles trailing prose after JSON) 4. Balanced-brace extraction (leading prose + embedded JSON) 5. Strip control chars + retry - Replaces strict json.loads in chat_json() that was failing on any LLM appending text after the JSON (common with qwen-plus, ollama, gemma even with response_format=json_object). - Logs which fallback was used so problematic LLMs are visible. - 8 unit-test cases covering each strategy. ## Report section tool_call leak (666ghj#599) - New `_sanitize_section_content()` in report_agent.py detects when a section's "final answer" is actually an unexecuted tool_call JSON (e.g. `{"name":"quick_search","parameters":{...}}`) and replaces it with a clear fallback message instead of writing the raw artifact to the report. - Applied at all 3 places where final_answer is returned in write_section(): the Final Answer path, the no-prefix fallback, and the force-finalize path. ## Chat history duplicate user message (666ghj#577) - In report_agent.py chat(), defensively dedupe chat_history: - Only keep {role, content} from history items - Skip entries that match the current message exactly - This prevents LLM from seeing a duplicate trailing user message and echoing back the previous answer. - Added debug log of constructed messages array for diagnostics.

Thamer26 · 2026-05-17T08:07:44Z

So what do I do to actually make it work?

hse00435-hub · 2026-05-17T11:20:42Z

Thanks for the quick patch in #626! However, please note that the live production website is STILL throwing the exact same error (as shown in my newly uploaded screenshot).

It seems the fix has been coded and passed your local unit tests, but it hasn't been merged or deployed to the live production server yet. Paid users are still completely blocked by this position 10243 bug on the web platform.

Could you please nudge the team to deploy PR #626 to the production server ASAP? Thank you!

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626

fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626
rqd6f4g6zn-bit wants to merge 1 commit into
666ghj:mainfrom
rqd6f4g6zn-bit:fix/llm-json-robust-parsing-and-tool-call-sanitize

rqd6f4g6zn-bit commented May 17, 2026

Uh oh!

Thamer26 commented May 17, 2026

Uh oh!

hse00435-hub commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

rqd6f4g6zn-bit commented May 17, 2026

Fix 4 open bugs: robust LLM JSON parsing + sanitize report sections + chat history dedup

Summary

Issues fixed

#624 / #622 / #601 — Unexpected non-whitespace character after JSON at position N + 500 on /api/graph/ontology/generate

#599 — Section content is raw unexecuted tool_call JSON

#577 — Report-Agent chat repeats first answer regardless of follow-up question

Files changed

Verification

Notes for maintainers

Uh oh!

Thamer26 commented May 17, 2026

Uh oh!

hse00435-hub commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

#624 / #622 / #601 — `Unexpected non-whitespace character after JSON at position N` + 500 on `/api/graph/ontology/generate`

#599 — Section content is raw unexecuted `tool_call` JSON