Skip to content

fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626

Open
rqd6f4g6zn-bit wants to merge 1 commit into
666ghj:mainfrom
rqd6f4g6zn-bit:fix/llm-json-robust-parsing-and-tool-call-sanitize
Open

fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626
rqd6f4g6zn-bit wants to merge 1 commit into
666ghj:mainfrom
rqd6f4g6zn-bit:fix/llm-json-robust-parsing-and-tool-call-sanitize

Conversation

@rqd6f4g6zn-bit
Copy link
Copy Markdown

Fix 4 open bugs: robust LLM JSON parsing + sanitize report sections + chat history dedup

Closes #624, #622, #601, #599, #577

Summary

This PR addresses four open bugs that all stem from how the backend handles slightly-non-conforming LLM outputs (trailing text, raw tool_call JSON, duplicate user messages). Net change: backend is more defensive, no behavioral regressions on conforming outputs.

Issues fixed

#624 / #622 / #601Unexpected non-whitespace character after JSON at position N + 500 on /api/graph/ontology/generate

Root cause: chat_json() in backend/app/utils/llm_client.py runs strict json.loads() after stripping Markdown fences. Many LLMs (qwen-plus, gemma, ollama-served models) append trailing prose after the JSON block even with response_format=json_object — strict parsing fails with the reported error.

Fix: Extracted parsing into _parse_llm_json() with a 5-stage strategy:

  1. Strip Markdown fences (existing)
  2. Strict json.loads (fast path)
  3. json.JSONDecoder().raw_decode() — parses JSON prefix, ignores trailing text (logs a warning)
  4. Balanced-brace extraction (handles leading prose + embedded JSON)
  5. Strip control chars + retry

Falls through to a ValueError with a clear preview if everything fails. Logs visibility into which strategy succeeded so we can monitor which LLMs misbehave.

Tests: 8 unit-test cases covering clean JSON, fenced JSON, trailing text, fences+trailing, leading prose, nested-objects+trailing, control chars, and the exact failure mode at position 10243 — all pass.

#599 — Section content is raw unexecuted tool_call JSON

Root cause: In backend/app/services/report_agent.py::write_section(), three code paths can return a final_answer that consists entirely of an unexecuted tool_call JSON (e.g. {"name":"quick_search","parameters":{...}}):

  • Final Answer path (line ~1392) — when LLM prefixes raw JSON with Final Answer:
  • "No prefix" fallback (line ~1491) — when LLM has used enough tools but emits a tool_call instead of prose
  • Force-finalize path (line ~1519) — when max iterations hit and LLM still emits a tool_call

All three paths now go through new _sanitize_section_content() which detects the leak pattern (full content parses as JSON and name matches VALID_TOOL_NAMES) and replaces with a clear fallback message instead of leaking the raw artifact.

#577 — Report-Agent chat repeats first answer regardless of follow-up question

Likely root cause: Frontend can (under some flows) include the just-sent user message inside the chat_history array. Backend then appends the same message again at the end → LLM sees duplicate-trailing user message and ignores the latest one, returning the prior answer style.

Fix: In report_agent.py::chat(), defensively:

  • Only keep {role, content} from history items (defensive against extra fields)
  • Skip entries that match the current message
  • Added debug log of constructed messages array for future diagnostics

This is defensive — even if the frontend filters correctly, this prevents regressions from future call sites.

Files changed

backend/app/utils/llm_client.py       +112 -10  (new _parse_llm_json with 5 strategies)
backend/app/services/report_agent.py   +50  -10  (new _sanitize_section_content + 3 call sites; chat() history dedup)

Verification

  • All unit tests for _parse_llm_json pass (8/8 cases covering each strategy)
  • Backend starts cleanly: uv run python run.py → all routes return correct validation errors on empty bodies (400/404, no more 500s on validation failures)
  • No new dependencies
  • No changes to public API contracts

Notes for maintainers

  • The JSON parser now emits warnings via logger when fallback strategies are used. Monitor these — high frequency from a specific model suggests adjusting that model's system prompt.
  • _sanitize_section_content() returns a German-language fallback message ("_(Hinweis: Für diesen Abschnitt..."). If the project wants Chinese/English, localize via the i18n system in app/utils/locale.py.
  • For Bug for the last step among the five steps! #577 a fuller fix would be on the frontend (filter logic in Step5Interaction.vue::sendToReportAgent) — but the backend defensive fix is independent and prevents regressions.

…y dedup

Closes 666ghj#624, 666ghj#622, 666ghj#601, 666ghj#599, 666ghj#577

## LLM JSON parsing (666ghj#624 / 666ghj#622 / 666ghj#601)
- New `_parse_llm_json()` in llm_client.py with 5-stage fallback:
  1. Strip markdown fences (existing)
  2. Strict json.loads (fast path)
  3. json.JSONDecoder.raw_decode (handles trailing prose after JSON)
  4. Balanced-brace extraction (leading prose + embedded JSON)
  5. Strip control chars + retry
- Replaces strict json.loads in chat_json() that was failing on any LLM
  appending text after the JSON (common with qwen-plus, ollama, gemma even
  with response_format=json_object).
- Logs which fallback was used so problematic LLMs are visible.
- 8 unit-test cases covering each strategy.

## Report section tool_call leak (666ghj#599)
- New `_sanitize_section_content()` in report_agent.py detects when a
  section's "final answer" is actually an unexecuted tool_call JSON
  (e.g. `{"name":"quick_search","parameters":{...}}`) and replaces it
  with a clear fallback message instead of writing the raw artifact to
  the report.
- Applied at all 3 places where final_answer is returned in
  write_section(): the Final Answer path, the no-prefix fallback, and
  the force-finalize path.

## Chat history duplicate user message (666ghj#577)
- In report_agent.py chat(), defensively dedupe chat_history:
  - Only keep {role, content} from history items
  - Skip entries that match the current message exactly
- This prevents LLM from seeing a duplicate trailing user message and
  echoing back the previous answer.
- Added debug log of constructed messages array for diagnostics.
@Thamer26
Copy link
Copy Markdown

So what do I do to actually make it work?

@hse00435-hub
Copy link
Copy Markdown

Thanks for the quick patch in #626! However, please note that the live production website is STILL throwing the exact same error (as shown in my newly uploaded screenshot).

It seems the fix has been coded and passed your local unit tests, but it hasn't been merged or deployed to the live production server yet. Paid users are still completely blocked by this position 10243 bug on the web platform.

Could you please nudge the team to deploy PR #626 to the production server ASAP? Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected non-whitespace character after JSON at position 10243

4 participants