fix: robust LLM JSON parsing + sanitize report sections + chat history dedup (closes #624 #622 #601 #599 #577)#626
Open
rqd6f4g6zn-bit wants to merge 1 commit into
Conversation
…y dedup Closes 666ghj#624, 666ghj#622, 666ghj#601, 666ghj#599, 666ghj#577 ## LLM JSON parsing (666ghj#624 / 666ghj#622 / 666ghj#601) - New `_parse_llm_json()` in llm_client.py with 5-stage fallback: 1. Strip markdown fences (existing) 2. Strict json.loads (fast path) 3. json.JSONDecoder.raw_decode (handles trailing prose after JSON) 4. Balanced-brace extraction (leading prose + embedded JSON) 5. Strip control chars + retry - Replaces strict json.loads in chat_json() that was failing on any LLM appending text after the JSON (common with qwen-plus, ollama, gemma even with response_format=json_object). - Logs which fallback was used so problematic LLMs are visible. - 8 unit-test cases covering each strategy. ## Report section tool_call leak (666ghj#599) - New `_sanitize_section_content()` in report_agent.py detects when a section's "final answer" is actually an unexecuted tool_call JSON (e.g. `{"name":"quick_search","parameters":{...}}`) and replaces it with a clear fallback message instead of writing the raw artifact to the report. - Applied at all 3 places where final_answer is returned in write_section(): the Final Answer path, the no-prefix fallback, and the force-finalize path. ## Chat history duplicate user message (666ghj#577) - In report_agent.py chat(), defensively dedupe chat_history: - Only keep {role, content} from history items - Skip entries that match the current message exactly - This prevents LLM from seeing a duplicate trailing user message and echoing back the previous answer. - Added debug log of constructed messages array for diagnostics.
This was referenced May 17, 2026
|
So what do I do to actually make it work? |
|
Thanks for the quick patch in #626! However, please note that the live production website is STILL throwing the exact same error (as shown in my newly uploaded screenshot). It seems the fix has been coded and passed your local unit tests, but it hasn't been merged or deployed to the live production server yet. Paid users are still completely blocked by this Could you please nudge the team to deploy PR #626 to the production server ASAP? Thank you! |
This was referenced May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix 4 open bugs: robust LLM JSON parsing + sanitize report sections + chat history dedup
Closes #624, #622, #601, #599, #577
Summary
This PR addresses four open bugs that all stem from how the backend handles slightly-non-conforming LLM outputs (trailing text, raw
tool_callJSON, duplicate user messages). Net change: backend is more defensive, no behavioral regressions on conforming outputs.Issues fixed
#624 / #622 / #601 —
Unexpected non-whitespace character after JSON at position N+ 500 on/api/graph/ontology/generateRoot cause:
chat_json()inbackend/app/utils/llm_client.pyruns strictjson.loads()after stripping Markdown fences. Many LLMs (qwen-plus, gemma, ollama-served models) append trailing prose after the JSON block even withresponse_format=json_object— strict parsing fails with the reported error.Fix: Extracted parsing into
_parse_llm_json()with a 5-stage strategy:json.loads(fast path)json.JSONDecoder().raw_decode()— parses JSON prefix, ignores trailing text (logs a warning)Falls through to a
ValueErrorwith a clear preview if everything fails. Logs visibility into which strategy succeeded so we can monitor which LLMs misbehave.Tests: 8 unit-test cases covering clean JSON, fenced JSON, trailing text, fences+trailing, leading prose, nested-objects+trailing, control chars, and the exact failure mode at position 10243 — all pass.
#599 — Section content is raw unexecuted
tool_callJSONRoot cause: In
backend/app/services/report_agent.py::write_section(), three code paths can return afinal_answerthat consists entirely of an unexecutedtool_callJSON (e.g.{"name":"quick_search","parameters":{...}}):Final Answer:All three paths now go through new
_sanitize_section_content()which detects the leak pattern (full content parses as JSON andnamematchesVALID_TOOL_NAMES) and replaces with a clear fallback message instead of leaking the raw artifact.#577 — Report-Agent chat repeats first answer regardless of follow-up question
Likely root cause: Frontend can (under some flows) include the just-sent user message inside the
chat_historyarray. Backend then appends the same message again at the end → LLM sees duplicate-trailing user message and ignores the latest one, returning the prior answer style.Fix: In
report_agent.py::chat(), defensively:{role, content}from history items (defensive against extra fields)This is defensive — even if the frontend filters correctly, this prevents regressions from future call sites.
Files changed
Verification
_parse_llm_jsonpass (8/8 cases covering each strategy)uv run python run.py→ all routes return correct validation errors on empty bodies (400/404, no more 500s on validation failures)Notes for maintainers
loggerwhen fallback strategies are used. Monitor these — high frequency from a specific model suggests adjusting that model's system prompt._sanitize_section_content()returns a German-language fallback message ("_(Hinweis: Für diesen Abschnitt..."). If the project wants Chinese/English, localize via the i18n system inapp/utils/locale.py.Step5Interaction.vue::sendToReportAgent) — but the backend defensive fix is independent and prevents regressions.