Description
- Observed
While sending a chat request that triggers the “Web Search” path, api_server aborts with pydantic_core.ValidationError: Invalid JSON: expected value at line 1 column 104. The stack trace shows the failure bubbling up from invoke_llm_json when parsing the LLM response into WebSearchAnswer (backend/onyx/agents/agent_search/dr/sub_agents/web_search/dr_ws_2_search.py:104-117, backend/onyx/agents/agent_search/shared_graph_utils/llm.py:172-199).
- Repro (current prod docker-compose stack, default config)
- Bring the system up (docker compose up -d).
- In the UI hit Chat and ask something that forces an internet lookup (e.g. “Summarize the latest changes to the IRS child tax credit”).
- Within ~30s the UI stream halts; backend logs contain the validation error above.
- Expected
The agent should recover gracefully even if the LLM emits slightly malformed JSON, ideally retrying or defaulting to “no URLs selected” so the chat continues.
- Actual
The JSON decoding failure is unhandled and terminates the request; the user sees a broken answer and further steps in the graph never run.
- Hypothesis
The prompt only loosely requests JSON, and the fallback parsing in invoke_llm_json relies on naive brace slicing. When the LLM adds stray prose (e.g. “… change process ] }”), schema.model_validate_json fails before any retry logic kicks in. No guard rails catch the exception, so the streaming pipeline raises.
- Suggested Fixes
- Harden invoke_llm_json with structured-response enforcement or defensive retry when model_validate_json raises; fall back to urls_to_open_indices=[] instead of crashing.
- Tighten WEB_SEARCH_URL_SELECTION_PROMPT (backend/onyx/prompts/dr_prompts.py:1482-1513) to command “Respond with ONLY valid JSON” and clarify that indices must be integers.
- Consider logging the raw response once, in redacted form, so malformed outputs can be inspected while debugging.
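The retry-then-fallback idea from the fixes above could look roughly like the sketch below. It is not the actual invoke_llm_json code: UrlSelection, parse_url_selection and invoke_with_fallback are hypothetical names, and a stdlib dataclass stands in for the pydantic WebSearchAnswer model so the snippet runs on its own. In the real code the parse step would presumably be schema.model_validate_json; pydantic v2's ValidationError subclasses ValueError, so the same except clause should apply.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


@dataclass
class UrlSelection:
    """Hypothetical stand-in for WebSearchAnswer (a pydantic model in the real code)."""

    urls_to_open_indices: list[int] = field(default_factory=list)


def parse_url_selection(raw: str) -> UrlSelection:
    """Strictly parse a raw LLM reply; raise ValueError on anything malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    indices = data.get("urls_to_open_indices")
    if not isinstance(indices, list) or not all(
        isinstance(i, int) and not isinstance(i, bool) for i in indices
    ):
        raise ValueError("urls_to_open_indices must be a list of integers")
    return UrlSelection(urls_to_open_indices=indices)


def invoke_with_fallback(
    call_llm: Callable[[], str],
    parse: Callable[[str], T],
    fallback: T,
    max_attempts: int = 2,
) -> T:
    """Call the LLM up to max_attempts times; return fallback if no reply parses."""
    for attempt in range(1, max_attempts + 1):
        raw = call_llm()
        try:
            return parse(raw)
        except ValueError as exc:
            # Log a truncated copy of the reply so malformed output can be inspected.
            logger.warning(
                "attempt %d/%d: unparseable LLM reply (%s): %r",
                attempt, max_attempts, exc, raw[:200],
            )
    return fallback


if __name__ == "__main__":
    # Invented replies: the first is malformed, the second is valid JSON.
    replies = iter([
        '{"urls_to_open_indices": [0, 2, stray prose ] }',
        '{"urls_to_open_indices": [0, 2]}',
    ])
    print(invoke_with_fallback(lambda: next(replies), parse_url_selection, UrlSelection()))
```

With this shape, a reply that never parses yields an empty urls_to_open_indices list instead of an exception, so the graph can continue with “no URLs selected”.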
- Impact
High: any chat path using web search can fail unpredictably, degrading search-backed answers for all users.
- Logs
▌ 1 validation error for WebSearchAnswer
▌ Invalid JSON: expected value at line 1 column 104 [type=json_invalid, input_value='{ "urls_to_open_indi... change process ] }', input_type=str]
▌ For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
▌ background-1 | INFO: INFO 10/05/2025 01:58:31 AM tasks.py:918 : [monitor_celery_queues(76473535-bf01-4197-a96d-d054e9757f8e)] Queue lengths: celery=0 docfetching=0 docfetching_prefetched=0 docprocessing=0 docprocessing_prefetched=0 user_files_indexing=0 user_file_processing=0 user_file_project_sync=0 sync=0 deletion=0 pruning=0 permissions_sync=0 external_group_sync=0 permissions_upsert=0
▌ api_server-1 | ERROR: 10/05/2025 01:58:33 AM process_message.py 789: [API:thJsCXev] Failed to process chat message.
▌ api_server-1 | Traceback (most recent call last):
▌ api_server-1 | File "/app/onyx/chat/process_message.py", line 784, in stream_chat_message_objects
▌ api_server-1 | yield from process_streamed_packets(
▌ api_server-1 | File "/app/onyx/chat/packet_proccessing/process_streamed_packets.py", line 20, in process_streamed_packets
▌ api_server-1 | for packet in answer_processed_output:
▌ api_server-1 | File "/app/onyx/chat/answer.py", line 150, in processed_streamed_output
▌ api_server-1 | for packet in stream:
▌ api_server-1 | File "/app/onyx/agents/agent_search/run_graph.py", line 73, in run_dr_graph
▌ api_server-1 | yield from run_graph(compiled_graph, config, input)
▌ api_server-1 | File "/app/onyx/agents/agent_search/run_graph.py", line 47, in run_graph
▌ api_server-1 | for event in manage_sync_streaming(
▌ api_server-1 | File "/app/onyx/agents/agent_search/run_graph.py", line 33, in manage_sync_streaming
▌ api_server-1 | for event in compiled_graph.stream(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 1724, in stream
▌ api_server-1 | for _ in runner.tick(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 302, in tick
▌ api_server-1 | _panic_or_proceed(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 619, in _panic_or_proceed
▌ api_server-1 | raise exc
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/executor.py", line 83, in done
▌ api_server-1 | task.result()
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
▌ api_server-1 | return self.__get_result()
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
▌ api_server-1 | raise self._exception
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
▌ api_server-1 | result = self.fn(*self.args, **self.kwargs)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/retry.py", line 40, in run_with_retry
▌ api_server-1 | return task.proc.invoke(task.input, config)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 506, in invoke
▌ api_server-1 | input = step.invoke(input, config, **kwargs)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2069, in invoke
▌ api_server-1 | for chunk in self.stream(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 1724, in stream
▌ api_server-1 | for _ in runner.tick(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 302, in tick
▌ api_server-1 | _panic_or_proceed(
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 619, in _panic_or_proceed
▌ api_server-1 | raise exc
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/executor.py", line 83, in done
▌ api_server-1 | task.result()
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
▌ api_server-1 | return self.__get_result()
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
▌ api_server-1 | raise self._exception
▌ api_server-1 | File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
▌ api_server-1 | result = self.fn(*self.args, **self.kwargs)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/pregel/retry.py", line 40, in run_with_retry
▌ api_server-1 | return task.proc.invoke(task.input, config)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 506, in invoke
▌ api_server-1 | input = step.invoke(input, config, **kwargs)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 270, in invoke
▌ api_server-1 | ret = context.run(self.func, *args, **kwargs)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/app/onyx/agents/agent_search/dr/sub_agents/web_search/dr_ws_2_search.py", line 105, in web_search
▌ api_server-1 | agent_decision = invoke_llm_json(
▌ api_server-1 | ^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/app/onyx/agents/agent_search/shared_graph_utils/llm.py", line 181, in invoke_llm_json
▌ api_server-1 | return schema.model_validate_json(response_content)
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 746, in model_validate_json
▌ api_server-1 | return cls.pydantic_validator.validate_json(
▌ api_server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
▌ api_server-1 | pydantic_core._pydantic_core.ValidationError: 1 validation error for WebSearchAnswer
▌ api_server-1 | Invalid JSON: expected value at line 1 column 104 [type=json_invalid, input_value='{ "urls_to_open_indi... change process ] }', input_type=str]
▌ api_server-1 | For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
▌ api_server-1 | During task with name 'search' and id 'e9148a36-4c89-0d78-6a09-2929ce5c12ae'
▌ api_server-1 | During task with name 'DRPath.WEB_SEARCH' and id '27acf7ba-b690-ca0a-c514-ce067b5565dd'
▌ api_server-1 | INFO: 10/05/2025 01:58:33 AM timing.py 76: [API:thJsCXev] stream_chat_message took 33.30265140533447 seconds
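
The json_invalid error in the trace is consistent with the brace-slicing hypothesis. Here is a minimal sketch of why slicing alone would not help, assuming the fallback keeps everything from the first “{” to the last “}”. slice_outer_braces is a hypothetical approximation, not the code in llm.py, and both payloads are invented because the real one is truncated in the log.

```python
import json


def slice_outer_braces(text: str) -> str:
    """Keep everything from the first '{' to the last '}' (assumed fallback behavior)."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        return text  # no brace pair found; leave the text untouched
    return text[start : end + 1]


# Prose around the object is removed by slicing, so this case parses.
wrapped = 'Sure, here you go: {"urls_to_open_indices": [0, 2]} Hope that helps!'
print(json.loads(slice_outer_braces(wrapped)))

# Prose inside the object survives slicing, so parsing still fails.
polluted = '{"urls_to_open_indices": [0, 2, the IRS change process ] }'
try:
    json.loads(slice_outer_braces(polluted))
except json.JSONDecodeError as exc:
    print(f"still invalid after slicing: {exc.msg} at column {exc.colno}")
```

Slicing only trims text outside the outermost braces, so any stray prose between them reaches the validator unchanged.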