feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE by prenansantana · Pull Request #580 · 666ghj/MiroFish

prenansantana · 2026-04-25T15:14:23Z

Problem

MiroFish hard-codes response_format={"type": "json_object"} on every JSON-producing LLM call. That works for OpenAI, Qwen/Dashscope, and Ollama, but several popular OpenAI-compatible runtimes reject it with HTTP 400 - 'response_format.type' must be 'json_schema' or 'text'. The result is an immediate 500 on /api/graph/ontology/generate (and on profile / config generation later in the pipeline) for users running local stacks.

This appears to be the root cause behind several open issues from users trying to bring their own LLM:

Ollama and qwen models #432 (Ollama with custom base URL — 500 on ontology/generate)
500 error on /api/graph/ontology/generate with OpenAI API — JSON parse failure #299, Issue: 500 error on /api/graph/ontology/generate when using OpenAI API #271, POST /api/graph/ontology/generate HTTP/1.1" 500 #403, 第一步ontology/generate提示500错误。 #298, 500 error #331, 我使用了官方推荐的qwen-plus模型，还是报500 #339 (various 500 errors on /api/graph/ontology/generate from non-default LLM providers)

Fix

Add a single env var LLM_JSON_MODE (default json_object, set to none to opt out). When none, MiroFish omits the response_format field entirely and relies on prompt-driven JSON output, which the existing markdown-tolerant fallback (re.search(r'\{[\s\S]*\}', ...) + _fix_truncated_json / _try_fix_config_json) already handles robustly.

The change is minimal and surgical — applied at the three call sites that send response_format:

backend/app/utils/llm_client.py (chat_json helper)
backend/app/services/oasis_profile_generator.py (persona synthesis)
backend/app/services/simulation_config_generator.py (time/event/agent config)

backend/app/config.py exposes the new var; .env.example documents it with usage guidance.

Backwards compatibility

Zero breaking changes. The default LLM_JSON_MODE=json_object preserves the exact current behavior for every existing user (OpenAI, Qwen Cloud, Ollama, Dashscope). Local-runtime users opt in by adding one line to .env.

Docs

docs/LOCAL_LLM.md (new) covers:

Provider compatibility matrix (which LLM_JSON_MODE to use for each)
Step-by-step recipes for LM Studio, Ollama, and llama.cpp server
Apple Silicon caveats (the macOS Taheo + M-series Metal shader bug in Ollama 0.21 — see Ollama 0.21.0 fails to initialize Metal on Apple M5 / macOS 26.2 while 0.18.0 works ollama/ollama#15748)
Memory/throughput expectations and a troubleshooting table

README.md gains a one-line callout to the new doc, keeping the main quickstart focused on the cloud-LLM happy path.

Test plan

Default path unchanged: with no env var set, chat_json still sends response_format={"type": "json_object"} against OpenAI-compatible cloud endpoints.
Smoke test with LLM_JSON_MODE=none against LM Studio (Qwen3-4B MLX): chat_json returns parsed dict.
End-to-end ontology generation with all four sample seed PDFs (~28k chars combined) against LM Studio with 32k context window: HTTP 200, ontology built correctly with valid entity/edge types.
Full simulation (10 rounds, 79 agents, Twitter+Reddit) against Anthropic Haiku 4.5 via OpenAI-compat endpoint with LLM_JSON_MODE=none: prep, run, and report-generation phases all completed; the resulting prediction report cites real polling data accurately (verified against published Paraná Pesquisas figures from April 2026).

Files

backend/app/config.py (+4)
backend/app/utils/llm_client.py (+1/-1)
backend/app/services/oasis_profile_generator.py (+11/-7)
backend/app/services/simulation_config_generator.py (+11/-7)
.env.example (+6)
README.md (+2)
docs/LOCAL_LLM.md (+126, new)

7 files changed, +155/-13.

OpenAI-compatible runtimes differ in how they handle `response_format`: cloud providers (OpenAI, Qwen/Dashscope, Ollama) accept `{"type": "json_object"}`, while local runtimes like LM Studio and llama.cpp server reject it with HTTP 400, only accepting `json_schema` or `text`. This prevented MiroFish from running against fully-local stacks. Introduce `LLM_JSON_MODE` (default `json_object`) so users can opt out of strict JSON response mode by setting `LLM_JSON_MODE=none`. The existing prompt-based JSON + markdown-tolerant parsing already handles the unstructured response path robustly, so `none` is viable for any OpenAI-compatible endpoint. Applied at all three call sites that send `response_format`: - utils/llm_client.py (chat_json helper) - services/oasis_profile_generator.py (persona synthesis) - services/simulation_config_generator.py (time/event/agent config) Documented in .env.example with guidance on when to pick each value.

Companion to the LLM_JSON_MODE feature: documents how to point MiroFish at LM Studio, Ollama, and llama.cpp server, including the LLM_JSON_MODE=none requirement, recommended context window for ontology generation, memory/throughput expectations, and Apple Silicon caveats (the macOS Tahoe + Mx Metal shader bug in Ollama 0.21). Adds a one-line callout in README.md pointing to the new doc, so the main quickstart stays focused on the cloud-LLM happy path.

prenansantana added 2 commits April 24, 2026 21:46

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request LLM API Any questions regarding the LLM API labels Apr 25, 2026

This was referenced Apr 25, 2026

Ollama and qwen models #432

Open

500 error #331

Open

dosubot Bot mentioned this pull request May 3, 2026

500 error on /api/graph/ontology/generate even with tiny TXT and qwen-plus #601

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE#580

feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE#580
prenansantana wants to merge 2 commits into
666ghj:mainfrom
prenansantana:feat/llm-json-mode-configurable

prenansantana commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

prenansantana commented Apr 25, 2026

Problem

Fix

Backwards compatibility

Docs

Test plan

Files

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant