feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE #580
Open
prenansantana wants to merge 2 commits into
OpenAI-compatible runtimes differ in how they handle `response_format`:
cloud providers (OpenAI, Qwen/Dashscope) and Ollama accept
`{"type": "json_object"}`, while local runtimes such as LM Studio and
llama.cpp server reject it with HTTP 400 and accept only `json_schema`
or `text`. This prevented MiroFish from running against fully local
stacks.
Introduce `LLM_JSON_MODE` (default `json_object`) so users can opt out
of strict JSON response mode by setting `LLM_JSON_MODE=none`. The
existing prompt-based JSON + markdown-tolerant parsing already handles
the unstructured response path robustly, so `none` is viable for any
OpenAI-compatible endpoint.
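For reference, here is a simplified sketch of the kind of markdown-tolerant parsing the `none` path relies on. It is not the project's actual code: `parse_llm_json` is a hypothetical name, and it reuses only the `\{[\s\S]*\}` regex quoted in the PR description below, without the truncation repair that `_fix_truncated_json` performs.

```python
import json
import re
from typing import Any

# Greedy match from the first "{" to the last "}" (the pattern the PR cites).
_JSON_OBJECT_RE = re.compile(r"\{[\s\S]*\}")


def parse_llm_json(text: str) -> dict[str, Any]:
    """Parse a JSON object out of a free-form model reply (hypothetical helper).

    Handles bare JSON, JSON wrapped in markdown code fences, and prose around
    the object. Raises ValueError if no parseable object is found.
    """
    try:
        # Fast path: the reply is already plain JSON.
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fallback: pull out the outermost {...} span and try again.
        match = _JSON_OBJECT_RE.search(text)
        if match is None:
            raise ValueError("no JSON object found in model output")
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError as exc:
            raise ValueError(f"unparseable JSON object: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed
```

Because the regex is greedy, nested objects are captured whole. The trade-off is that two separate objects in one reply would be merged into an unparseable span. That case then surfaces as a `ValueError` rather than being silently misread.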
Applied at all three call sites that send `response_format`:
- utils/llm_client.py (chat_json helper)
- services/oasis_profile_generator.py (persona synthesis)
- services/simulation_config_generator.py (time/event/agent config)
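One way to keep the three call sites consistent is to route the switch through a single helper. This is a minimal sketch that assumes each call site passes keyword arguments to an OpenAI-style `client.chat.completions.create(...)`. `json_response_kwargs` and `build_request` are hypothetical names, not necessarily what the PR uses.

```python
from typing import Any


def json_response_kwargs(json_mode: str) -> dict[str, Any]:
    """Extra kwargs for a JSON-producing chat completion (hypothetical helper).

    "json_object" keeps strict JSON mode. "none" omits response_format so
    runtimes that reject json_object (per this PR: LM Studio, llama.cpp server)
    still accept the request.
    """
    if json_mode == "json_object":
        return {"response_format": {"type": "json_object"}}
    if json_mode == "none":
        # Omit the field entirely; the prompt alone asks for JSON.
        return {}
    raise ValueError(f"unsupported LLM_JSON_MODE: {json_mode!r}")


def build_request(model: str, messages: list[dict[str, str]], json_mode: str) -> dict[str, Any]:
    """Base payload shared by all call sites, with the JSON-mode kwargs merged in."""
    return {"model": model, "messages": messages, **json_response_kwargs(json_mode)}
```

A call site would then do `client.chat.completions.create(**build_request(model, messages, json_mode))`. Leaving the key out entirely, rather than sending an explicit `null`, avoids depending on how a given runtime treats a null `response_format`.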
Documented in .env.example with guidance on when to pick each value.
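Below is a sketch of how the setting might be read with its default. It assumes the value comes straight from the process environment. The PR only says that `backend/app/config.py` exposes it, so the `read_json_mode` name and the validation step are assumptions.

```python
import os
from collections.abc import Mapping

# The two values described in this PR; anything else is rejected early (assumption).
ALLOWED_JSON_MODES = frozenset({"json_object", "none"})


def read_json_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return LLM_JSON_MODE, defaulting to "json_object" (the current behavior)."""
    # An unset or empty variable falls back to the default.
    value = (env.get("LLM_JSON_MODE") or "json_object").strip().lower()
    if value not in ALLOWED_JSON_MODES:
        raise ValueError(
            f"LLM_JSON_MODE must be one of {sorted(ALLOWED_JSON_MODES)}, got {value!r}"
        )
    return value
```

Keeping `json_object` as the default means an unset variable reproduces today's behavior. Failing fast on an unknown value surfaces a typo at startup instead of as an HTTP 400 partway through the pipeline.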
Companion to the LLM_JSON_MODE feature: documents how to point MiroFish at LM Studio, Ollama, and llama.cpp server, including the LLM_JSON_MODE=none requirement, recommended context window for ontology generation, memory/throughput expectations, and Apple Silicon caveats (the macOS Tahoe + Mx Metal shader bug in Ollama 0.21). Adds a one-line callout in README.md pointing to the new doc, so the main quickstart stays focused on the cloud-LLM happy path.
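As a quick orientation, the per-runtime settings can be captured in a small lookup table. The base URLs are the usual default OpenAI-compatible endpoints for each runtime (LM Studio on port 1234, Ollama on 11434, llama.cpp's server on 8080) and may differ in your setup. The `LLM_JSON_MODE` values follow the guidance above: Ollama accepts `json_object`, while LM Studio and llama.cpp server need `none`. `RuntimePreset`, `PRESETS`, `env_lines`, and the `LLM_BASE_URL` variable name are illustrative assumptions, not taken from `docs/LOCAL_LLM.md`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePreset:
    base_url: str   # usual default OpenAI-compatible endpoint (assumption)
    json_mode: str  # LLM_JSON_MODE value per this PR's guidance


# Hypothetical presets; adjust host and port to your own setup.
PRESETS = {
    "lmstudio": RuntimePreset("http://localhost:1234/v1", "none"),
    "ollama": RuntimePreset("http://localhost:11434/v1", "json_object"),
    "llamacpp": RuntimePreset("http://localhost:8080/v1", "none"),
}


def env_lines(runtime: str) -> list[str]:
    """Render .env lines for a runtime; LLM_BASE_URL is an assumed variable name."""
    preset = PRESETS[runtime]
    return [f"LLM_BASE_URL={preset.base_url}", f"LLM_JSON_MODE={preset.json_mode}"]
```

For example, `print("\n".join(env_lines("lmstudio")))` prints two lines that could be pasted into `.env`.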
Problem
MiroFish hard-codes `response_format={"type": "json_object"}` on every JSON-producing LLM call. That works for OpenAI, Qwen/Dashscope, and Ollama, but several popular OpenAI-compatible runtimes reject it with `HTTP 400 - 'response_format.type' must be 'json_schema' or 'text'`. The result is an immediate 500 on `/api/graph/ontology/generate` (and on profile/config generation later in the pipeline) for users running local stacks. This appears to be the root cause behind several open issues from users trying to bring their own LLM:
- … `/api/graph/ontology/generate` from non-default LLM providers)
Fix
Add a single env var, `LLM_JSON_MODE` (default `json_object`; set to `none` to opt out). When it is `none`, MiroFish omits the `response_format` field entirely and relies on prompt-driven JSON output, which the existing markdown-tolerant fallback (`re.search(r'\{[\s\S]*\}', ...)` + `_fix_truncated_json` / `_try_fix_config_json`) already handles robustly.
The change is minimal and surgical, applied at the three call sites that send `response_format`:
- `backend/app/utils/llm_client.py` (`chat_json` helper)
- `backend/app/services/oasis_profile_generator.py` (persona synthesis)
- `backend/app/services/simulation_config_generator.py` (time/event/agent config)
`backend/app/config.py` exposes the new var; `.env.example` documents it with usage guidance.
Backwards compatibility
Zero breaking changes. The default `LLM_JSON_MODE=json_object` preserves the exact current behavior for every existing user (OpenAI, Qwen Cloud, Ollama, Dashscope). Local-runtime users opt in by adding one line to `.env`.
Docs
`docs/LOCAL_LLM.md` (new) covers:
- … `LLM_JSON_MODE` to use for each)
`README.md` gains a one-line callout to the new doc, keeping the main quickstart focused on the cloud-LLM happy path.
Test plan
- `chat_json` still sends `response_format={"type": "json_object"}` against OpenAI-compatible cloud endpoints.
- `LLM_JSON_MODE=none` against LM Studio (Qwen3-4B MLX): `chat_json` returns a parsed dict.
- `LLM_JSON_MODE=none`: prep, run, and report-generation phases all completed; the resulting prediction report cites real polling data accurately (verified against published Paraná Pesquisas figures from April 2026).
Files
- `backend/app/config.py` (+4)
- `backend/app/utils/llm_client.py` (+1/-1)
- `backend/app/services/oasis_profile_generator.py` (+11/-7)
- `backend/app/services/simulation_config_generator.py` (+11/-7)
- `.env.example` (+6)
- `README.md` (+2)
- `docs/LOCAL_LLM.md` (+126, new)
7 files changed, +155/-13.