Skip to content

feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE#580

Open
prenansantana wants to merge 2 commits into
666ghj:mainfrom
prenansantana:feat/llm-json-mode-configurable
Open

feat(llm): support local OpenAI-compatible runtimes via LLM_JSON_MODE#580
prenansantana wants to merge 2 commits into
666ghj:mainfrom
prenansantana:feat/llm-json-mode-configurable

Conversation

@prenansantana
Copy link
Copy Markdown

Problem

MiroFish hard-codes response_format={"type": "json_object"} on every JSON-producing LLM call. That works for OpenAI, Qwen/Dashscope, and Ollama, but several popular OpenAI-compatible runtimes reject it with HTTP 400 - 'response_format.type' must be 'json_schema' or 'text'. The result is an immediate 500 on /api/graph/ontology/generate (and on profile / config generation later in the pipeline) for users running local stacks.

This appears to be the root cause behind several open issues from users trying to bring their own LLM:

Fix

Add a single env var LLM_JSON_MODE (default json_object, set to none to opt out). When none, MiroFish omits the response_format field entirely and relies on prompt-driven JSON output, which the existing markdown-tolerant fallback (re.search(r'\{[\s\S]*\}', ...) + _fix_truncated_json / _try_fix_config_json) already handles robustly.

The change is minimal and surgical — applied at the three call sites that send response_format:

  • backend/app/utils/llm_client.py (chat_json helper)
  • backend/app/services/oasis_profile_generator.py (persona synthesis)
  • backend/app/services/simulation_config_generator.py (time/event/agent config)

backend/app/config.py exposes the new var; .env.example documents it with usage guidance.

Backwards compatibility

Zero breaking changes. The default LLM_JSON_MODE=json_object preserves the exact current behavior for every existing user (OpenAI, Qwen Cloud, Ollama, Dashscope). Local-runtime users opt in by adding one line to .env.

Docs

docs/LOCAL_LLM.md (new) covers:

README.md gains a one-line callout to the new doc, keeping the main quickstart focused on the cloud-LLM happy path.

Test plan

  • Default path unchanged: with no env var set, chat_json still sends response_format={"type": "json_object"} against OpenAI-compatible cloud endpoints.
  • Smoke test with LLM_JSON_MODE=none against LM Studio (Qwen3-4B MLX): chat_json returns parsed dict.
  • End-to-end ontology generation with all four sample seed PDFs (~28k chars combined) against LM Studio with 32k context window: HTTP 200, ontology built correctly with valid entity/edge types.
  • Full simulation (10 rounds, 79 agents, Twitter+Reddit) against Anthropic Haiku 4.5 via OpenAI-compat endpoint with LLM_JSON_MODE=none: prep, run, and report-generation phases all completed; the resulting prediction report cites real polling data accurately (verified against published Paraná Pesquisas figures from April 2026).

Files

  • backend/app/config.py (+4)
  • backend/app/utils/llm_client.py (+1/-1)
  • backend/app/services/oasis_profile_generator.py (+11/-7)
  • backend/app/services/simulation_config_generator.py (+11/-7)
  • .env.example (+6)
  • README.md (+2)
  • docs/LOCAL_LLM.md (+126, new)

7 files changed, +155/-13.

OpenAI-compatible runtimes differ in how they handle `response_format`:
cloud providers (OpenAI, Qwen/Dashscope, Ollama) accept
`{"type": "json_object"}`, while local runtimes like LM Studio and
llama.cpp server reject it with HTTP 400, only accepting `json_schema`
or `text`. This prevented MiroFish from running against fully-local
stacks.

Introduce `LLM_JSON_MODE` (default `json_object`) so users can opt out
of strict JSON response mode by setting `LLM_JSON_MODE=none`. The
existing prompt-based JSON + markdown-tolerant parsing already handles
the unstructured response path robustly, so `none` is viable for any
OpenAI-compatible endpoint.

Applied at all three call sites that send `response_format`:
- utils/llm_client.py (chat_json helper)
- services/oasis_profile_generator.py (persona synthesis)
- services/simulation_config_generator.py (time/event/agent config)

Documented in .env.example with guidance on when to pick each value.
Companion to the LLM_JSON_MODE feature: documents how to point MiroFish
at LM Studio, Ollama, and llama.cpp server, including the
LLM_JSON_MODE=none requirement, recommended context window for
ontology generation, memory/throughput expectations, and Apple Silicon
caveats (the macOS Tahoe + Mx Metal shader bug in Ollama 0.21).

Adds a one-line callout in README.md pointing to the new doc, so the
main quickstart stays focused on the cloud-LLM happy path.
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request LLM API Any questions regarding the LLM API labels Apr 25, 2026
This was referenced Apr 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request LLM API Any questions regarding the LLM API size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant