feat: Industrial-Grade Reliability, Cascading LLM Fallback, and Tiered Service Support by amaynez · Pull Request #600 · 666ghj/MiroFish

amaynez · 2026-05-03T08:19:16Z

Overview

This PR moves the MiroFish engine from a prototype toward a production-ready platform. It introduces a multi-layered resilience system for both LLM generation and external service (Zep) interactions, alongside a foundation for tiered service support.

Key Changes

1. Robust LLM Client with Cascading Fallback

Truncation Detection: Automatically detects if an LLM response was cut off (e.g., due to token limits).
JSON Repair: Implements smart logic to repair malformed or truncated JSON responses.
Boost Fallback: If the primary LLM fails or returns broken JSON, the system automatically falls back to a high-capacity "Boost" model (configured via LLM_BOOST_* env vars).
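
To illustrate how the three pieces above could fit together, here is a minimal sketch, not the code from this PR. The function names, the `(text, finish_reason)` return shape, and the use of `"length"` as the truncation signal (the convention in OpenAI-compatible APIs) are all assumptions made for the example. The models are passed in as plain callables, so no `LLM_BOOST_*` variable names are guessed here.

```python
import json
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical model interface: takes a prompt, returns (text, finish_reason).
ModelCall = Callable[[str], Tuple[str, Optional[str]]]


def is_truncated(finish_reason: Optional[str]) -> bool:
    # Assumption: the provider reports "length" when the token limit cut the reply.
    return finish_reason == "length"


def repair_json(text: str) -> Any:
    """Close an unterminated string and any open brackets, then parse."""
    closers: List[str] = []
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    if in_string:
        if escaped:
            text = text[:-1]  # drop a dangling backslash before closing the string
        text += '"'
    else:
        text = text.rstrip().rstrip(",")  # drop a trailing comma left by the cut
    # Raises json.JSONDecodeError if the text is still not valid JSON.
    return json.loads(text + "".join(reversed(closers)))


def generate_json(prompt: str, models: List[Tuple[str, ModelCall]]) -> Any:
    """Try each model in order (e.g. primary, then boost) until one yields JSON."""
    errors = []
    for name, call in models:
        try:
            text, finish_reason = call(prompt)
            if not is_truncated(finish_reason):
                try:
                    return json.loads(text)
                except json.JSONDecodeError:
                    pass  # malformed but not flagged as truncated: try repair
            return repair_json(text)
        except Exception as exc:  # request failure or unrepairable JSON
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Calling `generate_json(prompt, [("primary", primary), ("boost", boost)])` only reaches the boost model when the primary call raises or its output cannot be repaired. Note that a repaired truncated response parses cleanly but may be missing trailing items, so callers that need complete output may prefer to skip repair and fall back directly.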

2. Zep Resilience Layer

Smart Retries & Rate Limiting: Added a retry and rate-limiting layer that reduces HTTP 429 (Too Many Requests) errors and handles quota limits gracefully.
Robust Paging: Re-implemented graph reading with robust paging to handle large-scale data without timeouts.
Localized Error Handling: Improved error messages to inform users specifically when Zep quotas are exceeded.
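
The retry and paging behavior described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the PR's implementation and not the Zep SDK: `RateLimitError`, `with_retries`, `read_all`, and the `fetch_page(cursor, page_size)` callback are hypothetical names. The retry uses exponential backoff with a small jitter, and honors a server-provided wait time when one is available.

```python
import random
import time
from functools import partial
from typing import Any, Callable, List, Optional, Tuple


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's client on an HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def with_retries(fn: Callable[[], Any], max_attempts: int = 5,
                 base_delay: float = 0.5, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn, backing off exponentially (with jitter) on rate limits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the quota error to the caller
            delay = exc.retry_after
            if delay is None:
                delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical page fetcher: returns (items, next_cursor); next_cursor is None at the end.
FetchPage = Callable[[Optional[Any], int], Tuple[List[Any], Optional[Any]]]


def read_all(fetch_page: FetchPage, page_size: int = 100,
             sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Read every page, retrying each page request independently."""
    items: List[Any] = []
    cursor: Optional[Any] = None
    while True:
        page, cursor = with_retries(partial(fetch_page, cursor, page_size), sleep=sleep)
        items.extend(page)
        if cursor is None:
            return items
```

Retrying per page rather than per read means a rate limit on a late page does not discard the pages already fetched. The `sleep` parameter is injectable so the backoff can be exercised in tests without real delays.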

3. Tiered Service Foundation

Configuration-Driven Polling: Introduced a /config endpoint that allows the backend to control frontend polling behavior.
Conditional Polling: The UI now dynamically enables/disables automatic graph updates based on the service tier (Free vs. Premium).
Response Caching: Implemented server-side caching for graph data to optimize performance and reduce API costs.
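
As a rough sketch of the ideas above, the snippet below shows one way a tier-aware config payload and a time-based cache for graph data might look. Everything here is assumed for illustration: the field names (`auto_poll`, `poll_interval_s`), the 5-second interval, the choice that Premium polls automatically while Free does not, and the `TTLCache` class are not taken from the PR.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

# Assumed polling policy per tier; the real /config payload may differ.
POLLING_BY_TIER: Dict[str, Dict[str, Any]] = {
    "free": {"auto_poll": False, "poll_interval_s": None},
    "premium": {"auto_poll": True, "poll_interval_s": 5},
}


def build_config(tier: str) -> Dict[str, Any]:
    """Build the payload a /config endpoint could return to the frontend."""
    policy = POLLING_BY_TIER.get(tier, POLLING_BY_TIER["free"])  # unknown tiers get the free policy
    return {"tier": tier, **policy}


class TTLCache:
    """Cache values for ttl_s seconds so repeated graph reads skip the backend."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry: Optional[Tuple[float, Any]] = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # still fresh: serve from cache
        value = loader()  # expired or missing: reload and store
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)
```

A graph endpoint could then call something like `cache.get_or_load(graph_id, lambda: load_graph(graph_id))`, where `load_graph` stands in for whatever function reads the graph from the external service. A short TTL bounds how stale the graph can appear while still absorbing bursts of repeated requests.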

Why this benefits MiroFish users

More Resilient Simulations: Simulations are significantly less likely to crash due to minor AI hiccups or transient network issues.
Clearer Feedback: Users no longer see a generic "500 Internal Server Error" when external service limits are hit; they get clear, actionable messages instead.
Scalability: The engine is now better equipped to handle large documents and complex simulations that previously caused timeouts or JSON parsing errors.

Commit Breakdown

feat(utils): implement robust LLM client with cascading fallback and JSON repair
feat(zep): add resilience layer with retries, rate limiting, and robust paging
feat(graph): refactor core services for high-availability simulations
feat(api): add tiered configuration and graph data caching
feat(frontend): implement conditional polling and service-tier UI
fix(i18n): add localized error messages for service quotas

amaynez added 6 commits May 3, 2026 02:18

15bd114 feat(utils): implement robust LLM client with cascading fallback and JSON repair
676ec89 feat(zep): add resilience layer with retries, rate limiting, and robust paging
e7f452e feat(graph): refactor core services for high-availability simulations
d790acc feat(api): add tiered configuration and graph data caching
744eb80 feat(frontend): implement conditional polling and service-tier UI
d072972 fix(i18n): add localized error messages for service quotas

dosubot (bot) added the size:XL (this PR changes 500-999 lines, ignoring generated files) and enhancement (new feature or request) labels on May 3, 2026

amaynez wants to merge 6 commits into 666ghj:main from amaynez:feature/industrial-reliability
