Improve Zep retry handling and frontend API proxy #625

Open
PhoenixCPH wants to merge 1 commit into 666ghj:main from PhoenixCPH:fix/zep-rate-limit-and-api-proxy

PhoenixCPH commented May 17, 2026

This PR makes report generation and graph fetching more resilient when Zep Cloud slows down or rate-limits requests, and it also makes the frontend API setup work more reliably during development.

Why this was needed:
The report flow depends on several Zep-backed operations that can take a long time and occasionally hit rate limits or transient failures. Before this change, those cases could lead to slow recovery, confusing status output, or retries that waited for fixed periods and ignored the server's guidance. On the frontend, the API client was hard-coded to a single backend host, which made the dev setup less flexible when using the Vite proxy or opening the app through a different host.

What changed in the backend:
The Zep service now detects rate-limit style failures explicitly. When the API provides Retry-After or rate-limit reset information, the retry delay follows that signal instead of relying only on a generic backoff. There is also a cap on retry delays so the system does not end up waiting too long. For ordinary transient errors, the existing retry path still applies, but the logging is clearer so it is easier to tell rate-limit handling apart from other failures.
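
The header-aware delay described above can be sketched roughly as follows. This is a minimal illustration, not the code in `zep_tools.py`: the helper name `retry_delay`, the 60-second cap, the `X-RateLimit-Reset` header name, and its interpretation as a Unix timestamp are all assumptions, since the PR description does not state them.

```python
import email.utils
import time
from typing import Mapping, Optional

MAX_RETRY_DELAY = 60.0  # assumed cap in seconds; the PR does not state the value


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = MAX_RETRY_DELAY,
                now: Optional[float] = None) -> float:
    """Return the seconds to wait before the next retry (hypothetical helper)."""
    now = time.time() if now is None else now
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    delay: Optional[float] = None

    retry_after = lowered.get("retry-after")
    if retry_after:
        try:
            # Retry-After as a number of seconds.
            delay = float(retry_after)
        except ValueError:
            try:
                # Retry-After as an HTTP date.
                parsed = email.utils.parsedate_to_datetime(retry_after)
                delay = parsed.timestamp() - now
            except (TypeError, ValueError):
                delay = None

    if delay is None:
        # Assumption: the reset header carries a Unix timestamp in seconds.
        reset = lowered.get("x-ratelimit-reset")
        if reset:
            try:
                delay = float(reset) - now
            except ValueError:
                delay = None

    if delay is None:
        # No server guidance: fall back to exponential backoff.
        delay = base * (2 ** attempt)

    # Never wait a negative time, and never longer than the cap.
    return min(cap, max(0.0, delay))
```

Taking `now` as a parameter keeps the function deterministic in tests. The cap is applied last, so it bounds both server-provided values and the fallback backoff.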

Paging for nodes and edges was also made more robust. The retry budget is higher now, and paging retries also handle API error responses that are effectively temporary or throttle-related. The same header-aware delay logic is used there too, so the system can recover more gracefully when Zep temporarily rejects requests.
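
A rough sketch of paging with a per-page retry budget is shown below. It is illustrative only and does not reproduce `zep_paging.py`: `ApiError`, `fetch_page`, the cursor-based page shape, the set of retryable status codes, and the default budget of 5 are hypothetical stand-ins rather than the Zep SDK's actual interface.

```python
import time
from typing import Callable, List, Mapping, Optional, Tuple

RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed "temporary" statuses


class ApiError(Exception):
    """Hypothetical stand-in for an SDK error carrying status and headers."""

    def __init__(self, status_code: int,
                 headers: Optional[Mapping[str, str]] = None):
        super().__init__(f"API error {status_code}")
        self.status_code = status_code
        self.headers = dict(headers or {})


Page = Tuple[List[dict], Optional[str]]  # (items, next_cursor)


def _delay_from(headers: Mapping[str, str], attempt: int,
                max_delay: float) -> float:
    """Prefer Retry-After (seconds); otherwise back off exponentially."""
    value = {k.lower(): v for k, v in headers.items()}.get("retry-after", "")
    try:
        delay = float(value)
    except ValueError:
        delay = float(2 ** attempt)
    return min(max_delay, max(0.0, delay))


def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              max_retries: int = 5, max_delay: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Collect every page, retrying temporary failures page by page."""
    items: List[dict] = []
    cursor: Optional[str] = None
    while True:
        attempt = 0
        while True:
            try:
                page, next_cursor = fetch_page(cursor)
                break
            except ApiError as err:
                # Give up on permanent errors or once the budget is spent.
                if (err.status_code not in RETRYABLE_STATUS
                        or attempt >= max_retries):
                    raise
                sleep(_delay_from(err.headers, attempt, max_delay))
                attempt += 1
        items.extend(page)
        if next_cursor is None:
            return items
        cursor = next_cursor
```

The retry counter resets for each page, so one throttled page does not use up the budget for the rest of the listing. Passing `sleep` as a parameter lets tests run without waiting.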

What changed in the frontend:
The frontend Axios base URL now defaults to the Vite proxy path instead of hard-coding the backend host. Environment variables can still override that when a direct backend target is needed. The API helper paths were updated to match the proxy-based routing, and the fixed Content-Type header was removed for FormData requests so the browser can set the multipart boundary correctly.

What changed in the report UI:
A visible retry notice now appears in the report view when the console log shows an external rate-limit event. This gives the user a clear explanation that the system is retrying automatically instead of leaving them with only low-level logs. The notice clears automatically after the expected retry window or can be dismissed manually.

Files changed:
backend/app/services/zep_tools.py
backend/app/utils/zep_paging.py
frontend/src/api/graph.js
frontend/src/api/index.js
frontend/src/api/report.js
frontend/src/api/simulation.js
frontend/src/components/Step4Report.vue
frontend/vite.config.js

Validation:
npm run build

Note:
This change does not alter the underlying report logic or simulation flow. It only improves resilience, API routing, and user-facing feedback when external services throttle requests.

dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on May 17, 2026
PhoenixCPH changed the title from "Fix Zep retries and frontend API proxy" to "Improve Zep retry handling and frontend API proxy" on May 17, 2026
PhoenixCPH (Author) commented

The free tier of the Zep service is currently very restrictive. Some users may hit 429 rate-limit errors when uploading large Markdown files for MiroFish to analyze. This is worth paying attention to.

PhoenixCPH (Author) commented

The current project depends on Zep, and Zep has significantly tightened its free quota. It may be worth placing a clear reminder in a prominent spot for users so they understand the limitation before uploading large files or starting analysis.

Another concern is that the project still has no detailed retry mechanism and no way to resume from a checkpoint after an error. I think that is an important gap for this project and worth addressing.



2 participants