Improve Zep retry handling and frontend API proxy #625
Open
PhoenixCPH wants to merge 1 commit into
Author
Zep's free tier is currently very restrictive. Some users may hit 429 rate-limit errors when uploading large Markdown files for MiroFish to analyze. This is worth paying attention to.
Author
The project depends on Zep, and Zep has significantly tightened its free quota. It may be worth placing a clear reminder in a prominent spot so users understand the limitation before uploading large files or starting an analysis. Another concern is that the project still has no detailed retry mechanism and no way to resume from the point of failure after an error. I think that is an important gap for this project and worth addressing.
This PR makes report generation and graph fetching more resilient when Zep Cloud slows down or rate-limits requests, and it also makes the frontend API setup work more reliably during development.
Why this was needed:
The report flow depends on several Zep-backed operations that can take a long time and occasionally hit rate limits or transient failures. Before this change, those cases could lead to slow recovery, confusing status output, or retries that waited for fixed periods and ignored the server's guidance. On the frontend, the API client was tied to a hard-coded backend host, which made the dev setup less flexible when using the Vite proxy or opening the app through a different host.
What changed in the backend:
The Zep service now detects rate-limit style failures explicitly. When the API provides Retry-After or rate-limit reset information, the retry delay follows that signal instead of relying only on a generic backoff. There is also a cap on retry delays so the system does not end up waiting too long. For ordinary transient errors, the existing retry path still applies, but the logging is clearer so it is easier to tell rate-limit handling apart from other failures.
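To illustrate the header-aware delay described above, here is a minimal sketch. It is not the code in `backend/app/services/zep_tools.py`: the names `parse_retry_after`, `retry_delay`, and `MAX_RETRY_DELAY`, the 60-second cap, and the exponential fallback are all assumptions made for this example. It only handles the standard `Retry-After` header (delta-seconds or HTTP-date form). Rate-limit reset headers are left out because the PR description does not state their format.

```python
import email.utils
import time
from typing import Mapping, Optional

MAX_RETRY_DELAY = 60.0  # assumed cap in seconds; the PR does not state the real value


def parse_retry_after(value: str, now: Optional[float] = None) -> Optional[float]:
    """Return the delay in seconds encoded by a Retry-After value, or None."""
    now = time.time() if now is None else now
    value = value.strip()
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "30"
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        target = email.utils.parsedate_to_datetime(value)
        return max(0.0, target.timestamp() - now)
    except (TypeError, ValueError):
        return None  # unparseable header: caller falls back to backoff


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = MAX_RETRY_DELAY) -> float:
    """Prefer the server's hint; otherwise use exponential backoff. Always capped."""
    lowered = {key.lower(): val for key, val in headers.items()}
    hinted = None
    if "retry-after" in lowered:
        hinted = parse_retry_after(lowered["retry-after"])
    if hinted is None:
        hinted = base * (2 ** attempt)  # generic fallback, attempt starts at 0
    return min(hinted, cap)
```

With these assumptions, a response carrying `Retry-After: 5` yields a 5-second wait, while a hint of 500 seconds is clamped to the cap so a single retry never stalls the report flow for minutes.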
Paging for nodes and edges was also made more robust. The retry budget is higher now, and paging retries also handle API error responses that are effectively temporary or throttle-related. The same header-aware delay logic is used there too, so the system can recover more gracefully when Zep temporarily rejects requests.
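The paging behavior can be sketched in the same spirit. This is a hypothetical illustration, not the contents of `backend/app/utils/zep_paging.py`: `TransientApiError`, `fetch_all`, the cursor-based `fetch_page` callable, and the default retry budget and cap are invented for the example and do not reflect the Zep client's real API.

```python
import time
from typing import Callable, List, Optional, Tuple


class TransientApiError(Exception):
    """Hypothetical stand-in for a throttle or temporary API failure."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after  # server hint in seconds, if any


Page = Tuple[List[dict], Optional[str]]  # (items, next cursor or None)


def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              max_retries: int = 5, base: float = 1.0, cap: float = 30.0,
              sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Collect every page, retrying each page independently on transient errors."""
    items: List[dict] = []
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page, next_cursor = fetch_page(cursor)
                break
            except TransientApiError as exc:
                if attempt == max_retries:
                    raise  # retry budget exhausted for this page
                if exc.retry_after is not None:
                    hinted = exc.retry_after
                else:
                    hinted = base * (2 ** attempt)
                sleep(min(hinted, cap))  # never wait longer than the cap
        items.extend(page)
        if next_cursor is None:
            return items
        cursor = next_cursor
```

Retrying per page rather than restarting the whole traversal means a throttled request in the middle of a large graph does not discard the nodes or edges already fetched. The `sleep` parameter is injectable only so the sketch can be exercised without real waiting.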
What changed in the frontend:
The frontend Axios base URL now defaults to the Vite proxy path instead of hard-coding the backend host. Environment variables can still override that when a direct backend target is needed. The API helper paths were updated to match the proxy-based routing, and the fixed Content-Type header was removed for FormData requests so the browser can set the multipart boundary correctly.
What changed in the report UI:
A visible retry notice now appears in the report view when the console log shows an external rate-limit event. This gives the user a clear explanation that the system is retrying automatically instead of leaving them with only low-level logs. The notice clears automatically after the expected retry window or can be dismissed manually.
Files changed:
backend/app/services/zep_tools.py
backend/app/utils/zep_paging.py
frontend/src/api/graph.js
frontend/src/api/index.js
frontend/src/api/report.js
frontend/src/api/simulation.js
frontend/src/components/Step4Report.vue
frontend/vite.config.js
Validation:
npm run build
Note:
This change does not alter the underlying report logic or simulation flow. It only improves resilience, API routing, and user-facing feedback when external services throttle requests.