fix(tools): read_url third-party disclosure + cache opt-out #122
Merged
read_url silently sent the full URL (incl. query string) to the third-party Jina Reader (r.jina.ai), could serve a stale cache with no opt-out, and HTTP/exception errors leaked the vendor name and response body (P11). Disclose the dependency, caching/staleness, and the bash-fallback caveat in the web-reader SKILL.md; add an opt-in no_cache flag (x-no-cache header — default path byte-identical); surface a stale snapshot via an additive `cached: true`; sanitize HTTP/exception errors to a generic message (detail kept in the server log only). A local direct-fetch fallback needs SSRF hardening first and is a deliberate out-of-scope follow-up. (cherry picked from commit eaf91128370dd9800195961a55baf5b123162012)
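A minimal sketch of how the opt-in flag could keep the default path unchanged. The helper name `build_reader_headers`, the baseline `Accept` header, and the header value `"true"` are assumptions for illustration; only the `x-no-cache` header name and the `no_cache` flag come from the commit message.

```python
from typing import Dict

# Assumed baseline headers; the tool's real defaults are not shown in this PR.
BASE_HEADERS: Dict[str, str] = {"Accept": "text/plain"}


def build_reader_headers(no_cache: bool = False) -> Dict[str, str]:
    """Return request headers for the reader call (hypothetical helper)."""
    headers = dict(BASE_HEADERS)  # copy so the shared default is never mutated
    if no_cache:
        # Header name is from the commit message; the value "true" is an assumption.
        headers["x-no-cache"] = "true"
    return headers
```

With `no_cache` left at its default, the returned headers equal the baseline, which is what "default path byte-identical" would require of the request.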
Restore the response body / exception text in read_url error returns so operators can diagnose a remote failure (HTTP 451, transport errors, etc.) without grepping logs. Keeps the new logger.warning lines from the parent commit intact — both paths now surface debug info. Adjusts the two affected regression tests to assert the debug info is included rather than scrubbed.
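The behaviour described here, where the error return and the log both carry the debug detail, might look roughly like the sketch below. The function names, the logger name, and the exact message format are hypothetical; the PR only states that the response body and exception text are returned and also logged via `logger.warning`.

```python
import logging

logger = logging.getLogger("web_reader_tool")  # logger name is an assumption


def http_error_message(status_code: int, body: str) -> str:
    """Build the returned error string for a non-success HTTP response."""
    # Log first, then return the same detail to the caller.
    logger.warning("read_url upstream HTTP %s: %s", status_code, body)
    return f"HTTP {status_code}: {body}"


def exception_message(exc: Exception) -> str:
    """Build the returned error string for a transport-level failure."""
    logger.warning("read_url request failed: %r", exc)
    return f"{type(exc).__name__}: {exc}"
```

A regression test in this style would assert that the status code and body text appear in the returned string, rather than asserting that they were scrubbed.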
Collaborator
Merged — thanks. One tweak on top before squash: rolled back the HTTP/exc error sanitization in |
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request on May 23, 2026
- SKILL.md now explicitly notes that read_url forwards the full URL to the third-party Jina Reader (r.jina.ai), and warns against passing credentials or internal URLs.
- New optional no_cache=True parameter sends x-no-cache to bypass Jina's cache when freshness matters.
- Responses that contain Jina's cached-snapshot marker now surface a cached: true flag.
- Error paths keep the upstream HTTP body and exception text in the returned error string (and also log them via logger.warning) so operators can diagnose remote failures without grepping logs.
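To illustrate what an additive `cached: true` flag means for callers, here is a small sketch. The result shape (`status`, `content`) and the helper name `build_result` are assumptions; the PR only states that the flag is added when the cached-snapshot marker is present.

```python
from typing import Any, Dict


def build_result(content: str, is_cached: bool) -> Dict[str, Any]:
    """Assemble a read_url-style result (hypothetical shape)."""
    result: Dict[str, Any] = {"status": "ok", "content": content}
    if is_cached:
        # Additive: the key exists only for cached snapshots, so callers
        # that never look for it see the same payload as before.
        result["cached"] = True
    return result
```

Callers that care about freshness could check `result.get("cached")` and retry with `no_cache=True`; everything else can ignore the key.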
Summary
Disclose the `read_url` third-party dependency (r.jina.ai) in the skill doc, add a `no_cache` opt-out, and sanitize upstream errors so the vendor name and response body are not leaked to the caller (kept in server logs).
Why
`read_url` silently routed URLs through a third party and could return stale cached snapshots with no disclosure or opt-out.
Changes
`src/skills/web-reader/SKILL.md`, `src/tools/web_reader_tool.py`
Test Plan
`pytest --ignore=agent/tests/e2e_backtest -q` green; requests mocked
Checklist
Follow-up: anchor the cached-marker detection to the header region; verify the bypass header against current Jina docs.
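One way the anchoring follow-up could be approached is sketched below: search for the marker only in the text before the body begins, so a page that merely quotes the phrase is not flagged. The marker text, the `Markdown Content:` delimiter, and the ten-line fallback are all assumptions about the reader's output format and should be checked against current Jina docs.

```python
CACHED_MARKER = "This is a cached snapshot"  # assumed marker text
BODY_DELIMITER = "Markdown Content:"         # assumed header/body separator


def is_cached_snapshot(text: str, max_header_lines: int = 10) -> bool:
    """Return True only if the marker appears in the header region."""
    header, sep, _ = text.partition(BODY_DELIMITER)
    if not sep:
        # No delimiter found: fall back to the first few lines only,
        # rather than scanning the whole document.
        header = "\n".join(text.splitlines()[:max_header_lines])
    return CACHED_MARKER in header
```

The fallback keeps the check conservative when the delimiter is missing, at the cost of a possible false negative if the header is unusually long.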