Skip to content

fix(tools): read_url third-party disclosure + cache opt-out#122

Merged
warren618 merged 2 commits into
HKUDS:mainfrom
Teerapat-Vatpitak:pr/readurl-privacy
May 16, 2026
Merged

fix(tools): read_url third-party disclosure + cache opt-out#122
warren618 merged 2 commits into
HKUDS:mainfrom
Teerapat-Vatpitak:pr/readurl-privacy

Conversation

@Teerapat-Vatpitak
Copy link
Copy Markdown
Contributor

Summary

Disclose the read_url third-party dependency (r.jina.ai) in the skill doc, add a no_cache opt-out, and sanitize upstream errors so the vendor name and response body are not leaked to the caller (kept in server logs).

Why

read_url silently routed URLs through a third party and could return stale cached snapshots with no disclosure or opt-out.

Changes

  • src/skills/web-reader/SKILL.md, src/tools/web_reader_tool.py

Test Plan

  • pytest --ignore=agent/tests/e2e_backtest -q green; requests mocked
  • Asserts default path byte-identical + opt-out header behavior

Checklist

  • No protected-area changes
  • No hardcoded values
  • Follows CONTRIBUTING.md

Follow-up: anchor the cached-marker detection to the header region; verify the bypass header against current Jina docs.

Teerapat-Vatpitak and others added 2 commits May 16, 2026 10:53
read_url silently sent the full URL (incl. query string) to the
third-party Jina Reader (r.jina.ai), could serve a stale cache with
no opt-out, and HTTP/exception errors leaked the vendor name and
response body (P11).

Disclose the dependency, caching/staleness, and the bash-fallback
caveat in the web-reader SKILL.md; add an opt-in no_cache flag
(x-no-cache header — default path byte-identical); surface a stale
snapshot via an additive `cached: true`; sanitize HTTP/exception
errors to a generic message (detail kept in the server log only).
A local direct-fetch fallback needs SSRF hardening first and is a
deliberate out-of-scope follow-up.

(cherry picked from commit eaf91128370dd9800195961a55baf5b123162012)
Restore the response body / exception text in read_url error returns so
operators can diagnose a remote failure (HTTP 451, transport errors, etc.)
without grepping logs. Keeps the new logger.warning lines from the parent
commit intact — both paths now surface debug info.

Adjusts the two affected regression tests to assert the debug info is
included rather than scrubbed.
@warren618 warren618 marked this pull request as ready for review May 16, 2026 06:50
@warren618 warren618 merged commit 1b547d2 into HKUDS:main May 16, 2026
1 check passed
@warren618
Copy link
Copy Markdown
Collaborator

Merged — thanks.

One tweak on top before squash: rolled back the HTTP/exc error sanitization in web_reader_tool.py (kept your logger.warning). Vibe-Trading runs as a local stdio MCP, so the caller is the user's own agent — scrubbing the body / exc text just removes debug signal. Tests updated to match.

@Teerapat-Vatpitak Teerapat-Vatpitak deleted the pr/readurl-privacy branch May 17, 2026 17:36
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request May 23, 2026
)

- SKILL.md now explicitly notes that read_url forwards the full URL to the third-party Jina Reader (r.jina.ai), and warns against passing credentials or internal URLs.
- New optional no_cache=True parameter sends x-no-cache to bypass Jina's cache when freshness matters.
- Responses that contain Jina's cached-snapshot marker now surface a cached: true flag.
- Error paths keep the upstream HTTP body and exception text in the returned error string (and also log them via logger.warning) so operators can diagnose remote failures without grepping logs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants