Skip to content

fix(core): allow web_fetch JSON fallback#5660

Open
tt-a1i wants to merge 1 commit into
QwenLM:mainfrom
tt-a1i:fix/web-fetch-json-accept
Open

fix(core): allow web_fetch JSON fallback#5660
tt-a1i wants to merge 1 commit into
QwenLM:mainfrom
tt-a1i:fix/web-fetch-json-accept

Conversation

@tt-a1i

@tt-a1i tt-a1i commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What this PR does

This PR loosens web_fetch content negotiation by adding a low-priority */*;q=0.1 fallback to every Accept header the tool sends. The existing preferences stay intact: auto still prefers markdown, then HTML, then plain text, while explicit markdown, html, and text formats still prefer their requested media type first.

It also updates the web-fetch developer docs to describe the new headers and adds coverage for JSON content returned through fallback negotiation.

Why it's needed

web_fetch currently sends only text/* media types. JSON-only APIs such as GitHub REST reject that request with HTTP 415, so the tool cannot fetch API responses even though the downstream processing path already normalizes fetched content to text.

Adding */*;q=0.1 keeps the token-saving markdown path for servers that support it, but lets JSON APIs and other non-text endpoints respond when they cannot satisfy the preferred text formats.

Reviewer Test Plan

How to verify

Run the focused unit tests and checks:

cd packages/core
npx vitest run src/tools/web-fetch.test.ts --coverage.enabled=false
cd ../..
npx prettier --check packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts docs/developers/tools/web-fetch.md
npx eslint packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts
npm run build --workspace @qwen-code/qwen-code-core
npm run typecheck --workspace @qwen-code/qwen-code-core
git diff --check

Manual content-negotiation check against the GitHub REST example from the issue:

curl -sS -o /tmp/qwen-web-fetch-json-before.out -w '%{http_code}\n' -H 'Accept: text/markdown, text/html, text/plain' https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0
curl -sS -o /tmp/qwen-web-fetch-json-after.out -w '%{http_code}\n' -H 'Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1' https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0

Evidence (Before & After)

Before: the current auto header, Accept: text/markdown, text/html, text/plain, returns HTTP 415 for the GitHub REST release endpoint.

After: the new auto header, Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1, returns HTTP 200; the response includes published_at: 2026-01-27T11:50:52Z.

Focused test result: src/tools/web-fetch.test.ts passed with 19 tests.

Tested on

OS Status
🍏 macOS ✅ tested
🪟 Windows ⚠️ not tested
🐧 Linux ⚠️ not tested

Environment (optional)

Node.js 22 local checkout. No sandbox required for the focused unit tests.

Risk & Scope

  • Main risk or tradeoff: Some servers may return a non-text content type when they cannot satisfy the preferred formats; this is intentional and low priority because */* has q=0.1, and existing processing already converts fetched content to text before prompting the model.
  • Not validated / out of scope: No change to retry behavior, timeouts, proxy handling, or the side-query summarization path.
  • Breaking changes / migration notes: None expected. Preferred markdown/HTML/plain-text behavior is preserved.

Linked Issues

Fixes #5611

AI Assistance Disclosure

I used Codex to review the changes, sanity-check the implementation against existing patterns, and help spot potential edge cases.

中文说明

What this PR does

这个 PR 放宽了 web_fetch 的内容协商逻辑:给工具发送的每一种 Accept 头都加上低优先级的 */*;q=0.1 兜底。原有偏好保持不变:auto 仍然优先请求 markdown,然后是 HTML,再然后是纯文本;显式的 markdownhtmltext 格式也仍然优先请求对应媒体类型。

同时更新了 web-fetch 开发者文档,说明新的 header,并补了一个通过兜底协商拿到 JSON 内容的测试覆盖。

Why it's needed

当前 web_fetch 只发送 text/* 媒体类型。像 GitHub REST 这样的纯 JSON API 会用 HTTP 415 拒绝这类请求,所以工具无法抓取 API 响应,尽管下游处理流程本来就会把抓取到的内容归一化成文本。

添加 */*;q=0.1 后,支持 markdown 的服务器仍然会走节省 token 的 markdown 路径;但当 JSON API 或其他非文本端点无法满足首选文本格式时,也可以正常返回内容。

Reviewer Test Plan

How to verify

运行 focused 单测和检查:

cd packages/core
npx vitest run src/tools/web-fetch.test.ts --coverage.enabled=false
cd ../..
npx prettier --check packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts docs/developers/tools/web-fetch.md
npx eslint packages/core/src/tools/web-fetch.ts packages/core/src/tools/web-fetch.test.ts
npm run build --workspace @qwen-code/qwen-code-core
npm run typecheck --workspace @qwen-code/qwen-code-core
git diff --check

针对 issue 里的 GitHub REST 示例做手动内容协商检查:

curl -sS -o /tmp/qwen-web-fetch-json-before.out -w '%{http_code}\n' -H 'Accept: text/markdown, text/html, text/plain' https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0
curl -sS -o /tmp/qwen-web-fetch-json-after.out -w '%{http_code}\n' -H 'Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1' https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0

Evidence (Before & After)

修改前:当前 auto header,Accept: text/markdown, text/html, text/plain,访问 GitHub REST release endpoint 返回 HTTP 415

修改后:新的 auto header,Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1,返回 HTTP 200;响应里包含 published_at: 2026-01-27T11:50:52Z

focused 测试结果:src/tools/web-fetch.test.ts 通过,共 19 个测试。

Tested on

OS Status
🍏 macOS ✅ tested
🪟 Windows ⚠️ not tested
🐧 Linux ⚠️ not tested

Environment (optional)

Node.js 22 本地 checkout。focused 单测不需要 sandbox。

Risk & Scope

  • 主要风险或取舍:当服务器无法满足首选格式时,可能返回非文本内容类型;这是有意的低优先级兜底,因为 */* 的权重是 q=0.1,且现有处理流程已经会在把内容交给模型前归一化为文本。
  • 未验证 / 不在范围内:不修改 retry、timeout、proxy,也不修改 side-query summarization 路径。
  • 破坏性变更 / 迁移说明:预计没有。首选 markdown/HTML/plain-text 的行为仍然保留。

Linked Issues

Fixes #5611

AI Assistance Disclosure

I used Codex to review the changes, sanity-check the implementation against existing patterns, and help spot potential edge cases.

@tt-a1i tt-a1i marked this pull request as ready for review June 22, 2026 16:46
@wenshao

wenshao commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

qwen-code-ci-bot commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the PR, @tt-a1i!

Template looks good ✓ — all required sections present including Reviewer Test Plan, Risk & Scope, Linked Issues, and AI Assistance Disclosure.

On direction: this is a clear, real bug — web_fetch sends only text/* media types, causing JSON-only APIs like GitHub REST to return HTTP 415. Adding a low-priority */*;q=0.1 fallback is standard HTTP content negotiation and squarely within the tool's core mission. Fixes #5611.

On approach: minimal and focused — four Accept header string changes in one switch, a new passthrough branch for non-HTML/non-text content types (so JSON isn't run through the HTML converter), one new test plus four header assertion updates, and matching doc updates. No scope creep, no drive-by refactors. The auto header also gets explicit q-values (text/markdown 1.0 > text/html 0.9 > text/plain 0.8 > */* 0.1), which actually strengthens markdown preference over the old implicit-equal weighting — a nice side improvement.

Moving on to code review. 🔍

中文说明

感谢贡献,@tt-a1i

模板完整 ✓ — 所有必需章节齐全,包括 Reviewer Test Plan、Risk & Scope、Linked Issues 和 AI Assistance Disclosure。

方向:这是一个明确的真实 bug —— web_fetch 只发送 text/* 媒体类型,导致 GitHub REST 等纯 JSON API 返回 HTTP 415。添加低优先级 */*;q=0.1 兜底是标准 HTTP 内容协商,完全在工具核心职责范围内。Fixes #5611

方案:最小化且聚焦 —— 一个 switch 里的四处 Accept 头字符串修改、为非 HTML/非文本内容类型新增一个直通分支(避免 JSON 被 HTML 转换器破坏)、一个新测试加四个 header 断言更新、以及配套的文档更新。没有范围蔓延,没有顺手重构。auto 头还加上了显式 q 值(text/markdown 1.0 > text/html 0.9 > text/plain 0.8 > */* 0.1),实际上强化了对 markdown 的偏好(旧版是隐式等权)—— 一个不错的附带改进。

进入代码审查 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot added type/bug Something isn't working as expected category/tools Tool integration and execution scope/web-search Web search functionality status/in-review This issue is currently in review. labels Jun 22, 2026
@qwen-code-ci-bot

qwen-code-ci-bot commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Code Review

Independent proposal (before reading the diff): I'd add */*;q=0.1 to each Accept header in getAcceptHeader(), add a passthrough branch for non-HTML content types (so application/json isn't fed to the HTML-to-text converter), pin the header strings in tests, and update docs.

The PR does exactly this. Implementation is clean and correct — four string changes in one switch, one new else if (contentType.includes('text/html')) branch that replaces the catch-all else, and one new test.

No critical issues found. One non-blocking observation: the HTML path changes from truncate-after-convert (convert(text).substring(0, MAX)) to truncate-before-convert (convert(text.substring(0, MAX))). This bounds html-to-text work on huge documents — reasonable tradeoff — but means very large HTML pages (>100 KB source) yield slightly less converted text than before.

Reuse check: the change modifies existing logic in-place, no new utilities or parallel code introduced. ✓

Automated Checks

Check Result
Unit tests (vitest run web-fetch.test.ts) ✅ 19/19 pass
Mutation test (revert source, keep PR tests) ✅ 5/19 fail — 4 header assertions + JSON passthrough
Prettier ✅ clean
ESLint ✅ clean
TypeScript (tsc --noEmit, core) ✅ 0 errors

Real-Scenario Testing (tmux)

Drove the exact content-negotiation scenario from #5611 against both a local mock JSON-only server and the real api.github.com endpoint. The Accept header is the only variable:

=== Mock JSON-only API listening on :18765 ===

--- BEFORE (old Accept header, no */* fallback) ---
[SERVER] Received Accept: text/markdown, text/html, text/plain
[SERVER] No wildcard or application/json in Accept → 415
HTTP 415 Unsupported Media Type
Result: FAILED — tool would return WEB_FETCH_FALLBACK_FAILED

--- AFTER (new Accept header with */*;q=0.1 fallback) ---
[SERVER] Received Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1
[SERVER] Accept has wildcard → 200 with JSON
HTTP 200 OK
Body: {"published_at":"2026-01-27T11:50:52Z","tag_name":"rust-v0.92.0","body":"<p>Release <b>notes</b></p>"}
Result: SUCCESS — published_at = 2026-01-27T11:50:52Z

--- REAL GitHub API (exact URL from issue #5611) ---
OLD header:
  HTTP 415 Unsupported Media Type
NEW header:
  HTTP 200 OK
  published_at = 2026-01-27T11:50:52Z

Unit tests (from worktree, PR source):

 ✓ src/tools/web-fetch.test.ts (19 tests) 77ms

 Test Files  1 passed (1)
      Tests  19 passed (19)
中文说明

代码审查

独立方案(读 diff 之前):我会在 getAcceptHeader() 的每个 Accept 头里加 */*;q=0.1,为非 HTML 内容类型加一个直通分支(避免 application/json 被 HTML 转换器破坏),在测试里锁定 header 字符串,并更新文档。

PR 完全做到了这些。实现干净正确 —— 一个 switch 里的四处字符串修改、一个新的 else if (contentType.includes('text/html')) 分支替代了原来的兜底 else、一个新测试。

未发现关键问题。 一条非阻塞观察:HTML 路径从「先转换再截断」(convert(text).substring(0, MAX)) 改为「先截断再转换」(convert(text.substring(0, MAX)))。这能限制 html-to-text 在超大文档上的开销——合理取舍——但意味着超大 HTML 页面(源 >100 KB)转换后的文本会比之前略少。

复用检查: 改动在原位修改现有逻辑,没有引入新工具或并行代码。✓

自动化检查

检查项 结果
单元测试 (vitest run web-fetch.test.ts) ✅ 19/19 通过
变异测试(还原源码,保留 PR 测试) ✅ 5/19 失败 —— 4 个 header 断言 + JSON 直通
Prettier ✅ 干净
ESLint ✅ 干净
TypeScript (tsc --noEmit,core) ✅ 0 错误

真实场景测试(tmux)

针对 #5611 的确切内容协商场景,同时测试了本地 mock JSON-only 服务器和真实的 api.github.com 端点。Accept 头是唯一变量:旧头 → 415(复现 bug),新头 → 200 并拿到 published_at = 2026-01-27T11:50:52Z(修复确认)。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

qwen-code-ci-bot commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

This is a clean, focused bug fix that does exactly what it says. The motivation is real (JSON APIs returning 415), the approach is standard HTTP content negotiation (quality-weighted */* fallback), and the implementation is minimal — four string changes in one switch statement, one new content-type branch, one new test, docs updates. No scope creep, no drive-by refactors.

The mutation test is the strongest evidence I've seen in a while: reverting only the source file while keeping the PR's test file causes exactly 5 tests to fail — the 4 header assertion tests (each diffing old vs new Accept strings) and the JSON passthrough test (which catches that the old catch-all else feeds JSON through the HTML converter, corrupting it). Restoring the PR source turns all 19 green. That's a genuine regression guard, not just a passing test suite.

The tmux test reproduces the exact bug from #5611 against the real GitHub API: old header → HTTP 415, new header → HTTP 200 with published_at = 2026-01-27T11:50:52Z. No mock needed for the ground-truth check.

My independent proposal (add */*;q=0.1, separate the HTML conversion from the catch-all, pin headers in tests) matches the PR's approach exactly. I didn't find a simpler path.

One non-blocking observation carried from Stage 2: the HTML truncation order change (truncate-before-convert) is a reasonable tradeoff but a behavior change for pages >100 KB. Worth knowing about, not worth blocking.

Verdict: approve.

中文说明

这是一个干净、聚焦的 bug 修复,完全做到了所描述的内容。动机真实(JSON API 返回 415),方案是标准 HTTP 内容协商(带权重的 */* 兜底),实现最小化 —— 一个 switch 里的四处字符串修改、一个新的内容类型分支、一个新测试、文档更新。没有范围蔓延,没有顺手重构。

变异测试是我近期见过的最有力的证据:只还原源文件、保留 PR 测试文件,恰好 5 个测试失败 —— 4 个 header 断言测试(每个都 diff 了旧 vs 新的 Accept 字符串)和 JSON 直通测试(捕获了旧的兜底 else 把 JSON 喂给 HTML 转换器导致内容被破坏)。恢复 PR 源码后 19 个全部变绿。这是真正的回归守卫,不只是测试通过。

tmux 测试在真实 GitHub API 上复现了 #5611 的确切 bug:旧头 → HTTP 415,新头 → HTTP 200 并拿到 published_at = 2026-01-27T11:50:52Z。基准事实检查不需要 mock。

我的独立方案(加 */*;q=0.1、把 HTML 转换从兜底中分离出来、在测试里锁定 header)与 PR 方案完全一致。没有找到更简单的路径。

一条非阻塞观察:HTML 截断顺序变化(先截断再转换)是合理取舍,但对 >100 KB 的页面是行为变化。值得了解,不值得阻塞。

结论:批准。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

@qwen-code-ci-bot qwen-code-ci-bot added status/ready-for-merge Ready to be merged and removed status/in-review This issue is currently in review. labels Jun 22, 2026
@wenshao

wenshao commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

✅ Local verification — real tmux E2E (Linux)

Verified as a maintainer on a real Linux box, including a full qwen CLI run driven in tmux. Conclusion: the fix is correct and resolves #5611. LGTM / safe to merge.

Environment: Linux 6.12 x86_64 · Node v22.22.2 · npm 10.9.7 · tmux 3.5a. PR branch is 1 commit behind main, but all 3 touched files are unchanged on main since the fork point and GitHub reports MERGEABLE, so testing the branch in isolation reflects post-merge behavior.


1. Real-world reproduction against the live GitHub API (the actual bug)

curl against the exact endpoint from the issue (api.github.com/.../releases/tags/rust-v0.92.0) — every Accept the tool can send, before vs after:

format Accept (before → after) before after
auto text/markdown, text/html, text/plain…;q=0.9, …;q=0.8, */*;q=0.1 415 200
markdown text/markdowntext/markdown, */*;q=0.1 415 200
html text/htmltext/html, */*;q=0.1 415 200
text text/plaintext/plain, */*;q=0.1 415 200

Confirms the diagnosis: it is purely the Accept header, and the */*;q=0.1 fallback fixes all four formats.

2. Unit tests + non-vacuity

  • vitest run src/tools/web-fetch.test.ts19/19 pass.
  • Differential (proves the tests guard the fix): reverting only web-fetch.ts to the pre-fix version makes exactly the 4 header-assertion tests fail with the precise old→new header diffs; the rest pass.
  • Minor: the new should process JSON content returned by fallback content negotiation test also passes on pre-fix code — it mocks the fetch result and asserts only that JSON reaches the model, so it covers downstream processing, not the header change. The 4 header tests are what actually pin the regression. (Non-blocking — see Notes.)

3. Static checks

prettier --check ✅ · eslint ✅ · tsc --noEmit (core) ✅ · build (core) ✅ · git diff --check
(eslint initially failed on a missing dev-dep eslint-plugin-check-file — a local-install artifact; the PR touches neither eslint config nor manifests. Clean after installing it.)

4. Real E2E #1 — the compiled shipped tool vs a 415-on-text-only server

A local origin server emulating GitHub's content-negotiation (415 unless Accept contains */*) + the actual compiled WebFetchTool from packages/core/dist (only the model side stubbed, so the fetch path is 100% real). A/B = swapping just the 4 compiled Accept strings:

Tool Target Accept the tool sent (server-logged) Result
NEW mock origin (auto) text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1 200 · JSON published_at reached model ✅
NEW mock origin (markdown/html/text) text/<type>, */*;q=0.1 200 each ✅
NEW real api.github.com (auto) (new auto header) 200 · real published_at:2026-01-27T11:50:52Z reached model ✅
OLD mock origin (auto) text/markdown, text/html, text/plain 415 · web_fetch_fallback_failed · model never called ❌
OLD real api.github.com (auto) (old header) 415 (reproduces #5611) ❌

5. Real E2E #2 — the full qwen CLI in tmux

Ran the real CLI (node packages/cli … -p "…" --approval-mode yolo) inside tmux against a mock OpenAI-compatible backend (so no model key needed); the model emits a web_fetch tool-call, the tool runs for real, the result is summarized and answered. A/B on the compiled tool:

Arm Accept origin received HTTP web_fetch → model? CLI final output
NEW …text/plain;q=0.8, */*;q=0.1 200 yes "published on 2026-01-27T11:50:52Z"
OLD text/markdown, text/html, text/plain 415 no "FETCH_FAILED — web_fetch could not retrieve the URL"

Also ran the NEW arm against the real GitHub API end-to-end: the CLI fetched the live JSON and the side-query request to the model carried the real published_at — i.e. #5611's exact scenario now works through the shipped CLI path.


Notes (all non-blocking)

  1. Test precision: consider asserting the Accept header inside the new JSON test so it guards the fix directly (today only the 4 dedicated header tests do).
  2. auto now uses graded q-values. Before, text/markdown, text/html, text/plain were all implicit q=1.0 (equal preference); now markdown 1.0 > html 0.9 > text 0.8 > */* 0.1. This strengthens the markdown preference (good for the token-saving goal), so it's an improvement — just noting the relative ordering changed, not only the catch-all addition.
  3. */*;q=0.1 admits truly binary responses. A server with no text representation can now return e.g. application/octet-stream; the tool routes non-markdown/non-plain types through html-to-text, so binary becomes garbled text rather than a hard error. This matches the PR's stated risk and the "everything is normalized to text" contract — no crash, and strictly better than the previous hard-fail.

Verdict

Behavior is correct and minimal, docs match the code, no other code path hard-codes the old header, and both the real GitHub API and a full CLI run confirm the before/after. Approve / ready to merge.

中文版(点击展开)

✅ 本地验证 —— 真实 tmux 端到端测试(Linux)

作为维护者在真实 Linux 环境完成验证,包含在 tmux 中驱动的完整 qwen CLI 运行。结论:修复正确,解决 #5611,建议合并。

环境: Linux 6.12 x86_64 · Node v22.22.2 · npm 10.9.7 · tmux 3.5a。PR 落后 main 1 个提交,但所改的 3 个文件自分叉点起在 main 上均无改动,且 GitHub 显示 MERGEABLE,因此单独测试该分支可代表合并后的行为。

1. 针对真实 GitHub API 复现(真正的 bug)

curl 访问 issue 中的确切端点,对工具能发送的每一种 Accept 做前后对比:

format Accept(修复前 → 修复后)
auto text/markdown, text/html, text/plain…;q=0.9, …;q=0.8, */*;q=0.1 415 200
markdown text/markdowntext/markdown, */*;q=0.1 415 200
html text/htmltext/html, */*;q=0.1 415 200
text text/plaintext/plain, */*;q=0.1 415 200

证实问题完全出在 Accept 头,*/*;q=0.1 兜底修复了全部四种格式。

2. 单元测试 + 非空泛性

  • vitest run src/tools/web-fetch.test.ts19/19 通过
  • 差分(证明测试确实守护本修复): 仅把 web-fetch.ts 回退到修复前,恰好那 4 个 header 断言测试失败(差异即旧→新 header),其余通过。
  • 小提示:新增的 should process JSON content… 测试在修复前的代码上也能通过——它直接 mock 抓取结果、只断言 JSON 进入模型,覆盖的是下游处理而非 header 改动。真正锁定回归的是那 4 个 header 测试。(非阻塞,见“说明”。)

3. 静态检查

prettier --check ✅ · eslint ✅ · tsc --noEmit(core)✅ · build(core)✅ · git diff --check
(eslint 起初因缺少开发依赖 eslint-plugin-check-file 报错——属本地安装产物;PR 未改 eslint 配置或清单,补装后通过。)

4. 真实端到端 #1 —— 已编译的发布产物工具 vs 仅接受文本即返回 415 的服务器

本地起一个模拟 GitHub 内容协商的源站(Accept 不含 */*415)+ 来自 packages/core/dist真实已编译 WebFetchTool(仅 stub 模型侧,抓取路径 100% 真实)。A/B 仅替换那 4 个已编译的 Accept 字符串:

工具 目标 工具实际发送的 Accept(服务器记录) 结果
模拟源站 (auto) text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1 200 · JSON published_at 抵达模型 ✅
模拟源站 (markdown/html/text) text/<type>, */*;q=0.1 各自 200 ✅
真实 api.github.com (auto) (新 auto 头) 200 · 真实 published_at:2026-01-27T11:50:52Z 抵达模型 ✅
模拟源站 (auto) text/markdown, text/html, text/plain 415 · web_fetch_fallback_failed · 模型未被调用 ❌
真实 api.github.com (auto) (旧头) 415(复现 #5611)❌

5. 真实端到端 #2 —— 在 tmux 中运行完整 qwen CLI

在 tmux 内运行真实 CLI(node packages/cli … -p "…" --approval-mode yolo),后端接一个模拟 OpenAI 兼容服务(无需真实模型 key);模型发出 web_fetch 工具调用,工具真实执行,结果被总结并作答。对已编译工具做 A/B:

分支 源站收到的 Accept HTTP web_fetch→模型? CLI 最终输出
…text/plain;q=0.8, */*;q=0.1 200 "published on 2026-01-27T11:50:52Z"
text/markdown, text/html, text/plain 415 "FETCH_FAILED —— web_fetch 无法获取该 URL"

并用“新”分支对真实 GitHub API跑了完整链路:CLI 抓到线上 JSON,发给模型的 side-query 携带了真实 published_at——即 #5611 的确切场景已在发布 CLI 路径上跑通。

说明(均为非阻塞)

  1. 测试精度: 建议在新增的 JSON 测试里也断言 Accept 头,使其直接守护本修复(目前只有那 4 个专门的 header 测试在守护)。
  2. auto 现在使用带权重的 q 值。 之前 text/markdown, text/html, text/plain 都是隐式 q=1.0(同等优先,由服务器选择);现在 markdown 1.0 > html 0.9 > text 0.8 > */* 0.1。这其实强化了对 markdown 的偏好(契合省 token 目标),属改进——仅提示相对优先级也变了,不只是加了兜底。
  3. */*;q=0.1 会接纳真正的二进制响应。 没有文本表示的服务器现在可能返回如 application/octet-stream;工具会把非 markdown/非 plain 的类型走 html-to-text,于是二进制会变成乱码文本而非直接报错。这符合 PR 自述的风险与“一切归一化为文本”的约定——不会崩溃,且优于此前的直接失败。

结论

行为正确且改动最小,文档与代码一致,没有其他代码路径硬编码旧 header,真实 GitHub API 与完整 CLI 运行都印证了前后差异。批准 / 可合并。

@wenshao

wenshao commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

✅ Maintainer verification — real local E2E test (tmux)

I verified this PR locally by driving the real WebFetchTool (built from the PR head) against both a local GitHub-like server and the live api.github.com endpoint from #5611 — not just by reading the diff. Verdict: LGTM, safe to merge. The fix resolves the 415, the new tests guard it, the token-saving markdown path is preserved, and there is no regression for non-JSON targets.

Setup

  • Dedicated git worktree at PR head 8637730c5, with a real npm ci (no symlinked node_modules); all commands driven through a real tmux session, Node v22.22.2, Linux x64.
  • The harness imports the shipped WebFetchTool from the built core/dist and runs its real execute() path (real fetchWithTimeout, real content-negotiation). Only the downstream LLM call is stubbed via a fake Config — exactly as web-fetch.test.ts does — so the test isolates the fetch behavior the PR changes.

Test 1 — Ground truth: the real GitHub API (the exact URL from #5611)

Accept: text/markdown, text/html, text/plain                          -> HTTP 415   (old auto)
Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1   -> HTTP 200   (new auto)

Confirms the issue: the bug is purely the over-narrow Accept header, and */*;q=0.1 fixes it.

Test 2 — Real WebFetchTool, end-to-end, BEFORE vs AFTER

Same tool, same JSON-only server (and same real GitHub endpoint), the only difference being this PR's one-line getAcceptHeader() change. The Accept column is the header the server actually received (captured server-side), proving the tool sends what we claim.

format BEFORE (main): header → HTTP → tool AFTER (this PR): header → HTTP → tool
auto text/markdown, text/html, text/plain415ERROR text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1200SUCCESS
markdown text/markdown415ERROR text/markdown, */*;q=0.1200SUCCESS
html text/html415ERROR text/html, */*;q=0.1200SUCCESS
text text/plain415ERROR text/plain, */*;q=0.1200SUCCESS
  • BEFORE returns WEB_FETCH_FALLBACK_FAILED for every format; against real GitHub it reproduces the issue verbatim: Error during fetch … Request failed with status code 415 Unsupported Media Type.
  • AFTER, every format succeeds and the JSON content actually reaches the model prompt. Against real GitHub, the tool now feeds published_at: 2026-01-27T11:50:52Z to the model — the exact value the HTML page couldn't provide.

Test 3 — Unit tests + A/B test-integrity

  • On the PR: web-fetch.test.ts19/19 pass, including the new should process JSON content returned by fallback content negotiation and the four should prefer … header assertions.
  • Revert only the getAcceptHeader() change back to the text/*-only strings → exactly 4 tests fail, each diffing the new header (expected) against the old one (received), e.g.:
    - "Accept": "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"   (expected, PR)
    + "Accept": "text/markdown, text/html, text/plain"                          (received, reverted)
    
    Restoring the fix turns them green again → the four header tests are real regression guards.
  • Minor note (non-blocking): the new …JSON content… test still passes under the reverted code, because it mocks fetchWithTimeout to return JSON regardless of the header — so it guards the JSON→model processing path, while the four should prefer/default tests guard the header strings. Both aspects are covered, just by different tests.

Test 4 — No regression: the token-saving markdown path is preserved

The one real risk is "could the new */* fallback make a markdown-capable server return something other than markdown?" Drove the real tool (auto) against a server that supports both text/markdown and application/json:

auto vs markdown+json server -> tool SUCCESS; served = text/markdown (token-saving path preserved ✓)

Because */* is q=0.1 (strictly the lowest), a compliant server still returns the most-preferred type it supports; */* only activates when nothing else matches. No change for normal HTML pages, markdown-capable servers, raw text files, or servers that ignore Accept.


Conclusion: the real tool now fetches JSON REST APIs (fixing #5611) across all four format values, the markdown token-saving path is untouched, and the suite genuinely guards the change. 👍

🇨🇳 中文版(合并参考)

✅ 维护者验证 —— 本地真实端到端测试(tmux)

我在本地用真实的 WebFetchTool(基于 PR head 构建)同时针对一个本地 GitHub 风格服务器#5611 里真实的 api.github.com 接口进行了验证,不只是看 diff。结论:LGTM,可以合并。 修复解决了 415、新测试能守住它、节省 token 的 markdown 路径被保留、且对非 JSON 目标没有回归。

环境

  • 在 PR head 8637730c5 上单独开了 git worktree,执行了真实 npm ci(没有软链 node_modules);所有命令都在真实 tmux 会话里跑,Node v22.22.2,Linux x64。
  • 探针从构建产物 core/dist 里 import 线上同款 WebFetchTool,跑它真实的 execute() 路径(真实 fetchWithTimeout、真实内容协商)。只有下游 LLM 调用被 fake Config 打桩——和 web-fetch.test.ts 完全一致——从而把本 PR 改动的抓取行为单独隔离出来验证。

测试 1 —— 基准事实:真实 GitHub API(#5611 的原始 URL)

Accept: text/markdown, text/html, text/plain                          -> HTTP 415   (旧 auto)
Accept: text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1   -> HTTP 200   (新 auto)

证实了 issue:bug 完全在于过窄的 Accept 头,*/*;q=0.1 修好了它。

测试 2 —— 真实 WebFetchTool 端到端,修复前 vs 修复后

同一个工具、同一个 JSON-only 服务器(以及同一个真实 GitHub 接口),唯一区别就是本 PR 对 getAcceptHeader() 的一行改动。Accept 列是服务器实际收到的头(服务器侧抓取),证明工具确实发送了我们所说的内容。

format 修复前(main):header → HTTP → tool 修复后(本 PR):header → HTTP → tool
auto text/markdown, text/html, text/plain415ERROR text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1200SUCCESS
markdown text/markdown415ERROR text/markdown, */*;q=0.1200SUCCESS
html text/html415ERROR text/html, */*;q=0.1200SUCCESS
text text/plain415ERROR text/plain, */*;q=0.1200SUCCESS
  • 修复前每种 format 都返回 WEB_FETCH_FALLBACK_FAILED;针对真实 GitHub 复现了 issue 原文:Error during fetch … Request failed with status code 415 Unsupported Media Type
  • 修复后每种 format 都成功,且 JSON 内容真正进入了模型 prompt。针对真实 GitHub,工具现在会把 published_at: 2026-01-27T11:50:52Z 喂给模型——正是 HTML 页面无法提供的那个值。

测试 3 —— 单测 + A/B 测试有效性

  • 在 PR 上:web-fetch.test.ts19/19 通过,包含新增的 should process JSON content returned by fallback content negotiation 和四个 should prefer … 头断言。
  • getAcceptHeader() 还原成 text/*-only 字符串 → 恰好 4 个测试失败,每个都把新 header(expected)和旧 header(received)做了 diff,例如:
    - "Accept": "text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1"   (expected, PR)
    + "Accept": "text/markdown, text/html, text/plain"                          (received, 还原后)
    
    恢复修复后又变绿 → 这四个 header 测试是真正的回归守卫。
  • 小提示(不阻塞):新增的 …JSON content… 测试在还原代码后仍然通过,因为它把 fetchWithTimeout mock 成无论 header 如何都返回 JSON——所以它守的是 JSON→模型的处理路径,而四个 should prefer/default 测试守的是 header 字符串。两方面都覆盖了,只是由不同测试负责。

测试 4 —— 无回归:节省 token 的 markdown 路径被保留

唯一的真实风险是"新的 */* 兜底会不会让支持 markdown 的服务器返回 markdown 以外的东西?"用真实工具(auto)针对一个同时支持 text/markdownapplication/json 的服务器:

auto vs markdown+json server -> tool SUCCESS; served = text/markdown(节省 token 的路径被保留 ✓)

因为 */* 的权重是 q=0.1(严格最低),合规服务器仍会返回它能提供的最优先类型;*/* 只在没有其它匹配时才生效。普通 HTML 页面、支持 markdown 的服务器、原始文本文件、以及忽略 Accept 的服务器都不受影响。


结论: 真实工具现在能在全部四种 format 下抓取 JSON REST API(修复 #5611),markdown 节省 token 路径未受影响,且测试套件确实守住了这个改动。👍

headers: new Headers({ 'content-type': 'application/json' }),
text: () =>
Promise.resolve(
JSON.stringify({ published_at: '2026-01-27T11:50:52Z' }),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The JSON fallback test uses a fixture with no angle brackets or HTML entities, so it doesn't exercise the corruption path. In web-fetch.ts:128-135, the else branch feeds non-markdown, non-plain-text content through html-to-text's convert(). With the new */*;q=0.1 Accept header, application/json responses now hit this branch.

I verified empirically that convert() corrupts JSON in two ways:

  • Angle bracket stripping: convert('{"body":"Fix <script> injection"}'){"body":"Fix — everything after <script> is consumed as an HTML element and discarded
  • Entity decoding: convert('{"desc":"Use &amp; for ampersand"}'){"desc":"Use & for ampersand"}

This directly undermines the PR's goal of supporting JSON-only APIs. Real-world API responses (GitHub releases, CMS APIs) routinely contain HTML in string values.

Suggested fix — source: Flip the else branch to only convert actual HTML:

} else if (contentType.includes('text/html')) {
  this.debugLogger.debug('[WebFetchTool] Converting HTML to text');
  textContent = convert(responseText, { ... }).substring(0, MAX_CONTENT_LENGTH);
} else {
  this.debugLogger.debug(`[WebFetchTool] Passing through ${contentType || 'unknown'} content as text`);
  textContent = responseText.substring(0, MAX_CONTENT_LENGTH);
}

Suggested fix — test: Use a fixture with angle brackets and HTML entities:

text: () => Promise.resolve(JSON.stringify({
  body: '<p>Release <b>notes</b></p>',
  desc: 'Use &amp; for ampersand',
})),

Then assert expect(receivedContent).toContain('<p>') and expect(receivedContent).toContain('&amp;').

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covered this in the latest push. The JSON fixture now includes <p>... and &amp;, and the assertion checks those survive unchanged so this catches the html-to-text corruption path.

case 'auto':
default:
return 'text/markdown, text/html, text/plain';
return 'text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The */* fallback invites large non-HTML responses (JSON API payloads, XML feeds) into the else branch at line 128, which calls convert(responseText, {...}).substring(0, MAX_CONTENT_LENGTH) — running html-to-text on the full response body before truncating. The markdown and plain-text branches truncate first (responseText.substring(0, MAX_CONTENT_LENGTH)). A multi-MB JSON response would be fully parsed by the HTML converter before being truncated to 100K chars, wasting CPU and memory.

Suggested change
return 'text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1';
return 'text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1';

No source change needed here, but the else branch should truncate before converting for consistency:

textContent = convert(responseText.substring(0, MAX_CONTENT_LENGTH), { ... });

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this path. JSON/unknown content types now skip html-to-text and pass through as text, while text/html is the only branch that converts. I also changed the JSON fixture to include <p> and &amp; so the test catches conversion corruption.

@wenshao

wenshao commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

✅ Local verification report (maintainer)

Built and tested this PR locally to confirm the fix before merge.

Environment

  • macOS (Darwin 25.5), Node v22.22.2
  • PR head 8637730c5MERGEABLE, web-fetch.ts has no drift vs current origin/main, so the PR head is a faithful base.
  • Method: isolated git worktree at the PR head with a clean npm ci + core build, then verified each claim independently.

Results

Check Command Result
Target tests vitest run tools/web-fetch.test.ts ✅ 19/19 pass
Types tsc --noEmit (core) ✅ exit 0, 0 errors
Lint / Format eslint / prettier --check (src + test + docs) ✅ clean

Mutation test (the key evidence). I reverted web-fetch.ts to main while keeping the new tests — exactly the 4 Accept-header assertion tests fail (default/auto, markdown, html, text), because the old headers don't include the */* fallback:

× should default to auto format when not specified
× should prefer markdown when format is markdown
× should prefer HTML when format is html
× should prefer plain text when format is text

The other 15 pass. This confirms the header change is genuinely pinned.

Correctness review

  • The fix appends a low-priority */*;q=0.1 fallback to every Accept header, and gives auto an explicit quality order — text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1. So a JSON-only endpoint (e.g. the GitHub REST API from web_fetch can't fetch JSON APIs — fails with HTTP 415 because it only sends text/* Accept headers #5611) can serve application/json instead of returning HTTP 415, while the existing preference (markdown > html > text) is preserved. Docs and the tool description are updated consistently (request only Xprefer X).

Minor / non-blocking observation. The added should process JSON content returned by fallback content negotiation test mocks fetchWithTimeout to return JSON regardless of the Accept header, so it documents the end-to-end "JSON is processed" outcome but does not itself exercise the 415/negotiation path — the 4 header-assertion tests are the real regression guard for the change. (Verifying the actual 415-avoidance would need a live server, which is reasonably out of scope for a unit test.)

Verdict: LGTM. Correct, minimal, mutation-verified; merges cleanly, type/lint/format clean. Safe to merge from my side.

🇨🇳 中文版

✅ 本地验证报告(维护者)

合并前在本地构建并测试了本 PR 以确认修复。

环境

  • macOS(Darwin 25.5),Node v22.22.2
  • PR head 8637730c5 —— MERGEABLEweb-fetch.ts 相对当前 origin/main 无漂移,PR head 是忠实基线。
  • 方法:在 PR head 上建独立 git worktree,做干净的 npm ci + core 构建,再逐条独立验证。

结果

检查项 命令 结果
目标测试 vitest run tools/web-fetch.test.ts ✅ 19/19 通过
类型 tsc --noEmitcore ✅ exit 0,0 错误
Lint / 格式 eslint / prettier --check(源 + 测试 + 文档) ✅ 干净

变异测试(关键证据)。 我把 web-fetch.ts 还原成 main,保留新测试 —— 恰好 4 个 Accept 头断言测试失败(default/auto、markdown、html、text),因为旧的头不含 */* 回退:

× should default to auto format when not specified
× should prefer markdown when format is markdown
× should prefer HTML when format is html
× should prefer plain text when format is text

其余 15 个通过。说明头部改动被真正锁定。

正确性审查

次要 / 非阻塞观察。 新增的 should process JSON content returned by fallback content negotiation 测试把 fetchWithTimeout mock 成无论 Accept 头如何都返回 JSON,所以它记录的是端到端「JSON 被处理」这个结果,本身并不走 415/内容协商那条路 —— 真正给这个改动兜底的回归保护是那 4 个头断言测试。(要验证真正的避免-415 需要真实服务器,作为单测合理地超出范围。)

结论:LGTM。 正确、最小化、经变异测试验证;可干净合并,类型/lint/格式干净。从我这边看可以安全合并。

@doudouOUC doudouOUC left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No review findings on this revision. The prior Critical (JSON corrupted by html-to-text) and Suggestion are both correctly resolved. Downgraded from Approve to Comment: CI failing (Test on macos-latest, Node 22.x).

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@wenshao

wenshao commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Maintainer local verification — real build + wire-level A/B ✅

I verified this PR locally with the real WebFetchTool source (not mocks of the tool) driven against a local mock server that emulates a JSON-only API like GitHub REST (returns 415 when the Accept header has no wildcard, 200 + application/json when it does). This reproduces the issue's Before/After deterministically, without depending on api.github.com.

Verdict: behaves exactly as described — both the content-negotiation fix and the JSON passthrough work. Looks mergeable. A couple of non-blocking observations below.

Environment

  • PR head 043965f7b in an isolated worktree; base = merge-base d350dd8df. Node v22.22.2, Linux.
  • Method: ran the actual web-fetch.ts via tsx (real fetchWithTimeout, real Accept header on the wire). The only stub is the downstream model call (getBaseLlmClient().generateText), used to capture the exact textContent the tool extracts — the summarization path is unchanged by this PR.

Real wire-level A/B (same harness, web-fetch.ts swapped base↔PR)

Dimension BASE (pre-PR) PR 043965f7b
auto Accept header on the wire text/markdown, text/html, text/plain text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1
JSON API that 415s on text-only HTTP 415 → web_fetch_fallback_failed (never reaches the model) HTTP 200 → success, model received the body and answered published_at = 2026-01-27T11:50:52Z
JSON body handling (forced 200) ❌ run through the HTML converter → mangled: Notes: Fix & ship; keep spacing. (<b>…</b> stripped, &amp;&, double space collapsed) passed through raw/intact: Notes: <b>Fix</b> &amp; ship; keep spacing.

So the headline 415 case is fixed by the low-priority */*;q=0.1, and the new else-passthrough branch stops the HTML converter from corrupting non-HTML payloads. Both confirmed on real wire traffic.

Automated checks (on the real PR source)

  • Unit testsvitest run src/tools/web-fetch.test.ts: 19/19 passed.
  • Prettier--check on all 3 changed files: clean.
  • Type-check0 type errors in the PR's files (web-fetch.ts, web-fetch.test.ts). (The full-package tsc --build reports errors only in unrelated src/providers/__tests__/* files — a pre-existing artifact of my divergent-dist worktree setup, not caused by this PR.)
  • ⚠️ ESLint — not runnable in my environment (eslint-plugin-check-file is missing from node_modules, so the flat config fails to load anywhere). Formatting is covered by Prettier above; CI lint still applies.

Observations (non-blocking, optional follow-ups)

  1. Truncate-before-convert for HTML. This PR changes convert(responseText).substring(0, MAX)convert(responseText.substring(0, MAX), …). For HTML pages larger than MAX_CONTENT_LENGTH (100 KB) this yields somewhat less text than before (you get the converted text of the first 100 KB of source HTML, rather than the first 100 KB of fully-converted text). It's a reasonable trade — it bounds html-to-text work on huge documents — but it is a behavior change for very large pages and isn't covered by a test.
  2. Markup content types other than text/html now pass through raw. e.g. application/xhtml+xml previously went through the HTML→text converter and now hits the new passthrough branch (raw markup to the model). Rare in practice, and the q=0.1 weight makes servers prefer the text formats, so this is low-impact — just noting it as an edge of the new branch.

Neither blocks merge. Nice, focused fix with a real-world payoff for JSON APIs. 🚀

中文版(点击展开)

维护者本地验证 —— 真实构建 + 线级 A/B ✅

我在本地用真实的 WebFetchTool 源码(没有 mock 工具本身)对一个本地 mock 服务器做了验证:该服务器模拟 GitHub REST 这类纯 JSON API 的行为(当 Accept 头里没有通配符时返回 415,有通配符时返回 200 + application/json)。这样可以确定性地复现 issue 里的前后对比,不依赖 api.github.com

结论:行为与描述完全一致 —— 内容协商修复和 JSON 直通都生效,可以合并。 下面有两条不阻塞的观察。

环境

  • PR head 043965f7b,独立 worktree;base = merge-base d350dd8df。Node v22.22.2,Linux。
  • 方法:用 tsx 直接跑真实的 web-fetch.ts(真实 fetchWithTimeout、真实线上 Accept 头)。唯一被 stub 的是下游模型调用(getBaseLlmClient().generateText),用来捕获工具实际抽取出来的 textContent —— 摘要路径本身不在本 PR 改动范围内。

真实线级 A/B(同一套 harness,仅在 base↔PR 之间替换 web-fetch.ts

维度 BASE(PR 之前) PR 043965f7b
auto 模式实际发出的 Accept 头 text/markdown, text/html, text/plain text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1
对 text-only 返回 415 的 JSON API HTTP 415 → web_fetch_fallback_failed(根本到不了模型) HTTP 200 → 成功,模型拿到 body 并回答 published_at = 2026-01-27T11:50:52Z
JSON body 的处理(强制 200) ❌ 被丢进 HTML 转换器 → 被破坏Notes: Fix & ship; keep spacing.<b>…</b> 被剥掉、&amp;&、连续空格被合并) 原样直通Notes: <b>Fix</b> &amp; ship; keep spacing.

所以核心的 415 场景由低优先级的 */*;q=0.1 修复,新增的 else 直通分支也避免了 HTML 转换器破坏非 HTML 内容。两点都在真实线上流量中得到确认。

自动化检查(针对真实 PR 源码)

  • 单元测试 —— vitest run src/tools/web-fetch.test.ts19/19 通过
  • Prettier —— 对 3 个改动文件 --check:干净。
  • 类型检查 —— PR 自身文件(web-fetch.tsweb-fetch.test.ts)0 类型错误。(整包 tsc --build 仅在无关的 src/providers/__tests__/* 报错,这是我 worktree 的 dist 版本不一致导致的既有现象,与本 PR 无关。)
  • ⚠️ ESLint —— 在我的环境里跑不起来(node_moduleseslint-plugin-check-file,flat config 无法加载)。格式问题已由上面的 Prettier 覆盖;CI 的 lint 仍然有效。

观察(不阻塞,可选的后续)

  1. HTML 改为「先截断再转换」。 本 PR 把 convert(responseText).substring(0, MAX) 改成 convert(responseText.substring(0, MAX), …)。对超过 MAX_CONTENT_LENGTH(100 KB)的 HTML 页面,这会比之前拿到更少的文本(拿到的是「源 HTML 前 100 KB」转换后的文本,而不是「完整转换后文本」的前 100 KB)。这是个合理的取舍——能限制 html-to-text 在超大文档上的开销——但对超大页面是一种行为变化,且没有测试覆盖。
  2. text/html 外的标记类内容类型现在会原样直通。 例如 application/xhtml+xml 之前会走 HTML→文本转换,现在落到新的直通分支(把原始标记交给模型)。实际很少见,而且 q=0.1 的权重让服务器更倾向返回文本格式,所以影响很小——只是指出这是新分支的一个边界。

两点都不阻塞合并。是个聚焦、对 JSON API 很实用的小修复。🚀

Verified locally by the maintainer with a real source-level wire harness (build + 19/19 unit tests + base↔PR A/B). Posted via Qwen Code.

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category/tools Tool integration and execution scope/web-search Web search functionality status/ready-for-merge Ready to be merged type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

web_fetch can't fetch JSON APIs — fails with HTTP 415 because it only sends text/* Accept headers

5 participants