fix(agent): decode \uXXXX (+surrogate pairs) in CDP string results #132

Open
vmemjp wants to merge 1 commit into justrach:main from vmemjp:fix/unescape-json-uxxxx
vmemjp commented Apr 22, 2026

Summary

unescapeJson in src/agent_main.zig passed \uXXXX sequences through as
literals, so kuri-agent eval / text / autoSnap emitted
\u30b3\u30f3... instead of コンテンツ on any non-ASCII page.

Fix

Moved the helper into src/util/json.zig as jsonUnescape:

  • Parse \uXXXX → UTF-8 via std.unicode.utf8Encode
  • Handle UTF-16 surrogate pairs
  • Handle \b \f \r (matching jsonEscape in the same file)
  • Permissive on malformed input (matches jsonEscape's style)

Three call sites in agent_main.zig (autoSnap, cmdEval, cmdText)
migrate to the shared helper.
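
The decoding steps listed above can be illustrated with a minimal sketch. This is not the PR's actual `jsonUnescape`: the names `unescapeSketch` and `parseHex4` are hypothetical, and the sketch assumes a caller-provided output buffer at least as long as the input (decoded output is never longer than the escaped input). Malformed escapes, including unpaired surrogates, are copied through literally, which is one possible reading of "permissive".

```zig
const std = @import("std");

// Hypothetical helper: parse four hex digits at s[at..at+4], or null if malformed.
fn parseHex4(s: []const u8, at: usize) ?u16 {
    if (at + 4 > s.len) return null;
    var v: u16 = 0;
    for (s[at .. at + 4]) |ch| {
        const d = std.fmt.charToDigit(ch, 16) catch return null;
        v = v * 16 + d;
    }
    return v;
}

// Hypothetical sketch: decode JSON string escapes from `input` into `out`.
// Assumes out.len >= input.len. Returns the written prefix of `out`.
fn unescapeSketch(input: []const u8, out: []u8) []u8 {
    var i: usize = 0; // read position in input
    var n: usize = 0; // write position in out
    while (i < input.len) {
        const c = input[i];
        if (c != '\\' or i + 1 >= input.len) {
            out[n] = c;
            n += 1;
            i += 1;
            continue;
        }
        const e = input[i + 1];
        if (e == 'u') {
            if (parseHex4(input, i + 2)) |hi| {
                var cp: u21 = hi;
                var consumed: usize = 6;
                // A high surrogate must be followed by \uDC00..\uDFFF to form one code point.
                if (hi >= 0xD800 and hi <= 0xDBFF and i + 7 < input.len and
                    input[i + 6] == '\\' and input[i + 7] == 'u')
                {
                    if (parseHex4(input, i + 8)) |lo| {
                        if (lo >= 0xDC00 and lo <= 0xDFFF) {
                            cp = 0x10000 + ((@as(u21, hi) - 0xD800) << 10) + (@as(u21, lo) - 0xDC00);
                            consumed = 12;
                        }
                    }
                }
                // utf8Encode rejects lone surrogates; those fall through as malformed.
                if (std.unicode.utf8Encode(cp, out[n..])) |len| {
                    n += len;
                    i += consumed;
                    continue;
                } else |_| {}
            }
            // Malformed \u sequence: keep the backslash literally and move on.
            out[n] = c;
            n += 1;
            i += 1;
            continue;
        }
        const simple: ?u8 = switch (e) {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'b' => 0x08,
            'f' => 0x0C,
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            else => null,
        };
        if (simple) |b| {
            out[n] = b;
            n += 1;
            i += 2;
        } else {
            // Unknown escape: copy the backslash; the next byte is copied on the next pass.
            out[n] = c;
            n += 1;
            i += 1;
        }
    }
    return out[0..n];
}

test "unescapeSketch: BMP, surrogate pair, malformed" {
    var buf: [64]u8 = undefined;
    try std.testing.expectEqualStrings("コン", unescapeSketch("\\u30b3\\u30f3", &buf));
    try std.testing.expectEqualStrings("😀", unescapeSketch("\\ud83d\\ude00", &buf));
    try std.testing.expectEqualStrings("\\u12", unescapeSketch("\\u12", &buf));
}
```

Writing into a fixed buffer keeps the sketch independent of allocator and `ArrayList` API details, which differ between Zig releases; the real helper may well allocate instead.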

Tests

11 inline tests in src/util/json.zig covering basic escapes, BMP
\uXXXX, surrogate pairs, and five malformed shapes. zig build test
passes.

Measured impact

kuri-agent eval document.body.innerText on the Japanese Wikipedia
article for Zig (ja.wikipedia.org/wiki/Zig_(プログラミング言語)),
built with -Doptimize=ReleaseFast:

| Binary      | Output       | Chars  | Wall time (median of 3) |
|-------------|--------------|--------|-------------------------|
| main        | 15,753 bytes | 15,753 | 47 ms                   |
| this branch | 9,814 bytes  | 5,868  | 46 ms                   |
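
As a rough illustration of why both the byte and character counts drop, the following sketch compares the escaped and decoded forms of the word from the Summary. It uses only string literals and `std.unicode.utf8CountCodepoints`; it does not call the PR's code. Each BMP escape occupies 6 ASCII bytes, while the decoded katakana takes 3 UTF-8 bytes and counts as one character. A real page mixes ASCII and non-ASCII text, so the overall byte ratio in the table is smaller than the 2× seen here.

```zig
const std = @import("std");

test "escaped vs decoded size for one Japanese word" {
    // Five \uXXXX escapes, as the old output path would have emitted them.
    const escaped = "\\u30b3\\u30f3\\u30c6\\u30f3\\u30c4";
    // The same five code points after decoding.
    const decoded = "コンテンツ";

    // Escaped form: 5 escapes * 6 ASCII bytes each.
    try std.testing.expectEqual(@as(usize, 30), escaped.len);
    // Decoded form: 5 code points * 3 UTF-8 bytes each.
    try std.testing.expectEqual(@as(usize, 15), decoded.len);
    try std.testing.expectEqual(@as(usize, 5), try std.unicode.utf8CountCodepoints(decoded));
}
```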

Scope

Only the kuri-agent output path. src/crawler/fetcher.zig has a parallel
unescapeJson with the same gap — happy to migrate it in a follow-up.
Validated with Zig 0.15.2 (build.zig.zon declares 0.15.0 minimum).

justrach (Owner) commented

Can you update this PR to build and test against Zig 0.16.0? The active release/CI path is now targeting 0.16.0, so it would help to rebase and rerun on that toolchain before review.

`unescapeJson` in `src/agent_main.zig` passed `\uXXXX` through as
literal escape sequences, so `kuri-agent eval` / `text` / `autoSnap`
emitted `\u30b3\u30f3...` instead of `コンテンツ` on any non-ASCII
page.

Moved the helper into `src/util/json.zig` as `jsonUnescape`, added
`\b \f \r \uXXXX` + UTF-16 surrogate pair handling (permissive on
malformed input, matching `jsonEscape`'s style), and migrated the
five call sites in `agent_main.zig` (`autoSnap`, `cmdEval`, `cmdText`,
`cmdHeaders`, `cmdAudit`). 11 new inline tests; `zig build` succeeds
and `src/util/json.zig` tests pass on Zig 0.16.0.

Measured on ja.wikipedia.org/wiki/Zig_(プログラミング言語) with
`-Doptimize=ReleaseFast`: `kuri-agent eval document.body.innerText`
went from 15,753 → 9,814 bytes (1.60× fewer bytes, 2.68× fewer chars),
wall time 47 → 46 ms (median of 3).

Rebased onto upstream/main (e1cc5c4) which migrated to Zig 0.16.
Merged against the new `compat.writeToStdout` stdout helper.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vmemjp force-pushed the fix/unescape-json-uxxxx branch from 88c7ffe to 0134fd0 on April 23, 2026 06:52

vmemjp commented Apr 23, 2026

Rebased onto upstream/main (e1cc5c4) — the Zig 0.16 migration from #134 and
release/0.3.1 have landed in the meantime. Current HEAD: 0134fd0.

Scope note

This PR grew from 3 → 5 call sites. The upstream merges added two new sites
(cmdHeaders, cmdAudit) that also depended on the inline unescapeJson
helper; since this PR removes that helper and moves its replacement into
src/util/json.zig, they had to migrate in the same commit. Output paths
also moved to compat.writeToStdout(...) from the 0.16 migration.

Verified on Zig 0.16.0

  • zig build -Doptimize=ReleaseFast — all five binaries build
    (kuri, kuri-agent, kuri-browse, kuri-fetch, merjs-e2e)
  • zig test src/util/json.zig — 13/13 pass (2 jsonEscape + 11 jsonUnescape)

Ready for review.
