Commit ee2abcd

Merge remote-tracking branch 'origin/main' into oss/fix-1208-ask-user-layout
# Conflicts:
#	autoplan/SKILL.md
#	canary/SKILL.md
#	codex/SKILL.md
#	context-restore/SKILL.md
#	context-save/SKILL.md
#	cso/SKILL.md
#	design-consultation/SKILL.md
#	design-html/SKILL.md
#	design-review/SKILL.md
#	design-shotgun/SKILL.md
#	devex-review/SKILL.md
#	document-release/SKILL.md
#	health/SKILL.md
#	investigate/SKILL.md
#	land-and-deploy/SKILL.md
#	landing-report/SKILL.md
#	learn/SKILL.md
#	office-hours/SKILL.md
#	open-gstack-browser/SKILL.md
#	pair-agent/SKILL.md
#	plan-ceo-review/SKILL.md
#	plan-design-review/SKILL.md
#	plan-devex-review/SKILL.md
#	plan-eng-review/SKILL.md
#	plan-tune/SKILL.md
#	qa-only/SKILL.md
#	qa/SKILL.md
#	retro/SKILL.md
#	review/SKILL.md
#	scrape/SKILL.md
#	scripts/resolvers/preamble/generate-ask-user-format.ts
#	setup-deploy/SKILL.md
#	setup-gbrain/SKILL.md
#	ship/SKILL.md
#	skillify/SKILL.md
#	sync-gbrain/SKILL.md
#	test/fixtures/golden/claude-ship-SKILL.md
#	test/fixtures/golden/codex-ship-SKILL.md
#	test/fixtures/golden/factory-ship-SKILL.md
2 parents: 8d6bf71 + 7489506 (commit ee2abcd)

68 files changed

Lines changed: 1172 additions & 94 deletions

Some content is hidden: large commits hide some files by default, so not all 68 changed files appear below.

CHANGELOG.md

Lines changed: 67 additions & 0 deletions
@@ -1,5 +1,72 @@
 # Changelog

+## [1.32.0.0] - 2026-05-10
+
+## **Seven contributor PRs land. Three are security or hardening.**
+## **Root-token comparison, IPv6 link-local, NUL transcripts, sidebar tabs, build resilience, model IDs, CJK escape — all fixed in one wave.**
+
+Seven community PRs land together, hand-picked through `/plan-eng-review` plus a Codex outside-voice review that reshaped the wave mid-flight. The headline fixes are real: the root-token authentication path no longer throws on a multibyte input that matches the JS string length but not the UTF-8 byte length, direct `http://[fe80::N]/` URLs are now rejected the same way ULA addresses already were, `gstack-memory-ingest` strips NUL bytes from pasted transcript content before `gbrain put` so Postgres doesn't reject the write, and the build script no longer fails when run on a fresh worktree with no git HEAD yet.
+
+Two PRs in the original 9-PR plan got moved to follow-up reviews after Codex caught load-bearing problems: the SVG-XSS fix (#1153) needs a sanitizer integration rebuild, and the hook-command variable swap (#1141) needs runtime verification in plugin + dev-symlink modes. Both will land as their own PRs.
+
+### The numbers that matter
+
+Diff against `main` at v1.31.1.0, measured from the seven landed PRs after eng + Codex review reshaping. The wave is intentionally repo-local — no new dependencies, no risky integration changes.
+
+| Metric | v1.31.1.0 | v1.32.0.0 | Δ |
+|---|---|---|---|
+| Community PRs landed | 3 | 7 | **+4** |
+| Security / hardening fixes | 0 | 3 | **+3** |
+| Behavior changes that ship to users | 1 | 7 | **+6** |
+| Free tests | 379 | 380 | +1 |
+| Memory-ingest tests | 18 | 19 | +1 |
+| LOC (excluding mechanical regen) | — | ~150 | — |
+| SKILL.md files regenerated (CJK preamble cascade) | — | 35 | — |
+| Preamble byte budget | 36,500 | 39,000 | +2,500 |
+
+The seven shipped PRs cover three categories. **Security:** root-token UTF-8 compare hardened, IPv6 link-local blocked, sidebar tab awareness expanded. **Correctness:** gbrain ingestion tolerates pasted-NUL transcripts, build resilient to unborn HEAD. **Polish:** AskUserQuestion preamble forbids `\uXXXX` escaping of CJK characters, eval suite tracks the current Opus model ID.
+
+### What this means for users
+
+If you run `pair-agent` and someone hits your tunnel with a multibyte token guess whose JS string length happens to match the root token's, the auth path returns false instead of crashing. If a transcript you ingest into `gbrain` has a NUL byte in pasted output, the write succeeds instead of returning `invalid byte sequence`. If you run `bun run build` on a brand-new Conductor worktree before the first commit, the build runs to completion. If your sidebar agent watches a tab on a non-localhost site, it now actually sees the URL and title. If you work with Claude in Chinese, long AskUserQuestion prompts stop arriving as `\u`-escaped codepoints rendered as nonsense glyphs.
+
+### Itemized changes
+
+#### Added
+
+- **#1257** Extension manifest gets the `tabs` permission. Sidebar tab awareness off-localhost now works — `chrome.tabs.query()` returns real `url`/`title` for sites outside `host_permissions` instead of undefined, so `snapshotTabs` writes real values into `tabs.json` and `active-tab.json` instead of silently skipping. Heads up: this widens the extension's permission scope; users will see the broader prompt on next install. Contributed by @fredchu.
+
+#### Fixed
+
+- **#1416** `isRootToken` constant-time compare hardened. Compares UTF-8 byte lengths via `Buffer.byteLength` before `crypto.timingSafeEqual`, which throws on length-mismatched buffers. A multibyte input whose JS string length matches but byte length differs now returns false instead of crashing on the auth path. Four regression tests cover multibyte byte-length mismatch, same-prefix length mismatch (extra or missing trailing characters), same-length last-byte flip, and empty input against a set root. Contributed by @RagavRida.
+- **#1411** `gstack-memory-ingest` strips NUL bytes from the transcript body before piping to `gbrain put`. Postgres rejects 0x00 in UTF-8 text columns, and some Claude Code transcripts contain NUL inside pasted content or tool output. The fix uses `body.replace(/\x00/g, "")` so the regex literal stays reviewable in diffs and survives editors that strip control bytes. New regression test reuses the existing fake-gbrain writer harness at `test/gstack-memory-ingest.test.ts:376`. Contributed by @billy-armstrong.
+- **#1249** URL validation now blocks direct IPv6 link-local navigation. `fe80::/10` is centralised into `BLOCKED_IPV6_PREFIXES = ['fc', 'fd', 'fe8', 'fe9', 'fea', 'feb']` so `http://[fe80::N]/` is rejected by the same path that already blocked ULA addresses. Previously the link-local guard only fired during AAAA resolution; direct-literal URLs slipped through. Contributed by @hiSandog.
+- **#1207** `bun run build` resilient to missing git HEAD. The three chained `.version` writes (`browse/dist`, `design/dist`, `make-pdf/dist`) each now use `{ git rev-parse HEAD 2>/dev/null || true; } > ...`, so an unborn HEAD produces an empty file. `readVersionHash` already returns null on empty/trim, and the CLI's stale-binary check short-circuits on null — the "no version known" path flows through existing null handling without polluting `state.binaryVersion` with a sentinel string. Contributed by @topitopongsala.
+- **#1205** AskUserQuestion preamble forbids `\uXXXX` escaping of non-ASCII characters. Adds rule 12 plus a self-check item: models that hand-escape CJK strings get codepoints wrong, so `管理工具` ends up rendered as `㄃3用箱`. Long ≠ escape. Keep characters literal. The new rule cascades through the gen-skill-docs pipeline; 35 SKILL.md files regenerate to pick it up. Contributed by @joe51317-dotcom.
+- **#1392** Mechanical bump of remaining `claude-opus-4-6` → `4-7` references across the E2E eval suite. Covers `test/helpers/eval-store.ts` and five `test/skill-e2e-*.test.ts` files. Contributed by @johnnysoftware7.
+
+#### For contributors
+
+- The AskUserQuestion preamble byte budget ratchets from 36,500 → 39,000 to absorb the new CJK rule (rule 12 + self-check item). Generated SKILL.md files for all 35 tier-≥2 skills regenerate as a single mechanical commit.
+- Two PRs from the original 9-PR plan moved to follow-up reviews after Codex outside-voice caught load-bearing problems: #1153 (SVG sanitizer) needs the sanitizer integration rebuilt against the current `setTabContent` boundary in `browse/src/write-commands.ts:319` (the original PR removed `.svg` from the allowlist; the right fix is to keep it allowed and sanitize via DOMPurify before `setTabContent`). #1141 (CLAUDE_PLUGIN_ROOT) needs runtime verification in both plugin-installed and dev-symlink modes plus scope expansion to the non-frontmatter shell snippet at `investigate/SKILL.md.tmpl:107`.
+- Five gate-tier evals hardened against non-determinism / TTY rendering quirks after the wave's first `test:gate` run surfaced them as flakes (verified pre-existing on `main`, then fixed): `office-hours-builder-wildness` retiers `gate` → `periodic` because LLM-judge creativity scoring belongs in periodic per the tier-classification rules. `plan-design-with-ui` AUQ-detection tail expands 2.5KB → 5KB so the full Step 0 box-rendered AUQ fits inside the regex window. `ask-user-question-format-compliance` budget stretches 300s → 540s (poll), 360s → 600s (PTY session), 420s → 660s (bun wrapper) to accommodate `/plan-ceo-review`'s multi-bash-block preamble on substantive branches. `benchmark-providers` gemini smoke drops the brittle `toContain('ok')` assertion in favor of a shape check on the adapter result. `skillify` scrape-prototype-path accepts JSON shape variants (`results`, `data`, `hits`, bare arrays of `{title, score}` objects) instead of grepping for the literal `"items":[` key.
+- Housekeeping: the three source PRs absorbed into v1.31.1.0 (#1242, #1394, #1393) get closed with credit comments pointing at the merge SHA.
+
+## [1.31.1.0] - 2026-05-10
+
+## **Three small community fixes land cleanly.**
+## **`/careful` works on macOS again, Codex Step 0 stops colliding, `/make-pdf` setup runs in the right place.**
+
+A short patch wave from three contributors. macOS users who ran `/careful` with `rm -rf node_modules` were silently hitting the warning gate instead of the safe exception path because BSD sed doesn't understand `\s`. The Codex skill's `## Step 0: Check codex binary` header was colliding with the platform-detect prelude that also runs first. `/make-pdf`'s SETUP block was rendered after the Telemetry footer instead of immediately after the Preamble Bash, so `$P` could be referenced before it was set. Each fix is tightly scoped and ships with a regression test (or template ordering invariant) that catches the original failure shape.
+
+This release came out of a contributor-wave triage pass that closed ~75 stale PRs, dropped 11 candidates that needed focused review with specific feedback to each contributor, and ran the survivors through `/plan-eng-review` + Codex outside-voice review before merge. One additional security PR (token-registry timing-safe comparison) was rejected at the codex-review gate after Codex caught a subtle multi-byte UTF-8 buffer-mismatch bug that would have thrown on the auth path instead of returning false; that finding now lives as feedback on the original PR.
+
+### Fixed
+
+- **#1242** `careful/bin/check-careful.sh` uses `[[:space:]]` instead of `\s` in the safe-rm exception regex. macOS sed -E does not support `\s`, which silently broke the exception detection — `rm -rf node_modules` now correctly skips the warning gate on macOS, matching Linux behavior. Removes the `detectSafeRmWorks()` platform-conditional from `test/hook-scripts.test.ts` so both platforms are tested at the same bar. Contributed by @ToraDady.
+- **#1394** Codex skill `## Step 0: Check codex binary` renamed to `## Step 0.4: Check codex binary` so the header no longer collides with the new platform-detect prelude (also numbered Step 0). Affects both `codex/SKILL.md.tmpl` and the regenerated `codex/SKILL.md`. Contributed by @mvanhorn.
+- **#1393** `/make-pdf` MAKE-PDF SETUP block moves from after the Telemetry footer to right after the Preamble Bash, so `$P` is set before any subsequent step references it. The implementation switches from the `{{MAKE_PDF_SETUP}}` placeholder pattern to programmatic insertion via `generateMakePdfSetup` in `scripts/resolvers/preamble.ts`, gated on `ctx.skillName === 'make-pdf'`. New `make-pdf setup ordering` test in `test/gen-skill-docs.test.ts` asserts the SETUP block sits after the Preamble heading and before Plan Mode / Telemetry / workflow headings. Contributed by @jbetala7.
+
 ## [1.31.0.0] - 2026-05-09

 ## **AskUserQuestion stops getting silently buried in plan files.**
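On the write side, the #1207 entry wraps each `.version` write in `{ git rev-parse HEAD 2>/dev/null || true; } > ...`; the `|| true` keeps a failing `git rev-parse` from aborting the chained commands. The read side is what makes the resulting empty file harmless. The CHANGELOG names `readVersionHash` and a stale-binary check, but their code is not shown on this page, so the sketch below is a hypothetical reconstruction from the entry's description: `readVersionHash` returns `null` for a missing or whitespace-only file, and the equally hypothetical `isBinaryStale` short-circuits on `null` instead of comparing against a sentinel string.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical reader modelled on the CHANGELOG's description of
// readVersionHash: an empty or whitespace-only .version file means
// "no version known" and yields null.
function readVersionHash(path: string): string | null {
  try {
    const trimmed = readFileSync(path, "utf8").trim();
    return trimmed === "" ? null : trimmed;
  } catch {
    return null; // a missing file also means "no version known"
  }
}

// Hypothetical stale-binary check: it short-circuits on null rather than
// comparing against a sentinel such as "unknown".
function isBinaryStale(binaryVersion: string | null, headVersion: string | null): boolean {
  if (binaryVersion === null || headVersion === null) return false;
  return binaryVersion !== headVersion;
}

// Simulate the unborn-HEAD build, where (per the entry) the .version write
// leaves an empty file.
const dir = mkdtempSync(join(tmpdir(), "version-sketch-"));
const versionFile = join(dir, ".version");
writeFileSync(versionFile, "");
console.log(readVersionHash(versionFile)); // null
console.log(isBinaryStale(readVersionHash(versionFile), "abc123")); // false
```

Returning `null` instead of a placeholder string keeps the unborn-HEAD case on the same path as a missing file, which is the property the entry claims for `state.binaryVersion`.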

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.31.0.0
+1.32.0.0

autoplan/SKILL.md

Lines changed: 22 additions & 0 deletions
@@ -330,6 +330,26 @@ Tool payload rules:
 - Ask one decision per tool call when possible; batch at most two related questions/tabs. Sequence independent decisions instead of sending 3+ tabs.
 - Do not duplicate the same trade-off text in both `question` and `options[].description`. Prefer putting option-specific trade-offs in `options[].description`.

+12. **Non-ASCII characters — write directly, never \u-escape.** When any
+string field (question, option label, option description) contains
+Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
+the literal UTF-8 characters in the JSON string. **Never escape them
+as `\uXXXX`.** Claude Code's tool parameter pipe is UTF-8 native
+and passes characters through unchanged. Manually escaping requires
+recalling each codepoint from training, which is unreliable for long
+CJK strings — the model regularly emits the wrong codepoint (e.g.
+writes `\u3103` thinking it is 管 U+7BA1, but `\u3103` is
+actually ㄃, so the user sees `管理工具` rendered as `㄃3用箱`).
+The trigger is long, multi-line questions with hundreds of CJK
+characters: that is exactly when reflexive escaping kicks in and
+exactly when miscoding is most damaging. Long ≠ escape. Keep
+characters literal.
+
+Wrong: `"question": "請選擇\uXXXX\uXXXX\uXXXX\uXXXX"`
+Right: `"question": "請選擇管理工具"`
+
+Only JSON-mandatory escapes remain allowed: `\n`, `\t`, `\"`, `\\`.
+
 ### Self-check before emitting

 Before calling AskUserQuestion, verify:
@@ -345,6 +365,8 @@ Before calling AskUserQuestion, verify:
 - [ ] Tool call has no more than two related questions/tabs
 - [ ] No duplicated trade-off text between `question` and `options[].description`
 - [ ] You wrote the brief, then called the tool_use payload
+- [ ] You are calling the tool, not writing prose
+- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped


 ## Artifacts Sync (skill start)
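Rule 12 governs what the model writes, but a harness could also lint a raw tool payload for it. The sketch below is hypothetical (nothing on this page shows gstack doing this): `findNonAsciiEscapes` scans raw JSON text and reports each `\uXXXX` escape whose UTF-16 code unit is above 0x7F, while skipping the escapes the rule still allows. It also shows that `JSON.stringify` leaves CJK text literal, so escaping is never required to produce valid JSON.

```typescript
// Hypothetical lint for rule 12: scan the raw JSON text of a tool payload and
// report every \uXXXX escape that encodes a non-ASCII UTF-16 code unit.
// Other escapes (\n, \t, \", \\, and \u escapes for ASCII) are skipped.
function findNonAsciiEscapes(rawJson: string): string[] {
  const hits: string[] = [];
  for (let i = 0; i < rawJson.length; i++) {
    if (rawJson[i] !== "\\") continue;
    if (rawJson[i + 1] === "u") {
      const hex = rawJson.slice(i + 2, i + 6);
      if (/^[0-9a-fA-F]{4}$/.test(hex) && parseInt(hex, 16) > 0x7f) {
        hits.push(rawJson.slice(i, i + 6));
      }
      i += 5; // jump past "uXXXX"; the loop's i++ lands on the next character
    } else {
      i += 1; // jump past the escaped character, e.g. the second "\" of "\\"
    }
  }
  return hits;
}

// JSON.stringify escapes only quotes, backslashes, control characters and lone
// surrogates, so CJK text stays literal in the serialized payload.
const payload = JSON.stringify({ question: "請選擇管理工具" });
console.log(payload); // {"question":"請選擇管理工具"}
console.log(findNonAsciiEscapes(payload).length); // 0
console.log(findNonAsciiEscapes('{"question":"caf\\u00e9\\n"}').length); // 1
```

Scanning character by character, instead of running a single regex over the text, keeps an escaped backslash followed by a literal `u` (raw text `\\u00e9`) from being misread as an escape.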

bin/gstack-memory-ingest.ts

Lines changed: 5 additions & 0 deletions
@@ -819,6 +819,11 @@ function gbrainPutPage(page: PageRecord): { ok: boolean; error?: string } {
       body,
     ].join("\n");
   }
+  // Strip NUL bytes — Postgres rejects 0x00 in UTF-8 text columns. Some Claude
+  // Code transcripts contain NUL inside user-pasted content or tool output, and
+  // surfacing those as `internal_error: invalid byte sequence` from the brain
+  // is unhelpful when we can sanitize at write time.
+  body = body.replace(/\x00/g, "");
   try {
     execFileSync("gbrain", ["put", page.slug], {
       input: body,
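The sanitization added to `gbrainPutPage` can be exercised on its own. The helper below is hypothetical (the commit inlines the `replace` call rather than naming a function); it applies the same `/\x00/g` pattern and also counts how many NUL characters were dropped, which a caller could log.

```typescript
// Hypothetical helper mirroring the line added to gbrainPutPage. Postgres
// rejects 0x00 in UTF-8 text columns, so NUL characters are removed before
// the body is piped to `gbrain put`. The count is returned for logging.
function stripNul(body: string): { clean: string; removed: number } {
  const removed = (body.match(/\x00/g) ?? []).length;
  return { clean: body.replace(/\x00/g, ""), removed };
}

const pasted = "tool output:\x00\x00 done";
const { clean, removed } = stripNul(pasted);
console.log(JSON.stringify(clean), removed); // "tool output: done" 2
```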

browse/src/token-registry.ts

Lines changed: 14 additions & 1 deletion
@@ -155,7 +155,20 @@ export function getRootToken(): string {
 }

 export function isRootToken(token: string): boolean {
-  return token === rootToken;
+  // Constant-time compare so a tunnel-reachable caller who can provoke an
+  // isRootToken() call (e.g., via the 403 "root over tunnel" rejection path)
+  // can't measure byte-by-byte string-compare timing to recover the token.
+  // Compare UTF-8 byte lengths (not JS string length) before timingSafeEqual,
+  // which throws on length-mismatched buffers. A multibyte input whose JS
+  // string length matches rootToken but whose UTF-8 byte length differs must
+  // return false on the auth path, not error out.
+  if (!rootToken) return false;
+  const tokenBytes = Buffer.byteLength(token, 'utf8');
+  const rootBytes = Buffer.byteLength(rootToken, 'utf8');
+  if (tokenBytes !== rootBytes) return false;
+  const a = Buffer.from(token, 'utf8');
+  const b = Buffer.from(rootToken, 'utf8');
+  return crypto.timingSafeEqual(a, b);
 }

 function generateToken(prefix: string): string {
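The pattern in `isRootToken` generalizes to any secret comparison. Below is a minimal sketch with a hypothetical name, `safeEqualUtf8`: it encodes both strings as UTF-8 buffers and compares their lengths (equivalent to the `Buffer.byteLength` check in the diff) before calling `crypto.timingSafeEqual`, which throws a `RangeError` on length-mismatched input. The sample token mirrors `root-token-for-tests` from the regression suite.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// Hypothetical standalone version of the isRootToken pattern. Both strings are
// encoded as UTF-8 and their byte lengths compared first, because
// timingSafeEqual throws a RangeError when the buffers differ in length.
function safeEqualUtf8(candidate: string, expected: string): boolean {
  if (!expected) return false; // no secret configured: never match
  const a = Buffer.from(candidate, "utf8");
  const b = Buffer.from(expected, "utf8");
  if (a.length !== b.length) return false; // same check as Buffer.byteLength
  return timingSafeEqual(a, b);
}

const root = "root-token-for-tests"; // 20 ASCII characters, 20 UTF-8 bytes
const multibyte = "\u00e9".repeat(20); // 20 UTF-16 code units, 40 UTF-8 bytes

console.log(multibyte.length === root.length); // true: a JS-length check passes
console.log(safeEqualUtf8(multibyte, root)); // false, and no RangeError
console.log(safeEqualUtf8(root, root)); // true
```

The length check returns early, so a caller can still learn whether a guess has the right byte length; for a fixed-length generated token that is usually acceptable. Where it is not, a common alternative is to hash both values (for example with SHA-256) and compare the fixed-length digests with `timingSafeEqual`.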

browse/src/url-validation.ts

Lines changed: 6 additions & 7 deletions
@@ -19,14 +19,15 @@ export const BLOCKED_METADATA_HOSTS = new Set([
 ]);

 /**
- * IPv6 prefixes to block (CIDR-style). Any address starting with these
- * hex prefixes is rejected. Covers the full ULA range (fc00::/7 = fc00:: and fd00::).
+ * IPv6 prefixes to block (CIDR-style). ULA addresses cover fc00::/7 and
+ * link-local addresses cover fe80::/10.
  */
-const BLOCKED_IPV6_PREFIXES = ['fc', 'fd'];
+const BLOCKED_IPV6_PREFIXES = ['fc', 'fd', 'fe8', 'fe9', 'fea', 'feb'];

 /**
  * Check if an IPv6 address falls within a blocked prefix range.
- * Handles the full ULA range (fc00::/7), not just the exact literal fd00::.
+ * Handles the full ULA range (fc00::/7) and link-local range (fe80::/10),
+ * not just exact literals like fd00:: or fe80::1.
  * Only matches actual IPv6 addresses (must contain ':'), not hostnames
  * like fd.example.com or fcustomer.com.
  */
@@ -95,9 +96,7 @@ async function resolvesToBlockedIp(hostname: string): Promise<boolean> {
   const v6Check = resolve6(hostname).then(
     (addresses) => addresses.some(addr => {
       const normalized = addr.toLowerCase();
-      return BLOCKED_METADATA_HOSTS.has(normalized) || isBlockedIpv6(normalized) ||
-        // fe80::/10 is link-local — always block (covers all fe80:: addresses)
-        normalized.startsWith('fe80:');
+      return BLOCKED_METADATA_HOSTS.has(normalized) || isBlockedIpv6(normalized);
     }),
     () => false, // ENODATA / ENOTFOUND — no AAAA records, not a risk
   );
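`BLOCKED_IPV6_PREFIXES` encodes two CIDR ranges as string prefixes. fe80::/10 fixes the first 10 bits, `1111 1110 10`, so the third hex digit is binary `10xx`, that is `8`, `9`, `a` or `b`: hence `'fe8'` through `'feb'`. The sketch below is a hypothetical mask-based equivalent (not gstack's `isBlockedIpv6`) that checks the first 16-bit group against each range's mask. One difference from plain string prefixes: a compressed leading group such as `fe8::1` (which is `0fe8::1`) falls outside fe80::/10 under the mask check, while a `'fe8'` string prefix would block it.

```typescript
// Hypothetical mask-based check for the ranges BLOCKED_IPV6_PREFIXES covers.
// Both CIDR ranges fit entirely inside the first 16-bit group (hextet).
function firstHextet(address: string): number | null {
  const bare = address.replace(/^\[|\]$/g, "").toLowerCase();
  if (!bare.includes(":")) return null; // hostname or IPv4: not an IPv6 literal
  const group = bare.split(":")[0];
  if (group === "") return 0; // a leading "::" means the first group is zero
  if (!/^[0-9a-f]{1,4}$/.test(group)) return null;
  return parseInt(group, 16);
}

function isBlockedIpv6Cidr(address: string): boolean {
  const h = firstHextet(address);
  if (h === null) return false;
  const isUla = (h & 0xfe00) === 0xfc00; // fc00::/7: top 7 bits fixed
  const isLinkLocal = (h & 0xffc0) === 0xfe80; // fe80::/10: top 10 bits fixed
  return isUla || isLinkLocal;
}

console.log(isBlockedIpv6Cidr("[fe80::2]")); // true
console.log(isBlockedIpv6Cidr("fe8::1")); // false: 0fe8::1 is outside fe80::/10
console.log(isBlockedIpv6Cidr("fd.example.com")); // false: not an IPv6 literal
```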

browse/test/sidebar-tabs.test.ts

Lines changed: 12 additions & 0 deletions
@@ -254,3 +254,15 @@ describe('manifest: ws permission + xterm-safe CSP', () => {
     }
   });
 });
+
+describe('manifest: live tab awareness needs "tabs" permission', () => {
+  // Without "tabs", chrome.tabs.query() returns tab objects with undefined
+  // url/title for any site outside host_permissions (e.g., everything except
+  // 127.0.0.1). snapshotTabs() then writes empty strings into tabs.json and
+  // silently skips the active-tab.json write, so the sidebar agent loses track
+  // of what page the user is on. activeTab is too narrow (only after a user
+  // gesture on the extension action) for background polling.
+  test('permissions includes "tabs"', () => {
+    expect(MANIFEST.permissions).toContain('tabs');
+  });
+});
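The test comment describes the failure in terms of what `snapshotTabs()` writes. As a hypothetical, simplified model (the real `snapshotTabs` and the exact `tabs.json` / `active-tab.json` shapes are not shown on this page), `snapshot` below maps tab objects to rows, substitutes empty strings for a missing `url` or `title`, and skips the active-tab record when there is no URL, which is the situation the comment attributes to running `chrome.tabs.query()` without the `tabs` permission on a site outside `host_permissions`.

```typescript
// Hypothetical stand-in for the fields chrome.tabs.query() returns. Without the
// "tabs" permission, url and title come back undefined for sites outside
// host_permissions.
interface TabLike {
  id: number;
  active: boolean;
  url?: string;
  title?: string;
}

// Assumed row shape; the real tabs.json format is not shown here.
interface TabRow {
  id: number;
  url: string;
  title: string;
}

// Hypothetical snapshot: rows are always produced (with "" for missing
// fields); activeTab is null when the active tab has no URL, modelling the
// skipped active-tab.json write.
function snapshot(tabs: TabLike[]): { rows: TabRow[]; activeTab: TabRow | null } {
  const rows = tabs.map((t) => ({ id: t.id, url: t.url ?? "", title: t.title ?? "" }));
  const active = tabs.find((t) => t.active);
  const activeTab =
    active && active.url ? { id: active.id, url: active.url, title: active.title ?? "" } : null;
  return { rows, activeTab };
}

// Without "tabs": fields are undefined, so rows hold "" and activeTab is null.
const withoutPermission = snapshot([{ id: 7, active: true }]);
// With "tabs": real values flow through.
const withPermission = snapshot([{ id: 7, active: true, url: "https://example.com/", title: "Example" }]);
console.log(withoutPermission.activeTab, withPermission.activeTab?.url); // null https://example.com/
```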

browse/test/token-registry.test.ts

Lines changed: 33 additions & 0 deletions
@@ -28,6 +28,39 @@ describe('token-registry', () => {
     expect(info!.scopes).toEqual(['read', 'write', 'admin', 'meta', 'control']);
     expect(info!.rateLimit).toBe(0);
   });
+
+  // Regression: the previous fix did a JS string-length short-circuit before
+  // crypto.timingSafeEqual, but the buffers passed in are UTF-8. A multibyte
+  // input with matching string length but mismatched byte length would slip
+  // past the check and crash inside timingSafeEqual. Auth path must return
+  // false, not error.
+  it('returns false for a multibyte token whose string length matches but UTF-8 byte length differs', () => {
+    // 'root-token-for-tests' is 20 ASCII chars (20 bytes).
+    // 'é'.repeat(20) is 20 chars but 40 UTF-8 bytes.
+    const multibyte = 'é'.repeat(20);
+    expect(multibyte.length).toBe('root-token-for-tests'.length);
+    expect(Buffer.byteLength(multibyte, 'utf8')).not.toBe(
+      Buffer.byteLength('root-token-for-tests', 'utf8'),
+    );
+    expect(() => isRootToken(multibyte)).not.toThrow();
+    expect(isRootToken(multibyte)).toBe(false);
+  });
+
+  it('returns false for a token that differs only in length (same prefix)', () => {
+    expect(isRootToken('root-token-for-tests-extra')).toBe(false);
+    expect(isRootToken('root-token-for-test')).toBe(false);
+  });
+
+  it('returns false for a same-length token that differs only in the last byte', () => {
+    const expected = 'root-token-for-tests';
+    const wrong = expected.slice(0, -1) + (expected.endsWith('x') ? 'y' : 'x');
+    expect(wrong.length).toBe(expected.length);
+    expect(isRootToken(wrong)).toBe(false);
+  });
+
+  it('returns false for the empty string even when root is set', () => {
+    expect(isRootToken('')).toBe(false);
+  });
 });

 describe('createToken', () => {

browse/test/url-validation.test.ts

Lines changed: 4 additions & 0 deletions
@@ -99,6 +99,10 @@ describe('validateNavigationUrl', () => {
     await expect(validateNavigationUrl('http://[fc00::]/')).rejects.toThrow(/cloud metadata/i);
   });

+  it('blocks direct IPv6 link-local addresses', async () => {
+    await expect(validateNavigationUrl('http://[fe80::2]/')).rejects.toThrow(/cloud metadata/i);
+  });
+
   it('does not block hostnames starting with fd (e.g. fd.example.com)', async () => {
     await expect(validateNavigationUrl('https://fd.example.com/')).resolves.toBe('https://fd.example.com/');
   });
