Skip to content

Commit be5847c

Browse files
adamtasteslikegoodgarrytansmithjoshuaradubachAZ-1224
authored
Fix gbrain ingest writer and source ID generation issues (#3)
* v1.26.5.0 fix wave: gbrain ingest writer (hybrid frontmatter) + gbrain-valid source ids (#1344) * fix: use correct `gbrain put <slug>` CLI verb in memory ingest `put_page` is the MCP tool name, not a CLI subcommand. The actual gbrain verb is `put <slug>` with content via stdin and tags in YAML frontmatter. Every transcript / memory ingest fails today on clean installs. Switch to the right verb and inject title/type/tags into the frontmatter that buildTranscriptPage / buildArtifactPage already produce. Bundled in the same function: - timeout: 30s → 60s. Auto-link reconciliation hits 30s once the brain has a few hundred pages. - maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr and callers see only `Command failed:` with no detail. - Surface stderr/stdout in the returned error instead of the bare exception. Verified: bun test test/gstack-memory-ingest.test.ts -> 15/15 pass. bun test on the three test files touching this path -> 362/362. * fix(sync-gbrain): generate gbrain-valid source ids for repos with dots or long names `deriveCodeSourceId` previously concatenated the canonicalized remote with only `/` and whitespace stripped, leaving dots from hostnames (`github.com`) and no length cap. gbrain rejects any source id containing characters outside [a-z0-9-] or longer than 32 chars, so `github.com/<org>/<repo>` produced `gstack-code-github.com-<org>-<repo>` (40 chars, plus dots) and registration failed: code source registration failed: Invalid source id "gstack-code-github.com-radubach-platform". Must be 1-32 lowercase alnum chars with optional interior hyphens. Fix: - Drop the host segment (`github.com` is the same for nearly every user and just consumes the 32-char budget). Use only the last two path segments (org-repo). - Sanitize any remaining non-alnum to hyphens, then collapse and trim. - For genuinely long org/repo names that still exceed the budget, keep the tail (most distinctive end of the slug) and append a 6-char sha1 hash for collision resistance. Adds a regression test that spawns the CLI in temp git repos with controlled remotes (dot in hostname, SCP-style, multi-dot host, long names forcing hash-truncation) and asserts every derived id is ≤32 chars and matches the gbrain validator regex. * fix(memory-ingest): hybrid frontmatter writer + tightened gbrain availability probe PR #1328 (merged in the prior commit) correctly injects title/type/tags into the YAML frontmatter that buildTranscriptPage already prepends. But buildArtifactPage emits raw markdown without frontmatter, so design-docs, learnings, and builder-profile-entries were landing in gbrain with empty title/type/tags. Add the no-frontmatter wrap branch so artifact pages get the same metadata the inject branch provides for transcripts. Also bring in gbrainAvailable()'s --help probe (originally proposed in PR #1341 by Alex Medina), with the regex tightened from /(^|\s)put(\s|$)/m to /^\s+put\s/m. Anchoring on the indented subcommand format gbrain's help actually uses keeps the probe from matching "put" appearing as prose in help text, while still failing fast with one clean error if a future gbrain renames or removes the put subcommand. Updates the V1.5 NOTE doc block at the top of the file to describe the current put-via-stdin shape rather than the legacy put_page flag form. Co-Authored-By: Alex Medina <oficina@puntoverdemc.com> * test+fix(memory-ingest): strengthen regression tests, fix inject for malformed-close frontmatter Imports the shim-based regression tests from PR #1341 (Alex Medina) and strengthens them to assert title, type, and tags actually arrive in put stdin — not just `agent: claude-code`. Asserting the metadata fields matches the regression class that's caused this fix wave: writers can "succeed" while metadata is silently lost. The original PR #1341 tests would have passed even with title/type/tags missing. Strengthening the test surfaced a deeper issue. buildTranscriptPage joins frontmatter array elements with "\n" and does not append a trailing newline, so the close fence is "\n---<content>" directly, not "\n---\n". PR #1328's inject branch searched for "\n---\n" and never matched — which means even with PR #1328 alone, transcript pages were landing in gbrain with no title/type/tags. Two-line fix: search for "\n---" only, since the inject lands before the close fence regardless of what follows it. Also imports PR #1341's V1.5 NOTE doc-block update and the section comment refresh so the prose stays accurate against the new writer shape. Co-Authored-By: Alex Medina <oficina@puntoverdemc.com> * fix+test(gbrain-sync): handle empty-slug edge in constrainSourceId, add no-origin and basename-empty regression tests PR #1330 (merged in the prior commit) addressed the dot-in-host and length-overflow cases for source-id derivation, but constrainSourceId silently returned "${prefix}-" when the input sanitized to an empty slug — invalid per gbrain's `^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$` validator on the trailing hyphen. Adds an explicit empty-slug branch that falls back to a sha1-prefixed id ("gstack-code-<6hex>") so the output stays gbrain-valid for every input shape. Two new regression tests cover the corners PR #1330's coverage left exposed: - no-origin fallback: a cwd repo with no `origin` remote configured must still derive a valid id from the basename. - basename-sanitizes-to-empty: a repo whose path basename is all non-alnum (e.g. "___") must produce the hash-only fallback, not an invalid trailing-hyphen id. Both run the CLI inside temp git repos for genuine end-to-end coverage (matches the pattern PR #1330 established for its own four remote-shape cases). Co-Authored-By: Richard Dubach <radubach@gmail.com> * chore: bump VERSION to 1.26.5.0 + CHANGELOG entry for fix wave PATCH bump. Three bug fixes (memory-ingest put_page CLI verb mismatch, hybrid frontmatter writer for transcripts AND artifacts, gbrain-valid source-id derivation for github-hosted repos), no new user capability. CHANGELOG release-summary leads with what users can now do (clean- install transcripts populate the brain, github-hosted repos register code sources) and tabulates before/after numbers from real gbrain v0.25.1 smoke output. Itemized changes credit @smithjoshua, @AZ-1224, and @radubach for the originating PRs plus the additional hybrid branch + strengthened tests added on top per Codex plan-review. * docs(todos): file P2 (gbrain install-pin staleness) + P3 (source-id host-collision) follow-ups Two follow-ups surfaced during the v1.26.5.0 fix-wave plan review. P2 — Issue #1305 part 2: bin/gstack-gbrain-install pins gbrain to v0.18.2 (commit 08b3698) but doesn't move when gstack ships features that depend on newer gbrain ops or schema. Fresh /setup-gbrain on v1.26.x lands users on schema 24 with v1.26 features expecting 32+. Captured for a future fix-wave. P3 — Codex P1.3 from the v1.26.5.0 plan review: deriveCodeSourceId drops the host segment to fit gbrain's 32-char source-id budget, which means github.com/acme/foo and gitlab.com/acme/foo collapse to the same source id. Real but rare; PR #1330 author explicitly considered this and chose budget over cross-host uniqueness. Captured as a long-tail concern. --------- Co-authored-by: Joshua Smith <joshualowellsmith@gmail.com> Co-authored-by: Richard Dubach <radubach@gmail.com> Co-authored-by: Alex Medina <oficina@puntoverdemc.com> * v1.27.0.0 feat: /setup-gbrain Path 4 (remote MCP) + brain → artifacts rename (#1351) * feat: gstack-gbrain-mcp-verify helper for remote MCP probe Probes a remote gbrain MCP endpoint with bearer auth. POSTs initialize, classifies failures into NETWORK / AUTH / MALFORMED with one-line remediation hints, and runs a tools/list capability probe to detect sources_add MCP support (forward-compat for when gbrain ships URL ingest). Token consumed from GBRAIN_MCP_TOKEN env, never argv. Required to set both 'application/json' AND 'text/event-stream' in Accept; that gotcha costs 10 minutes of debugging when missed (regression-tested). Live-verified against wintermute (gbrain v0.27.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: gstack-artifacts-init + gstack-artifacts-url helpers artifacts-init replaces brain-init with provider choice (gh / glab / manual), per-user gstack-artifacts-$USER repo, HTTPS-canonical storage in ~/.gstack-artifacts-remote.txt, and a "send this to your brain admin" hookup printout. Always prints the command, never auto-executes — gbrain v0.26.x has no admin-scope MCP probe (codex Finding #3). artifacts-url centralizes HTTPS↔SSH/host/owner-repo conversion so callers don't each string-mangle (codex Finding #10). The remote-conflict check in artifacts-init compares at the canonical level so re-running with HTTPS input doesn't trip on a stored SSH URL for the same logical repo. The "URL form not supported" branch prints a two-line clone-then-path form for gbrain v0.26.x; the supported branch is a one-liner with --url ready for when gbrain ships URL ingest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: extend gstack-gbrain-detect with mcp_mode + artifacts_remote Adds two new fields to detect's JSON output: - gbrain_mcp_mode: local-stdio | remote-http | none Resolved via 3-tier fallback (codex Finding D3): claude mcp get --json → claude mcp list text-grep → ~/.claude.json jq read. If Anthropic moves the file format, the first two tiers absorb it. - gstack_artifacts_remote: HTTPS URL from ~/.gstack-artifacts-remote.txt Falls back to ~/.gstack-brain-remote.txt during the v1.27.0.0 migration window so detect doesn't return empty between upgrade and migration. Existing detect tests still pass (15/15). New 19 tests cover every fallback tier independently, plus a schema regression for /sync-gbrain compat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: setup-gbrain Path 4 (remote MCP) + artifacts rename Path 4 lets users paste an HTTPS MCP URL + bearer token and registers it as an HTTP-transport MCP without needing a local gbrain CLI install. The flow: - Step 2 gains a fourth option (Remote gbrain MCP) - Step 4 adds Path 4 sub-flow: collect URL, secret-read bearer, verify via gstack-gbrain-mcp-verify (NETWORK / AUTH / MALFORMED classifier) - Step 5 (local doctor), Step 7.5 (transcript ingest), Step 5a's stdio branch all skip on Path 4 - Step 5a adds an HTTP+bearer registration form: claude mcp add --transport http --header "Authorization: Bearer ..." - Step 7 renamed "session memory sync" → "artifacts sync" and now calls gstack-artifacts-init (which always prints the brain-admin hookup command — no auto-execute, codex Finding #3) - Step 8 CLAUDE.md block branches: remote-http includes URL + server version (never the token); local-stdio keeps engine + config-file - Step 9 smoke test on Path 4 prints the curl-equivalent for post-restart verification (MCP tools aren't visible mid-session) - Step 10 verdict block has separate templates per mode Idempotency: re-running with gbrain_mcp_mode=remote-http already in detect output skips Step 2 entirely and goes to verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: rename gbrain_sync_mode → artifacts_sync_mode (v1.27.0.0 prep) Hard rename, no dual-read alias (codex Finding D4). The on-disk migration script (Phase C, separate commit) renames the config key in users' ~/.gstack/config.yaml and any CLAUDE.md blocks. Touched call sites: - bin/gstack-config defaults + validation + list/defaults output - bin/gstack-gbrain-detect (gstack_brain_sync_mode field still emitted with the same name for downstream-tool compat; reads new key) - bin/gstack-brain-sync, bin/gstack-brain-enqueue, bin/gstack-brain-uninstall - bin/gstack-timeline-log (comment ref) - scripts/resolvers/preamble/generate-brain-sync-block.ts: renames key, branches on gbrain_mcp_mode=remote-http to emit "ARTIFACTS_SYNC: remote-mode (managed by brain server <host>)" instead of the local mode/queue/last_push line (codex Finding #11) - bin/gstack-brain-restore + bin/gstack-gbrain-source-wireup: read ~/.gstack-artifacts-remote.txt with ~/.gstack-brain-remote.txt fallback during the migration window - bin/gstack-artifacts-init: tolerant of unrecognized URL forms (local paths, file://, self-hosted gitea) so test infrastructure and unusual remotes work without canonicalization - test/brain-sync.test.ts: gstack-brain-init → gstack-artifacts-init - test/skill-e2e-brain-privacy-gate.test.ts: artifacts_sync_mode keys - test/gen-skill-docs.test.ts: budget 35K → 36.5K for the new MCP-mode probe in the preamble resolver - health/SKILL.md.tmpl, sync-gbrain/SKILL.md.tmpl: comment + verdict line Hard delete: - bin/gstack-brain-init (replaced by bin/gstack-artifacts-init in v1.27.0.0) - test/gstack-brain-init-gh-mock.test.ts (replaced by gstack-artifacts-init.test.ts) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files after artifacts-sync rename Mechanical regen via \`bun run gen:skill-docs --host all\`. All */SKILL.md files reflect the renamed config key (gbrain_sync_mode → artifacts_sync_mode), the renamed remote-helper file (~/.gstack-artifacts-remote.txt with brain fallback), the renamed init script (gstack-artifacts-init), and the new ARTIFACTS_SYNC: remote-mode status line that fires when a remote-http MCP is registered. Golden fixtures (test/fixtures/golden/*-ship-SKILL.md) refreshed to match the regenerated default-ship output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: v1.27.0.0 migration — gstack-brain → gstack-artifacts rename Journaled, interruption-safe migration. Six steps, each writes to ~/.gstack/.migrations/v1.27.0.0.journal on success; re-entry resumes from the next un-done step. On final success, journal is replaced by ~/.gstack/.migrations/v1.27.0.0.done. Steps: 1. gh_repo_renamed gh/glab repo rename gstack-brain-$USER → gstack-artifacts-$USER (idempotent: detects already-renamed and skips) 2. remote_txt_renamed mv ~/.gstack-brain-remote.txt → artifacts file, rewriting URL path to match the new repo name 3. config_key_renamed sed -i in ~/.gstack/config.yaml flips gbrain_sync_mode → artifacts_sync_mode 4. claude_md_block sed flips "- Memory sync:" → "- Artifacts sync:" in cwd CLAUDE.md and ~/.gstack/CLAUDE.md 5. sources_swapped gbrain sources add NEW (verify) → remove OLD (codex Finding #6: add-before-remove ordering, no downtime window). On remote-MCP mode, prints commands for the brain admin instead of executing. 6. done touchfile + delete journal User opt-out: any "n" or "skip-for-now" answer at the initial prompt writes a marker file that prevents re-prompting; user can re-invoke via /setup-gbrain --rerun-migration. 11 unit tests cover: nothing-to-migrate, GitHub happy path, idempotent re-run, journal-resume mid-flight, remote-MCP print-only path, add-before-remove ordering verification, add-fail → old source stays registered, CLAUDE.md field rewrite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: regression suite + E2E for v1.27.0.0 rename Three new regression tests guard the rename's blast radius (per codex Findings #1, #8, #9, #12): - test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl, test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode); fails CI if any non-allowlisted file references them. - test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has no stale references in any */SKILL.md (the cross-product blind spot). - test/setup-gbrain-path4-structure.test.ts: structural lint over the Path 4 prose contract — STOP gates after verify failure, never-write- token rules, mode-aware CLAUDE.md block, bearer always via env-var. Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs): - test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs an HTTP MCP server, drives the skill via Agent SDK with a stubbed bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets the remote-http block, the secret token NEVER leaks to CLAUDE.md. - test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401; asserts the AUTH classifier hint surfaces, no MCP registration occurs, CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP" rule. touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at gate-tier so CI catches Path 4 regressions on every PR. Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log (legacy gstack-brain-init mentions in headers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v1.27.0.0 — /setup-gbrain Path 4 + brain → artifacts rename Bumps VERSION 1.26.4.0 → 1.27.0.0 (MINOR per CLAUDE.md scale-aware bump guidance: ~1500 line net change including a new path in /setup-gbrain, two new bin helpers, a journaled migration, 59 new tests, and a config key rename across the codebase). CHANGELOG entry covers: Path 4 (Remote MCP) end-to-end, the brain → artifacts rename, the journaled migration, the verify-helper error classifier, the artifacts-init multi-host provider choice. Includes the canonical Garry-voice headline + numbers table + audience close per the release-summary format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: demote setup-gbrain Path 4 E2E to periodic-tier The Agent SDK E2E tests for Path 4 (skill-e2e-setup-gbrain-remote and skill-e2e-setup-gbrain-bad-token) are inherently non-deterministic — the model interprets "follow Path 4 only" prompts flexibly and can skip Step 8 (CLAUDE.md write) or shortcut past the verify helper, which makes the gate-tier assertions flaky. The deterministic gate coverage for Path 4 is in test/setup-gbrain-path4-structure.test.ts: a fast structural lint that catches AUQ-pacing regressions and prose contract drift in <200ms with zero token spend. That test is the right tool for catching the failure mode the gate-tier was meant to guard against. The Agent SDK E2E tests stay available on-demand for periodic-tier runs (EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-setup-gbrain-*.test.ts). Also tightened the verify-error assertion to the literal field shape ("error_class": "AUTH") instead of a substring match that false-matches the parent claude session's "needs-auth" MCP discovery markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: sync package.json version to 1.27.0.0 VERSION was bumped to 1.27.0.0 in f6ec11eb but package.json was not updated in the same commit. The gen-skill-docs.test.ts assertion "package.json version matches VERSION file" caught the drift. This is the DRIFT_STALE_PKG case the /ship Step 12 idempotency check is designed for; the fix is the documented sync-only repair (no re-bump, package.json synced to existing VERSION). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.27.1.0 fix: anti-shortcut clause + gate-tier AskUserQuestion floor tests for all plan-* skills (#1354) * feat(test/helpers): runPlanSkillFloorCheck — minimal AskUserQuestion-floor observer Adds a focused PTY observer that exits at the first non-permission numbered-option render. Catches the May 2026 transcript-bug class (model wrote plan + ExitPlanMode without firing any AUQ) without needing to fingerprint or navigate past the AUQ. Why separate from runPlanSkillCounting: plan-mode AUQs render every option on a single logical line via cursor-positioning escapes that stripAnsi can't simulate, so parseNumberedOptions returns < 2 options and never records a fingerprint. Counting tests work on 25-min budgets because eventually one frame parses cleanly; gate-tier floor tests need to exit early on the first observation. Trades fingerprint precision for early-exit reliability. Also drops COMPLETION_SUMMARY_RE check from this helper — it matches "GSTACK REVIEW REPORT" anywhere in the buffer including when the agent does recon by reading existing plan files. plan_ready (claude's actual "Ready to execute" confirmation) is the reliable terminal signal for "agent finished without asking." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(resolvers): generateAntiShortcutClause shared resolver Adds {{ANTI_SHORTCUT_CLAUSE}} placeholder backed by a single resolver function in scripts/resolvers/review.ts. Plan-* review skills can now include the clause via one placeholder line in their .tmpl rather than cloning the paragraph four times. Future tightening edits one resolver, all four skills update on next gen-skill-docs. Wired into the existing RESOLVERS map alongside generateReviewDashboard and generatePlanFileReviewReport — no gen-skill-docs.ts change needed because the generator already does generic placeholder substitution against that map. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(plan-*-review): anti-shortcut clause in all four review skills Inserts {{ANTI_SHORTCUT_CLAUSE}} placeholder immediately after the **Anti-skip rule:** paragraph in plan-{eng,ceo,design,devex}-review SKILL.md.tmpl. The four templates use different surrounding section headers (eng "Review Sections (after scope is agreed)" vs ceo/design/devex variants), so anchoring on the paragraph rather than the heading works across all four. Closes the May 2026 transcript-bug loophole: existing STOP gates name forbidden actions only AFTER a per-section finding is identified. The anti-shortcut clause adds the pre-emptive rule — "the plan file is the OUTPUT of the interactive review, not a substitute for it" — covering the case the transcript exhibited (skip per-section walk, dump every finding into one plan write, call ExitPlanMode). Regenerated SKILL.md for all hosts via bun run gen:skill-docs --host all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: gate-tier AskUserQuestion floor tests for all plan-* review skills Adds 4 finding-floor tests (one per plan-* skill) that catch the May 2026 transcript-bug class — model wrote a plan and called ExitPlanMode without firing any review-phase AskUserQuestion. Asserts via runPlanSkillFloorCheck that ANY non-permission AUQ render fires before the agent reaches plan_ready. Verified: - Eng floor: passed in 59s - CEO floor: passed in 197s - Design floor: passed - Devex floor: passed - Total ~$2-6 per CI run; only triggers on diff against the 4 plan-* templates, the shared resolver review.ts, the seeds fixture, or the PTY runner helper. Fixtures live in test/fixtures/forcing-finding-seeds.ts, one constant per skill. Each seed is engineered to force at least one obvious finding under that skill's review focus (architectural smell for eng, scope-creep for ceo, UI-slop for design, painful onboarding for devex). Touchfiles wiring: - E2E_TOUCHFILES: 4 plan-*-finding-floor entries with deps on the matching skill template, the shared resolver, the seeds fixture, and the PTY runner helper - E2E_TIERS: all 4 entries marked 'gate' - touchfiles.test.ts: count assertion bumped 21→22 with explicit plan-ceo-finding-floor containment check Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.27.1.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.28.0.0 feat: browse --headed/--proxy/--navigate + gstack/llms.txt + webdriver-only stealth (#1363) * feat(browse): SOCKS5 bridge with auth + cred redaction helper Adds browse/src/socks-bridge.ts: a 127.0.0.1-only SOCKS5 listener that accepts unauthenticated connections from Chromium and relays them through an authenticated upstream proxy. Chromium does not prompt for SOCKS5 auth at launch, so this bridge is the workaround for using auth-required residential SOCKS5 upstreams. - startSocksBridge({ upstream, port: 0 }) → ephemeral 127.0.0.1 listener - testUpstream({ upstream, retries: 3, backoffMs: 500, budgetMs: 5000 }) pre-flight that connects to a known endpoint (default 1.1.1.1:443) - Stream-error policy: kill affected client + upstream sockets on any error mid-stream; no transport retries (a transport-layer retry can corrupt browser traffic) Adds browse/src/proxy-redact.ts: single source of truth for redacting credentials in any logged proxy URL or upstream config. Every code path that prints proxy config goes through this helper. Adds the socks npm dep (~30KB) and 16 tests covering: 127.0.0.1-only bind, byte-for-byte round trip through the bridge, auth rejection, mid-stream upstream drop kills client conn, listener teardown, testUpstream success + retry-exhaust paths, redaction of every credential shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): --proxy and --headed flags wire bridge into daemon Adds the global --proxy <url> and --headed flags to the browse CLI. Resolves cred policy and routes the daemon launch through the SOCKS5 bridge (or pass-through for HTTP/HTTPS) before chromium.launch(). CLI (cli.ts): - extractGlobalFlags() strips --proxy/--headed from argv, parses URL via Node URL class, validates D9 cred-mixing (env BROWSE_PROXY_USER/PASS + URL creds → exit 1 with hint), composes canonical proxy URL with resolved creds, computes a stable configHash for daemon-mismatch - ensureServer() now reads existing daemon's configHash from state file and refuses (exit 1 with disconnect hint) if --proxy/--headed mismatch the existing daemon. No silent restart that would drop tab state. - All proxy-related stderr lines go through redactProxyUrl proxy-config.ts (new): - parseProxyConfig() — URL parser + D9 cred-mixing detector + scheme allowlist - computeConfigHash() — stable hash of (proxy URL minus creds + headed flag) - toUpstreamConfig() — map ParsedProxyConfig → socks-bridge.UpstreamConfig Server (server.ts): - Reads BROWSE_PROXY_URL at startup; for SOCKS5+auth, runs testUpstream pre-flight (5s budget, 3 retries, 500ms backoff) and exits 1 on failure with redacted error - Spawns startSocksBridge() on 127.0.0.1:<ephemeral> and points Chromium at it via socks5://127.0.0.1:<port> - HTTP/HTTPS or unauth SOCKS5 → pass-through to chromium.launch proxy.server (with username/password if present) - State file gains optional configHash for daemon-mismatch check - Bridge tears down via process.on('exit') Browser manager (browser-manager.ts): - New setProxyConfig({ server, username, password }) called by server.ts before launch - chromium.launch() and both launchPersistentContext sites pass the proxy config through when set Tests: 22 new across proxy-config (parse + cred-mixing + hash stability) and extractGlobalFlags (flag stripping + cred-mixing rejection + cred rotation hash stability + redaction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): Xvfb auto-spawn with PID + start-time validation Adds browse/src/xvfb.ts: a Linux-only Xvfb auto-spawn module for running headed Chromium in containers without DISPLAY. The module walks a display range to pick a free one (never hardcodes :99) and validates orphan PIDs by BOTH /proc/<pid>/cmdline matching 'Xvfb' AND start-time matching the recorded value before sending any signal. Defends against PID reuse — refuses to kill anything that doesn't match both checks. - shouldSpawnXvfb(env, platform) — pure decision: skip on macOS/Windows, on Linux skip when DISPLAY or WAYLAND_DISPLAY is set (codex F2) - pickFreeDisplay(99..120) — probes via xdpyinfo - spawnXvfb(display) — returns { pid, startTime, display } handle - isOurXvfb(pid, startTime) — both-checks validator - cleanupXvfb(state) — best-effort, validates ownership before SIGTERM Wired into server.ts startup: when shouldSpawnXvfb says yes, picks a free display, spawns Xvfb, sets DISPLAY for chromium.launchHeaded, and records xvfbPid/xvfbStartTime/xvfbDisplay in the state file. Cleanup runs on process.on('exit'). The CLI's disconnect path also runs cleanupXvfb() in the force-cleanup branch when the server is dead. Disconnect now applies to any non-default daemon (headed mode OR configHash-tagged daemon — i.e. one started with --proxy/--headed), not just headed mode. Adds xvfb + x11-utils to .github/docker/Dockerfile.ci so CI exercises the Linux container --headed path on every run. Without it the most common production path would go untested. Tests: 17 new across decision logic, PID validation defenses (cmdline mismatch, start-time mismatch), no-op safety on bad inputs, and a Linux+Xvfb-installed gate for the spawn → validate → cleanup round trip. Tests skip on macOS/Windows automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): webdriver-mask stealth + Chromium-through-bridge e2e D7 (codex narrowing): mask navigator.webdriver only via addInitScript. The wintermute approach (fake plugins=[1..5], fake languages=['en-US', 'en'], stub window.chrome) is intentionally NOT applied — modern fingerprinters check consistency between plugins.length, languages, userAgent, and platform, and synthesizing fixed values can flag MORE bot-like, not less. The honest minimum is webdriver, which Chromium exposes as a known automation tell. Adds browse/src/stealth.ts: single source of truth for the stealth init script and launch args. Both browser-manager.launch() (headless) and launchHeaded() (persistent context with extension) call applyStealth(context) and pass STEALTH_LAUNCH_ARGS into chromium.launch. The pre-existing launchHeaded stealth that did fake plugins/languages is removed for the same reason. The cdc_/__webdriver runtime cleanup and Permissions API patch are kept — they remove automation-injected artifacts, not synthesize fake natural-browser values. Adds bridge-chromium-e2e.test.ts (codex F3): the test that proves the FEATURE works. Real Chromium with proxy.server = 'socks5://127.0.0.1: <bridgePort>' navigates to a local HTTP fixture; the auth upstream's connect counter and the HTTP fixture's hit counter both increment, proving traffic actually traversed bridge → auth-upstream → destination. Without this test, we could ship a working byte-relay and a broken Chromium integration and never know. Adds bridge-port-restart.test.ts (codex F1, reframed): old test assumed two daemons coexist, which contradicts D2 single-daemon model. Reframed as restart-then-restart, asserting fresh ephemeral ports (never the hardcoded 1090) on each spin-up. Adds stealth-webdriver.test.ts: navigator.webdriver=false in both fresh contexts and persistent contexts; navigator.plugins/languages are NOT replaced with the wintermute fake list (D7 verification). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gstack): generate llms.txt — single-file capability index for AI agents Adds scripts/gen-llms-txt.ts: produces gstack/llms.txt at repo root, indexing every skill (47), every browse command (75), and design commands when the design CLI is present. Per the llmstxt.org convention, agents can read one file to learn what gstack offers instead of crawling 47 SKILL.md files. Sources: - skill SKILL.md.tmpl frontmatter (name + description block scalar) - browse/src/commands.ts COMMAND_DESCRIPTIONS (sorted by category) - design/src/commands.ts COMMAND_DESCRIPTIONS if present (best-effort) Wired into scripts/gen-skill-docs.ts as a post-step so it regenerates on every `bun run gen:skill-docs` (the same script that re-emits all SKILL.md files). Failures are non-fatal warnings, not build breaks — the generator never blocks SKILL.md regen. Strict mode (--strict, also used by tests) throws when a skill is missing name or description in its frontmatter, catching missing metadata before it ships. Tests: shape (top-level sections, sort order, single-line summary discipline), every-skill-and-command-appears, strict-mode rejection of incomplete frontmatter, and freshness check that the committed gstack/llms.txt matches what the generator produces now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): --navigate flag on download for browser-triggered files Adds the --navigate strategy from community PR #1355 (originally from @garrytan-agents). When set, download navigates to the URL with waitUntil:'commit' and captures the resulting browser download via page.waitForEvent('download'), then saves via download.saveAs(). Handles URLs that trigger files via Content-Disposition headers, multi-hop CDN redirects requiring browser cookies, or anti-bot CDN chains where page.request.fetch() can't follow the auth/redirect chain. Defaults still use the existing direct-fetch strategy. --navigate is opt-in. Goes through the same validateNavigationUrl SSRF gate as goto, so download --navigate cannot reach IPv4 metadata endpoints (AWS IMDSv1, GCP/Azure equivalents) or arbitrary internal hosts. Inferred content type from suggested filename for common extensions (epub, pdf, zip, gz, mp3/mp4, jpg/jpeg/png, txt, html, json) — falls back to application/octet-stream. Same 200MB cap as Strategy 1. Frames the use case generically (anti-bot CDN, Content-Disposition, redirect chains) rather than naming any specific site, per project voice rules. Co-Authored-By: @garrytan-agents Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: v1.28.0.0 — browse SKILL section + VERSION + CHANGELOG VERSION 1.27.1.0 → 1.28.0.0 (MINOR — substantial new capability: five new flags/features, ~600 LOC added, new socks dep, multiple new modules). browse/SKILL.md.tmpl: new "Headed Mode + Proxy + Anti-Bot Sites" section between User Handoff and Snapshot Flags. Documents --headed (auto-Xvfb on Linux), --proxy (with embedded SOCKS5 bridge for auth), download --navigate, the cred-mixing policy, daemon-discipline (refuse-on-mismatch), the narrowed webdriver-only stealth, container support caveats, and the fail-fast/no-retry failure modes. CHANGELOG entry follows the release-summary format from CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table tied to specific test files that prove each capability, "What this means for AI agents" closing tied to a real workflow shift, then itemized Added/Changed/Fixed/For-contributors sections. Browse SKILL.md regenerated via bun run gen:skill-docs. gstack/llms.txt regenerated automatically from the same pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(browse): integration coverage for daemon mismatch + proxy fail-fast Adds two integration tests that exercise the full process boundary, not just the module-level wiring. daemon-mismatch-refuse.test.ts (D2): - Stubs a healthy state file with a fake configHash and a fake /health HTTP server, runs the actual cli.ts binary with a mismatching --proxy, asserts exit 1 + 'different config' / 'browse disconnect' hint in stderr. - Same shape with the plain-daemon-meets---headed case. - Positive case: matching configHash → CLI does NOT emit the mismatch hint (regardless of whether the actual command succeeds). server-proxy-fail-fast.test.ts: - Starts the rejecting SOCKS5 upstream, spawns server.ts with BROWSE_PROXY_URL pointing at it, BROWSE_HEADLESS_SKIP=1 to skip Chromium launch. - Asserts exit 1, 'FAIL upstream' in stderr (testUpstream pre-flight ran), no raw credential leakage in any output (redaction works on the failure path), and exit within 30s upper bound. Both tests use the existing spawn-bun-cli pattern from commands.test.ts so they run on the same CI infrastructure as the rest of the bun test suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gen-skill-docs): keep module sync so test require() still works Two regressions caught by the full test suite after the v1.28.0.0 landing pass: 1) package.json version mismatch — VERSION was bumped to 1.28.0.0 but package.json still pinned to 1.27.1.0. test/gen-skill-docs.test.ts asserts they match. 2) Top-level await in scripts/gen-llms-txt.ts (CLI entry block) and scripts/gen-skill-docs.ts (post-step) made gen-skill-docs an async module. test/gen-skill-docs.test.ts uses require() to pull extractVoiceTriggers/processVoiceTriggers from gen-skill-docs, which Bun rejects on async modules with: "TypeError: require() async module ... unsupported. use 'await import()' instead." Fix: wrap the await blocks in void IIFEs so the modules remain sync from a require() perspective. After fix: all 379 gen-skill-docs tests pass, all 77 new feature tests pass (3 skipped on macOS — Linux+Xvfb gates). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): apply codex adversarial findings on the new lifecycle Codex outside-voice review caught five real production-failure modes in the v1.28.0.0 proxy/headed lifecycle. Fixed: 1) `browse disconnect` skip-graceful for proxy-only daemons (browse/src/cli.ts). The graceful /command POST went out with stray `domains,` shorthand and (even fixed) the server's disconnect handler only tears down headed mode — proxy-only daemons returned 200 "Not in headed mode" while leaving the bridge running. Now disconnect short-circuits to force-cleanup for non-headed daemons, which kicks process.on('exit') in server.ts to close the bridge + Xvfb. 2) sendCommand crash retry preserves --proxy / --headed (browse/src/cli.ts). The ECONNRESET retry path called startServer() with no extraEnv, silently dropping the proxied flags. A daemon that died mid-command would silently restart in default direct/headless mode and bypass the SOCKS bridge. Now reapplies BROWSE_PROXY_URL, BROWSE_HEADED, and BROWSE_CONFIG_HASH from the resolved global flags. 3) `connect` honors --proxy (browse/src/cli.ts). The headed-mode `connect` command built its own serverEnv that didn't include BROWSE_PROXY_URL, so `browse --proxy <url> connect` launched headed Chromium without the proxy. Now threads proxyUrl + configHash into the connect serverEnv. 4) SOCKS5 bridge handles fragmented TCP frames (browse/src/socks-bridge.ts). Previously used once('data') and parsed each chunk as a complete SOCKS5 frame — TCP doesn't preserve message boundaries and split greetings/CONNECT requests caused intermittent handshake failures. Replaced with a single state machine that buffers chunks and uses size predicates on the SOCKS5 header to know when a complete frame has arrived. Pauses the client socket during upstream connect and replays any remainder bytes into the upstream on success. 5) Xvfb cleanup-then-state-delete ordering (browse/src/server.ts). emergencyCleanup() previously deleted the state file BEFORE any Xvfb cleanup could read it, orphaning Xvfb on uncaughtException / unhandledRejection. Now reads the state file first, calls cleanupXvfb() (which validates cmdline + start-time before kill), then deletes the state file. Adds a regression test for #4: writes the SOCKS5 greeting + CONNECT one byte at a time with 5ms ticks, asserts a clean round trip after the fragmented handshake. Codex's sixth finding (bridge advertises NO_AUTH on 127.0.0.1, so any co-located process can use the authenticated upstream) is documented as a known limitation — gstack's threat model assumes single-user hosts. Adding bridge-side auth is a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update BROWSER.md + TODOS.md for v1.28.0.0 BROWSER.md picks up a "Headed mode + proxy + browser-native downloads (v1.28.0.0)" subsection inside Real-browser mode plus the new source-map entries (socks-bridge.ts, proxy-config.ts, proxy-redact.ts, xvfb.ts, stealth.ts). TODOS.md anti-bot-stealth item updated to reflect the v1.28 narrowing — the "fake plugins" line is no longer accurate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ci): include bun.lock in image build for deterministic install CI evals all failed on PR #1363 with: error: Could not resolve: "smart-buffer". Maybe you need to "bun install"? error: Could not resolve: "ip-address". Maybe you need to "bun install"? at /opt/node_modules_cache/socks/build/client/socksclient.js:15 The cached node_modules layer in the pre-baked Docker image had `socks` (the new dep) but was missing its transitive deps (smart-buffer, ip-address). The image build copied only package.json into the build context — without bun.lock, `bun install` resolved a different tree than local `bun install` did, dropping required transitive deps. Reproduces locally as 229 packages (correct) when bun.lock is present or absent. Why CI diverged isn't fully understood — possibly Docker layer cache reuse across image rebuilds — but the deterministic fix is to include the lockfile in the image build context and use `--frozen-lockfile`, matching what every CI doc recommends. Changes: - .github/docker/Dockerfile.ci: COPY bun.lock alongside package.json, switch `bun install` → `bun install --frozen-lockfile` so any future lockfile drift fails loudly during image build instead of producing a partially-installed cache that breaks downstream eval jobs. - .github/workflows/evals.yml: include bun.lock in the image-tag hash so adding/removing a dep invalidates the image, AND copy bun.lock into the docker context alongside package.json. - .github/workflows/evals-periodic.yml: same updates. - .github/workflows/ci-image.yml: rebuild trigger now fires on bun.lock changes too; build context includes bun.lock. Image hash changes → fresh image gets built on next CI run → install matches the lockfile exactly → no missing transitive deps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): use hardlink copy instead of symlink for node_modules cache After the bun.lock fix landed, the eval matrix STILL failed identically: Could not resolve: "smart-buffer" / "ip-address" at /opt/node_modules_cache/socks/build/client/socksclient.js But the hash-tagged image actually contains smart-buffer + ip-address + socks all flat in /opt/node_modules_cache (verified by pulling and inspecting the image). 207 packages, all present. Root cause: the workflow used `ln -s /opt/node_modules_cache node_modules` to restore deps. Bun build (and Node module resolution generally) walks a file's realpath to find sibling deps. From the symlinked /workspace/node_modules/socks/build/client/socksclient.js, realpath resolves to /opt/node_modules_cache/socks/build/client/socksclient.js, and walking up to find a node_modules/smart-buffer dir fails — there's no `node_modules` segment in the realpath. Switch `ln -s` → `cp -al` (hardlink-copy). Each file in the cache becomes a hardlink at /workspace/node_modules/<pkg>, sharing inodes (no data copy). Realpath of /workspace/node_modules/socks/.../socksclient.js stays inside /workspace/node_modules, so sibling deps resolve correctly. Speed is comparable to symlink — `cp -al` on ~200 packages on tmpfs is sub-second. Same caching story preserved. Both evals.yml and evals-periodic.yml updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): cp -r instead of cp -al — /opt and /workspace are different filesystems The hardlink-copy fix landed and immediately broke with: cp: cannot create hard link 'node_modules/<file>' to '/opt/node_modules_cache/<file>': Invalid cross-device link GitHub Actions runners mount the workspace volume at /workspace (overlay-fs layered onto the runner image), and /opt is the runner image's own filesystem. Cross-filesystem hardlinks aren't supported. Switch `cp -al` → `cp -r`. Cost: ~5s for ~200 packages of small JS files vs ~0s for the broken symlink. Still cheaper than the ~15s `bun install` fallback. Realpath of /workspace/node_modules/<pkg>/... stays inside /workspace, so bun build's sibling-dep resolution works. Both evals.yml and evals-periodic.yml updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.29.0.0 feat: worktree-aware gbrain code sources via path-hash IDs and CWD pin (#1382) * feat: worktree-aware gbrain code sources via path-hash IDs and CWD pin Conductor sibling worktrees of the same repo no longer collide on a shared gstack-code-<slug> source ID. /sync-gbrain now derives a path-hashed source ID per worktree, runs gbrain sources attach to write .gbrain-source in the worktree root, and removes the legacy unsuffixed source on first new-format sync to prevent orphan accumulation. Bug fixes surfaced by /codex during /ship: - Silent attach failure now treated as stage failure (no more ok:true while pin is missing → unqualified code-def hits wrong source). - Startup preamble checks .gbrain-source in the cwd worktree, not global state, so an unsynced worktree no longer claims "indexed" because a sibling synced. - Code stage no longer skipped on remote-MCP (Path 4); the early-exit was in the SKILL template, not the orchestrator. - Source registration routes through lib/gbrain-sources.ts only; deleted the near-duplicate ensureSourceRegisteredSync from the orchestrator. Requires gbrain v0.30.0+ (uses sources attach). Phase 0 spike report: ~/.gstack/projects/garrytan-gstack/2026-05-08-gbrain-split-engine-spike.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: bump version and changelog (v1.29.0.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391) * fix(codex): use resume-compatible flags * fix: V-001 security vulnerability Automated security fix generated by Orbis Security AI * docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up) CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to 23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and ARCHITECTURE.md still read 0.60. Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table). No code change. No behavior change. Pure doc-vs-code alignment. Verification: $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md browse/src/security.ts:37: WARN: 0.75, CLAUDE.md:290: - \`WARN: 0.75\` ... ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)... BROWSER.md:743: - \`WARN: 0.75\` ... Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: Korean/CJK IME input and rendering in Sidebar Terminal Fixes #1272 This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal: **Bug 1 - IME Input**: Korean text typed via IME composition was not reaching the PTY correctly. Added compositionstart/compositionend event listeners to suppress partial jamo fragments and only send the final composed string. **Bug 2a - Font Rendering**: Added CJK monospace font fallbacks ("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js fontFamily config and the CSS --font-mono variable. This ensures consistent cell-width calculations for Korean characters. **Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent multi-byte UTF-8 characters (Korean is 3 bytes) from being split across WebSocket chunks. This follows the same pattern as PR #1007 which fixed the sidebar-agent path, but extends it to the terminal-agent path. Special thanks to @ldybob for the excellent root cause analysis and proposed solutions in issue #1272. Tested on WSL2 + Windows 11 with Korean IME. * fix(ship): tighten Plan Completion gate (VAS-449 remediation) VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has /docs/dashboard.md) silently skipped. /ship's Plan Completion subagent existed at ship time (added in v1.4.1.0) but the gate let the failure through. Four structural fixes: 1. Path concreteness rule: items naming a concrete filesystem path MUST be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE. 2. Validator detection: CONTENT-SHAPE items scan target repo's package.json for validate-* scripts and run them before falling back to UNVERIFIABLE. 3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked each one" with per-item Y/N/D loop. The blanket-confirm path is the exact failure VAS-449 surfaced. 4. Subagent fail-closed: if Plan Completion subagent + inline fallback both fail, surface explicit AskUserQuestion instead of silent pass. Replaces the prior "Never block /ship on subagent failure" fail-open. Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions, no LLM dependency, ~60ms). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): bash.exe wrap for telemetry on Windows reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args) where bin is the gstack-telemetry-log bash script. On Windows this fails silently with ENOENT — CreateProcess can't dispatch on shebang lines. Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path or PATH-resolvable override, falling back to Bun.which('bash') which finds Git Bash on the standard Windows install. buildTelemetrySpawnCommand() wraps the script invocation on win32 only; POSIX path is bit-identical. Returns null when bash can't be resolved on Windows so caller skips spawn — local attempts.jsonl audit trail keeps working without surfacing a Windows-only failure. 8 new unit tests cover resolveBashBinary (POSIX bash, absolute override, quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand (POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array immutability). POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on. * fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and make-pdf/src/pdftotext.ts:resolvePdftotext. Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which` isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same Bun.which-based fix shape, same env override convention. Changes: - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides; BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases. - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles Windows PATHEXT natively; no more `where`-vs-`which` branch. - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat after the bare-path miss on win32. Linux/macOS behavior is bit-identical (isExecutable short-circuits before the win32 branch ever runs). - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local, /usr/bin). No Windows candidates added; Poppler installs scatter across Scoop/Chocolatey/portable zips and guessing causes false positives. - Error messages get a Windows install hint (scoop install poppler / oschwartz10612) and `setx` example for GSTACK_*_BIN. - Pre-existing test 'honors BROWSE_BIN when it points at a real executable' was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own. Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming are additive. Either order of merge works. Test plan: - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null, resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip, same shape for resolvePdftotext + Windows install hint in error message). - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits before any win32 logic runs, matching the v1.24 claude-bin.ts pattern. * fix(browse): NTFS ACL hardening for Windows state files via icacls gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent queue contents (with prompt history), session state, security-decision logs, and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows, those mode bits are a silent no-op: Node's fs module doesn't translate POSIX modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable by other principals on the machine (verified via icacls — six ACEs, the intended user is the LAST of six). Threat model is non-trivial on: - Self-hosted CI runners (different service account on the same Windows box can read developer tokens, canary tokens, prompt history) - Shared development machines (agencies, studios, lab environments) - Multi-tenant servers with shared home directories Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin / global / local installs; this PR ensures files written into those resolved paths actually get the POSIX 0o600 semantic translated to NTFS. The fix: - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset). restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX) or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile / appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns. - 19 call sites converted across 9 source files: browser-manager.ts, browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts, security-classifier.ts, security.ts (4 sites), server.ts (5 sites), terminal-agent.ts (8 sites), tunnel-denial-log.ts. - (OI)(CI) inheritance flags on directories mean files created via fs.write* *inside* an mkdirSecure-created dir inherit the owner-only ACL automatically — important for tunnel-denial-log.ts where appends use async fsp.appendFile. Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened environments) log a one-shot warning to stderr and proceed. Once-per-process gating prevents log spam if the condition persists. Filesystem stays functional; the file just ends up with inherited ACLs. Test plan: - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX mode-bit assertions, Windows no-throw, mkdir idempotence, recursive creation, Buffer payloads, append-creates-then-reapplies-once semantics) - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security test suite plus the bash-binary resolution tests added in fix #1119; the converted writeFileSync/appendFileSync/mkdirSync sites in security.ts integrate cleanly) - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE - bun build typecheck on all modified files — clean (server.ts has a pre-existing playwright-core/electron resolution issue unrelated to this PR) POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux and macOS see no behavior change. Inviting pushback on three judgment calls (in PR description): 1. icacls vs npm library 2. ACL scope — just user, or user + SYSTEM? 3. Graceful degradation — once-per-process warn, not silent, not hard-fail. * fix(browse): declare lastConsoleFlushed to restore console-log persistence flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337 and assigns it at :344, but the `let lastConsoleFlushed = 0;` declaration is missing — only the network and dialog siblings are declared at lines 327-328. Result: every 1-second flushBuffers tick (line 376) throws `ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by the catch at line 369 ("[browse] Buffer flush failed: ..."), and the console branch's append never runs. browse-console.log is never written in any production deployment since this regressed. Discovered by stress-testing the daemon with 15 concurrent CLIs against cold state — the race surfaced the buffer-flush error spam in one spawned daemon's stderr. Verified by running the daemon against a real file:// page with console.log events: in-memory `browse console` returns the entries, but `.gstack/browse-console.log` is never created on disk. Regression introduced by 1a100a2a "fix: eliminate duplicate command sets in chain, improve flush perf and type safety" — the flush refactor switched from `Bun.write` to `fs.appendFileSync` and added the `lastConsoleFlushed` cursor pattern alongside its network/dialog siblings, but missed the matching `let` declaration. Tests don't currently exercise flushBuffers, so the regression shipped silently. Fix: - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed` and `lastDialogFlushed` (browse/src/server.ts:327) - Add a source-level guard test (browse/test/server-flush-trackers.test.ts) that fails any future refactor that adds a fourth `last*Flushed` cursor without the matching declaration. Same pattern as terminal-agent.test.ts and dual-listener.test.ts — read source as text, assert invariant, no daemon required. Test plan: - [x] New regression test fails on current main, passes with the fix - [x] `bun run build` clean - [x] Manual smoke: spawn daemon -> goto file:// page with console.log -> wait 4s -> .gstack/browse-console.log now exists with the expected entries (163 bytes vs zero before) 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(browse): per-process state-file temp path to fix concurrent-write ENOENT The daemon writes `.gstack/browse.json` via the standard atomic-rename pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four sites in server.ts use this pattern (initial daemon-startup state at :2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all four hard-code the same temp filename `${stateFile}.tmp`. Under concurrent writers the shared filename races on the rename: t0 Writer A: writeFileSync(stateFile + '.tmp', payloadA) t1 Writer B: writeFileSync(stateFile + '.tmp', payloadB) // overwrites A t2 Writer A: renameSync(stateFile + '.tmp', stateFile) // moves B's payload t3 Writer B: renameSync(stateFile + '.tmp', stateFile) // ENOENT — file gone Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`: [browse] Failed to start: ENOENT: no such file or directory, rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json' Pre-fix success rate: **0 / 15** under cold-start race. Post-fix success rate: **15 / 15**, zero ENOENT. Fix: - New `tmpStatePath()` helper (server.ts:333) returns `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}` - All 4 call sites use `tmpStatePath()` instead of the shared literal - Atomic rename still gives last-writer-wins semantics on the final state.json content; only behavior change is that concurrent writers no longer kill each other on the rename step Source-level guard test (browse/test/server-tmp-state-path.test.ts) locks two invariants: (1) no remaining `stateFile + '.tmp'` literals, (2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same read-source-as-text pattern as terminal-agent.test.ts and dual-listener.test.ts — no daemon required, runs in tier-1 free. Test plan: - [x] Targeted source-level guard test passes (3 / 0) - [x] `bun run build` clean - [x] Live regression: 15 concurrent CLIs against cold state → 15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix) - [x] No `.tmp.*` orphans left behind after rename succeeds - [x] Related test cluster (server-auth, dual-listener, cdp-mutex, findport) — same pre-existing flakes as `main`, no new regressions introduced 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage Asymmetric cleanup between two equivalent staleness conditions: onMainFrameNavigated() → clearRefs() + activeFrame = null ✓ getActiveFrameOrPage() → activeFrame = null (refs NOT cleared) ✗ Both paths see the same staleness condition — refs were captured against a frame that no longer exists. The main-frame path correctly clears both pieces of state. The iframe-detach path nulls the frame but leaves the refMap intact. The lazy click-time check in `resolveRef` (tab-session.ts:97) partially saves us — `entry.locator.count()` on a detached-frame locator throws or returns 0, so the click errors out as "Ref X is stale". But the user has no signal that frame context silently changed underfoot: the next `snapshot` runs against `this.page` (main) while old iframe refs still litter `refMap` with the same role+name keys. New refs collide with stale ones, the resolver picks one at random, the user clicks the wrong element. TODOS.md line 816-820 documents "Detached frame auto-recovery" as a shipped iframe-support feature in v0.12.1.0. This restores the documented intent — the recovery should leave the session in a clean state, not a half-cleared one. Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null` inside the if-branch. Test plan: - [x] New regression test: 4/4 pass - refs cleared when getActiveFrameOrPage detects detached iframe - refs preserved when active frame is still attached (no regression) - refs preserved when no frame set (page-level path untouched) - matches onMainFrameNavigated symmetry — both paths reach the same clean end state - [x] `bun run build` clean 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(codex): resolve python for JSON parser * fix: add fail-fast probe for base branch in ship step 12 * fix(plan-devex-review): remove contradictory plan-mode handshake * fix(design): honor Retry-After header in variants 429 handler Closes #1244. The 429 handler in `generateVariant` discarded the `Retry-After` response header and fell straight through to a local exponential schedule (2s/4s/8s). In image-generation batches, that burns retry attempts inside the provider's cooldown window and the request never recovers. Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`) and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds are validated as digits-only (rejects `2abc`). When `Retry-After` is honored (including 0 / past-date "retry now"), the next iteration's leading exponential sleep is skipped so we don't double-wait. Invalid or missing headers fall through to the existing exponential schedule unchanged. Behavior matrix: | Header | Behavior | |---------------------------------|-------------------------------------------| | Retry-After: 5 | wait 5s, skip leading on next attempt | | Retry-After: 999999 | capped to 60s, skip leading | | Retry-After: 2abc | invalid, fall through to exponential | | Retry-After: 0 | wait 0, skip leading (retry immediately) | | Retry-After: <past HTTP-date> | wait 0, skip leading | | Retry-After: <future date> | wait diff capped at 60s, skip leading | | no header | fall through to existing exponential | `generateVariant` now accepts an optional `fetchFn` parameter (defaults to `globalThis.fetch`) so tests can inject a stub. Production call sites are unchanged. Tests cover the five behavior buckets above, asserting both the 1st-to-2nd call timing gap and call counts. All five pass in ~8s. Co-Authored-…
1 parent d84f8d6 commit be5847c

278 files changed

Lines changed: 27791 additions & 3609 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/docker/Dockerfile.ci

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -77,17 +77,26 @@ RUN npx playwright install-deps chromium
7777
# render in DejaVu Sans. playwright install-deps happens to pull this in today,
7878
# but the dep is implicit and could change — install explicitly so upgrades
7979
# can't silently regress rendering.
80+
#
81+
# Xvfb is also installed here so the browse --headed integration tests
82+
# (headed-xvfb, headed-orphan-cleanup) can exercise the Linux container
83+
# auto-spawn path on every CI run. Without Xvfb in the image, the most
84+
# common production --headed path goes untested.
8085
RUN for i in 1 2 3; do \
81-
apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig && break || \
86+
apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig xvfb x11-utils && break || \
8287
(echo "fonts-liberation install retry $i/3"; sleep 10); \
8388
done \
8489
&& fc-cache -f \
8590
&& rm -rf /var/lib/apt/lists/*
8691

87-
# Pre-install dependencies (cached layer — only rebuilds when package.json changes)
88-
COPY package.json /workspace/
92+
# Pre-install dependencies (cached layer — only rebuilds when package.json or
93+
# bun.lock changes). Copy BOTH so install is deterministic and matches local
94+
# resolution. Without bun.lock here, bun install resolved transitive deps
95+
# differently in CI vs local (observed on v1.28.0.0: socks landed but
96+
# smart-buffer + ip-address didn't make it into the cached node_modules).
97+
COPY package.json bun.lock /workspace/
8998
WORKDIR /workspace
90-
RUN bun install && rm -rf /tmp/*
99+
RUN bun install --frozen-lockfile && rm -rf /tmp/*
91100

92101
# Install Playwright Chromium to a shared location accessible by all users
93102
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers

.github/workflows/actionlint.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ name: Workflow Lint
22
on: [push, pull_request]
33
jobs:
44
actionlint:
5-
runs-on: ubuntu-latest
5+
runs-on: ubicloud-standard-8
66
steps:
77
- uses: actions/checkout@v4
88
- uses: rhysd/actionlint@v1.7.11

.github/workflows/ci-image.yml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,20 +9,21 @@ on:
99
paths:
1010
- '.github/docker/Dockerfile.ci'
1111
- 'package.json'
12+
- 'bun.lock'
1213
# Manual trigger
1314
workflow_dispatch:
1415

1516
jobs:
1617
build:
17-
runs-on: ubicloud-standard-2
18+
runs-on: ubicloud-standard-8
1819
permissions:
1920
contents: read
2021
packages: write
2122
steps:
2223
- uses: actions/checkout@v4
2324

2425
# Copy lockfile + package.json into Docker build context
25-
- run: cp package.json .github/docker/
26+
- run: cp package.json bun.lock .github/docker/
2627

2728
- uses: docker/login-action@v3
2829
with:

.github/workflows/evals-periodic.yml

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ env:
1515

1616
jobs:
1717
build-image:
18-
runs-on: ubicloud-standard-2
18+
runs-on: ubicloud-standard-8
1919
permissions:
2020
contents: read
2121
packages: write
@@ -25,7 +25,7 @@ jobs:
2525
- uses: actions/checkout@v4
2626

2727
- id: meta
28-
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json') }}" >> "$GITHUB_OUTPUT"
28+
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json', 'bun.lock') }}" >> "$GITHUB_OUTPUT"
2929

3030
- uses: docker/login-action@v3
3131
with:
@@ -43,7 +43,7 @@ jobs:
4343
fi
4444
4545
- if: steps.check.outputs.exists == 'false'
46-
run: cp package.json .github/docker/
46+
run: cp package.json bun.lock .github/docker/
4747

4848
- if: steps.check.outputs.exists == 'false'
4949
uses: docker/build-push-action@v6
@@ -56,7 +56,7 @@ jobs:
5656
${{ env.IMAGE }}:latest
5757
5858
evals:
59-
runs-on: ubicloud-standard-2
59+
runs-on: ubicloud-standard-8
6060
needs: build-image
6161
container:
6262
image: ${{ needs.build-image.outputs.image-tag }}
@@ -101,10 +101,14 @@ jobs:
101101
echo "TMPDIR=/home/runner/.cache"
102102
} >> "$GITHUB_ENV"
103103
104+
# Recursive copy (cp -r) instead of symlink: bun build resolves a
105+
# file's realpath when looking for sibling deps. See evals.yml for the
106+
# full explanation. cp -al would be faster but /opt and /workspace
107+
# are on different overlay-fs layers, so cross-device hardlink fails.
104108
- name: Restore deps
105109
run: |
106110
if [ -d /opt/node_modules_cache ] && diff -q /opt/node_modules_cache/.package.json package.json >/dev/null 2>&1; then
107-
ln -s /opt/node_modules_cache node_modules
111+
cp -r /opt/node_modules_cache node_modules
108112
else
109113
bun install
110114
fi

.github/workflows/evals.yml

Lines changed: 16 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ env:
1515
jobs:
1616
# Build Docker image with pre-baked toolchain (cached — only rebuilds on Dockerfile/lockfile change)
1717
build-image:
18-
runs-on: ubicloud-standard-2
18+
runs-on: ubicloud-standard-8
1919
permissions:
2020
contents: read
2121
packages: write
@@ -25,7 +25,7 @@ jobs:
2525
- uses: actions/checkout@v4
2626

2727
- id: meta
28-
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json') }}" >> "$GITHUB_OUTPUT"
28+
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json', 'bun.lock') }}" >> "$GITHUB_OUTPUT"
2929

3030
- uses: docker/login-action@v3
3131
with:
@@ -43,7 +43,7 @@ jobs:
4343
fi
4444
4545
- if: steps.check.outputs.exists == 'false'
46-
run: cp package.json .github/docker/
46+
run: cp package.json bun.lock .github/docker/
4747

4848
- if: steps.check.outputs.exists == 'false'
4949
uses: docker/build-push-action@v6
@@ -56,7 +56,7 @@ jobs:
5656
${{ env.IMAGE }}:latest
5757
5858
evals:
59-
runs-on: ${{ matrix.suite.runner || 'ubicloud-standard-2' }}
59+
runs-on: ${{ matrix.suite.runner || 'ubicloud-standard-8' }}
6060
needs: build-image
6161
container:
6262
image: ${{ needs.build-image.outputs.image-tag }}
@@ -110,11 +110,19 @@ jobs:
110110
echo "TMPDIR=/home/runner/.cache"
111111
} >> "$GITHUB_ENV"
112112
113-
# Restore pre-installed node_modules from Docker image via symlink (~0s vs ~15s install)
113+
# Restore pre-installed node_modules from Docker image via recursive
114+
# copy. Symlink (`ln -s`) breaks bun's module resolution because bun
115+
# resolves a file's realpath when walking up to find node_modules/<dep>;
116+
# from a symlinked path, realpath escapes the workspace and sibling
117+
# deps no longer resolve. Hardlink copy (`cp -al`) fails because /opt
118+
# and /workspace are on different overlay-fs layers ("Invalid
119+
# cross-device link"). Recursive copy works on every layout. Cost:
120+
# ~5s for ~200 packages of small JS files vs ~0s for symlink — still
121+
# vastly cheaper than rerunning `bun install` (network + resolution).
114122
- name: Restore deps
115123
run: |
116124
if [ -d /opt/node_modules_cache ] && diff -q /opt/node_modules_cache/.package.json package.json >/dev/null 2>&1; then
117-
ln -s /opt/node_modules_cache node_modules
125+
cp -r /opt/node_modules_cache node_modules
118126
else
119127
bun install
120128
fi
@@ -147,7 +155,7 @@ jobs:
147155
retention-days: 90
148156

149157
report:
150-
runs-on: ubicloud-standard-2
158+
runs-on: ubicloud-standard-8
151159
needs: evals
152160
if: always() && github.event_name == 'pull_request'
153161
timeout-minutes: 5
@@ -211,7 +219,7 @@ jobs:
211219
$(echo -e "$SUITE_LINES")
212220
213221
---
214-
*12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite*"
222+
*12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite*"
215223
216224
if [ "$FAILED" -gt 0 ]; then
217225
FAILURES=""

.github/workflows/make-pdf-gate.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ jobs:
2222
strategy:
2323
fail-fast: false
2424
matrix:
25-
os: [ubuntu-latest, macos-latest]
25+
os: [ubicloud-standard-8, macos-latest]
2626
# Windows is tolerant-mode — Xpdf / Poppler-Windows extraction
2727
# differs enough from the Linux/macOS baseline that the strict
2828
# exact-diff gate is unreliable. Enable once the normalized
@@ -48,7 +48,7 @@ jobs:
4848
run: brew install poppler
4949

5050
- name: Install poppler-utils (Ubuntu)
51-
if: matrix.os == 'ubuntu-latest'
51+
if: matrix.os == 'ubicloud-standard-8'
5252
run: sudo apt-get update && sudo apt-get install -y poppler-utils
5353

5454
- name: Install Playwright Chromium

.github/workflows/pr-title-sync.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ concurrency:
1313
jobs:
1414
sync:
1515
name: Sync PR title to VERSION
16-
runs-on: ubuntu-latest
16+
runs-on: ubicloud-standard-8
1717
permissions:
1818
contents: read
1919
pull-requests: write

.github/workflows/skill-docs.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ name: Skill Docs Freshness
22
on: [push, pull_request]
33
jobs:
44
check-freshness:
5-
runs-on: ubuntu-latest
5+
runs-on: ubicloud-standard-8
66
steps:
77
- uses: actions/checkout@v4
88
- uses: oven-sh/setup-bun@v2

.github/workflows/version-gate.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ concurrency:
1414
jobs:
1515
check:
1616
name: Check VERSION is not stale vs queue
17-
runs-on: ubuntu-latest
17+
runs-on: ubicloud-standard-8
1818
permissions:
1919
contents: read
2020
pull-requests: read

.github/workflows/windows-free-tests.yml

Lines changed: 30 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,24 @@
11
name: Windows Free Tests
22

3-
# Curated subset of the free test suite that runs on windows-latest.
3+
# Curated subset of the free test suite that runs on a paid faster Windows runner.
44
#
55
# Codex's v1.18.0.0 review flagged that the existing evals.yml workflow uses
66
# a Linux container, so a windows-latest matrix entry there isn't a drop-in.
77
# This workflow is non-container, runs the curated Windows-safe subset, plus
88
# targeted resolver tests that exercise the Bun.which-based claude binary
99
# resolution + the GSTACK_CLAUDE_BIN override path on Windows.
1010
#
11-
# What this DOES NOT do (out of scope for v1.18.0.0):
11+
# Runner: GitHub-hosted free `windows-latest`. The whole rest of CI runs on
12+
# Ubicloud (Linux), but Ubicloud doesn't ship Windows runners and we don't
13+
# want to flip on GitHub's org-level larger-runner billing for just this one
14+
# job. 4 cores, ~60s spin-up, $0. The wave-coverage tests this runs are
15+
# small enough that total job time stays under 2 minutes.
16+
#
17+
# What this DOES NOT do (still out of scope, tracked as follow-up):
1218
# - Run the full free suite on Windows. The 24 tests that hardcode /bin/sh,
1319
# spawn('sh',...), or raw /tmp/ paths are excluded by scripts/test-free-shards.ts
1420
# --windows-only. They need POSIX-bound surfaces to be ported off shell
15-
# primitives before they can run on Windows. Tracked as a follow-up TODO.
21+
# primitives before they can run on Windows.
1622
# - Run Playwright/browser-backed tests. Browse server bring-up on Windows is
1723
# a separate concern (PR #1238 windows-pty-bun-pty-fix is in flight).
1824

@@ -27,6 +33,8 @@ concurrency:
2733

2834
jobs:
2935
windows-free-tests:
36+
# Ubicloud Windows runner (same provider as the Linux evals workflow).
37+
# To revert: swap to `windows-latest` (GitHub's free 4-core Windows runner).
3038
runs-on: windows-latest
3139
timeout-minutes: 15
3240

@@ -91,8 +99,23 @@ jobs:
9199
continue-on-error: true
92100

93101
- name: Verify new portability work on Windows
94-
# 31 tests targeting the new code paths added by v1.20.0.0. These
95-
# MUST pass for the release-note headline ("curated Windows lane added")
96-
# to be truthful.
97-
run: bun test test/gstack-paths.test.ts browse/test/claude-bin.test.ts test/test-free-shards.test.ts
102+
# Tests targeting the v1.20.0.0 lane plus v1.30.0.0 fix-wave additions
103+
# plus v1.36.0.0 Windows-install hardening (sanitizer + _link_or_copy
104+
# helper + build-script subshells + doc/config-key drift guard).
105+
# v1.30.0.0 extension covers icacls hardening (#1308), bash.exe telemetry
106+
# wrap (#1306), and Bun.which-based binary resolvers (#1307). These must
107+
# pass on Windows for the wave's "Windows hardening" framing to be honest.
108+
run: |
109+
bun test \
110+
test/gstack-paths.test.ts \
111+
browse/test/claude-bin.test.ts \
112+
test/test-free-shards.test.ts \
113+
browse/test/file-permissions.test.ts \
114+
browse/test/security.test.ts \
115+
browse/test/server-sanitize-surrogates.test.ts \
116+
test/setup-windows-fallback.test.ts \
117+
test/build-script-shell-compat.test.ts \
118+
test/docs-config-keys.test.ts \
119+
make-pdf/test/browseClient.test.ts \
120+
make-pdf/test/pdftotext.test.ts
98121
shell: bash

0 commit comments

Comments
 (0)