🪟 Windows Canary: combined build/test of all open Windows fixes by thedotmack · Pull Request #2699 · thedotmack/claude-mem

thedotmack · 2026-05-29T01:46:17Z

🪟 Windows Canary — combined build/test branch for all open Windows fixes

We have a large cluster of open Windows PRs that each fix a different piece of the same cross-platform story (spawn, hooks, chroma, mcp launcher, supervisor, settings, runtime deps). They overlap heavily — several touch the same launcher strings and the same ChromaMcpManager spawn path — so merging them one-by-one keeps re-introducing conflicts and nobody can see them working together.

This branch bundles them into one canary so we can validate the combined result.

⚠️ Calling all Windows contributors

@CaTeIM @ousamabenyounes @sabinss4 @YOMXXX @alessandropcostabr @Bruce583102198 @PulseCheckAI — your Windows PRs are bundled here.

I (@thedotmack) do not have a Windows machine, so I genuinely cannot test this myself — we have to work this one out together. Please:

git fetch origin && git checkout windows-canary

npm ci (or bun install) then npm run build

Install/point Claude Code at this build and exercise it on real Windows (PowerShell, Git Bash, and cmd.exe if you can)

Report back in this thread: ✅ works / ❌ broke, with the shell + Windows version. Screenshots/logs of any console-window flashes, MCP error -32000, EPIPE, or path-resolution failures are gold.

If your fix needed adjustment to combine with the others (see "Conflict decisions" below), please confirm the resolution still solves your original issue.

✅ PRs bundled in this canary (12)

PR	Author	What it fixes
#2208	@thedotmack	Restore Windows spawn (PR #751 re-apply) + Windows CI
#2489	@CaTeIM	Launch `mcp-search` via `node` so Windows can spawn it
#2502	@ousamabenyounes	Drop `cmd.exe` wrap on Windows `uvx` spawn (chroma)
#2507	@sabinss4	Add PowerShell call operator (`&`) to hook commands
#2593	@YOMXXX	Escape `cmd.exe` metacharacters in `uvx` args
#2594	@YOMXXX	Replace `sh` launcher with cross-platform `node -e` (#2461)
#2598	@YOMXXX	Replace fragile `$SHELL -lc` PATH probe with `printenv`-first
#2599	@YOMXXX	Make Windows `captureProcessStartToken` actually return a token (#2578)
#2636	@alessandropcostabr	Strip BOM from `settings.json` + lifecycle stdin-guard exemption
#2654	@Bruce583102198	Suppress printf write errors when pipe breaks in inline bash hooks
#2658	@CaTeIM	Restore plugin runtime deps after auto-upgrade
#2668	@PulseCheckAI	Spawn `uvx` directly on Windows instead of via `cmd.exe` (#2667)

🔀 Conflict decisions (please sanity-check these)

These PRs proposed competing approaches to the same problem — only one can be active at a time. Here's what the canary currently uses and what got superseded:

Chroma uvx spawn — three PRs, two approaches:
- fix(chroma): drop cmd.exe wrap on Windows uvx spawn (2426) #2502 + fix(chroma): spawn uvx directly on Windows instead of via cmd.exe (closes #2667) #2668 agree: spawn uvx directly (no cmd.exe wrap). ← canary uses this.
- fix(chroma): escape cmd.exe metacharacters in uvx args on Windows #2593 keeps the cmd.exe wrap but escapes metacharacters (^<>&|, %%). Merged as a no-op since direct-spawn makes the wrap unnecessary. @YOMXXX — if direct-spawn doesn't work on your Windows setup, this is the fallback; please test.
mcp-search launcher — both node-based:
- fix(mcp): replace sh launcher with cross-platform node -e (closes #2461) #2594 spawns a subprocess with signal-forwarding. ← canary uses this (closes Fix Windows compatibility: replace sh spawn + silence printf EPIPE #2461, kept fix(mcp): launch mcp-search via node so Windows can spawn it #2489's distribution test green).
- fix(mcp): launch mcp-search via node so Windows can spawn it #2489 loads mcp-server.cjs in-process via require (superseded, but fix(mcp): launch mcp-search via node so Windows can spawn it #2489 still contributes the node move + its test).
Hook PATH probe — fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598's printenv-first probe + fix: suppress printf write errors when pipe breaks in inline bash hooks #2654's 2>/dev/null pipe-suppression are combined (complementary).

🚫 Deliberately excluded (need a rebase or a decision)

fix: Codex transcript ingestion + queue self-deadlock on Windows (closes #2192) #2209 (@thedotmack) — Codex transcript + queue self-deadlock. The watcher.ts half (Windows transcript-append fix) is clean, but the SessionRoutes.ts half collides with main's since-merged handleGeneratorExit / GeneratorExitHandler refactor. The [Bug] Codex transcript ingestion on Windows finds no/late items and can strand pending observations #2192 in-flight-requeue fix is not present in the new handler and must be re-ported there. Needs a rebase onto main before it can join.
fix: replace printf '%s\n' with echo in hook path-finding subshells (2439) #2501 (@ousamabenyounes) — echo instead of printf '%s\n' in hook subshells (Windows (Git Bash): printf write error Permission denied blocks UserPromptSubmit hook #2439). Superseded by fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598's printenv-first probe already in the canary; its tests assert the old launcher format. If printf '%s\n' genuinely misbehaves on your shell, please say so — that would override fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598.
fix(hooks): silence Windows Git Bash EPIPE/EACCES on plugin-path resolver #2526 (@ajs-uk-dev) — Silence Windows Git Bash EPIPE/EACCES on the path resolver. Functionally covered by fix: suppress printf write errors when pipe breaks in inline bash hooks #2654's 2>/dev/null pipe-suppression already in the canary. Please confirm fix: suppress printf write errors when pipe breaks in inline bash hooks #2654 fully resolves your case, or what's still leaking.
Windows: replace bash hooks with PowerShell to eliminate visible console windows #2476 (draft) — PowerShell-hook rewrite. A competing architecture to the bash-hook fixes here; left out as a draft.
Add Codex ingestion provider and Windows Chroma launch fix #2614 (draft) — Codex ingestion + Windows chroma; draft, overlaps the chroma trio.

🧪 Verification done locally (macOS — Windows still needs you)

npm run build ✅ all bundles regenerate from merged source
tsc --noEmit: 24 errors — identical to origin/main; 0 introduced by this merge (the pre-existing TS baseline is fixed by Fix upstream TypeScript baseline #2511)
bun test: 54 failures — identical to origin/main; 0 introduced by this merge (pre-existing test baseline is fixed by Fix upstream test baseline regressions #2512)
Windows/chroma/infra suites that this branch touches (tests/infrastructure, tests/services/sync): 186 pass / 0 fail

In other words: this branch adds the Windows fixes without regressing the baseline. The remaining red is pre-existing and owned by #2511/#2512.

Notes

Diffstat: 20 files, +1,572 / −606 (incl. regenerated bundles).
This is a canary, not a merge target — once Windows folks confirm what works, we'll land the validated fixes via their individual PRs (or fast-track them).
CHANGELOG.md intentionally not touched (auto-generated).

🤖 Generated with Claude Code

Re-applies the PowerShell Start-Process -WindowStyle Hidden daemon spawn that PR #751 (e6ae017) introduced and commit d13662d reverted. Also fixes the bun-runner cmd /c popup, sets detached:false on Windows for SDK subprocesses (so windowsHide actually works and claude.exe doesn't outlive the worker), and adds windows-latest CI to prevent regression. - ProcessManager.spawnDaemon: PowerShell -EncodedCommand branch back. Returns 0 sentinel on success — callers MUST use pid === undefined for failure detection, never falsy checks. - bun-runner.js: drop "cmd /c" wrapper. shell:true lets Node resolve bun.cmd via PATHEXT and respects windowsHide (the explicit cmd.exe wrapper was popping a visible window per hook — #2150, #2186). - process-registry.ts spawnSdkProcess: detached:false on Windows. Mixing detached:true with windowsHide:true is documented-undefined on Windows; with detached:false, windowsHide actually hides claude.exe and the SDK subprocess dies with the parent (#2190, #2198). - .github/workflows/windows.yml: smoke test counts visible cmd windows before/after spawn + grep guard that the Start-Process branch survives. WSL bash stdin (#2188) is acknowledged but deferred — the bash → node pipe boundary needs a real Windows VM to test, beyond this PR's scope. PTY for Claude CLI SDK mode (#2173, #2177) is also deferred per plan. Closes #2150, #2169, #2186, #2187, #2190, #2198 Refs #2183 (Windows perf — same root cause) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

actions/setup-node@v4 cache: npm requires a package-lock.json and this project uses Bun (only bunfig.toml exists at root). Drop the cache directive, switch npm ci to npm install --no-audit --no-fund, and narrow the build step to npm run build — build-and-sync also runs a marketplace sync + worker restart that hardcodes ~/.claude/plugins, which doesn't exist on CI.

…oke with static guards CodeRabbit majors on #2208: 1. plugin/scripts/bun-runner.js — shell:true with separate spawnArgs triggers DEP0190 on Node 22+ and breaks paths/args containing spaces. Build a single fully-quoted command string (mirroring findBun()'s 'where bun' approach) and pass spawnArgs=[]. 2. .github/workflows/windows.yml — the dynamic smoke step that counted visible cmd windows around 'claude-mem start' exits 1 on 'claude-mem is not installed' before exercising the spawn path, AND PowerShell try/catch doesn't suppress native exit codes regardless. Replace with three static regression guards covering the exact patterns PR #2208 protects: - PowerShell Start-Process + WindowStyle Hidden in spawnDaemon - bun-runner shell:true with empty spawnArgs (DEP0190 guard) - windowsHide set on SDK spawn factory (issue #2190)

The "Anti-regression" steps grep ProcessManager.ts/bun-runner.js/process-registry.ts for specific strings (Start-Process, WindowStyle Hidden, shell:true, windowsHide). Tripwires aren't fixes — they make refactoring harder forever and verify nothing the actual Windows build doesn't already verify. The npm install + npm run build on windows-latest is the real guard.

ChromaMcpManager wrapped uvx as `cmd.exe /c uvx ...` on Windows. cmd.exe reassembles argv into a single command line and re-parses the dep-override `--with` values: `onnxruntime>=1.20` becomes "redirect stdout to file `=1.20`" and `protobuf<7` becomes "redirect stdin from file `7`". The Python subprocess never starts, so every chroma-mcp tool call closed in ~30 ms with `MCP error -32000: Connection closed`. Node's child_process.spawn (via cross-spawn) resolves `uvx` -> `uvx.exe` on PATH on Windows the same way it does on POSIX, so the cmd.exe wrap is not needed. Drop it and call uvx directly on all platforms. Adds a regression test that pins the spawn command to `uvx` and asserts the dep-override version specifiers reach uvx as argv elements (not as cmd.exe redirection). Co-Authored-By: Claude <noreply@anthropic.com>

On Windows, PowerShell interprets a quoted absolute path as a string literal rather than an executable. Hook commands like: C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ... throw an exception because PowerShell never invokes them as a process. Prefixing with the call operator (&) fixes this: & C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ... Apply the fix to all three hook installers: GeminiCliHooksInstaller, CursorHooksInstaller, and WindsurfHooksInstaller. The call operator is only added when process.platform === win32; Linux/macOS are unchanged. Users already experiencing this issue can re-run the install command to regenerate their settings.json with the corrected hook commands. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Closes #2585, #2591, #2495, #2429, #2405. On Windows the worker spawns chroma-mcp via `cmd.exe /c uvx --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp ...`. cmd.exe interprets unquoted `<`, `>`, `&`, `|` in the command line as redirection / pipe metacharacters before launching uvx, so the version pins are parsed as I/O redirects to files named `7` / `=1.20`. The child dies in ~50ms before the MCP handshake completes, surfacing as `MCP error -32000: Connection closed` on every Chroma sync. Node's child_process auto-quoting only triggers on whitespace, so these metacharacters reach cmd.exe raw. Five separate issues report the same symptom on Windows 10/11 (PowerShell, cmd, Git Bash). Fix: prefix each cmd.exe metacharacter (`^ < > & |`) with cmd's own escape character `^` so it's treated as literal. POSIX path is unchanged; the escape only runs when `process.platform === 'win32'`. Verification: a freshly-built worker-service.cjs contains `p.replace(/([\^<>&|])/g,"^$1")` and the resulting command line is `cmd.exe /c uvx --with onnxruntime^>=1.20 --with protobuf^<7 ...`, which the issue reporter (#2585) confirmed lets chroma-mcp connect in ~6 seconds and backfill complete. The escape character class includes `^` itself so an arg containing a literal caret is escaped to `^^`, matching cmd.exe's quoting rules.

The mcp-search MCP server in plugin/.mcp.json launched via: "command": "sh", "args": ["-c", "_C=... ls -dt ... | while read ... exec node ..."] On Windows, neither cmd nor PowerShell ships `sh`, and most users do not have Git Bash on the system PATH that Claude Desktop / Claude Code spawns from. Result: `/mcp` panel shows mcp-search failed with ENOENT, blocking all mem-search skill usage on Windows. Same launcher also breaks any host that strips PATH before spawning MCP servers. Fix: rewrite the launcher as inline node -e. Node is guaranteed to be on PATH for any host running claude-mem (the plugin's other scripts require it). The new launcher preserves the exact same fallback search order: 1. CLAUDE_PLUGIN_ROOT / PLUGIN_ROOT env (host-injected) 2. <cwd>/plugin 3. <cwd> 4. ~/.codex/plugins/cache/claude-mem-local/claude-mem/<version>/ (mtime-descending; matches `ls -dt` semantics) 5. ~/.codex/plugins/cache/thedotmack/claude-mem/<version>/ 6. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/cache/thedotmack/claude-mem/<version>/ 7. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/marketplaces/thedotmack/plugin Both the codex and claude cache fallback paths are preserved as literal strings so build-hooks.js's distribution check still passes. Verification: - JSON parses cleanly - node --check on the inline JS reports no syntax errors - Running the launcher logic on the repo cwd resolves PLUGIN_ROOT to <cwd>/plugin (same as the old sh script) - npm run build's mcp-search assertion passes No change to runtime behavior on macOS / Linux — node is the same spawn target the old sh script ultimately exec'd into. The only difference is the search-and-launch logic now runs in node itself rather than in a sub-shell.

Addresses Greptile review on #2593. cmd.exe expands `%VAR%` against the process environment even inside `cmd /c`, so a user-supplied value passed via buildCommandArgs() that happens to contain `%` characters — most plausibly `chromaApiKey` when CLAUDE_MEM_CHROMA_API_KEY is set to a password-style string — would be silently mangled: `%TOKEN%` becomes the value of $TOKEN (or empty if unset). The result is an opaque auth failure that looks unrelated to Windows quoting. Fix: double `%` to `%%` before the caret pass. Order matters — running the caret replacement first would convert `%` to `^%` which cmd.exe then interprets as "literal %", but the original intent (literal % staying literal) requires the canonical `%%` form because escaping with `^` only works inside double-quoted contexts (we're passing args via /c, not quoted). Verified: rebuilt worker-service.cjs contains the chained `p.replace(/%/g,"%%").replace(/([\^<>&|])/g,"^$1")`. Bundle size unchanged. POSIX path still bypasses both replacements.

Addresses both points in the Greptile review on #2594. The previous launcher had two POSIX regressions vs. the old `exec node ...` approach: 1. Signal forwarding gap: the MCP host holds the launcher's PID, not mcp-server.cjs's. SIGTERM to the launcher would exit the launcher via Node's default handler but leave the child orphaned, holding its stdio descriptors / file locks. 2. Signal-death exit code: spawn's `exit` callback fires with `(code, signal)`. When the child dies via signal, `code` is null and `signal` is e.g. `'SIGTERM'`. The previous handler returned `0` (clean exit), masking the real cause from the host. Fix: - Register handlers for SIGTERM/SIGINT/SIGHUP on the launcher that forward to `ch.kill(sig)`. try/catch in case the child is already dead. - In `ch.on('exit')`, when signal is non-null, deregister our own handler for that signal and re-raise it via `process.kill(process.pid, sig)`. The deregister step is critical — Node calls registered handlers instead of dying, so without it the launcher would catch the signal and process.exit(0) instead of inheriting the signal-death semantics (waitpid would see WIFEXITED instead of WIFSIGNALED, and the host would see exit code 0). The fallback `process.exit(128)` covers exotic platforms where re-raise is rejected. Verification (sleep child standin): launcher=9617 child=9630 OK: child died launcher exit=143 (expect 128+15=143) POSIX hosts now see the launcher process die with the same signal the host sent, matching the old `exec node` semantics. Windows behavior unchanged (it never had the orphan problem — the OS reaps the process tree).

… pattern Closes #2589, #2588, #2461, #2439. Six hooks in plugin/hooks/hooks.json (SessionStart×2, UserPromptSubmit, PostToolUse, PreToolUse, Stop) start with: export PATH="$($SHELL -lc 'echo $PATH' 2>/dev/null):$PATH"; ... Under Git Bash on Windows (MSYS bash 5.x), when stdin is not a TTY (which it never is for a hook subprocess), `$SHELL -lc` returns empty or fails in a way that bash interprets as a broken pipe, producing the well-documented: /usr/bin/bash: line 1: printf: write error: Permission denied cascading through every subsequent pipe in the hook command. Net effect on Windows: every hook fires this stderr noise on every invocation, and downstream `printf | while IFS=read -r` blocks get spurious EPIPE/EACCES errors. Fix: switch to the same battle-tested probe that plugin/hooks/codex-hooks.json already uses: _HP=$(printenv PATH 2>/dev/null || true); if [ -z "$_HP" ] && [ -n "${SHELL:-}" ]; then _HP=$("$SHELL" -lc 'printf %s "$PATH"' 2>/dev/null || true); fi; _HP=$(printf '%s' "$_HP" | tr ' ' ':'); export PATH="${_HP:+$_HP:}$PATH"; Three improvements vs. the broken version: 1. `printenv PATH` is the canonical way to read an environment variable from a POSIX shell — works under bash/zsh/dash/Git Bash, no TTY dependency, no pipe to break. The login-shell probe runs only as a fallback when printenv returns empty (which would only happen if PATH is genuinely unset). 2. `"$SHELL" -lc 'printf %s "$PATH"'` (quoted, with printf instead of echo) is the safer form: `echo` interprets backslashes in some shells, `printf %s` is unambiguous. 3. `tr ' ' ':'` on the captured PATH defends against Git Bash / MSYS quirks where a path containing literal spaces could break downstream PATH parsing. The Setup hook (line 11) already uses a hand-rolled PATH that includes nvm / brew / /usr/local — it doesn't share the broken pattern and is intentionally left untouched. This is the same probe codex-hooks.json (used by the Codex CLI integration) has shipped with for some time — it's been validated across the same matrix of shells we need to support here. Verification: $ grep -c "SHELL -lc 'echo \$PATH'" plugin/hooks/hooks.json 0 $ grep -c "printenv PATH" plugin/hooks/hooks.json 6 $ node -e "JSON.parse(...)" # JSON OK $ bash -c '_HP=$(printenv PATH ...); ... echo "len=$(echo $PATH | wc -c)"' PATH length=3626 contains-usr-local=1 Note on scope: this PR addresses the root cause (the broken $SHELL -lc probe). PR #2526 silenced the downstream printf EPIPE symptom by adding 2>/dev/null to the consumer pipe, but the noise was still being generated. With this fix, the symptom goes away entirely because the upstream probe no longer fails.

…n a token Closes #2578. Before this change, `captureProcessStartToken()` had: if (process.platform === 'win32') { return null; } …which made `verifyPidFileOwnership()` short-circuit at: const currentToken = captureProcessStartToken(info.pid); if (currentToken === null) return true; // <-- always on Windows The net effect: on Windows, the PID-reuse defense was disabled. When claude-mem's worker died and Windows reassigned its PID to an unrelated process (issue #2578 reports SignalRgbLauncher.exe winning the lottery), `claude-mem status` happily reported the worker as healthy and `start` refused to launch a fresh one, leaving the user with a permanent deadlock until they manually deleted `worker.pid`. This commit lights up the Windows branch using PowerShell `Get-Process -Id <pid>` and composes the start token from `(StartTime.Ticks | ProcessName)`. The combination handles both flavors of PID reuse: - **Same image restarted** — e.g. bun.exe crashed and the OS immediately assigned the freed PID to another bun.exe spawn. StartTime.Ticks (100-nanosecond resolution) differs. - **Different image inherits the PID** — the #2578 case where an unrelated app like SignalRgbLauncher.exe ends up with claude-mem's old PID. ProcessName differs. `Get-Process` is preferred over `Get-CimInstance Win32_Process` because it's ~3x faster on startup and works without elevation for processes owned by the current user — which is the only case that matters for a user-installed CC plugin. Failure modes: - Process doesn't exist → PowerShell's `-ErrorAction Stop` throws, the catch returns empty output, captureProcessStartToken returns null, verifyPidFileOwnership falls back to "PID alive → trust" (matches the old behavior, no regression on this edge). - Get-Process succeeds but stdout is empty → return null (same fallback). - PowerShell spawn itself fails (extremely rare, e.g. ENOENT because powershell.exe is gone) → return null (same fallback). Cost: ~300–500ms PowerShell warm-up per call. This function is on the worker start / stop / status path, not a hot loop, so the latency is acceptable. POSIX paths (linux /proc, mac/BSD `ps -p $pid -o lstart=`) are untouched. Verification on macOS (POSIX branch unaffected): $ node -e 'const{spawnSync}=require("child_process"); console.log(spawnSync("ps",["-p",String(process.pid),"-o","lstart="],{encoding:"utf-8"}).stdout.trim())' Thu May 21 21:38:20 2026 Windows verification deferred to follow-up — the PowerShell command itself is straight `Get-Process` so the failure mode if anything is wrong is "captureProcessStartToken returns null", which is exactly what the function returned before this PR. So the worst-case regression on Windows is a no-op; the best case (and intended case) is that PID reuse finally gets caught.

…Greptile) Addresses the Greptile review on #2599. Node.js's `spawnSync` reports failures via two channels: - `result.status` — set to null when the child never exited normally (timeout, killed by signal); set to the integer exit code otherwise. - `result.error` — set to the Error instance when the spawn itself failed (ENOENT, EACCES, etc.), even before the child ran. The Windows branch (added in this PR) and the BSD/mac branch (long standing) both only checked `result.status !== 0`. That works today because `status` is null on every spawn failure — null !== 0 is true, so the function falls through to `return null` correctly. But it leaves an implicit assumption about Node's `spawnSync` semantics that a reader has to reconstruct. Updated both branches to the idiomatic `result.error || result.status !== 0` guard. Semantically equivalent today; future-proof if Node ever changes spawn-failure reporting; and makes the intent explicit to the next person reading this code. Both branches share the same one-line guard now (Windows + BSD/mac), keeping them symmetric. Linux branch reads /proc directly so it doesn't go through spawnSync. esbuild minification preserves the new `||` guard — verified that the rebuilt worker-service.cjs contains `r.error||r.status` and `e.error||e.status` (esbuild renamed variables but kept the logic). No behavior change on any platform; this is a defense-in-depth clarification of the existing fallback path.

Claude Code spawns MCP servers WITHOUT a shell, so the `.mcp.json` `command` is resolved against PATH directly. On Windows `sh` is not on PATH (Git Bash is not guaranteed to be there), so `command: "sh"` fails with ENOENT and the mcp-search server never starts. Switch only the bundled plugin/.mcp.json launcher to `command: "node"` (node is always present as the runtime) and perform the same plugin-root resolution in JS, then require() the resolved mcp-server.cjs. Hooks are intentionally left untouched: they run via "shell": "bash", which Claude Code provides itself, so the maintainer's sh-based hook launchers keep working on Windows. Only the shell-less MCP spawn needed a node bootstrap. Tests: the MCP-startup assertions now validate the node launcher; the hook-command assertions remain unchanged (still sh).

…ands from #2188 stdin guard Two defense-in-depth fixes for Windows environments: 1. SettingsDefaultsManager: strip UTF-8 BOM (U+FEFF) before JSON.parse. Windows tools (editors, formatters, Claude Code Edit hook) may prepend BOM to settings.json, which Bun rejects with "Unrecognized token", causing a silent fallback to defaults that breaks server-beta routing. 2. bun-runner.js: exempt lifecycle subcommands (start/stop/restart/status) from the #2188 empty-stdin guard. These subcommands manage the worker daemon and never consume stdin. Killing the child on empty stdin prevents the daemon from starting on platforms where Claude Code does not pipe a payload for SessionStart (e.g. Windows CC <= 2.1.145). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

On Cygwin/Windows with bash 5.3+, when the while loop inside the command-substitution pipeline breaks early, subsequent printf calls in the left-side block can produce "write error: Permission denied" (EPIPE manifesting as EACCES). Redirect stderr of the pipeline's feeder block to /dev/null so these harmless pipe-closure errors don't surface as hook errors to the user. All 7 inline hook commands (Setup, SessionStart x2, UserPromptSubmit, PostToolUse, PreToolUse, Stop) use the same pipeline pattern and are fixed identically.

@modelcontextprotocol/sdk validates tool schemas with Ajv. mcp-server.cjs bundles the SDK, but Ajv emits compiled validators that require their runtime helpers by package path (e.g. require("ajv/dist/runtime/equal")), which esbuild leaves external. These packages were never declared in the generated plugin/package.json, so `bun install` in the plugin cache never fetched them — and the MCP server crashes resolving "ajv/..." the moment a tool is called (notably on Windows, where the cache had a fresh install). Add ajv and ajv-formats to the generated plugin dependencies, plus a build guardrail that fails the build if mcp-server.cjs requires either package without it being declared, mirroring the existing zod guardrail.

When the host auto-upgrades the plugin, it creates a fresh version directory in the plugin cache without node_modules. The worker and MCP server then fail to resolve their runtime deps (zod, ajv via the MCP SDK, tree-sitter grammars) and claude-mem silently stops working until the user notices the upgrade hint and runs the install command manually — a recurring papercut on Windows. The Setup-phase version-check now kicks off a detached, throttled `bun install` in the resolved plugin root when node_modules is absent. Properties: - No-op on healthy installs (node_modules present) — zero behavior change for users who are not in the broken post-upgrade state. - Detached + unref so the Setup hook stays non-blocking; deps land within ~1 min. - 10-minute cooldown marker so a running or repeatedly-failing install does not respawn every session. - shell:true so the host PATH resolves `bun` (bun.cmd on Windows, bun on Unix). - Fully best-effort: any failure is swallowed and the existing upgrade hint still documents the manual recovery command.

- Store the .deps-install-attempted cooldown marker in the version-specific plugin root instead of the global data dir. Otherwise a second upgrade landing within the 10-minute window would be suppressed and left without node_modules. `root` is writable (bun install targets it). - Close the inherited log file descriptor in the parent after spawning the detached child, so the child is its sole owner.

cmd.exe interprets shell-metacharacters in its /c argument string BEFORE handing them to uvx, and Node's arg-array quoting does not survive that re-parse (cmd.exe strips the outer quote pair from /c per the documented parser behavior). The chroma-mcp command built by buildCommandArgs() contains `--with onnxruntime>=1.20` and `--with protobuf<7`, which cmd.exe parses as stdout redirection to file `=1.20` and stdin redirection from file `7` respectively. uvx is invoked with truncated args, exits with "The system cannot find the file specified", and Chroma sync fails with "MCP error -32000: Connection closed" on every observation on Windows. uvx ships as uvx.exe on Windows (real PE executable, not a .cmd shim per the uv installer), so Node's child_process can spawn it directly via CreateProcess without a shell wrapper. Verified on Windows 11 + Node 22.22.2 + Bun 1.3.14 + uv 0.11.8: after the patch, `/api/chroma/status?deep=1` reports status=healthy with round-trip semantic search succeeding in ~650ms.

… spaces

…leakage Save the original ComSpec value before setting it for Windows simulation and restore/delete it in afterAll, keeping env isolation symmetric with the existing platform rollback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts: # plugin/scripts/worker-service.cjs # src/services/sync/ChromaMcpManager.ts

# Conflicts: # src/services/sync/ChromaMcpManager.ts

# Conflicts: # plugin/scripts/bun-runner.js # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/services/infrastructure/ProcessManager.ts # src/supervisor/process-registry.ts

# Conflicts: # plugin/scripts/server-beta-service.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js

# Conflicts: # plugin/hooks/hooks.json

# Conflicts: # plugin/.mcp.json

Rebuilt plugin/scripts/*.cjs and viewer-bundle.js via `npm run build` so the generated artifacts reflect the merged source from all bundled Windows PRs (during conflict resolution the stale bundles were taken verbatim, then regenerated here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-05-29T01:52:01Z

Greptile Summary

This canary PR bundles 12 Windows-compatibility fixes into one combined branch for collective testing before individual merges: it replaces the sh -c MCP launcher with a cross-platform node -e launcher, removes the cmd.exe /c uvx wrapper in ChromaMcpManager, adds PowerShell & call operator support in Windsurf/Gemini hook installers, implements a printenv-first PATH probe, strips UTF-8 BOM from settings.json, and fixes Windows captureProcessStartToken PID-reuse detection.

MCP launcher (plugin/.mcp.json): replaced a Bash sh -c inline script with a node -e inline JavaScript launcher that performs the same plugin-root search and spawns mcp-server.cjs with signal forwarding — eliminating the sh dependency on Windows.
ChromaMcpManager: dropped the Windows-only cmd.exe /c uvx … wrapper; uvx is now spawned directly on all platforms, preventing >= and < in --with arguments from being misinterpreted as I/O redirection by cmd.exe.
ProcessManager.ts: DATA_DIR / PID_FILE constants are now hardcoded to path.join(homedir(), '.claude-mem'), diverging from the paths.dataDir() / paths.workerPid() helpers that honour the CLAUDE_MEM_DATA_DIR env-var and settings-file override.

Confidence Score: 3/5

The ChromaMcpManager, BOM stripping, and captureProcessStartToken changes look correct and well-tested, but the DATA_DIR hardcoding in ProcessManager.ts will silently misdirect PID file and migration-marker writes for any user who has configured CLAUDE_MEM_DATA_DIR, causing the worker health check to permanently report the worker as dead on those setups.

The MCP launcher rewrite and uvx direct-spawn are well-reasoned and the new test coverage for the Windows spawn path is solid. ProcessManager.ts no longer calls paths.dataDir(), which in the original code read CLAUDE_MEM_DATA_DIR from the environment and settings file before falling back to ~/.claude-mem — the hardcoded replacement creates a split that makes the daemon appear dead for users with non-default data directories. The PowerShell & operator in the hooks installers is also applied unconditionally by OS rather than by the shell the target tool actually uses, which could break Windows Gemini CLI or Windsurf users who invoke hooks outside PowerShell.

src/services/infrastructure/ProcessManager.ts (DATA_DIR hardcoding) and src/services/integrations/WindsurfHooksInstaller.ts + GeminiCliHooksInstaller.ts (unconditional PowerShell call operator)

Important Files Changed

Filename	Overview
src/services/infrastructure/ProcessManager.ts	DATA_DIR hardcoded to ~/.claude-mem, dropping paths.dataDir() which honours CLAUDE_MEM_DATA_DIR; PID file, migration markers, and DB-backup paths will go to the wrong location for users with a custom data directory.
src/services/sync/ChromaMcpManager.ts	Drops Windows cmd.exe/c uvx wrapper — direct uvx spawn on all platforms prevents redirection misparse of --with version specifiers; POSIX tree-kill logic added for orphan cleanup.
plugin/.mcp.json	MCP launcher replaced from sh -c to node -e inline JS; cross-platform plugin-root search with signal forwarding to child process; removes sh dependency on Windows.
src/shared/SettingsDefaultsManager.ts	BOM stripping added before JSON.parse to handle Windows-tool-generated settings.json files; correct and non-breaking.
src/supervisor/process-registry.ts	Windows captureProcessStartToken now queries PowerShell Get-Process for StartTime.Ticks
src/services/integrations/WindsurfHooksInstaller.ts	Adds PowerShell & call operator prefix on win32; correct for PowerShell hook execution but would break cmd.exe invocation if Windsurf uses that shell on Windows.
src/services/integrations/GeminiCliHooksInstaller.ts	Same PowerShell & operator added as WindsurfHooksInstaller; same caveat applies — only safe if Gemini CLI always invokes hooks via PowerShell on Windows.
plugin/hooks/hooks.json	PATH probe changed to printenv-first with $SHELL -lc fallback; reduces shell startup overhead and works on shells that don't support -lc flag.
tests/services/sync/chroma-mcp-manager-windows-spawn.test.ts	New Windows spawn regression test verifying uvx is spawned directly without cmd.exe wrapper and version specifiers survive as argv elements.
tests/infrastructure/plugin-distribution.test.ts	Updated to assert node-based MCP startup command contains expected JS content (CLAUDE_CONFIG_DIR, plugin root paths, mcp-server.cjs); passes against new .mcp.json.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Claude Code spawns MCP server] --> B["node -e inline-JS launcher"]
    B --> C{Search plugin root candidates}
    C -->|Found| D["spawn mcp-server.cjs with stdio inherit"]
    C -->|Not found| E["stderr error + exit 1"]
    D --> F[Forward SIGTERM SIGINT SIGHUP to child]
    D --> G{Child exits}
    G -->|signal| H["re-raise signal on self"]
    G -->|code| I["process.exit(code)"]

    subgraph ChromaMcpManager spawn path
        J["connectInternal()"] --> K["spawn uvx directly on all platforms"]
        K --> L["argv: --with onnxruntime>=1.20 --with protobuf<7"]
        L --> M["No cmd.exe redirection misparse"]
    end

    subgraph DATA_DIR regression
        N["paths.dataDir() - honours CLAUDE_MEM_DATA_DIR"] -.removed.-> O["Hardcoded ~/.claude-mem"]
        O --> P["PID file + migration markers go to wrong path for custom data dirs"]
    end

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/services/infrastructure/ProcessManager.ts:23-25
**DATA_DIR hardcoding drops custom-data-directory support**

`paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.

### Issue 2 of 3
src/services/integrations/WindsurfHooksInstaller.ts:144-145
**`& ` call operator assumes PowerShell; breaks cmd.exe invocation**

The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.

### Issue 3 of 3
plugin/.mcp.json:8
**MCP launcher exit sentinel 128 on Windows signal re-raise failure**

The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.

_{Reviews (1): Last reviewed commit: "build: regenerate bundles from merged Wi..." | Re-trigger Greptile}

greptile-apps · 2026-05-29T01:52:05Z

+// Standard paths for PID file management
+const DATA_DIR = path.join(homedir(), '.claude-mem');
+const PID_FILE = path.join(DATA_DIR, 'worker.pid');


DATA_DIR hardcoding drops custom-data-directory support

paths.dataDir() (previously used here) reads CLAUDE_MEM_DATA_DIR from the environment first, then falls back to a value from settings.json, then falls back to ~/.claude-mem. The replacement hardcodes only the last fallback. Any user who has set CLAUDE_MEM_DATA_DIR (via env var or settings.json) will now have their PID file, migration marker files, and DB backup paths silently written to ~/.claude-mem instead of the configured directory. The practical effect is that validateWorkerPidFile reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/services/infrastructure/ProcessManager.ts Line: 23-25 Comment: **DATA_DIR hardcoding drops custom-data-directory support** `paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-29T01:52:06Z


-  return `"${bunPath}" "${workerServicePath}" hook windsurf ${hookCommand}`;
+  const callOperator = process.platform === 'win32' ? '& ' : '';


& call operator assumes PowerShell; breaks cmd.exe invocation

The & prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via cmd.exe instead of PowerShell on Windows, the leading & will be interpreted as a command separator in cmd.exe and the hook will fail silently. The condition process.platform === 'win32' is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/services/integrations/WindsurfHooksInstaller.ts Line: 144-145 Comment: **`& ` call operator assumes PowerShell; breaks cmd.exe invocation** The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-29T01:52:07Z

-        "-c",
-        "_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
+        "-e",
+        "const f=require('fs'),p=require('path'),o=require('os'),c=require('child_process');const h=o.homedir();const C=process.env.CLAUDE_CONFIG_DIR||p.join(h,'.claude');const E=process.env.CLAUDE_PLUGIN_ROOT||process.env.PLUGIN_ROOT||'';const d=process.cwd();const L=x=>{try{return f.readdirSync(x).filter(n=>/^\\d/.test(n)).map(n=>p.join(x,n)).filter(z=>{try{return f.statSync(z).isDirectory()}catch{return false}}).sort((a,b)=>f.statSync(b).mtimeMs-f.statSync(a).mtimeMs)}catch{return[]}};const K=[E,p.join(d,'plugin'),d,...L(p.join(h,'.codex/plugins/cache/claude-mem-local/claude-mem')),...L(p.join(h,'.codex/plugins/cache/thedotmack/claude-mem')),...L(p.join(C,'plugins/cache/thedotmack/claude-mem')),p.join(C,'plugins/marketplaces/thedotmack/plugin')].filter(Boolean);let R=null;for(const k of K){const r=f.existsSync(p.join(k,'plugin','scripts'))?p.join(k,'plugin'):k;if(f.existsSync(p.join(r,'scripts','mcp-server.cjs'))){R=r;break}}if(!R){process.stderr.write('claude-mem: mcp server not found\\n');process.exit(1)}const ch=c.spawn(process.execPath,[p.join(R,'scripts','mcp-server.cjs')],{stdio:'inherit'});for(const s of ['SIGTERM','SIGINT','SIGHUP'])process.on(s,()=>{try{ch.kill(s)}catch{}});ch.on('exit',(code,sig)=>{if(sig){process.removeAllListeners(sig);try{process.kill(process.pid,sig)}catch{process.exit(128)}}else process.exit(code==null?0:code)})"


MCP launcher exit sentinel 128 on Windows signal re-raise failure

The launcher forwards SIGTERM, SIGINT, and SIGHUP to the child. The exit handler calls process.kill(process.pid, sig) when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls process.exit(128) — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.

Prompt To Fix With AI

This is a comment left during a code review. Path: plugin/.mcp.json Line: 8 Comment: **MCP launcher exit sentinel 128 on Windows signal re-raise failure** The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly. How can I resolve this? If you propose a fix, please make it concise.

YOMXXX · 2026-05-29T02:08:49Z

Hi @thedotmack — thanks for bundling these together, the canary approach makes a lot of sense given the overlap.

My PRs in the canary

#2593 (cmd.exe metachar escaping) — agree that direct-spawn (#2502/#2668) is the better path. Happy for #2593 to sit as a no-op fallback; if direct-spawn works across environments, we can close it.

#2594 (node -e MCP launcher) — glad this was picked as the launcher approach. One note on Greptile's Issue 3 below.

#2598 (printenv-first PATH probe) — also just pushed a follow-up fix (e060a674) removing tr ' ' ':' which was corrupting Windows paths containing spaces (e.g. /c/Program Files/nodejs). That fix should probably be pulled into the canary as well.

#2599 (Windows captureProcessStartToken) — straightforward, no concerns.

On the Greptile issues

Issue 1 (DATA_DIR hardcoding in ProcessManager.ts) — this looks like a real regression. paths.dataDir() honours CLAUDE_MEM_DATA_DIR from env/settings, so hardcoding ~/.claude-mem would silently break multi-profile setups. Probably worth restoring paths.dataDir() here.

Issue 2 (PowerShell & call operator) — this came from #2507. The concern is valid: & is PowerShell-specific and would break under cmd.exe. Worth confirming with @sabinss4 which shell Windsurf/Gemini CLI actually use for hook invocation on Windows.

Issue 3 (exit code 128 on Windows signal re-raise) — from my #2594. The process.exit(128) fallback only fires when process.kill(pid, sig) throws (Windows doesn't support most POSIX signals). I can change this to process.exit(0) since the child already exited — the launcher's job is done at that point. Will push a fix.

Happy to help address Issues 1 and 3 if that would be useful — just let me know.

bigmanBass666 · 2026-05-29T13:55:33Z

Windows test report (posted via Claude Code agent on behalf of @bigmanBass666)

Environment:

Windows 11 Home China (22631)
Shell: Git Bash (MSYS2), PowerShell 5.1
claude-mem version: 13.3.0 (from installed_plugins.json)
Worker: 12.1.0 (stale, not yet restarted after upgrade)

Issue reproduced:

mcp-search MCP server fails to start — MCP error -32000
.mcp.json in 13.3.0 uses "command": "sh" with a pipe-based path resolver
Reproducing the exact sh -c command in Git Bash produces: printf: write error: Permission denied
Root cause: { producer; } | while read ... break; done — when consumer breaks early, remaining printf calls get EACCES on Windows MSYS emulated pipes (silent SIGPIPE on Linux/macOS)

Workaround confirmed working:
Patched 13.3.0/.mcp.json to use direct node invocation (same as 12.1.0):

{
  "mcpServers": {
    "mcp-search": {
      "type": "stdio",
      "command": "node",
      "args": ["${CLAUDE_PLUGIN_ROOT}/scripts/mcp-server.cjs"]
    }
  }
}

MCP server starts successfully after this change. Worker health endpoint at localhost:37777/api/health responds normally.

This confirms PR #2489's approach is correct. The sh launcher is fundamentally broken on Windows due to MSYS pipe emulation behavior. Using node directly bypasses the entire issue.

Note: The worker was still running 12.1.0 at test time (pre-restart). After a full Claude Code restart the SessionStart hook should bring up the 13.3.0 worker. The .mcp.json patch will need to be re-applied after each plugin auto-update until the canary lands.

✅ node-based launcher works
❌ sh -c pipe resolver fails on Windows (EACCES)

CaTeIM · 2026-05-29T18:06:22Z

✅ Windows test report — @CaTeIM

Environment: Windows 11 Pro 10.0.28000 · node v24.15.0 · bun 1.3.14 · uvx at ~/.local/bin/uvx
Shells exercised: PowerShell 7, Git Bash, cmd.exe (limited — see notes)

Setup

git checkout windows-canary → npm ci → npm run build → all clean, no errors.

End-to-end checks

Area	Result	Detail
`mcp-search` launcher (PR #2594, supersedes my #2489)	✅	`node -e` bootstrap spawns `mcp-server.cjs` via `child_process.spawn(process.execPath, …, {stdio:'inherit'})` with SIGTERM/INT/HUP forwarding. Real `initialize` handshake from PowerShell with `UseShellExecute=false` (same path Claude Code uses) returns proper `serverInfo` for `claude-mem 13.3.0`. Confirms the "Connection closed" failure from my #2489 is still resolved by #2594's approach.
Plugin runtime deps (my #2658 Fix A)	✅	`plugin/package.json` still declares `ajv ^8.20.0` + `ajv-formats ^3.0.1`. Build guardrail preserved.
Auto-install on upgrade (my #2658 Fix B)	✅	`ensureRuntimeDeps()` in `version-check.js` tested across all three paths: broken (spawns detached `bun install`, marker written in the version root per the Greptile review fix, log opened), cooldown (second call within 10 min no-ops), healthy (node_modules present → no-op).
Chroma `uvx` direct-spawn (PR #2502/#2668)	✅	`tests/services/sync/chroma-mcp-manager-windows-spawn.test.ts` "spawns uvx directly on Windows (no cmd.exe /c wrapper)" passes (2/2 on canary; fails on `main`).

Test suite (`tests/infrastructure` + `tests/services/sync`)

Canary on Windows: 163 pass / 15 fail / 8 skip (12 unique failing test names after dedup) vs main on Windows: 165 pass / 13 fail / 8 skip (12 unique).

Name-only diff vs main (timings stripped):

Canary fixes 1 test: ChromaMcpManager Windows spawn (2426) > spawns uvx directly on Windows (no cmd.exe /c wrapper) ✅
Canary introduces 1 failure: ProcessManager > captureProcessStartToken > returns null on win32 (liveness-only fallback path) ⚠️ — this test (tests/infrastructure/process-manager.test.ts:384) forces process.platform = 'win32' and asserts captureProcessStartToken(pid) returns null. PR fix(supervisor): make Windows captureProcessStartToken actually return a token (closes #2578) #2599 (@YOMXXX) changes this function to actually return a real token on win32, which makes the assertion fail. The test needs to be updated or removed alongside fix(supervisor): make Windows captureProcessStartToken actually return a token (closes #2578) #2599 — the parenthetical "(liveness-only fallback path)" described the OLD pre-fix(supervisor): make Windows captureProcessStartToken actually return a token (closes #2578) #2599 behavior.
11 common failures (same on canary and main on Windows) = pre-existing Windows baseline unrelated to this merge: runOneTimeV12_4_3Cleanup (4), HealthMonitor isPortInUse / waitForPortFree (4), ProcessManager.resolveWorkerRuntimePath, ChromaMcpManager singleton (2). The maintainer doesn't see these on macOS; they look like Windows-specific test plumbing issues (port reuse semantics, Windows process trees, file-locking during cleanup), not regressions from this canary.

Live evidence for #2654 / #2526 (EPIPE suppression)

Independent of the canary, my running Claude Code session — using the currently-installed (not canary) plugin — is generating the following on every PostToolUse observation hook right now:

/usr/bin/bash: line 1: printf: write error: Permission denied

This is exactly the EPIPE/EACCES leakage that #2654 suppresses with 2>/dev/null and that #2526 also targets. Strong real-world signal that these belong in the canary as-is.

Live deploy

After the isolated checks above, I pointed my Claude Code at the canary by copying the built plugin/ contents into the active plugin cache at ~/.claude/plugins/cache/thedotmack/claude-mem/13.3.0/ (the dir Claude Code actually reads — not the marketplace clone; that was the bit that tripped me up in #2489's earlier sessions). After the worker restart:

claude mcp list shows plugin:claude-mem:mcp-search as node -e <#2594 bootstrap with c.spawn + signal forwarding> — ✓ Connected.
Worker healthy on http://127.0.0.1:37777.
MCP initialize handshake against the cache's mcp-server.cjs returns proper serverInfo.

To pick this up in my live GUI session I'll do a full Claude Code restart.

Notes

cmd.exe shell test was inconclusive — my repro had quoting issues with paths containing spaces ("Google Drive"). PowerShell's UseShellExecute=false path is what Claude Code actually uses, so this is more representative than cmd.exe anyway.
I'm happy to re-run any specific shell scenario or attach raw logs — ping me.

🤖 Generated with Claude Code

CaTeIM · 2026-05-29T19:22:18Z

Follow-up after live restart — confirming #2654 in production: after switching my Claude Code session to the canary plugin (cache 13.3.0 deployed + restart), my PostToolUse observation hook no longer emits printf: write error: Permission denied. Same machine, same Git Bash, same hook flow — the only thing that changed is the bundled hook now suppresses the broken pipe with 2>/dev/null. Before the restart, every Bash tool call in this very session was producing the EPIPE; after the restart, zero occurrences across many subsequent calls.

cc @Bruce583102198 — your fix is doing exactly what it claims.

alessandropcostabr · 2026-05-30T12:24:52Z

Windows 11 Home SL 26200 — full triple-shell report (@alessandropcostabr)

Environment: Windows 11 Home SL 10.0.26200.8457 · node v24.11.1 · bun 1.3.14 · git 2.53 · Claude Code 2.1.145

Deploy

Plugin marketplaces/thedotmack/plugin/ replaced byte-for-byte by canary bcf437fb's pre-built plugin/ (md5 C7C58737..., 4,031,405 bytes). bun install ran clean (706 packages); npm run build fails on @better-auth/kysely-adapter ↔ kysely drift — irrelevant since the canary commit already ships regenerated bundles. Backup kept at plugin.bak-pre-canary-... (md5 744A9AB5..., the previous 23/05 build).

Discriminator counts in the deployed bundle

Marker	Count	Source
`/api/context/inject` (canary client path)	7	—
`/v1/context/inject` (legacy)	0	(intentionally removed)
`isTTY` stdin guard	1	#2188
SessionStart lifecycle exempt	1	#2636
`node -e` MCP launcher (`.mcp.json`)	✅	#2594
BOM strip in `SettingsDefaultsManager`	present	#2636
`@better-auth/api-key` (`x-api-key` default)	1	(see note below)

Per-shell results

Shell	`claude mcp list`	Live tool calls	`printf write error`	Console flash
PowerShell 7	`claude-mem ✓ Connected` (+ 6 others)	observations STORED (obsId 34634)	0	none
Git Bash (MSYS)	`claude-mem ✓ Connected`	session registered, tool calls captured	0 (confirms #2654)	none
cmd.exe (interactive)	`claude-mem ✓ Connected` (all 7 servers)	`claude --version` → `2.1.145`	0	none

Live capture E2E

Worker uptime 30+ min post-restart, 0 ERROR / 0 FATAL / 0 server-beta-fallback in the 74-line log. Pipeline notebook → CF → Mach10 → Express → BullMQ → Redis → Qdrant flowing: 68 events / 10 min acknowledged server-side, with observations generated downstream.

Worker invocation:

"C:\Users\aless\.bun\bin\bun.exe" C:\Users\aless\.claude\plugins\marketplaces\thedotmack\plugin\scripts\worker-service.cjs --daemon

Bonus — Codex CLI MCP integration confirmed

Wired [mcp_servers.claude-mem] into ~/.codex/config.toml pointing at the same mcp-server.cjs (via node direct, mirroring the .mcp.json durable-fix pattern). Inside codex REPL, /mcp lists claude-mem alongside native codex_apps with all 21 tools available (search, smart_search, memory_add, observation_search, timeline, etc.). Auth: Unsupported is expected for stdio-local MCP servers — not a defect. Codex capture hooks (Peça 2) still pending — .codex-plugin/ in the bundle only ships plugin.json without handlers, so Codex sessions can read the claude-mem corpus but won't generate observations of their own yet.

Non-blocking note for maintainers running server-beta behind a proxy

The deployed client bundle uses X-Api-Key on POST /v1/events (the @better-auth/api-key default). The server-beta middleware in src/server/middleware/{auth,postgres-auth}.ts only reads Authorization: Bearer. Anyone running server-beta behind a header-normalizing proxy (Cloudflare → Express in my case) gets 401 {"error":"Unauthorized","message":"Missing bearer API key"} even with a valid key. Direct-LAN setups don't hit it because no proxy is in the path.

I opened #2719 to make the server middleware accept X-Api-Key as fallback to Bearer — fully backward-compatible (Bearer remains canonical and wins when both are sent), 3 new unit tests passing alongside the existing 12 in tests/server/auth-api-key.test.ts, verified end-to-end against my deployment.

Net result

Canary bcf437fb is green across all three Windows shells for me. Zero changes outside the deployed bundle, backup retained for instant rollback if needed.

cc @thedotmack @CaTeIM @YOMXXX @Bruce583102198

🤖 Generated with Claude Code

…leware (#2719) * fix(server-beta): accept X-Api-Key as fallback to Bearer in auth middleware The server-beta auth middleware (`requireServerAuth` and `requirePostgresServerAuth`) only reads `Authorization: Bearer <key>`. Clients using `@better-auth/api-key` default config send raw keys via the `X-Api-Key` header — including the worker bundle shipped from the Windows-canary line (PR #2699). The mismatch produces: POST /v1/events X-Api-Key: cmem_... → 401 {"error":"Unauthorized","message":"Missing bearer API key"} Direct-LAN setups don't hit this because no proxy strips/normalizes headers and the worker's own retry path eventually round-trips; remote setups behind a proxy (Cloudflare → Express in our case) fail closed and never capture. Fix: middleware accepts either header. Bearer remains canonical so existing clients are unaffected; `X-Api-Key` is the fallback when Bearer is absent. Same change applied to the bun:sqlite-backed `auth.ts` and the Postgres-backed `postgres-auth.ts` (the latter is what server-beta actually uses). Tests added in `tests/server/auth-api-key.test.ts`: - accepts X-Api-Key when Bearer absent (positive) - prefers Bearer when both present (defense-in-depth) - rejects requests with neither header (negative regression) No new test for postgres-auth.ts (no existing suite that wires the PG pool); the parser logic is byte-identical to auth.ts so the bun:sqlite suite covers the behavioral contract. Verified end-to-end against a deployed server-beta: curl -X POST https://mem.late.app.br/v1/events \ -H 'X-Api-Key: cmem_...' -d '{"events":[]}' # before: 401 "Missing bearer API key" # after: 400 ValidationError (auth ok, body invalid — expected) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server-beta): update 401 message to mention both accepted auth headers Addresses Greptile P2 feedback on #2719: the 401 body still said "Missing bearer API key" after the middleware started accepting X-Api-Key, which made debugging harder for clients sending only X-Api-Key with a typo'd header name. Message now lists both accepted forms in both auth.ts and postgres-auth.ts, keeping the two backends consistent. Test that asserts the body in the both-absent rejection path updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alessandro Costa <alessandro@claudio.dev> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tomeyerman · 2026-06-06T20:49:28Z

Independent Windows validation: #2806 fixed the PATH gap, but the cmd.exe quoting defect this PR targets still breaks semantic search on a clean v13.4.0 / latest main.

Flagging this so closing #2790 isn't read as "Windows Chroma fixed" — there are two distinct root causes here:

fix(chroma): put uv's bin dir on the chroma-mcp child PATH (part of #2790) #2806 (the PATH fix) is correct and necessary — it ensures uvx is resolvable on the chroma-mcp child PATH.
It is not sufficient. The spawn in connectInternal on main is unchanged — it still does ['/c', 'uvx', ...commandArgs.map(quoteForCmdExe)], so cmd.exe parses the >/< in the dep pins as redirection operators before uvx ever runs. The quoteForCmdExe helper added in v13.4.0 (Clear defect backlog: plans 01–11 + standalone fixes (suite 46→0 fail, typecheck 24→0) #2701) does not survive Node's shell:false re-quoting.

Minimal repro on Windows 11, claude-mem 13.4.0, uvx 0.9.11, Python 3.13.9, with uvx confirmed on PATH (i.e. this is post-#2806 behavior — the PATH precondition is already satisfied, which isolates the failure to the >/< quoting, not PATH):

(A) Direct spawn — exactly what this PR does — WORKS:

uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" chroma-mcp==0.2.6 --client-type persistent --data-dir <dir>
  → {"result":{ ... "serverInfo":{"name":"chroma","version":"1.6.0"}}}    (boots, waits for stdio)

(B) cmd.exe /c uvx — what main's connectInternal still does — FAILS in ~270ms:

cmd /c uvx --python 3.13 --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp==0.2.6 --client-type persistent --data-dir <dir>
  → "The system cannot find the file specified"    (cmd parses <7 as stdin redirection from a file named "7")

Same args, same machine — the only difference is the cmd.exe wrapper, and it dies in ~270 ms on the redirection.

So this PR (drop the cmd.exe wrapper, spawn uvx directly) is the actual fix for the quoting defect, and #2806 is the complementary PATH half. With #2806 now merged, the PATH confound is gone, so this is the remaining blocker for Windows semantic search. The corresponding issues (#2716, #2667) are currently closed as not planned despite the defect still reproducing on latest main.

Happy to check out this branch, build it, and confirm Chroma connects end-to-end on Windows 11 if that helps land it.

rodboev · 2026-06-15T22:03:02Z

One more Windows rollup note from the newer follow-up wave: there are a few additional open Windows-focused PRs now that may be worth tracking alongside this canary if you do another bundle or maintainer pass:

fix: install Windows Claude Code hooks without bash #2945 fix: install Windows Claude Code hooks without bash ([Windows] Hook execution flashes black console window on every tool use #2869). This is the broader issue-closing slice and effectively supersedes the narrower Bun-popup-only PR fix(windows): suppress console window flash in findBun() spawnSync call (closes #2869) #2890.
fix(windows): carry windowsHide through the remaining live Windows spawn paths (closes #2900) #2921 fix(windows): carry windowsHide through the remaining live Windows spawn paths ([Windows] git subprocess calls in worker-service.cjs lack windowsHide -> console windows flash constantly #2900).
fix(setup): restore tree-sitter CLI on clean Windows installs (closes #2910) #2918 fix(setup): restore tree-sitter CLI on clean Windows installs (v13.5.6: smart_search/smart_outline/smart_unfold return 0 symbols on clean Windows install - tree-sitter-cli binary never downloaded (--ignore-scripts), #1247's extensionless existsSync check still unfixed #2910).
fix(context): avoid blocking Claude SessionStart on Windows cold boot (closes #2903) #2917 fix(context): avoid blocking Claude SessionStart on Windows cold boot ([Windows] VS Code extension's 60s init deadline exceeded by claude-mem's slow worker cold-boot — extension unusable (terminal works) #2903).
fix(windows): suppress console window flash in findBun() spawnSync call (closes #2869) #2890 fix(windows): suppress console window flash in findBun() if you still want to keep the narrow predecessor visible separately, though fix: install Windows Claude Code hooks without bash #2945 absorbs that piece.

For #2945 specifically: the Linux/macOS-facing jobs are green on the current head, and the only remaining red check is the same Windows npm-install/node-gyp Visual Studio toolchain failure already showing up on other open Windows PRs rather than a new branch-specific regression.

thedotmack and others added 30 commits April 29, 2026 13:19

docs: regenerate CHANGELOG for v13.3.0

d532723

fix(hooks): remove tr space-to-colon that corrupts Windows paths with…

e060a67

… spaces

Merge branch 'pr-2636' into windows-canary

9760556

Merge branch 'pr-2599' into windows-canary

48bb064

Merge branch 'pr-2502' into windows-canary

25be77c

Merge branch 'pr-2593' into windows-canary

631aa11

# Conflicts: # plugin/scripts/worker-service.cjs # src/services/sync/ChromaMcpManager.ts

Merge branch 'pr-2668' into windows-canary

c87de0f

# Conflicts: # src/services/sync/ChromaMcpManager.ts

Merge branch 'pr-2208' into windows-canary

3dc8143

# Conflicts: # plugin/scripts/bun-runner.js # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/services/infrastructure/ProcessManager.ts # src/supervisor/process-registry.ts

Merge branch 'pr-2507' into windows-canary

5531b2f

# Conflicts: # plugin/scripts/server-beta-service.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js

thedotmack and others added 5 commits May 28, 2026 18:37

Merge branch 'pr-2598' into windows-canary

207fe00

Merge branch 'pr-2654' into windows-canary

68040a0

# Conflicts: # plugin/hooks/hooks.json

Merge branch 'pr-2489' into windows-canary

a28382a

Merge branch 'pr-2594' into windows-canary

b6573ac

# Conflicts: # plugin/.mcp.json

greptile-apps Bot reviewed May 29, 2026

View reviewed changes

YOMXXX mentioned this pull request May 29, 2026

fix(mcp): replace sh launcher with cross-platform node -e (closes #2461) #2594

Closed

alessandropcostabr mentioned this pull request May 30, 2026

fix(server-beta): accept X-Api-Key as fallback to Bearer in auth middleware #2719

Merged

4 tasks

MrDea1h mentioned this pull request May 30, 2026

Windows: npm-global Claude Code install mis-detected as "desktop app", breaking headless memory generation #2723

Closed

kinyenzm mentioned this pull request May 30, 2026

fix(hooks): drop pipe break that triggers EACCES "printf: write error: Permission denied" on Windows/Cygwin #2725

Closed

tomeyerman mentioned this pull request Jun 6, 2026

Regression (v13.4.0): chroma-mcp still dies instantly on Windows — #2701 cmd.exe quoting fix is incomplete #2716

Closed

YOMXXX mentioned this pull request Jun 7, 2026

fix(chroma): spawn uvx directly on Windows instead of via cmd.exe (#2716, #2667) #2816

Closed

ghost mentioned this pull request Jun 10, 2026

[Windows] Hook execution flashes black console window on every tool use #2869

Open

thedotmack mentioned this pull request Jun 16, 2026

tracking: shared Windows CI noise — tree-sitter-css node-gyp rebuild fails on GitHub hosted runner (no MSVC) #2974

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🪟 Windows Canary: combined build/test of all open Windows fixes#2699

🪟 Windows Canary: combined build/test of all open Windows fixes#2699
thedotmack wants to merge 35 commits into
mainfrom
windows-canary

thedotmack commented May 29, 2026

Uh oh!

greptile-apps Bot commented May 29, 2026

Uh oh!

greptile-apps Bot May 29, 2026

Uh oh!

greptile-apps Bot May 29, 2026

Uh oh!

greptile-apps Bot May 29, 2026

Uh oh!

YOMXXX commented May 29, 2026

Uh oh!

bigmanBass666 commented May 29, 2026

Uh oh!

CaTeIM commented May 29, 2026

Uh oh!

CaTeIM commented May 29, 2026

Uh oh!

alessandropcostabr commented May 30, 2026

Uh oh!

tomeyerman commented Jun 6, 2026

Uh oh!

rodboev commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants


		return `"${bunPath}" "${workerServicePath}" hook windsurf ${hookCommand}`;
		const callOperator = process.platform === 'win32' ? '& ' : '';

Uh oh!

Conversation

thedotmack commented May 29, 2026

🪟 Windows Canary — combined build/test branch for all open Windows fixes

⚠️ Calling all Windows contributors

✅ PRs bundled in this canary (12)

🔀 Conflict decisions (please sanity-check these)

🚫 Deliberately excluded (need a rebase or a decision)

🧪 Verification done locally (macOS — Windows still needs you)

Notes

Uh oh!

greptile-apps Bot commented May 29, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

YOMXXX commented May 29, 2026

My PRs in the canary

On the Greptile issues

Uh oh!

bigmanBass666 commented May 29, 2026

Uh oh!

CaTeIM commented May 29, 2026

Setup

End-to-end checks

Test suite (tests/infrastructure + tests/services/sync)

Live evidence for #2654 / #2526 (EPIPE suppression)

Live deploy

Notes

Uh oh!

CaTeIM commented May 29, 2026

Uh oh!

alessandropcostabr commented May 30, 2026

Windows 11 Home SL 26200 — full triple-shell report (@alessandropcostabr)

Deploy

Discriminator counts in the deployed bundle

Per-shell results

Live capture E2E

Bonus — Codex CLI MCP integration confirmed

Non-blocking note for maintainers running server-beta behind a proxy

Net result

Uh oh!

tomeyerman commented Jun 6, 2026

(A) Direct spawn — exactly what this PR does — WORKS:

(B) cmd.exe /c uvx — what main's connectInternal still does — FAILS in ~270ms:

Uh oh!

rodboev commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Test suite (`tests/infrastructure` + `tests/services/sync`)