🪟 Windows Canary: combined build/test of all open Windows fixes#2699
🪟 Windows Canary: combined build/test of all open Windows fixes#2699thedotmack wants to merge 35 commits into
Conversation
Re-applies the PowerShell Start-Process -WindowStyle Hidden daemon spawn that PR #751 (e6ae017) introduced and commit d13662d reverted. Also fixes the bun-runner cmd /c popup, sets detached:false on Windows for SDK subprocesses (so windowsHide actually works and claude.exe doesn't outlive the worker), and adds windows-latest CI to prevent regression. - ProcessManager.spawnDaemon: PowerShell -EncodedCommand branch back. Returns 0 sentinel on success — callers MUST use pid === undefined for failure detection, never falsy checks. - bun-runner.js: drop "cmd /c" wrapper. shell:true lets Node resolve bun.cmd via PATHEXT and respects windowsHide (the explicit cmd.exe wrapper was popping a visible window per hook — #2150, #2186). - process-registry.ts spawnSdkProcess: detached:false on Windows. Mixing detached:true with windowsHide:true is documented-undefined on Windows; with detached:false, windowsHide actually hides claude.exe and the SDK subprocess dies with the parent (#2190, #2198). - .github/workflows/windows.yml: smoke test counts visible cmd windows before/after spawn + grep guard that the Start-Process branch survives. WSL bash stdin (#2188) is acknowledged but deferred — the bash → node pipe boundary needs a real Windows VM to test, beyond this PR's scope. PTY for Claude CLI SDK mode (#2173, #2177) is also deferred per plan. Closes #2150, #2169, #2186, #2187, #2190, #2198 Refs #2183 (Windows perf — same root cause) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/setup-node@v4 cache: npm requires a package-lock.json and this project uses Bun (only bunfig.toml exists at root). Drop the cache directive, switch npm ci to npm install --no-audit --no-fund, and narrow the build step to npm run build — build-and-sync also runs a marketplace sync + worker restart that hardcodes ~/.claude/plugins, which doesn't exist on CI.
…oke with static guards CodeRabbit majors on #2208: 1. plugin/scripts/bun-runner.js — shell:true with separate spawnArgs triggers DEP0190 on Node 22+ and breaks paths/args containing spaces. Build a single fully-quoted command string (mirroring findBun()'s 'where bun' approach) and pass spawnArgs=[]. 2. .github/workflows/windows.yml — the dynamic smoke step that counted visible cmd windows around 'claude-mem start' exits 1 on 'claude-mem is not installed' before exercising the spawn path, AND PowerShell try/catch doesn't suppress native exit codes regardless. Replace with three static regression guards covering the exact patterns PR #2208 protects: - PowerShell Start-Process + WindowStyle Hidden in spawnDaemon - bun-runner shell:true with empty spawnArgs (DEP0190 guard) - windowsHide set on SDK spawn factory (issue #2190)
The "Anti-regression" steps grep ProcessManager.ts/bun-runner.js/process-registry.ts for specific strings (Start-Process, WindowStyle Hidden, shell:true, windowsHide). Tripwires aren't fixes — they make refactoring harder forever and verify nothing the actual Windows build doesn't already verify. The npm install + npm run build on windows-latest is the real guard.
ChromaMcpManager wrapped uvx as `cmd.exe /c uvx ...` on Windows. cmd.exe reassembles argv into a single command line and re-parses the dep-override `--with` values: `onnxruntime>=1.20` becomes "redirect stdout to file `=1.20`" and `protobuf<7` becomes "redirect stdin from file `7`". The Python subprocess never starts, so every chroma-mcp tool call closed in ~30 ms with `MCP error -32000: Connection closed`. Node's child_process.spawn (via cross-spawn) resolves `uvx` -> `uvx.exe` on PATH on Windows the same way it does on POSIX, so the cmd.exe wrap is not needed. Drop it and call uvx directly on all platforms. Adds a regression test that pins the spawn command to `uvx` and asserts the dep-override version specifiers reach uvx as argv elements (not as cmd.exe redirection). Co-Authored-By: Claude <noreply@anthropic.com>
On Windows, PowerShell interprets a quoted absolute path as a string literal rather than an executable. Hook commands like: C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ... throw an exception because PowerShell never invokes them as a process. Prefixing with the call operator (&) fixes this: & C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ... Apply the fix to all three hook installers: GeminiCliHooksInstaller, CursorHooksInstaller, and WindsurfHooksInstaller. The call operator is only added when process.platform === win32; Linux/macOS are unchanged. Users already experiencing this issue can re-run the install command to regenerate their settings.json with the corrected hook commands. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #2585, #2591, #2495, #2429, #2405. On Windows the worker spawns chroma-mcp via `cmd.exe /c uvx --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp ...`. cmd.exe interprets unquoted `<`, `>`, `&`, `|` in the command line as redirection / pipe metacharacters before launching uvx, so the version pins are parsed as I/O redirects to files named `7` / `=1.20`. The child dies in ~50ms before the MCP handshake completes, surfacing as `MCP error -32000: Connection closed` on every Chroma sync. Node's child_process auto-quoting only triggers on whitespace, so these metacharacters reach cmd.exe raw. Five separate issues report the same symptom on Windows 10/11 (PowerShell, cmd, Git Bash). Fix: prefix each cmd.exe metacharacter (`^ < > & |`) with cmd's own escape character `^` so it's treated as literal. POSIX path is unchanged; the escape only runs when `process.platform === 'win32'`. Verification: a freshly-built worker-service.cjs contains `p.replace(/([\^<>&|])/g,"^$1")` and the resulting command line is `cmd.exe /c uvx --with onnxruntime^>=1.20 --with protobuf^<7 ...`, which the issue reporter (#2585) confirmed lets chroma-mcp connect in ~6 seconds and backfill complete. The escape character class includes `^` itself so an arg containing a literal caret is escaped to `^^`, matching cmd.exe's quoting rules.
The mcp-search MCP server in plugin/.mcp.json launched via:
"command": "sh",
"args": ["-c", "_C=... ls -dt ... | while read ... exec node ..."]
On Windows, neither cmd nor PowerShell ships `sh`, and most users do
not have Git Bash on the system PATH that Claude Desktop / Claude Code
spawns from. Result: `/mcp` panel shows mcp-search failed with
ENOENT, blocking all mem-search skill usage on Windows. Same launcher
also breaks any host that strips PATH before spawning MCP servers.
Fix: rewrite the launcher as inline node -e. Node is guaranteed to be
on PATH for any host running claude-mem (the plugin's other scripts
require it). The new launcher preserves the exact same fallback search
order:
1. CLAUDE_PLUGIN_ROOT / PLUGIN_ROOT env (host-injected)
2. <cwd>/plugin
3. <cwd>
4. ~/.codex/plugins/cache/claude-mem-local/claude-mem/<version>/
(mtime-descending; matches `ls -dt` semantics)
5. ~/.codex/plugins/cache/thedotmack/claude-mem/<version>/
6. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/cache/thedotmack/claude-mem/<version>/
7. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/marketplaces/thedotmack/plugin
Both the codex and claude cache fallback paths are preserved as
literal strings so build-hooks.js's distribution check still passes.
Verification:
- JSON parses cleanly
- node --check on the inline JS reports no syntax errors
- Running the launcher logic on the repo cwd resolves
PLUGIN_ROOT to <cwd>/plugin (same as the old sh script)
- npm run build's mcp-search assertion passes
No change to runtime behavior on macOS / Linux — node is the same
spawn target the old sh script ultimately exec'd into. The only
difference is the search-and-launch logic now runs in node itself
rather than in a sub-shell.
Addresses Greptile review on #2593. cmd.exe expands `%VAR%` against the process environment even inside `cmd /c`, so a user-supplied value passed via buildCommandArgs() that happens to contain `%` characters — most plausibly `chromaApiKey` when CLAUDE_MEM_CHROMA_API_KEY is set to a password-style string — would be silently mangled: `%TOKEN%` becomes the value of $TOKEN (or empty if unset). The result is an opaque auth failure that looks unrelated to Windows quoting. Fix: double `%` to `%%` before the caret pass. Order matters — running the caret replacement first would convert `%` to `^%` which cmd.exe then interprets as "literal %", but the original intent (literal % staying literal) requires the canonical `%%` form because escaping with `^` only works inside double-quoted contexts (we're passing args via /c, not quoted). Verified: rebuilt worker-service.cjs contains the chained `p.replace(/%/g,"%%").replace(/([\^<>&|])/g,"^$1")`. Bundle size unchanged. POSIX path still bypasses both replacements.
Addresses both points in the Greptile review on #2594. The previous launcher had two POSIX regressions vs. the old `exec node ...` approach: 1. Signal forwarding gap: the MCP host holds the launcher's PID, not mcp-server.cjs's. SIGTERM to the launcher would exit the launcher via Node's default handler but leave the child orphaned, holding its stdio descriptors / file locks. 2. Signal-death exit code: spawn's `exit` callback fires with `(code, signal)`. When the child dies via signal, `code` is null and `signal` is e.g. `'SIGTERM'`. The previous handler returned `0` (clean exit), masking the real cause from the host. Fix: - Register handlers for SIGTERM/SIGINT/SIGHUP on the launcher that forward to `ch.kill(sig)`. try/catch in case the child is already dead. - In `ch.on('exit')`, when signal is non-null, deregister our own handler for that signal and re-raise it via `process.kill(process.pid, sig)`. The deregister step is critical — Node calls registered handlers instead of dying, so without it the launcher would catch the signal and process.exit(0) instead of inheriting the signal-death semantics (waitpid would see WIFEXITED instead of WIFSIGNALED, and the host would see exit code 0). The fallback `process.exit(128)` covers exotic platforms where re-raise is rejected. Verification (sleep child standin): launcher=9617 child=9630 OK: child died launcher exit=143 (expect 128+15=143) POSIX hosts now see the launcher process die with the same signal the host sent, matching the old `exec node` semantics. Windows behavior unchanged (it never had the orphan problem — the OS reaps the process tree).
… pattern Closes #2589, #2588, #2461, #2439. Six hooks in plugin/hooks/hooks.json (SessionStart×2, UserPromptSubmit, PostToolUse, PreToolUse, Stop) start with: export PATH="$($SHELL -lc 'echo $PATH' 2>/dev/null):$PATH"; ... Under Git Bash on Windows (MSYS bash 5.x), when stdin is not a TTY (which it never is for a hook subprocess), `$SHELL -lc` returns empty or fails in a way that bash interprets as a broken pipe, producing the well-documented: /usr/bin/bash: line 1: printf: write error: Permission denied cascading through every subsequent pipe in the hook command. Net effect on Windows: every hook fires this stderr noise on every invocation, and downstream `printf | while IFS=read -r` blocks get spurious EPIPE/EACCES errors. Fix: switch to the same battle-tested probe that plugin/hooks/codex-hooks.json already uses: _HP=$(printenv PATH 2>/dev/null || true); if [ -z "$_HP" ] && [ -n "${SHELL:-}" ]; then _HP=$("$SHELL" -lc 'printf %s "$PATH"' 2>/dev/null || true); fi; _HP=$(printf '%s' "$_HP" | tr ' ' ':'); export PATH="${_HP:+$_HP:}$PATH"; Three improvements vs. the broken version: 1. `printenv PATH` is the canonical way to read an environment variable from a POSIX shell — works under bash/zsh/dash/Git Bash, no TTY dependency, no pipe to break. The login-shell probe runs only as a fallback when printenv returns empty (which would only happen if PATH is genuinely unset). 2. `"$SHELL" -lc 'printf %s "$PATH"'` (quoted, with printf instead of echo) is the safer form: `echo` interprets backslashes in some shells, `printf %s` is unambiguous. 3. `tr ' ' ':'` on the captured PATH defends against Git Bash / MSYS quirks where a path containing literal spaces could break downstream PATH parsing. The Setup hook (line 11) already uses a hand-rolled PATH that includes nvm / brew / /usr/local — it doesn't share the broken pattern and is intentionally left untouched. This is the same probe codex-hooks.json (used by the Codex CLI integration) has shipped with for some time — it's been validated across the same matrix of shells we need to support here. Verification: $ grep -c "SHELL -lc 'echo \$PATH'" plugin/hooks/hooks.json 0 $ grep -c "printenv PATH" plugin/hooks/hooks.json 6 $ node -e "JSON.parse(...)" # JSON OK $ bash -c '_HP=$(printenv PATH ...); ... echo "len=$(echo $PATH | wc -c)"' PATH length=3626 contains-usr-local=1 Note on scope: this PR addresses the root cause (the broken $SHELL -lc probe). PR #2526 silenced the downstream printf EPIPE symptom by adding 2>/dev/null to the consumer pipe, but the noise was still being generated. With this fix, the symptom goes away entirely because the upstream probe no longer fails.
…n a token Closes #2578. Before this change, `captureProcessStartToken()` had: if (process.platform === 'win32') { return null; } …which made `verifyPidFileOwnership()` short-circuit at: const currentToken = captureProcessStartToken(info.pid); if (currentToken === null) return true; // <-- always on Windows The net effect: on Windows, the PID-reuse defense was disabled. When claude-mem's worker died and Windows reassigned its PID to an unrelated process (issue #2578 reports SignalRgbLauncher.exe winning the lottery), `claude-mem status` happily reported the worker as healthy and `start` refused to launch a fresh one, leaving the user with a permanent deadlock until they manually deleted `worker.pid`. This commit lights up the Windows branch using PowerShell `Get-Process -Id <pid>` and composes the start token from `(StartTime.Ticks | ProcessName)`. The combination handles both flavors of PID reuse: - **Same image restarted** — e.g. bun.exe crashed and the OS immediately assigned the freed PID to another bun.exe spawn. StartTime.Ticks (100-nanosecond resolution) differs. - **Different image inherits the PID** — the #2578 case where an unrelated app like SignalRgbLauncher.exe ends up with claude-mem's old PID. ProcessName differs. `Get-Process` is preferred over `Get-CimInstance Win32_Process` because it's ~3x faster on startup and works without elevation for processes owned by the current user — which is the only case that matters for a user-installed CC plugin. Failure modes: - Process doesn't exist → PowerShell's `-ErrorAction Stop` throws, the catch returns empty output, captureProcessStartToken returns null, verifyPidFileOwnership falls back to "PID alive → trust" (matches the old behavior, no regression on this edge). - Get-Process succeeds but stdout is empty → return null (same fallback). - PowerShell spawn itself fails (extremely rare, e.g. ENOENT because powershell.exe is gone) → return null (same fallback). Cost: ~300–500ms PowerShell warm-up per call. This function is on the worker start / stop / status path, not a hot loop, so the latency is acceptable. POSIX paths (linux /proc, mac/BSD `ps -p $pid -o lstart=`) are untouched. Verification on macOS (POSIX branch unaffected): $ node -e 'const{spawnSync}=require("child_process"); console.log(spawnSync("ps",["-p",String(process.pid),"-o","lstart="],{encoding:"utf-8"}).stdout.trim())' Thu May 21 21:38:20 2026 Windows verification deferred to follow-up — the PowerShell command itself is straight `Get-Process` so the failure mode if anything is wrong is "captureProcessStartToken returns null", which is exactly what the function returned before this PR. So the worst-case regression on Windows is a no-op; the best case (and intended case) is that PID reuse finally gets caught.
…Greptile) Addresses the Greptile review on #2599. Node.js's `spawnSync` reports failures via two channels: - `result.status` — set to null when the child never exited normally (timeout, killed by signal); set to the integer exit code otherwise. - `result.error` — set to the Error instance when the spawn itself failed (ENOENT, EACCES, etc.), even before the child ran. The Windows branch (added in this PR) and the BSD/mac branch (long standing) both only checked `result.status !== 0`. That works today because `status` is null on every spawn failure — null !== 0 is true, so the function falls through to `return null` correctly. But it leaves an implicit assumption about Node's `spawnSync` semantics that a reader has to reconstruct. Updated both branches to the idiomatic `result.error || result.status !== 0` guard. Semantically equivalent today; future-proof if Node ever changes spawn-failure reporting; and makes the intent explicit to the next person reading this code. Both branches share the same one-line guard now (Windows + BSD/mac), keeping them symmetric. Linux branch reads /proc directly so it doesn't go through spawnSync. esbuild minification preserves the new `||` guard — verified that the rebuilt worker-service.cjs contains `r.error||r.status` and `e.error||e.status` (esbuild renamed variables but kept the logic). No behavior change on any platform; this is a defense-in-depth clarification of the existing fallback path.
Claude Code spawns MCP servers WITHOUT a shell, so the `.mcp.json` `command` is resolved against PATH directly. On Windows `sh` is not on PATH (Git Bash is not guaranteed to be there), so `command: "sh"` fails with ENOENT and the mcp-search server never starts. Switch only the bundled plugin/.mcp.json launcher to `command: "node"` (node is always present as the runtime) and perform the same plugin-root resolution in JS, then require() the resolved mcp-server.cjs. Hooks are intentionally left untouched: they run via "shell": "bash", which Claude Code provides itself, so the maintainer's sh-based hook launchers keep working on Windows. Only the shell-less MCP spawn needed a node bootstrap. Tests: the MCP-startup assertions now validate the node launcher; the hook-command assertions remain unchanged (still sh).
…ands from #2188 stdin guard Two defense-in-depth fixes for Windows environments: 1. SettingsDefaultsManager: strip UTF-8 BOM (U+FEFF) before JSON.parse. Windows tools (editors, formatters, Claude Code Edit hook) may prepend BOM to settings.json, which Bun rejects with "Unrecognized token", causing a silent fallback to defaults that breaks server-beta routing. 2. bun-runner.js: exempt lifecycle subcommands (start/stop/restart/status) from the #2188 empty-stdin guard. These subcommands manage the worker daemon and never consume stdin. Killing the child on empty stdin prevents the daemon from starting on platforms where Claude Code does not pipe a payload for SessionStart (e.g. Windows CC <= 2.1.145). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Cygwin/Windows with bash 5.3+, when the while loop inside the command-substitution pipeline breaks early, subsequent printf calls in the left-side block can produce "write error: Permission denied" (EPIPE manifesting as EACCES). Redirect stderr of the pipeline's feeder block to /dev/null so these harmless pipe-closure errors don't surface as hook errors to the user. All 7 inline hook commands (Setup, SessionStart x2, UserPromptSubmit, PostToolUse, PreToolUse, Stop) use the same pipeline pattern and are fixed identically.
@modelcontextprotocol/sdk validates tool schemas with Ajv. mcp-server.cjs
bundles the SDK, but Ajv emits compiled validators that require their
runtime helpers by package path (e.g. require("ajv/dist/runtime/equal")),
which esbuild leaves external. These packages were never declared in the
generated plugin/package.json, so `bun install` in the plugin cache never
fetched them — and the MCP server crashes resolving "ajv/..." the moment a
tool is called (notably on Windows, where the cache had a fresh install).
Add ajv and ajv-formats to the generated plugin dependencies, plus a build
guardrail that fails the build if mcp-server.cjs requires either package
without it being declared, mirroring the existing zod guardrail.
When the host auto-upgrades the plugin, it creates a fresh version directory in the plugin cache without node_modules. The worker and MCP server then fail to resolve their runtime deps (zod, ajv via the MCP SDK, tree-sitter grammars) and claude-mem silently stops working until the user notices the upgrade hint and runs the install command manually — a recurring papercut on Windows. The Setup-phase version-check now kicks off a detached, throttled `bun install` in the resolved plugin root when node_modules is absent. Properties: - No-op on healthy installs (node_modules present) — zero behavior change for users who are not in the broken post-upgrade state. - Detached + unref so the Setup hook stays non-blocking; deps land within ~1 min. - 10-minute cooldown marker so a running or repeatedly-failing install does not respawn every session. - shell:true so the host PATH resolves `bun` (bun.cmd on Windows, bun on Unix). - Fully best-effort: any failure is swallowed and the existing upgrade hint still documents the manual recovery command.
- Store the .deps-install-attempted cooldown marker in the version-specific plugin root instead of the global data dir. Otherwise a second upgrade landing within the 10-minute window would be suppressed and left without node_modules. `root` is writable (bun install targets it). - Close the inherited log file descriptor in the parent after spawning the detached child, so the child is its sole owner.
cmd.exe interprets shell-metacharacters in its /c argument string BEFORE handing them to uvx, and Node's arg-array quoting does not survive that re-parse (cmd.exe strips the outer quote pair from /c per the documented parser behavior). The chroma-mcp command built by buildCommandArgs() contains `--with onnxruntime>=1.20` and `--with protobuf<7`, which cmd.exe parses as stdout redirection to file `=1.20` and stdin redirection from file `7` respectively. uvx is invoked with truncated args, exits with "The system cannot find the file specified", and Chroma sync fails with "MCP error -32000: Connection closed" on every observation on Windows. uvx ships as uvx.exe on Windows (real PE executable, not a .cmd shim per the uv installer), so Node's child_process can spawn it directly via CreateProcess without a shell wrapper. Verified on Windows 11 + Node 22.22.2 + Bun 1.3.14 + uv 0.11.8: after the patch, `/api/chroma/status?deep=1` reports status=healthy with round-trip semantic search succeeding in ~650ms.
…leakage Save the original ComSpec value before setting it for Windows simulation and restore/delete it in afterAll, keeping env isolation symmetric with the existing platform rollback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # plugin/scripts/worker-service.cjs # src/services/sync/ChromaMcpManager.ts
# Conflicts: # src/services/sync/ChromaMcpManager.ts
# Conflicts: # plugin/scripts/bun-runner.js # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs # src/services/infrastructure/ProcessManager.ts # src/supervisor/process-registry.ts
# Conflicts: # plugin/scripts/server-beta-service.cjs # plugin/scripts/worker-service.cjs # plugin/ui/viewer-bundle.js
# Conflicts: # plugin/hooks/hooks.json
# Conflicts: # plugin/.mcp.json
Rebuilt plugin/scripts/*.cjs and viewer-bundle.js via `npm run build` so the generated artifacts reflect the merged source from all bundled Windows PRs (during conflict resolution the stale bundles were taken verbatim, then regenerated here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile SummaryThis canary PR bundles 12 Windows-compatibility fixes into one combined branch for collective testing before individual merges: it replaces the
Confidence Score: 3/5The ChromaMcpManager, BOM stripping, and captureProcessStartToken changes look correct and well-tested, but the DATA_DIR hardcoding in ProcessManager.ts will silently misdirect PID file and migration-marker writes for any user who has configured CLAUDE_MEM_DATA_DIR, causing the worker health check to permanently report the worker as dead on those setups. The MCP launcher rewrite and uvx direct-spawn are well-reasoned and the new test coverage for the Windows spawn path is solid. ProcessManager.ts no longer calls paths.dataDir(), which in the original code read CLAUDE_MEM_DATA_DIR from the environment and settings file before falling back to ~/.claude-mem — the hardcoded replacement creates a split that makes the daemon appear dead for users with non-default data directories. The PowerShell & operator in the hooks installers is also applied unconditionally by OS rather than by the shell the target tool actually uses, which could break Windows Gemini CLI or Windsurf users who invoke hooks outside PowerShell. src/services/infrastructure/ProcessManager.ts (DATA_DIR hardcoding) and src/services/integrations/WindsurfHooksInstaller.ts + GeminiCliHooksInstaller.ts (unconditional PowerShell call operator) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Claude Code spawns MCP server] --> B["node -e inline-JS launcher"]
B --> C{Search plugin root candidates}
C -->|Found| D["spawn mcp-server.cjs with stdio inherit"]
C -->|Not found| E["stderr error + exit 1"]
D --> F[Forward SIGTERM SIGINT SIGHUP to child]
D --> G{Child exits}
G -->|signal| H["re-raise signal on self"]
G -->|code| I["process.exit(code)"]
subgraph ChromaMcpManager spawn path
J["connectInternal()"] --> K["spawn uvx directly on all platforms"]
K --> L["argv: --with onnxruntime>=1.20 --with protobuf<7"]
L --> M["No cmd.exe redirection misparse"]
end
subgraph DATA_DIR regression
N["paths.dataDir() - honours CLAUDE_MEM_DATA_DIR"] -.removed.-> O["Hardcoded ~/.claude-mem"]
O --> P["PID file + migration markers go to wrong path for custom data dirs"]
end
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
src/services/infrastructure/ProcessManager.ts:23-25
**DATA_DIR hardcoding drops custom-data-directory support**
`paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.
### Issue 2 of 3
src/services/integrations/WindsurfHooksInstaller.ts:144-145
**`& ` call operator assumes PowerShell; breaks cmd.exe invocation**
The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.
### Issue 3 of 3
plugin/.mcp.json:8
**MCP launcher exit sentinel 128 on Windows signal re-raise failure**
The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.
Reviews (1): Last reviewed commit: "build: regenerate bundles from merged Wi..." | Re-trigger Greptile |
| // Standard paths for PID file management | ||
| const DATA_DIR = path.join(homedir(), '.claude-mem'); | ||
| const PID_FILE = path.join(DATA_DIR, 'worker.pid'); |
There was a problem hiding this comment.
DATA_DIR hardcoding drops custom-data-directory support
paths.dataDir() (previously used here) reads CLAUDE_MEM_DATA_DIR from the environment first, then falls back to a value from settings.json, then falls back to ~/.claude-mem. The replacement hardcodes only the last fallback. Any user who has set CLAUDE_MEM_DATA_DIR (via env var or settings.json) will now have their PID file, migration marker files, and DB backup paths silently written to ~/.claude-mem instead of the configured directory. The practical effect is that validateWorkerPidFile reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/services/infrastructure/ProcessManager.ts
Line: 23-25
Comment:
**DATA_DIR hardcoding drops custom-data-directory support**
`paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| return `"${bunPath}" "${workerServicePath}" hook windsurf ${hookCommand}`; | ||
| const callOperator = process.platform === 'win32' ? '& ' : ''; |
There was a problem hiding this comment.
& call operator assumes PowerShell; breaks cmd.exe invocation
The & prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via cmd.exe instead of PowerShell on Windows, the leading & will be interpreted as a command separator in cmd.exe and the hook will fail silently. The condition process.platform === 'win32' is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/services/integrations/WindsurfHooksInstaller.ts
Line: 144-145
Comment:
**`& ` call operator assumes PowerShell; breaks cmd.exe invocation**
The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.
How can I resolve this? If you propose a fix, please make it concise.| "-c", | ||
| "_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\"" | ||
| "-e", | ||
| "const f=require('fs'),p=require('path'),o=require('os'),c=require('child_process');const h=o.homedir();const C=process.env.CLAUDE_CONFIG_DIR||p.join(h,'.claude');const E=process.env.CLAUDE_PLUGIN_ROOT||process.env.PLUGIN_ROOT||'';const d=process.cwd();const L=x=>{try{return f.readdirSync(x).filter(n=>/^\\d/.test(n)).map(n=>p.join(x,n)).filter(z=>{try{return f.statSync(z).isDirectory()}catch{return false}}).sort((a,b)=>f.statSync(b).mtimeMs-f.statSync(a).mtimeMs)}catch{return[]}};const K=[E,p.join(d,'plugin'),d,...L(p.join(h,'.codex/plugins/cache/claude-mem-local/claude-mem')),...L(p.join(h,'.codex/plugins/cache/thedotmack/claude-mem')),...L(p.join(C,'plugins/cache/thedotmack/claude-mem')),p.join(C,'plugins/marketplaces/thedotmack/plugin')].filter(Boolean);let R=null;for(const k of K){const r=f.existsSync(p.join(k,'plugin','scripts'))?p.join(k,'plugin'):k;if(f.existsSync(p.join(r,'scripts','mcp-server.cjs'))){R=r;break}}if(!R){process.stderr.write('claude-mem: mcp server not found\\n');process.exit(1)}const ch=c.spawn(process.execPath,[p.join(R,'scripts','mcp-server.cjs')],{stdio:'inherit'});for(const s of ['SIGTERM','SIGINT','SIGHUP'])process.on(s,()=>{try{ch.kill(s)}catch{}});ch.on('exit',(code,sig)=>{if(sig){process.removeAllListeners(sig);try{process.kill(process.pid,sig)}catch{process.exit(128)}}else process.exit(code==null?0:code)})" |
There was a problem hiding this comment.
MCP launcher exit sentinel 128 on Windows signal re-raise failure
The launcher forwards SIGTERM, SIGINT, and SIGHUP to the child. The exit handler calls process.kill(process.pid, sig) when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls process.exit(128) — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: plugin/.mcp.json
Line: 8
Comment:
**MCP launcher exit sentinel 128 on Windows signal re-raise failure**
The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.
How can I resolve this? If you propose a fix, please make it concise.|
Hi @thedotmack — thanks for bundling these together, the canary approach makes a lot of sense given the overlap. My PRs in the canary#2593 (cmd.exe metachar escaping) — agree that direct-spawn (#2502/#2668) is the better path. Happy for #2593 to sit as a no-op fallback; if direct-spawn works across environments, we can close it. #2594 (node -e MCP launcher) — glad this was picked as the launcher approach. One note on Greptile's Issue 3 below. #2598 (printenv-first PATH probe) — also just pushed a follow-up fix ( #2599 (Windows captureProcessStartToken) — straightforward, no concerns. On the Greptile issuesIssue 1 (DATA_DIR hardcoding in ProcessManager.ts) — this looks like a real regression. Issue 2 (PowerShell Issue 3 (exit code 128 on Windows signal re-raise) — from my #2594. The Happy to help address Issues 1 and 3 if that would be useful — just let me know. |
|
Windows test report (posted via Claude Code agent on behalf of @bigmanBass666) Environment:
Issue reproduced:
Workaround confirmed working: {
"mcpServers": {
"mcp-search": {
"type": "stdio",
"command": "node",
"args": ["${CLAUDE_PLUGIN_ROOT}/scripts/mcp-server.cjs"]
}
}
}MCP server starts successfully after this change. Worker health endpoint at This confirms PR #2489's approach is correct. The Note: The worker was still running 12.1.0 at test time (pre-restart). After a full Claude Code restart the SessionStart hook should bring up the 13.3.0 worker. The ✅ |
|
✅ Windows test report — @CaTeIM Environment: Windows 11 Pro 10.0.28000 · node v24.15.0 · bun 1.3.14 · uvx at Setup
End-to-end checks
Test suite (
|
|
Follow-up after live restart — confirming #2654 in production: after switching my Claude Code session to the canary plugin (cache cc @Bruce583102198 — your fix is doing exactly what it claims. |
Windows 11 Home SL 26200 — full triple-shell report (@alessandropcostabr)Environment: Windows 11 Home SL 10.0.26200.8457 · node v24.11.1 · bun 1.3.14 · git 2.53 · Claude Code 2.1.145 DeployPlugin Discriminator counts in the deployed bundle
Per-shell results
Live capture E2EWorker uptime 30+ min post-restart, 0 ERROR / 0 FATAL / 0 server-beta-fallback in the 74-line log. Pipeline Worker invocation: Bonus — Codex CLI MCP integration confirmedWired Non-blocking note for maintainers running server-beta behind a proxyThe deployed client bundle uses I opened #2719 to make the server middleware accept Net resultCanary cc @thedotmack @CaTeIM @YOMXXX @Bruce583102198 🤖 Generated with Claude Code |
…leware (#2719) * fix(server-beta): accept X-Api-Key as fallback to Bearer in auth middleware The server-beta auth middleware (`requireServerAuth` and `requirePostgresServerAuth`) only reads `Authorization: Bearer <key>`. Clients using `@better-auth/api-key` default config send raw keys via the `X-Api-Key` header — including the worker bundle shipped from the Windows-canary line (PR #2699). The mismatch produces: POST /v1/events X-Api-Key: cmem_... → 401 {"error":"Unauthorized","message":"Missing bearer API key"} Direct-LAN setups don't hit this because no proxy strips/normalizes headers and the worker's own retry path eventually round-trips; remote setups behind a proxy (Cloudflare → Express in our case) fail closed and never capture. Fix: middleware accepts either header. Bearer remains canonical so existing clients are unaffected; `X-Api-Key` is the fallback when Bearer is absent. Same change applied to the bun:sqlite-backed `auth.ts` and the Postgres-backed `postgres-auth.ts` (the latter is what server-beta actually uses). Tests added in `tests/server/auth-api-key.test.ts`: - accepts X-Api-Key when Bearer absent (positive) - prefers Bearer when both present (defense-in-depth) - rejects requests with neither header (negative regression) No new test for postgres-auth.ts (no existing suite that wires the PG pool); the parser logic is byte-identical to auth.ts so the bun:sqlite suite covers the behavioral contract. Verified end-to-end against a deployed server-beta: curl -X POST https://mem.late.app.br/v1/events \ -H 'X-Api-Key: cmem_...' -d '{"events":[]}' # before: 401 "Missing bearer API key" # after: 400 ValidationError (auth ok, body invalid — expected) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server-beta): update 401 message to mention both accepted auth headers Addresses Greptile P2 feedback on #2719: the 401 body still said "Missing bearer API key" after the middleware started accepting X-Api-Key, which made debugging harder for clients sending only X-Api-Key with a typo'd header name. Message now lists both accepted forms in both auth.ts and postgres-auth.ts, keeping the two backends consistent. Test that asserts the body in the both-absent rejection path updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alessandro Costa <alessandro@claudio.dev> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Independent Windows validation: #2806 fixed the PATH gap, but the cmd.exe quoting defect this PR targets still breaks semantic search on a clean v13.4.0 / latest main. Flagging this so closing #2790 isn't read as "Windows Chroma fixed" — there are two distinct root causes here:
Minimal repro on Windows 11, claude-mem 13.4.0, uvx 0.9.11, Python 3.13.9, with uvx confirmed on PATH (i.e. this is post-#2806 behavior — the PATH precondition is already satisfied, which isolates the failure to the >/< quoting, not PATH): (A) Direct spawn — exactly what this PR does — WORKS:(B) cmd.exe /c uvx — what main's connectInternal still does — FAILS in ~270ms:Same args, same machine — the only difference is the cmd.exe wrapper, and it dies in ~270 ms on the redirection. So this PR (drop the cmd.exe wrapper, spawn uvx directly) is the actual fix for the quoting defect, and #2806 is the complementary PATH half. With #2806 now merged, the PATH confound is gone, so this is the remaining blocker for Windows semantic search. The corresponding issues (#2716, #2667) are currently closed as not planned despite the defect still reproducing on latest main. Happy to check out this branch, build it, and confirm Chroma connects end-to-end on Windows 11 if that helps land it. |
🪟 Windows Canary — combined build/test branch for all open Windows fixes
We have a large cluster of open Windows PRs that each fix a different piece of the same cross-platform story (spawn, hooks, chroma, mcp launcher, supervisor, settings, runtime deps). They overlap heavily — several touch the same launcher strings and the same
ChromaMcpManagerspawn path — so merging them one-by-one keeps re-introducing conflicts and nobody can see them working together.This branch bundles them into one canary so we can validate the combined result.
✅ PRs bundled in this canary (12)
mcp-searchvianodeso Windows can spawn itcmd.exewrap on Windowsuvxspawn (chroma)&) to hook commandscmd.exemetacharacters inuvxargsshlauncher with cross-platformnode -e(#2461)$SHELL -lcPATH probe withprintenv-firstcaptureProcessStartTokenactually return a token (#2578)settings.json+ lifecycle stdin-guard exemptionuvxdirectly on Windows instead of viacmd.exe(#2667)🔀 Conflict decisions (please sanity-check these)
These PRs proposed competing approaches to the same problem — only one can be active at a time. Here's what the canary currently uses and what got superseded:
uvxspawn — three PRs, two approaches:uvxdirectly (nocmd.exewrap). ← canary uses this.cmd.exewrap but escapes metacharacters (^<>&|,%%). Merged as a no-op since direct-spawn makes the wrap unnecessary. @YOMXXX — if direct-spawn doesn't work on your Windows setup, this is the fallback; please test.mcp-server.cjsin-process viarequire(superseded, but fix(mcp): launch mcp-search via node so Windows can spawn it #2489 still contributes the node move + its test).printenv-first probe + fix: suppress printf write errors when pipe breaks in inline bash hooks #2654's2>/dev/nullpipe-suppression are combined (complementary).🚫 Deliberately excluded (need a rebase or a decision)
watcher.tshalf (Windows transcript-append fix) is clean, but theSessionRoutes.tshalf collides withmain's since-mergedhandleGeneratorExit/GeneratorExitHandlerrefactor. The [Bug] Codex transcript ingestion on Windows finds no/late items and can strand pending observations #2192 in-flight-requeue fix is not present in the new handler and must be re-ported there. Needs a rebase ontomainbefore it can join.echoinstead ofprintf '%s\n'in hook subshells (Windows (Git Bash): printf write error Permission denied blocks UserPromptSubmit hook #2439). Superseded by fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598'sprintenv-first probe already in the canary; its tests assert the old launcher format. Ifprintf '%s\n'genuinely misbehaves on your shell, please say so — that would override fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598.2>/dev/nullpipe-suppression already in the canary. Please confirm fix: suppress printf write errors when pipe breaks in inline bash hooks #2654 fully resolves your case, or what's still leaking.🧪 Verification done locally (macOS — Windows still needs you)
npm run build✅ all bundles regenerate from merged sourcetsc --noEmit: 24 errors — identical toorigin/main; 0 introduced by this merge (the pre-existing TS baseline is fixed by Fix upstream TypeScript baseline #2511)bun test: 54 failures — identical toorigin/main; 0 introduced by this merge (pre-existing test baseline is fixed by Fix upstream test baseline regressions #2512)tests/infrastructure,tests/services/sync): 186 pass / 0 failIn other words: this branch adds the Windows fixes without regressing the baseline. The remaining red is pre-existing and owned by #2511/#2512.
Notes
CHANGELOG.mdintentionally not touched (auto-generated).🤖 Generated with Claude Code