Skip to content

🪟 Windows Canary: combined build/test of all open Windows fixes#2699

Open
thedotmack wants to merge 35 commits into
mainfrom
windows-canary
Open

🪟 Windows Canary: combined build/test of all open Windows fixes#2699
thedotmack wants to merge 35 commits into
mainfrom
windows-canary

Conversation

@thedotmack

Copy link
Copy Markdown
Owner

🪟 Windows Canary — combined build/test branch for all open Windows fixes

We have a large cluster of open Windows PRs that each fix a different piece of the same cross-platform story (spawn, hooks, chroma, mcp launcher, supervisor, settings, runtime deps). They overlap heavily — several touch the same launcher strings and the same ChromaMcpManager spawn path — so merging them one-by-one keeps re-introducing conflicts and nobody can see them working together.

This branch bundles them into one canary so we can validate the combined result.

⚠️ Calling all Windows contributors

@CaTeIM @ousamabenyounes @sabinss4 @YOMXXX @alessandropcostabr @Bruce583102198 @PulseCheckAI — your Windows PRs are bundled here.

I (@thedotmack) do not have a Windows machine, so I genuinely cannot test this myself — we have to work this one out together. Please:

  1. git fetch origin && git checkout windows-canary
  2. npm ci (or bun install) then npm run build
  3. Install/point Claude Code at this build and exercise it on real Windows (PowerShell, Git Bash, and cmd.exe if you can)
  4. Report back in this thread: ✅ works / ❌ broke, with the shell + Windows version. Screenshots/logs of any console-window flashes, MCP error -32000, EPIPE, or path-resolution failures are gold.

If your fix needed adjustment to combine with the others (see "Conflict decisions" below), please confirm the resolution still solves your original issue.


✅ PRs bundled in this canary (12)

PR Author What it fixes
#2208 @thedotmack Restore Windows spawn (PR #751 re-apply) + Windows CI
#2489 @CaTeIM Launch mcp-search via node so Windows can spawn it
#2502 @ousamabenyounes Drop cmd.exe wrap on Windows uvx spawn (chroma)
#2507 @sabinss4 Add PowerShell call operator (&) to hook commands
#2593 @YOMXXX Escape cmd.exe metacharacters in uvx args
#2594 @YOMXXX Replace sh launcher with cross-platform node -e (#2461)
#2598 @YOMXXX Replace fragile $SHELL -lc PATH probe with printenv-first
#2599 @YOMXXX Make Windows captureProcessStartToken actually return a token (#2578)
#2636 @alessandropcostabr Strip BOM from settings.json + lifecycle stdin-guard exemption
#2654 @Bruce583102198 Suppress printf write errors when pipe breaks in inline bash hooks
#2658 @CaTeIM Restore plugin runtime deps after auto-upgrade
#2668 @PulseCheckAI Spawn uvx directly on Windows instead of via cmd.exe (#2667)

🔀 Conflict decisions (please sanity-check these)

These PRs proposed competing approaches to the same problem — only one can be active at a time. Here's what the canary currently uses and what got superseded:

🚫 Deliberately excluded (need a rebase or a decision)

🧪 Verification done locally (macOS — Windows still needs you)

  • npm run build ✅ all bundles regenerate from merged source
  • tsc --noEmit: 24 errors — identical to origin/main; 0 introduced by this merge (the pre-existing TS baseline is fixed by Fix upstream TypeScript baseline #2511)
  • bun test: 54 failures — identical to origin/main; 0 introduced by this merge (pre-existing test baseline is fixed by Fix upstream test baseline regressions #2512)
  • Windows/chroma/infra suites that this branch touches (tests/infrastructure, tests/services/sync): 186 pass / 0 fail

In other words: this branch adds the Windows fixes without regressing the baseline. The remaining red is pre-existing and owned by #2511/#2512.

Notes

  • Diffstat: 20 files, +1,572 / −606 (incl. regenerated bundles).
  • This is a canary, not a merge target — once Windows folks confirm what works, we'll land the validated fixes via their individual PRs (or fast-track them).
  • CHANGELOG.md intentionally not touched (auto-generated).

🤖 Generated with Claude Code

thedotmack and others added 30 commits April 29, 2026 13:19
Re-applies the PowerShell Start-Process -WindowStyle Hidden daemon spawn
that PR #751 (e6ae017) introduced and commit d13662d reverted. Also
fixes the bun-runner cmd /c popup, sets detached:false on Windows for
SDK subprocesses (so windowsHide actually works and claude.exe doesn't
outlive the worker), and adds windows-latest CI to prevent regression.

- ProcessManager.spawnDaemon: PowerShell -EncodedCommand branch back.
  Returns 0 sentinel on success — callers MUST use pid === undefined
  for failure detection, never falsy checks.
- bun-runner.js: drop "cmd /c" wrapper. shell:true lets Node resolve
  bun.cmd via PATHEXT and respects windowsHide (the explicit cmd.exe
  wrapper was popping a visible window per hook — #2150, #2186).
- process-registry.ts spawnSdkProcess: detached:false on Windows.
  Mixing detached:true with windowsHide:true is documented-undefined
  on Windows; with detached:false, windowsHide actually hides
  claude.exe and the SDK subprocess dies with the parent (#2190, #2198).
- .github/workflows/windows.yml: smoke test counts visible cmd windows
  before/after spawn + grep guard that the Start-Process branch survives.

WSL bash stdin (#2188) is acknowledged but deferred — the bash → node
pipe boundary needs a real Windows VM to test, beyond this PR's scope.
PTY for Claude CLI SDK mode (#2173, #2177) is also deferred per plan.

Closes #2150, #2169, #2186, #2187, #2190, #2198
Refs #2183 (Windows perf — same root cause)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/setup-node@v4 cache: npm requires a package-lock.json and this
project uses Bun (only bunfig.toml exists at root). Drop the cache
directive, switch npm ci to npm install --no-audit --no-fund, and
narrow the build step to npm run build — build-and-sync also runs a
marketplace sync + worker restart that hardcodes ~/.claude/plugins,
which doesn't exist on CI.
…oke with static guards

CodeRabbit majors on #2208:

1. plugin/scripts/bun-runner.js — shell:true with separate spawnArgs
   triggers DEP0190 on Node 22+ and breaks paths/args containing
   spaces. Build a single fully-quoted command string (mirroring
   findBun()'s 'where bun' approach) and pass spawnArgs=[].

2. .github/workflows/windows.yml — the dynamic smoke step that counted
   visible cmd windows around 'claude-mem start' exits 1 on
   'claude-mem is not installed' before exercising the spawn path,
   AND PowerShell try/catch doesn't suppress native exit codes
   regardless. Replace with three static regression guards covering
   the exact patterns PR #2208 protects:
   - PowerShell Start-Process + WindowStyle Hidden in spawnDaemon
   - bun-runner shell:true with empty spawnArgs (DEP0190 guard)
   - windowsHide set on SDK spawn factory (issue #2190)
The "Anti-regression" steps grep ProcessManager.ts/bun-runner.js/process-registry.ts
for specific strings (Start-Process, WindowStyle Hidden, shell:true, windowsHide).
Tripwires aren't fixes — they make refactoring harder forever and verify nothing
the actual Windows build doesn't already verify. The npm install + npm run build
on windows-latest is the real guard.
ChromaMcpManager wrapped uvx as `cmd.exe /c uvx ...` on Windows.
cmd.exe reassembles argv into a single command line and re-parses
the dep-override `--with` values: `onnxruntime>=1.20` becomes
"redirect stdout to file `=1.20`" and `protobuf<7` becomes
"redirect stdin from file `7`". The Python subprocess never starts,
so every chroma-mcp tool call closed in ~30 ms with
`MCP error -32000: Connection closed`.

Node's child_process.spawn (via cross-spawn) resolves `uvx` ->
`uvx.exe` on PATH on Windows the same way it does on POSIX, so
the cmd.exe wrap is not needed. Drop it and call uvx directly on
all platforms.

Adds a regression test that pins the spawn command to `uvx` and
asserts the dep-override version specifiers reach uvx as argv
elements (not as cmd.exe redirection).

Co-Authored-By: Claude <noreply@anthropic.com>
On Windows, PowerShell interprets a quoted absolute path as a string
literal rather than an executable. Hook commands like:

  C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ...

throw an exception because PowerShell never invokes them as a process.
Prefixing with the call operator (&) fixes this:

  & C:\path\to\bun.exe C:\path\to\worker-service.cjs hook ...

Apply the fix to all three hook installers: GeminiCliHooksInstaller,
CursorHooksInstaller, and WindsurfHooksInstaller. The call operator is
only added when process.platform === win32; Linux/macOS are unchanged.

Users already experiencing this issue can re-run the install command
to regenerate their settings.json with the corrected hook commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #2585, #2591, #2495, #2429, #2405.

On Windows the worker spawns chroma-mcp via
`cmd.exe /c uvx --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp ...`.
cmd.exe interprets unquoted `<`, `>`, `&`, `|` in the command line as
redirection / pipe metacharacters before launching uvx, so the version
pins are parsed as I/O redirects to files named `7` / `=1.20`. The child
dies in ~50ms before the MCP handshake completes, surfacing as
`MCP error -32000: Connection closed` on every Chroma sync.

Node's child_process auto-quoting only triggers on whitespace, so these
metacharacters reach cmd.exe raw. Five separate issues report the same
symptom on Windows 10/11 (PowerShell, cmd, Git Bash).

Fix: prefix each cmd.exe metacharacter (`^ < > & |`) with cmd's own
escape character `^` so it's treated as literal. POSIX path is
unchanged; the escape only runs when `process.platform === 'win32'`.

Verification: a freshly-built worker-service.cjs contains
`p.replace(/([\^<>&|])/g,"^$1")` and the resulting command line is
`cmd.exe /c uvx --with onnxruntime^>=1.20 --with protobuf^<7 ...`,
which the issue reporter (#2585) confirmed lets chroma-mcp connect in
~6 seconds and backfill complete.

The escape character class includes `^` itself so an arg containing a
literal caret is escaped to `^^`, matching cmd.exe's quoting rules.
The mcp-search MCP server in plugin/.mcp.json launched via:

    "command": "sh",
    "args": ["-c", "_C=... ls -dt ... | while read ... exec node ..."]

On Windows, neither cmd nor PowerShell ships `sh`, and most users do
not have Git Bash on the system PATH that Claude Desktop / Claude Code
spawns from. Result: `/mcp` panel shows mcp-search failed with
ENOENT, blocking all mem-search skill usage on Windows. Same launcher
also breaks any host that strips PATH before spawning MCP servers.

Fix: rewrite the launcher as inline node -e. Node is guaranteed to be
on PATH for any host running claude-mem (the plugin's other scripts
require it). The new launcher preserves the exact same fallback search
order:

  1. CLAUDE_PLUGIN_ROOT / PLUGIN_ROOT env (host-injected)
  2. <cwd>/plugin
  3. <cwd>
  4. ~/.codex/plugins/cache/claude-mem-local/claude-mem/<version>/
     (mtime-descending; matches `ls -dt` semantics)
  5. ~/.codex/plugins/cache/thedotmack/claude-mem/<version>/
  6. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/cache/thedotmack/claude-mem/<version>/
  7. <CLAUDE_CONFIG_DIR or ~/.claude>/plugins/marketplaces/thedotmack/plugin

Both the codex and claude cache fallback paths are preserved as
literal strings so build-hooks.js's distribution check still passes.

Verification:
- JSON parses cleanly
- node --check on the inline JS reports no syntax errors
- Running the launcher logic on the repo cwd resolves
  PLUGIN_ROOT to <cwd>/plugin (same as the old sh script)
- npm run build's mcp-search assertion passes

No change to runtime behavior on macOS / Linux — node is the same
spawn target the old sh script ultimately exec'd into. The only
difference is the search-and-launch logic now runs in node itself
rather than in a sub-shell.
Addresses Greptile review on #2593.

cmd.exe expands `%VAR%` against the process environment even inside
`cmd /c`, so a user-supplied value passed via buildCommandArgs() that
happens to contain `%` characters — most plausibly `chromaApiKey`
when CLAUDE_MEM_CHROMA_API_KEY is set to a password-style string —
would be silently mangled: `%TOKEN%` becomes the value of $TOKEN
(or empty if unset). The result is an opaque auth failure that
looks unrelated to Windows quoting.

Fix: double `%` to `%%` before the caret pass. Order matters —
running the caret replacement first would convert `%` to `^%`
which cmd.exe then interprets as "literal %", but the original
intent (literal % staying literal) requires the canonical `%%`
form because escaping with `^` only works inside double-quoted
contexts (we're passing args via /c, not quoted).

Verified: rebuilt worker-service.cjs contains the chained
`p.replace(/%/g,"%%").replace(/([\^<>&|])/g,"^$1")`. Bundle size
unchanged. POSIX path still bypasses both replacements.
Addresses both points in the Greptile review on #2594.

The previous launcher had two POSIX regressions vs. the old
`exec node ...` approach:

1. Signal forwarding gap: the MCP host holds the launcher's PID,
   not mcp-server.cjs's. SIGTERM to the launcher would exit the
   launcher via Node's default handler but leave the child
   orphaned, holding its stdio descriptors / file locks.

2. Signal-death exit code: spawn's `exit` callback fires with
   `(code, signal)`. When the child dies via signal, `code` is
   null and `signal` is e.g. `'SIGTERM'`. The previous handler
   returned `0` (clean exit), masking the real cause from the
   host.

Fix:

- Register handlers for SIGTERM/SIGINT/SIGHUP on the launcher
  that forward to `ch.kill(sig)`. try/catch in case the child
  is already dead.
- In `ch.on('exit')`, when signal is non-null, deregister our
  own handler for that signal and re-raise it via
  `process.kill(process.pid, sig)`. The deregister step is
  critical — Node calls registered handlers instead of dying,
  so without it the launcher would catch the signal and
  process.exit(0) instead of inheriting the signal-death
  semantics (waitpid would see WIFEXITED instead of
  WIFSIGNALED, and the host would see exit code 0). The
  fallback `process.exit(128)` covers exotic platforms where
  re-raise is rejected.

Verification (sleep child standin):

    launcher=9617  child=9630
    OK: child died
    launcher exit=143 (expect 128+15=143)

POSIX hosts now see the launcher process die with the same
signal the host sent, matching the old `exec node` semantics.
Windows behavior unchanged (it never had the orphan problem —
the OS reaps the process tree).
… pattern

Closes #2589, #2588, #2461, #2439.

Six hooks in plugin/hooks/hooks.json (SessionStart×2, UserPromptSubmit,
PostToolUse, PreToolUse, Stop) start with:

    export PATH="$($SHELL -lc 'echo $PATH' 2>/dev/null):$PATH"; ...

Under Git Bash on Windows (MSYS bash 5.x), when stdin is not a TTY
(which it never is for a hook subprocess), `$SHELL -lc` returns
empty or fails in a way that bash interprets as a broken pipe,
producing the well-documented:

    /usr/bin/bash: line 1: printf: write error: Permission denied

cascading through every subsequent pipe in the hook command. Net
effect on Windows: every hook fires this stderr noise on every
invocation, and downstream `printf | while IFS=read -r` blocks
get spurious EPIPE/EACCES errors.

Fix: switch to the same battle-tested probe that
plugin/hooks/codex-hooks.json already uses:

    _HP=$(printenv PATH 2>/dev/null || true);
    if [ -z "$_HP" ] && [ -n "${SHELL:-}" ]; then
      _HP=$("$SHELL" -lc 'printf %s "$PATH"' 2>/dev/null || true);
    fi;
    _HP=$(printf '%s' "$_HP" | tr ' ' ':');
    export PATH="${_HP:+$_HP:}$PATH";

Three improvements vs. the broken version:

1. `printenv PATH` is the canonical way to read an environment
   variable from a POSIX shell — works under bash/zsh/dash/Git Bash,
   no TTY dependency, no pipe to break. The login-shell probe runs
   only as a fallback when printenv returns empty (which would only
   happen if PATH is genuinely unset).

2. `"$SHELL" -lc 'printf %s "$PATH"'` (quoted, with printf instead
   of echo) is the safer form: `echo` interprets backslashes in
   some shells, `printf %s` is unambiguous.

3. `tr ' ' ':'` on the captured PATH defends against Git Bash /
   MSYS quirks where a path containing literal spaces could break
   downstream PATH parsing.

The Setup hook (line 11) already uses a hand-rolled PATH that
includes nvm / brew / /usr/local — it doesn't share the broken
pattern and is intentionally left untouched.

This is the same probe codex-hooks.json (used by the Codex CLI
integration) has shipped with for some time — it's been validated
across the same matrix of shells we need to support here.

Verification:

  $ grep -c "SHELL -lc 'echo \$PATH'" plugin/hooks/hooks.json
  0
  $ grep -c "printenv PATH" plugin/hooks/hooks.json
  6
  $ node -e "JSON.parse(...)"  # JSON OK

  $ bash -c '_HP=$(printenv PATH ...); ... echo "len=$(echo $PATH | wc -c)"'
  PATH length=3626  contains-usr-local=1

Note on scope: this PR addresses the root cause (the broken
$SHELL -lc probe). PR #2526 silenced the downstream printf
EPIPE symptom by adding 2>/dev/null to the consumer pipe, but
the noise was still being generated. With this fix, the symptom
goes away entirely because the upstream probe no longer fails.
…n a token

Closes #2578.

Before this change, `captureProcessStartToken()` had:

    if (process.platform === 'win32') {
      return null;
    }

…which made `verifyPidFileOwnership()` short-circuit at:

    const currentToken = captureProcessStartToken(info.pid);
    if (currentToken === null) return true;   // <-- always on Windows

The net effect: on Windows, the PID-reuse defense was disabled. When
claude-mem's worker died and Windows reassigned its PID to an
unrelated process (issue #2578 reports SignalRgbLauncher.exe winning
the lottery), `claude-mem status` happily reported the worker as
healthy and `start` refused to launch a fresh one, leaving the user
with a permanent deadlock until they manually deleted `worker.pid`.

This commit lights up the Windows branch using PowerShell
`Get-Process -Id <pid>` and composes the start token from
`(StartTime.Ticks | ProcessName)`. The combination handles both
flavors of PID reuse:

  - **Same image restarted** — e.g. bun.exe crashed and the OS
    immediately assigned the freed PID to another bun.exe spawn.
    StartTime.Ticks (100-nanosecond resolution) differs.

  - **Different image inherits the PID** — the #2578 case where an
    unrelated app like SignalRgbLauncher.exe ends up with claude-mem's
    old PID. ProcessName differs.

`Get-Process` is preferred over `Get-CimInstance Win32_Process`
because it's ~3x faster on startup and works without elevation for
processes owned by the current user — which is the only case that
matters for a user-installed CC plugin.

Failure modes:

  - Process doesn't exist → PowerShell's `-ErrorAction Stop` throws,
    the catch returns empty output, captureProcessStartToken returns
    null, verifyPidFileOwnership falls back to "PID alive → trust"
    (matches the old behavior, no regression on this edge).

  - Get-Process succeeds but stdout is empty → return null (same
    fallback).

  - PowerShell spawn itself fails (extremely rare, e.g. ENOENT
    because powershell.exe is gone) → return null (same fallback).

Cost: ~300–500ms PowerShell warm-up per call. This function is on
the worker start / stop / status path, not a hot loop, so the
latency is acceptable.

POSIX paths (linux /proc, mac/BSD `ps -p $pid -o lstart=`) are
untouched.

Verification on macOS (POSIX branch unaffected):

  $ node -e 'const{spawnSync}=require("child_process"); console.log(spawnSync("ps",["-p",String(process.pid),"-o","lstart="],{encoding:"utf-8"}).stdout.trim())'
  Thu May 21 21:38:20 2026

Windows verification deferred to follow-up — the PowerShell command
itself is straight `Get-Process` so the failure mode if anything is
wrong is "captureProcessStartToken returns null", which is exactly
what the function returned before this PR. So the worst-case
regression on Windows is a no-op; the best case (and intended case)
is that PID reuse finally gets caught.
…Greptile)

Addresses the Greptile review on #2599.

Node.js's `spawnSync` reports failures via two channels:

  - `result.status` — set to null when the child never exited
    normally (timeout, killed by signal); set to the integer exit
    code otherwise.
  - `result.error` — set to the Error instance when the spawn itself
    failed (ENOENT, EACCES, etc.), even before the child ran.

The Windows branch (added in this PR) and the BSD/mac branch (long
standing) both only checked `result.status !== 0`. That works today
because `status` is null on every spawn failure — null !== 0 is true,
so the function falls through to `return null` correctly. But it
leaves an implicit assumption about Node's `spawnSync` semantics that
a reader has to reconstruct.

Updated both branches to the idiomatic `result.error || result.status
!== 0` guard. Semantically equivalent today; future-proof if Node
ever changes spawn-failure reporting; and makes the intent explicit
to the next person reading this code.

Both branches share the same one-line guard now (Windows + BSD/mac),
keeping them symmetric. Linux branch reads /proc directly so it
doesn't go through spawnSync.

esbuild minification preserves the new `||` guard — verified that the
rebuilt worker-service.cjs contains `r.error||r.status` and
`e.error||e.status` (esbuild renamed variables but kept the logic).

No behavior change on any platform; this is a defense-in-depth
clarification of the existing fallback path.
Claude Code spawns MCP servers WITHOUT a shell, so the `.mcp.json`
`command` is resolved against PATH directly. On Windows `sh` is not on
PATH (Git Bash is not guaranteed to be there), so `command: "sh"` fails
with ENOENT and the mcp-search server never starts.

Switch only the bundled plugin/.mcp.json launcher to `command: "node"`
(node is always present as the runtime) and perform the same plugin-root
resolution in JS, then require() the resolved mcp-server.cjs.

Hooks are intentionally left untouched: they run via "shell": "bash",
which Claude Code provides itself, so the maintainer's sh-based hook
launchers keep working on Windows. Only the shell-less MCP spawn needed
a node bootstrap.

Tests: the MCP-startup assertions now validate the node launcher; the
hook-command assertions remain unchanged (still sh).
…ands from #2188 stdin guard

Two defense-in-depth fixes for Windows environments:

1. SettingsDefaultsManager: strip UTF-8 BOM (U+FEFF) before JSON.parse.
   Windows tools (editors, formatters, Claude Code Edit hook) may prepend
   BOM to settings.json, which Bun rejects with "Unrecognized token",
   causing a silent fallback to defaults that breaks server-beta routing.

2. bun-runner.js: exempt lifecycle subcommands (start/stop/restart/status)
   from the #2188 empty-stdin guard. These subcommands manage the worker
   daemon and never consume stdin. Killing the child on empty stdin
   prevents the daemon from starting on platforms where Claude Code
   does not pipe a payload for SessionStart (e.g. Windows CC <= 2.1.145).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Cygwin/Windows with bash 5.3+, when the while loop inside the
command-substitution pipeline breaks early, subsequent printf calls
in the left-side block can produce "write error: Permission denied"
(EPIPE manifesting as EACCES). Redirect stderr of the pipeline's
feeder block to /dev/null so these harmless pipe-closure errors
don't surface as hook errors to the user.

All 7 inline hook commands (Setup, SessionStart x2, UserPromptSubmit,
PostToolUse, PreToolUse, Stop) use the same pipeline pattern and
are fixed identically.
@modelcontextprotocol/sdk validates tool schemas with Ajv. mcp-server.cjs
bundles the SDK, but Ajv emits compiled validators that require their
runtime helpers by package path (e.g. require("ajv/dist/runtime/equal")),
which esbuild leaves external. These packages were never declared in the
generated plugin/package.json, so `bun install` in the plugin cache never
fetched them — and the MCP server crashes resolving "ajv/..." the moment a
tool is called (notably on Windows, where the cache had a fresh install).

Add ajv and ajv-formats to the generated plugin dependencies, plus a build
guardrail that fails the build if mcp-server.cjs requires either package
without it being declared, mirroring the existing zod guardrail.
When the host auto-upgrades the plugin, it creates a fresh version directory
in the plugin cache without node_modules. The worker and MCP server then fail
to resolve their runtime deps (zod, ajv via the MCP SDK, tree-sitter grammars)
and claude-mem silently stops working until the user notices the upgrade hint
and runs the install command manually — a recurring papercut on Windows.

The Setup-phase version-check now kicks off a detached, throttled `bun install`
in the resolved plugin root when node_modules is absent. Properties:

- No-op on healthy installs (node_modules present) — zero behavior change for
  users who are not in the broken post-upgrade state.
- Detached + unref so the Setup hook stays non-blocking; deps land within ~1 min.
- 10-minute cooldown marker so a running or repeatedly-failing install does not
  respawn every session.
- shell:true so the host PATH resolves `bun` (bun.cmd on Windows, bun on Unix).
- Fully best-effort: any failure is swallowed and the existing upgrade hint still
  documents the manual recovery command.
- Store the .deps-install-attempted cooldown marker in the version-specific
  plugin root instead of the global data dir. Otherwise a second upgrade
  landing within the 10-minute window would be suppressed and left without
  node_modules. `root` is writable (bun install targets it).
- Close the inherited log file descriptor in the parent after spawning the
  detached child, so the child is its sole owner.
cmd.exe interprets shell-metacharacters in its /c argument string BEFORE
handing them to uvx, and Node's arg-array quoting does not survive that
re-parse (cmd.exe strips the outer quote pair from /c per the documented
parser behavior). The chroma-mcp command built by buildCommandArgs()
contains `--with onnxruntime>=1.20` and `--with protobuf<7`, which cmd.exe
parses as stdout redirection to file `=1.20` and stdin redirection from
file `7` respectively. uvx is invoked with truncated args, exits with
"The system cannot find the file specified", and Chroma sync fails with
"MCP error -32000: Connection closed" on every observation on Windows.

uvx ships as uvx.exe on Windows (real PE executable, not a .cmd shim
per the uv installer), so Node's child_process can spawn it directly
via CreateProcess without a shell wrapper.

Verified on Windows 11 + Node 22.22.2 + Bun 1.3.14 + uv 0.11.8: after
the patch, `/api/chroma/status?deep=1` reports status=healthy with
round-trip semantic search succeeding in ~650ms.
…leakage

Save the original ComSpec value before setting it for Windows simulation
and restore/delete it in afterAll, keeping env isolation symmetric with
the existing platform rollback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	plugin/scripts/worker-service.cjs
#	src/services/sync/ChromaMcpManager.ts
# Conflicts:
#	src/services/sync/ChromaMcpManager.ts
# Conflicts:
#	plugin/scripts/bun-runner.js
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
#	src/services/infrastructure/ProcessManager.ts
#	src/supervisor/process-registry.ts
# Conflicts:
#	plugin/scripts/server-beta-service.cjs
#	plugin/scripts/worker-service.cjs
#	plugin/ui/viewer-bundle.js
thedotmack and others added 5 commits May 28, 2026 18:37
# Conflicts:
#	plugin/hooks/hooks.json
# Conflicts:
#	plugin/.mcp.json
Rebuilt plugin/scripts/*.cjs and viewer-bundle.js via `npm run build` so the
generated artifacts reflect the merged source from all bundled Windows PRs
(during conflict resolution the stale bundles were taken verbatim, then
regenerated here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This canary PR bundles 12 Windows-compatibility fixes into one combined branch for collective testing before individual merges: it replaces the sh -c MCP launcher with a cross-platform node -e launcher, removes the cmd.exe /c uvx wrapper in ChromaMcpManager, adds PowerShell & call operator support in Windsurf/Gemini hook installers, implements a printenv-first PATH probe, strips UTF-8 BOM from settings.json, and fixes Windows captureProcessStartToken PID-reuse detection.

  • MCP launcher (plugin/.mcp.json): replaced a Bash sh -c inline script with a node -e inline JavaScript launcher that performs the same plugin-root search and spawns mcp-server.cjs with signal forwarding — eliminating the sh dependency on Windows.
  • ChromaMcpManager: dropped the Windows-only cmd.exe /c uvx … wrapper; uvx is now spawned directly on all platforms, preventing >= and < in --with arguments from being misinterpreted as I/O redirection by cmd.exe.
  • ProcessManager.ts: DATA_DIR / PID_FILE constants are now hardcoded to path.join(homedir(), '.claude-mem'), diverging from the paths.dataDir() / paths.workerPid() helpers that honour the CLAUDE_MEM_DATA_DIR env-var and settings-file override.

Confidence Score: 3/5

The ChromaMcpManager, BOM stripping, and captureProcessStartToken changes look correct and well-tested, but the DATA_DIR hardcoding in ProcessManager.ts will silently misdirect PID file and migration-marker writes for any user who has configured CLAUDE_MEM_DATA_DIR, causing the worker health check to permanently report the worker as dead on those setups.

The MCP launcher rewrite and uvx direct-spawn are well-reasoned and the new test coverage for the Windows spawn path is solid. ProcessManager.ts no longer calls paths.dataDir(), which in the original code read CLAUDE_MEM_DATA_DIR from the environment and settings file before falling back to ~/.claude-mem — the hardcoded replacement creates a split that makes the daemon appear dead for users with non-default data directories. The PowerShell & operator in the hooks installers is also applied unconditionally by OS rather than by the shell the target tool actually uses, which could break Windows Gemini CLI or Windsurf users who invoke hooks outside PowerShell.

src/services/infrastructure/ProcessManager.ts (DATA_DIR hardcoding) and src/services/integrations/WindsurfHooksInstaller.ts + GeminiCliHooksInstaller.ts (unconditional PowerShell call operator)

Important Files Changed

Filename Overview
src/services/infrastructure/ProcessManager.ts DATA_DIR hardcoded to ~/.claude-mem, dropping paths.dataDir() which honours CLAUDE_MEM_DATA_DIR; PID file, migration markers, and DB-backup paths will go to the wrong location for users with a custom data directory.
src/services/sync/ChromaMcpManager.ts Drops Windows cmd.exe/c uvx wrapper — direct uvx spawn on all platforms prevents redirection misparse of --with version specifiers; POSIX tree-kill logic added for orphan cleanup.
plugin/.mcp.json MCP launcher replaced from sh -c to node -e inline JS; cross-platform plugin-root search with signal forwarding to child process; removes sh dependency on Windows.
src/shared/SettingsDefaultsManager.ts BOM stripping added before JSON.parse to handle Windows-tool-generated settings.json files; correct and non-breaking.
src/supervisor/process-registry.ts Windows captureProcessStartToken now queries PowerShell Get-Process for StartTime.Ticks
src/services/integrations/WindsurfHooksInstaller.ts Adds PowerShell & call operator prefix on win32; correct for PowerShell hook execution but would break cmd.exe invocation if Windsurf uses that shell on Windows.
src/services/integrations/GeminiCliHooksInstaller.ts Same PowerShell & operator added as WindsurfHooksInstaller; same caveat applies — only safe if Gemini CLI always invokes hooks via PowerShell on Windows.
plugin/hooks/hooks.json PATH probe changed to printenv-first with $SHELL -lc fallback; reduces shell startup overhead and works on shells that don't support -lc flag.
tests/services/sync/chroma-mcp-manager-windows-spawn.test.ts New Windows spawn regression test verifying uvx is spawned directly without cmd.exe wrapper and version specifiers survive as argv elements.
tests/infrastructure/plugin-distribution.test.ts Updated to assert node-based MCP startup command contains expected JS content (CLAUDE_CONFIG_DIR, plugin root paths, mcp-server.cjs); passes against new .mcp.json.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Claude Code spawns MCP server] --> B["node -e inline-JS launcher"]
    B --> C{Search plugin root candidates}
    C -->|Found| D["spawn mcp-server.cjs with stdio inherit"]
    C -->|Not found| E["stderr error + exit 1"]
    D --> F[Forward SIGTERM SIGINT SIGHUP to child]
    D --> G{Child exits}
    G -->|signal| H["re-raise signal on self"]
    G -->|code| I["process.exit(code)"]

    subgraph ChromaMcpManager spawn path
        J["connectInternal()"] --> K["spawn uvx directly on all platforms"]
        K --> L["argv: --with onnxruntime>=1.20 --with protobuf<7"]
        L --> M["No cmd.exe redirection misparse"]
    end

    subgraph DATA_DIR regression
        N["paths.dataDir() - honours CLAUDE_MEM_DATA_DIR"] -.removed.-> O["Hardcoded ~/.claude-mem"]
        O --> P["PID file + migration markers go to wrong path for custom data dirs"]
    end
Loading

Fix All in Claude Code

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/services/infrastructure/ProcessManager.ts:23-25
**DATA_DIR hardcoding drops custom-data-directory support**

`paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.

### Issue 2 of 3
src/services/integrations/WindsurfHooksInstaller.ts:144-145
**`& ` call operator assumes PowerShell; breaks cmd.exe invocation**

The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.

### Issue 3 of 3
plugin/.mcp.json:8
**MCP launcher exit sentinel 128 on Windows signal re-raise failure**

The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.

Reviews (1): Last reviewed commit: "build: regenerate bundles from merged Wi..." | Re-trigger Greptile

Comment on lines +23 to +25
// Standard paths for PID file management
const DATA_DIR = path.join(homedir(), '.claude-mem');
const PID_FILE = path.join(DATA_DIR, 'worker.pid');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 DATA_DIR hardcoding drops custom-data-directory support

paths.dataDir() (previously used here) reads CLAUDE_MEM_DATA_DIR from the environment first, then falls back to a value from settings.json, then falls back to ~/.claude-mem. The replacement hardcodes only the last fallback. Any user who has set CLAUDE_MEM_DATA_DIR (via env var or settings.json) will now have their PID file, migration marker files, and DB backup paths silently written to ~/.claude-mem instead of the configured directory. The practical effect is that validateWorkerPidFile reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/services/infrastructure/ProcessManager.ts
Line: 23-25

Comment:
**DATA_DIR hardcoding drops custom-data-directory support**

`paths.dataDir()` (previously used here) reads `CLAUDE_MEM_DATA_DIR` from the environment first, then falls back to a value from `settings.json`, then falls back to `~/.claude-mem`. The replacement hardcodes only the last fallback. Any user who has set `CLAUDE_MEM_DATA_DIR` (via env var or `settings.json`) will now have their PID file, migration marker files, and DB backup paths silently written to `~/.claude-mem` instead of the configured directory. The practical effect is that `validateWorkerPidFile` reads from a path that the worker never writes to, so the worker appears perpetually dead and the caller may spawn duplicate instances or refuse to start.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Comment on lines 144 to +145

return `"${bunPath}" "${workerServicePath}" hook windsurf ${hookCommand}`;
const callOperator = process.platform === 'win32' ? '& ' : '';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 & call operator assumes PowerShell; breaks cmd.exe invocation

The & prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via cmd.exe instead of PowerShell on Windows, the leading & will be interpreted as a command separator in cmd.exe and the hook will fail silently. The condition process.platform === 'win32' is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/services/integrations/WindsurfHooksInstaller.ts
Line: 144-145

Comment:
**`& ` call operator assumes PowerShell; breaks cmd.exe invocation**

The `& ` prefix is PowerShell-specific: it is the call operator required to invoke a quoted executable path as a command in PowerShell. If Windsurf (or Gemini CLI in the parallel change) executes hook commands via `cmd.exe` instead of PowerShell on Windows, the leading `& ` will be interpreted as a command separator in `cmd.exe` and the hook will fail silently. The condition `process.platform === 'win32'` is applied at install time without checking which shell the target tool actually uses to invoke hooks. Confirm that Windsurf and Gemini CLI always route hook commands through PowerShell on Windows before treating this as unconditionally safe.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Comment thread plugin/.mcp.json
"-c",
"_C=\"${CLAUDE_CONFIG_DIR:-$HOME/.claude}\"; _E=\"${CLAUDE_PLUGIN_ROOT:-${PLUGIN_ROOT:-}}\"; _P=$({ [ -n \"$_E\" ] && printf '%s\\n' \"$_E\"; printf '%s\\n' \"$PWD/plugin\" \"$PWD\"; ls -dt \"$HOME/.codex/plugins/cache/claude-mem-local/claude-mem\"/[0-9]*/ \"$HOME/.codex/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ \"$_C/plugins/cache/thedotmack/claude-mem\"/[0-9]*/ 2>/dev/null; printf '%s\\n' \"$_C/plugins/marketplaces/thedotmack/plugin\"; } | while IFS= read -r _R; do [ -d \"$_R/plugin/scripts\" ] && _Q=\"$_R/plugin\" || _Q=\"$_R\"; [ -f \"$_Q/scripts/mcp-server.cjs\" ] && { printf '%s\\n' \"$_Q\"; break; }; done); [ -n \"$_P\" ] || { echo \"claude-mem: mcp server not found\" >&2; exit 1; }; exec node \"$_P/scripts/mcp-server.cjs\""
"-e",
"const f=require('fs'),p=require('path'),o=require('os'),c=require('child_process');const h=o.homedir();const C=process.env.CLAUDE_CONFIG_DIR||p.join(h,'.claude');const E=process.env.CLAUDE_PLUGIN_ROOT||process.env.PLUGIN_ROOT||'';const d=process.cwd();const L=x=>{try{return f.readdirSync(x).filter(n=>/^\\d/.test(n)).map(n=>p.join(x,n)).filter(z=>{try{return f.statSync(z).isDirectory()}catch{return false}}).sort((a,b)=>f.statSync(b).mtimeMs-f.statSync(a).mtimeMs)}catch{return[]}};const K=[E,p.join(d,'plugin'),d,...L(p.join(h,'.codex/plugins/cache/claude-mem-local/claude-mem')),...L(p.join(h,'.codex/plugins/cache/thedotmack/claude-mem')),...L(p.join(C,'plugins/cache/thedotmack/claude-mem')),p.join(C,'plugins/marketplaces/thedotmack/plugin')].filter(Boolean);let R=null;for(const k of K){const r=f.existsSync(p.join(k,'plugin','scripts'))?p.join(k,'plugin'):k;if(f.existsSync(p.join(r,'scripts','mcp-server.cjs'))){R=r;break}}if(!R){process.stderr.write('claude-mem: mcp server not found\\n');process.exit(1)}const ch=c.spawn(process.execPath,[p.join(R,'scripts','mcp-server.cjs')],{stdio:'inherit'});for(const s of ['SIGTERM','SIGINT','SIGHUP'])process.on(s,()=>{try{ch.kill(s)}catch{}});ch.on('exit',(code,sig)=>{if(sig){process.removeAllListeners(sig);try{process.kill(process.pid,sig)}catch{process.exit(128)}}else process.exit(code==null?0:code)})"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 MCP launcher exit sentinel 128 on Windows signal re-raise failure

The launcher forwards SIGTERM, SIGINT, and SIGHUP to the child. The exit handler calls process.kill(process.pid, sig) when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls process.exit(128) — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: plugin/.mcp.json
Line: 8

Comment:
**MCP launcher exit sentinel 128 on Windows signal re-raise failure**

The launcher forwards `SIGTERM`, `SIGINT`, and `SIGHUP` to the child. The exit handler calls `process.kill(process.pid, sig)` when the child exits due to a signal. If this throws on Windows (where many POSIX signals are unsupported), the catch branch calls `process.exit(128)` — a non-zero, non-standard exit code. Callers that check the launcher's exit code for success/failure may misinterpret 128 as an error, even though the child itself may have exited cleanly.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

@YOMXXX

YOMXXX commented May 29, 2026

Copy link
Copy Markdown
Contributor

Hi @thedotmack — thanks for bundling these together, the canary approach makes a lot of sense given the overlap.

My PRs in the canary

#2593 (cmd.exe metachar escaping) — agree that direct-spawn (#2502/#2668) is the better path. Happy for #2593 to sit as a no-op fallback; if direct-spawn works across environments, we can close it.

#2594 (node -e MCP launcher) — glad this was picked as the launcher approach. One note on Greptile's Issue 3 below.

#2598 (printenv-first PATH probe) — also just pushed a follow-up fix (e060a674) removing tr ' ' ':' which was corrupting Windows paths containing spaces (e.g. /c/Program Files/nodejs). That fix should probably be pulled into the canary as well.

#2599 (Windows captureProcessStartToken) — straightforward, no concerns.

On the Greptile issues

Issue 1 (DATA_DIR hardcoding in ProcessManager.ts) — this looks like a real regression. paths.dataDir() honours CLAUDE_MEM_DATA_DIR from env/settings, so hardcoding ~/.claude-mem would silently break multi-profile setups. Probably worth restoring paths.dataDir() here.

Issue 2 (PowerShell & call operator) — this came from #2507. The concern is valid: & is PowerShell-specific and would break under cmd.exe. Worth confirming with @sabinss4 which shell Windsurf/Gemini CLI actually use for hook invocation on Windows.

Issue 3 (exit code 128 on Windows signal re-raise) — from my #2594. The process.exit(128) fallback only fires when process.kill(pid, sig) throws (Windows doesn't support most POSIX signals). I can change this to process.exit(0) since the child already exited — the launcher's job is done at that point. Will push a fix.

Happy to help address Issues 1 and 3 if that would be useful — just let me know.

@bigmanBass666

Copy link
Copy Markdown

Windows test report (posted via Claude Code agent on behalf of @bigmanBass666)

Environment:

  • Windows 11 Home China (22631)
  • Shell: Git Bash (MSYS2), PowerShell 5.1
  • claude-mem version: 13.3.0 (from installed_plugins.json)
  • Worker: 12.1.0 (stale, not yet restarted after upgrade)

Issue reproduced:

  • mcp-search MCP server fails to start — MCP error -32000
  • .mcp.json in 13.3.0 uses "command": "sh" with a pipe-based path resolver
  • Reproducing the exact sh -c command in Git Bash produces: printf: write error: Permission denied
  • Root cause: { producer; } | while read ... break; done — when consumer breaks early, remaining printf calls get EACCES on Windows MSYS emulated pipes (silent SIGPIPE on Linux/macOS)

Workaround confirmed working:
Patched 13.3.0/.mcp.json to use direct node invocation (same as 12.1.0):

{
  "mcpServers": {
    "mcp-search": {
      "type": "stdio",
      "command": "node",
      "args": ["${CLAUDE_PLUGIN_ROOT}/scripts/mcp-server.cjs"]
    }
  }
}

MCP server starts successfully after this change. Worker health endpoint at localhost:37777/api/health responds normally.

This confirms PR #2489's approach is correct. The sh launcher is fundamentally broken on Windows due to MSYS pipe emulation behavior. Using node directly bypasses the entire issue.

Note: The worker was still running 12.1.0 at test time (pre-restart). After a full Claude Code restart the SessionStart hook should bring up the 13.3.0 worker. The .mcp.json patch will need to be re-applied after each plugin auto-update until the canary lands.

node-based launcher works
sh -c pipe resolver fails on Windows (EACCES)

@CaTeIM

CaTeIM commented May 29, 2026

Copy link
Copy Markdown

✅ Windows test report — @CaTeIM

Environment: Windows 11 Pro 10.0.28000 · node v24.15.0 · bun 1.3.14 · uvx at ~/.local/bin/uvx
Shells exercised: PowerShell 7, Git Bash, cmd.exe (limited — see notes)

Setup

git checkout windows-canarynpm cinpm run build → all clean, no errors.

End-to-end checks

Area Result Detail
mcp-search launcher (PR #2594, supersedes my #2489) node -e bootstrap spawns mcp-server.cjs via child_process.spawn(process.execPath, …, {stdio:'inherit'}) with SIGTERM/INT/HUP forwarding. Real initialize handshake from PowerShell with UseShellExecute=false (same path Claude Code uses) returns proper serverInfo for claude-mem 13.3.0. Confirms the "Connection closed" failure from my #2489 is still resolved by #2594's approach.
Plugin runtime deps (my #2658 Fix A) plugin/package.json still declares ajv ^8.20.0 + ajv-formats ^3.0.1. Build guardrail preserved.
Auto-install on upgrade (my #2658 Fix B) ensureRuntimeDeps() in version-check.js tested across all three paths: broken (spawns detached bun install, marker written in the version root per the Greptile review fix, log opened), cooldown (second call within 10 min no-ops), healthy (node_modules present → no-op).
Chroma uvx direct-spawn (PR #2502/#2668) tests/services/sync/chroma-mcp-manager-windows-spawn.test.ts "spawns uvx directly on Windows (no cmd.exe /c wrapper)" passes (2/2 on canary; fails on main).

Test suite (tests/infrastructure + tests/services/sync)

Canary on Windows: 163 pass / 15 fail / 8 skip (12 unique failing test names after dedup) vs main on Windows: 165 pass / 13 fail / 8 skip (12 unique).

Name-only diff vs main (timings stripped):

Live evidence for #2654 / #2526 (EPIPE suppression)

Independent of the canary, my running Claude Code session — using the currently-installed (not canary) plugin — is generating the following on every PostToolUse observation hook right now:

/usr/bin/bash: line 1: printf: write error: Permission denied

This is exactly the EPIPE/EACCES leakage that #2654 suppresses with 2>/dev/null and that #2526 also targets. Strong real-world signal that these belong in the canary as-is.

Live deploy

After the isolated checks above, I pointed my Claude Code at the canary by copying the built plugin/ contents into the active plugin cache at ~/.claude/plugins/cache/thedotmack/claude-mem/13.3.0/ (the dir Claude Code actually reads — not the marketplace clone; that was the bit that tripped me up in #2489's earlier sessions). After the worker restart:

  • claude mcp list shows plugin:claude-mem:mcp-search as node -e <#2594 bootstrap with c.spawn + signal forwarding>✓ Connected.
  • Worker healthy on http://127.0.0.1:37777.
  • MCP initialize handshake against the cache's mcp-server.cjs returns proper serverInfo.

To pick this up in my live GUI session I'll do a full Claude Code restart.

Notes

  • cmd.exe shell test was inconclusive — my repro had quoting issues with paths containing spaces ("Google Drive"). PowerShell's UseShellExecute=false path is what Claude Code actually uses, so this is more representative than cmd.exe anyway.
  • I'm happy to re-run any specific shell scenario or attach raw logs — ping me.

🤖 Generated with Claude Code

@CaTeIM

CaTeIM commented May 29, 2026

Copy link
Copy Markdown

Follow-up after live restart — confirming #2654 in production: after switching my Claude Code session to the canary plugin (cache 13.3.0 deployed + restart), my PostToolUse observation hook no longer emits printf: write error: Permission denied. Same machine, same Git Bash, same hook flow — the only thing that changed is the bundled hook now suppresses the broken pipe with 2>/dev/null. Before the restart, every Bash tool call in this very session was producing the EPIPE; after the restart, zero occurrences across many subsequent calls.

cc @Bruce583102198 — your fix is doing exactly what it claims.

@alessandropcostabr

Copy link
Copy Markdown
Contributor

Windows 11 Home SL 26200 — full triple-shell report (@alessandropcostabr)

Environment: Windows 11 Home SL 10.0.26200.8457 · node v24.11.1 · bun 1.3.14 · git 2.53 · Claude Code 2.1.145

Deploy

Plugin marketplaces/thedotmack/plugin/ replaced byte-for-byte by canary bcf437fb's pre-built plugin/ (md5 C7C58737..., 4,031,405 bytes). bun install ran clean (706 packages); npm run build fails on @better-auth/kysely-adapterkysely drift — irrelevant since the canary commit already ships regenerated bundles. Backup kept at plugin.bak-pre-canary-... (md5 744A9AB5..., the previous 23/05 build).

Discriminator counts in the deployed bundle

Marker Count Source
/api/context/inject (canary client path) 7
/v1/context/inject (legacy) 0 (intentionally removed)
isTTY stdin guard 1 #2188
SessionStart lifecycle exempt 1 #2636
node -e MCP launcher (.mcp.json) #2594
BOM strip in SettingsDefaultsManager present #2636
@better-auth/api-key (x-api-key default) 1 (see note below)

Per-shell results

Shell claude mcp list Live tool calls printf write error MCP error -32000 Console flash
PowerShell 7 claude-mem ✓ Connected (+ 6 others) observations STORED (obsId 34634) 0 0 none
Git Bash (MSYS) claude-mem ✓ Connected session registered, tool calls captured 0 (confirms #2654) 0 none
cmd.exe (interactive) claude-mem ✓ Connected (all 7 servers) claude --version2.1.145 0 0 none

Live capture E2E

Worker uptime 30+ min post-restart, 0 ERROR / 0 FATAL / 0 server-beta-fallback in the 74-line log. Pipeline notebook → CF → Mach10 → Express → BullMQ → Redis → Qdrant flowing: 68 events / 10 min acknowledged server-side, with observations generated downstream.

Worker invocation:

"C:\Users\aless\.bun\bin\bun.exe" C:\Users\aless\.claude\plugins\marketplaces\thedotmack\plugin\scripts\worker-service.cjs --daemon

Bonus — Codex CLI MCP integration confirmed

Wired [mcp_servers.claude-mem] into ~/.codex/config.toml pointing at the same mcp-server.cjs (via node direct, mirroring the .mcp.json durable-fix pattern). Inside codex REPL, /mcp lists claude-mem alongside native codex_apps with all 21 tools available (search, smart_search, memory_add, observation_search, timeline, etc.). Auth: Unsupported is expected for stdio-local MCP servers — not a defect. Codex capture hooks (Peça 2) still pending — .codex-plugin/ in the bundle only ships plugin.json without handlers, so Codex sessions can read the claude-mem corpus but won't generate observations of their own yet.

Non-blocking note for maintainers running server-beta behind a proxy

The deployed client bundle uses X-Api-Key on POST /v1/events (the @better-auth/api-key default). The server-beta middleware in src/server/middleware/{auth,postgres-auth}.ts only reads Authorization: Bearer. Anyone running server-beta behind a header-normalizing proxy (Cloudflare → Express in my case) gets 401 {"error":"Unauthorized","message":"Missing bearer API key"} even with a valid key. Direct-LAN setups don't hit it because no proxy is in the path.

I opened #2719 to make the server middleware accept X-Api-Key as fallback to Bearer — fully backward-compatible (Bearer remains canonical and wins when both are sent), 3 new unit tests passing alongside the existing 12 in tests/server/auth-api-key.test.ts, verified end-to-end against my deployment.

Net result

Canary bcf437fb is green across all three Windows shells for me. Zero changes outside the deployed bundle, backup retained for instant rollback if needed.

cc @thedotmack @CaTeIM @YOMXXX @Bruce583102198

🤖 Generated with Claude Code

thedotmack pushed a commit that referenced this pull request Jun 6, 2026
…leware (#2719)

* fix(server-beta): accept X-Api-Key as fallback to Bearer in auth middleware

The server-beta auth middleware (`requireServerAuth` and `requirePostgresServerAuth`)
only reads `Authorization: Bearer <key>`. Clients using `@better-auth/api-key`
default config send raw keys via the `X-Api-Key` header — including the worker
bundle shipped from the Windows-canary line (PR #2699). The mismatch produces:

    POST /v1/events  X-Api-Key: cmem_...  → 401 {"error":"Unauthorized","message":"Missing bearer API key"}

Direct-LAN setups don't hit this because no proxy strips/normalizes headers and
the worker's own retry path eventually round-trips; remote setups behind a proxy
(Cloudflare → Express in our case) fail closed and never capture.

Fix: middleware accepts either header. Bearer remains canonical so existing
clients are unaffected; `X-Api-Key` is the fallback when Bearer is absent.
Same change applied to the bun:sqlite-backed `auth.ts` and the Postgres-backed
`postgres-auth.ts` (the latter is what server-beta actually uses).

Tests added in `tests/server/auth-api-key.test.ts`:
  - accepts X-Api-Key when Bearer absent (positive)
  - prefers Bearer when both present (defense-in-depth)
  - rejects requests with neither header (negative regression)

No new test for postgres-auth.ts (no existing suite that wires the PG pool);
the parser logic is byte-identical to auth.ts so the bun:sqlite suite covers
the behavioral contract.

Verified end-to-end against a deployed server-beta:
  curl -X POST https://mem.late.app.br/v1/events \
       -H 'X-Api-Key: cmem_...' -d '{"events":[]}'
  # before:  401 "Missing bearer API key"
  # after:   400 ValidationError (auth ok, body invalid — expected)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server-beta): update 401 message to mention both accepted auth headers

Addresses Greptile P2 feedback on #2719: the 401 body still said "Missing
bearer API key" after the middleware started accepting X-Api-Key, which made
debugging harder for clients sending only X-Api-Key with a typo'd header name.

Message now lists both accepted forms in both auth.ts and postgres-auth.ts,
keeping the two backends consistent. Test that asserts the body in the
both-absent rejection path updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tomeyerman

Copy link
Copy Markdown

Independent Windows validation: #2806 fixed the PATH gap, but the cmd.exe quoting defect this PR targets still breaks semantic search on a clean v13.4.0 / latest main.

Flagging this so closing #2790 isn't read as "Windows Chroma fixed" — there are two distinct root causes here:

Minimal repro on Windows 11, claude-mem 13.4.0, uvx 0.9.11, Python 3.13.9, with uvx confirmed on PATH (i.e. this is post-#2806 behavior — the PATH precondition is already satisfied, which isolates the failure to the >/< quoting, not PATH):

(A) Direct spawn — exactly what this PR does — WORKS:

uvx --python 3.13 --with "onnxruntime>=1.20" --with "protobuf<7" chroma-mcp==0.2.6 --client-type persistent --data-dir <dir>
  → {"result":{ ... "serverInfo":{"name":"chroma","version":"1.6.0"}}}    (boots, waits for stdio)

(B) cmd.exe /c uvx — what main's connectInternal still does — FAILS in ~270ms:

cmd /c uvx --python 3.13 --with onnxruntime>=1.20 --with protobuf<7 chroma-mcp==0.2.6 --client-type persistent --data-dir <dir>
  → "The system cannot find the file specified"    (cmd parses <7 as stdin redirection from a file named "7")

Same args, same machine — the only difference is the cmd.exe wrapper, and it dies in ~270 ms on the redirection.

So this PR (drop the cmd.exe wrapper, spawn uvx directly) is the actual fix for the quoting defect, and #2806 is the complementary PATH half. With #2806 now merged, the PATH confound is gone, so this is the remaining blocker for Windows semantic search. The corresponding issues (#2716, #2667) are currently closed as not planned despite the defect still reproducing on latest main.

Happy to check out this branch, build it, and confirm Chroma connects end-to-end on Windows 11 if that helps land it.

@rodboev

rodboev commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

One more Windows rollup note from the newer follow-up wave: there are a few additional open Windows-focused PRs now that may be worth tracking alongside this canary if you do another bundle or maintainer pass:

For #2945 specifically: the Linux/macOS-facing jobs are green on the current head, and the only remaining red check is the same Windows npm-install/node-gyp Visual Studio toolchain failure already showing up on other open Windows PRs rather than a new branch-specific regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix Windows compatibility: replace sh spawn + silence printf EPIPE