Skip to content

mcp: suppress subprocess exit error during graceful close#913

Open
blackwell-systems wants to merge 1 commit intomodelcontextprotocol:mainfrom
blackwell-systems:fix/client-session-close-race
Open

mcp: suppress subprocess exit error during graceful close#913
blackwell-systems wants to merge 1 commit intomodelcontextprotocol:mainfrom
blackwell-systems:fix/client-session-close-race

Conversation

@blackwell-systems
Copy link
Copy Markdown

Summary

Fixes #855. When ClientSession.Close() initiates graceful shutdown by closing stdin, the subprocess may exit with a non-zero status. This exit error propagated through pipeRWC.Close() as closeErr, causing spurious errors from ClientSession.Close().

This matches the analysis by @adonovan in golang/go#77336:

"a bug in mcp causing a race between the active part of Close (which breaks the connection and terminates the child process) and the waiting part of close (which seems to observe an error from the terminated child without realizing that close itself was responsible)"

Root cause

The error path traced by @adonovan:

  1. ClientSession.Close() calls conn.Close(), which sets connClosing = true
  2. updateInFlight sees idle + closing, calls s.closer.Close() (pipeRWC.Close)
  3. pipeRWC.Close() closes stdin, then waits for cmd.Wait()
  4. The subprocess (e.g. gopls) sees EOF on stdin, treats it as fatal, exits with code 2
  5. cmd.Wait() returns *exec.ExitError, stored as closeErr
  6. conn.wait(false) returns closeErr to the caller

The subprocess exited because the client told it to (by closing stdin). The exit code is not meaningful for the client.

Fix

In pipeRWC.Close(), after closing stdin and waiting for the subprocess to exit, suppress *exec.ExitError. Errors from the SIGTERM/SIGKILL fallback paths are still propagated, as those indicate the subprocess did not respond to graceful shutdown.

The fix is at the transport layer (not the jsonrpc2 layer) because only the stdio transport has the semantic knowledge that "I closed stdin, so the subprocess exiting is expected."

Test plan

  • Added TestCmdTransportNonZeroExit: connects to a server that exits with code 2 on shutdown, verifies session.Close() returns nil
  • Confirmed the test fails without the fix (exit status 2) and passes with it
  • Full test suite passes: GOWORK=off go test ./... -count=1

When ClientSession.Close() initiates graceful shutdown via stdin close,
the subprocess may exit with a non-zero status (e.g., gopls exits with
code 2 because it treats EOF on stdin as fatal). This exit error
propagated through pipeRWC.Close() -> closeErr -> ClientSession.Close(),
causing spurious "exit status 2" errors for callers.

The fix: in pipeRWC.Close(), after closing stdin and waiting for the
subprocess to exit, suppress *exec.ExitError. The subprocess exited
because we told it to (by closing stdin); the exit code is not
meaningful for the client. Errors from SIGTERM/SIGKILL paths are still
propagated, as those indicate the subprocess did not respond to graceful
shutdown.

This matches the analysis by @adonovan in golang/go#77336: "a bug in
mcp causing a race between the active part of Close (which breaks the
connection and terminates the child process) and the waiting part of
close (which seems to observe an error from the terminated child without
realizing that close itself was responsible)."

Fixes modelcontextprotocol#855
@adonovan
Copy link
Copy Markdown

cc: @findleyr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Potential race condition in (*mcp.ClientSession).Close()

2 participants