Skip to content

Commit ebd5136

Browse files
fix(diagnostics): classify stdio exit-before-initialize; surface stderr + exit code (MCP-1093, #599) (#606)
* fix(diagnostics): classify stdio subprocess exit-before-initialize; surface stderr + exit code When an stdio upstream subprocess starts but exits before completing the MCP initialize handshake (e.g. a docker image that prints a fatal config error and dies, like mcp/brave-search with no BRAVE_API_KEY), mcp-go reports a closed transport. Previously this fell through to MCPX_UNKNOWN_UNCLASSIFIED with cause "transport error: transport closed", and the actionable child stderr was discarded — so the UI told users to "file a bug" for a self-serviceable config error. Recurring gap (cf. #483). - New code MCPX_STDIO_EXIT_BEFORE_INITIALIZE + catalog entry (registry.go). - classifyStdio: match "transport closed" / "exited before completing the mcp initialize" under the stdio transport hint → the new code (gated so a closed transport on another transport is not misattributed). - initialize(): on a closed-transport failure, enrich the error with the captured recent stderr tail and (best-effort) the child exit code so the real cause is visible in the UI banner and per-server logs. Exit code is best-effort: on stdio the process handle is only extracted after a successful initialize, so the stderr tail is the primary signal on the failure path. - Regression test (classifier_test.go): raw + enriched inputs classify to the new code and carry exit code + stderr tail; a non-stdio "transport closed" does not match. The UI banner surfaces whatever the classifier returns, so it improves automatically — no frontend change needed. Related MCP-1093, #599 Co-Authored-By: Paperclip <noreply@paperclip.ing> * fix(diagnostics): add error doc + production enrichment test for stdio early-exit Addresses Codex review on PR #606: - Add docs/errors/MCPX_STDIO_EXIT_BEFORE_INITIALIZE.md and the STDIO index entry in docs/errors/README.md (ENG-9 docs for the new public code). - Extract enrichTransportClosedError() from the initialize() premature-exit path and unit-test it (exit code + stderr tail + wrapped cause), replacing the hard-coded-string assertion that proved nothing about the production path. Related #599 * fix(diagnostics): scope stdio early-exit enrichment to stderr (drop unreliable exit code) Addresses Codex review round 2 on PR #606: childExitInfo() only had a code when ProcessState was set, but the stdio init-failure path returns before the process is reaped, so the exit code was essentially never present. Drop the exit-code from the enrichment, docs, and test; surface the captured stderr tail (the actionable cause). Exit-code capture is a separate follow-up. Related #599 * test(diagnostics): drop stale exit-code wording in stdio early-exit test Codex review round 3 on PR #606: the enrichment no longer surfaces an exit code (stderr-only scope), so update the test comment + case wording so readers don't infer exit-code surfacing is part of acceptance. Related #599 * fix(diagnostics): gate stdio premature-exit enrichment to stdio transport Codex review round on PR #606: initialize() is shared by HTTP/SSE transports, so a remote server closing the connection was mislabeled 'server process exited before completing the MCP initialize handshake'. Gate the enrichment to the stdio transport via shouldEnrichStdioPrematureExit (HTTP/SSE keep generic diagnostics) + table test across transports. Related #599 * fix(diagnostics): classify auto-detected stdio servers (resolved transport hint) Codex review round on PR #606: the supervisor built the classifier transport hint from raw Config.Protocol, so auto-detected stdio servers (Protocol=='' + Command!='') missed the stdio-gated rules and stayed UNKNOWN. Use transport.DetermineTransportType(config) at both hint sites so the resolved transport drives classification + add a DetermineTransportType regression test. Related #599 --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
1 parent 8b7b156 commit ebd5136

10 files changed

Lines changed: 311 additions & 7 deletions

File tree

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
---
2+
id: MCPX_STDIO_EXIT_BEFORE_INITIALIZE
3+
title: MCPX_STDIO_EXIT_BEFORE_INITIALIZE
4+
sidebar_label: EXIT_BEFORE_INITIALIZE
5+
description: The stdio MCP server process exited before completing the MCP initialize handshake.
6+
---
7+
8+
# `MCPX_STDIO_EXIT_BEFORE_INITIALIZE`
9+
10+
**Severity:** error
11+
**Domain:** STDIO
12+
13+
## What happened
14+
15+
mcpproxy spawned the stdio server, but the process exited before completing the
16+
MCP `initialize` handshake. The underlying transport reports a closed pipe / EOF
17+
rather than a typed exit error, which previously surfaced as a generic
18+
`MCPX_UNKNOWN_UNCLASSIFIED`. This code makes the cause explicit, and mcpproxy now
19+
folds the last lines of the child's **stderr** into the error so you can see the
20+
real, usually self-serviceable problem.
21+
22+
This almost always means a **missing or invalid configuration** — a required API
23+
key or environment variable, a bad argument, or a missing dependency — that makes
24+
the server print an error and exit immediately on startup.
25+
26+
## Common causes
27+
28+
- A required environment variable / API key is missing (e.g. a server that needs
29+
`BRAVE_API_KEY` and exits with `Error: --brave-api-key is required`).
30+
- A required CLI flag/argument was not passed via `args`.
31+
- A Docker image is missing a mandatory `env` value (cf. `MCPX_STDIO_EXIT_NONZERO`).
32+
- The upstream package is not installed in the resolved runtime (`uvx`/`pipx`/`npx`).
33+
34+
## How to fix
35+
36+
### 1. Read the captured stderr
37+
38+
The error banner and the per-server log already include the last stderr lines.
39+
To see more:
40+
41+
```bash
42+
mcpproxy upstream logs <server-name> --tail 100
43+
```
44+
45+
The last lines almost always contain the real cause (a missing-key message, a
46+
traceback, a "module not found", etc.).
47+
48+
### 2. Supply the missing configuration
49+
50+
Add the required environment variable / argument to the server entry, e.g.:
51+
52+
```json
53+
{
54+
"name": "brave-search",
55+
"command": "npx",
56+
"args": ["-y", "@modelcontextprotocol/server-brave-search"],
57+
"env": { "BRAVE_API_KEY": "your-key-here" }
58+
}
59+
```
60+
61+
### 3. Reproduce manually
62+
63+
Run the exact `command`, `args`, and `env` in a normal shell; the server should
64+
start and wait for MCP input instead of exiting. Once it starts cleanly, restart
65+
the server in mcpproxy.
66+
67+
## Related codes
68+
69+
- [`MCPX_STDIO_EXIT_NONZERO`](MCPX_STDIO_EXIT_NONZERO.md) — exited non-zero before handshake
70+
- [`MCPX_STDIO_HANDSHAKE_TIMEOUT`](MCPX_STDIO_HANDSHAKE_TIMEOUT.md) — no `initialize` reply in time

docs/errors/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,7 @@ Run `mcpproxy doctor list-codes` for the machine-readable list.
4040
- [`MCPX_STDIO_SPAWN_ENOENT`](MCPX_STDIO_SPAWN_ENOENT.md) — command not found on PATH
4141
- [`MCPX_STDIO_SPAWN_EACCES`](MCPX_STDIO_SPAWN_EACCES.md) — permission denied executing command
4242
- [`MCPX_STDIO_EXIT_NONZERO`](MCPX_STDIO_EXIT_NONZERO.md) — server exited before handshake
43+
- [`MCPX_STDIO_EXIT_BEFORE_INITIALIZE`](MCPX_STDIO_EXIT_BEFORE_INITIALIZE.md) — process exited before `initialize` (captured stderr surfaced)
4344
- [`MCPX_STDIO_HANDSHAKE_TIMEOUT`](MCPX_STDIO_HANDSHAKE_TIMEOUT.md) — no `initialize` reply within 30s
4445
- [`MCPX_STDIO_HANDSHAKE_INVALID`](MCPX_STDIO_HANDSHAKE_INVALID.md) — malformed MCP frame
4546

internal/diagnostics/classifier.go

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -186,6 +186,17 @@ func classifyStdio(err error, hints ClassifierHints) Code {
186186
return STDIOSpawnENOENT
187187
case strings.Contains(lmsg, "permission denied"):
188188
return STDIOSpawnEACCES
189+
// Subprocess started but the transport closed before the MCP initialize
190+
// handshake completed — the child exited early (e.g. printed a fatal
191+
// config error to stderr and died). mcp-go surfaces this as a closed
192+
// transport, which otherwise falls through to MCPX_UNKNOWN_UNCLASSIFIED
193+
// even though the real cause is on the child's stderr (MCP-1093 / #599).
194+
// Gated on the stdio hint so a "transport closed" from another transport
195+
// is not misattributed.
196+
case strings.Contains(lmsg, "transport closed"),
197+
strings.Contains(lmsg, "exited before completing the mcp initialize"),
198+
strings.Contains(lmsg, "exited before the mcp initialize"):
199+
return STDIOExitBeforeInitialize
189200
case strings.Contains(lmsg, "did not respond to mcp initialize"),
190201
strings.Contains(lmsg, "handshake timeout"):
191202
return STDIOHandshakeTimeout

internal/diagnostics/classifier_test.go

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,52 @@ func TestClassify_Nil(t *testing.T) {
1616
}
1717
}
1818

19+
// TestClassify_STDIO_ExitBeforeInitialize covers GitHub #599 / MCP-1093: a
20+
// subprocess that exits before completing the MCP initialize handshake must
21+
// classify to MCPX_STDIO_EXIT_BEFORE_INITIALIZE (not MCPX_UNKNOWN_UNCLASSIFIED).
22+
// The surfaced error carries the captured stderr tail (the exit code is not
23+
// reliably available on this path and is intentionally not surfaced).
24+
func TestClassify_STDIO_ExitBeforeInitialize(t *testing.T) {
25+
cases := []struct {
26+
name string
27+
err error
28+
}{
29+
{
30+
name: "raw transport-closed under stdio hint",
31+
err: errors.New(`stdio transport (command="docker", docker_isolation=true): transport error: transport closed`),
32+
},
33+
{
34+
name: "enriched message carrying stderr tail",
35+
err: errors.New("server process exited before completing the MCP initialize handshake; recent stderr:\n" +
36+
" | Error: --brave-api-key is required: transport closed"),
37+
},
38+
}
39+
for _, tc := range cases {
40+
t.Run(tc.name, func(t *testing.T) {
41+
got := Classify(tc.err, ClassifierHints{Transport: "stdio"})
42+
if got != STDIOExitBeforeInitialize {
43+
t.Fatalf("Classify(%q) = %q, want %q", tc.err, got, STDIOExitBeforeInitialize)
44+
}
45+
})
46+
}
47+
48+
// The production enrichment that folds the child exit code + stderr tail into
49+
// the initialize-failure error is covered by TestEnrichTransportClosedError_*
50+
// in internal/upstream/core — a real test of the helper the production path
51+
// calls, instead of asserting against a hard-coded string here.
52+
// (Codex review on PR #606)
53+
}
54+
55+
// TestClassify_STDIO_ExitBeforeInitialize_NotForHTTP guards against over-match:
56+
// the same "transport closed" wording on a non-stdio transport must not be
57+
// captured by the stdio rule.
58+
func TestClassify_STDIO_ExitBeforeInitialize_NotForHTTP(t *testing.T) {
59+
err := errors.New("transport error: transport closed")
60+
if got := Classify(err, ClassifierHints{Transport: "http"}); got == STDIOExitBeforeInitialize {
61+
t.Fatalf("HTTP transport must not classify as %q", STDIOExitBeforeInitialize)
62+
}
63+
}
64+
1965
func TestClassify_STDIO_SpawnENOENT(t *testing.T) {
2066
// os/exec wraps ENOENT into *exec.Error when the binary is missing.
2167
wrapped := &exec.Error{

internal/diagnostics/codes.go

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,11 +8,12 @@ package diagnostics
88

99
// STDIO domain — stdio-transport MCP server failures.
1010
const (
11-
STDIOSpawnENOENT Code = "MCPX_STDIO_SPAWN_ENOENT"
12-
STDIOSpawnEACCES Code = "MCPX_STDIO_SPAWN_EACCES"
13-
STDIOExitNonzero Code = "MCPX_STDIO_EXIT_NONZERO"
14-
STDIOHandshakeTimeout Code = "MCPX_STDIO_HANDSHAKE_TIMEOUT"
15-
STDIOHandshakeInvalid Code = "MCPX_STDIO_HANDSHAKE_INVALID"
11+
STDIOSpawnENOENT Code = "MCPX_STDIO_SPAWN_ENOENT"
12+
STDIOSpawnEACCES Code = "MCPX_STDIO_SPAWN_EACCES"
13+
STDIOExitNonzero Code = "MCPX_STDIO_EXIT_NONZERO"
14+
STDIOExitBeforeInitialize Code = "MCPX_STDIO_EXIT_BEFORE_INITIALIZE"
15+
STDIOHandshakeTimeout Code = "MCPX_STDIO_HANDSHAKE_TIMEOUT"
16+
STDIOHandshakeInvalid Code = "MCPX_STDIO_HANDSHAKE_INVALID"
1617
)
1718

1819
// OAUTH domain — OAuth 2.1 / PKCE flow failures.

internal/diagnostics/registry.go

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,16 @@ func seedSTDIO() {
6161
},
6262
DocsURL: docsURL(STDIOExitNonzero),
6363
})
64+
register(CatalogEntry{
65+
Code: STDIOExitBeforeInitialize,
66+
Severity: SeverityError,
67+
UserMessage: "The stdio server process exited before completing the MCP initialize handshake. This usually means a missing/invalid configuration (e.g. a required API key or environment variable) — check the captured stderr for the exact cause.",
68+
FixSteps: []FixStep{
69+
{Type: FixStepButton, Label: "Show last server log lines", FixerKey: "stdio_show_last_logs"},
70+
{Type: FixStepLink, Label: "Troubleshooting early exit", URL: docsURL(STDIOExitBeforeInitialize)},
71+
},
72+
DocsURL: docsURL(STDIOExitBeforeInitialize),
73+
})
6474
register(CatalogEntry{
6575
Code: STDIOHandshakeTimeout,
6676
Severity: SeverityError,

internal/runtime/supervisor/supervisor.go

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ import (
1515
"github.com/smart-mcp-proxy/mcpproxy-go/internal/diagnostics"
1616
"github.com/smart-mcp-proxy/mcpproxy-go/internal/runtime/configsvc"
1717
"github.com/smart-mcp-proxy/mcpproxy-go/internal/runtime/stateview"
18+
transportpkg "github.com/smart-mcp-proxy/mcpproxy-go/internal/transport"
1819
"github.com/smart-mcp-proxy/mcpproxy-go/internal/upstream/types"
1920
)
2021

@@ -696,9 +697,12 @@ func (s *Supervisor) updateStateView(name string, state *ServerState) {
696697
status.LastError = errorStr
697698

698699
// Spec 044: classify raw error into stable diagnostic code.
700+
// Use the RESOLVED transport, not the raw Config.Protocol: an
701+
// auto-detected stdio server has Protocol=="" + Command!="" and
702+
// would otherwise miss the stdio-gated classifier rules (#599).
699703
transport := ""
700704
if state.Config != nil {
701-
transport = string(state.Config.Protocol)
705+
transport = transportpkg.DetermineTransportType(state.Config)
702706
}
703707
classifyAndAttach(status, state.ConnectionInfo.LastError, transport)
704708
// Spec 044 Phase H: notify telemetry counter store.
@@ -969,9 +973,12 @@ func (s *Supervisor) updateSnapshotFromEvent(event Event) {
969973
status.LastError = errorStr
970974

971975
// Spec 044: classify raw error into stable diagnostic code.
976+
// Resolved transport (not raw Config.Protocol) so auto-detected
977+
// stdio servers (Protocol=="" + Command!="") hit the stdio
978+
// classifier rules (#599).
972979
transport := ""
973980
if status.Config != nil {
974-
transport = string(status.Config.Protocol)
981+
transport = transportpkg.DetermineTransportType(status.Config)
975982
}
976983
classifyAndAttach(status, connInfo.LastError, transport)
977984
// Spec 044 Phase H: notify telemetry counter store.
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
package transport
2+
3+
import (
4+
"testing"
5+
6+
"github.com/smart-mcp-proxy/mcpproxy-go/internal/config"
7+
)
8+
9+
// Locks the behavior the supervisor classifier hint relies on (#599 / PR #606):
10+
// an auto-detected stdio server (empty/auto protocol + a command) must resolve to
11+
// stdio, so the stdio-gated diagnostic rules fire for it — using the raw
12+
// Config.Protocol would leave such servers unclassified.
13+
func TestDetermineTransportType(t *testing.T) {
14+
cases := []struct {
15+
name string
16+
cfg config.ServerConfig
17+
want string
18+
}{
19+
{"explicit stdio", config.ServerConfig{Protocol: "stdio"}, TransportStdio},
20+
{"auto-detected stdio (empty protocol + command)", config.ServerConfig{Command: "npx"}, TransportStdio},
21+
{"auto protocol + command", config.ServerConfig{Protocol: "auto", Command: "docker"}, TransportStdio},
22+
{"empty protocol + url", config.ServerConfig{URL: "https://example.com/mcp"}, TransportStreamableHTTP},
23+
{"explicit http", config.ServerConfig{Protocol: "http", URL: "https://example.com/mcp"}, "http"},
24+
}
25+
for _, tc := range cases {
26+
t.Run(tc.name, func(t *testing.T) {
27+
cfg := tc.cfg
28+
if got := DetermineTransportType(&cfg); got != tc.want {
29+
t.Errorf("DetermineTransportType(%+v) = %q, want %q", tc.cfg, got, tc.want)
30+
}
31+
})
32+
}
33+
}

internal/upstream/core/connection_lifecycle.go

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,8 @@ import (
55
"encoding/json"
66
"errors"
77
"fmt"
8+
"io"
9+
"strings"
810
"time"
911

1012
"github.com/mark3labs/mcp-go/mcp"
@@ -59,6 +61,18 @@ func (c *Client) initialize(ctx context.Context) error {
5961
return fmt.Errorf("server did not respond to MCP initialize within %s and produced no stderr output (check that the command starts an MCP server and not a help banner)", waited)
6062
}
6163

64+
// STDIO subprocess exited before completing the handshake: mcp-go reports
65+
// a closed transport / EOF on the pipe (not a typed exit error). Surface
66+
// the captured stderr so the user sees the real, often self-serviceable
67+
// cause (e.g. "Error: --brave-api-key is required") instead of a bare
68+
// "transport closed" that the diagnostics layer marks UNKNOWN. Gated to
69+
// stdio: initialize() is shared by HTTP/SSE transports, which have no
70+
// local subprocess and must keep their generic diagnostics. (MCP-1093 /
71+
// #599; stdio gate per Codex review on PR #606)
72+
if shouldEnrichStdioPrematureExit(c.transportType, err) {
73+
return enrichTransportClosedError(c.formatRecentStderr(), err)
74+
}
75+
6276
return err
6377
}
6478

@@ -85,6 +99,50 @@ func (c *Client) initialize(ctx context.Context) error {
8599
return nil
86100
}
87101

102+
// enrichTransportClosedError builds the actionable error for a stdio subprocess
103+
// that exited before completing the MCP initialize handshake, folding in the
104+
// captured stderr tail so the UI banner and per-server logs show the real,
105+
// often self-serviceable cause (e.g. a missing API key) instead of a bare
106+
// "transport closed". It wraps the original cause with %w so callers can still
107+
// errors.Is/As it. Pure (no receiver state) so the production enrichment path is
108+
// unit-testable. (MCP-1093 / #599)
109+
//
110+
// Note: the child exit code is intentionally NOT surfaced here. On this failure
111+
// path mcp-go has not reaped the process (no Wait), so ProcessState is unset and
112+
// any exit code would be unreliable; the captured stderr is the actionable
113+
// signal. Surfacing the exit code is a separate follow-up.
114+
// shouldEnrichStdioPrematureExit reports whether an initialize() failure should
115+
// be enriched as a stdio subprocess premature exit. The enrichment is stdio-
116+
// specific (it describes a local "server process" and its stderr); HTTP/SSE
117+
// transports share initialize() but have no subprocess, so a closed-transport
118+
// error there must keep its generic diagnostics. (Codex review on PR #606)
119+
func shouldEnrichStdioPrematureExit(transportType string, err error) bool {
120+
return transportType == transportStdio && isTransportClosedErr(err)
121+
}
122+
123+
func enrichTransportClosedError(stderrBlock string, cause error) error {
124+
if stderrBlock != "" {
125+
return fmt.Errorf("server process exited before completing the MCP initialize handshake; recent stderr:\n%s: %w", stderrBlock, cause)
126+
}
127+
return fmt.Errorf("server process exited before completing the MCP initialize handshake and produced no stderr output (transport closed before the handshake): %w", cause)
128+
}
129+
130+
// isTransportClosedErr reports whether an initialize() failure indicates the
131+
// child process went away mid-handshake. mcp-go surfaces a premature stdio
132+
// exit as a closed transport / EOF on the pipe rather than a typed exit error,
133+
// so we match those shapes to distinguish "the process died" from a genuine
134+
// malformed-handshake response. (MCP-1093 / #599)
135+
func isTransportClosedErr(err error) bool {
136+
if errors.Is(err, io.EOF) {
137+
return true
138+
}
139+
lmsg := strings.ToLower(err.Error())
140+
return strings.Contains(lmsg, "transport closed") ||
141+
strings.Contains(lmsg, "broken pipe") ||
142+
strings.Contains(lmsg, "file already closed") ||
143+
strings.Contains(lmsg, "use of closed")
144+
}
145+
88146
// registerNotificationHandler registers a handler for MCP notifications.
89147
// This should be called after client.Start() and initialize() succeed.
90148
// It handles notifications/tools/list_changed to trigger reactive tool discovery.

0 commit comments

Comments
 (0)