
HTTP transport: orphaned python server after domain reload (port collision on resume) #1164

@malard


Summary

When using the HTTP transport, `HttpBridgeReloadHandler.OnBeforeAssemblyReload` calls `transport.ForceStop(TransportMode.Http)` to tear down the python server before a Unity domain reload. In some cases this silently fails to kill the python process. After the reload, `ResumeHttpWithRetriesAsync` retries `StartAsync` up to 6 times (0s, 1s, 3s, 5s, 10s, 30s; ~49s total), and every attempt collides with the still-alive python server:

Cannot start the local HTTP server because port <N> is already in use by PID(s): <X>
MCP For Unity will not terminate unrelated processes. Stop the owning process manually or change the HTTP URL.

Eventually the connection section gives up:

MCP-FOR-UNITY: Server no longer running; ending orphaned session.

Meanwhile the python process is healthy. Its log shows the WebSocket closing with code 1005 (no status received, i.e. the Unity end of the socket disappeared without a proper close handshake) at the moment of domain reload. It then sits idle waiting for a reconnect that never comes, because the Editor plugin is trying to spawn a fresh server instead of reattaching.

Environment

  • MCP for Unity 9.7.1
  • Unity 6000.3.5f2
  • Windows 11
  • HTTP transport, configured on a non-default port (reproduced on both 8080 and 8280)

Repro

  1. Configure HTTP transport, start the server, register a plugin session.
  2. Trigger a Unity domain reload (any script edit, package import, refresh).
  3. After reload, observe `Cannot start the local HTTP server because port N is already in use by PID(s): X` in the Console, where X is the python PID from before the reload.
  4. Confirm the orphan: `Get-CimInstance Win32_Process -Filter "ProcessId = X"` shows the original `mcp-for-unity.exe --transport http --http-url http://127.0.0.1:N --pidfile .../Library/MCPForUnity/RunState/mcp_http_N.pid` invocation, still running.

Python-side log immediately preceding the disconnect:

```
2026-05-25 23:44:46,633 - transport.plugin_hub - INFO - Plugin registered: ErtsUnity (...)
2026-05-25 23:44:46,633 - transport.plugin_hub - INFO - Registered 30 tools for session ...
2026-05-25 23:46:36,636 - transport.plugin_hub - INFO - Plugin session ... disconnected (1005)
```

Root cause (as far as I can read it)

`HttpBridgeReloadHandler.OnBeforeAssemblyReload` (Editor/Services/HttpBridgeReloadHandler.cs):

```csharp
// beforeAssemblyReload is synchronous; force a synchronous teardown so we do not
// leave an orphaned socket due to an unfinished async close handshake.
transport.ForceStop(TransportMode.Http);
```

`ForceStop`'s return value is discarded, and nothing verifies that the process is actually gone before the reload proceeds. `ServerManagementService.StopLocalServerForPidFileAsync` (Editor/Services/ServerManagementService.cs ~line 585+) has multiple early-return paths that refuse to kill (stale pidfile, PID-validation failure, fingerprint mismatch, refusal to kill the Unity Editor's own PID, etc.), and the silent-failure path looks reachable when the pidfile/token EditorPrefs handshake has drifted from the on-disk pidfile. Once `ForceStop` no-ops, the python server survives the reload, and the retry loop on the other side of the reload can never bind the port.

Suggested fix

  1. Make ForceStop's outcome a precondition for reload. If the kill failed (returned false or process still alive after a short poll), surface a hard error and either:
    • Block the reload (probably impractical), or
    • At least flip the post-reload resume into "do not spawn — attach to existing if reachable" mode.
  2. Pre-reload sanity verification. After calling `ForceStop`, poll `Process.GetProcessById(pid)` for ~500ms; if the process is still alive, escalate to an unconditional kill, or at least log loudly so the user knows what is happening. Silent leakage is the worst failure mode.
  3. On post-reload spawn failure due to port-in-use, attempt to reattach. The python server is still listening for WebSocket connections on `/hub/plugin`. If the spawn attempt fails because the port is held by what looks like a previous instance of our own server (validated via pidfile+token), the Editor plugin should reconnect to it rather than retry the spawn.
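
Suggestion 2 could look roughly like the following. This is a minimal sketch in plain .NET, not package code: `ProcessExitCheck`, `WaitForExit`, and `IsAlive` are hypothetical names, and the 500 ms budget and 50 ms poll interval are illustrative. It assumes the server PID is already known (for example, from the pidfile) and does not guard against the OS reusing the PID for an unrelated process.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration; not part of MCP for Unity.
public static class ProcessExitCheck
{
    // Returns true once no process with the given PID is running, or false if it
    // is still alive when the timeout elapses. Blocks the calling thread, which
    // fits a synchronous beforeAssemblyReload handler.
    public static bool WaitForExit(int pid, int timeoutMs = 500, int pollIntervalMs = 50)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (!IsAlive(pid)) return true;
            if (stopwatch.ElapsedMilliseconds >= timeoutMs) return false;
            Thread.Sleep(pollIntervalMs);
        }
    }

    public static bool IsAlive(int pid)
    {
        try
        {
            using (var process = Process.GetProcessById(pid))
            {
                return !process.HasExited;
            }
        }
        catch (ArgumentException)
        {
            // GetProcessById throws when no process with this ID is running.
            return false;
        }
        catch (InvalidOperationException)
        {
            // The process went away between the lookup and the HasExited check.
            return false;
        }
        catch (Win32Exception)
        {
            // The process exists but could not be inspected (e.g. access denied).
            return true;
        }
    }
}
```

In `OnBeforeAssemblyReload`, a `false` result after `ForceStop` would be the point to log an error or escalate, rather than letting the reload proceed silently.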

Any one of these would have caught my case. The combination of "ForceStop silently failed AND resume always tries to spawn" is what made this manifest as an unrecoverable dead session rather than a transient hiccup.
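
The spawn-or-attach choice from suggestion 3 can be reduced to a small pure function, which would also make it easy to unit test. The sketch below is an assumption about how that decision might be structured, not the package's logic: `ResumeAction`, `ResumeDecision`, and all parameter names are hypothetical, and `tokenMatches` stands in for whatever pidfile+token validation the package already performs.

```csharp
// Hypothetical types for illustration; not part of MCP for Unity.
public enum ResumeAction
{
    Spawn,                // port is free: start a fresh server
    AttachToExisting,     // port is held by our own previous server: reconnect
    ReportForeignProcess  // port is held by something else: surface the error
}

public static class ResumeDecision
{
    // portOwnerPid: PID currently bound to the HTTP port, or null if the port is free.
    // pidFilePid:   PID recorded in our pidfile, or null if it is missing or unreadable.
    // tokenMatches: result of the (assumed) pidfile+token identity check.
    public static ResumeAction Choose(int? portOwnerPid, int? pidFilePid, bool tokenMatches)
    {
        if (portOwnerPid == null)
        {
            return ResumeAction.Spawn;
        }

        bool isOurServer = pidFilePid.HasValue
            && pidFilePid.Value == portOwnerPid.Value
            && tokenMatches;

        return isOurServer
            ? ResumeAction.AttachToExisting
            : ResumeAction.ReportForeignProcess;
    }
}
```

With this shape, the case described in this issue (port held by the PID from our own pidfile) would resolve to `AttachToExisting` on the first attempt instead of burning through the retry schedule.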

Workaround

Until this is fixed, I'm running a local Editor script that subscribes to `AssemblyReloadEvents.beforeAssemblyReload` (and `EditorApplication.quitting`), reads every `Library/MCPForUnity/RunState/mcp_http_*.pid` file, and unconditionally kills the referenced PID if it's still alive and looks like a python process. It runs in addition to the package's own ForceStop and just papers over the gap. Happy to share if useful.
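
For readers who want something similar in the meantime, the core of such a workaround might look like the sketch below. It is not the script described above and not package code: `OrphanServerReaper` and its members are hypothetical, and it assumes each pidfile contains just the PID as decimal text, which this issue does not confirm. The process-name check is passed in as a predicate because the right match depends on how the server was launched.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch for illustration; not part of MCP for Unity.
public static class OrphanServerReaper
{
    // Assumption: the pidfile holds the server PID as decimal text, optionally
    // surrounded by whitespace. Returns null if the file does not parse that way.
    public static int? ReadPid(string pidFilePath)
    {
        try
        {
            string text = File.ReadAllText(pidFilePath).Trim();
            return int.TryParse(text, out int pid) && pid > 0 ? pid : (int?)null;
        }
        catch (IOException) { return null; }
        catch (UnauthorizedAccessException) { return null; }
    }

    // Kills every live process referenced by mcp_http_*.pid in runStateDir whose
    // process name satisfies looksLikeServer. Returns the PIDs it killed.
    public static List<int> KillOrphans(string runStateDir, Func<string, bool> looksLikeServer)
    {
        var killed = new List<int>();
        if (!Directory.Exists(runStateDir)) return killed;

        int selfPid;
        using (var self = Process.GetCurrentProcess()) { selfPid = self.Id; }

        foreach (string path in Directory.GetFiles(runStateDir, "mcp_http_*.pid"))
        {
            int? pid = ReadPid(path);
            if (pid == null || pid.Value == selfPid) continue; // never kill ourselves

            try
            {
                using (var process = Process.GetProcessById(pid.Value))
                {
                    if (!looksLikeServer(process.ProcessName)) continue;
                    process.Kill();
                    process.WaitForExit(2000);
                    killed.Add(pid.Value);
                }
            }
            catch (ArgumentException) { /* no such process: stale pidfile */ }
            catch (InvalidOperationException) { /* exited while being inspected */ }
            catch (Win32Exception) { /* could not be terminated */ }
        }
        return killed;
    }
}
```

In the Editor this would be called from handlers on `AssemblyReloadEvents.beforeAssemblyReload` and `EditorApplication.quitting`, with `runStateDir` pointing at `Library/MCPForUnity/RunState`. A predicate such as `name => name.StartsWith("mcp-for-unity", StringComparison.OrdinalIgnoreCase)` would match the executable seen in the repro, assuming the reported process name follows the executable name.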
