Skip to content

fix(blender): unblock send_command hang against persistent-connection server (#1022)#1026

Open
kovtcharov wants to merge 3 commits intomainfrom
kalin/fix-1022-blender-hang
Open

fix(blender): unblock send_command hang against persistent-connection server (#1022)#1026
kovtcharov wants to merge 3 commits intomainfrom
kalin/fix-1022-blender-hang

Conversation

@kovtcharov
Copy link
Copy Markdown
Collaborator

Why this matters

Before: on v0.17.6, running gaia blender ... against the Blender MCP addon hangs forever after the first tool call (e.g. clear_scene), and Ctrl-C does not interrupt it. Reported in #1022.

After: send_command reads the response incrementally and returns as soon as a complete JSON document parses out of the buffer — matching the framing the server already uses. A 30 s socket timeout also keeps Ctrl-C responsive.

The bug was a protocol mismatch between client and server: the client looped on recv() until FIN, but the Blender addon keeps the connection open to accept further commands on the same socket, so FIN never came. Both ends ended up blocked on recv() waiting for each other.

Adds tests/unit/mcp/test_blender_mcp_client.py — a socket-level regression test using a fake persistent-connection server. Runs in CI without a Blender install.

Fixes #1022. Plan for the broader Blender modernization is tracked in #1025.

Test plan

  • `pytest tests/unit/mcp/test_blender_mcp_client.py -v` — passes in <1 s
  • `pytest src/gaia/agents/blender/tests/test_mcp_client.py -v` — still skips cleanly without a Blender server
  • `python util/lint.py --black --isort` — clean
  • Manual smoke: `gaia blender --example 1` against a real Blender 4.5 + addon (delegated to issue reporter via [Bug]: blender agent hangs in tool execution #1022)

… server (#1022)

Client used a chunk-until-EOF recv loop, but the Blender addon's server
keeps the connection open to accept further commands on the same socket.
Both ends blocked on recv() forever after the first response.

Switch to incremental JSON parsing (matching the server's own framing
in blender_mcp_server.py:128-166) and add a 30s socket timeout so a
misbehaving server can't hang the agent indefinitely and Ctrl-C
remains responsive.

Adds a socket-level regression test using a fake persistent-connection
server, so this can be exercised in CI without a real Blender install.
@github-actions github-actions Bot added mcp MCP integration changes tests Test changes labels May 10, 2026
…t timeouts

Follow-up hardening from a deeper review of the #1022 fix:

* Tolerate UnicodeDecodeError in the recv loop. A multi-byte UTF-8
  character (e.g. a non-ASCII object name in an error message) split
  across two recv() boundaries was raising UnicodeDecodeError and
  bubbling out as a hard MCPError; now treated identically to
  JSONDecodeError — keep buffering.

* Treat ConnectionResetError as "connection closed before complete
  response received" rather than letting it surface as a raw OSError.
  Some servers send RST instead of FIN when closing mid-response.

* Bump the default per-recv socket timeout from 30s to 120s. Long
  Blender operations such as render/simulate via execute_code can sit
  silent for tens of seconds at a time; the original 30s was tight
  enough to misfire in legitimate workflows.

Tests added:
  * test_send_command_assembles_chunked_response — splits a non-ASCII
    response across two send() calls with a delay, exercising the
    incremental-parse loop and the UnicodeDecodeError tolerance.
  * test_send_command_raises_on_premature_close — server closes after
    sending a partial JSON; client must surface a clear MCPError.

Also tightened the connection-refused test to use a kernel-assigned
free port instead of port 1, for CI portability.
Addresses items raised in a follow-up review of the original fix:

* Use json.JSONDecoder().raw_decode() instead of json.loads() so a
  hypothetically pipelined server response ({"a":1}{"b":2}) yields the
  first object instead of looping until timeout. The current server
  sends one response per command, but raw_decode is cheap and removes
  a foot-gun.

* Split UTF-8 decoding from JSON parsing into separate try blocks. The
  prior single try caught UnicodeDecodeError but only because of the
  combined except clause; this is clearer and more robust.

* Cap the response buffer at 16 MB. A misbehaving server that trickled
  bytes just under each per-recv timeout would otherwise grow the
  buffer without bound. 16 MB leaves enormous headroom for legitimate
  execute_code stdout payloads.

* Broaden the close-mid-response catch to ConnectionResetError +
  ConnectionAbortedError + BrokenPipeError. Previously only
  ConnectionResetError was handled specifically; the others fell
  through to the generic Exception handler with a less-actionable
  message.

* Drop self.log.setLevel(logging.INFO) from MCPClient.__init__ — it
  mutated the global module logger every time a client was constructed,
  overriding the user's logging config.

* Bump execute_code's default timeout from 120s (inherited) to 600s
  with a docstring explaining why — bpy.ops.render.render() and
  similar can sit silent for many seconds without emitting data.

Test changes:

* test_send_command_assembles_chunked_response now actually splits
  mid-emoji. The previous version split at offset 20, which landed in
  pure ASCII; the UnicodeDecodeError path was uncovered. Compute the
  split offset from the JSON-encoded payload (with ensure_ascii=False)
  so the cut lands inside the 4-byte UTF-8 codepoint.

* New test_send_command_surfaces_enhanced_error_message — covers
  _enhance_error_message, which had no prior test coverage.

* _ChunkedServer now serializes with ensure_ascii=False so multi-byte
  payloads land on the wire as raw UTF-8 (matching the path the
  UnicodeDecodeError tolerance defends against).

5/5 tests in tests/unit/mcp/test_blender_mcp_client.py pass; 123/123
in the broader scope; lint clean.
@kovtcharov kovtcharov enabled auto-merge May 10, 2026 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mcp MCP integration changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: blender agent hangs in tool execution

1 participant