fix: prevent MCP response loss from stdout suppression race condition#43
Closed
anchapin wants to merge 1 commit into NatLabRockies:develop from
Conversation
Root cause: the FastMCP middleware held os.dup2() on process-global fd 1 (stdout) redirected to stderr for the entire duration of each tool call. FastMCP dispatches sync tools via anyio.to_thread.run_sync, so concurrent tool calls could run simultaneously. While Thread A held fd 1 → stderr, the asyncio stdout_writer task (event loop thread) wrote Thread B's JSON-RPC response to fd 1 (which was stderr), and the MCP client received nothing, causing MCP error -32001 timeouts on all tools.

Fix:
- Remove global FastMCP middleware from server.py
- Add threading.RLock to suppress_openstudio_warnings() so concurrent worker threads cannot interleave their fd 1 redirects
- Use RLock (not Lock) to allow safe same-thread nested re-entry (e.g. create_baseline_osm → set_constructions → add_hvac)
- Delete dead create_suppression_middleware() function
- Move flush() inside try/finally to prevent fd leak on flush error
- Restore targeted suppression to all SDK callsites that emit SWIG memory warnings: BCLMeasure(), Model(), DesignDay(), model.save(), VersionTranslator.loadModel(), and full HVAC creation blocks across hvac_systems, weather, common_measures, comstock, geometry, measures, measure_authoring, model_management, and baseline_model
- Add is_initialized() guard in baseline_model.set_constructions()
- Add regression test (test_concurrent_tools.py, CI shard 5) that fires create_baseline_osm + get_server_status concurrently and asserts both return valid responses

Known limitation: HVAC creation blocks (~2-10s) are the widest suppression windows. Sequential MCP clients (Claude Desktop, Cursor) are unaffected; only concurrent multi-client stdio usage could theoretically lose a response during HVAC tool calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #42
Problem
All MCP tools returned `error -32001: Request timed out` when any slow tool (`create_new_building`, `create_baseline_osm`, HVAC creation) ran concurrently with any other tool call, including trivial ones like `get_server_status`.

Root Cause
The FastMCP middleware held `os.dup2()` on process-global fd 1 (stdout) redirected to stderr for the entire duration of each tool call. FastMCP dispatches sync tools via `anyio.to_thread.run_sync`, so multiple tools can run simultaneously. While Thread A held fd 1 → stderr, the asyncio `stdout_writer` task wrote Thread B's JSON-RPC response to fd 1 (actually stderr), the client received nothing, and the request timed out.

Fix
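The race hinges on fd 1 being a process-wide resource: a `dup2` redirect taken in one thread captures writes made by every other thread. A minimal, self-contained sketch of the hazard (illustrative only, not the server's code; a pipe stands in for stderr):

```python
import os
import threading

def demo_fd1_is_global():
    """Show that os.dup2 on fd 1 redirects *every* thread's stdout writes."""
    r, w = os.pipe()              # stand-in for the stderr target
    saved = os.dup(1)             # remember the real stdout
    redirected = threading.Event()
    wrote = threading.Event()

    def suppressor():             # plays the old middleware's role
        os.dup2(w, 1)             # redirect fd 1 (process-wide!) to the pipe
        redirected.set()
        wrote.wait()              # hold the redirect while the other thread writes
        os.dup2(saved, 1)         # restore real stdout

    def writer():                 # plays the stdout_writer task's role
        redirected.wait()
        os.write(1, b"json-rpc response\n")  # lands in the pipe, not stdout
        wrote.set()

    ta = threading.Thread(target=suppressor)
    tb = threading.Thread(target=writer)
    ta.start(); tb.start(); ta.join(); tb.join()
    os.close(w)
    os.close(saved)
    data = os.read(r, 1024)       # recover what the "client" never saw
    os.close(r)
    return data
```

Calling `demo_fd1_is_global()` returns `b"json-rpc response\n"`: the second thread's response was diverted even though it never touched `dup2` itself, which is exactly how Thread B's JSON-RPC reply ended up on stderr.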
- Remove global FastMCP middleware from `server.py`
- Add `threading.RLock` to `suppress_openstudio_warnings()` so concurrent worker threads cannot interleave their fd 1 redirects
- Use `RLock` (not `Lock`) to allow safe same-thread nested re-entry (`create_baseline_osm → set_constructions → add_hvac` all on one thread)
- Delete dead `create_suppression_middleware()` function
- Move `flush()` inside try/finally to prevent fd leak on flush error
- Restore targeted suppression to SDK callsites that emit SWIG memory warnings: `BCLMeasure()`, `Model()`, `DesignDay()`, `model.save()`, `VersionTranslator.loadModel()`, and full HVAC creation blocks across 10 files
- Add `is_initialized()` guard in `baseline_model.set_constructions()`

Test Results
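A sketch of what the locked suppression helper could look like. This is a hypothetical reconstruction from the bullet list above; the real `stdout_suppression.py` may differ in names and details:

```python
import os
import sys
import threading
from contextlib import contextmanager

# One process-wide lock guards the process-wide resource (fd 1).
# RLock, not Lock: a tool that already holds the lock can enter a nested
# suppressed helper on the same thread without deadlocking.
_fd1_lock = threading.RLock()

@contextmanager
def suppress_openstudio_warnings():
    """Temporarily point fd 1 (stdout) at stderr so SWIG warnings from the
    OpenStudio SDK cannot corrupt the JSON-RPC stream. Illustrative sketch."""
    with _fd1_lock:                    # serialize fd 1 redirects across threads
        sys.stdout.flush()             # drain Python-level buffer first
        saved_fd = os.dup(1)           # remember the real stdout
        try:
            os.dup2(2, 1)              # fd 1 -> stderr for the guarded block
            yield
        finally:
            try:
                sys.stdout.flush()     # flush inside finally (per the PR): even
            finally:                   # if flushing raises, the fds are restored
                os.dup2(saved_fd, 1)
                os.close(saved_fd)     # no fd leak
```

With this shape, nested calls on one thread re-enter the `RLock` safely, while a second worker thread blocks at the lock instead of interleaving its own `dup2` with an in-flight redirect.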
- Unit tests pass (`pytest -m "not integration"`)
- New `test_concurrent_tools.py`: fires `create_baseline_osm` + `get_server_status` concurrently, asserts both return valid responses

Known Limitation
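The shape of the regression test can be sketched roughly as follows. The tool functions here are stand-in stubs; the real `tests/test_concurrent_tools.py` drives the MCP server itself:

```python
import concurrent.futures
import time

def create_baseline_osm():
    """Stand-in for the real slow, suppression-wrapped tool."""
    time.sleep(0.5)
    return {"status": "ok", "tool": "create_baseline_osm"}

def get_server_status():
    """Stand-in for the real trivial tool."""
    return {"status": "ok", "tool": "get_server_status"}

def test_concurrent_tools():
    # Fire a slow tool and a fast tool at the same time. Before the fix,
    # the fast tool's response could be swallowed by the slow tool's
    # stdout redirect and the client would see error -32001.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        slow = pool.submit(create_baseline_osm)
        fast = pool.submit(get_server_status)
        assert fast.result(timeout=5)["status"] == "ok"
        assert slow.result(timeout=5)["status"] == "ok"
```

The key property being asserted is that the fast call's response arrives intact while the slow call is still in flight.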
HVAC creation blocks (~2–10s) are the widest suppression windows. Only concurrent multi-client stdio usage could theoretically lose a response during HVAC tool calls. Sequential clients (Claude Desktop, Cursor) are unaffected.
Files Changed (13)

- `server.py`
- `stdout_suppression.py`
- `model_management/operations.py`
- `model_management/baseline_model.py` (`set_constructions` + `add_hvac`)
- `hvac_systems/operations.py`
- `weather/operations.py` (`DesignDay()` constructor)
- `common_measures/operations.py` (`BCLMeasure()` constructor)
- `comstock/operations.py` (`BCLMeasure()` + `Model()`)
- `geometry/operations.py` (`model.save()`)
- `measures/operations.py` (`BCLMeasure()` + `model.save()`)
- `measure_authoring/operations.py`
- `tests/test_concurrent_tools.py`
- `.github/workflows/ci.yml`