Fix deadlocks#170
Merged
MischaPanch merged 29 commits intomainfrom Jun 16, 2025
Merged
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR aims to address potential deadlock issues by improving timeout handling and thread cancellation, while enhancing logging and refactoring helper functions. Key changes include:
- Introduction of a helper function (create_ls_with_ignored_dirs) to consolidate language server creation with ignored directory support.
- Enhanced logging and cancellation for threads executing with timeouts in the utility module.
- Updates to asynchronous task management and integration with language server requests for better tracking and cancellation.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/multilspy/python/test_retrieval_with_ignored_dirs.py | Refactored fixtures to use a helper for creating the language server with ignored directories |
| src/serena/util/thread.py | Added detailed logging and thread cancellation logic to mitigate potential deadlocks |
| src/serena/util/async_task_manager.py | New async task manager for tracking and cancelling async tasks during timeouts |
| src/serena/mcp.py | Updated help message for tool timeout configuration |
| src/serena/agent.py | Enhanced logging with a unique call identifier for tool execution |
| src/multilspy/language_server.py | Introduced asynchronous logging and cancellation-aware wrappers for symbol tree requests |
Comments suppressed due to low confidence (1)
test/multilspy/python/test_retrieval_with_ignored_dirs.py:44
- The docstring for the fixture still mentions only the 'scripts' directory. Consider updating the comment to accurately reflect all ignored directories (e.g., 'scripts' and 'custom_test_repo').
ignored_paths = ["scripts", "custom_test_repo"]
Needed to make multilspy logs respect our log format
Needed for tests involving ignored patterns
e3d81c1 to
cd10d2f
Compare
…deadlocks ROOT CAUSE: Asyncio event loop contamination between MCP server and SerenaAgent was causing intermittent deadlocks. MCP server runs in its own asyncio loop, but SerenaAgent creates coroutines that leak into the MCP context, leading to unawaited coroutines and eventual event loop blocking. SOLUTION: Complete process isolation with clean architecture Process Isolation Changes: - Add ProcessIsolatedSerenaAgent wrapper that runs SerenaAgent in separate process - Implement JSON-RPC based IPC for communication between MCP server and agent - Add SerenaAgentWorker class to handle requests in the isolated process - Centralize error handling in _make_request_with_result() helper method - Use serena_config.tool_timeout instead of hardcoded timeout parameters Architecture Improvements: - Define ToolProtocol interface for clean tool contracts - Implement ProcessIsolatedTool with complete Tool interface (apply + apply_ex) - Remove hacky make_process_isolated_tool function - use single make_tool path - Eliminate inheritance abuse - ProcessIsolatedTool implements protocol cleanly - Preserve full docstring parsing functionality through process boundary MCP Server Updates: - Update start_mcp_server() to use process isolation by default - Maintain backward compatibility with create_mcp_server_and_agent() - Retrieve tool metadata from isolated process for proper MCP tool creation - Ensure proper process lifecycle management in server lifespan handlers Key Benefits: - Prevents asyncio contamination deadlocks completely - Maintains full docstring parsing and parameter schema generation - Clean protocol-based architecture with no inheritance hacks - Centralized configuration-driven timeout handling - All tools implement consistent apply_ex interface - Process isolation is transparent to MCP clients This fix ensures that MCP symbolic tools no longer create deadlocks
make_tool previously depended on tool instances in the main process, but the tool instances are no longer available there To solve this, the tool interface was adjusted to provide static information which is used in the ProcessIsolatedTool wrapper. I also tried another approach before, where ProcessIsolatedTool would accept tool metadata at init, but this lead to serialization issues that couldn't be easily solved (mcp's FuncMetadata is not serializable, despite being a BaseModel...). Tests for checking that the wrapped tool instance has the same metadata as the wrapper have been added.
# Conflicts: # test/conftest.py
Contributor
Author
|
@opcode81 Finally finished, merging this now since it solves core functionality issues. We can do a post-merge review in pair programming The work on this was probably the most frustrating and time-consuming task that I have ever done. |
baka3k
pushed a commit
to baka3k/serena
that referenced
this pull request
Nov 11, 2025
Fix deadlocks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #153
The issue was due to async event loop contamination, which is essentially a bug in asyncio and happens at random times, causing unrecoverable deadlocks.
It was quasi impossible to reproduce outside of the MCP server (thus impossible to debug), since it's mainly within the MCP server where the event loops cross-contaminated
We had a whole of three asyncio applications running in a single process: the mcp server, the language server and the dashboard app. They all shared one event loop and eventually, after enough requests, things would deadlock.
The solution was to let everything run in its own isolated process. Since Tools and tool metadata needed for FastMCP are not serializable, a small refactoring was needed in the core code. Major new modules were introduced for creating process isolation, in particular a tool wrapper.
A major challenge was to terminate everything correctly. We also have to collect logs from one process and pass it to another.
The current status works. Names are suboptimal, and the solution is likely overly complex. I generated fairly large parts with claude-code (and serena within it, in the periods when it worked). I did have to go through the new code extensively manually, fix a lot of issues and remove a lot of complexity. Still, probably too much complexity remains. I already marked one TODO for later, but it's not very urgent.
Tests for the process isolation and a new marker for them have been added. I tested the dashboard shutdown and logging forwarding extensively manually,