RFC: Server Architecture Refactor for Horizontal Scalability and Bottleneck Elimination
TL;DR
This RFC proposes eliminating the single Base server bottleneck by introducing a session-centric architecture with pluggable registry support for both local and distributed deployments.
Problem Statement
The current architecture has several critical bottlenecks, all identified in production:
- Single Process Bottleneck (cloudwalk/hermes-mcp#179, "Why are all server requests sent to a single process for processing?"): All MCP requests are serialized through the Base GenServer
- Session Management Overhead (cloudwalk/hermes-mcp#141, "[Server] MCP Sessions in clustered Distributed Erlang"): Centralized session tracking with O(n) complexity
- Clustering Challenges: Sessions don't distribute properly across Erlang clusters
- Blocking Requests: Long-running tools block other requests
Proposed Solution
Session-Centric Architecture
Replace the heavy Base server with:
- Lightweight Router: Stateless request routing
- Session-Based Processing: Sessions handle their own requests directly
- Pluggable Registry: ETS as the default local implementation, with extension points for distributed registries (such as Horde) or external session storage (such as Redis)
- Async Request Processing: Eliminate blocking operations
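
To make the pluggable registry idea concrete, here is a minimal sketch of what a registry behaviour with an ETS-backed default could look like. Everything named here (`SessionRegistry`, `SessionRegistry.ETS`, the `:mcp_sessions` table, the callback shapes) is hypothetical and only illustrates the idea; it is not the existing hermes-mcp API, and a Horde or Redis adapter would implement the same callbacks differently.

```elixir
defmodule SessionRegistry do
  @moduledoc "Hypothetical behaviour that every registry adapter would implement."

  @callback register(session_id :: String.t(), pid()) :: :ok | {:error, :already_registered}
  @callback lookup(session_id :: String.t()) :: {:ok, pid()} | :error
  @callback unregister(session_id :: String.t()) :: :ok
end

defmodule SessionRegistry.ETS do
  @moduledoc "Assumed default adapter: node-local lookups backed by a named ETS table."
  @behaviour SessionRegistry

  # Hypothetical table name.
  @table :mcp_sessions

  # The calling process owns the table, so in a real system this would be
  # called from a long-lived supervised process.
  def init do
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    :ok
  end

  @impl true
  def register(session_id, pid) do
    # insert_new/2 is atomic, so two concurrent registrations cannot both win.
    if :ets.insert_new(@table, {session_id, pid}) do
      :ok
    else
      {:error, :already_registered}
    end
  end

  @impl true
  def lookup(session_id) do
    # Reads go straight to ETS and never pass through a single process.
    case :ets.lookup(@table, session_id) do
      [{^session_id, pid}] -> {:ok, pid}
      [] -> :error
    end
  end

  @impl true
  def unregister(session_id) do
    :ets.delete(@table, session_id)
    :ok
  end
end
```

With this shape, the rest of the server would depend only on the behaviour, and the adapter module could be chosen through configuration. The sketch does not clean up entries when a session process dies; a real adapter would need monitors or an equivalent mechanism.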
Key Benefits
- Eliminate bottlenecks: No more single-process serialization
- Horizontal scaling: Sessions distribute across cluster nodes
- Backward compatibility: Zero or minimal breaking changes to the public API
Architecture Diagrams
Current vs Proposed Request Flow
Current (Bottleneck):
Client → Transport (bottleneck) → Base Server (bottleneck) → Session → Handler
↑ Transport: all requests routed here
↑ Base Server: all requests serialized here
Proposed (Distributed):
Client → Transport → Router → Session → Handler (async)
↑ Router: lightweight routing only
↑ Session: direct processing per session
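
As a rough illustration of the proposed flow, the sketch below pairs a stateless router function with a per-session GenServer that hands each request to a separate task and replies from there, so a slow tool does not hold up other requests to the same session. The module names (`SketchRouter`, `SketchSession`) are hypothetical, Elixir's built-in `Registry` stands in for whatever registry adapter is configured, and error handling is deliberately omitted.

```elixir
defmodule SketchSession do
  @moduledoc "Hypothetical per-session process that answers requests asynchronously."
  use GenServer

  # The session registers itself under its id through a :via tuple.
  def start_link(registry, session_id, handler) do
    name = {:via, Registry, {registry, session_id}}
    GenServer.start_link(__MODULE__, handler, name: name)
  end

  def request(pid, payload, timeout \\ 5_000) do
    GenServer.call(pid, {:request, payload}, timeout)
  end

  @impl true
  def init(handler), do: {:ok, handler}

  @impl true
  def handle_call({:request, payload}, from, handler) do
    # Run the handler outside the session process and reply from the task.
    # The task is unlinked and unsupervised here; if the handler crashes,
    # the caller simply times out. A real implementation would supervise it.
    Task.start(fn -> GenServer.reply(from, handler.(payload)) end)
    {:noreply, handler}
  end
end

defmodule SketchRouter do
  @moduledoc "Hypothetical stateless router: a plain function, no process of its own."

  def route(registry, session_id, payload) do
    case Registry.lookup(registry, session_id) do
      [{pid, _value}] -> SketchSession.request(pid, payload)
      [] -> {:error, :unknown_session}
    end
  end
end
```

A possible usage, again with made-up names:

```elixir
{:ok, _} = Registry.start_link(keys: :unique, name: SketchRegistry)
{:ok, _} = SketchSession.start_link(SketchRegistry, "s1", fn x -> x * 2 end)
SketchRouter.route(SketchRegistry, "s1", 21)
```

Because the router runs in the caller's process and only performs a lookup, no single process sits between the transport and the sessions.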
Community Input Needed
- Architecture feedback: Does the session-centric approach make sense?
- Registry design: ETS vs Horde trade-offs and alternatives?
- Migration concerns: Any deployment scenarios we should consider?
- Performance priorities: What metrics matter most to your use case?
- Distribution needs: How do you currently handle clustering/scaling?
Let's discuss! 🚀 I'd especially like input from the authors of the original issues: @byu @feng19