- Status: In Review
- Author(s): Jeremy Drouillard (@jerm-dro)
- Created: 2026-03-04
- Last Updated: 2026-03-07
- Target Repository: toolhive
- Related Issues:
- toolhive#3986 - Enable sticky sessions on operator-created Services
- toolhive#3992 - Add ClusterIP service with SessionAffinity for MCP server backend
- THV-0038 - Session-Scoped Client Lifecycle for vMCP
- toolhive#1589 - Scaling stdio within k8s.
ToolHive's vmcp and thv-proxyrunner components cannot currently be scaled horizontally because both hold session state in process-local memory. This RFC defines an approach to enable safe horizontal scale-out of these components by externalizing session state to a shared Redis store and implementing session-aware routing at each layer.
In Kubernetes mode, ToolHive deploys MCP servers using a two-tier model:
graph TB
Client[MCP Client]
VMCP[vmcp]
subgraph "Logical MCPServer A"
PR_A[proxyrunner A]
STS_A[StatefulSet A]
end
subgraph "Logical MCPServer B"
PR_B[proxyrunner B]
STS_B[StatefulSet B]
end
Client --> VMCP
VMCP --> PR_A
VMCP --> PR_B
PR_A --> STS_A
PR_B --> STS_B
The operator (thv-operator) watches MCPServer and VirtualMCPServer CRDs and reconciles them into Kubernetes resources. For each MCPServer, the operator creates:
- A Deployment running `thv-proxyrunner`, which proxies traffic to the MCP server backend
- A StatefulSet running the actual MCP server image, created and managed by the proxyrunner via Kubernetes server-side apply
- A Service exposing the proxyrunner to clients (or to vMCP)
All replicas of a proxyrunner Deployment share a single StatefulSet — on startup, every replica independently applies the same StatefulSet spec using server-side apply with a shared field manager (toolhive-container-manager), converging on the same desired state without leader election.
Replica configurability today: Neither MCPServer nor VirtualMCPServer CRDs expose a replicas field. Both the proxyrunner Deployment and the vMCP Deployment are created with a hardcoded replica count of 1. The StatefulSet created by the proxyrunner is also hardcoded to 1 pod. See §2 for the constraints this creates.
The Virtual MCP Server (vmcp) sits above the proxyrunner tier. It presents a unified MCP endpoint to external clients, discovers backends from an MCPGroup, aggregates their capabilities, and routes inbound tool calls to the appropriate backend proxyrunner.
The proxyrunner implements session tracking via pkg/transport/session. Each MCP session is represented as a Session object stored in a Storage backend. The Storage interface was designed from the outset to support pluggable backends:
```go
type Storage interface {
	Store(ctx context.Context, session Session) error
	Load(ctx context.Context, id string) (Session, error)
	Delete(ctx context.Context, id string) error
	DeleteExpired(ctx context.Context, before time.Time) error
	Close() error
}
```

Today, only `LocalStorage` (an in-process memory map) is implemented. The `Storage` interface is the extension point this RFC targets.
THV-0038 refactored vMCP's session management to introduce explicit session lifecycle: backend HTTP clients are created once at session initialization, reused for all requests within the session, and closed on expiry. The resulting MultiSession interface owns the routing table for that session (which tool belongs to which backend), as well as live backend connections.
The session.go documentation in the codebase is explicit about the distributed scaling trade-off:
Distributed deployment note: Because MCP clients cannot be serialised, horizontal scaling requires sticky sessions (session affinity at the load balancer). Without sticky sessions, a request routed to a different vMCP instance must recreate backend clients (one-time cost per re-route). This is an accepted trade-off.
A `MultiSession` uses a two-layer storage model:
- Runtime layer (in-process only): backend HTTP connections, routing table, and capability lists. These cannot be serialized and are lost when the process exits. Sessions are therefore node-local.
- Metadata layer (serializable): connected backend IDs and backend session IDs are written to the embedded `transportsession.Session` so that pluggable `transportsession.Storage` backends (e.g. Redis) can persist them.
This two-layer design is the key insight for this RFC: we can persist enough metadata to route any request to the correct pod, even if we cannot migrate the full session runtime.
ToolHive already uses Redis as an external storage backend for the embedded auth server's session and token state (see MCPExternalAuthConfig.storage.redis). This establishes Redis as a proven dependency in the ToolHive Kubernetes ecosystem and provides a reference for how to configure and connect to Redis from operator-managed pods.
Two recent PRs implement a short-term mitigation:
- #3986 sets `SessionAffinity: ClientIP` on all operator-created Services (for `MCPServer`, `MCPRemoteProxy`, and `VirtualMCPServer`). This causes kube-proxy to consistently route traffic from the same client IP to the same pod.
- #3992 adds a dedicated ClusterIP Service (with `SessionAffinity: ClientIP`) for the MCP server StatefulSet backend, so the proxyrunner's connections to the backend are also sticky.
Client IP affinity reduces — but does not eliminate — session breakage. It fails when:
- Multiple clients share an IP (NAT, corporate proxy, load balancer)
- A pod is replaced (rolling update, crash recovery) and kube-proxy routes to a new pod
- The operator scales out and the new pod becomes the affinity target for existing clients
- vMCP itself is deployed behind a load balancer that masks client IPs
This approach is a useful stopgap but is not a foundation for intentional horizontal scaling.
MCP servers that use stdio transport are inherently stateful: the MCP protocol conversation is a single long-lived stdin/stdout stream between the proxyrunner and the container. This state cannot be shared or transferred between proxyrunner instances — the stream lives or dies with the process.
A stdio backend couples itself to a specific proxyrunner process: the proxyrunner is the only process attached to the MCP server container's stdin/stdout, and that attachment is exclusive and non-transferable. This coupling is why proxyrunners cannot be made fully fungible (stateless, interchangeable replicas where any replica can handle any request) as long as stdio transport is supported. Removing or isolating stdio support would be a prerequisite for a fully fungible proxyrunner design; that is a larger architectural change out of scope for this RFC.
Even for SSE and streamable-http transports, where the backend MCP server speaks HTTP, individual backend connections carry session-specific negotiated state (e.g., the Mcp-Session-Id assigned by the backend server and known only to the proxyrunner that initialized the session).
This is a structural constraint of the MCP protocol, not a ToolHive implementation choice, and it shapes the solution described in this RFC.
The fundamental problem is that all requests within an MCP session must be handled by the same process, at every layer of the stack. Today, with single-replica deployments at each layer, this is automatic. With multiple replicas, it is not. In reality, the only hard constraint is that the underlying MCPServer backend receives all requests for its initialized sessions.
When vmcp runs with more than one replica, an inbound request carrying an Mcp-Session-Id may be routed by the Kubernetes Service to any vMCP pod. The pod that receives it may not have the session in its local Storage, which means:
- It cannot look up the routing table (which tool → which proxyrunner)
- It cannot reuse the backend HTTP clients associated with the session
- It would have to re-initialize the session from scratch — a destructive operation that creates entirely new backend sessions, discards all in-progress state, and requires the client to restart its workflow.
This applies equally to SSE and streamable-http sessions.
The VirtualMCPServer CRD has no replicas field. The operator creates the vMCP Deployment with a hardcoded replica count of 1 and there is no declarative way to change it. vMCP therefore runs as a single pod today, and the session affinity problem described in §2.1 is not yet encountered in practice — but it will be as soon as operators need to scale vMCP for availability or load.
The MCPServer CRD has no replicas field. The operator creates the proxyrunner Deployment with a hardcoded replica count of 1. The reconciler enforces this: attempting to scale the Deployment (e.g., via kubectl scale) is overwritten by the next reconcile cycle. There is therefore no supported path to run multiple proxyrunner replicas for a single MCPServer today, regardless of transport.
The proxyrunner hardcodes the StatefulSet to 1 pod; there is no CRD field to configure it. A user can attach an HPA directly to the StatefulSet outside of the operator's control, and it will create multiple pods — but without session-aware routing, requests are distributed across pods that do not share session state, producing 400 Bad Request: No valid session ID provided errors. This failure mode was directly observed: a user's HPA experiment on an MCP server StatefulSet caused replicas to scale to three pods, and the error rate spiked immediately. This triggered the client-IP affinity mitigations in §1.3 and motivates the long-term solution in this RFC.
The correct scaling unit for the backend is the proxyrunner+StatefulSet pair (§3.1), not the StatefulSet alone.
SSE and streamable-http share the same class of horizontal scalability problem:
| Transport | Session Carrier | Affected Layers |
|---|---|---|
| `stdio` | Process stdin/stdout (unshareable) | Proxyrunner (cannot scale) |
| `sse` | `Mcp-Session-Id` header / SSE connection | vMCP + proxyrunner |
| `streamable-http` | `Mcp-Session-Id` header | vMCP + proxyrunner |
For stdio, each proxyrunner holds an exclusive stdin/stdout attachment to a single MCP server container. While it is technically possible to run multiple proxyrunner replicas each attached to their own container, this does not help: stdio servers do not support concurrent sessions within a single process, so each container handles exactly one session at a time. Horizontal scaling is about increasing concurrency; stdio is fundamentally not designed for it. Scaling stdio-backed servers for higher session concurrency would require a different approach (e.g., a pool of single-session containers assigned per client), which is out of scope for this RFC.
For SSE and streamable-http, the session exists as a logical identifier (Mcp-Session-Id) that can be tracked in external storage. Routing by session ID is possible if the right metadata is externalized.
- CRD replica fields: Add explicit replica configuration to both CRDs so users can declaratively set the desired scale without bypassing the operator:
  - `VirtualMCPServer.spec.replicas` — number of vMCP Deployment pods.
  - `MCPServer.spec.replicas` — number of proxyrunner Deployment pods (capped at 1 for `stdio` transport).
  - `MCPServer.spec.backendReplicas` — number of pods in the shared MCP server StatefulSet. There is exactly one StatefulSet per `MCPServer`, shared by all proxyrunner replicas (see §3.2).
- Operator reconciler changes: The operator must respect and preserve the new replica fields rather than hardcoding 1. The reconciler must stop overwriting manually-set replica counts.
- Horizontal scale-out of `vmcp`: Multiple vMCP replicas should be able to serve any request, regardless of which replica initialized the session. vMCP reads session routing metadata from shared storage to determine which sub-sessions to use for each underlying `MCPServer`.
- Horizontal scale-out of `thv-proxyrunner`: A single `MCPServer` is backed by multiple proxyrunner replicas sharing one StatefulSet. Session metadata in shared storage allows any replica to look up which backend pod a given session belongs to.
- Transport coverage: `sse` and `streamable-http` transports at both layers.
- Manual scale-out without session disruption: Changing replica counts in either CRD must not disrupt existing sessions. New requests may be routed to new replicas; existing sessions continue to route via the pod that initialized them.
- Safe vMCP scale-in: When a vMCP replica is removed, sessions previously handled by it can be served by remaining replicas using session metadata from Redis. vMCP scale-in is safe because the session runtime can be reconstructed from persisted metadata. Scale-in safety also includes a grace period on finishing pods so inflight requests can complete.
- Proxyrunner scale-in (non-`stdio`): For `sse` and `streamable-http` transports, removing a proxyrunner replica is an in-scope operation. Sessions on the removed pod are re-hydrated on other pods when they receive a request. `stdio`-transport proxyrunner scale-in is out of scope (the stdin/stdout attachment cannot be transferred). As with vMCP, scale-in safety includes a grace period on finishing pods so inflight requests can complete.
- Enabling future auto-scaling: The session storage mechanism is the prerequisite for HPAs and KEDA-based auto-scaling. This RFC does not define auto-scaling policy, but the design must not preclude it.
- `stdio` transport scaling: The proxyrunner's attachment to the MCP container's stdin/stdout is inherently single-process. Horizontal scaling of stdio-backed servers requires re-initializing the container session and is out of scope for this RFC.
- Smart routing at vMCP: We've previously discussed having vMCP route requests directly to the proxyrunner replica that previously serviced the session. While this may be useful long term to reduce Redis load and latency, it's not strictly necessary in the short term. vMCP will continue to send requests through the k8s Service associated with the proxyrunner.
- One StatefulSet per MCPServer (shared): The design maintains exactly one StatefulSet per `MCPServer`. All proxyrunner replicas for a given `MCPServer` share that single StatefulSet, converging on the same spec via server-side apply. Each proxyrunner replica routes its sessions to specific pods within the shared StatefulSet. Note: a 1:1 ratio (one StatefulSet per proxyrunner replica) could be a future direction if enabling `stdio` horizontal scaling were desired, since each proxyrunner would then have its own dedicated backend.
- Moving MCP server deployment out of the proxyrunner: The proxyrunner remains responsible for creating, managing, and proxying to the MCP server StatefulSet. Changing this responsibility boundary (e.g., having vMCP manage backends directly) is desirable long-term but is more work and out of scope. Discussed in detail here.
- Auto-scaling policy: How to trigger scale-out (HPA metrics, KEDA event sources, custom metrics) is deferred to a follow-on RFC. This RFC makes auto-scaling possible; it does not specify when or how to do it.
- Backend StatefulSet scale-in: Removing pods from the `MCPServer` StatefulSet (reducing `spec.backendReplicas`) is always disruptive — the backend session state lives in the removed process and cannot be reconstructed. Graceful drain of backend pods is out of scope for this RFC.
- Session hijack prevention at MCPServer: This capability will continue to work within vMCP, but it will not also be added to MCPServer.
| Component | CRD Field | Scale Out | Scale In |
|---|---|---|---|
| vMCP (`VirtualMCPServer`) | `spec.replicas` | No disruption — new replicas serve new sessions; existing sessions rerouted via Redis | No disruption — remaining replicas pick up sessions from Redis metadata |
| proxyrunner (`MCPServer`, non-stdio) | `spec.replicas` | No disruption — new replicas share the existing StatefulSet and handle new sessions; existing sessions remain pinned to their originating replica | No disruption — sessions on the removed pods can be restored on other pods; `stdio` scale-in not supported |
| backend (`MCPServer` StatefulSet) | `spec.backendReplicas` | New pods available for new sessions; proxyrunner routes sessions to specific pods via Redis | Disruptive — backend session state on the removed pod is lost and cannot be reconstructed |
The solution externalizes session metadata to a shared Redis store at each layer, and introduces session-aware routing logic so that any replica can handle a request by locating the pod that owns the session.
graph TB
Client[MCP Client]
Redis[("Redis<br/>(shared session store)")]
subgraph vmcpDeployment ["vMCP Deployment (N replicas)"]
VMCP[vmcp]
end
subgraph mcpserverA ["MCPServer A"]
PRA["proxyrunner<br/>(M replicas, shared)"]
STSA["StatefulSet<br/>(K pods, one per MCPServer)"]
end
subgraph mcpserverB ["MCPServer B"]
PRB["proxyrunner<br/>(M replicas, shared)"]
STSB["StatefulSet<br/>(K pods, one per MCPServer)"]
end
Client --> VMCP
VMCP <-->|session metadata| Redis
VMCP -->|route by session| PRA
VMCP -->|route by session| PRB
PRA <-->|session metadata| Redis
PRB <-->|session metadata| Redis
PRA --> STSA
PRB --> STSB
Two categories of session metadata are externalized. Neither contains sensitive data; both contain only identifiers and pod addressing information needed for routing.
Written by the vMCP pod when a session is initialized (on initialize request). Used by any vMCP replica to enable session reuse across vMCPs.
A vMCP session spans multiple backends (one per MCPServer in the MCPGroup). The record therefore carries a backends array — one entry per backend connection — alongside top-level identity metadata.
```
Key:   vmcp:session:{mcp-session-id}
TTL:   configurable (default: matches vMCP session TTL)
Value: {
  "session_id": "...",   // Mcp-Session-Id assigned to client
  // additional fields for session hijack prevention are not pictured
  "created_at": "...",
  "updated_at": "...",
  "backends": [
    {
      "backend_id": "...",        // MCPServer workload ID
      "backend_session_id": "..." // Mcp-Session-Id assigned by the backend server
    }
  ]
}
```
Written by the proxyrunner pod when it initializes a backend session. Serves one purpose: it maps a session ID to the backend pod that owns it, enabling routing on a cache miss.
```
Key:   proxyrunner:{mcpserver-name}:session:{session-id}
TTL:   configurable (default: matches proxyrunner session TTL)
Value: {
  "session_id": "...",         // session ID (Mcp-Session-Id)
  "backend_pod": "...",        // which StatefulSet pod hosts this session
  "backend_url": "http://...", // pod-DNS URL of the backend pod
  "created_at": "...",
  "updated_at": "..."
}
```
A vMCP session spans multiple backends; each backend has its own underlying connection and backend_session_id. When vMCP receives a request, it first identifies the target backend from the routing table (which tool/resource/prompt → which MCPServer), then sends the request with Mcp-Session-Id=$backend_session_id to the k8s service for the MCPServer.
1. No `Mcp-Session-Id` (new session / `initialize`): Initialize connections to all backends in the `MCPGroup`. Record the session in Redis with a `backends` array containing each backend's proxyrunner pod URL and assigned `backend_session_id`. Proceed normally.
2. Known `Mcp-Session-Id` (session in local storage): Reuse the initialized clients with sub-session IDs already bound.
3. Unknown `Mcp-Session-Id` (not in local storage, found in Redis): Reconstruct the session runtime from the `backends` array in Redis, then route as in case 2. This is a one-time cost on first contact after a re-route.
4. Unknown `Mcp-Session-Id` (not in Redis): Return `400 Bad Request` — the client must re-initialize.
Case 3 handles the "request hits the wrong vMCP pod" scenario without re-initializing the session from the client's perspective.
The proxyrunner's sole routing concern is: given a session ID, which backend pod handles it? vMCP is responsible for ensuring each request has a valid session id. With shared storage:
- Known session (exists in local storage): Route to the backend pod (existing behavior).
- Unknown session (not in local storage): Look up in Redis.
  - Found: Route to the `backend_url` from the Redis record.
  - Not found: Return `400 Bad Request`.
Redis is an opt-in external dependency. When not configured, both vMCP and proxyrunner fall back to local in-memory storage (current behavior), with single-replica semantics.
Configuration is consistent with the existing pattern from THV-0035 (auth server Redis storage):
For vMCP (VirtualMCPServer CRD):
```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
spec:
  replicas: 3              # new field: vMCP Deployment pod count
  sessionStorage:
    provider: redis
    redis:
      address: "redis:6379"
      db: 0
      keyPrefix: "vmcp:session:"
      passwordRef:
        name: redis-auth
        key: password
```

For proxyrunner (`MCPServer` CRD):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
spec:
  replicas: 3              # new field: proxyrunner Deployment pod count (capped at 1 for stdio)
  backendReplicas: 2       # new field: StatefulSet pod count (one StatefulSet per MCPServer)
  sessionStorage:
    provider: redis
    redis:
      address: "redis:6379"
      db: 0
      keyPrefix: "proxyrunner:"
      passwordRef:
        name: redis-auth
        key: password
```

When `provider` is omitted or set to `memory`, existing local-storage behavior is preserved.
The following requirements define the success criteria for this RFC.
- R-VMCP-1: Any incoming MCP request can be handled by any vMCP pod. vMCP reads session metadata from the shared session store to locate the correct sub-session ID and uses that in the request to the downstream proxyrunner.
- R-VMCP-2: vMCP writes session metadata to the shared session store when a new session is initialized. The record includes the `backends` array with each backend's workload ID and backend-assigned session ID.
- R-VMCP-3: vMCP session metadata TTL in Redis must match or exceed the vMCP session TTL. Redis entries are refreshed on session activity.
- R-VMCP-4: When Redis is not configured, vMCP operates with local in-memory storage and single-replica semantics (no behavioral regression).
- R-VMCP-5: Adding vMCP replicas must not disrupt existing sessions.
- R-VMCP-6: vMCP sends requests to the k8s Service for the MCPServer. Assuming SSE or Streamable HTTP, the session should remain usable regardless of which proxyrunner replica receives the request.
- R-VMCP-7: vMCP has a limit on the number of in-memory sessions. Once the limit is hit, LRU eviction removes the least recently active session from in-memory state.
- R-PR-1: The proxyrunner routes all requests within a session to the backend pod that initialized the session.
- R-PR-2: The proxyrunner writes session metadata (session ID, backend pod name, backend pod URL) to the shared session store when a new backend session is initialized.
- R-PR-3: The proxyrunner reads session metadata from the shared session store on a cache miss (session not in local memory) and routes accordingly.
- R-PR-4: The number of backend StatefulSet replicas (MCP server pods) per proxyrunner is configurable in the `MCPServer` CRD spec. The proxyrunner uses session-aware routing to distribute sessions across its backends.
- R-PR-5: Multiple proxyrunner replicas serve a single `MCPServer`. vMCP is responsible for routing each session's requests to the correct proxyrunner replica.
- R-PR-6: When Redis is not configured, the proxyrunner operates with local in-memory storage and single-replica semantics (no behavioral regression).
- R-PR-7: The proxyrunner has a limit on the number of in-memory sessions. Once the limit is hit, LRU eviction removes the least recently active session from in-memory state.
- R-OP-1: When session storage is configured on a `VirtualMCPServer` or `MCPServer`, the operator injects the Redis connection configuration into the vMCP or proxyrunner pods (credentials from Secrets, address/db from the CRD spec).
- R-OP-2: Scaling out a `VirtualMCPServer` or proxyrunner `Deployment` replica count must not require changes to other resources (no cascading operator reconciliation for scale events).
- R-OP-3: The operator adds explicit replica fields to both CRDs: `VirtualMCPServer.spec.replicas` (vMCP pod count), `MCPServer.spec.replicas` (proxyrunner pod count, capped at 1 for `stdio` transport), and `MCPServer.spec.backendReplicas` (StatefulSet pod count). The reconciler must respect these fields and must not overwrite them. There is exactly one StatefulSet per `MCPServer`; `spec.backendReplicas` controls the pod count within that StatefulSet.
- R-OP-4: If a user does not specify replica fields in the CRDs, the operator must not overwrite a replica count set externally (e.g., via `kubectl scale --replicas=3 deployment/foo` or an HPA), since the operator has not been explicitly told to reconcile that field.
- R-DEP-1: A single Redis instance (or Redis Sentinel / Cluster configuration) can be shared between vMCP session storage and proxyrunner session storage, as long as key prefixes are distinct.
- R-DEP-2: Redis is an optional dependency. ToolHive must remain deployable without Redis, with single-replica semantics.
- R-DEP-3: Manual scale-out of vMCP or proxyrunner (e.g., `kubectl scale deployment`) must not cause session disruption for active sessions.
- R-DEP-4: Scale-in of backend StatefulSet pods (a `spec.backendReplicas` decrease) is inherently disruptive — the backend session state lives in the removed process and cannot be reconstructed on another pod. Documentation must clearly state this. Graceful drain of backend pods is out of scope for this RFC. Proxyrunner scale-in (non-`stdio`) is an accepted, in-scope operation where no session loss is the expected outcome.
- R-DEP-5: Scaling in either vMCP or the proxyrunner should not cause requests to fail. Finishing pods must shut down gracefully so inflight requests can complete.
- R-DEP-6: Redis must eventually garbage collect unused sessions. A long TTL that is refreshed when sessions are used is a viable solution.
- R-SEC-1: vMCP's session hijack prevention mechanism continues to function. A session initialized on one vMCP pod cannot be hijacked by a request hitting another vMCP pod.
This section catalogs the concrete code changes needed to implement the design described above, organized by component and mapped back to the requirements in §5.
File: cmd/thv-operator/api/v1alpha1/mcpserver_types.go
Add three new fields to `MCPServerSpec`:

- `Replicas *int32` — proxyrunner Deployment pod count. When nil, the operator omits `spec.replicas` from the Deployment, allowing external scaling (`kubectl scale`, HPA) to set the value freely. The reconciler caps this at 1 for `stdio` transport.
- `BackendReplicas *int32` — StatefulSet pod count for the MCP server backend. When nil, omitted from the StatefulSet spec (same defer-to-external behavior). Passed to the proxyrunner via RunConfig so it can apply the value during server-side apply.
- `SessionStorage *SessionStorageConfig` — optional Redis connection configuration for externalizing session state. Uses a new shared type (`SessionStorageConfig`) with `provider` (`memory`|`redis`) and a `redis` sub-struct containing `address`, `db`, `keyPrefix`, and `passwordRef` (SecretKeyRef).
Current state: No replica or session storage fields exist. Replicas are hardcoded to 1 in deploymentForMCPServer(). The ReadyReplicas status field already exists.
Satisfies: R-OP-3, R-PR-6, R-DEP-2
File: cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
Add two new fields to `VirtualMCPServerSpec`:

- `Replicas *int32` — vMCP Deployment pod count. When nil, the operator omits `spec.replicas` from the Deployment, allowing external scaling to set the value freely.
- `SessionStorage *SessionStorageConfig` — same shared type as in RC-1. Optional Redis connection configuration for externalizing vMCP session state.
Current state: No replica or session storage fields exist. Replicas are hardcoded to 1 in deploymentForVirtualMCPServer().
Satisfies: R-OP-3, R-VMCP-4, R-DEP-2
File: cmd/thv-operator/controllers/mcpserver_controller.go
Changes in `deploymentForMCPServer()` and the reconcile loop:

- Deployment creation: When `spec.replicas` is non-nil, use that value for `Spec.Replicas` instead of the hardcoded `int32(1)`. When nil, omit `Spec.Replicas` from the Deployment spec entirely (Kubernetes defaults to 1, but external tooling like HPA can set it freely without being overwritten).
- Stdio cap enforcement: The existing logic (~line 392) that caps `stdio` Deployments to 1 replica remains. If `spec.replicas > 1` and `spec.transport == "stdio"`, the reconciler should reject or cap it with a status condition explaining why.
- BackendReplicas passthrough: Add `spec.backendReplicas` to the proxyrunner's RunConfig (serialized into the ConfigMap). The proxyrunner reads this value and uses it in its server-side apply of the StatefulSet instead of the hardcoded 1.
- Session storage validation: If `spec.replicas > 1` and `spec.sessionStorage` is not configured (or set to `memory`), the reconciler should reject the configuration with a status condition warning that multi-replica deployments require external session storage to avoid silent session failures.
Current state: deploymentForMCPServer() hardcodes replicas := int32(1). The update path already preserves Spec.Replicas (does not overwrite on reconcile). Stdio cap enforcement already exists. RunConfig has no backendReplicas field.
Satisfies: R-OP-3, R-OP-4, R-DEP-3
File: cmd/thv-operator/controllers/virtualmcpserver_deployment.go
- Deployment creation: When `spec.replicas` is non-nil, use that value for `Spec.Replicas` instead of the hardcoded `int32(1)`. When nil, omit `Spec.Replicas` entirely.
- The update path already preserves `Spec.Replicas` (does not overwrite on reconcile) — no change needed there.
- Session storage validation: If `spec.replicas > 1` and `spec.sessionStorage` is not configured (or set to `memory`), the reconciler should reject the configuration with a status condition warning that multi-replica deployments require external session storage to avoid silent session failures.
Current state: deploymentForVirtualMCPServer() hardcodes replicas := int32(1).
Satisfies: R-OP-3, R-OP-4, R-VMCP-5
Files: cmd/thv-operator/controllers/mcpserver_controller.go, cmd/thv-operator/controllers/virtualmcpserver_deployment.go
When spec.sessionStorage is configured with provider: redis, the operator must inject the Redis connection details into the pod:
- Address, DB, keyPrefix: Added to the RunConfig (proxyrunner) or vmcp config ConfigMap (vMCP) so they are available at startup.
- Password: Injected as an environment variable sourced from the `passwordRef` SecretKeyRef (e.g., `THV_SESSION_REDIS_PASSWORD`). This follows the established pattern used for OIDC client secrets and token exchange secrets.
When sessionStorage is not set or provider is memory, no Redis-related config is injected.
Current state: No session storage config injection exists. The env-var-from-SecretKeyRef pattern is already used for OIDC and auth server credentials. A comment in buildEnvVarsForVmcp explicitly notes this extension point: "Other secrets (Redis passwords, service account credentials) may be added here in the future."
Satisfies: R-OP-1, R-DEP-1, R-DEP-2
File: New file pkg/transport/session/storage_redis.go
Implement the `Storage` interface backed by Redis:

- `Store`: Serialize the session to JSON (using the existing `serializeSession` function already prepared in `serialization.go`) and write to Redis with the configured key prefix and TTL. Each `Store` call sets (or resets) the key's TTL, so actively-used sessions are kept alive.
- `Load`: Read from Redis, deserialize via `deserializeSession`, and return the `Session`. Returns `ErrSessionNotFound` when the key does not exist.
- `Delete`: Remove the key from Redis.
- `DeleteExpired`: No-op — Redis TTLs handle expiry natively.
- `Close`: Close the Redis client connection.
The constructor accepts the Redis connection config (address, password, DB, key prefix) and returns a Storage interface. The serialization layer persists only the metadata fields (ID, type, timestamps, metadata map) — not runtime state like live HTTP connections or message channels. This is consistent with the two-layer model described in the RFC: the metadata layer is serializable; the runtime layer is node-local.
TTL refresh for garbage collection: Today, session.Manager.Get() calls Touch() on the session, which updates UpdatedAt. After touching, the manager must call Storage.Store() to persist the updated timestamp back to Redis — this resets the key's TTL. Sessions that are never accessed will naturally expire via Redis TTL, providing garbage collection without an explicit cleanup loop. The TTL value should match the session TTL configured on the manager.
Current state: Only LocalStorage (in-process sync.Map) exists. serialization.go contains serializeSession/deserializeSession functions marked as unused, explicitly tagged for Phase 4 Redis/Valkey support.
Satisfies: R-PR-2, R-PR-3, R-PR-6, R-VMCP-2, R-VMCP-4, R-DEP-6
File: pkg/transport/session/manager.go
Add a constructor or factory function that selects the Storage backend based on configuration:
- When Redis config is provided: create `RedisStorage` (from RC-6) and pass it to `NewManagerWithStorage`.
- When no Redis config is provided: use `NewLocalStorage()` (existing default behavior via `NewManager`).
NewManagerWithStorage(ttl, factory, storage) already exists as the extension point — this change wires it into the startup path for both vMCP and proxyrunner based on the injected session storage configuration.
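The selection logic is small; the sketch below uses trimmed stand-in types (`Storage`, `RedisConfig`, `newStorage`) rather than the real `pkg/transport/session` API, to show the config-driven backend choice.

```go
package main

import "fmt"

// Storage is a trimmed stand-in for the real session Storage interface.
type Storage interface {
	Name() string
}

type localStorage struct{}

func (localStorage) Name() string { return "local" }

type redisStorage struct{ addr string }

func (r redisStorage) Name() string { return "redis:" + r.addr }

// RedisConfig is an assumed shape for the injected session storage config.
type RedisConfig struct {
	Address string
}

// newStorage mirrors the proposed factory: Redis config present selects the
// Redis backend; otherwise the existing in-process default is used.
func newStorage(cfg *RedisConfig) Storage {
	if cfg != nil && cfg.Address != "" {
		return redisStorage{addr: cfg.Address}
	}
	return localStorage{}
}

func main() {
	fmt.Println(newStorage(nil).Name())
	fmt.Println(newStorage(&RedisConfig{Address: "redis:6379"}).Name())
}
```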
Current state: NewManager(ttl, factory) always creates NewLocalStorage(). NewManagerWithStorage exists but is never called in production code.
Satisfies: R-PR-6, R-VMCP-4, R-DEP-2
File: pkg/vmcp/session/factory.go
In defaultMultiSessionFactory.makeSession(), after backend initialization and routing table construction, the factory already writes metadata keys (MetadataKeyBackendIDs, MetadataKeyTokenHash, MetadataKeyTokenSalt) to the embedded transportsession.Session. This metadata is what gets persisted when the session.Manager calls Storage.Store().
The change: ensure the backends array described in §4.2 is fully represented in the session metadata. Specifically, each backend's workload_id and backend_session_id must be written as serializable metadata so that Redis contains enough information for another vMCP pod to reconstruct the session routing (RC-9).
Today, MetadataKeyBackendIDs stores connected backend IDs, and backendSessions (workload ID → backend session ID) is held in the defaultMultiSession struct but is not written to the metadata map. The backend_session_id values must be added to the metadata layer.
Current state: backendSessions map[string]string is a field on defaultMultiSession but is not persisted to the metadata map. Only MetadataKeyBackendIDs (a comma-separated list of connected workload IDs) is written.
Satisfies: R-VMCP-2, R-VMCP-3
Files: pkg/vmcp/server/sessionmanager/session_manager.go, pkg/vmcp/session/factory.go
When a vMCP pod receives a request with an Mcp-Session-Id that is not in its local session.Manager, it must attempt to load the session from Redis and reconstruct the runtime layer. This is case 3 from §4.3 (vMCP routing).
Add a RestoreSession method to the MultiSessionFactory interface (implemented on defaultMultiSessionFactory). Unlike MakeSession which performs full initialization (sends initialize to all backends), RestoreSession accepts the persisted session metadata and reconstructs the runtime layer:
1. `session.Manager.Get(id)` returns not-found from local storage.
2. Load the session record from Redis via the `Storage` interface. The record contains the `backends` array (workload IDs and backend session IDs) plus hijack-prevention metadata (token hash, token salt).
3. Call `factory.RestoreSession(ctx, sessionID, metadata)`, which:
   - Re-creates backend HTTP connections by calling the `backendConnector` for each backend in the array, passing the existing `backend_session_id` so the proxyrunner knows this is a resumed session (not a new `initialize`).
   - Rebuilds the routing table from the newly-fetched backend capabilities.
   - Re-applies the `HijackPreventionDecorator` using the persisted token hash/salt.
4. Store the reconstructed `MultiSession` in the local `session.Manager` for subsequent requests.
This is a one-time cost per session per pod — after reconstruction, subsequent requests use the local in-memory session.
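The lookup-with-fallback flow can be sketched with stand-in types: plain maps in place of the local `session.Manager` and the Redis `Storage`, and a callback in place of `factory.RestoreSession`.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("session not found")

// record and manager are illustrative stand-ins, not the real types.
type record struct{ meta map[string]string }

type manager struct {
	local   map[string]*record // node-local runtime layer
	redis   map[string]record  // stand-in for the shared Redis Storage
	restore func(id string, meta map[string]string) *record
}

// get implements the flow above: local hit, else Redis load plus
// reconstruction, else not-found.
func (m *manager) get(id string) (*record, error) {
	if s, ok := m.local[id]; ok {
		return s, nil // fast path: already reconstructed on this pod
	}
	rec, ok := m.redis[id]
	if !ok {
		return nil, errNotFound
	}
	s := m.restore(id, rec.meta) // one-time reconstruction cost per pod
	m.local[id] = s              // subsequent requests hit the fast path
	return s, nil
}

func main() {
	m := &manager{
		local: map[string]*record{},
		redis: map[string]record{"s1": {meta: map[string]string{"backend_ids": "github"}}},
		restore: func(id string, meta map[string]string) *record {
			fmt.Println("restoring", id)
			return &record{meta: meta}
		},
	}
	s, _ := m.get("s1")
	fmt.Println(s.meta["backend_ids"])
	_, err := m.get("unknown")
	fmt.Println(err)
}
```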
Current state: session.Manager.Get() only checks local storage. There is no fallback to Redis or any reconstruction logic. The factory's MakeSession always performs full initialization (sends initialize to all backends). No RestoreSession method exists.
Satisfies: R-VMCP-1, R-VMCP-5, R-VMCP-6
File: pkg/vmcp/server/sessionmanager/session_manager.go
Add a configurable upper bound on the number of MultiSession objects held in local memory. When the limit is reached, the least-recently-used session is evicted from the local session.Manager (its backend connections are closed). The session metadata remains in Redis, so a future request for the evicted session triggers reconstruction via factory.RestoreSession (RC-9).
This prevents unbounded memory growth when a vMCP pod handles many sessions over time — especially important when sessions are reconstructed from Redis after scale-in events route traffic from removed pods.
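A minimal LRU sketch using the standard library's `container/list`; the eviction callback is where backend connections would be closed. The type and method names are illustrative.

```go
package main

import (
	"container/list"
	"fmt"
)

// sessionLRU is a minimal sketch of the proposed bounded in-memory session
// cache; evicted sessions remain recoverable from Redis.
type sessionLRU struct {
	cap     int
	order   *list.List // front = most recently used
	entries map[string]*list.Element
	onEvict func(id string)
}

func newSessionLRU(capacity int, onEvict func(string)) *sessionLRU {
	return &sessionLRU{
		cap:     capacity,
		order:   list.New(),
		entries: map[string]*list.Element{},
		onEvict: onEvict,
	}
}

// Touch marks a session as recently used, evicting the least-recently-used
// session when the capacity is exceeded.
func (l *sessionLRU) Touch(id string) {
	if el, ok := l.entries[id]; ok {
		l.order.MoveToFront(el)
		return
	}
	l.entries[id] = l.order.PushFront(id)
	if l.order.Len() > l.cap {
		evicted := l.order.Remove(l.order.Back()).(string)
		delete(l.entries, evicted)
		if l.onEvict != nil {
			l.onEvict(evicted) // close backend connections; Redis copy survives
		}
	}
}

func main() {
	l := newSessionLRU(2, func(id string) { fmt.Println("evicted", id) })
	for _, id := range []string{"a", "b", "c"} {
		l.Touch(id)
	}
}
```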
Current state: No eviction policy exists. All sessions remain in memory until TTL expiry.
Satisfies: R-VMCP-7
Files: pkg/vmcp/session/default_session.go, pkg/vmcp/server/sessionmanager/session_manager.go
When an individual backend session within a MultiSession expires or is lost (e.g., the proxyrunner returns an error indicating the backend session is no longer valid, or the backend connection is closed), the vMCP must update the session's backends array in Redis to reflect the change. Without this, another vMCP pod reconstructing the session via factory.RestoreSession (RC-9) would attempt to reconnect to a backend session that no longer exists, causing unnecessary errors and latency.
This means backend removal/expiry must trigger a write-back of the updated metadata to the session.Manager (which persists to Redis via Storage.Store()). The backendSessions map and MetadataKeyBackendIDs must both be updated atomically with the local state change.
Current state: Backend session failures are handled locally (the connection is removed or the call returns an error), but no metadata is updated. Since sessions are only held in-memory today, there is no external state to synchronize.
Satisfies: R-VMCP-1, R-VMCP-2
File: pkg/container/kubernetes/client.go
In DeployWorkload(), the StatefulSet is created via server-side apply with WithReplicas(1) hardcoded. Change this to read the backendReplicas value from the RunConfig (passed through from the CRD via RC-3). When the value is nil/absent, omit WithReplicas from the apply configuration so that the proxyrunner does not own the replicas field — allowing external tooling (HPA, kubectl scale) to manage it without being overwritten on the next server-side apply cycle.
All proxyrunner replicas for a given MCPServer share the same StatefulSet and use the same field manager (toolhive-container-manager), so they converge on the same desired replica count without conflict.
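The field-omission principle can be illustrated with plain maps standing in for the client-go apply configuration builder (the real change would conditionally skip the `WithReplicas` call in `DeployWorkload()`): when the value is nil, the field is absent from the applied object, so server-side apply never claims ownership of it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildStatefulSetApply is a sketch using plain maps in place of the
// client-go apply configuration: when replicas is nil, the field is omitted
// entirely, so the shared field manager never owns it and an HPA or
// kubectl scale can manage it freely.
func buildStatefulSetApply(name string, replicas *int32) map[string]any {
	spec := map[string]any{
		"serviceName": name + "-headless",
	}
	if replicas != nil {
		spec["replicas"] = *replicas // only set when the CRD requested a count
	}
	return map[string]any{
		"apiVersion": "apps/v1",
		"kind":       "StatefulSet",
		"metadata":   map[string]any{"name": name},
		"spec":       spec,
	}
}

func main() {
	n := int32(3)
	withReplicas, _ := json.Marshal(buildStatefulSetApply("mcp", &n))
	withoutReplicas, _ := json.Marshal(buildStatefulSetApply("mcp", nil))
	fmt.Println(string(withReplicas))
	fmt.Println(string(withoutReplicas))
}
```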
Current state: WithReplicas(1) is hardcoded. The proxyrunner has no mechanism to receive a desired backend replica count.
Satisfies: R-OP-3 (backend replicas), R-PR-4
Files: pkg/transport/proxy/streamable/streamable_proxy.go, pkg/transport/proxy/httpsse/http_proxy.go, pkg/transport/proxy/transparent/transparent_proxy.go
Today, each proxy type forwards all requests to a single targetURI (the headless service DNS name). With multiple StatefulSet pods, the proxy must route to the specific pod that owns the session.
Changes:
- On session initialization (`initialize` request or first SSE connection): the proxyrunner selects a backend pod (e.g., round-robin or least-connections across StatefulSet pods) and records the mapping `session_id → backend_pod + backend_url` in Redis (per the record format in §4.2). The `backend_url` uses the StatefulSet pod's stable DNS name (e.g., `http://<name>-<ordinal>.<headless-svc>.<ns>.svc.cluster.local:<port>`).
- On subsequent requests: look up the session in local memory first. On cache miss, query Redis for the `backend_url`. Route the request to that specific pod URL instead of the headless service.
- On session not found (neither local nor Redis): return `400 Bad Request`.
This requires the proxy layer to support per-request target URL override rather than always using the fixed targetURI. The transparent proxy (httputil.ReverseProxy) can achieve this by setting the Director function to rewrite the target per-request based on session lookup. The SSE and streamable proxies need analogous routing logic.
Current state: All three proxy types route to a single targetURI set at startup. No per-session routing exists. Session IDs are tracked only for client correlation (SSE) or request/response matching (streamable), not for backend pod selection.
Satisfies: R-PR-1, R-PR-2, R-PR-3, R-PR-4
Files: pkg/transport/proxy/streamable/streamable_proxy.go, pkg/transport/proxy/httpsse/http_proxy.go
Same pattern as RC-10 but for the proxyrunner layer. Add a configurable upper bound on in-memory sessions. When the limit is reached, the least-recently-used session is evicted from local memory. The session metadata remains in Redis, so a subsequent request for the evicted session triggers a Redis lookup (RC-12) and routes to the correct backend pod.
Current state: No eviction policy exists. All sessions remain in memory until TTL expiry.
Satisfies: R-PR-7
Files: pkg/vmcp/server/server.go, cmd/thv-proxyrunner/main.go (or equivalent entrypoint), operator Deployment specs
When a pod is terminated (scale-in, rolling update), in-flight requests must be allowed to complete rather than being dropped. Two changes:
- SIGTERM handling: both vMCP and the proxyrunner must handle SIGTERM by stopping acceptance of new requests and draining in-flight requests before exiting. For vMCP, the `AdmissionQueue` already provides `CloseAndDrain()`, which blocks until in-flight operations complete — this mechanism needs to be wired into the shutdown signal handler for each active session. For the proxyrunner, a similar drain of active proxy connections is needed.
- `terminationGracePeriodSeconds`: the operator should set an appropriate value on the pod spec for both vMCP and proxyrunner Deployments (the default Kubernetes value is 30s, which may need to be increased for long-running MCP tool calls).
Current state: No explicit graceful shutdown handling for scale-in scenarios. The AdmissionQueue.CloseAndDrain() exists on defaultMultiSession but is only called during explicit session Close(), not during pod-level shutdown.
Satisfies: R-DEP-5
File: pkg/vmcp/session/factory.go, pkg/vmcp/session/hijack_prevention_decorator.go
The HijackPreventionDecorator binds a session to a token via HMAC-SHA256(secret, salt || token). The boundTokenHash and tokenSalt are already written to the session metadata map (MetadataKeyTokenHash, MetadataKeyTokenSalt), so they are persisted to Redis via RC-6/RC-8.
When factory.RestoreSession (RC-9) reconstructs a session on a different vMCP pod, it must:
- Read `tokenHash` and `tokenSalt` from the persisted metadata.
- Re-apply the `HijackPreventionDecorator` with those values so that the same token-binding validation runs on the new pod.
- The HMAC secret must be consistent across all vMCP replicas. Today, the secret is passed to the factory via `WithHMACSecret`. All replicas of the same `VirtualMCPServer` Deployment share the same pod spec and therefore the same secret — no additional change is needed for secret distribution.
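The binding scheme itself is a few lines of standard-library crypto; the helper names below are illustrative, not the decorator's actual API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bindToken computes HMAC-SHA256(secret, salt || token), the binding
// described above. The secret is shared across replicas; the salt is
// per-session and persisted as metadata.
func bindToken(secret, salt []byte, token string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(salt)
	mac.Write([]byte(token))
	return hex.EncodeToString(mac.Sum(nil))
}

// validateToken is what a pod that restored the session from Redis would
// run: recompute the HMAC with the persisted salt and compare against the
// persisted hash in constant time.
func validateToken(secret, salt []byte, token, storedHash string) bool {
	return hmac.Equal([]byte(bindToken(secret, salt, token)), []byte(storedHash))
}

func main() {
	secret := []byte("shared-hmac-secret") // identical on every vMCP replica
	salt := []byte("per-session-salt")     // persisted token salt metadata
	hash := bindToken(secret, salt, "bearer-token")
	fmt.Println(validateToken(secret, salt, "bearer-token", hash))
	fmt.Println(validateToken(secret, salt, "other-token", hash))
}
```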
Current state: Token hash and salt are already persisted in session metadata. The decorator is applied at session creation but there is no path to re-apply it during session reconstruction. The HMAC secret is already consistent across replicas (same Deployment spec).
Satisfies: R-SEC-1