Wire SessionStorage into MCPRemoteProxy for HA support#5237
Open
aron-muon wants to merge 2 commits into stacklok:main
When MCPRemoteProxy runs with multiple replicas behind a load balancer
that doesn't preserve client-IP affinity (e.g. AWS ALB across multiple
AZs), every non-initialize request fails with `Session not found` because
the transparent proxy validates `Mcp-Session-Id` against pod-local
in-memory state on every hop. From transparent_proxy.go:
// Guard: reject non-initialize requests with unknown session IDs.
// When multiple proxyrunner replicas share a Redis session store,
// a valid session will always be found.
The transport layer already supports a Redis-backed session store via
runner.ScalingConfig.SessionRedis — MCPServer and VirtualMCPServer wire
it through. MCPRemoteProxy simply never populated it.
This change ports the symmetric work from MCPServer (PR stacklok#4368) and
VirtualMCPServer (PR stacklok#4367) to MCPRemoteProxy:
- Add MCPRemoteProxySpec.SessionStorage field (same SessionStorageConfig
shape used by MCPServer / VirtualMCPServer)
- populateScalingConfigForRemoteProxy: write the non-sensitive Redis
parameters (address/db/keyPrefix) into runner.ScalingConfig.SessionRedis
- buildRedisPasswordEnvVarForRemoteProxy: inject THV_SESSION_REDIS_PASSWORD
on the proxy Deployment via SecretKeyRef when sessionStorage.passwordRef
is set, so the password never lands in the RunConfig ConfigMap
Tests:
- TestPopulateScalingConfigForRemoteProxy mirrors TestPopulateScalingConfig
from mcpserver_runconfig_test.go (4 cases including a check that the
password never leaks into the serialized SessionRedis)
- TestBuildRedisPasswordEnvVarForRemoteProxy mirrors TestBuildRedisPasswordEnvVar
from virtualmcpserver_deployment_test.go (4 cases covering the matrix
of nil/memory/redis-no-pwd/redis-with-pwd)
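The four-case matrix can be illustrated with a table-driven check. This is a sketch under stated assumptions: the `EnvVar`, `EnvVarSource`, `SecretKeySelector`, and `LocalObjectReference` types are local stand-ins that mirror the shape of the Kubernetes core/v1 types so the example compiles without dependencies, and `SessionStorageConfig`, `PasswordSecretRef`, and `buildRedisPasswordEnvVar` are hypothetical simplifications rather than the operator's real code.

```go
package main

import "fmt"

// Local stand-ins mirroring the shape of the Kubernetes core/v1 env var types.
type LocalObjectReference struct{ Name string }

type SecretKeySelector struct {
	LocalObjectReference
	Key string
}

type EnvVarSource struct{ SecretKeyRef *SecretKeySelector }

type EnvVar struct {
	Name      string
	Value     string
	ValueFrom *EnvVarSource
}

// Hypothetical, simplified CRD types.
type PasswordSecretRef struct{ Name, Key string }

type SessionStorageConfig struct {
	Provider    string // assumed values: "memory" or "redis"
	PasswordRef *PasswordSecretRef
}

const redisPasswordEnv = "THV_SESSION_REDIS_PASSWORD"

// buildRedisPasswordEnvVar returns an env var that points at the Secret,
// or nil when no Redis password is configured. The literal Value stays empty.
func buildRedisPasswordEnvVar(s *SessionStorageConfig) *EnvVar {
	if s == nil || s.Provider != "redis" || s.PasswordRef == nil {
		return nil
	}
	return &EnvVar{
		Name: redisPasswordEnv,
		ValueFrom: &EnvVarSource{SecretKeyRef: &SecretKeySelector{
			LocalObjectReference: LocalObjectReference{Name: s.PasswordRef.Name},
			Key:                  s.PasswordRef.Key,
		}},
	}
}

func main() {
	cases := []struct {
		name    string
		storage *SessionStorageConfig
		wantEnv bool
	}{
		{"nil", nil, false},
		{"memory", &SessionStorageConfig{Provider: "memory"}, false},
		{"redis-no-pwd", &SessionStorageConfig{Provider: "redis"}, false},
		{"redis-with-pwd", &SessionStorageConfig{
			Provider:    "redis",
			PasswordRef: &PasswordSecretRef{Name: "redis-auth", Key: "password"},
		}, true},
	}
	for _, tc := range cases {
		got := buildRedisPasswordEnvVar(tc.storage) != nil
		fmt.Printf("%-15s env var set: %v (want %v)\n", tc.name, got, tc.wantEnv)
	}
}
```

Only the redis-with-pwd case produces an env var, and it carries a Secret reference rather than the secret value itself.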
Generated:
- zz_generated.deepcopy.go (controller-gen v0.17.3)
- toolhive.stacklok.dev_mcpremoteproxies.yaml CRD schema (controller-gen v0.17.3)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #5237 | +/- |
|---|---|---|---|
| Coverage | 67.86% | 67.88% | +0.02% |
| Files | 610 | 610 | |
| Lines | 62522 | 62550 | +28 |
| Hits | 42431 | 42465 | +34 |
| Misses | 16910 | 16902 | -8 |
| Partials | 3181 | 3183 | +2 |
Summary
When MCPRemoteProxy runs with multiple replicas behind a load balancer that doesn't preserve client-IP affinity (e.g. AWS ALB across multiple AZs), non-initialize requests fail with `Session not found` whenever they land on a replica other than the one that handled `initialize`. The transparent proxy validates `Mcp-Session-Id` against pod-local in-memory state on every hop; `transparent_proxy.go` even calls this out:

> // Guard: reject non-initialize requests with unknown session IDs.
> // When multiple proxyrunner replicas share a Redis session store,
> // a valid session will always be found.

The transport layer already supports a Redis-backed session store via `runner.ScalingConfig.SessionRedis`. MCPServer (#4368) and VirtualMCPServer (#4367) both wire it through. MCPRemoteProxy was the only one missing the symmetric work, leaving HA deployments architecturally broken.

What changed:

- `MCPRemoteProxySpec.SessionStorage` field: same `SessionStorageConfig` shape used by `MCPServer` and `VirtualMCPServer`.
- `populateScalingConfigForRemoteProxy`: writes the non-sensitive Redis parameters (address/db/keyPrefix) into `runner.ScalingConfig.SessionRedis`.
- `buildRedisPasswordEnvVarForRemoteProxy`: injects `THV_SESSION_REDIS_PASSWORD` on the proxy Deployment via `SecretKeyRef` when `sessionStorage.passwordRef` is set, so the password never lands in the RunConfig ConfigMap. Mirrors VirtualMCPServer's `buildRedisPasswordEnvVar` exactly.

The change is intentionally a near-verbatim mirror of the MCPServer / VirtualMCPServer implementations to keep review easy: it's the same pattern reviewers have already accepted twice.
Type of change
Test plan
- `go test ./cmd/thv-operator/controllers/...` (4 + 4 = 8 new test cases pass; no regressions in existing MCPRemoteProxy suite)
- `go build ./...`

New tests:

- `TestPopulateScalingConfigForRemoteProxy` mirrors `TestPopulateScalingConfig` from `mcpserver_runconfig_test.go`. 4 cases, including a serialization check that verifies the password never leaks into the RunConfig.
- `TestBuildRedisPasswordEnvVarForRemoteProxy` mirrors `TestBuildRedisPasswordEnvVar` from `virtualmcpserver_deployment_test.go`. 4 cases covering nil / memory / redis-no-pwd / redis-with-pwd.

API Compatibility
`v1beta1` API: the added field (`spec.sessionStorage`) is optional and behaves identically to the existing nil case when omitted.

Changes

| File | Change |
|---|---|
| `cmd/thv-operator/api/v1beta1/mcpremoteproxy_types.go` | `SessionStorage *SessionStorageConfig` with kubebuilder docs explaining the transparent_proxy session-validation requirement |
| `cmd/thv-operator/api/v1beta1/zz_generated.deepcopy.go` | Generated code (controller-gen v0.17.3) |
| `cmd/thv-operator/controllers/mcpremoteproxy_runconfig.go` | `populateScalingConfigForRemoteProxy` helper, called before runConfig is returned |
| `cmd/thv-operator/controllers/mcpremoteproxy_deployment.go` | `buildRedisPasswordEnvVarForRemoteProxy` + call site appending it to the proxy env |
| `cmd/thv-operator/controllers/mcpremoteproxy_runconfig_test.go` | `TestPopulateScalingConfigForRemoteProxy` (4 cases) |
| `cmd/thv-operator/controllers/mcpremoteproxy_deployment_test.go` | `TestBuildRedisPasswordEnvVarForRemoteProxy` (4 cases) |
| `deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpremoteproxies.yaml` | `sessionStorage` subschema in CRD |

Does this introduce a user-facing change?
Yes. New optional field `MCPRemoteProxy.spec.sessionStorage` enabling Redis-backed shared session state across replicas, required for HA when the upstream load balancer doesn't preserve client-IP affinity. Same shape as `MCPServer.spec.sessionStorage`, so existing operators already know the pattern.

🤖 Generated with Claude Code