RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect directly to remote MCP servers without MCPRemoteProxy infrastructure, resolving the forced-auth (#3104) and dual-boundary confusion (#4109) issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code. Keep YAML/Mermaid examples, replace Go blocks with prose describing controller behavior, discovery logic, and TLS handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in the implementation, not the RFC design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:
- `groupRef` type inconsistency (string vs object pattern) needs resolution
- `caBundleRef` resource type (Secret vs ConfigMap) is ambiguous across sections
- SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
- Auth resolution timing in `ListWorkloadsInGroup()` description is unclear
- Static mode CA bundle propagation is unspecified
- `toolConfigRef` deferral creates an unacknowledged migration regression for MCPRemoteProxy users
- Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy
- Fix Alt 1 YAML example to use string form for groupRef
- Change caBundleRef to reference ConfigMap (CA certs are public data)
- Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate
- Clarify auth resolution loads config only, token exchange deferred to request time
- Specify CA bundle volume mount for static mode (PEM files, not env vars)
- Document toolConfigRef migration path via aggregation.tools[].workload

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
looks like a very good improvement to me
amirejaz
left a comment
This looks good to me. Added one comment
```yaml
metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp
```
Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.
Question: Would it make sense to rename
jerm-dro
left a comment
Looks great! I appreciate the alternatives section.
We need this!! I love it.
```
@@ -0,0 +1,904 @@
# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends
```
(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? just by the "MCPServerEntry" it wasn't clear to me it was for remote only
Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.
The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?
After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:
- Problem #1 (forced OIDC on public remotes) can be fixed by making `oidcConfig` optional on `MCPRemoteProxy` — a one-field change (see comment 2).
- Problem #2 (dual auth boundary confusion) turns out not to exist in the code — `externalAuthConfigRef` and `oidcConfig` already serve separate purposes in `pkg/vmcp/workloads/k8s.go` (see comment 1).
- Problem #3 (pod waste) can be solved by adding a `direct: true` flag to `MCPRemoteProxy` that skips `ensureDeployment()` and `ensureService()` and uses `remoteURL` as the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).
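The lighter-weight alternative sketched in these comments might look like the following manifest. This is hypothetical: neither a `direct` field nor an optional `oidcConfig` exists on MCPRemoteProxy today, and the apiVersion is assumed.

```yaml
# Hypothetical sketch of the reviewer's suggestion, not the current API:
# a lightweight mode on the existing MCPRemoteProxy CRD.
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed group/version
kind: MCPRemoteProxy
metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp
  direct: true   # proposed flag: skip ensureDeployment()/ensureService()
                 # and use remoteURL as the backend URL directly
  # oidcConfig omitted entirely (made optional per Problem #1)
```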
The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).
Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).
Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.
If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.
🤖 Review assisted by Claude Code
+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit about why it's not an option. There's also the cognitive load on the users, the documentation load, etc.

I actually disagree with this. I think re-using existing CRDs would do us a disservice and further increase technical debt by using APIs that are not fit for purpose because they had very different intentions.
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
c2965b7
Address all review feedback from specialized agent review:

Critical fixes:
- Add 3 CEL validation rules (proxy requires proxyConfig, proxyConfig requires oidcConfig, direct rejects proxyConfig)
- Add spec.type immutability guard to prevent orphaned resources
- Add Phase 0.5 for MCPRemoteProxy controller refactoring prerequisite
- Document StaticBackendConfig Type field gap and KnownFields(true) risk
- Document full RFC 8693 token exchange flow with subject token provenance
- Document vMCP-to-proxy pod token forwarding with MCP auth spec rationale
- Add session constraints for direct mode (single-replica requirement)

Significant fixes:
- Fix Phase 0 deprecation mechanism (Warning events, not deprecated version)
- Make MCPGroup status field changes additive (not breaking rename)
- Document name collision handling and WorkloadType enum extension
- Document field indexer registration requirement
- Strengthen SSRF mitigation (NetworkPolicy REQUIRED for IMDS/cluster API)
- Add credential blast radius to threat model
- Mark SSE as deprecated per MCP spec 2025-11-25
- Add MCP-Protocol-Version header injection requirement
- Add reconnection handling section for direct mode
- Expand MCPGroup controller changes to 9 explicit code changes
- Fix Open Question 4 contradiction with groupRef: Required

Documentation and polish:
- Add mode selection guide, CRD short names, printer columns
- Add allow-insecure annotation specification
- Fix THV-0055 broken link, add spec.port to migration table
- Add actionable deprecation timeline, standalone use explanation
- Add RFC naming convention note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix CEL validation rule placement (struct-level, not field-level)
- Add oldSelf null guard to type immutability rule
- Correct auth flow: accurately describe dual-consumer externalAuthConfigRef in proxy mode and single-boundary direct mode
- Remove incorrect Redis session store claim; single-replica is the only supported constraint for type:direct
- Fix reconnection to require full MCP initialization handshake on HTTP 404, with Last-Event-ID resumption attempt before re-init
- Add server-initiated notifications section (persistent GET stream)
- Restrict embeddedAuthServer and awsSts for type:direct
- Enumerate all MCPGroup controller changes including field indexer and RBAC markers
- Remove false audience validation claim; add as Phase 2 requirement
- Fix broken anchor cross-references
- Document HTTPS enforcement as controller-side only
- Specify allow-insecure annotation value
- Add inline warning to addPlaintextHeaders YAML example
- Add CA bundle MITM trust store warning
- Add emergency credential rotation guidance
- Remove remoteendpoint short name; keep mcpre only

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Remove the HTTPS enforcement requirement and the toolhive.stacklok.dev/allow-insecure annotation section. remoteURL is now simply "URL of the remote MCP server". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- **`type: proxy`** — deploys a proxy pod with full auth middleware, authz policy, and audit logging. Functionally equivalent to `MCPRemoteProxy` and replaces it.
- **`type: direct`** — no pod deployed; VirtualMCPServer connects directly to
this is not just a vMCP concern right? Any client would be able to use these?
Yeah technically anything inside of the cluster will be able to access that remote endpoint providing it doesn't have any auth on it. Although that would be true anyways regardless if we had the direct MCPRemoteEndpoint or not.
| Scenario | Recommended Mode | Why |
|---|---|---|
| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required |
| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange; one fewer hop |
I think this is misleading. I think we shouldn't single out vmcp auth using token exchange for direct, but rather any vmcp deployments. It would be the same if vmcp provides a static token or drives an oauth consent.
Good call - generalized this row to "Remote with outgoing auth handled by vMCP (token exchange, header injection, etc.)" so it covers all supported auth types, not just token exchange.
**Rule of thumb:** Use `direct` for simple, public, or token-exchange-only remotes. Use `proxy` when you need an independent auth/authz/audit boundary per remote, or when the backend needs to be accessible standalone.
should we rather say, use direct for simple, public remotes or remotes fronted by vmcp, use proxy when you need authentication or authorization in standalone mode?
Agreed, reworded to: "Use direct for simple, public remotes or any remote fronted by vMCP where vMCP handles outgoing auth. Use proxy when you need an independent auth/authz/audit boundary per remote, or when the backend needs to be accessible standalone."
| Standalone use (without vMCP) | Yes | No |
| Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) |
| Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) |
| Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) |
curious what do we need the custom bundle for in this mode?
```
--[externalAuthConfigRef credential]--> Proxy Pod
  [proxy pod oidcConfig validates the incoming request]
  [proxy pod applies externalAuthConfigRef as outgoing middleware]
--> Remote Server
```
this renders odd, mind asking claude to make it into a mermaid? (not urgent)
Replaced both ASCII auth flow diagrams with mermaid sequence diagrams.
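For readers of this thread, the proxy-mode flow quoted above translates into a mermaid sequence diagram along these lines (a sketch; participant labels are illustrative, not taken from the RFC):

```mermaid
sequenceDiagram
    participant vMCP as VirtualMCPServer
    participant Proxy as Proxy Pod
    participant Remote as Remote MCP Server
    vMCP->>Proxy: request with externalAuthConfigRef credential
    Note over Proxy: oidcConfig validates the incoming request
    Note over Proxy: externalAuthConfigRef applied as outgoing middleware
    Proxy->>Remote: authenticated request
```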
server's expected audience claim.
- Client token lifetime should exceed the expected duration of the exchange request. Exchanged tokens are managed by the `golang.org/x/oauth2` token source and refreshed automatically on expiry per connection.
this is irrelevant to the RFC isn't it?
Agreed, removed the implementation detail about golang.org/x/oauth2 token source and client token lifetime.
```yaml
transport: streamable-http

# REQUIRED: Group membership.
groupRef: engineering-team
```
Make groupRef required only for type: direct, optional for type: proxy - matches current MCPRemoteProxy behavior and the RFC's own Mode Selection Guide which lists "Standalone use without VirtualMCPServer → proxy"
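One way to express that conditional requirement is a CEL validation rule on the spec schema, sketched below. Field names follow the examples quoted in this thread; the exact rule text is an assumption, not taken from the RFC.

```yaml
# Hypothetical x-kubernetes-validations entry on the MCPRemoteEndpoint spec:
# groupRef is mandatory only when type is "direct".
x-kubernetes-validations:
  - rule: "self.type != 'direct' || has(self.groupRef)"
    message: "spec.groupRef is required when spec.type is 'direct'"
```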
ConfigMap generation time (static mode has no K8s API access at runtime)
- `vmcp.Backend` needs a `Headers` field for resolved header values
I'm not sure I understand this and want to make sure we are not leaking key material into ConfigMaps.
Right now we have `headerForward.addHeadersFromSecret` in the remote proxy, which always has a pod; the secrets are mounted as `secretKeyRef`s and runConfig only has env variable names.
If we had `headerForward` on the remote endpoint and a `Headers` field, wouldn't we be inlining the secrets?
Valid concern - need to think through the static mode secret handling more carefully to make sure we're not inlining key material into ConfigMaps
Alright, the original wording implied resolving secrets at ConfigMap generation time which would inline key material into the ConfigMap. Updated the RFC to use the same SecretKeyRef pattern MCPRemoteProxy already uses: the operator adds SecretKeyRef env vars to the vMCP Deployment (e.g.
TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_<ENDPOINT_NAME>), and the backend ConfigMap stores only the env var names - never the values. vMCP resolves them at runtime via the existing secrets.EnvironmentProvider. No secret material touches a ConfigMap.
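Under the pattern described in this reply, the two halves might look like the following fragments. Resource names and the ConfigMap layout are illustrative assumptions; only the env var naming scheme and the `secrets.EnvironmentProvider` resolution step come from the thread.

```yaml
# Fragment of the vMCP Deployment pod spec: the operator injects the secret
# as an env var via secretKeyRef, so the value never leaves the Secret.
env:
  - name: TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_CONTEXT7
    valueFrom:
      secretKeyRef:
        name: context7-credentials   # hypothetical Secret
        key: api-key
---
# Fragment of the generated backend ConfigMap: stores only the env var NAME.
# vMCP resolves the value at runtime via secrets.EnvironmentProvider.
data:
  backends.yaml: |
    headerForward:
      addHeadersFromSecret:
        X-API-Key:
          envVar: TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_CONTEXT7
```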
`pkg/vmcp/k8s/backend_reconciler.go` tries resources in order: MCPServer → MCPRemoteProxy → MCPRemoteEndpoint. Same-name resources across types in the same namespace always resolve to the first match. Log a warning when a collision is detected.
does this fallback mean that if I have a mcpserver named "blah" and I delete it then the next reconcile might happen by accident against mcpremoteproxy "blah" ?
Good catch - the fallback behaviour is surprising. Changed the design to reject collisions at creation time: the MCPRemoteEndpoint controller will reject creation if an MCPServer or MCPRemoteProxy with the same name exists in the namespace, setting ConfigurationValid=False with reason NameCollision. The existing controllers will also be updated to reject collisions with MCPRemoteEndpoint.
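With admission-time rejection, a colliding MCPRemoteEndpoint surfaces the failure on its status rather than silently reconciling against the wrong resource. A sketch of the resulting condition, using the name from the question above (the message text is illustrative):

```yaml
status:
  conditions:
    - type: ConfigurationValid
      status: "False"
      reason: NameCollision
      message: 'an MCPServer named "blah" already exists in this namespace'
```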
```go
// MCPRemoteEndpointProxyConfig struct-level rule:
//
// +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="spec.proxyConfig.oidcConfig is required"
```
why not just validation required?
Yep, good catch - no reason to use CEL here. Replaced with a standard +kubebuilder:validation:Required marker on the field.
- Generalize mode selection guide to cover all auth types, not just token exchange
- Reword rule of thumb to use "fronted by vMCP" vs "standalone" framing
- Replace ASCII auth flow diagrams with mermaid sequence diagrams
- Clarify dual-consumer auth behavior per mode
- Remove implementation-level token lifetime detail
- Add groupRef rationale inline comment
- Replace name collision fallback with admission-time rejection
- Replace CEL oidcConfig rule with standard kubebuilder Required marker

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use SecretKeyRef env vars on the vMCP Deployment instead of inlining secret values into the backend ConfigMap. Mirrors the existing pattern from MCPRemoteProxy. ConfigMap stores only env var names, vMCP resolves values at runtime via secrets.EnvironmentProvider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Introduces `MCPRemoteEndpoint`, a new CRD that unifies remote MCP server connectivity under a single resource with two explicit modes:

- `type: proxy` — Deploys a proxy pod with full auth middleware, authz policy, and audit logging. Functionally replaces `MCPRemoteProxy`.
- `type: direct` — No pod deployed; VirtualMCPServer connects directly to the remote URL. Resolves forced-auth on public remotes (#3104) and eliminates unnecessary infrastructure for simple remote backends.

This supersedes THV-0055 (MCPServerEntry CRD).
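Pulling together the fields quoted across this thread, a direct-mode resource might look like this. The apiVersion, the shape of `externalAuthConfigRef`, and the exact field layout are assumptions, not taken from the RFC text:

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed group/version
kind: MCPRemoteEndpoint
metadata:
  name: context7
spec:
  type: direct                     # no pod, no service; vMCP connects directly
  remoteURL: https://mcp.context7.com/mcp
  transport: streamable-http
  groupRef: engineering-team       # group membership, required for type: direct
  externalAuthConfigRef:           # optional outgoing auth (e.g. token exchange)
    name: context7-auth            # hypothetical auth config resource name
```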
Motivation
vMCP currently requires MCPRemoteProxy (which spawns `thv-proxyrunner` pods) to reach remote MCP servers. This creates three problems:

- Forced auth on public remotes that require none (#3104)
- `externalAuthConfigRef` serves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote) (#4109)
- Unnecessary proxy pod and service infrastructure for simple remote backends

Design Highlights
- Two explicit modes: `type: proxy` (full auth middleware, replaces MCPRemoteProxy) and `type: direct` (zero pods, zero services)
- Single-replica session constraint for `type: direct`
- Migration: MCPRemoteProxy users move to `MCPRemoteEndpoint` with `type: proxy`; MCPRemoteProxy receives no new features
- `type: direct` supports `tokenExchange` via `externalAuthConfigRef` for backends requiring OAuth credentials

Related RFCs
Test plan