
RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55

Open
JAORMX wants to merge 15 commits into main from jaosorior/mcpserverentry-direct-remote-backends

Conversation

@JAORMX

@JAORMX JAORMX commented Mar 12, 2026

Summary

Introduces MCPRemoteEndpoint, a new CRD that unifies remote MCP server connectivity under a single resource with two explicit modes:

  • type: proxy — Deploys a proxy pod with full auth middleware, authz policy, and audit logging. Functionally replaces MCPRemoteProxy.
  • type: direct — No pod deployed; VirtualMCPServer connects directly to the remote URL. Resolves forced-auth on public remotes (#3104) and eliminates unnecessary infrastructure for simple remote backends.
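
As a sketch of the two modes (the field names `type`, `remoteURL`, `groupRef`, and `proxyConfig` are from this RFC; the `apiVersion` and the concrete values below are illustrative placeholders, and the authoritative schema lives in the RFC itself):

```yaml
# direct mode: no pod or Service; vMCP connects straight to the remote URL
apiVersion: toolhive.stacklok.dev/v1alpha1   # placeholder version
kind: MCPRemoteEndpoint
metadata:
  name: public-remote
spec:
  type: direct
  remoteURL: https://mcp.example.com/mcp
  transport: streamable-http
  groupRef: engineering-team
---
# proxy mode: deploys a proxy pod with auth middleware, authz policy, and audit logging
apiVersion: toolhive.stacklok.dev/v1alpha1   # placeholder version
kind: MCPRemoteEndpoint
metadata:
  name: internal-remote
spec:
  type: proxy
  remoteURL: https://mcp.internal.example.com/mcp
  proxyConfig:
    oidcConfig:          # required for type: proxy per the review discussion
      issuer: https://auth.example.com
```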

This supersedes THV-0055 (MCPServerEntry CRD).

Motivation

vMCP currently requires MCPRemoteProxy (which spawns thv-proxyrunner pods) to reach remote MCP servers. This creates three problems:

  1. Forced auth on public remotes: MCPRemoteProxy requires OIDC even when vMCP already handles client auth
  2. Dual boundary confusion: externalAuthConfigRef serves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote)
  3. Resource waste: Every remote backend needs a Deployment + Service + Pod just to make an HTTP call

Design Highlights

  • Two explicit modes: type: proxy (full auth middleware, replaces MCPRemoteProxy) and type: direct (zero pods, zero services)
  • CEL validation: Mode-specific field constraints enforced at admission time (e.g., OIDC/telemetry config rejected for type: direct)
  • Validation-only controller for direct mode: No infrastructure created, no remote URL probing
  • Both discovery modes: Works with static (operator-generated ConfigMap) and dynamic (vMCP runtime discovery) from THV-0014
  • MCPRemoteProxy deprecated: Migrated to MCPRemoteEndpoint with type: proxy; MCPRemoteProxy receives no new features
  • Token exchange support: type: direct supports tokenExchange via externalAuthConfigRef for backends requiring OAuth credentials
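
The mode-specific constraints could be enforced with struct-level CEL rules along these lines (illustrative expressions in the generated CRD schema, not the RFC's exact markers):

```yaml
x-kubernetes-validations:
  # proxy-only configuration is rejected for direct endpoints
  - rule: "self.type != 'direct' || !has(self.proxyConfig)"
    message: "spec.proxyConfig is not allowed when spec.type is 'direct'"
  # proxy endpoints must carry their proxy configuration
  - rule: "self.type != 'proxy' || has(self.proxyConfig)"
    message: "spec.proxyConfig is required when spec.type is 'proxy'"
```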

Related RFCs

  • THV-0008 (Virtual MCP Server)
  • THV-0009 (Remote MCP Proxy)
  • THV-0010 (MCPGroup CRD)
  • THV-0014 (K8s-Aware vMCP)
  • THV-0023 (CRD v1alpha2 Optimization — config refs target MCPRemoteEndpoint)
  • THV-0026 (Header Passthrough)

Test plan

  • Review RFC structure and completeness
  • Validate CRD design against existing patterns
  • Verify mode-specific CEL validation rules
  • Verify security considerations (auth boundaries, token exchange restrictions)
  • Confirm MCPRemoteProxy migration path is documented

JAORMX and others added 4 commits March 12, 2026 09:03
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect
directly to remote MCP servers without MCPRemoteProxy infrastructure,
resolving the forced-auth (#3104) and dual-boundary confusion (#4109)
issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code.
Keep YAML/Mermaid examples, replace Go blocks with prose
describing controller behavior, discovery logic, and TLS
handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in
the implementation, not the RFC design document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yrobla yrobla left a comment

This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:

  • groupRef type inconsistency (string vs object pattern) needs resolution
  • caBundleRef resource type (Secret vs ConfigMap) is ambiguous across sections
  • SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
  • Auth resolution timing in ListWorkloadsInGroup() description is unclear
  • Static mode CA bundle propagation is unspecified
  • toolConfigRef deferral creates an unacknowledged migration regression for MCPRemoteProxy users

- Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy
- Fix Alt 1 YAML example to use string form for groupRef
- Change caBundleRef to reference ConfigMap (CA certs are public data)
- Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate
- Clarify auth resolution loads config only, token exchange deferred to request time
- Specify CA bundle volume mount for static mode (PEM files, not env vars)
- Document toolConfigRef migration path via aggregation.tools[].workload

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
yrobla previously approved these changes Mar 12, 2026
@yrobla yrobla left a comment

Looks like a very good improvement to me.

amirejaz
amirejaz previously approved these changes Mar 12, 2026
@amirejaz amirejaz left a comment

This looks good to me. Added one comment.

metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp

Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.

@amirejaz

Question: Would it make sense to rename MCPServerEntry to MCPRemoteServerEntry to improve clarity?

jerm-dro
jerm-dro previously approved these changes Mar 13, 2026
@jerm-dro jerm-dro left a comment

Looks great! I appreciate the alternatives section.

@lorr1

lorr1 commented Mar 17, 2026

We need this!! I love it.

# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends


(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? From the name "MCPServerEntry" alone it wasn't clear to me that it was remote-only.

@ChrisJBurns ChrisJBurns left a comment

Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.

The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?

After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:

  • Problem #1 (forced OIDC on public remotes) can be fixed by making oidcConfig optional on MCPRemoteProxy — a one-field change (see comment 2).
  • Problem #2 (dual auth boundary confusion) turns out not to exist in the code — externalAuthConfigRef and oidcConfig already serve separate purposes in pkg/vmcp/workloads/k8s.go (see comment 1).
  • Problem #3 (pod waste) can be solved by adding a direct: true flag to MCPRemoteProxy that skips ensureDeployment() and ensureService() and uses remoteURL as the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).

The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).

Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).

Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.
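
For concreteness, that lighter-weight alternative could look something like this (the `direct` field is hypothetical and does not exist on MCPRemoteProxy today; `apiVersion` is a placeholder):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1   # placeholder version
kind: MCPRemoteProxy
metadata:
  name: public-remote
spec:
  remoteURL: https://mcp.example.com/mcp
  direct: true    # hypothetical flag: skip ensureDeployment()/ensureService()
                  # and use remoteURL as the backend URL directly
  # oidcConfig omitted: optional under this proposal
```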

If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.

🤖 Review assisted by Claude Code

@jhrozek

jhrozek commented Mar 17, 2026

+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc..

@JAORMX

JAORMX commented Mar 18, 2026

> +1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc..

I actually disagree with this. I think re-using existing CRDs would do us a disservice and further increase technical debt by using APIs that are not fit for purpose, because they had very different intentions.

Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@ChrisJBurns ChrisJBurns dismissed stale reviews from jerm-dro, amirejaz, and yrobla via c2965b7 March 18, 2026 17:09
ChrisJBurns and others added 4 commits March 18, 2026 18:03
Address all review feedback from specialized agent review:

Critical fixes:
- Add 3 CEL validation rules (proxy requires proxyConfig, proxyConfig
  requires oidcConfig, direct rejects proxyConfig)
- Add spec.type immutability guard to prevent orphaned resources
- Add Phase 0.5 for MCPRemoteProxy controller refactoring prerequisite
- Document StaticBackendConfig Type field gap and KnownFields(true) risk
- Document full RFC 8693 token exchange flow with subject token provenance
- Document vMCP-to-proxy pod token forwarding with MCP auth spec rationale
- Add session constraints for direct mode (single-replica requirement)

Significant fixes:
- Fix Phase 0 deprecation mechanism (Warning events, not a deprecated API version marker)
- Make MCPGroup status field changes additive (not breaking rename)
- Document name collision handling and WorkloadType enum extension
- Document field indexer registration requirement
- Strengthen SSRF mitigation (NetworkPolicy REQUIRED for IMDS/cluster API)
- Add credential blast radius to threat model
- Mark SSE as deprecated per MCP spec 2025-11-25
- Add MCP-Protocol-Version header injection requirement
- Add reconnection handling section for direct mode
- Expand MCPGroup controller changes to 9 explicit code changes
- Fix Open Question 4 contradiction with groupRef: Required

Documentation and polish:
- Add mode selection guide, CRD short names, printer columns
- Add allow-insecure annotation specification
- Fix THV-0055 broken link, add spec.port to migration table
- Add actionable deprecation timeline, standalone use explanation
- Add RFC naming convention note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix CEL validation rule placement (struct-level, not field-level)
- Add oldSelf null guard to type immutability rule
- Correct auth flow: accurately describe dual-consumer externalAuthConfigRef
  in proxy mode and single-boundary direct mode
- Remove incorrect Redis session store claim; single-replica is the only
  supported constraint for type:direct
- Fix reconnection to require full MCP initialization handshake on HTTP 404,
  with Last-Event-ID resumption attempt before re-init
- Add server-initiated notifications section (persistent GET stream)
- Restrict embeddedAuthServer and awsSts for type:direct
- Enumerate all MCPGroup controller changes including field indexer and RBAC markers
- Remove false audience validation claim; add as Phase 2 requirement
- Fix broken anchor cross-references
- Document HTTPS enforcement as controller-side only
- Specify allow-insecure annotation value
- Add inline warning to addPlaintextHeaders YAML example
- Add CA bundle MITM trust store warning
- Add emergency credential rotation guidance
- Remove remoteendpoint short name; keep mcpre only

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Remove the HTTPS enforcement requirement and the
toolhive.stacklok.dev/allow-insecure annotation section.
remoteURL is now simply "URL of the remote MCP server".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@ChrisJBurns ChrisJBurns changed the title RFC: MCPServerEntry CRD for direct remote MCP server backends RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity Mar 19, 2026
ChrisJBurns and others added 2 commits March 19, 2026 16:06
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- **`type: proxy`** — deploys a proxy pod with full auth middleware, authz
policy, and audit logging. Functionally equivalent to `MCPRemoteProxy` and
replaces it.
- **`type: direct`** — no pod deployed; VirtualMCPServer connects directly to

this is not just a vMCP concern right? Any client would be able to use these?


Yeah, technically anything inside the cluster will be able to access that remote endpoint, provided it doesn't have any auth on it. Although that would be true regardless of whether we had the direct MCPRemoteEndpoint or not.

| Scenario | Recommended Mode | Why |
|---|---|---|
| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required |
| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange; one fewer hop |

I think this is misleading. We shouldn't single out vMCP auth using token exchange for direct, but rather cover any vMCP deployment. It would be the same if vMCP provides a static token or drives an OAuth consent.


Good call - generalized this row to "Remote with outgoing auth handled by vMCP (token exchange, header injection, etc.)" so it covers all supported auth types, not just token exchange.


**Rule of thumb:** Use `direct` for simple, public, or token-exchange-only
remotes. Use `proxy` when you need an independent auth/authz/audit boundary
per remote, or when the backend needs to be accessible standalone.

Should we rather say: use direct for simple, public remotes or remotes fronted by vMCP; use proxy when you need authentication or authorization in standalone mode?


Agreed, reworded to: "Use direct for simple, public remotes or any remote fronted by vMCP where vMCP handles outgoing auth. Use proxy when you need an independent auth/authz/audit boundary per remote, or when the backend needs to be accessible standalone."

| Standalone use (without vMCP) | Yes | No |
| Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) |
| Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) |
| Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) |

Curious - what do we need the custom bundle for in this mode?


Ah, private remote servers.

--[externalAuthConfigRef credential]--> Proxy Pod
[proxy pod oidcConfig validates the incoming request]
[proxy pod applies externalAuthConfigRef as outgoing middleware]
--> Remote Server

This renders oddly - mind asking Claude to make it into a mermaid diagram? (not urgent)


Replaced both ASCII auth flow diagrams with mermaid sequence diagrams.

server's expected audience claim.
- Client token lifetime should exceed the expected duration of the exchange
request. Exchanged tokens are managed by the `golang.org/x/oauth2` token
source and refreshed automatically on expiry per connection.

this is irrelevant to the RFC isn't it?


Agreed, removed the implementation detail about golang.org/x/oauth2 token source and client token lifetime.

transport: streamable-http

# REQUIRED: Group membership.
groupRef: engineering-team

required in vmcp only?

@ChrisJBurns ChrisJBurns Mar 20, 2026

Make groupRef required only for type: direct, optional for type: proxy - matches current MCPRemoteProxy behavior and the RFC's own Mode Selection Guide which lists "Standalone use without VirtualMCPServer → proxy"
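
A conditional requirement like that could be expressed with a single struct-level CEL rule (illustrative syntax):

```yaml
x-kubernetes-validations:
  - rule: "self.type != 'direct' || has(self.groupRef)"
    message: "spec.groupRef is required when spec.type is 'direct'"
```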

Comment on lines +573 to +574
ConfigMap generation time (static mode has no K8s API access at runtime)
- `vmcp.Backend` needs a `Headers` field for resolved header values

I'm not sure I understand this and want to make sure we are not leaking key material into ConfigMaps.

Right now we have headerForward.addHeadersFromSecret in the remote proxy, which always has a pod; the secrets are mounted as secretKeyRefs and runConfig only has env variable names.

If we had headerForward on the remote endpoint and a Headers field, wouldn't we be inlining the secrets?


Valid concern - we need to think through static-mode secret handling more carefully to make sure we're not inlining key material into ConfigMaps.


Alright - the original wording implied resolving secrets at ConfigMap generation time, which would inline key material into the ConfigMap. Updated the RFC to use the same SecretKeyRef pattern MCPRemoteProxy already uses: the operator adds SecretKeyRef env vars to the vMCP Deployment (e.g. TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_<ENDPOINT_NAME>), and the backend ConfigMap stores only the env var names, never the values. vMCP resolves them at runtime via the existing secrets.EnvironmentProvider. No secret material touches a ConfigMap.
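
Sketching that pattern (names are illustrative; the `fromEnv` key in particular is a placeholder for whatever field the RFC settles on): the operator injects a SecretKeyRef env var into the vMCP Deployment, and the backend ConfigMap records only the env var name:

```yaml
# On the vMCP Deployment: value resolved from the Secret at runtime
env:
  - name: TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_MYENDPOINT
    valueFrom:
      secretKeyRef:
        name: myendpoint-api-key
        key: token
---
# In the backend ConfigMap: only the env var name, never the value
headerForward:
  addHeaders:
    - name: X-API-Key
      fromEnv: TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_MYENDPOINT
```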

`pkg/vmcp/k8s/backend_reconciler.go` tries resources in order: MCPServer →
MCPRemoteProxy → MCPRemoteEndpoint. Same-name resources across types in the
same namespace always resolve to the first match. Log a warning when a
collision is detected.

Does this fallback mean that if I have an MCPServer named "blah" and I delete it, the next reconcile might accidentally resolve against MCPRemoteProxy "blah"?

The reason will be displayed to describe this comment to others. Learn more.

Good catch - the fallback behaviour is surprising. Changed the design to reject collisions at creation time: the MCPRemoteEndpoint controller will reject creation if an MCPServer or MCPRemoteProxy with the same name exists in the namespace, setting ConfigurationValid=False with reason NameCollision. The existing controllers will also be updated to reject collisions with MCPRemoteEndpoint.
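
On rejection, the endpoint's status would carry a condition along these lines (a sketch; the condition type and reason match the comment above, the message is illustrative):

```yaml
status:
  conditions:
    - type: ConfigurationValid
      status: "False"
      reason: NameCollision
      message: "an MCPServer named 'public-remote' already exists in this namespace"
```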


// MCPRemoteEndpointProxyConfig struct-level rule:
//
// +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="spec.proxyConfig.oidcConfig is required"

Why not just a standard Required validation marker?


Yep, good catch - no reason to use CEL here. Replaced with a standard +kubebuilder:validation:Required marker on the field.

ChrisJBurns and others added 2 commits March 20, 2026 15:31
- Generalize mode selection guide to cover all auth types, not just token exchange
- Reword rule of thumb to use "fronted by vMCP" vs "standalone" framing
- Replace ASCII auth flow diagrams with mermaid sequence diagrams
- Clarify dual-consumer auth behavior per mode
- Remove implementation-level token lifetime detail
- Add groupRef rationale inline comment
- Replace name collision fallback with admission-time rejection
- Replace CEL oidcConfig rule with standard kubebuilder Required marker

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use SecretKeyRef env vars on the vMCP Deployment instead of inlining
secret values into the backend ConfigMap. Mirrors the existing pattern
from MCPRemoteProxy. ConfigMap stores only env var names, vMCP resolves
values at runtime via secrets.EnvironmentProvider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>