From 5e55c0990397ef3433ac812572b7544e8a413ad1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:03:11 +0200 Subject: [PATCH 01/15] RFC: MCPServerEntry CRD for direct remote MCP server backends Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect directly to remote MCP servers without MCPRemoteProxy infrastructure, resolving the forced-auth (#3104) and dual-boundary confusion (#4109) issues. Co-Authored-By: Claude Opus 4.6 --- ...X-mcpserverentry-direct-remote-backends.md | 1173 +++++++++++++++++ 1 file changed, 1173 insertions(+) create mode 100644 rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md diff --git a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md b/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md new file mode 100644 index 0000000..7b061f1 --- /dev/null +++ b/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md @@ -0,0 +1,1173 @@ +# RFC-XXXX: MCPServerEntry CRD for Direct Remote MCP Server Backends + +- **Status**: Draft +- **Author(s)**: Juan Antonio Osorio (@jaosorior) +- **Created**: 2026-03-12 +- **Last Updated**: 2026-03-12 +- **Target Repository**: toolhive +- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) + +## Summary + +Introduce a new `MCPServerEntry` CRD (short name: `mcpentry`) that allows +VirtualMCPServer to connect directly to remote MCP servers without deploying +MCPRemoteProxy infrastructure. MCPServerEntry is a lightweight, pod-less +configuration resource that declares a remote MCP endpoint and belongs to an +MCPGroup, enabling vMCP to reach remote servers with a single auth boundary +and zero additional pods. + +## Problem Statement + +vMCP currently relies on MCPRemoteProxy (which spawns `thv-proxyrunner` pods) +to reach remote MCP servers. This architecture creates three concrete problems: + +### 1. 
Forced Authentication on Public Remotes (Issue #3104) + +MCPRemoteProxy requires OIDC authentication configuration even when vMCP +already handles client authentication at its own boundary. This blocks +unauthenticated public remote MCP servers (e.g., context7, public API +gateways) from being placed behind vMCP without configuring unnecessary +auth on the proxy layer. + +### 2. Dual Auth Boundary Confusion (Issue #4109) + +MCPRemoteProxy's single `externalAuthConfigRef` field is used for both the +vMCP-to-proxy boundary AND the proxy-to-remote boundary. When vMCP needs +to authenticate to the remote server through the proxy, token exchange +becomes circular or broken because the same auth config serves two +conflicting purposes: + +``` +Client -> vMCP [boundary 1: client auth] + -> MCPRemoteProxy [boundary 2: vMCP auth + remote auth on SAME config] + -> Remote Server +``` + +The operator cannot express "use auth X for the proxy and auth Y for the +remote" because there is only one `externalAuthConfigRef`. + +### 3. Resource Waste + +Every remote MCP server behind vMCP requires a full Deployment + Service + +Pod just to make an HTTP call that vMCP could make directly. For +organizations with many remote MCP backends, this creates unnecessary +infrastructure cost and operational overhead. 
+ +### Who Is Affected + +- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes +- **Product teams** wanting to register external MCP services behind vMCP +- **Organizations** running public or unauthenticated remote MCP servers + behind vMCP for aggregation + +## Goals + +- Enable vMCP to connect directly to remote MCP servers without + MCPRemoteProxy in the path +- Eliminate the dual auth boundary confusion by providing a single, + unambiguous auth config for the vMCP-to-remote boundary +- Allow unauthenticated remote MCP servers behind vMCP without workarounds +- Deploy zero additional infrastructure (no pods, services, or deployments) + for remote backend declarations +- Follow existing Kubernetes patterns (groupRef, externalAuthConfigRef) + consistent with MCPServer + +## Non-Goals + +- **Deprecating MCPRemoteProxy**: MCPRemoteProxy remains valuable for + standalone proxy use cases with its own auth middleware, audit logging, + and observability. MCPServerEntry is specifically for "behind vMCP" use + cases. +- **Adding health probing from the operator**: The operator controller + should NOT probe remote URLs. Reachability from the operator pod does not + imply reachability from the vMCP pod, and probing expands the operator's + attack surface. Health checking belongs in vMCP's existing runtime + infrastructure (`healthCheckInterval`, circuit breaker). +- **Cross-namespace references**: MCPServerEntry follows the same + namespace-scoped patterns as other ToolHive CRDs. +- **Supporting stdio or container-based transports**: MCPServerEntry is + exclusively for remote HTTP-based MCP servers. +- **CLI mode support**: MCPServerEntry is a Kubernetes-only CRD. CLI mode + already supports remote backends via direct configuration. + +## Proposed Solution + +### High-Level Design + +Introduce a new `MCPServerEntry` CRD that acts as a catalog entry for a +remote MCP endpoint. 
The naming follows the Istio `ServiceEntry` pattern, +communicating "this is a catalog entry, not an active workload." + +```mermaid +graph TB + subgraph "Client Layer" + Client[MCP Client] + end + + subgraph "Virtual MCP Server" + InAuth[Incoming Auth
Validates: aud=vmcp] + Router[Request Router] + AuthMgr[Backend Auth Manager] + end + + subgraph "Backend Layer (In-Cluster)" + MCPServer1[MCPServer: github-mcp
Pod + Service] + MCPServer2[MCPServer: jira-mcp
Pod + Service] + end + + subgraph "Backend Layer (Remote)" + Entry1[MCPServerEntry: context7
No pods - config only] + Entry2[MCPServerEntry: salesforce
No pods - config only] + end + + subgraph "External Services" + Remote1[context7.com/mcp] + Remote2[mcp.salesforce.com] + end + + Client -->|Token: aud=vmcp| InAuth + InAuth --> Router + Router --> AuthMgr + + AuthMgr -->|In-cluster call| MCPServer1 + AuthMgr -->|In-cluster call| MCPServer2 + AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote1 + AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote2 + + Entry1 -.->|Declares endpoint| Remote1 + Entry2 -.->|Declares endpoint| Remote2 + + style Entry1 fill:#fff3e0,stroke:#ff9800 + style Entry2 fill:#fff3e0,stroke:#ff9800 + style MCPServer1 fill:#e3f2fd,stroke:#2196f3 + style MCPServer2 fill:#e3f2fd,stroke:#2196f3 +``` + +The key insight is that MCPServerEntry deploys **no infrastructure**. It is +pure configuration that tells vMCP "there is a remote MCP server at this +URL, use this auth to reach it." VirtualMCPServer discovers MCPServerEntry +resources the same way it discovers MCPServer resources: via `groupRef`. + +### Auth Flow Comparison + +**Current (with MCPRemoteProxy) - Two boundaries, one config:** + +``` +Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] + -> MCPRemoteProxy [deploys pod] + externalAuthConfigRef used for BOTH: + - vMCP-to-proxy auth (boundary 2a) + - proxy-to-remote auth (boundary 2b) + -> Remote Server +``` + +**Proposed (with MCPServerEntry) - One clean boundary:** + +``` +Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] + -> MCPServerEntry: vMCP applies externalAuthConfigRef directly + -> Remote Server + (ONE boundary, ONE auth config, no confusion) +``` + +```mermaid +sequenceDiagram + participant Client + participant vMCP as Virtual MCP Server + participant IDP as Identity Provider + participant Remote as Remote MCP Server + + Client->>vMCP: MCP Request
Authorization: Bearer token (aud=vmcp) + + Note over vMCP: Validate incoming token
(existing auth middleware) + + Note over vMCP: Look up MCPServerEntry
for target backend + + alt externalAuthConfigRef is set + vMCP->>IDP: Token exchange request
(per MCPExternalAuthConfig) + IDP-->>vMCP: Exchanged token (aud=remote-api) + vMCP->>Remote: Forward request
Authorization: Bearer exchanged-token + else No auth configured (public remote) + vMCP->>Remote: Forward request
(no Authorization header) + end + + Remote-->>vMCP: MCP Response + vMCP-->>Client: Response +``` + +### Detailed Design + +#### MCPServerEntry CRD + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: context7 + namespace: default +spec: + # REQUIRED: URL of the remote MCP server + remoteURL: https://mcp.context7.com/mcp + + # REQUIRED: Transport protocol + # +kubebuilder:validation:Enum=streamable-http;sse + transport: streamable-http + + # REQUIRED: Group membership (unlike MCPServer where it's optional) + # An MCPServerEntry without a group is dead config - it cannot be + # discovered by any VirtualMCPServer. + groupRef: engineering-team + + # OPTIONAL: Auth configuration for reaching the remote server. + # Omit entirely for unauthenticated public remotes (resolves #3104). + # Single unambiguous purpose: auth to the remote (resolves #4109). + externalAuthConfigRef: + name: salesforce-auth + + # OPTIONAL: Header forwarding configuration. + # Reuses existing pattern from MCPRemoteProxy (THV-0026). + headerForward: + addPlaintextHeaders: + X-Tenant-ID: "tenant-123" + addHeadersFromSecrets: + - headerName: X-API-Key + valueSecretRef: + name: remote-api-credentials + key: api-key + + # OPTIONAL: Custom CA bundle for private remote servers using + # internal/self-signed certificates. 
+ caBundleRef: + name: internal-ca-bundle + key: ca.crt +``` + +**Example: Unauthenticated public remote (resolves #3104):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: context7 +spec: + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team + # No externalAuthConfigRef - public endpoint, no auth needed +``` + +**Example: Authenticated remote with token exchange:** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: salesforce-mcp +spec: + remoteURL: https://mcp.salesforce.com + transport: streamable-http + groupRef: engineering-team + externalAuthConfigRef: + name: salesforce-token-exchange +--- +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPExternalAuthConfig +metadata: + name: salesforce-token-exchange +spec: + type: tokenExchange + tokenExchange: + tokenUrl: https://keycloak.company.com/realms/myrealm/protocol/openid-connect/token + clientId: salesforce-exchange + clientSecretRef: + name: salesforce-oauth + key: client-secret + audience: mcp.salesforce.com + scopes: ["mcp:read", "mcp:write"] +``` + +**Example: Remote with static header auth:** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPServerEntry +metadata: + name: internal-api-mcp +spec: + remoteURL: https://internal-mcp.corp.example.com/mcp + transport: sse + groupRef: engineering-team + headerForward: + addHeadersFromSecrets: + - headerName: Authorization + valueSecretRef: + name: internal-api-token + key: bearer-token + caBundleRef: + name: corp-ca-bundle + key: ca.crt +``` + +#### CRD Type Definitions + +```go +// MCPServerEntry declares a remote MCP server endpoint as a backend for +// VirtualMCPServer. Unlike MCPServer (which deploys container workloads) +// or MCPRemoteProxy (which deploys proxy pods), MCPServerEntry is a +// pure configuration resource that deploys no infrastructure. 
+// +// +kubebuilder:object:root=true +// +kubebuilder:subresource:status +// +kubebuilder:resource:shortName=mcpentry +// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.spec.remoteURL` +// +kubebuilder:printcolumn:name="Transport",type=string,JSONPath=`.spec.transport` +// +kubebuilder:printcolumn:name="Group",type=string,JSONPath=`.spec.groupRef` +// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=="Ready")].status` +// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp` +type MCPServerEntry struct { + metav1.TypeMeta `json:",inline"` + metav1.ObjectMeta `json:"metadata,omitempty"` + + Spec MCPServerEntrySpec `json:"spec,omitempty"` + Status MCPServerEntryStatus `json:"status,omitempty"` +} + +type MCPServerEntrySpec struct { + // RemoteURL is the URL of the remote MCP server. + // Must use HTTPS unless the toolhive.stacklok.dev/allow-insecure + // annotation is set to "true" (for development only). + // +kubebuilder:validation:Required + // +kubebuilder:validation:Pattern=`^https?://` + RemoteURL string `json:"remoteURL"` + + // Transport specifies the MCP transport protocol. + // +kubebuilder:validation:Required + // +kubebuilder:validation:Enum=streamable-http;sse + Transport string `json:"transport"` + + // GroupRef is the name of the MCPGroup this entry belongs to. + // Required because an MCPServerEntry without a group cannot be + // discovered by any VirtualMCPServer. + // +kubebuilder:validation:Required + // +kubebuilder:validation:MinLength=1 + GroupRef string `json:"groupRef"` + + // ExternalAuthConfigRef references an MCPExternalAuthConfig in the + // same namespace for authenticating to the remote server. + // Omit for unauthenticated public endpoints. + // +optional + ExternalAuthConfigRef *ExternalAuthConfigRef `json:"externalAuthConfigRef,omitempty"` + + // HeaderForward configures additional headers to inject into + // requests forwarded to the remote server. 
+ // +optional + HeaderForward *HeaderForwardConfig `json:"headerForward,omitempty"` + + // CABundleRef references a ConfigMap or Secret containing a custom + // CA certificate bundle for TLS verification of the remote server. + // Useful for remote servers with private/internal CA certificates. + // +optional + CABundleRef *SecretKeyRef `json:"caBundleRef,omitempty"` +} + +type MCPServerEntryStatus struct { + // Conditions represent the latest available observations of the + // MCPServerEntry's state. + // +optional + Conditions []metav1.Condition `json:"conditions,omitempty"` + + // ObservedGeneration is the most recent generation observed. + // +optional + ObservedGeneration int64 `json:"observedGeneration,omitempty"` +} +``` + +**Condition types:** + +| Type | Purpose | When Set | +|------|---------|----------| +| `Ready` | Overall readiness | Always | +| `GroupRefValid` | Referenced MCPGroup exists | Always | +| `AuthConfigValid` | Referenced MCPExternalAuthConfig exists | Only when `externalAuthConfigRef` is set | +| `CABundleValid` | Referenced CA bundle exists | Only when `caBundleRef` is set | + +There is intentionally **no `RemoteReachable` condition**. The controller +should NOT probe remote URLs because: + +1. Reachability from the operator pod does not imply reachability from the + vMCP pod (different network policies, egress rules, DNS resolution). +2. Probing external URLs from the operator expands its attack surface and + requires egress network access it may not have. +3. It gives false confidence: a probe succeeding now doesn't mean it will + succeed when vMCP makes the actual request. +4. vMCP already has health checking infrastructure (`healthCheckInterval`, + circuit breaker) that operates at the right layer. 
+ +#### Status Example + +```yaml +status: + conditions: + - type: Ready + status: "True" + reason: ValidationSucceeded + message: "MCPServerEntry is valid and ready for discovery" + lastTransitionTime: "2026-03-12T10:00:00Z" + - type: GroupRefValid + status: "True" + reason: GroupExists + message: "MCPGroup 'engineering-team' exists" + lastTransitionTime: "2026-03-12T10:00:00Z" + - type: AuthConfigValid + status: "True" + reason: AuthConfigExists + message: "MCPExternalAuthConfig 'salesforce-auth' exists" + lastTransitionTime: "2026-03-12T10:00:00Z" + observedGeneration: 1 +``` + +#### Component Changes + +##### Operator: New CRD and Controller + +**New files:** +- `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` - CRD type + definitions +- `cmd/thv-operator/controllers/mcpserverentry_controller.go` - + Validation-only controller + +The MCPServerEntry controller is intentionally simple. It performs +**validation only** and creates **no infrastructure**: + +```go +func (r *MCPServerEntryReconciler) Reconcile( + ctx context.Context, req ctrl.Request, +) (ctrl.Result, error) { + var entry mcpv1alpha1.MCPServerEntry + if err := r.Get(ctx, req.NamespacedName, &entry); err != nil { + return ctrl.Result{}, client.IgnoreNotFound(err) + } + + statusManager := NewStatusManager(&entry) + + // Validate groupRef exists + var group mcpv1alpha1.MCPGroup + if err := r.Get(ctx, client.ObjectKey{ + Namespace: entry.Namespace, + Name: entry.Spec.GroupRef, + }, &group); err != nil { + if apierrors.IsNotFound(err) { + statusManager.SetCondition("GroupRefValid", "GroupNotFound", + fmt.Sprintf("MCPGroup %q not found", entry.Spec.GroupRef), + metav1.ConditionFalse) + statusManager.SetCondition("Ready", "ValidationFailed", + "Referenced MCPGroup does not exist", metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + return ctrl.Result{}, err + } + statusManager.SetCondition("GroupRefValid", "GroupExists", + fmt.Sprintf("MCPGroup %q exists", 
entry.Spec.GroupRef), + metav1.ConditionTrue) + + // Validate externalAuthConfigRef if set + if entry.Spec.ExternalAuthConfigRef != nil { + var authConfig mcpv1alpha1.MCPExternalAuthConfig + if err := r.Get(ctx, client.ObjectKey{ + Namespace: entry.Namespace, + Name: entry.Spec.ExternalAuthConfigRef.Name, + }, &authConfig); err != nil { + if apierrors.IsNotFound(err) { + statusManager.SetCondition("AuthConfigValid", + "AuthConfigNotFound", + fmt.Sprintf("MCPExternalAuthConfig %q not found", + entry.Spec.ExternalAuthConfigRef.Name), + metav1.ConditionFalse) + statusManager.SetCondition("Ready", "ValidationFailed", + "Referenced auth config does not exist", + metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + return ctrl.Result{}, err + } + statusManager.SetCondition("AuthConfigValid", "AuthConfigExists", + fmt.Sprintf("MCPExternalAuthConfig %q exists", + entry.Spec.ExternalAuthConfigRef.Name), + metav1.ConditionTrue) + } + + // Validate HTTPS requirement + if !strings.HasPrefix(entry.Spec.RemoteURL, "https://") { + if entry.Annotations["toolhive.stacklok.dev/allow-insecure"] != "true" { + statusManager.SetCondition("Ready", "InsecureURL", + "remoteURL must use HTTPS (set annotation "+ + "toolhive.stacklok.dev/allow-insecure=true to override)", + metav1.ConditionFalse) + return r.updateStatus(ctx, &entry, statusManager) + } + } + + statusManager.SetCondition("Ready", "ValidationSucceeded", + "MCPServerEntry is valid and ready for discovery", + metav1.ConditionTrue) + + return r.updateStatus(ctx, &entry, statusManager) +} + +func (r *MCPServerEntryReconciler) SetupWithManager( + mgr ctrl.Manager, +) error { + return ctrl.NewControllerManagedBy(mgr). + For(&mcpv1alpha1.MCPServerEntry{}). + Watches(&mcpv1alpha1.MCPGroup{}, + handler.EnqueueRequestsFromMapFunc( + r.findEntriesForGroup, + )). + Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, + handler.EnqueueRequestsFromMapFunc( + r.findEntriesForAuthConfig, + )). 
+ Complete(r) +} +``` + +No finalizers are needed because MCPServerEntry creates no infrastructure +to clean up. + +##### Operator: MCPGroup Controller Update + +The MCPGroup controller must be updated to watch MCPServerEntry resources +in addition to MCPServer resources, so that `status.servers` and +`status.serverCount` reflect both types of backends in the group. + +**Files to modify:** +- `cmd/thv-operator/controllers/mcpgroup_controller.go` - Add watch for + MCPServerEntry, update status aggregation + +##### Operator: VirtualMCPServer Controller Update + +**Static mode (`outgoingAuth.source: inline`):** The operator generates +the ConfigMap that vMCP reads at startup. This ConfigMap must now include +MCPServerEntry backends alongside MCPServer backends. + +The controller discovers MCPServerEntry resources in the group and +serializes them as remote backend entries in the ConfigMap: + +```yaml +# Generated ConfigMap content +backends: + # From MCPServer resources (existing) + - name: github-mcp + url: http://github-mcp.default.svc:8080 + transport: sse + type: container + auth: + type: token_exchange + # ... + + # From MCPServerEntry resources (new) + - name: context7 + url: https://mcp.context7.com/mcp + transport: streamable-http + type: entry # New backend type + # No auth - public endpoint + + - name: salesforce-mcp + url: https://mcp.salesforce.com + transport: streamable-http + type: entry + auth: + type: token_exchange + # ... 
+``` + +**Files to modify:** +- `cmd/thv-operator/controllers/virtualmcpserver_controller.go` - Discover + MCPServerEntry resources in group +- `cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig.go` - Include + entry backends in ConfigMap generation + +##### vMCP: Backend Type and Discovery + +**New backend type:** + +```go +// In pkg/vmcp/types.go +const ( + BackendTypeContainer BackendType = "container" + BackendTypeProxy BackendType = "proxy" + BackendTypeEntry BackendType = "entry" // New +) +``` + +**Discovery updates:** + +```go +// In pkg/vmcp/workloads/k8s.go +func (m *K8sWorkloadManager) ListWorkloadsInGroup( + ctx context.Context, groupName string, +) ([]Backend, error) { + var backends []Backend + + // Existing: discover MCPServer resources + mcpServers, err := m.discoverMCPServers(ctx, groupName) + if err != nil { + return nil, fmt.Errorf("discovering MCPServers: %w", err) + } + backends = append(backends, mcpServers...) + + // New: discover MCPServerEntry resources + entries, err := m.discoverMCPServerEntries(ctx, groupName) + if err != nil { + return nil, fmt.Errorf("discovering MCPServerEntries: %w", err) + } + backends = append(backends, entries...) 
+ + return backends, nil +} + +func (m *K8sWorkloadManager) discoverMCPServerEntries( + ctx context.Context, groupName string, +) ([]Backend, error) { + var entryList mcpv1alpha1.MCPServerEntryList + if err := m.client.List(ctx, &entryList, + client.InNamespace(m.namespace), + client.MatchingFields{"spec.groupRef": groupName}, + ); err != nil { + return nil, err + } + + var backends []Backend + for _, entry := range entryList.Items { + backend := Backend{ + ID: fmt.Sprintf("%s/%s", entry.Namespace, entry.Name), + Name: entry.Name, + BaseURL: entry.Spec.RemoteURL, + Transport: entry.Spec.Transport, + Type: BackendTypeEntry, + } + + // Resolve auth if configured + if entry.Spec.ExternalAuthConfigRef != nil { + authConfig, err := m.resolveAuthConfig(ctx, + entry.Namespace, + entry.Spec.ExternalAuthConfigRef.Name, + ) + if err != nil { + return nil, fmt.Errorf( + "resolving auth for entry %s: %w", + entry.Name, err, + ) + } + backend.AuthConfig = authConfig + } + + // Resolve header forward config if set + if entry.Spec.HeaderForward != nil { + backend.HeaderForward = m.resolveHeaderForward( + ctx, entry.Namespace, entry.Spec.HeaderForward, + ) + } + + // Resolve CA bundle if set + if entry.Spec.CABundleRef != nil { + caBundle, err := m.resolveCABundle(ctx, + entry.Namespace, entry.Spec.CABundleRef, + ) + if err != nil { + return nil, fmt.Errorf( + "resolving CA bundle for entry %s: %w", + entry.Name, err, + ) + } + backend.CABundle = caBundle + } + + backends = append(backends, backend) + } + + return backends, nil +} +``` + +##### vMCP: HTTP Client for External TLS + +Backends of type `entry` connect to external URLs over HTTPS. The vMCP +HTTP client must be updated to: + +1. Use the system CA certificate pool by default (for public CAs). +2. Optionally append a custom CA bundle from `caBundleRef` (for private + CAs). +3. Apply the resolved `externalAuthConfigRef` credentials directly to + outgoing requests. 
+ +```go +// In pkg/vmcp/client/client.go +func (c *Client) createTransportForEntry( + backend *Backend, +) (*http.Transport, error) { + tlsConfig := &tls.Config{ + MinVersion: tls.VersionTLS12, + } + + if backend.CABundle != nil { + pool, err := x509.SystemCertPool() + if err != nil { + pool = x509.NewCertPool() + } + if !pool.AppendCertsFromPEM(backend.CABundle) { + return nil, fmt.Errorf("failed to parse CA bundle") + } + tlsConfig.RootCAs = pool + } + + return &http.Transport{ + TLSClientConfig: tlsConfig, + }, nil +} +``` + +##### vMCP: Dynamic Mode Reconciler Update + +For dynamic mode (`outgoingAuth.source: discovered`), the reconciler +infrastructure from THV-0014 must be extended to watch MCPServerEntry +resources. + +**Files to modify:** +- `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher +- `pkg/vmcp/k8s/mcpserverentry_watcher.go` (new) - MCPServerEntry + reconciler + +```go +type MCPServerEntryWatcher struct { + client client.Client + registry vmcp.DynamicRegistry + groupRef string +} + +func (w *MCPServerEntryWatcher) Reconcile( + ctx context.Context, req ctrl.Request, +) (ctrl.Result, error) { + backendID := req.NamespacedName.String() + + var entry mcpv1alpha1.MCPServerEntry + if err := w.client.Get(ctx, req.NamespacedName, &entry); err != nil { + if apierrors.IsNotFound(err) { + w.registry.Remove(backendID) + return ctrl.Result{}, nil + } + return ctrl.Result{}, err + } + + if entry.Spec.GroupRef != w.groupRef { + // Not in our group, remove if previously tracked + w.registry.Remove(backendID) + return ctrl.Result{}, nil + } + + backend, err := w.convertToBackend(ctx, &entry) + if err != nil { + return ctrl.Result{}, err + } + backend.ID = backendID + + if err := w.registry.Upsert(backend); err != nil { + return ctrl.Result{}, err + } + + return ctrl.Result{}, nil +} + +func (w *MCPServerEntryWatcher) SetupWithManager( + mgr ctrl.Manager, +) error { + return ctrl.NewControllerManagedBy(mgr). + For(&mcpv1alpha1.MCPServerEntry{}). 
+ Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, + handler.EnqueueRequestsFromMapFunc( + w.findEntriesForAuthConfig, + )). + Watches(&corev1.Secret{}, + handler.EnqueueRequestsFromMapFunc( + w.findEntriesForSecret, + )). + Complete(w) +} +``` + +##### vMCP: Static Config Parser Update + +The static config parser must be updated to deserialize `type: entry` +backends from the ConfigMap and create appropriate HTTP clients with +external TLS support. + +**Files to modify:** +- `pkg/vmcp/config/` - Parse entry-type backends from static config + +## Security Considerations + +### Threat Model + +| Threat | Description | Mitigation | +|--------|-------------|------------| +| Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | +| Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec | +| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress | +| Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose | +| Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | + +### Authentication and Authorization + +- **No new auth primitives**: MCPServerEntry reuses the existing + `MCPExternalAuthConfig` CRD and `externalAuthConfigRef` pattern. +- **Single boundary**: vMCP's incoming auth validates client tokens. + MCPServerEntry's `externalAuthConfigRef` handles outgoing auth to + the remote. These are cleanly separated. 
+- **RBAC**: Standard Kubernetes RBAC controls who can create/modify + MCPServerEntry resources. This enables fine-grained access: platform + teams manage VirtualMCPServer, product teams register MCPServerEntry + backends. +- **No privilege escalation**: MCPServerEntry grants no additional + permissions beyond what the referenced MCPExternalAuthConfig already + provides. + +### Data Security + +- **In transit**: HTTPS required for remote connections (with annotation + escape hatch for development). +- **At rest**: No sensitive data stored in MCPServerEntry spec. Auth + credentials are in K8s Secrets, referenced indirectly. +- **CA bundles**: Custom CA certificates referenced via `caBundleRef`, + stored in K8s Secrets/ConfigMaps with standard K8s encryption at rest. + +### Input Validation + +- **remoteURL**: Must match `^https?://` pattern. HTTPS enforced unless + annotation override. Validated by both CRD CEL rules and controller + reconciliation. +- **transport**: Enum validation (`streamable-http` or `sse`). +- **groupRef**: Required, validated to reference an existing MCPGroup. +- **externalAuthConfigRef**: When set, validated to reference an existing + MCPExternalAuthConfig. +- **headerForward**: Uses the same restricted header blocklist and + validation as MCPRemoteProxy (THV-0026). + +### Secrets Management + +- MCPServerEntry follows the same secret access patterns as MCPServer: + - **Dynamic mode**: vMCP reads secrets at runtime via K8s API + (namespace-scoped RBAC). + - **Static mode**: Operator mounts secrets as environment variables. +- Secret rotation follows existing patterns: + - **Dynamic mode**: Watch-based propagation, no pod restart needed. + - **Static mode**: Requires pod restart (Deployment rollout). + +### Audit and Logging + +- vMCP's existing audit middleware logs all requests routed to + MCPServerEntry backends, including user identity and target tool. 
+- The operator controller logs validation results (group existence, + auth config existence) at standard log levels. +- No sensitive data (URLs with credentials, auth tokens) is logged. + +### Mitigations + +1. **HTTPS enforcement**: Default requires HTTPS; annotation override + requires explicit operator action. +2. **No network probing**: Controller never connects to remote URLs. +3. **Single auth boundary**: Eliminates dual-boundary confusion. +4. **Existing patterns**: Reuses battle-tested secret access, RBAC, + and auth patterns from MCPServer. +5. **NetworkPolicy recommendation**: Documentation recommends restricting + vMCP pod egress to known remote endpoints. +6. **No new attack surface**: Zero additional pods deployed. + +## Alternatives Considered + +### Alternative 1: Add `remoteServerRefs` to VirtualMCPServer Spec + +Embed remote server configuration directly in the VirtualMCPServer CRD. + +```yaml +kind: VirtualMCPServer +spec: + groupRef: + name: engineering-team + remoteServerRefs: + - name: context7 + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + - name: salesforce + remoteURL: https://mcp.salesforce.com + transport: streamable-http + externalAuthConfigRef: + name: salesforce-auth +``` + +**Pros:** +- No new CRD needed +- Simple for small deployments + +**Cons:** +- Violates separation of concerns: VirtualMCPServer manages aggregation, + not backend declaration +- Breaks the `groupRef` discovery pattern: some backends discovered via + group, others embedded inline +- Bloats VirtualMCPServer spec +- Prevents independent lifecycle management: adding/removing a remote + backend requires editing the VirtualMCPServer, which may trigger + reconciliation of unrelated configuration +- Prevents fine-grained RBAC: only VirtualMCPServer editors can manage + remote backends + +**Why not chosen:** Inconsistent with existing patterns and prevents the +RBAC separation that makes MCPServerEntry valuable (platform teams manage +vMCP, 
product teams register backends). + +### Alternative 2: Extend MCPServer with Remote Mode + +Add a `mode: remote` field to the existing MCPServer CRD. + +```yaml +kind: MCPServer +spec: + mode: remote + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team +``` + +**Pros:** +- No new CRD +- Reuses existing MCPServer controller infrastructure + +**Cons:** +- MCPServer is fundamentally a container workload resource. Adding a + "don't deploy anything" mode creates confusing semantics: `spec.image` + becomes optional, `spec.resources` is meaningless, status conditions + designed for pod lifecycle don't apply. +- Controller logic becomes complex with conditional paths for + container vs remote modes. +- Existing MCPServer watchers (MCPGroup controller, VirtualMCPServer + controller) would need to handle both modes, adding complexity. +- The controller currently creates Deployments, Services, and ConfigMaps. + Adding a mode that creates none of these is a significant semantic + change. + +**Why not chosen:** Overloading MCPServer with remote-mode semantics +increases complexity and confusion. A separate CRD with clear "this is +configuration only" semantics is cleaner. + +### Alternative 3: Configure Remote Backends Only in vMCP Config + +Handle remote backends entirely in vMCP's configuration (ConfigMap or +runtime discovery) without a CRD. + +**Pros:** +- No CRD changes needed +- Simpler operator + +**Cons:** +- No Kubernetes-native resource to represent remote backends +- No status reporting, no `kubectl get` visibility +- No RBAC for who can manage remote backends +- Breaks the pattern where all backends are discoverable via `groupRef` +- MCPGroup status cannot reflect remote backends + +**Why not chosen:** Loses Kubernetes-native management, visibility, and +access control. 
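To make the access split concrete — platform teams own aggregation, product teams register backends — standard namespace-scoped RBAC suffices. A sketch (role names are illustrative, and this assumes the generated plural resource name is `mcpserverentries`):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: platform-vmcp-admin      # illustrative name
  namespace: default
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["virtualmcpservers", "mcpgroups"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: product-backend-editor   # illustrative name
  namespace: default
rules:
  - apiGroups: ["toolhive.stacklok.dev"]
    resources: ["mcpserverentries"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
```

Neither of the rejected alternatives permits this split: in Alternative 1 the backend list lives inside the VirtualMCPServer spec, and in Alternative 3 there is no Kubernetes object to scope a Role to at all.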
+ +## Compatibility + +### Backward Compatibility + +MCPServerEntry is a purely additive change: + +- **No changes to existing CRDs**: MCPServer, MCPRemoteProxy, + VirtualMCPServer, MCPGroup, and MCPExternalAuthConfig are unchanged. +- **No changes to existing behavior**: VirtualMCPServer continues to + discover MCPServer resources via `groupRef`. MCPServerEntry adds a + new discovery source alongside the existing one. +- **MCPRemoteProxy still works**: Organizations using MCPRemoteProxy + can continue to do so. MCPServerEntry is an alternative, not a + replacement. +- **No migration required**: Existing deployments work without + modification after the upgrade. + +### Forward Compatibility + +- **Extensibility**: The `MCPServerEntrySpec` can be extended with + additional fields (e.g., rate limiting, tool filtering) without + breaking changes. +- **API versioning**: Starts at `v1alpha1`, consistent with all other + ToolHive CRDs. +- **Future deprecation path**: If MCPRemoteProxy use cases are eventually + subsumed, MCPServerEntry provides a clean migration target. + +## Implementation Plan + +### Phase 1: CRD and Controller + +1. Define `MCPServerEntry` types in + `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` +2. Implement validation-only controller in + `cmd/thv-operator/controllers/mcpserverentry_controller.go` +3. Generate CRD manifests (`task operator-generate`, + `task operator-manifests`) +4. Update MCPGroup controller to watch MCPServerEntry resources +5. Add unit tests for controller validation logic + +### Phase 2: Static Mode Integration + +1. Update VirtualMCPServer controller to discover MCPServerEntry resources + in the group +2. Update ConfigMap generation to include entry-type backends +3. Update vMCP static config parser to deserialize entry backends +4. Add `BackendTypeEntry` to vMCP types +5. Implement external TLS transport creation for entry backends +6. Integration tests with envtest + +### Phase 3: Dynamic Mode Integration + +1. 
Create `MCPServerEntryWatcher` reconciler in `pkg/vmcp/k8s/` +2. Register watcher in the K8s manager alongside MCPServerWatcher +3. Update `ListWorkloadsInGroup()` to include MCPServerEntry +4. Resolve auth configs for entry backends at runtime +5. Integration tests for dynamic discovery of entry backends + +### Phase 4: Documentation and E2E + +1. CRD reference documentation +2. User guide with examples (public remote, authenticated remote, + private CA) +3. MCPRemoteProxy vs MCPServerEntry comparison guide +4. E2E Chainsaw tests for full lifecycle +5. E2E tests for mixed MCPServer + MCPServerEntry groups + +### Dependencies + +- THV-0014 (K8s-Aware vMCP) for dynamic mode support +- THV-0026 (Header Passthrough) for `headerForward` field reuse +- Existing MCPExternalAuthConfig CRD for auth configuration + +## Testing Strategy + +### Unit Tests + +- Controller validation: groupRef exists, authConfigRef exists, HTTPS + enforcement, annotation override +- CRD type serialization/deserialization +- Backend conversion from MCPServerEntry to internal Backend struct +- External TLS transport creation with and without custom CA bundles +- Static config parsing with entry-type backends + +### Integration Tests (envtest) + +- MCPServerEntry controller reconciliation with real API server +- VirtualMCPServer ConfigMap generation including entry backends +- MCPGroup status update with mixed MCPServer + MCPServerEntry members +- Dynamic mode: MCPServerEntry watcher reconciliation +- Auth config resolution for entry backends +- Secret change propagation to entry backends + +### End-to-End Tests (Chainsaw) + +- Full lifecycle: create MCPGroup, create MCPServerEntry, create + VirtualMCPServer, verify vMCP routes to remote backend +- Mixed group: MCPServer (container) + MCPServerEntry (remote) in same + group +- Unauthenticated public remote behind vMCP +- Authenticated remote with token exchange +- MCPServerEntry deletion removes backend from vMCP +- CA bundle configuration for 
private remotes + +### Security Tests + +- Verify HTTPS enforcement (HTTP URL without annotation is rejected) +- Verify RBAC separation (entry creation requires correct permissions) +- Verify no network probing from controller +- Verify secret values are not logged + +## Documentation + +- **CRD Reference**: Auto-generated CRD documentation for MCPServerEntry + fields, validation rules, and status conditions +- **User Guide**: How to add remote MCP backends to vMCP using + MCPServerEntry, with examples for common scenarios +- **Comparison Guide**: When to use MCPRemoteProxy vs MCPServerEntry: + + | Feature | MCPRemoteProxy | MCPServerEntry | + |---------|---------------|----------------| + | Deploys pods | Yes (proxy pod) | No | + | Own auth middleware | Yes (oidcConfig, authzConfig) | No | + | Own audit logging | Yes | No (uses vMCP's) | + | Standalone use | Yes | No (only via VirtualMCPServer) | + | GroupRef support | Yes (optional) | Yes (required) | + | Primary use case | Standalone proxy with full observability | Backend declaration for vMCP | + +- **Architecture Documentation**: Update `docs/arch/10-virtual-mcp-architecture.md` + to describe MCPServerEntry as a backend type + +## Open Questions + +1. **Should `remoteURL` strictly require HTTPS?** + Recommendation: Yes, with annotation override + (`toolhive.stacklok.dev/allow-insecure: "true"`) for development. + This prevents accidental plaintext credential transmission while + allowing local development workflows. + +2. **Should the CRD support custom CA bundles for private remote servers?** + Recommendation: Yes, via `caBundleRef` field referencing a Secret or + ConfigMap. This is essential for enterprises with internal CAs. The + current design includes this field. + +3. **Should there be a `disabled` field for temporarily removing an entry + from discovery without deleting it?** + This could be useful for maintenance windows or incident response. 
+ However, it adds complexity and can be achieved by removing the + `groupRef` temporarily. Defer to post-implementation feedback. + +4. **Should MCPServerEntry support `toolConfigRef` for tool filtering?** + MCPRemoteProxy supports tool filtering via `toolConfigRef`. + VirtualMCPServer also has its own tool filtering/override configuration + in `spec.aggregation.tools`. For MCPServerEntry, tool filtering should + be configured at the VirtualMCPServer level (where it already exists) + rather than duplicating it on the entry. Defer unless there is a clear + use case for entry-level filtering. + +## References + +- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) - + VirtualMCPServer design, auth boundaries, capability aggregation +- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) - + MCPRemoteProxy CRD design +- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) - + Group-based backend discovery pattern +- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) - + Dynamic vs static discovery modes, reconciler infrastructure +- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) - + `headerForward` configuration pattern +- [Istio ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/) - + Naming pattern inspiration +- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) - + MCPRemoteProxy forces OIDC auth on public remotes behind vMCP +- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) - + Dual auth boundary confusion with externalAuthConfigRef + +--- + +## RFC Lifecycle + + + +### Review History + +| Date | Reviewer | Decision | Notes | +|------|----------|----------|-------| +| 2026-03-12 | @jaosorior | Draft | Initial submission | + +### Implementation Tracking + +| Repository | PR | Status | +|------------|-----|--------| +| toolhive | TBD | Not started | From e8b6c513d7ec7a53f1e9298d303b488b9395698a Mon Sep 17 00:00:00 2001 From: 
Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:05:59 +0200 Subject: [PATCH 02/15] Rename RFC to match PR number THV-0055 Co-Authored-By: Claude Opus 4.6 --- ...nds.md => THV-0055-mcpserverentry-direct-remote-backends.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename rfcs/{THV-XXXX-mcpserverentry-direct-remote-backends.md => THV-0055-mcpserverentry-direct-remote-backends.md} (99%) diff --git a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md similarity index 99% rename from rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md rename to rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 7b061f1..d18484e 100644 --- a/rfcs/THV-XXXX-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -1,4 +1,4 @@ -# RFC-XXXX: MCPServerEntry CRD for Direct Remote MCP Server Backends +# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends - **Status**: Draft - **Author(s)**: Juan Antonio Osorio (@jaosorior) From 841b88698862e4e8aa300b1629c901eec3d4470d Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:09:28 +0200 Subject: [PATCH 03/15] Remove Go code samples, replace with prose descriptions RFC should focus on design intent, not implementation code. Keep YAML/Mermaid examples, replace Go blocks with prose describing controller behavior, discovery logic, and TLS handling. 
Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 430 ++++-------------- 1 file changed, 80 insertions(+), 350 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index d18484e..93e7dbc 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -310,77 +310,30 @@ spec: #### CRD Type Definitions -```go -// MCPServerEntry declares a remote MCP server endpoint as a backend for -// VirtualMCPServer. Unlike MCPServer (which deploys container workloads) -// or MCPRemoteProxy (which deploys proxy pods), MCPServerEntry is a -// pure configuration resource that deploys no infrastructure. -// -// +kubebuilder:object:root=true -// +kubebuilder:subresource:status -// +kubebuilder:resource:shortName=mcpentry -// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.spec.remoteURL` -// +kubebuilder:printcolumn:name="Transport",type=string,JSONPath=`.spec.transport` -// +kubebuilder:printcolumn:name="Group",type=string,JSONPath=`.spec.groupRef` -// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=="Ready")].status` -// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp` -type MCPServerEntry struct { - metav1.TypeMeta `json:",inline"` - metav1.ObjectMeta `json:"metadata,omitempty"` - - Spec MCPServerEntrySpec `json:"spec,omitempty"` - Status MCPServerEntryStatus `json:"status,omitempty"` -} - -type MCPServerEntrySpec struct { - // RemoteURL is the URL of the remote MCP server. - // Must use HTTPS unless the toolhive.stacklok.dev/allow-insecure - // annotation is set to "true" (for development only). - // +kubebuilder:validation:Required - // +kubebuilder:validation:Pattern=`^https?://` - RemoteURL string `json:"remoteURL"` - - // Transport specifies the MCP transport protocol. 
- // +kubebuilder:validation:Required - // +kubebuilder:validation:Enum=streamable-http;sse - Transport string `json:"transport"` - - // GroupRef is the name of the MCPGroup this entry belongs to. - // Required because an MCPServerEntry without a group cannot be - // discovered by any VirtualMCPServer. - // +kubebuilder:validation:Required - // +kubebuilder:validation:MinLength=1 - GroupRef string `json:"groupRef"` - - // ExternalAuthConfigRef references an MCPExternalAuthConfig in the - // same namespace for authenticating to the remote server. - // Omit for unauthenticated public endpoints. - // +optional - ExternalAuthConfigRef *ExternalAuthConfigRef `json:"externalAuthConfigRef,omitempty"` - - // HeaderForward configures additional headers to inject into - // requests forwarded to the remote server. - // +optional - HeaderForward *HeaderForwardConfig `json:"headerForward,omitempty"` - - // CABundleRef references a ConfigMap or Secret containing a custom - // CA certificate bundle for TLS verification of the remote server. - // Useful for remote servers with private/internal CA certificates. - // +optional - CABundleRef *SecretKeyRef `json:"caBundleRef,omitempty"` -} - -type MCPServerEntryStatus struct { - // Conditions represent the latest available observations of the - // MCPServerEntry's state. - // +optional - Conditions []metav1.Condition `json:"conditions,omitempty"` - - // ObservedGeneration is the most recent generation observed. - // +optional - ObservedGeneration int64 `json:"observedGeneration,omitempty"` -} -``` +The `MCPServerEntry` CRD type is defined in +`cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go`. It follows the +standard kubebuilder pattern with `Spec` and `Status` subresources. + +The resource uses the short name `mcpentry` and exposes print columns for +URL, Transport, Group, Ready status, and Age. 
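+
+For illustration only, a minimal manifest for an unauthenticated public
+remote could look like the following sketch (the `apiVersion` group is an
+assumption, inferred from the `toolhive.stacklok.dev` annotation prefix
+used elsewhere in this RFC):
+
+```yaml
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: MCPServerEntry
+metadata:
+  name: context7
+spec:
+  remoteURL: https://mcp.context7.com/mcp
+  transport: streamable-http
+  groupRef: engineering-team
+```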
+ +**Spec fields:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `remoteURL` | string | Yes | URL of the remote MCP server. Must match `^https?://`. HTTPS enforced unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | +| `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | +| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). | +| `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. | +| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. | +| `caBundleRef` | object | No | Reference to a Secret containing a custom CA certificate bundle for TLS verification. | + +**Status fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `conditions` | []Condition | Standard Kubernetes conditions (see table below). | +| `observedGeneration` | int64 | Most recent generation observed by the controller. | **Condition types:** @@ -437,99 +390,25 @@ status: Validation-only controller The MCPServerEntry controller is intentionally simple. 
It performs -**validation only** and creates **no infrastructure**: - -```go -func (r *MCPServerEntryReconciler) Reconcile( - ctx context.Context, req ctrl.Request, -) (ctrl.Result, error) { - var entry mcpv1alpha1.MCPServerEntry - if err := r.Get(ctx, req.NamespacedName, &entry); err != nil { - return ctrl.Result{}, client.IgnoreNotFound(err) - } - - statusManager := NewStatusManager(&entry) - - // Validate groupRef exists - var group mcpv1alpha1.MCPGroup - if err := r.Get(ctx, client.ObjectKey{ - Namespace: entry.Namespace, - Name: entry.Spec.GroupRef, - }, &group); err != nil { - if apierrors.IsNotFound(err) { - statusManager.SetCondition("GroupRefValid", "GroupNotFound", - fmt.Sprintf("MCPGroup %q not found", entry.Spec.GroupRef), - metav1.ConditionFalse) - statusManager.SetCondition("Ready", "ValidationFailed", - "Referenced MCPGroup does not exist", metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - return ctrl.Result{}, err - } - statusManager.SetCondition("GroupRefValid", "GroupExists", - fmt.Sprintf("MCPGroup %q exists", entry.Spec.GroupRef), - metav1.ConditionTrue) - - // Validate externalAuthConfigRef if set - if entry.Spec.ExternalAuthConfigRef != nil { - var authConfig mcpv1alpha1.MCPExternalAuthConfig - if err := r.Get(ctx, client.ObjectKey{ - Namespace: entry.Namespace, - Name: entry.Spec.ExternalAuthConfigRef.Name, - }, &authConfig); err != nil { - if apierrors.IsNotFound(err) { - statusManager.SetCondition("AuthConfigValid", - "AuthConfigNotFound", - fmt.Sprintf("MCPExternalAuthConfig %q not found", - entry.Spec.ExternalAuthConfigRef.Name), - metav1.ConditionFalse) - statusManager.SetCondition("Ready", "ValidationFailed", - "Referenced auth config does not exist", - metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - return ctrl.Result{}, err - } - statusManager.SetCondition("AuthConfigValid", "AuthConfigExists", - fmt.Sprintf("MCPExternalAuthConfig %q exists", - 
entry.Spec.ExternalAuthConfigRef.Name), - metav1.ConditionTrue) - } - - // Validate HTTPS requirement - if !strings.HasPrefix(entry.Spec.RemoteURL, "https://") { - if entry.Annotations["toolhive.stacklok.dev/allow-insecure"] != "true" { - statusManager.SetCondition("Ready", "InsecureURL", - "remoteURL must use HTTPS (set annotation "+ - "toolhive.stacklok.dev/allow-insecure=true to override)", - metav1.ConditionFalse) - return r.updateStatus(ctx, &entry, statusManager) - } - } - - statusManager.SetCondition("Ready", "ValidationSucceeded", - "MCPServerEntry is valid and ready for discovery", - metav1.ConditionTrue) - - return r.updateStatus(ctx, &entry, statusManager) -} - -func (r *MCPServerEntryReconciler) SetupWithManager( - mgr ctrl.Manager, -) error { - return ctrl.NewControllerManagedBy(mgr). - For(&mcpv1alpha1.MCPServerEntry{}). - Watches(&mcpv1alpha1.MCPGroup{}, - handler.EnqueueRequestsFromMapFunc( - r.findEntriesForGroup, - )). - Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, - handler.EnqueueRequestsFromMapFunc( - r.findEntriesForAuthConfig, - )). - Complete(r) -} -``` +**validation only** and creates **no infrastructure**. + +The reconciliation logic: + +1. Fetches the MCPServerEntry resource (ignores not-found for deletions). +2. Validates that the referenced MCPGroup exists in the same namespace. + Sets `GroupRefValid` condition accordingly. +3. If `externalAuthConfigRef` is set, validates that the referenced + MCPExternalAuthConfig exists. Sets `AuthConfigValid` condition. +4. Validates the HTTPS requirement: if `remoteURL` does not use HTTPS, + the controller checks for the `toolhive.stacklok.dev/allow-insecure` + annotation. Without it, the `Ready` condition is set to false with + reason `InsecureURL`. +5. If all validations pass, sets `Ready` to true with reason + `ValidationSucceeded`. 
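+
+For example, an entry whose `groupRef` points at a missing MCPGroup would
+surface conditions along these lines (illustrative sketch; condition types,
+reasons, and messages follow the validation logic above):
+
+```yaml
+status:
+  observedGeneration: 2
+  conditions:
+    - type: GroupRefValid
+      status: "False"
+      reason: GroupNotFound
+      message: MCPGroup "engineering-team" not found
+    - type: Ready
+      status: "False"
+      reason: ValidationFailed
+      message: Referenced MCPGroup does not exist
+```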
+ +The controller watches MCPGroup and MCPExternalAuthConfig resources via +`EnqueueRequestsFromMapFunc` handlers, so that changes to referenced +resources trigger re-validation of affected MCPServerEntry resources. No finalizers are needed because MCPServerEntry creates no infrastructure to clean up. @@ -589,211 +468,62 @@ backends: ##### vMCP: Backend Type and Discovery -**New backend type:** +A new `BackendTypeEntry` constant (`"entry"`) is added to +`pkg/vmcp/types.go` alongside the existing `BackendTypeContainer` and +`BackendTypeProxy`. -```go -// In pkg/vmcp/types.go -const ( - BackendTypeContainer BackendType = "container" - BackendTypeProxy BackendType = "proxy" - BackendTypeEntry BackendType = "entry" // New -) -``` +The `ListWorkloadsInGroup()` function in `pkg/vmcp/workloads/k8s.go` is +extended to discover MCPServerEntry resources in addition to MCPServer +resources. For each MCPServerEntry in the group, vMCP: -**Discovery updates:** - -```go -// In pkg/vmcp/workloads/k8s.go -func (m *K8sWorkloadManager) ListWorkloadsInGroup( - ctx context.Context, groupName string, -) ([]Backend, error) { - var backends []Backend - - // Existing: discover MCPServer resources - mcpServers, err := m.discoverMCPServers(ctx, groupName) - if err != nil { - return nil, fmt.Errorf("discovering MCPServers: %w", err) - } - backends = append(backends, mcpServers...) - - // New: discover MCPServerEntry resources - entries, err := m.discoverMCPServerEntries(ctx, groupName) - if err != nil { - return nil, fmt.Errorf("discovering MCPServerEntries: %w", err) - } - backends = append(backends, entries...) 
- - return backends, nil -} - -func (m *K8sWorkloadManager) discoverMCPServerEntries( - ctx context.Context, groupName string, -) ([]Backend, error) { - var entryList mcpv1alpha1.MCPServerEntryList - if err := m.client.List(ctx, &entryList, - client.InNamespace(m.namespace), - client.MatchingFields{"spec.groupRef": groupName}, - ); err != nil { - return nil, err - } - - var backends []Backend - for _, entry := range entryList.Items { - backend := Backend{ - ID: fmt.Sprintf("%s/%s", entry.Namespace, entry.Name), - Name: entry.Name, - BaseURL: entry.Spec.RemoteURL, - Transport: entry.Spec.Transport, - Type: BackendTypeEntry, - } - - // Resolve auth if configured - if entry.Spec.ExternalAuthConfigRef != nil { - authConfig, err := m.resolveAuthConfig(ctx, - entry.Namespace, - entry.Spec.ExternalAuthConfigRef.Name, - ) - if err != nil { - return nil, fmt.Errorf( - "resolving auth for entry %s: %w", - entry.Name, err, - ) - } - backend.AuthConfig = authConfig - } - - // Resolve header forward config if set - if entry.Spec.HeaderForward != nil { - backend.HeaderForward = m.resolveHeaderForward( - ctx, entry.Namespace, entry.Spec.HeaderForward, - ) - } - - // Resolve CA bundle if set - if entry.Spec.CABundleRef != nil { - caBundle, err := m.resolveCABundle(ctx, - entry.Namespace, entry.Spec.CABundleRef, - ) - if err != nil { - return nil, fmt.Errorf( - "resolving CA bundle for entry %s: %w", - entry.Name, err, - ) - } - backend.CABundle = caBundle - } - - backends = append(backends, backend) - } - - return backends, nil -} -``` +1. Lists MCPServerEntry resources filtered by `spec.groupRef`. +2. Converts each entry to an internal `Backend` struct using the entry's + `remoteURL`, `transport`, and name. +3. Resolves `externalAuthConfigRef` if set (using existing auth resolution + logic). +4. Resolves `headerForward` configuration if set. +5. Resolves `caBundleRef` if set (fetching the CA certificate from the + referenced Secret). +6. 
Appends the resulting backends alongside MCPServer-sourced backends. ##### vMCP: HTTP Client for External TLS Backends of type `entry` connect to external URLs over HTTPS. The vMCP -HTTP client must be updated to: +HTTP client in `pkg/vmcp/client/client.go` must be updated to: 1. Use the system CA certificate pool by default (for public CAs). 2. Optionally append a custom CA bundle from `caBundleRef` (for private - CAs). -3. Apply the resolved `externalAuthConfigRef` credentials directly to + CAs) to the system pool. +3. Enforce a minimum TLS version of 1.2. +4. Apply the resolved `externalAuthConfigRef` credentials directly to outgoing requests. -```go -// In pkg/vmcp/client/client.go -func (c *Client) createTransportForEntry( - backend *Backend, -) (*http.Transport, error) { - tlsConfig := &tls.Config{ - MinVersion: tls.VersionTLS12, - } - - if backend.CABundle != nil { - pool, err := x509.SystemCertPool() - if err != nil { - pool = x509.NewCertPool() - } - if !pool.AppendCertsFromPEM(backend.CABundle) { - return nil, fmt.Errorf("failed to parse CA bundle") - } - tlsConfig.RootCAs = pool - } - - return &http.Transport{ - TLSClientConfig: tlsConfig, - }, nil -} -``` - ##### vMCP: Dynamic Mode Reconciler Update For dynamic mode (`outgoingAuth.source: discovered`), the reconciler infrastructure from THV-0014 must be extended to watch MCPServerEntry resources. 
+**New files:** +- `pkg/vmcp/k8s/mcpserverentry_watcher.go` - MCPServerEntry reconciler + **Files to modify:** - `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher -- `pkg/vmcp/k8s/mcpserverentry_watcher.go` (new) - MCPServerEntry - reconciler - -```go -type MCPServerEntryWatcher struct { - client client.Client - registry vmcp.DynamicRegistry - groupRef string -} - -func (w *MCPServerEntryWatcher) Reconcile( - ctx context.Context, req ctrl.Request, -) (ctrl.Result, error) { - backendID := req.NamespacedName.String() - - var entry mcpv1alpha1.MCPServerEntry - if err := w.client.Get(ctx, req.NamespacedName, &entry); err != nil { - if apierrors.IsNotFound(err) { - w.registry.Remove(backendID) - return ctrl.Result{}, nil - } - return ctrl.Result{}, err - } - - if entry.Spec.GroupRef != w.groupRef { - // Not in our group, remove if previously tracked - w.registry.Remove(backendID) - return ctrl.Result{}, nil - } - - backend, err := w.convertToBackend(ctx, &entry) - if err != nil { - return ctrl.Result{}, err - } - backend.ID = backendID - - if err := w.registry.Upsert(backend); err != nil { - return ctrl.Result{}, err - } - - return ctrl.Result{}, nil -} - -func (w *MCPServerEntryWatcher) SetupWithManager( - mgr ctrl.Manager, -) error { - return ctrl.NewControllerManagedBy(mgr). - For(&mcpv1alpha1.MCPServerEntry{}). - Watches(&mcpv1alpha1.MCPExternalAuthConfig{}, - handler.EnqueueRequestsFromMapFunc( - w.findEntriesForAuthConfig, - )). - Watches(&corev1.Secret{}, - handler.EnqueueRequestsFromMapFunc( - w.findEntriesForSecret, - )). - Complete(w) -} -``` + +The `MCPServerEntryWatcher` follows the same reconciler pattern as the +existing `MCPServerWatcher` from THV-0014. It holds a reference to the +`DynamicRegistry` and the target `groupRef`. On reconciliation: + +1. If the resource is deleted (not found), it removes the backend from the + registry by namespaced name. +2. 
If the entry's `groupRef` doesn't match the watcher's group, it removes + the backend (handles group reassignment). +3. Otherwise, it converts the MCPServerEntry to a `Backend` struct + (resolving auth, headers, CA bundle) and upserts it into the registry. + +The watcher also watches MCPExternalAuthConfig and Secret resources via +`EnqueueRequestsFromMapFunc` handlers, so changes to referenced auth +configs or secrets trigger re-reconciliation of affected entries. ##### vMCP: Static Config Parser Update From 0952cb2c8c2612bc4522653b056c8248f5a192e1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 09:11:56 +0200 Subject: [PATCH 04/15] Remove file path lists from component changes section Implementation details like specific file paths belong in the implementation, not the RFC design document. Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 38 +++---------------- 1 file changed, 5 insertions(+), 33 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 93e7dbc..604f9da 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -383,12 +383,6 @@ status: ##### Operator: New CRD and Controller -**New files:** -- `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` - CRD type - definitions -- `cmd/thv-operator/controllers/mcpserverentry_controller.go` - - Validation-only controller - The MCPServerEntry controller is intentionally simple. It performs **validation only** and creates **no infrastructure**. @@ -419,10 +413,6 @@ The MCPGroup controller must be updated to watch MCPServerEntry resources in addition to MCPServer resources, so that `status.servers` and `status.serverCount` reflect both types of backends in the group. 
-**Files to modify:** -- `cmd/thv-operator/controllers/mcpgroup_controller.go` - Add watch for - MCPServerEntry, update status aggregation - ##### Operator: VirtualMCPServer Controller Update **Static mode (`outgoingAuth.source: inline`):** The operator generates @@ -460,12 +450,6 @@ backends: # ... ``` -**Files to modify:** -- `cmd/thv-operator/controllers/virtualmcpserver_controller.go` - Discover - MCPServerEntry resources in group -- `cmd/thv-operator/controllers/virtualmcpserver_vmcpconfig.go` - Include - entry backends in ConfigMap generation - ##### vMCP: Backend Type and Discovery A new `BackendTypeEntry` constant (`"entry"`) is added to @@ -504,12 +488,6 @@ For dynamic mode (`outgoingAuth.source: discovered`), the reconciler infrastructure from THV-0014 must be extended to watch MCPServerEntry resources. -**New files:** -- `pkg/vmcp/k8s/mcpserverentry_watcher.go` - MCPServerEntry reconciler - -**Files to modify:** -- `pkg/vmcp/k8s/manager.go` - Register MCPServerEntry watcher - The `MCPServerEntryWatcher` follows the same reconciler pattern as the existing `MCPServerWatcher` from THV-0014. It holds a reference to the `DynamicRegistry` and the target `groupRef`. On reconciliation: @@ -531,9 +509,6 @@ The static config parser must be updated to deserialize `type: entry` backends from the ConfigMap and create appropriate HTTP clients with external TLS support. -**Files to modify:** -- `pkg/vmcp/config/` - Parse entry-type backends from static config - ## Security Considerations ### Threat Model @@ -738,12 +713,9 @@ MCPServerEntry is a purely additive change: ### Phase 1: CRD and Controller -1. Define `MCPServerEntry` types in - `cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go` -2. Implement validation-only controller in - `cmd/thv-operator/controllers/mcpserverentry_controller.go` -3. Generate CRD manifests (`task operator-generate`, - `task operator-manifests`) +1. Define `MCPServerEntry` CRD types +2. Implement validation-only controller +3. 
Generate CRD manifests 4. Update MCPGroup controller to watch MCPServerEntry resources 5. Add unit tests for controller validation logic @@ -759,9 +731,9 @@ MCPServerEntry is a purely additive change: ### Phase 3: Dynamic Mode Integration -1. Create `MCPServerEntryWatcher` reconciler in `pkg/vmcp/k8s/` +1. Create MCPServerEntry reconciler for vMCP's dynamic registry 2. Register watcher in the K8s manager alongside MCPServerWatcher -3. Update `ListWorkloadsInGroup()` to include MCPServerEntry +3. Update workload discovery to include MCPServerEntry 4. Resolve auth configs for entry backends at runtime 5. Integration tests for dynamic discovery of entry backends From b70b8c958b824c678d2376bd156603dafdefe8f1 Mon Sep 17 00:00:00 2001 From: Juan Antonio Osorio Date: Thu, 12 Mar 2026 12:38:22 +0200 Subject: [PATCH 05/15] Address review feedback on MCPServerEntry RFC - Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy - Fix Alt 1 YAML example to use string form for groupRef - Change caBundleRef to reference ConfigMap (CA certs are public data) - Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate - Clarify auth resolution loads config only, token exchange deferred to request time - Specify CA bundle volume mount for static mode (PEM files, not env vars) - Document toolConfigRef migration path via aggregation.tools[].workload Co-Authored-By: Claude Opus 4.6 --- ...5-mcpserverentry-direct-remote-backends.md | 67 +++++++++++++------ 1 file changed, 48 insertions(+), 19 deletions(-) diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md index 604f9da..511f799 100644 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md @@ -236,7 +236,8 @@ spec: key: api-key # OPTIONAL: Custom CA bundle for private remote servers using - # internal/self-signed certificates. 
+ # internal/self-signed certificates. References a ConfigMap (not Secret) + # because CA certificates are public data. caBundleRef: name: internal-ca-bundle key: ca.crt @@ -323,10 +324,10 @@ URL, Transport, Group, Ready status, and Age. |-------|------|----------|-------------| | `remoteURL` | string | Yes | URL of the remote MCP server. Must match `^https?://`. HTTPS enforced unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | | `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | -| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). | +| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). Uses a plain string (not `LocalObjectReference`) for consistency with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. | | `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. | | `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. | -| `caBundleRef` | object | No | Reference to a Secret containing a custom CA certificate bundle for TLS verification. | +| `caBundleRef` | object | No | Reference to a ConfigMap containing a custom CA certificate bundle for TLS verification. ConfigMap is used rather than Secret because CA certificates are public data, consistent with the `kube-root-ca.crt` pattern. | **Status fields:** @@ -463,8 +464,11 @@ resources. For each MCPServerEntry in the group, vMCP: 1. Lists MCPServerEntry resources filtered by `spec.groupRef`. 2. Converts each entry to an internal `Backend` struct using the entry's `remoteURL`, `transport`, and name. -3. Resolves `externalAuthConfigRef` if set (using existing auth resolution - logic). +3. 
If `externalAuthConfigRef` is set, loads the referenced
+   MCPExternalAuthConfig spec and stores the auth strategy (token exchange
+   endpoint, client credentials reference, audience) in the `Backend`
+   struct. Actual token exchange is deferred to request time because
+   tokens are short-lived and may be per-user.
 4. Resolves `headerForward` configuration if set.
 5. Resolves `caBundleRef` if set (fetching the CA certificate from the
    referenced Secret).
@@ -517,7 +521,7 @@ external TLS support.
 
 |--------|-------------|------------|
 | Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs |
 | Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec |
-| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress |
+| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress. Note: CEL-based IP range blocking (e.g., RFC 1918) is intentionally not applied because MCPServerEntry legitimately targets internal/corporate MCP servers. RBAC is the appropriate control layer since resource creation is restricted to trusted operators. |
 | Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose |
 | Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing |
 
@@ -543,7 +547,8 @@ external TLS support. 
- **At rest**: No sensitive data stored in MCPServerEntry spec. Auth credentials are in K8s Secrets, referenced indirectly. - **CA bundles**: Custom CA certificates referenced via `caBundleRef`, - stored in K8s Secrets/ConfigMaps with standard K8s encryption at rest. + stored in K8s ConfigMaps. CA certificates are public data and do not + require Secret-level protection. ### Input Validation @@ -563,6 +568,17 @@ external TLS support. - **Dynamic mode**: vMCP reads secrets at runtime via K8s API (namespace-scoped RBAC). - **Static mode**: Operator mounts secrets as environment variables. +- **CA bundle propagation** differs from credential secrets because CA + certificates are multi-line PEM data that must be loaded from the + filesystem (Go's `crypto/tls` loads CA bundles via file reads, not + environment variables): + - **Dynamic mode**: vMCP reads the CA bundle data from the K8s API + at runtime (from the ConfigMap referenced by `caBundleRef`). + - **Static mode**: The operator mounts the ConfigMap referenced by + `caBundleRef` as a **volume** into the vMCP pod at a well-known + path (e.g., `/etc/toolhive/ca-bundles//ca.crt`). The + generated backend ConfigMap includes the mount path so vMCP can + construct the `tls.Config` at startup. - Secret rotation follows existing patterns: - **Dynamic mode**: Watch-based propagation, no pod restart needed. - **Static mode**: Requires pod restart (Deployment rollout). @@ -596,8 +612,7 @@ Embed remote server configuration directly in the VirtualMCPServer CRD. ```yaml kind: VirtualMCPServer spec: - groupRef: - name: engineering-team + groupRef: engineering-team remoteServerRefs: - name: context7 remoteURL: https://mcp.context7.com/mcp @@ -724,10 +739,14 @@ MCPServerEntry is a purely additive change: 1. Update VirtualMCPServer controller to discover MCPServerEntry resources in the group 2. Update ConfigMap generation to include entry-type backends -3. Update vMCP static config parser to deserialize entry backends -4. 
Add `BackendTypeEntry` to vMCP types -5. Implement external TLS transport creation for entry backends -6. Integration tests with envtest +3. Mount CA bundle ConfigMaps as volumes into the vMCP pod for entries + that specify `caBundleRef` (at a well-known path such as + `/etc/toolhive/ca-bundles//`) +4. Update vMCP static config parser to deserialize entry backends +5. Add `BackendTypeEntry` to vMCP types +6. Implement external TLS transport creation for entry backends + (loading CA bundles from mounted volume paths) +7. Integration tests with envtest ### Phase 3: Dynamic Mode Integration @@ -819,8 +838,10 @@ MCPServerEntry is a purely additive change: allowing local development workflows. 2. **Should the CRD support custom CA bundles for private remote servers?** - Recommendation: Yes, via `caBundleRef` field referencing a Secret or - ConfigMap. This is essential for enterprises with internal CAs. The + Recommendation: Yes, via `caBundleRef` field referencing a ConfigMap. + CA certificates are public data and ConfigMap is the semantically + appropriate resource type, consistent with the `kube-root-ca.crt` + pattern. This is essential for enterprises with internal CAs. The current design includes this field. 3. **Should there be a `disabled` field for temporarily removing an entry @@ -832,10 +853,18 @@ MCPServerEntry is a purely additive change: 4. **Should MCPServerEntry support `toolConfigRef` for tool filtering?** MCPRemoteProxy supports tool filtering via `toolConfigRef`. VirtualMCPServer also has its own tool filtering/override configuration - in `spec.aggregation.tools`. For MCPServerEntry, tool filtering should - be configured at the VirtualMCPServer level (where it already exists) - rather than duplicating it on the entry. Defer unless there is a clear - use case for entry-level filtering. + in `spec.aggregation.tools`, which supports per-backend filtering via + the `workload` field (e.g., `tools: [{workload: "salesforce", filter: [...]}]`). 
+ For MCPServerEntry, tool filtering should be configured at the + VirtualMCPServer level rather than duplicating it on the entry. + **Migration note:** Users migrating from MCPRemoteProxy who rely on + `toolConfigRef` for per-backend tool filtering should configure + equivalent filtering in `VirtualMCPServer.spec.aggregation.tools` + with the `workload` field set to the MCPServerEntry name. If + post-implementation feedback reveals that `aggregation.tools` is + insufficient for per-backend filtering use cases, `toolConfigRef` + can be added to MCPServerEntry in a follow-up without breaking + changes. ## References From 121b68227ef954d3af65ef1f274c4b238f44bc58 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Wed, 18 Mar 2026 17:08:45 +0000 Subject: [PATCH 06/15] adds unified remote backend rfc Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> --- ...premoteendpoint-unified-remote-backends.md | 774 ++++++++++++++++++ 1 file changed, 774 insertions(+) create mode 100644 rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md new file mode 100644 index 0000000..62db7e3 --- /dev/null +++ b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md @@ -0,0 +1,774 @@ +# RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity + +- **Status**: Draft +- **Author(s)**: @ChrisJBurns, @jaosorior +- **Created**: 2026-03-18 +- **Last Updated**: 2026-03-18 +- **Target Repository**: toolhive +- **Supersedes**: [THV-0055](./THV-0055-mcpserverentry-direct-remote-backends.md) +- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) + +## Summary + +Introduce a new `MCPRemoteEndpoint` CRD that unifies remote MCP server +connectivity under a single resource with two 
explicit modes: + +- **`type: proxy`** — deploys a proxy pod with full auth middleware, authz + policy, and audit logging. Functionally equivalent to `MCPRemoteProxy` and + replaces it. +- **`type: direct`** — no pod deployed; VirtualMCPServer connects directly to + the remote URL. Resolves forced-auth on public remotes + ([#3104](https://github.com/stacklok/toolhive/issues/3104)) and eliminates + unnecessary infrastructure for simple remote backends. + +`MCPRemoteProxy` is deprecated in favour of `MCPRemoteEndpoint` with +`type: proxy`. Existing `MCPRemoteProxy` resources continue to function during +the deprecation window with no immediate migration required. + +## Problem Statement + +### 1. Forced Authentication on Public Remotes (Issue #3104) + +`MCPRemoteProxy` requires OIDC authentication configuration even when +VirtualMCPServer already handles client authentication at its own boundary. +This blocks unauthenticated public remote MCP servers (e.g., context7, public +API gateways) from being placed behind vMCP without configuring unnecessary +auth on the proxy layer. + +### 2. Resource Waste + +Every remote MCP server behind vMCP requires a full Deployment + Service + Pod +just to forward HTTP requests that vMCP could make directly. For organisations +with many remote MCP backends, this creates unnecessary infrastructure cost and +operational overhead. + +### 3. CRD Proliferation and Overlapping Goals + +The original THV-0055 proposed `MCPServerEntry` as a companion resource to +`MCPRemoteProxy`. Both resources would have existed to serve the same +high-level user goal: connecting to a remote MCP server. Both reference a +`remoteURL`, join a `groupRef`, and support `externalAuthConfigRef`. The only +difference is whether a proxy pod is deployed. + +Having two separate CRDs for the same goal — differing only in their mechanism +— increases the API surface users must learn and makes the right choice +non-obvious before writing any YAML. 
The goal (`connect to a remote server`) +should be the abstraction; the mechanism (`via a proxy pod` vs `directly`) +should be a configuration choice within it. + +### Who Is Affected + +- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes +- **Product teams** wanting to register external MCP services behind vMCP +- **Existing `MCPRemoteProxy` users** who will migrate to + `MCPRemoteEndpoint` with `type: proxy` + +## Goals + +- Provide a single, purpose-built CRD for all remote MCP server connectivity +- Enable vMCP to connect directly to remote MCP servers without a proxy pod + for simple use cases +- Allow unauthenticated remote MCP servers behind vMCP without workarounds +- Retain the full feature set of `MCPRemoteProxy` (auth middleware, authz, + audit logging) under `type: proxy` +- Deprecate `MCPRemoteProxy` with a clear migration path +- Reduce long-term CRD surface area rather than growing it + +## Non-Goals + +- **Removing `MCPRemoteProxy` immediately**: It remains functional during the + deprecation window. Removal is a follow-up once adoption of + `MCPRemoteEndpoint` is confirmed. +- **Adding health probing from the operator**: The controller should NOT probe + remote URLs. Health checking belongs in vMCP's existing runtime + infrastructure (`healthCheckInterval`, circuit breaker). +- **Cross-namespace references**: `MCPRemoteEndpoint` follows the same + namespace-scoped patterns as other ToolHive CRDs. +- **Supporting stdio or container-based transports**: `MCPRemoteEndpoint` is + exclusively for remote HTTP-based MCP servers. +- **CLI mode support**: `MCPRemoteEndpoint` is a Kubernetes-only CRD. + +## Proposed Solution + +### High-Level Design + +`MCPRemoteEndpoint` is a single CRD with a `type` discriminator field. Fields +that are shared across both modes sit at the top level. Fields that only apply +to the proxy pod deployment are grouped under `proxyConfig` and are ignored +when `type: direct`. 

```mermaid
graph TB
    subgraph "Client Layer"
        Client[MCP Client]
    end

    subgraph "Virtual MCP Server"
        InAuth[Incoming Auth]
        Router[Request Router]
        AuthMgr[Backend Auth Manager]
    end

    subgraph "MCPRemoteEndpoint: type=proxy"
        ProxyPod[Proxy Pod<br/>OIDC + Authz + Audit]
    end

    subgraph "MCPRemoteEndpoint: type=direct"
        DirectEntry[Config Only<br/>No pods]
    end

    subgraph "External Services"
        Remote1[remote.example.com/mcp]
        Remote2[public-api.example.com/mcp]
    end

    Client -->|Token: aud=vmcp| InAuth
    InAuth --> Router
    Router --> AuthMgr

    AuthMgr -->|Via proxy pod| ProxyPod
    ProxyPod -->|Authenticated HTTPS| Remote1

    AuthMgr -->|Direct HTTPS| Remote2
    DirectEntry -.->|Declares endpoint| Remote2

    style DirectEntry fill:#fff3e0,stroke:#ff9800
    style ProxyPod fill:#e3f2fd,stroke:#2196f3
```

### Mode Comparison

| Capability | `type: proxy` | `type: direct` |
|---|---|---|
| Deploys proxy pod | Yes | No |
| Own OIDC validation | Yes | No (vMCP handles this) |
| Own authz policy | Yes | No |
| Own audit logging | Yes | No (uses vMCP's) |
| Standalone use (without vMCP) | Yes | No |
| Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) |
| Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) |
| Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) |
| Tool filtering | Yes (`toolConfigRef`) | Yes (`toolConfigRef`) |
| GroupRef support | Yes | Yes |

### Auth Flow Comparison

**`type: proxy` — vMCP routes through proxy pod:**

```
Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary]
  -> MCPRemoteEndpoint proxy pod [own OIDC + authz]
     externalAuthConfigRef: proxy-to-remote auth
  -> Remote Server
```

**`type: direct` — vMCP connects directly:**

```
Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary]
  -> MCPRemoteEndpoint: vMCP applies externalAuthConfigRef directly
  -> Remote Server (ONE boundary, ONE auth config)
```

### Detailed Design

#### MCPRemoteEndpoint CRD

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPRemoteEndpoint
metadata:
  name: context7
  namespace: default
spec:
  # REQUIRED: Connectivity mode
  # - proxy: deploy a proxy pod with full auth middleware
  # - direct: no pod; vMCP connects directly to remoteURL
  # 
+kubebuilder:validation:Enum=proxy;direct + # +kubebuilder:default=proxy + type: direct + + # REQUIRED: URL of the remote MCP server + # Must use HTTPS unless toolhive.stacklok.dev/allow-insecure annotation is set + # +kubebuilder:validation:Pattern=`^https?://` + remoteURL: https://mcp.context7.com/mcp + + # REQUIRED: Transport protocol + # +kubebuilder:validation:Enum=streamable-http;sse + transport: streamable-http + + # REQUIRED: Group membership + # An MCPRemoteEndpoint without a group cannot be discovered by any + # VirtualMCPServer. + groupRef: engineering-team + + # OPTIONAL: Auth for outgoing requests to the remote server. + # Applies to both modes: + # proxy: auth from the proxy pod to the remote + # direct: auth from vMCP to the remote + # Omit entirely for unauthenticated public remotes. + externalAuthConfigRef: + name: salesforce-auth + + # OPTIONAL: Header forwarding configuration. + # Applies to both modes. + headerForward: + addPlaintextHeaders: + X-Tenant-ID: "tenant-123" + addHeadersFromSecret: + - headerName: X-API-Key + valueSecretRef: + name: remote-api-credentials + key: api-key + + # OPTIONAL: Custom CA bundle for private remote servers using + # internal or self-signed certificates. References a ConfigMap + # (not Secret) because CA certificates are public data, consistent + # with the kube-root-ca.crt pattern. + # Applies to both modes. + caBundleRef: + name: internal-ca-bundle + key: ca.crt + + # OPTIONAL: Tool filtering. Applies to both modes. + toolConfigRef: + name: my-tool-config + + # OPTIONAL: Proxy pod configuration. + # Only valid when type: proxy. Ignored when type: direct. + # +kubebuilder:validation:XValidation:rule="self.type == 'direct' ? !has(self.proxyConfig) : true" + proxyConfig: + # REQUIRED within proxyConfig: OIDC for validating incoming tokens + oidcConfig: + type: kubernetes + + # OPTIONAL: Authorization policy + authzConfig: + type: inline + inline: + policies: [...] 
+ + # OPTIONAL: Audit logging + audit: + enabled: true + + # OPTIONAL: Observability + telemetry: + openTelemetry: + enabled: true + + # OPTIONAL: Container resource limits + resources: + limits: + cpu: "500m" + memory: "128Mi" + + # OPTIONAL: Service account + serviceAccount: my-service-account + + # OPTIONAL: Port to expose the proxy on + # +kubebuilder:default=8080 + proxyPort: 8080 + + # OPTIONAL: Session affinity for the proxy Service + # +kubebuilder:validation:Enum=ClientIP;None + # +kubebuilder:default=ClientIP + sessionAffinity: ClientIP + + # OPTIONAL: Trust X-Forwarded-* headers from reverse proxies + # +kubebuilder:default=false + trustProxyHeaders: false + + # OPTIONAL: Path prefix for ingress routing scenarios + endpointPrefix: "" + + # OPTIONAL: Metadata overrides for created resources + resourceOverrides: {} +``` + +**Example: Unauthenticated public remote (direct mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: context7 +spec: + type: direct + remoteURL: https://mcp.context7.com/mcp + transport: streamable-http + groupRef: engineering-team + # No externalAuthConfigRef - public endpoint, no auth needed +``` + +**Example: Authenticated remote with token exchange (direct mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: salesforce-mcp +spec: + type: direct + remoteURL: https://mcp.salesforce.com + transport: streamable-http + groupRef: engineering-team + externalAuthConfigRef: + name: salesforce-token-exchange +``` + +**Example: Standalone proxy with full auth middleware (proxy mode):** + +```yaml +apiVersion: toolhive.stacklok.dev/v1alpha1 +kind: MCPRemoteEndpoint +metadata: + name: internal-api-mcp +spec: + type: proxy + remoteURL: https://internal-mcp.corp.example.com/mcp + transport: sse + groupRef: engineering-team + externalAuthConfigRef: + name: internal-api-auth + proxyConfig: + oidcConfig: + type: kubernetes + authzConfig: + 
type: inline + inline: + policies: ["permit(principal, action, resource);"] + audit: + enabled: true +``` + +#### Spec Fields + +**Top-level (both modes):** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. | +| `remoteURL` | string | Yes | URL of the remote MCP server. Must use HTTPS unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | +| `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | +| `groupRef` | string | Yes | Name of the MCPGroup this endpoint belongs to. Plain string, consistent with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. | +| `externalAuthConfigRef` | object | No | Auth for outgoing requests to the remote server. In `proxy` mode, this is proxy→remote auth. In `direct` mode, this is vMCP→remote auth. Omit for unauthenticated endpoints. | +| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type. Applies to both modes. | +| `caBundleRef` | object | No | ConfigMap containing a custom CA certificate bundle for TLS verification. Applies to both modes. | +| `toolConfigRef` | object | No | Tool filtering configuration. Applies to both modes. | + +**`proxyConfig` (only when `type: proxy`):** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `oidcConfig` | object | Yes | OIDC configuration for validating incoming tokens on the proxy pod. | +| `authzConfig` | object | No | Authorization policy for the proxy pod. | +| `audit` | object | No | Audit logging configuration. | +| `telemetry` | object | No | Observability configuration. | +| `resources` | object | No | Container resource requirements. | +| `serviceAccount` | string | No | Existing service account to use. Auto-created if unset. | +| `proxyPort` | int | No | Port to expose the proxy on. Default: 8080. 
| +| `sessionAffinity` | enum | No | `ClientIP` or `None`. Default: `ClientIP`. | +| `trustProxyHeaders` | bool | No | Trust X-Forwarded-* headers. Default: false. | +| `endpointPrefix` | string | No | Path prefix for ingress routing. | +| `resourceOverrides` | object | No | Metadata overrides for created resources. | + +#### Status Fields + +| Field | Type | Description | +|-------|------|-------------| +| `conditions` | []Condition | Standard Kubernetes conditions. | +| `phase` | string | Current phase: `Pending`, `Ready`, `Failed`, `Terminating`. | +| `url` | string | URL where the endpoint can be accessed. For `type: proxy`, the internal cluster URL of the proxy service. For `type: direct`, echoes `spec.remoteURL`. | +| `observedGeneration` | int64 | Most recent generation observed by the controller. | + +**Condition types:** + +| Type | Purpose | When Set | +|------|---------|----------| +| `Ready` | Overall readiness | Always | +| `GroupRefValid` | Referenced MCPGroup exists | Always | +| `AuthConfigValid` | Referenced MCPExternalAuthConfig exists | When `externalAuthConfigRef` is set | +| `CABundleValid` | Referenced CA bundle ConfigMap exists | When `caBundleRef` is set | +| `DeploymentReady` | Proxy deployment is healthy | Only when `type: proxy` | +| `ConfigurationValid` | Spec has passed all validation checks | Always | + +There is intentionally **no `RemoteReachable` condition**. The controller should +NOT probe remote URLs because reachability from the operator pod does not imply +reachability from the vMCP pod, and probing external URLs expands the operator's +attack surface. + +#### Component Changes + +##### Operator: MCPRemoteEndpoint Controller + +The controller has two code paths based on `spec.type`: + +**`type: proxy` path** — identical to the existing `MCPRemoteProxy` controller: +1. Validates spec (OIDC config, group ref, auth config ref, CA bundle ref) +2. Ensures Deployment, Service, ServiceAccount, RBAC +3. 
Monitors deployment health and updates `Ready` condition +4. Sets `status.url` to the internal cluster service URL + +**`type: direct` path** — validation only, no infrastructure created: +1. Validates that the referenced MCPGroup exists. Sets `GroupRefValid` condition. +2. If `externalAuthConfigRef` is set, validates the referenced MCPExternalAuthConfig exists. Sets `AuthConfigValid` condition. +3. If `caBundleRef` is set, validates the referenced ConfigMap exists. Sets `CABundleValid` condition. +4. Validates HTTPS requirement; checks for `toolhive.stacklok.dev/allow-insecure` annotation if `remoteURL` uses HTTP. +5. If all validations pass, sets `Ready` to true and sets `status.url = spec.remoteURL`. + +No finalizers are needed for `type: direct` because no infrastructure is created. +`type: proxy` uses the same finalizer pattern as the existing MCPRemoteProxy controller. + +The controller watches MCPGroup, MCPExternalAuthConfig, and (for `type: proxy`) +Deployment resources via `EnqueueRequestsFromMapFunc` handlers, so that changes +to referenced resources trigger re-reconciliation. + +##### Operator: MCPGroup Controller Update + +The MCPGroup controller currently watches MCPServer and MCPRemoteProxy. It must +be updated to also watch MCPRemoteEndpoint resources, replacing the +MCPRemoteProxy watch once the deprecation window closes. + +`status.remoteProxies` is renamed to `status.remoteEndpoints` and +`status.remoteProxyCount` to `status.remoteEndpointCount` as part of this +change. Both old and new fields are populated during the deprecation window. + +##### Operator: VirtualMCPServer Controller Update + +**Static mode:** The ConfigMap generated by the operator must include +MCPRemoteEndpoint backends. For `type: proxy`, the backend URL is the proxy +service URL (same as MCPRemoteProxy today). For `type: direct`, the backend URL +is `spec.remoteURL`. 
+ +```yaml +backends: + # From MCPServer resources (unchanged) + - name: github-mcp + url: http://github-mcp.default.svc:8080 + transport: sse + type: container + + # From MCPRemoteEndpoint type: proxy + - name: internal-api-mcp + url: http://internal-api-mcp.default.svc:8080 + transport: sse + type: proxy + + # From MCPRemoteEndpoint type: direct + - name: context7 + url: https://mcp.context7.com/mcp + transport: streamable-http + type: direct + # No auth - public endpoint + + - name: salesforce-mcp + url: https://mcp.salesforce.com + transport: streamable-http + type: direct + auth: + type: token_exchange + # ... +``` + +**CA bundle propagation in static mode:** For `type: direct` endpoints with +`caBundleRef`, the operator mounts the referenced ConfigMap as a volume into the +vMCP pod at `/etc/toolhive/ca-bundles//ca.crt`. The generated +backend ConfigMap includes the mount path so vMCP can construct the correct +`tls.Config` at startup. + +##### vMCP: Backend Discovery Update + +`ListWorkloadsInGroup()` in `pkg/vmcp/workloads/k8s.go` is extended to list +MCPRemoteEndpoint resources alongside MCPServer resources. `GetWorkloadAsVMCPBackend()` +gains a new `WorkloadTypeMCPRemoteEndpoint` case that: + +1. Fetches the MCPRemoteEndpoint resource. +2. Converts based on type: + - `type: proxy` — uses `status.url` (the proxy service URL), same as current MCPRemoteProxy handling. + - `type: direct` — uses `spec.remoteURL` directly, resolves auth config, header forwarding, and CA bundle. +3. Applies `externalAuthConfigRef`, `headerForward`, and `caBundleRef` at the + vMCP layer (for `type: direct`) or leaves them for the proxy pod (for `type: proxy`). + +##### vMCP: HTTP Client for Direct Mode + +Backends of `type: direct` connect to external URLs over HTTPS. The vMCP HTTP +client is updated to: + +1. Use the system CA certificate pool by default. +2. Optionally append a custom CA bundle from `caBundleRef` to the system pool. +3. Enforce a minimum TLS version of 1.2. +4. 
Apply `externalAuthConfigRef` credentials to outgoing requests. + +##### vMCP: Dynamic Mode Reconciler Update + +The `BackendReconciler` in `pkg/vmcp/k8s/backend_reconciler.go` currently +watches MCPServer and MCPRemoteProxy. It is extended to also watch +MCPRemoteEndpoint, following the same `EnqueueRequestsFromMapFunc` pattern. +The `fetchBackendResource()` method gains a third resource type to try. + +## Security Considerations + +### Threat Model + +| Threat | Description | Mitigation | +|--------|-------------|------------| +| MITM on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | +| Credential exposure | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecret`; never inline | +| SSRF via remoteURL | Operator configures URL pointing to internal services | RBAC (only authorised users create MCPRemoteEndpoint); HTTPS enforced by default; NetworkPolicy should restrict vMCP pod egress. CEL-based RFC 1918 IP blocking is intentionally omitted because `type: direct` legitimately targets internal/corporate MCP servers — RBAC is the appropriate control layer. | +| Auth config confusion | Wrong tokens sent to wrong endpoints | Eliminated: `externalAuthConfigRef` at the top level has one unambiguous purpose — auth to the remote server | +| Operator probing external URLs | Controller makes network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | +| Expanded vMCP egress surface | vMCP pod makes outbound calls to arbitrary URLs in `type: direct` mode | Acknowledged trade-off. In `type: proxy`, the proxy pod makes outbound calls and vMCP's blast radius is limited. In `type: direct`, the vMCP pod makes outbound calls directly. Mitigated by NetworkPolicy restricting vMCP egress and RBAC restricting who can create MCPRemoteEndpoint resources. 
| + +### Authentication and Authorization + +- **No new auth primitives**: `MCPRemoteEndpoint` reuses the existing + `MCPExternalAuthConfig` CRD and `externalAuthConfigRef` pattern. +- **Single boundary in direct mode**: vMCP's incoming auth validates client + tokens. `externalAuthConfigRef` handles outgoing auth to the remote. Cleanly + separated with no dual-purpose confusion. +- **Full auth stack in proxy mode**: identical to existing MCPRemoteProxy — + OIDC validation, authz policy, token exchange all apply. + +### Secrets Management + +- **Dynamic mode**: vMCP reads secrets at runtime via K8s API (namespace-scoped RBAC). +- **Static mode**: Operator mounts credential secrets as environment variables; + CA bundle ConfigMaps are mounted as volumes at well-known paths + (`/etc/toolhive/ca-bundles//ca.crt`). +- Secret rotation in dynamic mode: watch-based propagation, no pod restart needed. +- Secret rotation in static mode: requires Deployment rollout. + +## Deprecation + +`MCPRemoteProxy` is deprecated as of this RFC. The timeline is: + +1. **Now**: `MCPRemoteProxy` receives a deprecation annotation and emits a + Kubernetes Event warning on creation/update. +2. **v1beta1 graduation**: `MCPRemoteEndpoint` graduates to `v1beta1`. + `MCPRemoteProxy` remains in `v1alpha1` with no further feature development. +3. **Future release (TBD)**: `MCPRemoteProxy` CRD is removed after a minimum + two-release deprecation window. 

### Migration: MCPRemoteProxy → MCPRemoteEndpoint

| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent |
|---|---|
| `spec.remoteURL` | `spec.remoteURL` |
| `spec.transport` | `spec.transport` |
| `spec.groupRef` | `spec.groupRef` |
| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` |
| `spec.headerForward` | `spec.headerForward` |
| `spec.toolConfigRef` | `spec.toolConfigRef` |
| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` |
| `spec.authzConfig` | `spec.proxyConfig.authzConfig` |
| `spec.audit` | `spec.proxyConfig.audit` |
| `spec.telemetry` | `spec.proxyConfig.telemetry` |
| `spec.resources` | `spec.proxyConfig.resources` |
| `spec.serviceAccount` | `spec.proxyConfig.serviceAccount` |
| `spec.proxyPort` | `spec.proxyConfig.proxyPort` |
| `spec.sessionAffinity` | `spec.proxyConfig.sessionAffinity` |
| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` |
| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` |
| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` |

## Alternatives Considered

### Alternative 1: Keep MCPServerEntry Alongside MCPRemoteProxy (THV-0055)

Introduce a new `MCPServerEntry` CRD for `type: direct` behaviour while
retaining `MCPRemoteProxy` unchanged.

**Pros:**
- No breaking changes, no deprecation work
- MCPRemoteProxy remains "focused" on proxy pod management

**Cons:**
- Two CRDs serving the same high-level goal ("connect to a remote server") with
  overlapping fields (`remoteURL`, `groupRef`, `externalAuthConfigRef`,
  `headerForward`)
- Users must choose between them before writing YAML; the distinction is subtle
  and non-obvious
- Grows the long-term CRD surface area
- Requires new controller, new RBAC rules, Helm updates, MCPGroup controller
  changes, BackendReconciler changes, new tests — roughly 20 files

**Why not chosen:** Two CRDs with overlapping goals increase cognitive load
rather than reducing it. 
The goal of connecting to a remote server should be +one resource; the mechanism is a configuration choice within it. + +### Alternative 2: Add `direct: true` Flag to MCPRemoteProxy + +Add a boolean field to MCPRemoteProxy that skips pod creation and uses +`remoteURL` as the backend URL directly. + +**Pros:** +- No new CRD +- Smallest implementation footprint + +**Cons:** +- MCPRemoteProxy has ~9 fields that are pod-deployment-specific + (`proxyPort`, `resources`, `serviceAccount`, `sessionAffinity`, + `trustProxyHeaders`, `endpointPrefix`, `audit`, `telemetry`, + `resourceOverrides`). In `direct: true` mode these are all inapplicable and + confusing +- A resource named "MCPRemoteProxy" that doesn't deploy a proxy is semantically + misleading +- Field pollution is high enough that it would effectively require a v1alpha2 + +**Why not chosen:** Field pollution creates exactly the "which fields apply +when" confusion the RFC aims to avoid. The typed sub-config approach in +`MCPRemoteEndpoint` solves this cleanly. + +### Alternative 3: Inline Remote Backends in VirtualMCPServer + +Embed remote server configuration directly in the VirtualMCPServer spec. + +**Pros:** +- No new CRD needed + +**Cons:** +- Violates separation of concerns +- Prevents fine-grained RBAC: only VirtualMCPServer editors can manage remote backends +- Editing a remote backend requires touching VirtualMCPServer, which triggers + reconciliation of unrelated configuration + +**Why not chosen:** Inconsistent with existing patterns and prevents RBAC +separation. + +## Compatibility + +### Backward Compatibility + +- `MCPRemoteProxy` continues to function during the deprecation window. +- `MCPServer` is unchanged. +- `VirtualMCPServer`, `MCPGroup`, and `MCPExternalAuthConfig` receive additive + changes only (new watches, new status fields populated alongside old ones). +- No migration is required immediately. 
+ +### Forward Compatibility + +- `MCPRemoteEndpoint` starts at `v1alpha1`, on a graduation path to `v1beta1` + as part of the broader CRD revamp. +- The `type` field and typed sub-configs allow future modes to be added without + breaking changes. +- Removing `MCPRemoteProxy` follows the standard two-release deprecation window. + +## Implementation Plan + +### Phase 0: MCPRemoteProxy Deprecation Markers + +1. Add `+kubebuilder:deprecatedversion` annotation to MCPRemoteProxy +2. Emit a Kubernetes Event warning when MCPRemoteProxy is created or updated +3. Update documentation to direct users to MCPRemoteEndpoint + +### Phase 1: CRD and Controller + +1. Define `MCPRemoteEndpoint` CRD types with `type`, shared fields, and `proxyConfig` +2. Implement controller with both code paths (`type: proxy` reusing MCPRemoteProxy + controller logic; `type: direct` validation only) +3. Generate CRD manifests and update Helm chart +4. Update MCPGroup controller to watch MCPRemoteEndpoint +5. Add unit tests for both controller paths + +### Phase 2: Static Mode Integration + +1. Update VirtualMCPServer controller to discover MCPRemoteEndpoint resources +2. Update ConfigMap generation to include both proxy and direct backend types +3. Mount CA bundle ConfigMaps as volumes for `type: direct` endpoints with `caBundleRef` +4. Update vMCP static config parser to deserialise both backend types +5. Implement external TLS transport for `type: direct` backends +6. Integration tests with envtest + +### Phase 3: Dynamic Mode Integration + +1. Extend `BackendReconciler` to watch MCPRemoteEndpoint +2. Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in + `pkg/vmcp/workloads/k8s.go` +3. Register watcher in the K8s manager +4. Integration tests for dynamic discovery + +### Phase 4: Documentation and E2E + +1. CRD reference documentation for MCPRemoteEndpoint +2. Migration guide: MCPRemoteProxy → MCPRemoteEndpoint +3. User guide covering both modes with examples +4. 
E2E Chainsaw tests for full lifecycle (both modes) +5. E2E tests for mixed MCPServer + MCPRemoteEndpoint groups + +### Dependencies + +- THV-0014 (K8s-Aware vMCP) for dynamic mode support (Phase 3) +- Broader CRD revamp / v1beta1 graduation work + +## Testing Strategy + +### Unit Tests + +- Controller validation for both modes +- CEL validation rules (proxyConfig rejected when type: direct) +- CRD type serialisation/deserialisation +- Backend conversion for both types +- External TLS transport creation with and without custom CA bundles +- Static config parsing for both backend types + +### Integration Tests (envtest) + +- MCPRemoteEndpoint controller reconciliation for `type: proxy` +- MCPRemoteEndpoint controller reconciliation for `type: direct` +- VirtualMCPServer ConfigMap generation with both types +- MCPGroup status update with MCPRemoteEndpoint members +- Dynamic mode: BackendReconciler handling MCPRemoteEndpoint +- MCPRemoteProxy deprecation warning event emission + +### End-to-End Tests (Chainsaw) + +- Full lifecycle: `type: proxy` (functional parity with MCPRemoteProxy) +- Full lifecycle: `type: direct` with unauthenticated public remote +- Full lifecycle: `type: direct` with token exchange auth +- Mixed group: MCPServer + MCPRemoteEndpoint (both types) in same group +- MCPRemoteEndpoint deletion removes backend from vMCP +- CA bundle configuration for private remotes + +## Open Questions + +1. **Should `groupRef` be required on MCPRemoteEndpoint?** + Recommendation: Yes, consistent with the reasoning that an endpoint without + a group is unreachable. As a follow-up, consider making `groupRef` required + on MCPServer and MCPRemoteProxy too for consistency. + +2. **When should MCPRemoteProxy be removed?** + Recommendation: After two releases post-MCPRemoteEndpoint GA. Track as a + separate issue once Phase 1 is merged. + +3. **Should `toolConfigRef` be in the shared fields or mode-specific?** + Recommendation: Shared top-level field. 
Tool filtering applies equally to + both modes and is already supported in `VirtualMCPServer.spec.aggregation.tools` + as a fallback. + +4. **Should there be a `disabled` field?** + Recommendation: Defer. Users can remove the resource or change `groupRef` + (which, unlike the original MCPServerEntry proposal, is not required to be + non-empty once the resource is created — removal from the group is a valid + operation). Revisit based on post-implementation feedback. + +## References + +- [THV-0055: MCPServerEntry CRD](./THV-0055-mcpserverentry-direct-remote-backends.md) — + superseded by this RFC +- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) +- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) +- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) +- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) +- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) +- [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) +- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) +- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) + +--- + +## RFC Lifecycle + +| Date | Reviewer | Decision | Notes | +|------|----------|----------|-------| +| 2026-03-18 | @ChrisJBurns, @jaosorior | Draft | Initial submission, supersedes THV-0055 | From c2965b7a1a36d5d1c28694ccab30a663cd2094a7 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Wed, 18 Mar 2026 17:09:03 +0000 Subject: [PATCH 07/15] removes direct backend rfc Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> --- ...5-mcpserverentry-direct-remote-backends.md | 904 ------------------ 1 file changed, 904 deletions(-) delete mode 100644 rfcs/THV-0055-mcpserverentry-direct-remote-backends.md diff --git a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md 
b/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md deleted file mode 100644 index 511f799..0000000 --- a/rfcs/THV-0055-mcpserverentry-direct-remote-backends.md +++ /dev/null @@ -1,904 +0,0 @@ -# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends - -- **Status**: Draft -- **Author(s)**: Juan Antonio Osorio (@jaosorior) -- **Created**: 2026-03-12 -- **Last Updated**: 2026-03-12 -- **Target Repository**: toolhive -- **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) - -## Summary - -Introduce a new `MCPServerEntry` CRD (short name: `mcpentry`) that allows -VirtualMCPServer to connect directly to remote MCP servers without deploying -MCPRemoteProxy infrastructure. MCPServerEntry is a lightweight, pod-less -configuration resource that declares a remote MCP endpoint and belongs to an -MCPGroup, enabling vMCP to reach remote servers with a single auth boundary -and zero additional pods. - -## Problem Statement - -vMCP currently relies on MCPRemoteProxy (which spawns `thv-proxyrunner` pods) -to reach remote MCP servers. This architecture creates three concrete problems: - -### 1. Forced Authentication on Public Remotes (Issue #3104) - -MCPRemoteProxy requires OIDC authentication configuration even when vMCP -already handles client authentication at its own boundary. This blocks -unauthenticated public remote MCP servers (e.g., context7, public API -gateways) from being placed behind vMCP without configuring unnecessary -auth on the proxy layer. - -### 2. Dual Auth Boundary Confusion (Issue #4109) - -MCPRemoteProxy's single `externalAuthConfigRef` field is used for both the -vMCP-to-proxy boundary AND the proxy-to-remote boundary. 
When vMCP needs -to authenticate to the remote server through the proxy, token exchange -becomes circular or broken because the same auth config serves two -conflicting purposes: - -``` -Client -> vMCP [boundary 1: client auth] - -> MCPRemoteProxy [boundary 2: vMCP auth + remote auth on SAME config] - -> Remote Server -``` - -The operator cannot express "use auth X for the proxy and auth Y for the -remote" because there is only one `externalAuthConfigRef`. - -### 3. Resource Waste - -Every remote MCP server behind vMCP requires a full Deployment + Service + -Pod just to make an HTTP call that vMCP could make directly. For -organizations with many remote MCP backends, this creates unnecessary -infrastructure cost and operational overhead. - -### Who Is Affected - -- **Platform teams** deploying vMCP with remote MCP backends in Kubernetes -- **Product teams** wanting to register external MCP services behind vMCP -- **Organizations** running public or unauthenticated remote MCP servers - behind vMCP for aggregation - -## Goals - -- Enable vMCP to connect directly to remote MCP servers without - MCPRemoteProxy in the path -- Eliminate the dual auth boundary confusion by providing a single, - unambiguous auth config for the vMCP-to-remote boundary -- Allow unauthenticated remote MCP servers behind vMCP without workarounds -- Deploy zero additional infrastructure (no pods, services, or deployments) - for remote backend declarations -- Follow existing Kubernetes patterns (groupRef, externalAuthConfigRef) - consistent with MCPServer - -## Non-Goals - -- **Deprecating MCPRemoteProxy**: MCPRemoteProxy remains valuable for - standalone proxy use cases with its own auth middleware, audit logging, - and observability. MCPServerEntry is specifically for "behind vMCP" use - cases. -- **Adding health probing from the operator**: The operator controller - should NOT probe remote URLs. 
Reachability from the operator pod does not - imply reachability from the vMCP pod, and probing expands the operator's - attack surface. Health checking belongs in vMCP's existing runtime - infrastructure (`healthCheckInterval`, circuit breaker). -- **Cross-namespace references**: MCPServerEntry follows the same - namespace-scoped patterns as other ToolHive CRDs. -- **Supporting stdio or container-based transports**: MCPServerEntry is - exclusively for remote HTTP-based MCP servers. -- **CLI mode support**: MCPServerEntry is a Kubernetes-only CRD. CLI mode - already supports remote backends via direct configuration. - -## Proposed Solution - -### High-Level Design - -Introduce a new `MCPServerEntry` CRD that acts as a catalog entry for a -remote MCP endpoint. The naming follows the Istio `ServiceEntry` pattern, -communicating "this is a catalog entry, not an active workload." - -```mermaid -graph TB - subgraph "Client Layer" - Client[MCP Client] - end - - subgraph "Virtual MCP Server" - InAuth[Incoming Auth
Validates: aud=vmcp] - Router[Request Router] - AuthMgr[Backend Auth Manager] - end - - subgraph "Backend Layer (In-Cluster)" - MCPServer1[MCPServer: github-mcp
Pod + Service] - MCPServer2[MCPServer: jira-mcp
Pod + Service] - end - - subgraph "Backend Layer (Remote)" - Entry1[MCPServerEntry: context7
No pods - config only] - Entry2[MCPServerEntry: salesforce
No pods - config only] - end - - subgraph "External Services" - Remote1[context7.com/mcp] - Remote2[mcp.salesforce.com] - end - - Client -->|Token: aud=vmcp| InAuth - InAuth --> Router - Router --> AuthMgr - - AuthMgr -->|In-cluster call| MCPServer1 - AuthMgr -->|In-cluster call| MCPServer2 - AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote1 - AuthMgr -->|Direct HTTPS
+ externalAuthConfig| Remote2 - - Entry1 -.->|Declares endpoint| Remote1 - Entry2 -.->|Declares endpoint| Remote2 - - style Entry1 fill:#fff3e0,stroke:#ff9800 - style Entry2 fill:#fff3e0,stroke:#ff9800 - style MCPServer1 fill:#e3f2fd,stroke:#2196f3 - style MCPServer2 fill:#e3f2fd,stroke:#2196f3 -``` - -The key insight is that MCPServerEntry deploys **no infrastructure**. It is -pure configuration that tells vMCP "there is a remote MCP server at this -URL, use this auth to reach it." VirtualMCPServer discovers MCPServerEntry -resources the same way it discovers MCPServer resources: via `groupRef`. - -### Auth Flow Comparison - -**Current (with MCPRemoteProxy) - Two boundaries, one config:** - -``` -Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> MCPRemoteProxy [deploys pod] - externalAuthConfigRef used for BOTH: - - vMCP-to-proxy auth (boundary 2a) - - proxy-to-remote auth (boundary 2b) - -> Remote Server -``` - -**Proposed (with MCPServerEntry) - One clean boundary:** - -``` -Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> MCPServerEntry: vMCP applies externalAuthConfigRef directly - -> Remote Server - (ONE boundary, ONE auth config, no confusion) -``` - -```mermaid -sequenceDiagram - participant Client - participant vMCP as Virtual MCP Server - participant IDP as Identity Provider - participant Remote as Remote MCP Server - - Client->>vMCP: MCP Request
Authorization: Bearer token (aud=vmcp) - - Note over vMCP: Validate incoming token
(existing auth middleware) - - Note over vMCP: Look up MCPServerEntry
for target backend - - alt externalAuthConfigRef is set - vMCP->>IDP: Token exchange request
(per MCPExternalAuthConfig) - IDP-->>vMCP: Exchanged token (aud=remote-api) - vMCP->>Remote: Forward request
Authorization: Bearer exchanged-token - else No auth configured (public remote) - vMCP->>Remote: Forward request
(no Authorization header) - end - - Remote-->>vMCP: MCP Response - vMCP-->>Client: Response -``` - -### Detailed Design - -#### MCPServerEntry CRD - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPServerEntry -metadata: - name: context7 - namespace: default -spec: - # REQUIRED: URL of the remote MCP server - remoteURL: https://mcp.context7.com/mcp - - # REQUIRED: Transport protocol - # +kubebuilder:validation:Enum=streamable-http;sse - transport: streamable-http - - # REQUIRED: Group membership (unlike MCPServer where it's optional) - # An MCPServerEntry without a group is dead config - it cannot be - # discovered by any VirtualMCPServer. - groupRef: engineering-team - - # OPTIONAL: Auth configuration for reaching the remote server. - # Omit entirely for unauthenticated public remotes (resolves #3104). - # Single unambiguous purpose: auth to the remote (resolves #4109). - externalAuthConfigRef: - name: salesforce-auth - - # OPTIONAL: Header forwarding configuration. - # Reuses existing pattern from MCPRemoteProxy (THV-0026). - headerForward: - addPlaintextHeaders: - X-Tenant-ID: "tenant-123" - addHeadersFromSecrets: - - headerName: X-API-Key - valueSecretRef: - name: remote-api-credentials - key: api-key - - # OPTIONAL: Custom CA bundle for private remote servers using - # internal/self-signed certificates. References a ConfigMap (not Secret) - # because CA certificates are public data. 
- caBundleRef: - name: internal-ca-bundle - key: ca.crt -``` - -**Example: Unauthenticated public remote (resolves #3104):** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPServerEntry -metadata: - name: context7 -spec: - remoteURL: https://mcp.context7.com/mcp - transport: streamable-http - groupRef: engineering-team - # No externalAuthConfigRef - public endpoint, no auth needed -``` - -**Example: Authenticated remote with token exchange:** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPServerEntry -metadata: - name: salesforce-mcp -spec: - remoteURL: https://mcp.salesforce.com - transport: streamable-http - groupRef: engineering-team - externalAuthConfigRef: - name: salesforce-token-exchange ---- -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPExternalAuthConfig -metadata: - name: salesforce-token-exchange -spec: - type: tokenExchange - tokenExchange: - tokenUrl: https://keycloak.company.com/realms/myrealm/protocol/openid-connect/token - clientId: salesforce-exchange - clientSecretRef: - name: salesforce-oauth - key: client-secret - audience: mcp.salesforce.com - scopes: ["mcp:read", "mcp:write"] -``` - -**Example: Remote with static header auth:** - -```yaml -apiVersion: toolhive.stacklok.dev/v1alpha1 -kind: MCPServerEntry -metadata: - name: internal-api-mcp -spec: - remoteURL: https://internal-mcp.corp.example.com/mcp - transport: sse - groupRef: engineering-team - headerForward: - addHeadersFromSecrets: - - headerName: Authorization - valueSecretRef: - name: internal-api-token - key: bearer-token - caBundleRef: - name: corp-ca-bundle - key: ca.crt -``` - -#### CRD Type Definitions - -The `MCPServerEntry` CRD type is defined in -`cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go`. It follows the -standard kubebuilder pattern with `Spec` and `Status` subresources. - -The resource uses the short name `mcpentry` and exposes print columns for -URL, Transport, Group, Ready status, and Age. 
- -**Spec fields:** - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `remoteURL` | string | Yes | URL of the remote MCP server. Must match `^https?://`. HTTPS enforced unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | -| `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | -| `groupRef` | string | Yes | Name of the MCPGroup this entry belongs to (min length: 1). Uses a plain string (not `LocalObjectReference`) for consistency with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. | -| `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. | -| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. | -| `caBundleRef` | object | No | Reference to a ConfigMap containing a custom CA certificate bundle for TLS verification. ConfigMap is used rather than Secret because CA certificates are public data, consistent with the `kube-root-ca.crt` pattern. | - -**Status fields:** - -| Field | Type | Description | -|-------|------|-------------| -| `conditions` | []Condition | Standard Kubernetes conditions (see table below). | -| `observedGeneration` | int64 | Most recent generation observed by the controller. | - -**Condition types:** - -| Type | Purpose | When Set | -|------|---------|----------| -| `Ready` | Overall readiness | Always | -| `GroupRefValid` | Referenced MCPGroup exists | Always | -| `AuthConfigValid` | Referenced MCPExternalAuthConfig exists | Only when `externalAuthConfigRef` is set | -| `CABundleValid` | Referenced CA bundle exists | Only when `caBundleRef` is set | - -There is intentionally **no `RemoteReachable` condition**. The controller -should NOT probe remote URLs because: - -1. 
Reachability from the operator pod does not imply reachability from the - vMCP pod (different network policies, egress rules, DNS resolution). -2. Probing external URLs from the operator expands its attack surface and - requires egress network access it may not have. -3. It gives false confidence: a probe succeeding now doesn't mean it will - succeed when vMCP makes the actual request. -4. vMCP already has health checking infrastructure (`healthCheckInterval`, - circuit breaker) that operates at the right layer. - -#### Status Example - -```yaml -status: - conditions: - - type: Ready - status: "True" - reason: ValidationSucceeded - message: "MCPServerEntry is valid and ready for discovery" - lastTransitionTime: "2026-03-12T10:00:00Z" - - type: GroupRefValid - status: "True" - reason: GroupExists - message: "MCPGroup 'engineering-team' exists" - lastTransitionTime: "2026-03-12T10:00:00Z" - - type: AuthConfigValid - status: "True" - reason: AuthConfigExists - message: "MCPExternalAuthConfig 'salesforce-auth' exists" - lastTransitionTime: "2026-03-12T10:00:00Z" - observedGeneration: 1 -``` - -#### Component Changes - -##### Operator: New CRD and Controller - -The MCPServerEntry controller is intentionally simple. It performs -**validation only** and creates **no infrastructure**. - -The reconciliation logic: - -1. Fetches the MCPServerEntry resource (ignores not-found for deletions). -2. Validates that the referenced MCPGroup exists in the same namespace. - Sets `GroupRefValid` condition accordingly. -3. If `externalAuthConfigRef` is set, validates that the referenced - MCPExternalAuthConfig exists. Sets `AuthConfigValid` condition. -4. Validates the HTTPS requirement: if `remoteURL` does not use HTTPS, - the controller checks for the `toolhive.stacklok.dev/allow-insecure` - annotation. Without it, the `Ready` condition is set to false with - reason `InsecureURL`. -5. If all validations pass, sets `Ready` to true with reason - `ValidationSucceeded`. 
- -The controller watches MCPGroup and MCPExternalAuthConfig resources via -`EnqueueRequestsFromMapFunc` handlers, so that changes to referenced -resources trigger re-validation of affected MCPServerEntry resources. - -No finalizers are needed because MCPServerEntry creates no infrastructure -to clean up. - -##### Operator: MCPGroup Controller Update - -The MCPGroup controller must be updated to watch MCPServerEntry resources -in addition to MCPServer resources, so that `status.servers` and -`status.serverCount` reflect both types of backends in the group. - -##### Operator: VirtualMCPServer Controller Update - -**Static mode (`outgoingAuth.source: inline`):** The operator generates -the ConfigMap that vMCP reads at startup. This ConfigMap must now include -MCPServerEntry backends alongside MCPServer backends. - -The controller discovers MCPServerEntry resources in the group and -serializes them as remote backend entries in the ConfigMap: - -```yaml -# Generated ConfigMap content -backends: - # From MCPServer resources (existing) - - name: github-mcp - url: http://github-mcp.default.svc:8080 - transport: sse - type: container - auth: - type: token_exchange - # ... - - # From MCPServerEntry resources (new) - - name: context7 - url: https://mcp.context7.com/mcp - transport: streamable-http - type: entry # New backend type - # No auth - public endpoint - - - name: salesforce-mcp - url: https://mcp.salesforce.com - transport: streamable-http - type: entry - auth: - type: token_exchange - # ... -``` - -##### vMCP: Backend Type and Discovery - -A new `BackendTypeEntry` constant (`"entry"`) is added to -`pkg/vmcp/types.go` alongside the existing `BackendTypeContainer` and -`BackendTypeProxy`. - -The `ListWorkloadsInGroup()` function in `pkg/vmcp/workloads/k8s.go` is -extended to discover MCPServerEntry resources in addition to MCPServer -resources. For each MCPServerEntry in the group, vMCP: - -1. Lists MCPServerEntry resources filtered by `spec.groupRef`. -2. 
Converts each entry to an internal `Backend` struct using the entry's - `remoteURL`, `transport`, and name. -3. If `externalAuthConfigRef` is set, loads the referenced - MCPExternalAuthConfig spec and stores the auth strategy (token exchange - endpoint, client credentials reference, audience) in the `Backend` - struct. Actual token exchange is deferred to request time because - tokens are short-lived and may be per-user. -4. Resolves `headerForward` configuration if set. -5. Resolves `caBundleRef` if set (fetching the CA certificate from the - referenced Secret). -6. Appends the resulting backends alongside MCPServer-sourced backends. - -##### vMCP: HTTP Client for External TLS - -Backends of type `entry` connect to external URLs over HTTPS. The vMCP -HTTP client in `pkg/vmcp/client/client.go` must be updated to: - -1. Use the system CA certificate pool by default (for public CAs). -2. Optionally append a custom CA bundle from `caBundleRef` (for private - CAs) to the system pool. -3. Enforce a minimum TLS version of 1.2. -4. Apply the resolved `externalAuthConfigRef` credentials directly to - outgoing requests. - -##### vMCP: Dynamic Mode Reconciler Update - -For dynamic mode (`outgoingAuth.source: discovered`), the reconciler -infrastructure from THV-0014 must be extended to watch MCPServerEntry -resources. - -The `MCPServerEntryWatcher` follows the same reconciler pattern as the -existing `MCPServerWatcher` from THV-0014. It holds a reference to the -`DynamicRegistry` and the target `groupRef`. On reconciliation: - -1. If the resource is deleted (not found), it removes the backend from the - registry by namespaced name. -2. If the entry's `groupRef` doesn't match the watcher's group, it removes - the backend (handles group reassignment). -3. Otherwise, it converts the MCPServerEntry to a `Backend` struct - (resolving auth, headers, CA bundle) and upserts it into the registry. 
- -The watcher also watches MCPExternalAuthConfig and Secret resources via -`EnqueueRequestsFromMapFunc` handlers, so changes to referenced auth -configs or secrets trigger re-reconciliation of affected entries. - -##### vMCP: Static Config Parser Update - -The static config parser must be updated to deserialize `type: entry` -backends from the ConfigMap and create appropriate HTTP clients with -external TLS support. - -## Security Considerations - -### Threat Model - -| Threat | Description | Mitigation | -|--------|-------------|------------| -| Man-in-the-middle on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | -| Credential exposure in CRD spec | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecrets`; never inline in CRD spec | -| SSRF via remoteURL | Operator configures URL pointing to internal services | Mitigated by RBAC (only authorized users create MCPServerEntry); annotation required for non-HTTPS; NetworkPolicy should restrict vMCP egress. Note: CEL-based IP range blocking (e.g., RFC 1918) is intentionally not applied because MCPServerEntry legitimately targets internal/corporate MCP servers. RBAC is the appropriate control layer since resource creation is restricted to trusted operators. | -| Auth config confusion (existing issue) | Dual-boundary auth leading to wrong tokens sent to wrong endpoints | Eliminated: MCPServerEntry has exactly one auth boundary with one purpose | -| Operator probing external URLs | Controller making network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | - -### Authentication and Authorization - -- **No new auth primitives**: MCPServerEntry reuses the existing - `MCPExternalAuthConfig` CRD and `externalAuthConfigRef` pattern. -- **Single boundary**: vMCP's incoming auth validates client tokens. 
- MCPServerEntry's `externalAuthConfigRef` handles outgoing auth to - the remote. These are cleanly separated. -- **RBAC**: Standard Kubernetes RBAC controls who can create/modify - MCPServerEntry resources. This enables fine-grained access: platform - teams manage VirtualMCPServer, product teams register MCPServerEntry - backends. -- **No privilege escalation**: MCPServerEntry grants no additional - permissions beyond what the referenced MCPExternalAuthConfig already - provides. - -### Data Security - -- **In transit**: HTTPS required for remote connections (with annotation - escape hatch for development). -- **At rest**: No sensitive data stored in MCPServerEntry spec. Auth - credentials are in K8s Secrets, referenced indirectly. -- **CA bundles**: Custom CA certificates referenced via `caBundleRef`, - stored in K8s ConfigMaps. CA certificates are public data and do not - require Secret-level protection. - -### Input Validation - -- **remoteURL**: Must match `^https?://` pattern. HTTPS enforced unless - annotation override. Validated by both CRD CEL rules and controller - reconciliation. -- **transport**: Enum validation (`streamable-http` or `sse`). -- **groupRef**: Required, validated to reference an existing MCPGroup. -- **externalAuthConfigRef**: When set, validated to reference an existing - MCPExternalAuthConfig. -- **headerForward**: Uses the same restricted header blocklist and - validation as MCPRemoteProxy (THV-0026). - -### Secrets Management - -- MCPServerEntry follows the same secret access patterns as MCPServer: - - **Dynamic mode**: vMCP reads secrets at runtime via K8s API - (namespace-scoped RBAC). - - **Static mode**: Operator mounts secrets as environment variables. 
-- **CA bundle propagation** differs from credential secrets because CA - certificates are multi-line PEM data that must be loaded from the - filesystem (Go's `crypto/tls` loads CA bundles via file reads, not - environment variables): - - **Dynamic mode**: vMCP reads the CA bundle data from the K8s API - at runtime (from the ConfigMap referenced by `caBundleRef`). - - **Static mode**: The operator mounts the ConfigMap referenced by - `caBundleRef` as a **volume** into the vMCP pod at a well-known - path (e.g., `/etc/toolhive/ca-bundles//ca.crt`). The - generated backend ConfigMap includes the mount path so vMCP can - construct the `tls.Config` at startup. -- Secret rotation follows existing patterns: - - **Dynamic mode**: Watch-based propagation, no pod restart needed. - - **Static mode**: Requires pod restart (Deployment rollout). - -### Audit and Logging - -- vMCP's existing audit middleware logs all requests routed to - MCPServerEntry backends, including user identity and target tool. -- The operator controller logs validation results (group existence, - auth config existence) at standard log levels. -- No sensitive data (URLs with credentials, auth tokens) is logged. - -### Mitigations - -1. **HTTPS enforcement**: Default requires HTTPS; annotation override - requires explicit operator action. -2. **No network probing**: Controller never connects to remote URLs. -3. **Single auth boundary**: Eliminates dual-boundary confusion. -4. **Existing patterns**: Reuses battle-tested secret access, RBAC, - and auth patterns from MCPServer. -5. **NetworkPolicy recommendation**: Documentation recommends restricting - vMCP pod egress to known remote endpoints. -6. **No new attack surface**: Zero additional pods deployed. - -## Alternatives Considered - -### Alternative 1: Add `remoteServerRefs` to VirtualMCPServer Spec - -Embed remote server configuration directly in the VirtualMCPServer CRD. 
- -```yaml -kind: VirtualMCPServer -spec: - groupRef: engineering-team - remoteServerRefs: - - name: context7 - remoteURL: https://mcp.context7.com/mcp - transport: streamable-http - - name: salesforce - remoteURL: https://mcp.salesforce.com - transport: streamable-http - externalAuthConfigRef: - name: salesforce-auth -``` - -**Pros:** -- No new CRD needed -- Simple for small deployments - -**Cons:** -- Violates separation of concerns: VirtualMCPServer manages aggregation, - not backend declaration -- Breaks the `groupRef` discovery pattern: some backends discovered via - group, others embedded inline -- Bloats VirtualMCPServer spec -- Prevents independent lifecycle management: adding/removing a remote - backend requires editing the VirtualMCPServer, which may trigger - reconciliation of unrelated configuration -- Prevents fine-grained RBAC: only VirtualMCPServer editors can manage - remote backends - -**Why not chosen:** Inconsistent with existing patterns and prevents the -RBAC separation that makes MCPServerEntry valuable (platform teams manage -vMCP, product teams register backends). - -### Alternative 2: Extend MCPServer with Remote Mode - -Add a `mode: remote` field to the existing MCPServer CRD. - -```yaml -kind: MCPServer -spec: - mode: remote - remoteURL: https://mcp.context7.com/mcp - transport: streamable-http - groupRef: engineering-team -``` - -**Pros:** -- No new CRD -- Reuses existing MCPServer controller infrastructure - -**Cons:** -- MCPServer is fundamentally a container workload resource. Adding a - "don't deploy anything" mode creates confusing semantics: `spec.image` - becomes optional, `spec.resources` is meaningless, status conditions - designed for pod lifecycle don't apply. -- Controller logic becomes complex with conditional paths for - container vs remote modes. -- Existing MCPServer watchers (MCPGroup controller, VirtualMCPServer - controller) would need to handle both modes, adding complexity. 
-- The controller currently creates Deployments, Services, and ConfigMaps. - Adding a mode that creates none of these is a significant semantic - change. - -**Why not chosen:** Overloading MCPServer with remote-mode semantics -increases complexity and confusion. A separate CRD with clear "this is -configuration only" semantics is cleaner. - -### Alternative 3: Configure Remote Backends Only in vMCP Config - -Handle remote backends entirely in vMCP's configuration (ConfigMap or -runtime discovery) without a CRD. - -**Pros:** -- No CRD changes needed -- Simpler operator - -**Cons:** -- No Kubernetes-native resource to represent remote backends -- No status reporting, no `kubectl get` visibility -- No RBAC for who can manage remote backends -- Breaks the pattern where all backends are discoverable via `groupRef` -- MCPGroup status cannot reflect remote backends - -**Why not chosen:** Loses Kubernetes-native management, visibility, and -access control. - -## Compatibility - -### Backward Compatibility - -MCPServerEntry is a purely additive change: - -- **No changes to existing CRDs**: MCPServer, MCPRemoteProxy, - VirtualMCPServer, MCPGroup, and MCPExternalAuthConfig are unchanged. -- **No changes to existing behavior**: VirtualMCPServer continues to - discover MCPServer resources via `groupRef`. MCPServerEntry adds a - new discovery source alongside the existing one. -- **MCPRemoteProxy still works**: Organizations using MCPRemoteProxy - can continue to do so. MCPServerEntry is an alternative, not a - replacement. -- **No migration required**: Existing deployments work without - modification after the upgrade. - -### Forward Compatibility - -- **Extensibility**: The `MCPServerEntrySpec` can be extended with - additional fields (e.g., rate limiting, tool filtering) without - breaking changes. -- **API versioning**: Starts at `v1alpha1`, consistent with all other - ToolHive CRDs. 
-- **Future deprecation path**: If MCPRemoteProxy use cases are eventually - subsumed, MCPServerEntry provides a clean migration target. - -## Implementation Plan - -### Phase 1: CRD and Controller - -1. Define `MCPServerEntry` CRD types -2. Implement validation-only controller -3. Generate CRD manifests -4. Update MCPGroup controller to watch MCPServerEntry resources -5. Add unit tests for controller validation logic - -### Phase 2: Static Mode Integration - -1. Update VirtualMCPServer controller to discover MCPServerEntry resources - in the group -2. Update ConfigMap generation to include entry-type backends -3. Mount CA bundle ConfigMaps as volumes into the vMCP pod for entries - that specify `caBundleRef` (at a well-known path such as - `/etc/toolhive/ca-bundles//`) -4. Update vMCP static config parser to deserialize entry backends -5. Add `BackendTypeEntry` to vMCP types -6. Implement external TLS transport creation for entry backends - (loading CA bundles from mounted volume paths) -7. Integration tests with envtest - -### Phase 3: Dynamic Mode Integration - -1. Create MCPServerEntry reconciler for vMCP's dynamic registry -2. Register watcher in the K8s manager alongside MCPServerWatcher -3. Update workload discovery to include MCPServerEntry -4. Resolve auth configs for entry backends at runtime -5. Integration tests for dynamic discovery of entry backends - -### Phase 4: Documentation and E2E - -1. CRD reference documentation -2. User guide with examples (public remote, authenticated remote, - private CA) -3. MCPRemoteProxy vs MCPServerEntry comparison guide -4. E2E Chainsaw tests for full lifecycle -5. 
E2E tests for mixed MCPServer + MCPServerEntry groups - -### Dependencies - -- THV-0014 (K8s-Aware vMCP) for dynamic mode support -- THV-0026 (Header Passthrough) for `headerForward` field reuse -- Existing MCPExternalAuthConfig CRD for auth configuration - -## Testing Strategy - -### Unit Tests - -- Controller validation: groupRef exists, authConfigRef exists, HTTPS - enforcement, annotation override -- CRD type serialization/deserialization -- Backend conversion from MCPServerEntry to internal Backend struct -- External TLS transport creation with and without custom CA bundles -- Static config parsing with entry-type backends - -### Integration Tests (envtest) - -- MCPServerEntry controller reconciliation with real API server -- VirtualMCPServer ConfigMap generation including entry backends -- MCPGroup status update with mixed MCPServer + MCPServerEntry members -- Dynamic mode: MCPServerEntry watcher reconciliation -- Auth config resolution for entry backends -- Secret change propagation to entry backends - -### End-to-End Tests (Chainsaw) - -- Full lifecycle: create MCPGroup, create MCPServerEntry, create - VirtualMCPServer, verify vMCP routes to remote backend -- Mixed group: MCPServer (container) + MCPServerEntry (remote) in same - group -- Unauthenticated public remote behind vMCP -- Authenticated remote with token exchange -- MCPServerEntry deletion removes backend from vMCP -- CA bundle configuration for private remotes - -### Security Tests - -- Verify HTTPS enforcement (HTTP URL without annotation is rejected) -- Verify RBAC separation (entry creation requires correct permissions) -- Verify no network probing from controller -- Verify secret values are not logged - -## Documentation - -- **CRD Reference**: Auto-generated CRD documentation for MCPServerEntry - fields, validation rules, and status conditions -- **User Guide**: How to add remote MCP backends to vMCP using - MCPServerEntry, with examples for common scenarios -- **Comparison Guide**: When to 
use MCPRemoteProxy vs MCPServerEntry: - - | Feature | MCPRemoteProxy | MCPServerEntry | - |---------|---------------|----------------| - | Deploys pods | Yes (proxy pod) | No | - | Own auth middleware | Yes (oidcConfig, authzConfig) | No | - | Own audit logging | Yes | No (uses vMCP's) | - | Standalone use | Yes | No (only via VirtualMCPServer) | - | GroupRef support | Yes (optional) | Yes (required) | - | Primary use case | Standalone proxy with full observability | Backend declaration for vMCP | - -- **Architecture Documentation**: Update `docs/arch/10-virtual-mcp-architecture.md` - to describe MCPServerEntry as a backend type - -## Open Questions - -1. **Should `remoteURL` strictly require HTTPS?** - Recommendation: Yes, with annotation override - (`toolhive.stacklok.dev/allow-insecure: "true"`) for development. - This prevents accidental plaintext credential transmission while - allowing local development workflows. - -2. **Should the CRD support custom CA bundles for private remote servers?** - Recommendation: Yes, via `caBundleRef` field referencing a ConfigMap. - CA certificates are public data and ConfigMap is the semantically - appropriate resource type, consistent with the `kube-root-ca.crt` - pattern. This is essential for enterprises with internal CAs. The - current design includes this field. - -3. **Should there be a `disabled` field for temporarily removing an entry - from discovery without deleting it?** - This could be useful for maintenance windows or incident response. - However, it adds complexity and can be achieved by removing the - `groupRef` temporarily. Defer to post-implementation feedback. - -4. **Should MCPServerEntry support `toolConfigRef` for tool filtering?** - MCPRemoteProxy supports tool filtering via `toolConfigRef`. 
- VirtualMCPServer also has its own tool filtering/override configuration - in `spec.aggregation.tools`, which supports per-backend filtering via - the `workload` field (e.g., `tools: [{workload: "salesforce", filter: [...]}]`). - For MCPServerEntry, tool filtering should be configured at the - VirtualMCPServer level rather than duplicating it on the entry. - **Migration note:** Users migrating from MCPRemoteProxy who rely on - `toolConfigRef` for per-backend tool filtering should configure - equivalent filtering in `VirtualMCPServer.spec.aggregation.tools` - with the `workload` field set to the MCPServerEntry name. If - post-implementation feedback reveals that `aggregation.tools` is - insufficient for per-backend filtering use cases, `toolConfigRef` - can be added to MCPServerEntry in a follow-up without breaking - changes. - -## References - -- [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) - - VirtualMCPServer design, auth boundaries, capability aggregation -- [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) - - MCPRemoteProxy CRD design -- [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) - - Group-based backend discovery pattern -- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) - - Dynamic vs static discovery modes, reconciler infrastructure -- [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) - - `headerForward` configuration pattern -- [Istio ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/) - - Naming pattern inspiration -- [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) - - MCPRemoteProxy forces OIDC auth on public remotes behind vMCP -- [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) - - Dual auth boundary confusion with externalAuthConfigRef - ---- - -## RFC Lifecycle - - - -### Review History - -| Date | Reviewer | Decision | Notes | -|------|----------|----------|-------| -| 2026-03-12 | 
@jaosorior | Draft | Initial submission | - -### Implementation Tracking - -| Repository | PR | Status | -|------------|-----|--------| -| toolhive | TBD | Not started | From 5d4d8f331610db6d1afaeec263d6f57ddc3ba912 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Wed, 18 Mar 2026 18:03:37 +0000 Subject: [PATCH 08/15] Address comprehensive review feedback on RFC-0057 Address all review feedback from specialized agent review: Critical fixes: - Add 3 CEL validation rules (proxy requires proxyConfig, proxyConfig requires oidcConfig, direct rejects proxyConfig) - Add spec.type immutability guard to prevent orphaned resources - Add Phase 0.5 for MCPRemoteProxy controller refactoring prerequisite - Document StaticBackendConfig Type field gap and KnownFields(true) risk - Document full RFC 8693 token exchange flow with subject token provenance - Document vMCP-to-proxy pod token forwarding with MCP auth spec rationale - Add session constraints for direct mode (single-replica requirement) Significant fixes: - Fix Phase 0 deprecation mechanism (Warning events, not deprecatedversion) - Make MCPGroup status field changes additive (not breaking rename) - Document name collision handling and WorkloadType enum extension - Document field indexer registration requirement - Strengthen SSRF mitigation (NetworkPolicy REQUIRED for IMDS/cluster API) - Add credential blast radius to threat model - Mark SSE as deprecated per MCP spec 2025-11-25 - Add MCP-Protocol-Version header injection requirement - Add reconnection handling section for direct mode - Expand MCPGroup controller changes to 9 explicit code changes - Fix Open Question 4 contradiction with groupRef: Required Documentation and polish: - Add mode selection guide, CRD short names, printer columns - Add allow-insecure annotation specification - Fix THV-0055 broken link, add spec.port to migration table - Add actionable deprecation timeline, standalone use explanation - Add RFC 
naming convention note Co-Authored-By: Claude Opus 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 570 +++++++++++++++--- 1 file changed, 471 insertions(+), 99 deletions(-) diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md index 62db7e3..34ee881 100644 --- a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md +++ b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md @@ -5,7 +5,8 @@ - **Created**: 2026-03-18 - **Last Updated**: 2026-03-18 - **Target Repository**: toolhive -- **Supersedes**: [THV-0055](./THV-0055-mcpserverentry-direct-remote-backends.md) +- **Supersedes**: THV-0055 (MCPServerEntry CRD — removed from this repo; see git history) +- **Identifier Note**: This RFC adopts the `RFC-NNNN` naming convention; earlier RFCs used the `THV-NNNN` prefix - **Related Issues**: [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104), [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) ## Summary @@ -88,6 +89,24 @@ should be a configuration choice within it. exclusively for remote HTTP-based MCP servers. - **CLI mode support**: `MCPRemoteEndpoint` is a Kubernetes-only CRD. 
+## Mode Selection Guide + +Use this table to choose between `type: proxy` and `type: direct`: + +| Scenario | Recommended Mode | Why | +|----------|-----------------|-----| +| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; avoid unnecessary pod | +| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange directly; one fewer hop | +| Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently of vMCP | +| Remote requiring Cedar authz policies per-endpoint | `proxy` | Authz policies run in the proxy pod | +| Remote needing audit logging at the endpoint level | `proxy` | Proxy pod has its own audit middleware | +| Standalone use without VirtualMCPServer | `proxy` | Direct mode requires vMCP to function | +| Many remotes where pod-per-remote is too costly | `direct` | No Deployment/Service/Pod per remote | +| Remotes behind strict egress NetworkPolicy | `proxy` | Blast radius is limited to the proxy pod's credentials | + +**Rule of thumb:** Use `direct` for simple, public, or token-exchange-only remotes. +Use `proxy` when you need an independent auth/authz/audit boundary per remote. 
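+As an illustration of the guide above, a minimal manifest for each mode
+might look like the following. Field names follow the spec described later
+in this RFC; the concrete values (group name, auth config name) are
+placeholders, not a prescribed configuration:
+
+```yaml
+# Hypothetical direct-mode entry: public, unauthenticated remote.
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: MCPRemoteEndpoint
+metadata:
+  name: context7
+spec:
+  type: direct            # no pod; vMCP connects straight to remoteURL
+  remoteURL: https://mcp.context7.com/mcp
+  transport: streamable-http
+  groupRef: engineering-team
+---
+# Hypothetical proxy-mode entry: remote that needs its own OIDC boundary.
+apiVersion: toolhive.stacklok.dev/v1alpha1
+kind: MCPRemoteEndpoint
+metadata:
+  name: salesforce-mcp
+spec:
+  type: proxy             # deploys a proxy pod with full auth middleware
+  remoteURL: https://mcp.salesforce.com
+  transport: streamable-http
+  groupRef: engineering-team
+  externalAuthConfigRef:
+    name: salesforce-auth
+  proxyConfig:
+    oidcConfig: {}        # required in proxy mode; details elided here
+```
+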
+ ## Proposed Solution ### High-Level Design @@ -143,13 +162,15 @@ graph TB | Deploys proxy pod | Yes | No | | Own OIDC validation | Yes | No (vMCP handles this) | | Own authz policy | Yes | No | -| Own audit logging | Yes | No (uses vMCP's) | -| Standalone use (without vMCP) | Yes | No | +| Own audit logging | Yes (proxy-level) | No (vMCP's audit middleware only — see [Audit Limitations](#audit-limitations-in-direct-mode)) | +| Standalone use (without vMCP) | Yes (proxy pod exposes its own Service and can be accessed directly by MCP clients; useful for single-remote deployments without vMCP aggregation) | No (requires vMCP to route traffic) | | Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) | | Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) | | Custom CA bundle | Yes (`caBundleRef`) | Yes (`caBundleRef`) | | Tool filtering | Yes (`toolConfigRef`) | Yes (`toolConfigRef`) | | GroupRef support | Yes | Yes | +| Multi-replica vMCP | Yes (session state is in the proxy pod) | Requires single-replica vMCP or shared session store (see [Session Constraints](#session-constraints-in-direct-mode)) | +| Credential blast radius | Isolated per proxy pod | All credentials in the vMCP pod (see [Security Considerations](#credential-blast-radius-in-direct-mode)) | ### Auth Flow Comparison @@ -157,19 +178,46 @@ graph TB ``` Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> MCPRemoteEndpoint proxy pod [own OIDC + authz] - externalAuthConfigRef: proxy-to-remote auth - -> Remote Server + -> vMCP forwards client's validated token to proxy pod Service URL + -> Proxy pod [own OIDC validation + Cedar authz] + -> If externalAuthConfigRef (tokenExchange type): + proxy uses validated incoming client token as RFC 8693 + subject_token to obtain a service token for the remote + -> Proxy sends service token to Remote Server ``` +**Important: vMCP-to-proxy pod token flow.** vMCP forwards the client's +original `aud=vmcp` 
token to the proxy pod. The proxy pod independently +validates this token via its own `oidcConfig`. This is **not** token +passthrough in the MCP auth spec sense — the proxy pod is a separate +trust boundary that performs its own validation. The proxy's OIDC +configuration must accept the same issuer and audience as vMCP's incoming +auth, or the proxy must be configured with a compatible trust relationship. + **`type: direct` — vMCP connects directly:** ``` Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> MCPRemoteEndpoint: vMCP applies externalAuthConfigRef directly - -> Remote Server (ONE boundary, ONE auth config) + -> If externalAuthConfigRef (tokenExchange type): + vMCP uses the client's validated aud=vmcp token as the + RFC 8693 subject_token in a token exchange request + to obtain a service token for the remote + -> If externalAuthConfigRef (other types): + vMCP resolves credentials from the referenced config + -> vMCP sends service token / credentials to Remote Server ``` +**Token exchange operational requirements for `type: direct`:** +- The token exchange server (STS) must trust the IdP that issued the + client's `aud=vmcp` token (i.e., there must be a federation or trust + relationship between vMCP's IdP and the STS). +- The client token must remain valid for the duration of the token + exchange request. Short-lived tokens may expire during the exchange + window; configure sufficient token lifetime or use refresh tokens. +- The `audience` parameter in the token exchange request targets the + remote server. The STS must be configured to issue tokens for the + requested audience. 
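+The exchange request itself is a standard RFC 8693 form POST to the STS
+token endpoint. The following sketch shows only how the form body would be
+assembled; the function name and token values are illustrative, not
+ToolHive's actual implementation:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+)
+
+// buildTokenExchangeRequest assembles the RFC 8693 form body that a
+// token-exchange client POSTs to the STS token endpoint. The caller
+// supplies the validated incoming client token (aud=vmcp) as subjectToken
+// and the remote server's identifier as audience.
+func buildTokenExchangeRequest(subjectToken, audience string) url.Values {
+	return url.Values{
+		"grant_type":         {"urn:ietf:params:oauth:grant-type:token-exchange"},
+		"subject_token":      {subjectToken},
+		"subject_token_type": {"urn:ietf:params:oauth:token-type:access_token"},
+		// The STS must be configured to issue tokens for this audience.
+		"audience": {audience},
+	}
+}
+
+func main() {
+	form := buildTokenExchangeRequest("eyJhbGciOi...", "https://mcp.salesforce.com")
+	fmt.Println(form.Get("grant_type"))
+}
+```
+
+In `proxy` mode this POST originates from the proxy pod; in `direct` mode
+it originates from the vMCP pod itself, which is why the STS must trust
+vMCP's IdP in that configuration.
+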
+ ### Detailed Design #### MCPRemoteEndpoint CRD @@ -181,11 +229,12 @@ metadata: name: context7 namespace: default spec: - # REQUIRED: Connectivity mode + # REQUIRED: Connectivity mode — IMMUTABLE after creation # - proxy: deploy a proxy pod with full auth middleware # - direct: no pod; vMCP connects directly to remoteURL # +kubebuilder:validation:Enum=proxy;direct # +kubebuilder:default=proxy + # +kubebuilder:validation:XValidation:rule="self.type == oldSelf.type",message="spec.type is immutable" type: direct # REQUIRED: URL of the remote MCP server @@ -194,6 +243,8 @@ spec: remoteURL: https://mcp.context7.com/mcp # REQUIRED: Transport protocol + # streamable-http is RECOMMENDED for new deployments. + # sse is DEPRECATED (MCP spec 2025-11-25) and retained for legacy compatibility only. # +kubebuilder:validation:Enum=streamable-http;sse transport: streamable-http @@ -234,9 +285,15 @@ spec: toolConfigRef: name: my-tool-config - # OPTIONAL: Proxy pod configuration. - # Only valid when type: proxy. Ignored when type: direct. - # +kubebuilder:validation:XValidation:rule="self.type == 'direct' ? !has(self.proxyConfig) : true" + # Proxy pod configuration. + # REQUIRED when type: proxy. MUST NOT be set when type: direct. + # + # CEL validation rules (on MCPRemoteEndpointSpec): + # +kubebuilder:validation:XValidation:rule="self.type == 'direct' ? !has(self.proxyConfig) : true",message="proxyConfig must not be set when type is direct" + # +kubebuilder:validation:XValidation:rule="self.type == 'proxy' ? 
has(self.proxyConfig) : true",message="proxyConfig is required when type is proxy" + # + # CEL validation rule (on ProxyConfig struct, following VirtualMCPServer IncomingAuthConfig pattern): + # +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="proxyConfig.oidcConfig is required" proxyConfig: # REQUIRED within proxyConfig: OIDC for validating incoming tokens oidcConfig: @@ -270,7 +327,13 @@ spec: # +kubebuilder:default=8080 proxyPort: 8080 - # OPTIONAL: Session affinity for the proxy Service + # OPTIONAL: Session affinity for the proxy Service. + # NOTE: ClientIP is a rough approximation that breaks under NAT. + # The MCP Streamable HTTP spec requires session affinity based on + # the Mcp-Session-Id header. Future work should implement header-based + # affinity (e.g., via Ingress annotations or a Service mesh). + # ClientIP is the default because it is the only option supported + # natively by Kubernetes Services without an Ingress controller. # +kubebuilder:validation:Enum=ClientIP;None # +kubebuilder:default=ClientIP sessionAffinity: ClientIP @@ -342,19 +405,33 @@ spec: enabled: true ``` +#### CRD Metadata + +``` +// +kubebuilder:resource:shortName=mcpre;remoteendpoint +// +kubebuilder:printcolumn:name="Type",type="string",JSONPath=".spec.type",description="Connectivity mode (proxy or direct)" +// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase" +// +kubebuilder:printcolumn:name="Remote URL",type="string",JSONPath=".spec.remoteURL" +// +kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url" +// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" +``` + +Short names: `mcpre`, `remoteendpoint` (following the pattern of `mcpg` for +MCPGroup, `vmcp` for VirtualMCPServer, `extauth` for MCPExternalAuthConfig). 
+ #### Spec Fields **Top-level (both modes):** | Field | Type | Required | Description | |-------|------|----------|-------------| -| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. | -| `remoteURL` | string | Yes | URL of the remote MCP server. Must use HTTPS unless `toolhive.stacklok.dev/allow-insecure` annotation is set. | -| `transport` | enum | Yes | MCP transport protocol: `streamable-http` or `sse`. | +| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation** — changing type would orphan infrastructure. Delete and recreate to change mode. | +| `remoteURL` | string | Yes | URL of the remote MCP server. Must use HTTPS unless `toolhive.stacklok.dev/allow-insecure` annotation is set. The `allow-insecure` annotation is intended only for development/testing with local MCP servers; it MUST NOT be used in production. When set, the controller emits a Warning event. | +| `transport` | enum | Yes | MCP transport protocol: `streamable-http` (recommended) or `sse` (deprecated per MCP spec 2025-11-25, retained for legacy compatibility). | | `groupRef` | string | Yes | Name of the MCPGroup this endpoint belongs to. Plain string, consistent with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. | -| `externalAuthConfigRef` | object | No | Auth for outgoing requests to the remote server. In `proxy` mode, this is proxy→remote auth. In `direct` mode, this is vMCP→remote auth. Omit for unauthenticated endpoints. | +| `externalAuthConfigRef` | object | No | References an `MCPExternalAuthConfig` for outgoing auth to the remote server. For `tokenExchange` type configs, this is **not** a simple credential — it is middleware that uses the validated incoming client token as the RFC 8693 `subject_token` to obtain a service token for the remote. In `proxy` mode, the proxy pod performs the exchange. In `direct` mode, vMCP performs the exchange. Omit for unauthenticated endpoints. 
| | `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type. Applies to both modes. | -| `caBundleRef` | object | No | ConfigMap containing a custom CA certificate bundle for TLS verification. Applies to both modes. | +| `caBundleRef` | object | No | ConfigMap containing a custom CA certificate bundle for TLS verification. New field (not present on MCPRemoteProxy). See [CA Bundle Security](#ca-bundle-trust-store-considerations) for trust implications. Applies to both modes. | | `toolConfigRef` | object | No | Tool filtering configuration. Applies to both modes. | **`proxyConfig` (only when `type: proxy`):** @@ -373,15 +450,39 @@ spec: | `endpointPrefix` | string | No | Path prefix for ingress routing. | | `resourceOverrides` | object | No | Metadata overrides for created resources. | +#### The `toolhive.stacklok.dev/allow-insecure` Annotation + +By default, the controller rejects `remoteURL` values using plain HTTP. +Setting the annotation `toolhive.stacklok.dev/allow-insecure: "true"` on +the MCPRemoteEndpoint resource overrides this check. + +- **Accepted values:** `"true"` (any other value or absence enforces HTTPS). +- **Scope:** Per-resource. There is no cluster-wide override. +- **Audit trail:** The controller emits a Warning event with reason + `InsecureTransport` whenever it reconciles a resource with this annotation. +- **Security note:** HTTP URLs expose traffic to man-in-the-middle attacks. + This annotation is intended only for development/testing environments or + cluster-internal URLs where TLS termination happens at a load balancer. +- **Precedent:** This annotation is new — it does not exist on MCPRemoteProxy + or any other ToolHive CRD today. + #### Status Fields | Field | Type | Description | |-------|------|-------------| | `conditions` | []Condition | Standard Kubernetes conditions. | | `phase` | string | Current phase: `Pending`, `Ready`, `Failed`, `Terminating`. 
| -| `url` | string | URL where the endpoint can be accessed. For `type: proxy`, the internal cluster URL of the proxy service. For `type: direct`, echoes `spec.remoteURL`. | +| `url` | string | URL where the endpoint can be accessed. For `type: proxy`, the internal cluster URL of the proxy service (set once the Deployment is ready). For `type: direct`, set to `spec.remoteURL` immediately upon successful validation. | | `observedGeneration` | int64 | Most recent generation observed by the controller. | +**`status.url` lifecycle:** For `type: proxy`, `status.url` is empty until the +proxy Deployment reaches `Ready` state. Backend discoverers (both static and +dynamic mode) MUST treat an empty `status.url` as "backend not yet available" +and skip the backend rather than removing it from the registry. This prevents +backends from being briefly removed during proxy pod startup or rolling +updates. For `type: direct`, `status.url` is set immediately after validation +succeeds, so this race does not apply. + **Condition types:** | Type | Purpose | When Set | @@ -402,9 +503,19 @@ attack surface. ##### Operator: MCPRemoteEndpoint Controller +**Pre-requisite: MCPRemoteProxy controller refactoring.** The existing +`mcpremoteproxy_controller.go` is 1,125 lines with all reconciliation logic +bound to `*mcpv1alpha1.MCPRemoteProxy` receiver methods. None of this logic +is directly extractable without refactoring. Before Phase 1 implementation, +the proxy reconciliation logic (Deployment/Service/ServiceAccount creation, +RBAC setup, health monitoring, status updates) must be extracted into a +shared `pkg/operator/remoteproxy/` package with functions that accept +interfaces or generic parameters rather than concrete CRD types. This +refactoring is scoped as Phase 0.5 in the implementation plan below. 
+ The controller has two code paths based on `spec.type`: -**`type: proxy` path** — identical to the existing `MCPRemoteProxy` controller: +**`type: proxy` path** — uses the extracted shared proxy reconciliation logic: 1. Validates spec (OIDC config, group ref, auth config ref, CA bundle ref) 2. Ensures Deployment, Service, ServiceAccount, RBAC 3. Monitors deployment health and updates `Ready` condition @@ -427,12 +538,35 @@ to referenced resources trigger re-reconciliation. ##### Operator: MCPGroup Controller Update The MCPGroup controller currently watches MCPServer and MCPRemoteProxy. It must -be updated to also watch MCPRemoteEndpoint resources, replacing the -MCPRemoteProxy watch once the deprecation window closes. - -`status.remoteProxies` is renamed to `status.remoteEndpoints` and -`status.remoteProxyCount` to `status.remoteEndpointCount` as part of this -change. Both old and new fields are populated during the deprecation window. +be updated to also watch MCPRemoteEndpoint resources. The MCPRemoteProxy watch +is retained during the deprecation window and removed only when MCPRemoteProxy +is removed. + +**Status field changes (additive, not renaming):** New fields +`status.remoteEndpoints` and `status.remoteEndpointCount` are added alongside +the existing `status.remoteProxies` and `status.remoteProxyCount`. Both old +and new fields are populated during the deprecation window. The old fields are +removed only when MCPRemoteProxy support is removed. This preserves backward +compatibility for GitOps pipelines, monitoring dashboards, and jsonpath queries +that depend on the existing field names. + +**Required code changes (this is not a single bullet point):** +1. Register a field indexer for `MCPRemoteEndpoint.spec.groupRef` in the + operator's `main.go` `SetupFieldIndexers()`, following the existing pattern + for MCPServer and MCPRemoteProxy field indexers. +2. 
Add a new `findReferencingMCPRemoteEndpoints()` method, mirroring + `findReferencingMCPRemoteProxies()`. +3. Add a new `findMCPGroupForMCPRemoteEndpoint()` watch mapper function. +4. Register the MCPRemoteEndpoint watch in `SetupWithManager()` via + `Watches(&mcpv1alpha1.MCPRemoteEndpoint{}, handler.EnqueueRequestsFromMapFunc(...))`. +5. Update `updateGroupMemberStatus()` to call `findReferencingMCPRemoteEndpoints()` + and populate the new status fields. +6. Update `handleListFailure()` and `handleDeletion()` to account for + MCPRemoteEndpoint members. +7. Add new RBAC markers: `+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints,verbs=get;list;watch`. +8. Add `+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints/status,verbs=get;update;patch`. +9. Update integration test suites (`mcp-group/suite_test.go` etc.) to register + the new field indexer. ##### Operator: VirtualMCPServer Controller Update @@ -441,36 +575,56 @@ MCPRemoteEndpoint backends. For `type: proxy`, the backend URL is the proxy service URL (same as MCPRemoteProxy today). For `type: direct`, the backend URL is `spec.remoteURL`. +**StaticBackendConfig schema change required:** The current `StaticBackendConfig` +struct in `pkg/vmcp/config/config.go` has only `Name`, `URL`, `Transport`, and +`Metadata` fields. It does **not** have a `Type` field. Since vMCP uses +`KnownFields(true)` strict YAML parsing (see `pkg/vmcp/config/yaml_loader.go:44`), +writing a `type` field to the ConfigMap before updating the vMCP binary will +cause a startup failure. The implementation must: +1. Add a `Type` field to `StaticBackendConfig` (values: `container`, `proxy`, `direct`) +2. Add optional `CABundlePath`, `Headers`, and auth-related fields +3. Update the vMCP binary **before** the operator starts writing these fields +4. 
Phase the rollout: operator Helm chart and vMCP image must be updated together + +Updated `StaticBackendConfig` fields needed: + ```yaml backends: # From MCPServer resources (unchanged) - name: github-mcp url: http://github-mcp.default.svc:8080 transport: sse - type: container + # type field omitted for backward compat with existing MCPServer backends # From MCPRemoteEndpoint type: proxy - name: internal-api-mcp url: http://internal-api-mcp.default.svc:8080 transport: sse - type: proxy # From MCPRemoteEndpoint type: direct - name: context7 url: https://mcp.context7.com/mcp transport: streamable-http - type: direct # No auth - public endpoint - name: salesforce-mcp url: https://mcp.salesforce.com transport: streamable-http - type: direct - auth: - type: token_exchange - # ... + # Auth resolved by operator and embedded in config ``` +**Additional touch points for static mode:** +- `listMCPRemoteEndpointsAsMap()` — new function to list MCPRemoteEndpoint + resources for ConfigMap generation. +- `getExternalAuthConfigNameFromWorkload()` — must handle MCPRemoteEndpoint + in addition to MCPServer and MCPRemoteProxy. +- `addHeadersFromSecret` in `headerForward` requires K8s Secret resolution at + ConfigMap generation time. The operator must resolve secrets and embed header + values (or mount them as environment variables) since vMCP in static mode + cannot access the K8s API. Neither `vmcp.Backend` nor `StaticBackendConfig` + currently has a `Headers` field — this must be added. +- Deployment volume mount logic must be updated for CA bundles. + **CA bundle propagation in static mode:** For `type: direct` endpoints with `caBundleRef`, the operator mounts the referenced ConfigMap as a volume into the vMCP pod at `/etc/toolhive/ca-bundles//ca.crt`. 
The generated @@ -479,6 +633,10 @@ backend ConfigMap includes the mount path so vMCP can construct the correct ##### vMCP: Backend Discovery Update +A new `WorkloadTypeMCPRemoteEndpoint` constant must be added to +`pkg/vmcp/workloads/discoverer.go` alongside the existing +`WorkloadTypeMCPServer` and `WorkloadTypeMCPRemoteProxy`. + `ListWorkloadsInGroup()` in `pkg/vmcp/workloads/k8s.go` is extended to list MCPRemoteEndpoint resources alongside MCPServer resources. `GetWorkloadAsVMCPBackend()` gains a new `WorkloadTypeMCPRemoteEndpoint` case that: @@ -490,6 +648,16 @@ gains a new `WorkloadTypeMCPRemoteEndpoint` case that: 3. Applies `externalAuthConfigRef`, `headerForward`, and `caBundleRef` at the vMCP layer (for `type: direct`) or leaves them for the proxy pod (for `type: proxy`). +**Name collision handling:** `fetchBackendResource()` in +`pkg/vmcp/k8s/backend_reconciler.go` currently tries MCPServer first, then +MCPRemoteProxy, returning the first match. With MCPRemoteEndpoint added as a +third type, the resolution order becomes: MCPServer → MCPRemoteProxy → +MCPRemoteEndpoint. Resources with the same `metadata.name` across different +types in the same namespace will always resolve to the first match in this +priority order. This is a known limitation. Operators should use distinct names +for resources across types. The controller should log a warning if a name +collision is detected during reconciliation. + ##### vMCP: HTTP Client for Direct Mode Backends of `type: direct` connect to external URLs over HTTPS. The vMCP HTTP @@ -499,13 +667,61 @@ client is updated to: 2. Optionally append a custom CA bundle from `caBundleRef` to the system pool. 3. Enforce a minimum TLS version of 1.2. 4. Apply `externalAuthConfigRef` credentials to outgoing requests. +5. Inject the `MCP-Protocol-Version` header on all post-initialisation requests + with the version negotiated during the MCP `initialize` handshake. This is + required by the MCP Streamable HTTP specification. +6. 
Track and send the `Mcp-Session-Id` header received from the remote server + on all subsequent requests within the same session. + +##### vMCP: Reconnection Handling for Direct Mode + +When vMCP's connection to a `type: direct` remote server is interrupted, the +following reconnection behaviour applies (per MCP Streamable HTTP spec): + +1. vMCP detects connection loss (HTTP error, timeout, or stream termination). +2. vMCP retries the request with exponential backoff (initial: 1s, max: 30s, + jitter: +/- 25%). +3. If the remote server responds with HTTP 404 to a request carrying an + `Mcp-Session-Id`, this indicates the session has expired. vMCP MUST: + a. Discard the existing session state. + b. Re-run the `initialize` → `initialized` handshake. + c. Re-fetch the tool list via `tools/list`. + d. Update the backend's tool inventory in the dynamic registry. +4. If the remote server is unreachable after max retries, the backend is marked + as `unavailable` in vMCP's health status and the circuit breaker opens. + +This reconnection logic reuses the existing circuit breaker and health check +infrastructure in `pkg/vmcp/`. + +##### Session Constraints in Direct Mode + +MCP Streamable HTTP sessions are stateful: the remote server issues an +`Mcp-Session-Id` header that must be sent on all subsequent requests. vMCP's +session manager (`pkg/transport/session/`) currently uses local in-process +storage (`storage_local.go`). + +**Constraint: `type: direct` requires single-replica vMCP or a shared session +store.** Multiple vMCP replicas cannot share MCP session state with the current +local storage implementation. If a different replica handles a follow-up +request, the remote server will reject it (HTTP 404). + +**Mitigation options (choose one at implementation time):** +- **Single-replica (default):** Document that `type: direct` backends require + `replicas: 1` on the VirtualMCPServer Deployment. 
This is acceptable for + many deployments and avoids infrastructure complexity. +- **Shared session store:** Implement a Redis-backed session storage backend + in `pkg/transport/session/`. The infrastructure pattern already exists + (the storage interface in `storage.go` supports alternative implementations). + This is a follow-up if multi-replica `type: direct` is needed. ##### vMCP: Dynamic Mode Reconciler Update The `BackendReconciler` in `pkg/vmcp/k8s/backend_reconciler.go` currently watches MCPServer and MCPRemoteProxy. It is extended to also watch MCPRemoteEndpoint, following the same `EnqueueRequestsFromMapFunc` pattern. -The `fetchBackendResource()` method gains a third resource type to try. +The `fetchBackendResource()` method gains a third resource type to try +(see [Name collision handling](#vmcp-backend-discovery-update) above for +resolution order). ## Security Considerations @@ -515,20 +731,105 @@ The `fetchBackendResource()` method gains a third resource type to try. |--------|-------------|------------| | MITM on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs | | Credential exposure | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets, referenced via `externalAuthConfigRef` and `headerForward.addHeadersFromSecret`; never inline | -| SSRF via remoteURL | Operator configures URL pointing to internal services | RBAC (only authorised users create MCPRemoteEndpoint); HTTPS enforced by default; NetworkPolicy should restrict vMCP pod egress. CEL-based RFC 1918 IP blocking is intentionally omitted because `type: direct` legitimately targets internal/corporate MCP servers — RBAC is the appropriate control layer. 
| +| SSRF via remoteURL | Attacker (or compromised workload with CRD write access) sets `remoteURL` to internal targets | See [SSRF Mitigation](#ssrf-mitigation) below | | Auth config confusion | Wrong tokens sent to wrong endpoints | Eliminated: `externalAuthConfigRef` at the top level has one unambiguous purpose — auth to the remote server | | Operator probing external URLs | Controller makes network requests to untrusted URLs | Eliminated: controller performs validation only, no network probing | -| Expanded vMCP egress surface | vMCP pod makes outbound calls to arbitrary URLs in `type: direct` mode | Acknowledged trade-off. In `type: proxy`, the proxy pod makes outbound calls and vMCP's blast radius is limited. In `type: direct`, the vMCP pod makes outbound calls directly. Mitigated by NetworkPolicy restricting vMCP egress and RBAC restricting who can create MCPRemoteEndpoint resources. | +| Expanded vMCP egress surface | vMCP pod makes outbound calls to arbitrary URLs in `type: direct` mode | Acknowledged trade-off. See [Credential Blast Radius](#credential-blast-radius-in-direct-mode) below | +| Credential blast radius | Single vMCP pod compromise yields all backend credentials | See [Credential Blast Radius](#credential-blast-radius-in-direct-mode) below | +| Trust store injection via CA bundle | ConfigMap write access allows injecting a malicious CA | See [CA Bundle Trust Store](#ca-bundle-trust-store-considerations) below | + +#### SSRF Mitigation + +RBAC alone is insufficient when the threat model includes a compromised workload +that has been granted CRD write access (e.g., a CI pipeline service account). +The following mitigations are **required**: + +1. 
**NetworkPolicy (REQUIRED):** The Helm chart MUST include a default + NetworkPolicy for the vMCP pod that blocks egress to: + - `169.254.169.254/32` (AWS/GCP/Azure IMDS) + - `fd00:ec2::254/128` (AWS IMDSv2 IPv6) + - `kubernetes.default.svc` and `kubernetes.default.svc.cluster.local` + - The operator pod's Service CIDR (to prevent CRD self-modification loops) + + These rules MUST be applied by default with an opt-out annotation for + environments that intentionally target these addresses. + +2. **RBAC (REQUIRED):** Only cluster administrators or platform team service + accounts should have `create`/`update` permissions on MCPRemoteEndpoint. + +3. **CEL-based IP blocking (intentionally omitted):** `type: direct` + legitimately targets internal/corporate MCP servers on RFC 1918 addresses. + Blocking all private IPs at the admission level would break valid use cases. + NetworkPolicy is the correct layer for targeted SSRF protection. + +#### Credential Blast Radius in Direct Mode + +In `type: proxy` mode, each proxy pod holds only its own backend's credentials. +A compromised proxy pod yields credentials for one backend. + +In `type: direct` mode, the vMCP pod holds credentials for **every** direct +backend simultaneously (resolved via `externalAuthConfigRef` at runtime). A +single vMCP pod compromise yields all backend credentials. + +**Additional risk — credential confusion:** If two `type: direct` backends +share the same token exchange server (STS) but target different audiences, a +bug in audience parameter handling could cause vMCP to send a token scoped for +backend A to backend B. Each `externalAuthConfigRef` must specify the target +audience explicitly, and the token exchange implementation must validate that +the returned token's audience matches the intended backend. + +**Recommendation for high-security environments:** Use `type: proxy` for +backends with sensitive credentials. Use `type: direct` only for +unauthenticated or low-sensitivity backends. 
Document this trade-off in the +user guide. + +#### CA Bundle Trust Store Considerations + +`caBundleRef` references a ConfigMap containing CA certificates. While CA +certificates are public data (following the `kube-root-ca.crt` pattern), +ConfigMap write access is a trust decision: anyone who can write to the +referenced ConfigMap can inject a malicious CA certificate, enabling MITM +attacks against the remote server. + +**Mitigations:** +- The `caBundleRef` ConfigMap SHOULD be in a restricted namespace or protected + by RBAC so that only trusted operators can modify it. +- The controller SHOULD emit a Warning event if the referenced ConfigMap is + in a different namespace than the MCPRemoteEndpoint (not currently possible + since cross-namespace references are not supported, but worth noting for + future-proofing). +- Documentation MUST note that CA bundle ConfigMaps are security-sensitive + despite containing "public" data. + +#### Audit Limitations in Direct Mode + +In `type: proxy` mode, the proxy pod's audit middleware logs: +- Incoming request details (client identity, tool invoked, timestamp) +- Outgoing request to the remote (URL, auth outcome, response status) + +In `type: direct` mode, vMCP's existing audit middleware covers incoming client +requests but does **not** currently log: +- The remote URL contacted for each backend request +- The outgoing auth outcome (token exchange success/failure) +- The remote server's HTTP response status + +**Required enhancement:** vMCP's audit middleware must be extended for +`type: direct` backends to log the remote URL, outgoing auth method, and +remote HTTP status code. This is scoped as part of Phase 2 implementation. ### Authentication and Authorization - **No new auth primitives**: `MCPRemoteEndpoint` reuses the existing `MCPExternalAuthConfig` CRD and `externalAuthConfigRef` pattern. - **Single boundary in direct mode**: vMCP's incoming auth validates client - tokens. 
`externalAuthConfigRef` handles outgoing auth to the remote. Cleanly - separated with no dual-purpose confusion. + tokens. `externalAuthConfigRef` handles outgoing auth to the remote. For + `tokenExchange` type configs, this is middleware that uses the validated + client token as the RFC 8693 `subject_token` — not a static credential. + See the [Auth Flow Comparison](#auth-flow-comparison) for the full flow. - **Full auth stack in proxy mode**: identical to existing MCPRemoteProxy — - OIDC validation, authz policy, token exchange all apply. + OIDC validation, authz policy, token exchange all apply. vMCP forwards the + client's original `aud=vmcp` token to the proxy pod's Service URL; the proxy + pod independently validates this token via its own `oidcConfig`. ### Secrets Management @@ -541,36 +842,54 @@ The `fetchBackendResource()` method gains a third resource type to try. ## Deprecation -`MCPRemoteProxy` is deprecated as of this RFC. The timeline is: +`MCPRemoteProxy` is deprecated as of this RFC. + +**Note on deprecation mechanism:** The `+kubebuilder:deprecatedversion` +annotation only works for deprecating API versions within the same CRD (e.g., +`v1alpha1` → `v1beta1` of MCPRemoteEndpoint). It cannot deprecate one CRD in +favour of a different CRD. The deprecation of MCPRemoteProxy in favour of +MCPRemoteEndpoint is communicated through: +1. Controller-emitted Warning events on every MCPRemoteProxy reconciliation +2. A deprecation annotation on the MCPRemoteProxy CRD (`deprecated: "true"`) +3. Documentation updates directing users to MCPRemoteEndpoint + +**Timeline:** -1. **Now**: `MCPRemoteProxy` receives a deprecation annotation and emits a - Kubernetes Event warning on creation/update. -2. **v1beta1 graduation**: `MCPRemoteEndpoint` graduates to `v1beta1`. - `MCPRemoteProxy` remains in `v1alpha1` with no further feature development. -3. **Future release (TBD)**: `MCPRemoteProxy` CRD is removed after a minimum - two-release deprecation window. 
+| Phase | Trigger | What Happens |
+|-------|---------|-------------|
+| Deprecation announced | This RFC merges | MCPRemoteProxy controller emits Warning events on every `Reconcile()`. CRD description updated. Documentation updated. |
+| Feature freeze | MCPRemoteEndpoint Phase 1 merged | MCPRemoteProxy receives no new features. Bug fixes and security patches only. |
+| Migration window | MCPRemoteEndpoint reaches GA | Minimum 2 minor releases (e.g., v0.X → v0.X+2) for users to migrate. Migration guide published. |
+| Removal | After migration window | MCPRemoteProxy CRD, controller, Helm templates, and RBAC rules removed. |
+
+The removal date will be set to a specific release version once MCPRemoteEndpoint
+reaches GA. "Two-release window" means two minor version releases of the
+ToolHive operator, not calendar time.

### Migration: MCPRemoteProxy → MCPRemoteEndpoint

-| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent |
-|---|---|
-| `spec.remoteURL` | `spec.remoteURL` |
-| `spec.transport` | `spec.transport` |
-| `spec.groupRef` | `spec.groupRef` |
-| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` |
-| `spec.headerForward` | `spec.headerForward` |
-| `spec.toolConfigRef` | `spec.toolConfigRef` |
-| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` |
-| `spec.authzConfig` | `spec.proxyConfig.authzConfig` |
-| `spec.audit` | `spec.proxyConfig.audit` |
-| `spec.telemetry` | `spec.proxyConfig.telemetry` |
-| `spec.resources` | `spec.proxyConfig.resources` |
-| `spec.serviceAccount` | `spec.proxyConfig.serviceAccount` |
-| `spec.proxyPort` | `spec.proxyConfig.proxyPort` |
-| `spec.sessionAffinity` | `spec.proxyConfig.sessionAffinity` |
-| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` |
-| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` |
-| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` |
+| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent | Notes |
+|---|---|---|
+| `spec.remoteURL` | `spec.remoteURL` | |
+| `spec.port` | `spec.proxyConfig.proxyPort` | `spec.port` is deprecated on MCPRemoteProxy; use `proxyPort` |
+| `spec.proxyPort` | `spec.proxyConfig.proxyPort` | |
+| `spec.transport` | `spec.transport` | |
+| `spec.groupRef` | `spec.groupRef` | |
+| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | |
+| `spec.headerForward` | `spec.headerForward` | |
+| `spec.toolConfigRef` | `spec.toolConfigRef` | |
+| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` | |
+| `spec.authzConfig` | `spec.proxyConfig.authzConfig` | |
+| `spec.audit` | `spec.proxyConfig.audit` | |
+| `spec.telemetry` | `spec.proxyConfig.telemetry` | |
+| `spec.resources` | `spec.proxyConfig.resources` | |
+| `spec.serviceAccount` | `spec.proxyConfig.serviceAccount` | |
+| `spec.sessionAffinity` | `spec.proxyConfig.sessionAffinity` | |
+| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` | |
+| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` | |
+| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` | |
+| *(not present)* | `spec.type` | Set to `proxy` for equivalent behaviour |
+| *(not present)* | `spec.caBundleRef` | New field; not available on MCPRemoteProxy |

## Alternatives Considered

@@ -658,47 +977,83 @@ separation.

### Phase 0: MCPRemoteProxy Deprecation Markers

-1. Add `+kubebuilder:deprecatedversion` annotation to MCPRemoteProxy
-2. Emit a Kubernetes Event warning when MCPRemoteProxy is created or updated
-3. Update documentation to direct users to MCPRemoteEndpoint
+1. Add a deprecation annotation (`deprecated: "true"`) to the MCPRemoteProxy
+   CRD description in `mcpremoteproxy_types.go`.
+2. Emit a Kubernetes Warning event on every MCPRemoteProxy `Reconcile()` call
+   (not just create/update) directing users to MCPRemoteEndpoint.
+3. Update documentation to direct users to MCPRemoteEndpoint.
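A minimal sketch of step 2 above, emitting the deprecation warning on each reconcile pass. This is illustrative, not existing toolhive code: the `eventRecorder` interface below narrows client-go's `record.EventRecorder` (whose `Event` method takes a `runtime.Object` as its first argument) so the snippet is self-contained, and `emitDeprecationWarning` and the message text are hypothetical names.

```go
package main

import "fmt"

// eventRecorder narrows client-go's record.EventRecorder interface (where
// the first argument is a runtime.Object) so this sketch compiles standalone.
// In a real controller the concrete recorder would come from
// mgr.GetEventRecorderFor("mcpremoteproxy-controller").
type eventRecorder interface {
	Event(object any, eventtype, reason, message string)
}

const (
	eventTypeWarning  = "Warning" // mirrors corev1.EventTypeWarning
	reasonDeprecated  = "Deprecated"
	deprecationNotice = "MCPRemoteProxy is deprecated; migrate to MCPRemoteEndpoint (type: proxy)"
)

// emitDeprecationWarning would be called at the top of every Reconcile()
// pass, so long-lived resources keep surfacing the warning, not only
// freshly created or updated ones.
func emitDeprecationWarning(rec eventRecorder, obj any) {
	rec.Event(obj, eventTypeWarning, reasonDeprecated, deprecationNotice)
}

// fakeRecorder captures events for demonstration purposes.
type fakeRecorder struct{ events []string }

func (f *fakeRecorder) Event(_ any, eventtype, reason, message string) {
	f.events = append(f.events, fmt.Sprintf("%s %s: %s", eventtype, reason, message))
}

func main() {
	rec := &fakeRecorder{}
	emitDeprecationWarning(rec, struct{}{}) // stand-in for the MCPRemoteProxy object
	fmt.Println(rec.events[0])
}
```

Emitting on every `Reconcile()` (rather than only on create/update) is what makes the warning visible in `kubectl describe` for resources that were created long before the deprecation shipped.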
+ +### Phase 0.5: MCPRemoteProxy Controller Refactoring (Pre-requisite) + +Extract shared proxy reconciliation logic from `mcpremoteproxy_controller.go` +(1,125 lines) into a reusable package: + +1. Create `pkg/operator/remoteproxy/` (or similar) with functions for: + - Deployment creation/update with container spec, volumes, and env vars + - Service creation/update + - ServiceAccount and RBAC setup + - Health monitoring and status condition updates +2. Refactor `MCPRemoteProxyReconciler` to use the extracted package. +3. Verify all existing MCPRemoteProxy tests pass unchanged. + +This is a **refactoring-only** phase — no new features, no API changes. ### Phase 1: CRD and Controller 1. Define `MCPRemoteEndpoint` CRD types with `type`, shared fields, and `proxyConfig` -2. Implement controller with both code paths (`type: proxy` reusing MCPRemoteProxy - controller logic; `type: direct` validation only) -3. Generate CRD manifests and update Helm chart -4. Update MCPGroup controller to watch MCPRemoteEndpoint + - Include all three CEL validation rules (direct rejects proxyConfig, + proxy requires proxyConfig, proxyConfig requires oidcConfig) + - Include `spec.type` immutability guard + - Include short names (`mcpre`, `remoteendpoint`) and printer columns +2. Implement controller with both code paths (`type: proxy` using the + extracted shared package from Phase 0.5; `type: direct` validation only) +3. Generate CRD manifests and update Helm chart (include default NetworkPolicy) +4. Update MCPGroup controller (all 9 code changes listed in the MCPGroup + Controller Update section above) 5. Add unit tests for both controller paths +6. Add CEL validation tests ### Phase 2: Static Mode Integration -1. Update VirtualMCPServer controller to discover MCPRemoteEndpoint resources -2. Update ConfigMap generation to include both proxy and direct backend types -3. Mount CA bundle ConfigMaps as volumes for `type: direct` endpoints with `caBundleRef` -4. 
Update vMCP static config parser to deserialise both backend types -5. Implement external TLS transport for `type: direct` backends -6. Integration tests with envtest +1. Add `Type`, `CABundlePath`, and `Headers` fields to `StaticBackendConfig` + in `pkg/vmcp/config/config.go` (must be deployed **before** the operator + starts writing these fields) +2. Update VirtualMCPServer controller to discover MCPRemoteEndpoint resources + - Add `listMCPRemoteEndpointsAsMap()` function + - Update `getExternalAuthConfigNameFromWorkload()` for MCPRemoteEndpoint +3. Update ConfigMap generation to include direct backend entries +4. Implement `addHeadersFromSecret` resolution at ConfigMap generation time +5. Mount CA bundle ConfigMaps as volumes for `type: direct` endpoints with `caBundleRef` +6. Implement external TLS transport for `type: direct` backends +7. Extend vMCP audit middleware for `type: direct` (log remote URL, auth outcome, + remote HTTP status) +8. Integration tests with envtest ### Phase 3: Dynamic Mode Integration -1. Extend `BackendReconciler` to watch MCPRemoteEndpoint -2. Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in +1. Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go` +2. Extend `BackendReconciler` to watch MCPRemoteEndpoint + - Update `fetchBackendResource()` with third resource type and document + resolution order +3. Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in `pkg/vmcp/workloads/k8s.go` -3. Register watcher in the K8s manager -4. Integration tests for dynamic discovery +4. Implement MCP-Protocol-Version header injection for direct mode HTTP client +5. Implement reconnection handling with backoff and session re-initialisation +6. Register watcher in the K8s manager +7. Integration tests for dynamic discovery ### Phase 4: Documentation and E2E 1. CRD reference documentation for MCPRemoteEndpoint 2. Migration guide: MCPRemoteProxy → MCPRemoteEndpoint -3. 
User guide covering both modes with examples -4. E2E Chainsaw tests for full lifecycle (both modes) -5. E2E tests for mixed MCPServer + MCPRemoteEndpoint groups +3. User guide covering both modes with mode selection guide +4. Document single-replica constraint for `type: direct` with session state +5. E2E Chainsaw tests for full lifecycle (both modes) +6. E2E tests for mixed MCPServer + MCPRemoteEndpoint groups ### Dependencies -- THV-0014 (K8s-Aware vMCP) for dynamic mode support (Phase 3) +- THV-0014 (K8s-Aware vMCP) — already merged; needed for dynamic mode (Phase 3) - Broader CRD revamp / v1beta1 graduation work ## Testing Strategy @@ -706,11 +1061,18 @@ separation. ### Unit Tests - Controller validation for both modes -- CEL validation rules (proxyConfig rejected when type: direct) +- CEL validation rules: + - `type: direct` with `proxyConfig` → rejected + - `type: proxy` without `proxyConfig` → rejected + - `type: proxy` with `proxyConfig` but without `oidcConfig` → rejected + - `spec.type` mutation → rejected (immutability guard) - CRD type serialisation/deserialisation - Backend conversion for both types - External TLS transport creation with and without custom CA bundles -- Static config parsing for both backend types +- Static config parsing for both backend types (including new `Type` field) +- MCP-Protocol-Version header injection +- Reconnection handling with session re-initialisation on HTTP 404 +- Name collision detection and logging in `fetchBackendResource()` ### Integration Tests (envtest) @@ -733,34 +1095,43 @@ separation. ## Open Questions 1. **Should `groupRef` be required on MCPRemoteEndpoint?** - Recommendation: Yes, consistent with the reasoning that an endpoint without + **Resolved: Yes.** Consistent with the reasoning that an endpoint without a group is unreachable. As a follow-up, consider making `groupRef` required on MCPServer and MCPRemoteProxy too for consistency. 2. 
**When should MCPRemoteProxy be removed?** - Recommendation: After two releases post-MCPRemoteEndpoint GA. Track as a - separate issue once Phase 1 is merged. + **Resolved:** After two minor releases post-MCPRemoteEndpoint GA. Track as + a separate issue once Phase 1 is merged. See the [Deprecation](#deprecation) + timeline table for details. 3. **Should `toolConfigRef` be in the shared fields or mode-specific?** - Recommendation: Shared top-level field. Tool filtering applies equally to + **Resolved: Shared top-level field.** Tool filtering applies equally to both modes and is already supported in `VirtualMCPServer.spec.aggregation.tools` as a fallback. 4. **Should there be a `disabled` field?** - Recommendation: Defer. Users can remove the resource or change `groupRef` - (which, unlike the original MCPServerEntry proposal, is not required to be - non-empty once the resource is created — removal from the group is a valid - operation). Revisit based on post-implementation feedback. + **Resolved: Defer.** Since `groupRef` is required and immutable-in-practice + (it is a required string field — the API server rejects empty-string + updates), disabling an endpoint requires deleting the resource. A future + `disabled: true` field could be added as an additive change if + post-implementation feedback shows deletion is too disruptive. + +5. **Should multi-replica vMCP be supported for `type: direct`?** + Recommendation: Start with single-replica constraint. Add shared session + store (Redis) as a follow-up if demand exists. See [Session Constraints](#session-constraints-in-direct-mode). 
## References -- [THV-0055: MCPServerEntry CRD](./THV-0055-mcpserverentry-direct-remote-backends.md) — - superseded by this RFC +- THV-0055: MCPServerEntry CRD — superseded by this RFC (removed from repo; + see git history) - [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) - [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) - [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) -- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) +- [THV-0014: K8s-Aware vMCP](./THV-0014-vmcp-k8s-aware-refactor.md) (merged) - [THV-0026: Header Passthrough](./THV-0026-header-passthrough.md) +- [MCP Specification 2025-11-25: Transports](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports) +- [MCP Specification 2025-11-25: Lifecycle](https://modelcontextprotocol.io/specification/2025-11-25/basic/lifecycle) +- [RFC 8693: OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693) - [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) - [toolhive#3104](https://github.com/stacklok/toolhive/issues/3104) - [toolhive#4109](https://github.com/stacklok/toolhive/issues/4109) @@ -772,3 +1143,4 @@ separation. 
| Date | Reviewer | Decision | Notes | |------|----------|----------|-------| | 2026-03-18 | @ChrisJBurns, @jaosorior | Draft | Initial submission, supersedes THV-0055 | +| 2026-03-18 | Review agents | Revision | Address review feedback: CEL validation, type immutability, auth flows, session constraints, security hardening, implementation detail | From 34a463924ba65092c722f4c4a86b31b733baa71e Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Wed, 18 Mar 2026 18:30:19 +0000 Subject: [PATCH 09/15] Address review feedback on RFC-0057 MCPRemoteEndpoint - Fix CEL validation rule placement (struct-level, not field-level) - Add oldSelf null guard to type immutability rule - Correct auth flow: accurately describe dual-consumer externalAuthConfigRef in proxy mode and single-boundary direct mode - Remove incorrect Redis session store claim; single-replica is the only supported constraint for type:direct - Fix reconnection to require full MCP initialization handshake on HTTP 404, with Last-Event-ID resumption attempt before re-init - Add server-initiated notifications section (persistent GET stream) - Restrict embeddedAuthServer and awsSts for type:direct - Enumerate all MCPGroup controller changes including field indexer and RBAC markers - Remove false audience validation claim; add as Phase 2 requirement - Fix broken anchor cross-references - Document HTTPS enforcement as controller-side only - Specify allow-insecure annotation value - Add inline warning to addPlaintextHeaders YAML example - Add CA bundle MITM trust store warning - Add emergency credential rotation guidance - Remove remoteendpoint short name; keep mcpre only Co-Authored-By: Claude Sonnet 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 1172 ++++++++--------- 1 file changed, 523 insertions(+), 649 deletions(-) diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md 
index 34ee881..271463d 100644 --- a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md +++ b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md @@ -88,33 +88,32 @@ should be a configuration choice within it. - **Supporting stdio or container-based transports**: `MCPRemoteEndpoint` is exclusively for remote HTTP-based MCP servers. - **CLI mode support**: `MCPRemoteEndpoint` is a Kubernetes-only CRD. +- **Multi-replica vMCP with `type: direct`**: Session state is in-process only. + See [Session Constraints](#session-constraints-in-direct-mode). ## Mode Selection Guide -Use this table to choose between `type: proxy` and `type: direct`: - | Scenario | Recommended Mode | Why | -|----------|-----------------|-----| -| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; avoid unnecessary pod | -| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange directly; one fewer hop | -| Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently of vMCP | +|---|---|---| +| Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required | +| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange; one fewer hop | +| Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently | | Remote requiring Cedar authz policies per-endpoint | `proxy` | Authz policies run in the proxy pod | | Remote needing audit logging at the endpoint level | `proxy` | Proxy pod has its own audit middleware | | Standalone use without VirtualMCPServer | `proxy` | Direct mode requires vMCP to function | | Many remotes where pod-per-remote is too costly | `direct` | No Deployment/Service/Pod per remote | -| Remotes behind strict egress NetworkPolicy | `proxy` | Blast radius is limited to the proxy pod's credentials | -**Rule of thumb:** Use `direct` for simple, public, or 
token-exchange-only remotes. -Use `proxy` when you need an independent auth/authz/audit boundary per remote. +**Rule of thumb:** Use `direct` for simple, public, or token-exchange-only +remotes. Use `proxy` when you need an independent auth/authz/audit boundary +per remote, or when the backend needs to be accessible standalone. ## Proposed Solution ### High-Level Design -`MCPRemoteEndpoint` is a single CRD with a `type` discriminator field. Fields -that are shared across both modes sit at the top level. Fields that only apply -to the proxy pod deployment are grouped under `proxyConfig` and are ignored -when `type: direct`. +`MCPRemoteEndpoint` is a single CRD with a `type` discriminator field. Shared +fields sit at the top level. Fields only applicable to the proxy pod are grouped +under `proxyConfig`. ```mermaid graph TB @@ -144,10 +143,8 @@ graph TB Client -->|Token: aud=vmcp| InAuth InAuth --> Router Router --> AuthMgr - AuthMgr -->|Via proxy pod| ProxyPod ProxyPod -->|Authenticated HTTPS| Remote1 - AuthMgr -->|Direct HTTPS| Remote2 DirectEntry -.->|Declares endpoint| Remote2 @@ -162,64 +159,119 @@ graph TB | Deploys proxy pod | Yes | No | | Own OIDC validation | Yes | No (vMCP handles this) | | Own authz policy | Yes | No | -| Own audit logging | Yes (proxy-level) | No (vMCP's audit middleware only — see [Audit Limitations](#audit-limitations-in-direct-mode)) | -| Standalone use (without vMCP) | Yes (proxy pod exposes its own Service and can be accessed directly by MCP clients; useful for single-remote deployments without vMCP aggregation) | No (requires vMCP to route traffic) | +| Own audit logging | Yes (proxy-level) | No (vMCP's audit middleware; see [Audit Limitations](#audit-limitations-in-direct-mode)) | +| Standalone use (without vMCP) | Yes | No | | Outgoing auth to remote | Yes (`externalAuthConfigRef`) | Yes (`externalAuthConfigRef`) | | Header forwarding | Yes (`headerForward`) | Yes (`headerForward`) | | Custom CA bundle | Yes (`caBundleRef`) | Yes 
(`caBundleRef`) | | Tool filtering | Yes (`toolConfigRef`) | Yes (`toolConfigRef`) | | GroupRef support | Yes | Yes | -| Multi-replica vMCP | Yes (session state is in the proxy pod) | Requires single-replica vMCP or shared session store (see [Session Constraints](#session-constraints-in-direct-mode)) | -| Credential blast radius | Isolated per proxy pod | All credentials in the vMCP pod (see [Security Considerations](#credential-blast-radius-in-direct-mode)) | +| Multi-replica vMCP | Yes | No — see [Session Constraints](#session-constraints-in-direct-mode) | +| Credential blast radius | Isolated per proxy pod | All credentials in vMCP pod — see [Security Considerations](#security-considerations) | ### Auth Flow Comparison -**`type: proxy` — vMCP routes through proxy pod:** +**`type: proxy` — two independent auth legs:** ``` -Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> vMCP forwards client's validated token to proxy pod Service URL - -> Proxy pod [own OIDC validation + Cedar authz] - -> If externalAuthConfigRef (tokenExchange type): - proxy uses validated incoming client token as RFC 8693 - subject_token to obtain a service token for the remote - -> Proxy sends service token to Remote Server +Client --[aud=vmcp token]--> vMCP [validates token at incoming boundary] + --[externalAuthConfigRef credential]--> Proxy Pod + [proxy pod oidcConfig validates the incoming request] + [proxy pod applies externalAuthConfigRef as outgoing middleware] + --> Remote Server ``` -**Important: vMCP-to-proxy pod token flow.** vMCP forwards the client's -original `aud=vmcp` token to the proxy pod. The proxy pod independently -validates this token via its own `oidcConfig`. This is **not** token -passthrough in the MCP auth spec sense — the proxy pod is a separate -trust boundary that performs its own validation. 
The proxy's OIDC -configuration must accept the same issuer and audience as vMCP's incoming -auth, or the proxy must be configured with a compatible trust relationship. +`externalAuthConfigRef` on a `type: proxy` endpoint is read by two separate +consumers: -**`type: direct` — vMCP connects directly:** +1. **vMCP** reads it at backend discovery time (`discoverRemoteProxyAuthConfig()` + in `pkg/vmcp/workloads/k8s.go`). The resolved strategy is applied by vMCP's + `authRoundTripper` when making outgoing calls **to the proxy pod**. +2. **The proxy pod** reads the same field via the operator-generated RunConfig + (`AddExternalAuthConfigOptions()` in `mcpremoteproxy_runconfig.go`). The pod + applies it as outgoing middleware when forwarding requests **to the remote server**. + +`proxyConfig.oidcConfig` is a third, separate concern — it validates tokens +arriving at the proxy pod from vMCP. It is entirely independent of +`externalAuthConfigRef`. + +**`type: direct` — single auth boundary:** ``` -Client -> (token: aud=vmcp) -> vMCP [incoming auth boundary] - -> If externalAuthConfigRef (tokenExchange type): - vMCP uses the client's validated aud=vmcp token as the - RFC 8693 subject_token in a token exchange request - to obtain a service token for the remote - -> If externalAuthConfigRef (other types): - vMCP resolves credentials from the referenced config - -> vMCP sends service token / credentials to Remote Server +Client --[aud=vmcp token]--> vMCP [validates token at incoming boundary] + [vMCP applies externalAuthConfigRef as outgoing auth] + --> Remote Server ``` -**Token exchange operational requirements for `type: direct`:** -- The token exchange server (STS) must trust the IdP that issued the - client's `aud=vmcp` token (i.e., there must be a federation or trust - relationship between vMCP's IdP and the STS). -- The client token must remain valid for the duration of the token - exchange request. 
Short-lived tokens may expire during the exchange
-  window; configure sufficient token lifetime or use refresh tokens.
-- The `audience` parameter in the token exchange request targets the
-  remote server. The STS must be configured to issue tokens for the
-  requested audience.
+vMCP reads `externalAuthConfigRef` and applies it when calling the remote
+server directly. For `type: tokenExchange`, the client's validated incoming
+token is used as the RFC 8693 `subject_token` to obtain a service token for
+the remote. The token exchange server must trust the IdP that issued the
+client's token.
+
+**Token exchange operational requirements (`type: direct`):**
+- The STS must be configured to accept subject tokens from vMCP's IdP.
+- Configure `audience` in the `MCPExternalAuthConfig` to match the remote
+  server's expected audience claim.
+- Client token lifetime should exceed the expected duration of the exchange
+  request. Exchanged tokens are managed by the `golang.org/x/oauth2` token
+  source and refreshed automatically on expiry per connection.
+
+**Unsupported `externalAuthConfigRef` types for `type: direct`:**
+
+The following types are **not valid** when `type: direct`:
+
+- **`embeddedAuthServer`**: Requires a running pod to host the OAuth2 server.
+  No pod exists in direct mode.
+- **`awsSts`**: No converter is registered in vMCP's DefaultRegistry
+  (`pkg/vmcp/auth/converters`). The registry only registers `tokenExchange`,
+  `headerInjection`, and `unauthenticated`. Using `awsSts` in direct mode will
+  cause backend discovery to fail at runtime.
+
+These combinations MUST be rejected before vMCP discovers the backend. A CEL
+rule on MCPRemoteEndpoint alone cannot enforce this, because CEL validation
+sees only the object being admitted and cannot inspect the referenced
+MCPExternalAuthConfig. Enforcement therefore lives in a validating webhook
+(when one is deployed) or in the controller, which rejects the combination at
+reconcile time and sets `ConfigurationValid=False`.

 ### Detailed Design

+#### CRD Validation Rules
+
+CEL `XValidation` rules in Kubebuilder can be attached at the struct level,
+placed on the type being validated rather than on a field within it. Rules
+that reference sibling fields (such as `self.type` and `self.proxyConfig`)
+must be struct-level. The pattern (from `virtualmcpserver_types.go:88`):
+
+```go
+// +kubebuilder:validation:XValidation:rule="...",message="..."
+type StructName struct { ... }
+```
+
+The four rules for `MCPRemoteEndpoint`, placed on their correct owning types:
+
+```go
+// MCPRemoteEndpointSpec struct-level rules:
+//
+// +kubebuilder:validation:XValidation:rule="self.type != 'direct' || !has(self.proxyConfig)",message="spec.proxyConfig must not be set when type is direct"
+// +kubebuilder:validation:XValidation:rule="self.type != 'proxy' || has(self.proxyConfig)",message="spec.proxyConfig is required when type is proxy"
+// +kubebuilder:validation:XValidation:rule="self.type == oldSelf.type",message="spec.type is immutable after creation"
+//
+//nolint:lll
+type MCPRemoteEndpointSpec struct { ... }
+
+// MCPRemoteEndpointProxyConfig struct-level rule:
+//
+// +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="spec.proxyConfig.oidcConfig is required"
+type MCPRemoteEndpointProxyConfig struct { ... }
+```
+
+**Important:** A rule that references `oldSelf` is a transition rule: Kubernetes
+evaluates it only when both an old and a new object exist, i.e. on updates. The
+immutability rule is therefore skipped automatically on create, and no guard
+against a missing previous state is needed (`oldSelf` is not comparable to
+`null` unless the rule opts in via `optionalOldSelf`).
+
+**HTTPS enforcement is controller-side only.** The `remoteURL` pattern marker
+(`^https?://`) accepts both HTTP and HTTPS at admission time — consistent with
+the existing pattern on `MCPRemoteProxy`. HTTP URLs are rejected by the
+controller, which sets `ConfigurationValid=False` with reason `RemoteURLInvalid`
+and emits a Warning event. This does NOT produce an admission error.

 #### MCPRemoteEndpoint CRD

 ```yaml
@@ -229,42 +281,43 @@ metadata:
   name: context7
   namespace: default
 spec:
-  # REQUIRED: Connectivity mode — IMMUTABLE after creation
-  # - proxy: deploy a proxy pod with full auth middleware
-  # - direct: no pod; vMCP connects directly to remoteURL
+  # REQUIRED: Connectivity mode — IMMUTABLE after creation.
+  # Delete and recreate to change type.
# +kubebuilder:validation:Enum=proxy;direct # +kubebuilder:default=proxy - # +kubebuilder:validation:XValidation:rule="self.type == oldSelf.type",message="spec.type is immutable" + # (immutability enforced by struct-level CEL rule, not here) type: direct - # REQUIRED: URL of the remote MCP server - # Must use HTTPS unless toolhive.stacklok.dev/allow-insecure annotation is set + # REQUIRED: URL of the remote MCP server. + # Must use HTTPS. HTTP accepted at admission but rejected by the controller + # unless toolhive.stacklok.dev/allow-insecure: "true" is set. # +kubebuilder:validation:Pattern=`^https?://` remoteURL: https://mcp.context7.com/mcp - # REQUIRED: Transport protocol - # streamable-http is RECOMMENDED for new deployments. - # sse is DEPRECATED (MCP spec 2025-11-25) and retained for legacy compatibility only. + # REQUIRED: Transport protocol. + # streamable-http is RECOMMENDED. sse is the legacy 2024-11-05 transport, + # retained for backwards compatibility with servers that have not yet migrated. # +kubebuilder:validation:Enum=streamable-http;sse transport: streamable-http - # REQUIRED: Group membership - # An MCPRemoteEndpoint without a group cannot be discovered by any - # VirtualMCPServer. + # REQUIRED: Group membership. groupRef: engineering-team # OPTIONAL: Auth for outgoing requests to the remote server. - # Applies to both modes: - # proxy: auth from the proxy pod to the remote - # direct: auth from vMCP to the remote - # Omit entirely for unauthenticated public remotes. + # In proxy mode: vMCP reads this for vMCP->proxy auth AND the proxy pod + # reads it for proxy->remote auth (two separate consumers, same field). + # In direct mode: vMCP reads this for vMCP->remote auth only. + # Omit for unauthenticated public remotes. + # NOT valid in direct mode: embeddedAuthServer, awsSts (see Auth Flow section). externalAuthConfigRef: name: salesforce-auth - # OPTIONAL: Header forwarding configuration. - # Applies to both modes. 
+ # OPTIONAL: Header forwarding. Applies to both modes. headerForward: addPlaintextHeaders: + # WARNING: values stored in plaintext in etcd and visible via kubectl. + # Never put API keys, tokens, or secrets here. + # Use addHeadersFromSecret for sensitive values. X-Tenant-ID: "tenant-123" addHeadersFromSecret: - headerName: X-API-Key @@ -272,11 +325,10 @@ spec: name: remote-api-credentials key: api-key - # OPTIONAL: Custom CA bundle for private remote servers using - # internal or self-signed certificates. References a ConfigMap - # (not Secret) because CA certificates are public data, consistent - # with the kube-root-ca.crt pattern. - # Applies to both modes. + # OPTIONAL: Custom CA bundle (ConfigMap) for private remote servers. + # NOTE: CA bundle ConfigMaps are trust anchors. Protect them with RBAC — + # anyone with ConfigMap write access in the namespace can inject a malicious + # CA and intercept TLS traffic to this backend. caBundleRef: name: internal-ca-bundle key: ca.crt @@ -285,67 +337,37 @@ spec: toolConfigRef: name: my-tool-config - # Proxy pod configuration. + # OPTIONAL: Proxy pod configuration. # REQUIRED when type: proxy. MUST NOT be set when type: direct. - # - # CEL validation rules (on MCPRemoteEndpointSpec): - # +kubebuilder:validation:XValidation:rule="self.type == 'direct' ? !has(self.proxyConfig) : true",message="proxyConfig must not be set when type is direct" - # +kubebuilder:validation:XValidation:rule="self.type == 'proxy' ? has(self.proxyConfig) : true",message="proxyConfig is required when type is proxy" - # - # CEL validation rule (on ProxyConfig struct, following VirtualMCPServer IncomingAuthConfig pattern): - # +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="proxyConfig.oidcConfig is required" + # Validation is enforced by struct-level CEL rules on MCPRemoteEndpointSpec + # and MCPRemoteEndpointProxyConfig — not by field-level markers here. 
proxyConfig: - # REQUIRED within proxyConfig: OIDC for validating incoming tokens - oidcConfig: + oidcConfig: # REQUIRED within proxyConfig type: kubernetes - - # OPTIONAL: Authorization policy authzConfig: type: inline inline: policies: [...] - - # OPTIONAL: Audit logging audit: enabled: true - - # OPTIONAL: Observability telemetry: openTelemetry: enabled: true - - # OPTIONAL: Container resource limits resources: limits: cpu: "500m" memory: "128Mi" - - # OPTIONAL: Service account serviceAccount: my-service-account - - # OPTIONAL: Port to expose the proxy on # +kubebuilder:default=8080 proxyPort: 8080 - - # OPTIONAL: Session affinity for the proxy Service. - # NOTE: ClientIP is a rough approximation that breaks under NAT. - # The MCP Streamable HTTP spec requires session affinity based on - # the Mcp-Session-Id header. Future work should implement header-based - # affinity (e.g., via Ingress annotations or a Service mesh). - # ClientIP is the default because it is the only option supported - # natively by Kubernetes Services without an Ingress controller. # +kubebuilder:validation:Enum=ClientIP;None # +kubebuilder:default=ClientIP + # NOTE: ClientIP affinity is a rough approximation; Mcp-Session-Id + # header-based affinity is spec-correct but requires an ingress controller. 
sessionAffinity: ClientIP - - # OPTIONAL: Trust X-Forwarded-* headers from reverse proxies # +kubebuilder:default=false trustProxyHeaders: false - - # OPTIONAL: Path prefix for ingress routing scenarios endpointPrefix: "" - - # OPTIONAL: Metadata overrides for created resources resourceOverrides: {} ``` @@ -361,10 +383,9 @@ spec: remoteURL: https://mcp.context7.com/mcp transport: streamable-http groupRef: engineering-team - # No externalAuthConfigRef - public endpoint, no auth needed ``` -**Example: Authenticated remote with token exchange (direct mode):** +**Example: Token exchange auth (direct mode):** ```yaml apiVersion: toolhive.stacklok.dev/v1alpha1 @@ -377,10 +398,10 @@ spec: transport: streamable-http groupRef: engineering-team externalAuthConfigRef: - name: salesforce-token-exchange + name: salesforce-token-exchange # type: tokenExchange ``` -**Example: Standalone proxy with full auth middleware (proxy mode):** +**Example: Standalone proxy with auth middleware (proxy mode):** ```yaml apiVersion: toolhive.stacklok.dev/v1alpha1 @@ -390,10 +411,8 @@ metadata: spec: type: proxy remoteURL: https://internal-mcp.corp.example.com/mcp - transport: sse + transport: streamable-http groupRef: engineering-team - externalAuthConfigRef: - name: internal-api-auth proxyConfig: oidcConfig: type: kubernetes @@ -407,475 +426,418 @@ spec: #### CRD Metadata -``` -// +kubebuilder:resource:shortName=mcpre;remoteendpoint -// +kubebuilder:printcolumn:name="Type",type="string",JSONPath=".spec.type",description="Connectivity mode (proxy or direct)" +```go +// +kubebuilder:resource:shortName=mcpre +// +kubebuilder:printcolumn:name="Type",type="string",JSONPath=".spec.type" // +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase" // +kubebuilder:printcolumn:name="Remote URL",type="string",JSONPath=".spec.remoteURL" // +kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url" // 
+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" ``` -Short names: `mcpre`, `remoteendpoint` (following the pattern of `mcpg` for -MCPGroup, `vmcp` for VirtualMCPServer, `extauth` for MCPExternalAuthConfig). +Short name: `mcpre` (consistent with `mcpg` for MCPGroup, `vmcp` for +VirtualMCPServer, `extauth` for MCPExternalAuthConfig). + +#### The `toolhive.stacklok.dev/allow-insecure` Annotation + +Set `toolhive.stacklok.dev/allow-insecure: "true"` to bypass the HTTPS check +for development or cluster-internal HTTP endpoints. + +- **Accepted value:** `"true"` exactly. Any other value (or absence) enforces HTTPS. +- **Scope:** Per-resource. +- **Audit trail:** The controller emits a `Warning` event with reason + `InsecureTransport` on every reconciliation while the annotation is present. +- **Not in production:** HTTP traffic exposes credentials from `externalAuthConfigRef` + to interception. This annotation MUST NOT be used in production environments. #### Spec Fields **Top-level (both modes):** | Field | Type | Required | Description | -|-------|------|----------|-------------| -| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation** — changing type would orphan infrastructure. Delete and recreate to change mode. | -| `remoteURL` | string | Yes | URL of the remote MCP server. Must use HTTPS unless `toolhive.stacklok.dev/allow-insecure` annotation is set. The `allow-insecure` annotation is intended only for development/testing with local MCP servers; it MUST NOT be used in production. When set, the controller emits a Warning event. | -| `transport` | enum | Yes | MCP transport protocol: `streamable-http` (recommended) or `sse` (deprecated per MCP spec 2025-11-25, retained for legacy compatibility). | -| `groupRef` | string | Yes | Name of the MCPGroup this endpoint belongs to. Plain string, consistent with `MCPServer.spec.groupRef` and `MCPRemoteProxy.spec.groupRef`. 
| -| `externalAuthConfigRef` | object | No | References an `MCPExternalAuthConfig` for outgoing auth to the remote server. For `tokenExchange` type configs, this is **not** a simple credential — it is middleware that uses the validated incoming client token as the RFC 8693 `subject_token` to obtain a service token for the remote. In `proxy` mode, the proxy pod performs the exchange. In `direct` mode, vMCP performs the exchange. Omit for unauthenticated endpoints. | -| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type. Applies to both modes. | -| `caBundleRef` | object | No | ConfigMap containing a custom CA certificate bundle for TLS verification. New field (not present on MCPRemoteProxy). See [CA Bundle Security](#ca-bundle-trust-store-considerations) for trust implications. Applies to both modes. | -| `toolConfigRef` | object | No | Tool filtering configuration. Applies to both modes. | +|---|---|---|---| +| `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation.** | +| `remoteURL` | string | Yes | URL of the remote MCP server. HTTP accepted at admission; rejected by controller unless `allow-insecure` annotation is set. | +| `transport` | enum | Yes | `streamable-http` (recommended) or `sse` (legacy 2024-11-05 transport). | +| `groupRef` | string | Yes | Name of the MCPGroup. | +| `externalAuthConfigRef` | object | No | Outgoing auth config. In proxy mode: read by both vMCP (vMCP→proxy) and the proxy pod (proxy→remote). In direct mode: read by vMCP only (vMCP→remote). Types `embeddedAuthServer` and `awsSts` are invalid in direct mode. | +| `headerForward` | object | No | Header injection. `addPlaintextHeaders` values are stored in plaintext in etcd — use `addHeadersFromSecret` for secrets. | +| `caBundleRef` | object | No | ConfigMap containing a custom CA bundle. Protect with RBAC — write access enables MITM. | +| `toolConfigRef` | object | No | Tool filtering. 
| **`proxyConfig` (only when `type: proxy`):** | Field | Type | Required | Description | -|-------|------|----------|-------------| -| `oidcConfig` | object | Yes | OIDC configuration for validating incoming tokens on the proxy pod. | -| `authzConfig` | object | No | Authorization policy for the proxy pod. | -| `audit` | object | No | Audit logging configuration. | +|---|---|---|---| +| `oidcConfig` | object | Yes | Validates tokens arriving at the proxy pod. | +| `authzConfig` | object | No | Cedar authorization policy. | +| `audit` | object | No | Audit logging for the proxy pod. | | `telemetry` | object | No | Observability configuration. | -| `resources` | object | No | Container resource requirements. | -| `serviceAccount` | string | No | Existing service account to use. Auto-created if unset. | -| `proxyPort` | int | No | Port to expose the proxy on. Default: 8080. | -| `sessionAffinity` | enum | No | `ClientIP` or `None`. Default: `ClientIP`. | +| `resources` | object | No | Container resource limits. | +| `serviceAccount` | string | No | Existing SA to use; auto-created if unset. | +| `proxyPort` | int | No | Port to expose. Default: 8080. | +| `sessionAffinity` | enum | No | `ClientIP` (default) or `None`. | | `trustProxyHeaders` | bool | No | Trust X-Forwarded-* headers. Default: false. | | `endpointPrefix` | string | No | Path prefix for ingress routing. | | `resourceOverrides` | object | No | Metadata overrides for created resources. | -#### The `toolhive.stacklok.dev/allow-insecure` Annotation - -By default, the controller rejects `remoteURL` values using plain HTTP. -Setting the annotation `toolhive.stacklok.dev/allow-insecure: "true"` on -the MCPRemoteEndpoint resource overrides this check. - -- **Accepted values:** `"true"` (any other value or absence enforces HTTPS). -- **Scope:** Per-resource. There is no cluster-wide override. 
-- **Audit trail:** The controller emits a Warning event with reason - `InsecureTransport` whenever it reconciles a resource with this annotation. -- **Security note:** HTTP URLs expose traffic to man-in-the-middle attacks. - This annotation is intended only for development/testing environments or - cluster-internal URLs where TLS termination happens at a load balancer. -- **Precedent:** This annotation is new — it does not exist on MCPRemoteProxy - or any other ToolHive CRD today. - #### Status Fields | Field | Type | Description | -|-------|------|-------------| +|---|---|---| | `conditions` | []Condition | Standard Kubernetes conditions. | -| `phase` | string | Current phase: `Pending`, `Ready`, `Failed`, `Terminating`. | -| `url` | string | URL where the endpoint can be accessed. For `type: proxy`, the internal cluster URL of the proxy service (set once the Deployment is ready). For `type: direct`, set to `spec.remoteURL` immediately upon successful validation. | -| `observedGeneration` | int64 | Most recent generation observed by the controller. | - -**`status.url` lifecycle:** For `type: proxy`, `status.url` is empty until the -proxy Deployment reaches `Ready` state. Backend discoverers (both static and -dynamic mode) MUST treat an empty `status.url` as "backend not yet available" -and skip the backend rather than removing it from the registry. This prevents -backends from being briefly removed during proxy pod startup or rolling -updates. For `type: direct`, `status.url` is set immediately after validation -succeeds, so this race does not apply. +| `phase` | string | `Pending`, `Ready`, `Failed`, `Terminating`. | +| `url` | string | For `type: proxy`: cluster-internal Service URL (set once Deployment is ready). For `type: direct`: set to `spec.remoteURL` immediately upon validation. | +| `observedGeneration` | int64 | Most recent generation reconciled. 
| + +**`status.url` lifecycle note:** For `type: proxy`, `status.url` is empty until +the proxy Deployment becomes ready. Backend discoverers (static and dynamic) +MUST treat an empty `status.url` as "backend not yet available" and skip the +backend — not remove it from the registry. For `type: direct`, `status.url` is +set immediately after validation, so this race does not apply. **Condition types:** | Type | Purpose | When Set | -|------|---------|----------| +|---|---|---| | `Ready` | Overall readiness | Always | -| `GroupRefValid` | Referenced MCPGroup exists | Always | -| `AuthConfigValid` | Referenced MCPExternalAuthConfig exists | When `externalAuthConfigRef` is set | -| `CABundleValid` | Referenced CA bundle ConfigMap exists | When `caBundleRef` is set | -| `DeploymentReady` | Proxy deployment is healthy | Only when `type: proxy` | -| `ConfigurationValid` | Spec has passed all validation checks | Always | +| `GroupRefValid` | MCPGroup exists | Always | +| `AuthConfigValid` | MCPExternalAuthConfig exists | When `externalAuthConfigRef` is set | +| `CABundleValid` | CA bundle ConfigMap exists | When `caBundleRef` is set | +| `DeploymentReady` | Proxy deployment healthy | Only when `type: proxy` | +| `ConfigurationValid` | All validation checks passed | Always | -There is intentionally **no `RemoteReachable` condition**. The controller should -NOT probe remote URLs because reachability from the operator pod does not imply -reachability from the vMCP pod, and probing external URLs expands the operator's -attack surface. +No `RemoteReachable` condition — the controller never probes remote URLs. #### Component Changes ##### Operator: MCPRemoteEndpoint Controller -**Pre-requisite: MCPRemoteProxy controller refactoring.** The existing -`mcpremoteproxy_controller.go` is 1,125 lines with all reconciliation logic -bound to `*mcpv1alpha1.MCPRemoteProxy` receiver methods. None of this logic -is directly extractable without refactoring. 
Before Phase 1 implementation, -the proxy reconciliation logic (Deployment/Service/ServiceAccount creation, -RBAC setup, health monitoring, status updates) must be extracted into a -shared `pkg/operator/remoteproxy/` package with functions that accept -interfaces or generic parameters rather than concrete CRD types. This -refactoring is scoped as Phase 0.5 in the implementation plan below. - -The controller has two code paths based on `spec.type`: +**Pre-requisite: extract shared proxy logic.** `mcpremoteproxy_controller.go` +is ~1,125 lines with all proxy reconciliation logic bound to +`*mcpv1alpha1.MCPRemoteProxy` methods. Before Phase 1, extract the +Deployment/Service/ServiceAccount/RBAC creation functions into a shared +`pkg/operator/remoteproxy/` package that accepts an interface rather than the +concrete type. `MCPRemoteProxyReconciler` is then refactored to use it. +This is a refactoring-only step with no API changes — all existing tests must +pass unchanged. This is scoped as Phase 0 step 4. -**`type: proxy` path** — uses the extracted shared proxy reconciliation logic: +**`type: proxy` path** — uses the extracted shared package: 1. Validates spec (OIDC config, group ref, auth config ref, CA bundle ref) 2. Ensures Deployment, Service, ServiceAccount, RBAC -3. Monitors deployment health and updates `Ready` condition -4. Sets `status.url` to the internal cluster service URL - -**`type: direct` path** — validation only, no infrastructure created: -1. Validates that the referenced MCPGroup exists. Sets `GroupRefValid` condition. -2. If `externalAuthConfigRef` is set, validates the referenced MCPExternalAuthConfig exists. Sets `AuthConfigValid` condition. -3. If `caBundleRef` is set, validates the referenced ConfigMap exists. Sets `CABundleValid` condition. -4. Validates HTTPS requirement; checks for `toolhive.stacklok.dev/allow-insecure` annotation if `remoteURL` uses HTTP. -5. 
If all validations pass, sets `Ready` to true and sets `status.url = spec.remoteURL`. +3. Monitors deployment health, updates `Ready` condition +4. Sets `status.url` to the cluster-internal Service URL -No finalizers are needed for `type: direct` because no infrastructure is created. -`type: proxy` uses the same finalizer pattern as the existing MCPRemoteProxy controller. +**`type: direct` path** — validation only, no infrastructure: +1. Validates MCPGroup exists; sets `GroupRefValid` +2. If `externalAuthConfigRef` set, validates it exists; sets `AuthConfigValid` +3. If `externalAuthConfigRef` type is `embeddedAuthServer` or `awsSts`, sets + `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode` +4. If `caBundleRef` set, validates ConfigMap exists; sets `CABundleValid` +5. Validates HTTPS requirement; checks `allow-insecure` annotation +6. Sets `Ready=True` and `status.url = spec.remoteURL` -The controller watches MCPGroup, MCPExternalAuthConfig, and (for `type: proxy`) -Deployment resources via `EnqueueRequestsFromMapFunc` handlers, so that changes -to referenced resources trigger re-reconciliation. +No finalizers for `type: direct`. `type: proxy` uses the same finalizer pattern +as the existing MCPRemoteProxy controller. ##### Operator: MCPGroup Controller Update The MCPGroup controller currently watches MCPServer and MCPRemoteProxy. It must -be updated to also watch MCPRemoteEndpoint resources. The MCPRemoteProxy watch -is retained during the deprecation window and removed only when MCPRemoteProxy -is removed. - -**Status field changes (additive, not renaming):** New fields -`status.remoteEndpoints` and `status.remoteEndpointCount` are added alongside -the existing `status.remoteProxies` and `status.remoteProxyCount`. Both old -and new fields are populated during the deprecation window. The old fields are -removed only when MCPRemoteProxy support is removed. 
This preserves backward -compatibility for GitOps pipelines, monitoring dashboards, and jsonpath queries -that depend on the existing field names. - -**Required code changes (this is not a single bullet point):** -1. Register a field indexer for `MCPRemoteEndpoint.spec.groupRef` in the - operator's `main.go` `SetupFieldIndexers()`, following the existing pattern - for MCPServer and MCPRemoteProxy field indexers. -2. Add a new `findReferencingMCPRemoteEndpoints()` method, mirroring +be updated to also watch MCPRemoteEndpoint. The following changes are required +(this is not a single bullet point): + +1. Register a field indexer for `MCPRemoteEndpoint.spec.groupRef` in + `SetupFieldIndexers()` at manager startup — without this, `MatchingFields` + queries for MCPRemoteEndpoint silently return empty results. +2. Add `findReferencingMCPRemoteEndpoints()` mirroring the existing `findReferencingMCPRemoteProxies()`. -3. Add a new `findMCPGroupForMCPRemoteEndpoint()` watch mapper function. -4. Register the MCPRemoteEndpoint watch in `SetupWithManager()` via - `Watches(&mcpv1alpha1.MCPRemoteEndpoint{}, handler.EnqueueRequestsFromMapFunc(...))`. -5. Update `updateGroupMemberStatus()` to call `findReferencingMCPRemoteEndpoints()` - and populate the new status fields. -6. Update `handleListFailure()` and `handleDeletion()` to account for - MCPRemoteEndpoint members. -7. Add new RBAC markers: `+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints,verbs=get;list;watch`. -8. Add `+kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints/status,verbs=get;update;patch`. -9. Update integration test suites (`mcp-group/suite_test.go` etc.) to register - the new field indexer. +3. Add `findMCPGroupForMCPRemoteEndpoint()` watch mapper. +4. Register the watch in `SetupWithManager()` via + `Watches(&mcpv1alpha1.MCPRemoteEndpoint{}, ...)`. +5. Update `updateGroupMemberStatus()` to call the new function and populate + new status fields. +6. 
Update `handleListFailure()` and `handleDeletion()` for MCPRemoteEndpoint + membership. +7. Add RBAC markers — without these the operator gets a Forbidden error at + runtime: + ``` + // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints,verbs=get;list;watch + // +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteendpoints/status,verbs=get;update;patch + ``` + +**Status fields (additive — no renames):** New fields `status.remoteEndpoints` +and `status.remoteEndpointCount` are added alongside the existing +`status.remoteProxies` and `status.remoteProxyCount`. Both are populated during +the deprecation window. Old fields are removed only when MCPRemoteProxy is +removed. This preserves backward compatibility for existing jsonpath queries +and monitoring dashboards. ##### Operator: VirtualMCPServer Controller Update -**Static mode:** The ConfigMap generated by the operator must include -MCPRemoteEndpoint backends. For `type: proxy`, the backend URL is the proxy -service URL (same as MCPRemoteProxy today). For `type: direct`, the backend URL -is `spec.remoteURL`. +**`StaticBackendConfig` schema change required.** The current +`StaticBackendConfig` struct in `pkg/vmcp/config/config.go` has only `Name`, +`URL`, `Transport`, and `Metadata`. The vMCP binary uses `KnownFields(true)` +strict YAML parsing. Writing new fields (`Type`, `CABundlePath`, `Headers`) +to the ConfigMap before updating the vMCP binary will cause a startup failure. + +Implementation order: +1. Add `Type`, `CABundlePath`, and `Headers` fields to `StaticBackendConfig` +2. Update the vMCP binary and the roundtrip test in + `pkg/vmcp/config/crd_cli_roundtrip_test.go` +3. 
Deploy the updated vMCP image **before** the operator starts writing these + fields — co-ordinate Helm chart version bumping accordingly + +Additional touch points: +- `listMCPRemoteEndpointsAsMap()` — new function for ConfigMap generation +- `getExternalAuthConfigNameFromWorkload()` — add MCPRemoteEndpoint case +- `headerForward.addHeadersFromSecret` requires K8s Secret resolution at + ConfigMap generation time (static mode has no K8s API access at runtime) +- `vmcp.Backend` needs a `Headers` field for resolved header values +- Deployment volume mount logic for `caBundleRef` ConfigMaps + +**CA bundle in static mode:** The operator mounts the `caBundleRef` ConfigMap +as a volume into the vMCP pod at `/etc/toolhive/ca-bundles//ca.crt`. +The generated backend ConfigMap includes the mount path so vMCP can construct +the correct `tls.Config`. Pod restart is required when a CA bundle changes in +static mode. + +##### vMCP: Backend Discovery Update -**StaticBackendConfig schema change required:** The current `StaticBackendConfig` -struct in `pkg/vmcp/config/config.go` has only `Name`, `URL`, `Transport`, and -`Metadata` fields. It does **not** have a `Type` field. Since vMCP uses -`KnownFields(true)` strict YAML parsing (see `pkg/vmcp/config/yaml_loader.go:44`), -writing a `type` field to the ConfigMap before updating the vMCP binary will -cause a startup failure. The implementation must: -1. Add a `Type` field to `StaticBackendConfig` (values: `container`, `proxy`, `direct`) -2. Add optional `CABundlePath`, `Headers`, and auth-related fields -3. Update the vMCP binary **before** the operator starts writing these fields -4. Phase the rollout: operator Helm chart and vMCP image must be updated together +Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go`. -Updated `StaticBackendConfig` fields needed: +Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in +`pkg/vmcp/workloads/k8s.go`. 
For MCPRemoteEndpoint: +- `type: proxy` — uses `status.url` (proxy Service URL), same as MCPRemoteProxy +- `type: direct` — uses `spec.remoteURL` directly -```yaml -backends: - # From MCPServer resources (unchanged) - - name: github-mcp - url: http://github-mcp.default.svc:8080 - transport: sse - # type field omitted for backward compat with existing MCPServer backends - - # From MCPRemoteEndpoint type: proxy - - name: internal-api-mcp - url: http://internal-api-mcp.default.svc:8080 - transport: sse - - # From MCPRemoteEndpoint type: direct - - name: context7 - url: https://mcp.context7.com/mcp - transport: streamable-http - # No auth - public endpoint - - - name: salesforce-mcp - url: https://mcp.salesforce.com - transport: streamable-http - # Auth resolved by operator and embedded in config -``` +**Name collision handling:** `fetchBackendResource()` in +`pkg/vmcp/k8s/backend_reconciler.go` tries resources in order: MCPServer → +MCPRemoteProxy → MCPRemoteEndpoint. Same-name resources across types in the +same namespace always resolve to the first match. Log a warning when a +collision is detected. -**Additional touch points for static mode:** -- `listMCPRemoteEndpointsAsMap()` — new function to list MCPRemoteEndpoint - resources for ConfigMap generation. -- `getExternalAuthConfigNameFromWorkload()` — must handle MCPRemoteEndpoint - in addition to MCPServer and MCPRemoteProxy. -- `addHeadersFromSecret` in `headerForward` requires K8s Secret resolution at - ConfigMap generation time. The operator must resolve secrets and embed header - values (or mount them as environment variables) since vMCP in static mode - cannot access the K8s API. Neither `vmcp.Backend` nor `StaticBackendConfig` - currently has a `Headers` field — this must be added. -- Deployment volume mount logic must be updated for CA bundles. 
##### vMCP: HTTP Client for Direct Mode

For `type: direct` backends:
1. Use system CA pool by default; optionally append `caBundleRef` CA bundle
2. Enforce TLS 1.2 minimum
3. Apply `externalAuthConfigRef` credentials via `authRoundTripper`
4. Inject `MCP-Protocol-Version: ` on every HTTP request
   after initialization — this is a MUST per MCP spec 2025-11-25 and applies
   to both POST (tool calls) and GET (server notification stream) requests

##### vMCP: Reconnection Handling for Direct Mode

When a `type: direct` backend connection drops, vMCP follows this sequence
per MCP spec 2025-11-25:

1. 
**Attempt stream resumption (SHOULD).** If the backend previously issued SSE
   event IDs, vMCP SHOULD issue an HTTP GET with `Last-Event-ID` set to the
   last received event ID before re-initializing. If the connection recovers
   and the session remains valid, no re-initialization is needed.

2. **Exponential backoff.** Initial: 1s, cap: 30s, jitter recommended.
   If the backend sends a `retry` field in an SSE event, that value overrides
   the local backoff for that attempt.

3. **Full re-initialization on HTTP 404 or session loss.** If HTTP 404 is
   returned on a request carrying an `MCP-Session-Id`, discard all session state
   and execute the full handshake:
   ```
   POST initialize request → InitializeResult (new MCP-Session-Id)
   POST notifications/initialized
   ```
   After initialization, re-discover ALL capabilities advertised in the new
   `InitializeResult` (tools, resources, prompts as applicable). Results from
   the prior session MUST NOT be reused.

4. **Re-establish GET stream.** See section below.
5. **Circuit breaker.** After 5 consecutive failed attempts, mark the backend
   `unavailable` and open the circuit breaker. The resource transitions to the
   `Failed` phase. A half-open probe at 60-second intervals tests recovery.

##### vMCP: Server-Initiated Notifications in Direct Mode

vMCP acts as an MCP **client** toward each `type: direct` backend and MUST
maintain a persistent HTTP GET SSE stream to each backend for server-initiated
messages.
After initialization (after sending `notifications/initialized`), vMCP MUST
issue:

```
GET 
Accept: text/event-stream
MCP-Session-Id:  (if the server issued one)
MCP-Protocol-Version: 2025-11-25 (MUST be included per spec)
```

Notifications vMCP MUST handle:

| Notification | Action |
|---|---|
| `notifications/tools/list_changed` | Re-fetch `tools/list`, update routing table |
| `notifications/resources/list_changed` | Re-fetch `resources/list`, update routing table |
| `notifications/prompts/list_changed` | Re-fetch `prompts/list`, update routing table |

vMCP MUST only act on notifications for capabilities advertised with
`listChanged: true` in the `InitializeResult`. Other notifications should be
logged and discarded.
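The `listChanged` gating above can be sketched as follows. The capability
struct and function are illustrative assumptions, not the actual vMCP types;
only the method strings come from the MCP spec:

```go
package main

import "fmt"

// backendCaps records which capabilities the backend advertised with
// listChanged=true in its InitializeResult.
type backendCaps struct {
	ToolsListChanged     bool
	ResourcesListChanged bool
	PromptsListChanged   bool
}

// handleNotification maps a server-initiated notification to the re-fetch
// request vMCP performs, or "" when the notification must be logged and
// discarded because the capability was not advertised with listChanged: true.
func handleNotification(caps backendCaps, method string) string {
	switch method {
	case "notifications/tools/list_changed":
		if caps.ToolsListChanged {
			return "tools/list"
		}
	case "notifications/resources/list_changed":
		if caps.ResourcesListChanged {
			return "resources/list"
		}
	case "notifications/prompts/list_changed":
		if caps.PromptsListChanged {
			return "prompts/list"
		}
	}
	return "" // unadvertised or unknown → log and discard
}

func main() {
	caps := backendCaps{ToolsListChanged: true}
	fmt.Println(handleNotification(caps, "notifications/tools/list_changed"))   // tools/list
	fmt.Println(handleNotification(caps, "notifications/prompts/list_changed")) // "" → discarded
}
```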
The GET stream MUST be re-established as step 4 of the reconnection sequence
above. If it cannot be established, the backend follows the circuit breaker path.

##### vMCP: Dynamic Mode Reconciler Update

Extend `BackendReconciler` in `pkg/vmcp/k8s/backend_reconciler.go` to watch
MCPRemoteEndpoint using the same `EnqueueRequestsFromMapFunc` pattern.
`fetchBackendResource()` gains a third type to try (see resolution order above).

##### Session Constraints in Direct Mode

**Why multi-replica fails.** The MCP `Mcp-Session-Id` is stored in
`LocalStorage`, which is a `sync.Map` held entirely in process memory
(`pkg/transport/session/storage_local.go`). A second vMCP replica has no
knowledge of sessions established by the first, causing HTTP 400 or 404 errors
on routed requests.

**Single-replica is the only supported configuration.** `type: direct` endpoints
MUST be deployed with `replicas: 1` on the VirtualMCPServer Deployment.

**No distributed session backend exists.** `pkg/transport/session/storage.go`
defines a `Storage` interface that is Redis-compatible. The serialization
helpers in `serialization.go` are explicitly marked
`// nolint:unused // Will be used in Phase 4 for Redis/Valkey storage`. However,
no Redis implementation of `session.Storage` exists in the codebase — the Redis
code in `pkg/authserver/storage/redis.go` is for a different purpose (OAuth
server state via fosite) and is unrelated.
A distributed session backend would
need to be built from scratch as a new `session.Storage` implementation and is
out of scope for this RFC.

## Security Considerations

### Threat Model

| Threat | Description | Mitigation |
|---|---|---|
| MITM on remote connection | Attacker intercepts vMCP-to-remote traffic | HTTPS required by default; custom CA bundles for private CAs |
| Credential exposure | Auth secrets visible in CRD manifest | Credentials stored in K8s Secrets; never inline. `addPlaintextHeaders` stores values in plaintext in etcd — use `addHeadersFromSecret` for sensitive values |
| SSRF via remoteURL | Compromised workload with CRD write access sets `remoteURL` to internal targets | RBAC + NetworkPolicy (see below) |
| Auth config confusion | Wrong credentials sent to wrong backend | Eliminated in direct mode: `externalAuthConfigRef` has one purpose (vMCP→remote). In proxy mode: see Auth Flow for the dual-consumer behaviour |
| Operator probing external URLs | Controller makes network requests to untrusted URLs | Eliminated: validation only, no probing |
| Expanded vMCP egress | vMCP pod makes outbound calls in direct mode | Acknowledged trade-off. See Credential Blast Radius below |
| Trust store injection | ConfigMap write access allows injecting malicious CA | CA bundle ConfigMaps are trust anchors; protect with RBAC |
| Token audience confusion | Exchanged token has broader scope than intended | Post-exchange audience validation MUST be implemented — see Phase 2 |

### SSRF Mitigation

RBAC alone is insufficient when threat actors include compromised workloads with
CRD write access. The following are required:

1. **NetworkPolicy (REQUIRED):** Default egress rules for the vMCP pod MUST
   block:
   - `169.254.169.254/32` (AWS/GCP/Azure IMDS)
   - `fd00:ec2::254/128` (AWS IMDSv2 IPv6)
   - The Kubernetes API server (`kubernetes.default.svc` and its cluster IP)
   These specific targets are high-value for credential theft and have no
   legitimate use as MCP backends. RFC 1918 ranges (`10.x`, `172.16-31.x`,
   `192.168.x`) are deliberately NOT blocked because `type: direct` legitimately
   targets internal/corporate MCP servers. RBAC is the appropriate control for
   general RFC 1918 access.

2. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
   accounts should have `create`/`update` permissions on MCPRemoteEndpoint.

### Credential Blast Radius in Direct Mode

In `type: proxy` mode, each proxy pod holds credentials for exactly one backend.
A compromised proxy pod yields credentials for one service.

In `type: direct` mode, the vMCP pod holds credentials for every direct backend
simultaneously. A compromised vMCP pod yields credentials for all backends.

**Recommendation for high-security environments:** Use `type: proxy` for
sensitive-credential backends. Reserve `type: direct` for unauthenticated or
low-sensitivity backends. Consider dedicated VirtualMCPServer instances (and
therefore dedicated MCPGroups) to isolate high-sensitivity backends.
### CA Bundle Trust Store Considerations
CA bundle ConfigMaps are trust anchors, not merely public data. Anyone with
`configmaps:update` in the namespace can inject a malicious CA certificate,
enabling MITM attacks against all `type: direct` backends referencing that
ConfigMap. CA bundle ConfigMaps MUST be protected with the same RBAC rigour as
the MCPRemoteEndpoint resource itself.

### Audit Limitations in Direct Mode
In `type: proxy` mode, the proxy pod logs: incoming request details, outgoing
URL, auth outcome, and remote response status.

In `type: direct` mode, vMCP's existing audit middleware logs incoming client
requests but does **not** currently log: the remote URL contacted, outgoing auth
outcome, or remote HTTP response status. This is a known gap.

**Required enhancement (Phase 2):** vMCP's audit middleware must be extended for
`type: direct` backends to log the remote URL, auth method, and remote HTTP
status code.

### Secrets Management

- **Dynamic mode**: vMCP reads secrets at runtime via K8s API.
- **Static mode**: Credentials mounted as environment variables; CA bundles
  mounted as volumes.
- **Routine secret rotation** (static mode): Deployment rollout — old pods
  continue serving until replaced.
- **Emergency revocation** (compromised credential): Use `strategy: Recreate`
  on the VirtualMCPServer Deployment, or trigger `kubectl rollout restart`
  immediately. RollingUpdate leaves old pods running with the revoked credential
  until replacement completes.

### Authentication and Authorization

- **No new auth primitives**: Reuses existing `MCPExternalAuthConfig` CRD.
- **Direct mode**: vMCP validates incoming client tokens; `externalAuthConfigRef`
  handles outgoing auth to the remote. Single, unambiguous boundary.
- **Proxy mode**: Two independent boundaries — see Auth Flow Comparison for
  the dual-consumer behaviour of `externalAuthConfigRef`.
- **Post-exchange audience validation**: The current token exchange implementation
  (`pkg/auth/tokenexchange/exchange.go`) does not validate the `aud` claim of
  the returned token against the configured `audience` parameter. This MUST be
  implemented before `type: direct` is considered secure for multi-backend
  deployments. Scoped to Phase 2.

## Deprecation

`MCPRemoteProxy` is deprecated as of this RFC.

**Note on deprecation mechanism:** `+kubebuilder:deprecatedversion` only
deprecates API versions within the same CRD. It cannot deprecate one CRD in
favour of a different CRD. The deprecation is communicated via:
1. Warning events emitted on every MCPRemoteProxy `Reconcile()` call
2. A `deprecated: "true"` field in the CRD description
3. Documentation updates

**Timeline:**

| Phase | Trigger | What Happens |
|---|---|---|
| Announced | This RFC merges | Warning events on every reconcile; CRD description updated |
| Feature freeze | MCPRemoteEndpoint Phase 1 merged | Bug fixes and security patches only for MCPRemoteProxy |
| Migration window | MCPRemoteEndpoint reaches GA | Minimum 2 minor ToolHive operator releases |
| Removal | After migration window | CRD, controller, Helm templates, RBAC removed |

### Migration: MCPRemoteProxy → MCPRemoteEndpoint

| `MCPRemoteProxy` field | `MCPRemoteEndpoint` equivalent | Notes |
|---|---|---|
| `spec.remoteURL` | `spec.remoteURL` | |
| `spec.port` (deprecated) | `spec.proxyConfig.proxyPort` | Use `proxyPort` |
| `spec.proxyPort` | `spec.proxyConfig.proxyPort` | |
| `spec.transport` | `spec.transport` | |
| `spec.groupRef` | `spec.groupRef` | |
| `spec.externalAuthConfigRef` | `spec.externalAuthConfigRef` | See Auth Flow — dual-consumer behaviour preserved |
| `spec.headerForward` | `spec.headerForward` | |
| `spec.toolConfigRef` | `spec.toolConfigRef` | |
| `spec.oidcConfig` | `spec.proxyConfig.oidcConfig` | |
| `spec.trustProxyHeaders` | `spec.proxyConfig.trustProxyHeaders` | |
| `spec.endpointPrefix` | `spec.proxyConfig.endpointPrefix` | |
| `spec.resourceOverrides` | `spec.proxyConfig.resourceOverrides` | |
| *(not present)* | `spec.type` | Set to `proxy` |
| *(not present)* | `spec.caBundleRef` | New field; not on MCPRemoteProxy |

## Alternatives Considered

### Alternative 1: MCPServerEntry Alongside MCPRemoteProxy (THV-0055)

**Why not chosen:** Two CRDs with overlapping goals (`remoteURL`, `groupRef`,
`externalAuthConfigRef`, `headerForward` on both) increase cognitive load and
long-term CRD surface area.
### Alternative 2: `direct: true` Flag on MCPRemoteProxy

**Why not chosen:** MCPRemoteProxy has ~9 pod-deployment-specific fields
that become inapplicable and confusing with a direct flag. Field pollution is
too high. The typed `proxyConfig` sub-object in MCPRemoteEndpoint solves this
cleanly.

### Alternative 3: Inline Remote Backends in VirtualMCPServer

**Why not chosen:** Prevents RBAC separation (only VirtualMCPServer editors
can manage backends) and couples backend lifecycle to vMCP reconciliation.

## Compatibility

### Backward Compatibility
- `MCPRemoteProxy` continues to function during the deprecation window
- `MCPServer` is unchanged
- `VirtualMCPServer`, `MCPGroup`, `MCPExternalAuthConfig` receive additive
  changes only (new watches, new status fields alongside existing ones)

### Forward Compatibility

- Starts at `v1alpha1`, graduation path to `v1beta1` as part of broader CRD
  revamp
- `type` field and typed sub-configs allow future modes without breaking changes

## Implementation Plan

### Phase 0: MCPRemoteProxy Deprecation + Controller Refactoring

1. Add `deprecated: "true"` to MCPRemoteProxy CRD description
2. 
Emit Warning events on every MCPRemoteProxy `Reconcile()` call
3. Update documentation
4. Extract shared proxy reconciliation logic from `mcpremoteproxy_controller.go`
   into `pkg/operator/remoteproxy/` — refactoring only, no API changes,
   all existing MCPRemoteProxy tests must pass

### Phase 1: CRD and Controller

1. Define `MCPRemoteEndpoint` CRD types with struct-level CEL rules (see CRD
   Validation Rules section for correct placement)
2. Implement controller with both code paths using the Phase 0 shared package
3. Generate CRD manifests; update Helm chart with default NetworkPolicy
4. Update MCPGroup controller — all 7 code changes listed above, including
   field indexer registration and RBAC markers
5. Unit tests for both controller paths; CEL rule tests

### Phase 2: Static Mode Integration
Update ConfigMap generation to include direct backend entries -4. Implement `addHeadersFromSecret` resolution at ConfigMap generation time -5. Mount CA bundle ConfigMaps as volumes for `type: direct` endpoints with `caBundleRef` -6. Implement external TLS transport for `type: direct` backends -7. Extend vMCP audit middleware for `type: direct` (log remote URL, auth outcome, +1. Add `Type`, `CABundlePath`, `Headers` fields to `StaticBackendConfig`; + update vMCP binary and roundtrip test BEFORE operator starts writing them +2. Update VirtualMCPServer controller: `listMCPRemoteEndpointsAsMap()`, + `getExternalAuthConfigNameFromWorkload()`, ConfigMap generation +3. Implement header secret resolution at ConfigMap generation time +4. Mount CA bundle ConfigMaps as volumes +5. Implement post-exchange audience validation in token exchange strategy +6. Extend vMCP audit middleware for `type: direct` (remote URL, auth outcome, remote HTTP status) -8. Integration tests with envtest +7. Integration tests with envtest ### Phase 3: Dynamic Mode Integration 1. Add `WorkloadTypeMCPRemoteEndpoint` to `pkg/vmcp/workloads/discoverer.go` -2. Extend `BackendReconciler` to watch MCPRemoteEndpoint - - Update `fetchBackendResource()` with third resource type and document - resolution order -3. Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in - `pkg/vmcp/workloads/k8s.go` -4. Implement MCP-Protocol-Version header injection for direct mode HTTP client -5. Implement reconnection handling with backoff and session re-initialisation -6. Register watcher in the K8s manager -7. Integration tests for dynamic discovery +2. Extend `BackendReconciler` and `ListWorkloadsInGroup()` / `GetWorkloadAsVMCPBackend()` +3. Implement `MCP-Protocol-Version` header injection for direct mode HTTP client +4. Implement reconnection handling (stream resumption, full re-init on 404, + circuit breaker) +5. Implement persistent GET stream for server-initiated notifications +6. 
Integration tests for dynamic discovery ### Phase 4: Documentation and E2E -1. CRD reference documentation for MCPRemoteEndpoint +1. CRD reference documentation 2. Migration guide: MCPRemoteProxy → MCPRemoteEndpoint -3. User guide covering both modes with mode selection guide -4. Document single-replica constraint for `type: direct` with session state -5. E2E Chainsaw tests for full lifecycle (both modes) -6. E2E tests for mixed MCPServer + MCPRemoteEndpoint groups +3. User guide with mode selection guidance +4. Document single-replica constraint for `type: direct` +5. E2E Chainsaw tests for both modes and mixed groups ### Dependencies -- THV-0014 (K8s-Aware vMCP) — already merged; needed for dynamic mode (Phase 3) -- Broader CRD revamp / v1beta1 graduation work +- THV-0014 (K8s-Aware vMCP) — already merged; dynamic mode (Phase 3) is unblocked ## Testing Strategy ### Unit Tests -- Controller validation for both modes -- CEL validation rules: +- CEL rule coverage: - `type: direct` with `proxyConfig` → rejected - `type: proxy` without `proxyConfig` → rejected - `type: proxy` with `proxyConfig` but without `oidcConfig` → rejected - `spec.type` mutation → rejected (immutability guard) -- CRD type serialisation/deserialisation + - `type: direct` with `embeddedAuthServer` or `awsSts` auth → rejected - Backend conversion for both types -- External TLS transport creation with and without custom CA bundles -- Static config parsing for both backend types (including new `Type` field) -- MCP-Protocol-Version header injection -- Reconnection handling with session re-initialisation on HTTP 404 -- Name collision detection and logging in `fetchBackendResource()` +- `fetchBackendResource()` three-way resolution order +- Post-exchange audience validation +- Reconnection: backoff, HTTP 404 → full re-init, circuit breaker trip +- Server-initiated notification handling ### Integration Tests (envtest) -- MCPRemoteEndpoint controller reconciliation for `type: proxy` -- MCPRemoteEndpoint 
controller reconciliation for `type: direct` -- VirtualMCPServer ConfigMap generation with both types +- Controller reconciliation for both types +- ConfigMap generation with both types (including `StaticBackendConfig` Type field) - MCPGroup status update with MCPRemoteEndpoint members -- Dynamic mode: BackendReconciler handling MCPRemoteEndpoint - MCPRemoteProxy deprecation warning event emission +- `type: direct` with `embeddedAuthServer` rejected at admission ### End-to-End Tests (Chainsaw) -- Full lifecycle: `type: proxy` (functional parity with MCPRemoteProxy) -- Full lifecycle: `type: direct` with unauthenticated public remote -- Full lifecycle: `type: direct` with token exchange auth -- Mixed group: MCPServer + MCPRemoteEndpoint (both types) in same group -- MCPRemoteEndpoint deletion removes backend from vMCP +- Full lifecycle: `type: proxy` (parity with MCPRemoteProxy) +- Full lifecycle: `type: direct` unauthenticated +- Full lifecycle: `type: direct` with token exchange +- Mixed group: MCPServer + MCPRemoteEndpoint (both types) - CA bundle configuration for private remotes ## Open Questions -1. **Should `groupRef` be required on MCPRemoteEndpoint?** - **Resolved: Yes.** Consistent with the reasoning that an endpoint without - a group is unreachable. As a follow-up, consider making `groupRef` required - on MCPServer and MCPRemoteProxy too for consistency. - -2. **When should MCPRemoteProxy be removed?** - **Resolved:** After two minor releases post-MCPRemoteEndpoint GA. Track as - a separate issue once Phase 1 is merged. See the [Deprecation](#deprecation) - timeline table for details. - -3. **Should `toolConfigRef` be in the shared fields or mode-specific?** - **Resolved: Shared top-level field.** Tool filtering applies equally to - both modes and is already supported in `VirtualMCPServer.spec.aggregation.tools` - as a fallback. - -4. 
**Should there be a `disabled` field?** - **Resolved: Defer.** Since `groupRef` is required and immutable-in-practice - (it is a required string field — the API server rejects empty-string - updates), disabling an endpoint requires deleting the resource. A future - `disabled: true` field could be added as an additive change if +1. **`groupRef` required?** Resolved: Yes. Follow-up: consider requiring it on + MCPServer and MCPRemoteProxy too for consistency. + +2. **MCPRemoteProxy removal timing?** Resolved: After two minor releases + post-MCPRemoteEndpoint GA. + +3. **`toolConfigRef` placement?** Resolved: Shared top-level field. + +4. **`disabled` field?** Deferred. `groupRef` is required; disabling an endpoint + requires deletion. A `disabled: true` field can be added additively later if post-implementation feedback shows deletion is too disruptive. -5. **Should multi-replica vMCP be supported for `type: direct`?** - Recommendation: Start with single-replica constraint. Add shared session - store (Redis) as a follow-up if demand exists. See [Session Constraints](#session-constraints-in-direct-mode). +5. **Multi-replica for `type: direct`?** Resolved as out-of-scope. Single-replica + is the constraint. Redis session storage is a future follow-up requiring a new + `session.Storage` implementation. ## References -- THV-0055: MCPServerEntry CRD — superseded by this RFC (removed from repo; - see git history) +- THV-0055: MCPServerEntry CRD — superseded by this RFC (removed; see git history) - [THV-0008: Virtual MCP Server](./THV-0008-virtual-mcp-server.md) - [THV-0009: Remote MCP Server Proxy](./THV-0009-remote-mcp-proxy.md) - [THV-0010: MCPGroup CRD](./THV-0010-kubernetes-mcpgroup-crd.md) @@ -1141,6 +1014,7 @@ This is a **refactoring-only** phase — no new features, no API changes. 
## RFC Lifecycle | Date | Reviewer | Decision | Notes | -|------|----------|----------|-------| +|---|---|---|---| | 2026-03-18 | @ChrisJBurns, @jaosorior | Draft | Initial submission, supersedes THV-0055 | -| 2026-03-18 | Review agents | Revision | Address review feedback: CEL validation, type immutability, auth flows, session constraints, security hardening, implementation detail | +| 2026-03-18 | Review agents | Revision 1 | Addressed: CEL rules, type immutability, auth flows, session constraints, security hardening | +| 2026-03-18 | Review agents | Revision 2 | Fixed: CEL placement, auth flow accuracy, Redis claim, reconnection protocol, GET stream, embeddedAuthServer/awsSts restriction, MCPGroup RBAC markers, audience validation, broken anchors, CA bundle warning, emergency rotation, short name | From 9876b98eae6683da1a7bf4e85a1b85b761b5f016 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Wed, 18 Mar 2026 18:58:11 +0000 Subject: [PATCH 10/15] Simplify remoteURL: remove HTTPS must and allow-insecure annotation Remove the HTTPS enforcement requirement and the toolhive.stacklok.dev/allow-insecure annotation section. remoteURL is now simply "URL of the remote MCP server". Co-Authored-By: Claude Opus 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 41 ++----------------- 1 file changed, 4 insertions(+), 37 deletions(-) diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md index 271463d..902012f 100644 --- a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md +++ b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md @@ -228,8 +228,7 @@ The following types are **not valid** when `type: direct`: `headerInjection`, and `unauthenticated`. Using `awsSts` in direct mode will cause backend discovery to fail at runtime. -The controller MUST reject these combinations at admission time via a CEL rule -or webhook. 
+The controller MUST detect these combinations and set `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode`.
 
 ### Detailed Design
 
@@ -266,11 +265,6 @@ type MCPRemoteEndpointProxyConfig struct { ... }
 passes on object creation (when no previous state exists). Without it, the rule
 will panic or be silently skipped on create depending on Kubernetes version.
 
-**HTTPS enforcement is controller-side only.** The `remoteURL` pattern marker
-(`^https?://`) accepts both HTTP and HTTPS at admission time — consistent with
-the existing pattern on `MCPRemoteProxy`. HTTP URLs are rejected by the
-controller, which sets `ConfigurationValid=False` with reason `RemoteURLInvalid`
-and emits a Warning event. This does NOT produce an admission error.
-
 #### MCPRemoteEndpoint CRD
 
@@ -289,8 +283,6 @@ spec:
   type: direct
 
   # REQUIRED: URL of the remote MCP server.
-  # Must use HTTPS. HTTP accepted at admission but rejected by the controller
-  # unless toolhive.stacklok.dev/allow-insecure: "true" is set.
   # +kubebuilder:validation:Pattern=`^https?://`
   remoteURL: https://mcp.context7.com/mcp
 
@@ -438,18 +430,6 @@ spec:
 Short name: `mcpre` (consistent with `mcpg` for MCPGroup, `vmcp` for
 VirtualMCPServer, `extauth` for MCPExternalAuthConfig).
 
-#### The `toolhive.stacklok.dev/allow-insecure` Annotation
-
-Set `toolhive.stacklok.dev/allow-insecure: "true"` to bypass the HTTPS check
-for development or cluster-internal HTTP endpoints.
-
-- **Accepted value:** `"true"` exactly. Any other value (or absence) enforces HTTPS.
-- **Scope:** Per-resource.
-- **Audit trail:** The controller emits a `Warning` event with reason
-  `InsecureTransport` on every reconciliation while the annotation is present.
-- **Not in production:** HTTP traffic exposes credentials from `externalAuthConfigRef`
-  to interception. This annotation MUST NOT be used in production environments.
-
 #### Spec Fields
 
 **Top-level (both modes):**
@@ -457,7 +437,7 @@ for development or cluster-internal HTTP endpoints.
| Field | Type | Required | Description | |---|---|---|---| | `type` | enum | Yes | `proxy` or `direct`. Default: `proxy`. **Immutable after creation.** | -| `remoteURL` | string | Yes | URL of the remote MCP server. HTTP accepted at admission; rejected by controller unless `allow-insecure` annotation is set. | +| `remoteURL` | string | Yes | URL of the remote MCP server. | | `transport` | enum | Yes | `streamable-http` (recommended) or `sse` (legacy 2024-11-05 transport). | | `groupRef` | string | Yes | Name of the MCPGroup. | | `externalAuthConfigRef` | object | No | Outgoing auth config. In proxy mode: read by both vMCP (vMCP→proxy) and the proxy pod (proxy→remote). In direct mode: read by vMCP only (vMCP→remote). Types `embeddedAuthServer` and `awsSts` are invalid in direct mode. | @@ -534,8 +514,7 @@ pass unchanged. This is scoped as Phase 0 step 4. 3. If `externalAuthConfigRef` type is `embeddedAuthServer` or `awsSts`, sets `ConfigurationValid=False` with reason `UnsupportedAuthTypeForDirectMode` 4. If `caBundleRef` set, validates ConfigMap exists; sets `CABundleValid` -5. Validates HTTPS requirement; checks `allow-insecure` annotation -6. Sets `Ready=True` and `status.url = spec.remoteURL` +5. Sets `Ready=True` and `status.url = spec.remoteURL` No finalizers for `type: direct`. `type: proxy` uses the same finalizer pattern as the existing MCPRemoteProxy controller. @@ -734,19 +713,7 @@ out of scope for this RFC. RBAC alone is insufficient when threat actors include compromised workloads with CRD write access. The following are required: -1. **NetworkPolicy (REQUIRED):** Default egress rules for the vMCP pod MUST - block: - - `169.254.169.254/32` (AWS/GCP/Azure IMDS) - - `fd00:ec2::254/128` (AWS IMDSv2 IPv6) - - The Kubernetes API server (`kubernetes.default.svc` and its cluster IP) - - These specific targets are high-value for credential theft and have no - legitimate use as MCP backends. 
RFC 1918 ranges (`10.x`, `172.16-31.x`,
-   `192.168.x`) are deliberately NOT blocked because `type: direct` legitimately
-   targets internal/corporate MCP servers. RBAC is the appropriate control for
-   general RFC 1918 access.
-
-2. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
+1. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
    accounts should have `create`/`update` permissions on MCPRemoteEndpoint.
 
 ### Credential Blast Radius in Direct Mode

From 1752a392a05f82c4c161680f8ca4600cc2c5368c Mon Sep 17 00:00:00 2001
From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Date: Wed, 18 Mar 2026 19:03:40 +0000
Subject: [PATCH 11/15] cleans up rfc

Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
---
 ...premoteendpoint-unified-remote-backends.md | 35 +------------------
 1 file changed, 1 insertion(+), 34 deletions(-)

diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md
index 902012f..34919c8 100644
--- a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md
+++ b/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md
@@ -710,8 +710,7 @@ out of scope for this RFC.
 
 ### SSRF Mitigation
 
-RBAC alone is insufficient when threat actors include compromised workloads with
-CRD write access. The following are required:
+Threat actors include compromised workloads with CRD write access, so the following control is required:
 
 1. **RBAC (REQUIRED):** Only cluster administrators or trusted platform service
    accounts should have `create`/`update` permissions on MCPRemoteEndpoint.
@@ -911,38 +910,6 @@ can manage backends) and couples backend lifecycle to vMCP reconciliation.
- THV-0014 (K8s-Aware vMCP) — already merged; dynamic mode (Phase 3) is unblocked -## Testing Strategy - -### Unit Tests - -- CEL rule coverage: - - `type: direct` with `proxyConfig` → rejected - - `type: proxy` without `proxyConfig` → rejected - - `type: proxy` with `proxyConfig` but without `oidcConfig` → rejected - - `spec.type` mutation → rejected (immutability guard) - - `type: direct` with `embeddedAuthServer` or `awsSts` auth → rejected -- Backend conversion for both types -- `fetchBackendResource()` three-way resolution order -- Post-exchange audience validation -- Reconnection: backoff, HTTP 404 → full re-init, circuit breaker trip -- Server-initiated notification handling - -### Integration Tests (envtest) - -- Controller reconciliation for both types -- ConfigMap generation with both types (including `StaticBackendConfig` Type field) -- MCPGroup status update with MCPRemoteEndpoint members -- MCPRemoteProxy deprecation warning event emission -- `type: direct` with `embeddedAuthServer` rejected at admission - -### End-to-End Tests (Chainsaw) - -- Full lifecycle: `type: proxy` (parity with MCPRemoteProxy) -- Full lifecycle: `type: direct` unauthenticated -- Full lifecycle: `type: direct` with token exchange -- Mixed group: MCPServer + MCPRemoteEndpoint (both types) -- CA bundle configuration for private remotes - ## Open Questions 1. **`groupRef` required?** Resolved: Yes. 
Follow-up: consider requiring it on From 601967abd69ee4f759be6993029b243e5910f059 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Thu, 19 Mar 2026 16:06:45 +0000 Subject: [PATCH 12/15] rename rfc file Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> --- ...s.md => THV-0057-mcpremoteendpoint-unified-remote-backends.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{RFC-0057-mcpremoteendpoint-unified-remote-backends.md => THV-0057-mcpremoteendpoint-unified-remote-backends.md} (100%) diff --git a/rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0057-mcpremoteendpoint-unified-remote-backends.md similarity index 100% rename from rfcs/RFC-0057-mcpremoteendpoint-unified-remote-backends.md rename to rfcs/THV-0057-mcpremoteendpoint-unified-remote-backends.md From c7afeabb31c3c1d4f5709913e7918a27b527f261 Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Thu, 19 Mar 2026 16:52:06 +0000 Subject: [PATCH 13/15] rename RFC file to match PR number (THV-0057 -> THV-0055) Co-Authored-By: Claude Opus 4.6 (1M context) --- ...s.md => THV-0055-mcpremoteendpoint-unified-remote-backends.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{THV-0057-mcpremoteendpoint-unified-remote-backends.md => THV-0055-mcpremoteendpoint-unified-remote-backends.md} (100%) diff --git a/rfcs/THV-0057-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md similarity index 100% rename from rfcs/THV-0057-mcpremoteendpoint-unified-remote-backends.md rename to rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md From 8a53d4673b514abeb1198f8eff801e6bf99f3cfe Mon Sep 17 00:00:00 2001 From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Date: Fri, 20 Mar 2026 15:31:38 +0000 Subject: [PATCH 14/15] Address Jhrozek review feedback on RFC-0055 - Generalize mode selection guide 
to cover all auth types, not just token exchange - Reword rule of thumb to use "fronted by vMCP" vs "standalone" framing - Replace ASCII auth flow diagrams with mermaid sequence diagrams - Clarify dual-consumer auth behavior per mode - Remove implementation-level token lifetime detail - Add groupRef rationale inline comment - Replace name collision fallback with admission-time rejection - Replace CEL oidcConfig rule with standard kubebuilder Required marker Co-Authored-By: Claude Opus 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 81 ++++++++++++------- 1 file changed, 54 insertions(+), 27 deletions(-) diff --git a/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md index 34919c8..8a78631 100644 --- a/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md +++ b/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md @@ -96,16 +96,17 @@ should be a configuration choice within it. | Scenario | Recommended Mode | Why | |---|---|---| | Public, unauthenticated remote (e.g., context7) | `direct` | No auth middleware needed; no pod required | -| Remote requiring only token exchange auth | `direct` | vMCP handles token exchange; one fewer hop | +| Remote with outgoing auth handled by vMCP (token exchange, header injection, etc.) 
| `direct` | vMCP applies outgoing auth directly; one fewer hop | | Remote requiring its own OIDC validation boundary | `proxy` | Proxy pod validates tokens independently | | Remote requiring Cedar authz policies per-endpoint | `proxy` | Authz policies run in the proxy pod | | Remote needing audit logging at the endpoint level | `proxy` | Proxy pod has its own audit middleware | | Standalone use without VirtualMCPServer | `proxy` | Direct mode requires vMCP to function | | Many remotes where pod-per-remote is too costly | `direct` | No Deployment/Service/Pod per remote | -**Rule of thumb:** Use `direct` for simple, public, or token-exchange-only -remotes. Use `proxy` when you need an independent auth/authz/audit boundary -per remote, or when the backend needs to be accessible standalone. +**Rule of thumb:** Use `direct` for simple, public remotes or any remote +fronted by vMCP where vMCP handles outgoing auth. Use `proxy` when you need an +independent auth/authz/audit boundary per remote, or when the backend needs to +be accessible standalone. 
## Proposed Solution @@ -173,12 +174,21 @@ graph TB **`type: proxy` — two independent auth legs:** -``` -Client --[aud=vmcp token]--> vMCP [validates token at incoming boundary] - --[externalAuthConfigRef credential]--> Proxy Pod - [proxy pod oidcConfig validates the incoming request] - [proxy pod applies externalAuthConfigRef as outgoing middleware] - --> Remote Server +```mermaid +sequenceDiagram + participant C as Client + participant V as vMCP + participant P as Proxy Pod + participant R as Remote Server + + C->>V: Request (aud=vmcp token) + V->>V: Validate incoming token + V->>P: Forward (externalAuthConfigRef credential) + P->>P: oidcConfig validates incoming request + P->>R: Forward (externalAuthConfigRef as outgoing middleware) + R-->>P: Response + P-->>V: Response + V-->>C: Response ``` `externalAuthConfigRef` on a `type: proxy` endpoint is read by two separate @@ -191,16 +201,26 @@ consumers: (`AddExternalAuthConfigOptions()` in `mcpremoteproxy_runconfig.go`). The pod applies it as outgoing middleware when forwarding requests **to the remote server**. +In direct mode, only consumer 1 applies — there is no proxy pod. + `proxyConfig.oidcConfig` is a third, separate concern — it validates tokens arriving at the proxy pod from vMCP. It is entirely independent of `externalAuthConfigRef`. **`type: direct` — single auth boundary:** -``` -Client --[aud=vmcp token]--> vMCP [validates token at incoming boundary] - [vMCP applies externalAuthConfigRef as outgoing auth] - --> Remote Server +```mermaid +sequenceDiagram + participant C as Client + participant V as vMCP + participant R as Remote Server + + C->>V: Request (aud=vmcp token) + V->>V: Validate incoming token + V->>V: Apply externalAuthConfigRef as outgoing auth + V->>R: Request (with outgoing credentials) + R-->>V: Response + V-->>C: Response ``` vMCP reads `externalAuthConfigRef` and applies it when calling the remote @@ -213,9 +233,6 @@ client's token. 
- The STS must be configured to accept subject tokens from vMCP's IdP. - Configure `audience` in the `MCPExternalAuthConfig` to match the remote server's expected audience claim. -- Client token lifetime should exceed the expected duration of the exchange - request. Exchanged tokens are managed by the `golang.org/x/oauth2` token - source and refreshed automatically on expiry per connection. **Unsupported `externalAuthConfigRef` types for `type: direct`:** @@ -255,10 +272,12 @@ The four rules for `MCPRemoteEndpoint`, placed on their correct owning types: //nolint:lll type MCPRemoteEndpointSpec struct { ... } -// MCPRemoteEndpointProxyConfig struct-level rule: -// -// +kubebuilder:validation:XValidation:rule="has(self.oidcConfig)",message="spec.proxyConfig.oidcConfig is required" -type MCPRemoteEndpointProxyConfig struct { ... } +// MCPRemoteEndpointProxyConfig — oidcConfig uses standard required marker: +type MCPRemoteEndpointProxyConfig struct { + // +kubebuilder:validation:Required + OIDCConfig OIDCConfigRef `json:"oidcConfig"` + // ... +} ``` **Important:** The `oldSelf == null` guard is required so the immutability rule @@ -292,7 +311,8 @@ spec: # +kubebuilder:validation:Enum=streamable-http;sse transport: streamable-http - # REQUIRED: Group membership. + # REQUIRED: Group membership. MCPRemoteEndpoint only functions as part of + # an MCPGroup (aggregated by VirtualMCPServer), so groupRef is always required. groupRef: engineering-team # OPTIONAL: Auth for outgoing requests to the remote server. @@ -589,11 +609,18 @@ Extend `ListWorkloadsInGroup()` and `GetWorkloadAsVMCPBackend()` in - `type: proxy` — uses `status.url` (proxy Service URL), same as MCPRemoteProxy - `type: direct` — uses `spec.remoteURL` directly -**Name collision handling:** `fetchBackendResource()` in -`pkg/vmcp/k8s/backend_reconciler.go` tries resources in order: MCPServer → -MCPRemoteProxy → MCPRemoteEndpoint. 
Same-name resources across types in the
-same namespace always resolve to the first match. Log a warning when a
-collision is detected.
+**Name collision prevention:** The MCPRemoteEndpoint controller MUST refuse to
+reconcile an endpoint whose name collides with an existing MCPServer or
+MCPRemoteProxy in the same namespace, setting `ConfigurationValid=False` with
+reason `NameCollision`. Likewise, the MCPServer and MCPRemoteProxy controllers
+MUST be updated to reject collisions with MCPRemoteEndpoint. This prevents
+surprising fallback behaviour where deleting one resource type silently
+activates a different resource with the same name.
+
+`fetchBackendResource()` in `pkg/vmcp/k8s/backend_reconciler.go` retains its
+existing resolution order (MCPServer → MCPRemoteProxy → MCPRemoteEndpoint) as
+a defensive fallback, but the controller-level rejection above makes same-name
+collisions a surfaced user error rather than an implicit resolution policy.
 
 ##### vMCP: HTTP Client for Direct Mode

From dcc74bad33ea988b20f4fd2aa5503e6622bb96f6 Mon Sep 17 00:00:00 2001
From: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Date: Fri, 20 Mar 2026 15:58:58 +0000
Subject: [PATCH 15/15] Address header secret handling in static mode for type: direct

Use SecretKeyRef env vars on the vMCP Deployment instead of inlining
secret values into the backend ConfigMap. Mirrors the existing pattern
from MCPRemoteProxy. ConfigMap stores only env var names, vMCP resolves
values at runtime via secrets.EnvironmentProvider.
Co-Authored-By: Claude Opus 4.6 (1M context) --- ...premoteendpoint-unified-remote-backends.md | 30 +++++++++++++++---- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md b/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md index 8a78631..4d03186 100644 --- a/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md +++ b/rfcs/THV-0055-mcpremoteendpoint-unified-remote-backends.md @@ -580,7 +580,7 @@ strict YAML parsing. Writing new fields (`Type`, `CABundlePath`, `Headers`) to the ConfigMap before updating the vMCP binary will cause a startup failure. Implementation order: -1. Add `Type`, `CABundlePath`, and `Headers` fields to `StaticBackendConfig` +1. Add `Type`, `CABundlePath`, and `HeaderEnvVars` fields to `StaticBackendConfig` 2. Update the vMCP binary and the roundtrip test in `pkg/vmcp/config/crd_cli_roundtrip_test.go` 3. Deploy the updated vMCP image **before** the operator starts writing these @@ -589,11 +589,27 @@ Implementation order: Additional touch points: - `listMCPRemoteEndpointsAsMap()` — new function for ConfigMap generation - `getExternalAuthConfigNameFromWorkload()` — add MCPRemoteEndpoint case -- `headerForward.addHeadersFromSecret` requires K8s Secret resolution at - ConfigMap generation time (static mode has no K8s API access at runtime) -- `vmcp.Backend` needs a `Headers` field for resolved header values - Deployment volume mount logic for `caBundleRef` ConfigMaps +**Header secret handling in static mode:** Secret values MUST NOT be inlined +into the backend ConfigMap. Instead, the operator uses the same `SecretKeyRef` +pattern that MCPRemoteProxy already uses: + +1. For each `type: direct` endpoint with `addHeadersFromSecret` entries, the + operator adds `SecretKeyRef` environment variables to the **vMCP Deployment** + (e.g. `TOOLHIVE_SECRET_HEADER_FORWARD_X_API_KEY_`). +2. 
The static backend ConfigMap stores only the env var names — never the + secret values themselves. +3. At runtime, vMCP resolves header values via the existing + `secrets.EnvironmentProvider`, identical to how MCPRemoteProxy pods handle + this today. + +This ensures no key material is written to ConfigMaps or stored in etcd in +plaintext. The trade-off is that adding or removing `addHeadersFromSecret` +entries on a direct endpoint triggers a vMCP Deployment update (and therefore +a pod restart), consistent with how CA bundle changes already behave in static +mode. + **CA bundle in static mode:** The operator mounts the `caBundleRef` ConfigMap as a volume into the vMCP pod at `/etc/toolhive/ca-bundles//ca.crt`. The generated backend ConfigMap includes the mount path so vMCP can construct @@ -904,11 +920,13 @@ can manage backends) and couples backend lifecycle to vMCP reconciliation. ### Phase 2: Static Mode Integration -1. Add `Type`, `CABundlePath`, `Headers` fields to `StaticBackendConfig`; +1. Add `Type`, `CABundlePath`, `HeaderEnvVars` fields to `StaticBackendConfig`; update vMCP binary and roundtrip test BEFORE operator starts writing them 2. Update VirtualMCPServer controller: `listMCPRemoteEndpointsAsMap()`, `getExternalAuthConfigNameFromWorkload()`, ConfigMap generation -3. Implement header secret resolution at ConfigMap generation time +3. Add `SecretKeyRef` env vars to vMCP Deployment for `addHeadersFromSecret` + entries on `type: direct` endpoints; store env var names (not values) in + backend ConfigMap 4. Mount CA bundle ConfigMaps as volumes 5. Implement post-exchange audience validation in token exchange strategy 6. Extend vMCP audit middleware for `type: direct` (remote URL, auth outcome,