Cannot run a Snowflake MCP server behind ToolHive's MCPRemoteProxy #5150

@jhrozek

Description

User Story

As a ToolHive operator,
I want to put MCPRemoteProxy (or VirtualMCPServer) in front of
a Snowflake MCP server (Snowflake-Labs/mcp self-hosted, or the
Snowflake-managed <account>/api/v2/.../mcp-servers/... variant) using
OAuth against Snowflake as the upstream IdP,
so that end users can authenticate with their Snowflake account
and the proxy issues session JWTs that carry the Snowflake login_name
in audit logs.

Context

MCPExternalAuthConfig's OAuth2 upstream type marks userInfo as
required. Snowflake does not expose a userinfo endpoint that ToolHive
can call — its REST API has only /api/v2/users/{name} (which needs
the username up front) and /api/v2/users (list, not "current user").
There is no OIDC discovery for custom OAuth clients either, so the
existing OIDC upstream type doesn't fit.

kubectl apply of a sensible-looking config is rejected at admission:

spec.embeddedAuthServer.upstreamProviders[0].oauth2Config.userInfo: Required value

Forcing the issue with a fake userInfo URL produces a runtime failure
during the OAuth callback. PR #5094 added a synthesis fallback
(tk-<hash> derived from the access token) for OAuth2 upstreams without
userInfo — that unblocks the flow but loses real identity entirely:
audit logs become useless for human correlation, rate-limit buckets
keyed on subject reset on every re-auth, and there's no way back to a
Snowflake login from a tk-… hash.

What Snowflake DOES return on its token endpoint (verified against a
real trial account, documented at
https://docs.snowflake.com/en/user-guide/oauth-custom):

{
  "access_token":             "<opaque ~700-byte Snowflake-proprietary blob>",
  "token_type":               "Bearer",
  "expires_in":               599,
  "refresh_token":            "<opaque>",
  "refresh_token_expires_in": 86399,
  "scope":                    "refresh_token session:role:<ROLE>",
  "username":                 "JHROZEK",
  "user_first_name":          "Jakub",
  "user_last_name":           "Hrozek",
  "idpInitiated":             false
}

The access token is opaque (not a JWT) so we can't decode claims out of
it. But the response envelope itself carries the user identity.
Snowflake's docs note username is omitted on refresh-token grant
responses, so identity has to be captured at auth-code time.

This story closes the gap by adding an IdentityFromToken block on
OAuth2UpstreamConfig. The block extracts identity (subject / name /
email) from the OAuth2 token-endpoint response body using gjson
dot-notation paths, and sits alongside the existing userInfo (HTTP) and
PR #5094 (synthesis) paths. Slack v2 (whose oauth.v2.access response
nests user identity under authed_user.id) benefits from the same
mechanism.
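
To make the extraction step concrete, here is a minimal sketch of resolving a dot-notation path against a token-endpoint response body. It is not the proposed implementation: it uses only the Go standard library as a stand-in for gjson, handles nested object keys only (no arrays or gjson modifiers), and the function name lookupPath plus the Slack sample value are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookupPath walks a dot-separated path (e.g. "authed_user.id") through a
// JSON object and returns the string value found there.
// It reports false if the body is not valid JSON, a key is missing, or the
// value at the path is not a non-empty string.
// Stdlib stand-in for a gjson lookup; nested object keys only.
func lookupPath(body []byte, path string) (string, bool) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", false
	}
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		cur, ok = obj[key]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok && s != ""
}

func main() {
	// Abbreviated form of the Snowflake response shown earlier in this issue.
	snowflake := []byte(`{"token_type":"Bearer","expires_in":599,"username":"JHROZEK"}`)
	// Illustrative Slack-style body; the id value is made up.
	slack := []byte(`{"authed_user":{"id":"U123"}}`)

	fmt.Println(lookupPath(snowflake, "username"))
	fmt.Println(lookupPath(slack, "authed_user.id"))
}
```

A flat path such as username covers the Snowflake case, while authed_user.id covers the nested Slack case, which is why a path-based lookup is preferable to a fixed field name.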

The fix is identity-resolution work and is independent of which
Snowflake MCP downstream is in use: the same identityFromToken
configuration drives the OAuth flow against Snowflake regardless of
whether the proxy forwards to a self-hosted Snowflake-Labs/mcp or
to the Snowflake-managed <account>/api/v2/.../mcp-servers/...
variant. Choice of downstream is just MCPRemoteProxy.spec.remoteUrl
and transport; this story does not constrain that.

Dependencies: PR #5094 (already merged) — provides the synthesis
fallback that this story integrates with as the lowest-priority path
in the identity-resolution chain.

Acceptance Criteria

Capability-level outcomes a ToolHive operator (or e2e check) can
observe once every sub-issue under this story has merged:

  • An operator can configure an OAuth2 upstream in
    MCPExternalAuthConfig with an identityFromToken block (and no
    userInfo block) and the manifest is admitted.
  • At runtime, the embedded auth server resolves user identity from
    the token-endpoint response body using configured gjson paths and
    does NOT call any userinfo endpoint.
  • The IdentitySynthesized advisory condition stays False (with
    reason AllUpstreamsHaveIdentityResolution) for configs that use
    identityFromToken — the operator status surface accurately reflects
    that the upstream is NOT in synthesis mode.
  • Audit logs surface the upstream user identity in
    subjects.user (e.g. the Snowflake login_name JHROZEK) for
    configurations that point namePath at the response field carrying
    the human identifier.
  • Refresh-token flow does not silently lose identity: the JWT
    name claim is unchanged across a refresh boundary even though
    Snowflake omits username on refresh responses.
  • An MCP client session can complete the full OAuth flow end-to-end
    through an ngrok-exposed MCPRemoteProxy against a real Snowflake
    trial account (see reproducer runbook attached as a child issue).
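
The refresh-token criterion above can be sketched as follows: identity is captured once from the auth-code response and never overwritten by a refresh response. This is a minimal illustration only; the tokenResponse and session types and both function names are hypothetical and do not reflect ToolHive's actual session storage.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenResponse holds the two token-endpoint fields this sketch cares about.
type tokenResponse struct {
	AccessToken string
	Username    string // omitted by Snowflake on refresh-token grants
}

// session keeps the identity captured at auth-code time alongside the
// current access token.
type session struct {
	name        string
	accessToken string
}

// startSession captures identity from the auth-code response, the only
// response guaranteed to carry it.
func startSession(resp tokenResponse) (*session, error) {
	if resp.Username == "" {
		return nil, errors.New("auth-code response carried no username")
	}
	return &session{name: resp.Username, accessToken: resp.AccessToken}, nil
}

// refresh rotates the access token and deliberately leaves the stored
// identity untouched, whether or not the response includes a username.
func (s *session) refresh(resp tokenResponse) {
	s.accessToken = resp.AccessToken
}

func main() {
	s, err := startSession(tokenResponse{AccessToken: "first", Username: "JHROZEK"})
	if err != nil {
		panic(err)
	}
	// Refresh response without a username, as Snowflake's docs describe.
	s.refresh(tokenResponse{AccessToken: "second"})
	fmt.Println(s.name, s.accessToken)
}
```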

Deferred (not handled by this story)

  • A separate accessTokenClaims extraction mode for providers that
    issue JWT access tokens with identity claims (Glean is the motivating
    case). This involves a different mechanism (JWT decode + JWKS-based
    signature verification) and a separate trust model, and the
    audience-confusion question (a JWT issued for the upstream resource
    server isn't, in general, intended as identity for ToolHive) needs
    its own design pass. Tracked separately.

Metadata

    Labels

    auth, enhancement, go, oauth