Fix Cedar upstream-claim evaluation on VirtualMCPServer#5002
Conversation
VirtualMCPServer's operator converter never set AuthzConfig.PrimaryUpstreamProvider, so Cedar policies that referenced upstream claims (e.g. principal.claim_department) failed at runtime. Cedar evaluated against the ToolHive-issued AS token rather than the upstream IDP token, and the claim was missing. Derive the field from Spec.AuthServerConfig.UpstreamProviders[0].Name in convertIncomingAuth when an embedded auth server with upstream providers is configured. Mirrors injectSubjectProviderIfNeeded in virtualmcpserver_controller.go (outgoing auth) and injectUpstreamProviderIfNeeded in pkg/runner/middleware.go (thv run path). Leaves the field empty when no embedded AS or no upstreams so Cedar correctly falls back to ToolHive-issued claims in those modes. Fixes #4997
When IncomingAuth.AuthzConfig is set but no upstream IDP is configured, Cedar silently evaluates policies against the ToolHive-issued AS token. That token's claim namespace (sub, aud, tsid) can overlap upstream claims and authorize against the wrong identity, so the misconfig must be surfaced rather than deployed. Add validateAuthzUpstreamAvailable to the VirtualMCPServer reconciler chain. When AuthzConfig is set but AuthServerConfig is nil or UpstreamProviders is empty, mark the server Failed and set AuthServerConfigValidated=False with reason AuthzRequiresUpstream. The user-facing message points at spec.authServerConfig.upstreamProviders, which is where the fix belongs. Extract runAuthValidations from runValidations so the auth-related checks live together and gocyclo stays happy. No behavior change in the moved block. Belt-and-suspenders companion to the converter fix in the previous commit: the converter wires the provider name when upstreams exist; this validator makes the absence of upstreams an explicit failure. Refs #4997
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5002 +/- ##
==========================================
+ Coverage 69.02% 69.05% +0.02%
==========================================
Files 554 554
Lines 73083 73160 +77
==========================================
+ Hits 50445 50520 +75
+ Misses 19631 19628 -3
- Partials 3007 3012 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Behavioral regression guard for the converter fix in the preceding commit. With the same identity, upstream token, and Cedar policy, flip PrimaryUpstreamProvider between "okta" and "" and observe the outcome change from permit to deny through the real middleware stack. This pins the runtime contract below the operator-layer fix: when the converter populates PrimaryUpstreamProvider, Cedar resolves principal.claim_* from the upstream IDP token; when it is empty, Cedar falls back to the AS-issued token's claims (which do not carry upstream profile attributes). Uses has-attribute rather than equality so a failure uniquely signals "the claim source Cedar read from lacked this attribute" — the exact #4997 regression shape. Refs #4997
tgrunnagle
left a comment
There was a problem hiding this comment.
Also trying the /dev:pr-review skill on this, so there will be some AI-generated comments
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: security-reviewer, k8s-operator-reviewer, test-coverage-reviewer, general-quality-reviewer
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| F1 | ConditionTypeAuthServerConfigValidated reused for AuthzConfig misconfiguration | 10/10 | MEDIUM | Fix |
| F2 | First-upstream-wins: no signal in multi-upstream deployments | 10/10 | MEDIUM | Discuss |
| F3 | fmt.Errorf("%s", msg) → errors.New |
10/10 | LOW | Fix |
| F4 | Unreachable guard branch — ordering dependency undocumented | 7/10 | LOW | Document |
| F5 | No log emitted on AuthzRequiresUpstream failure | 7/10 | LOW | Fix |
| F6 | No positive condition after successful validation (undocumented) | 7/10 | INFO | Document |
| F7 | runValidations doc comment stale after refactor |
10/10 | LOW | Fix |
| F9 | UpdateStatus return value discarded in test |
7/10 | MEDIUM | Fix |
| F10 | Integration test 403 vs 500 Cedar diagnostic ambiguity | 7/10 | LOW | Clarify |
| F11 | No test exercises both validators in sequence via runAuthValidations |
10/10 | LOW | Fix |
| F12 | No double-nil (authServerConfig=nil + authzConfig=nil) converter test | 7/10 | LOW | Fix |
| F13 | No end-to-end test for runAuthValidations through runValidations | 7/10 | LOW | Fix |
| F14 | context.Background() → t.Context() in new tests |
5/10 | LOW | Fix |
Overall
The core fix is correct and well-motivated: populating PrimaryUpstreamProvider in convertIncomingAuth mirrors the existing injectUpstreamProviderIfNeeded pattern on the thv run path precisely. The converter change is minimal and the validator adds genuine security value by surfacing a previously silent misconfiguration. Testing is thorough at the unit level with comprehensive table-driven cases.
The two MEDIUM findings are not blocking but worth addressing. F1 (condition type semantic mismatch) is the most actionable — ConditionTypeAuthServerConfigValidated=False fires when AuthServerConfig is absent, which points users at the wrong field. Since ConditionReasonAuthzRequiresUpstream is already a new constant in this PR, introducing a corresponding ConditionTypeAuthzConfigValidated would be a clean addition. F2 (first-upstream-wins with no signal) is acknowledged in the special notes as accepted debt with a follow-up RFC planned. The test suite has a structural gap: all new tests call private methods directly, so if runAuthValidations were accidentally dropped from runValidations, no test would catch it (F11/F13). Inline comments are posted only for MEDIUM findings; see the table above for LOW-level items.
Documentation
No documentation files are changed by this PR. The new ConditionReasonAuthzRequiresUpstream failure mode should be noted in operator troubleshooting docs and release notes — existing VirtualMCPServers with authzConfig set but no upstream IDP will transition to Failed phase on their first reconcile after the operator upgrade.
Generated with Claude Code
Key change: narrow validateAuthzUpstreamAvailable to only reject when
an embedded auth server is configured but has no upstream providers.
Previously the validator rejected any authz configuration without an
embedded auth server, which false-positived direct-IdP deployments
where the client presents an Okta/Entra/etc token directly and Cedar
evaluates against identity.Claims via the default branch. Thanks to
Trey for catching this regression during review.
Additional review feedback addressed:
- Advisory status condition AuthzUpstreamSelectionWarning surfaces
which upstream was auto-selected when multiple are configured (F2).
Condition is set only on the applicable path; stale conditions are
removed via RemoveConditionsWithPrefix on non-applicable paths so
normal VMCPs do not carry a False/NotApplicable advisory row.
- Use errors.New instead of fmt.Errorf("%s", msg) (F3).
- Emit a WARN-equivalent log (logr.Info at V=0, matching the file
idiom) when rejecting so operators have a grep-able signal (F5).
- Update runValidations doc comment after the earlier runAuthValidations
extraction (F7).
- Assert UpdateStatus bool return in error-path test subtests (F9).
- Add double-nil (AuthServerConfig nil and AuthzConfig nil) converter
subtest (F12).
- Switch new tests to t.Context() (F14).
Refs #4997
Summary
principal.claim_departmentfail at runtime withdoes not have the attribute claim_department. The operator's converter never populatedAuthzConfig.PrimaryUpstreamProvider, so Cedar evaluated against the ToolHive-issued AS token instead of the upstream IDP token. The same setup works on thethv runpath, which already has the wiring.PrimaryUpstreamProviderfromSpec.AuthServerConfig.UpstreamProviders[0].NameinconvertIncomingAuth, mirroringinjectSubjectProviderIfNeeded(outgoing auth) andinjectUpstreamProviderIfNeeded(thv run path). Leaves the field empty when no embedded AS or no upstreams are configured so Cedar correctly falls back to ToolHive-issued claims in those modes.validateAuthzUpstreamAvailableto the reconciler chain. WhenAuthzConfigis set but no upstream IDP is configured, the VirtualMCPServer goes toFailedwithAuthServerConfigValidated=False, Reason=AuthzRequiresUpstream. The user-facing message points atspec.authServerConfig.upstreamProviders.PrimaryUpstreamProviderbetween"okta"and""with the same identity and policy, asserting the expected permit/deny outcome through the middleware stack. Pins the contract the operator fix relies on.Fixes #4997
Type of change
Test plan
task test)TestIntegrationPrimaryUpstreamProviderClaimAttributeAccessinpkg/authz/integration_test.gotask lint-fix)Changes
cmd/thv-operator/pkg/vmcpconfig/converter.goAuthzConfig.PrimaryUpstreamProviderfrom the first upstream provider when an embedded auth server is configuredcmd/thv-operator/pkg/vmcpconfig/converter_test.gocmd/thv-operator/api/v1beta1/virtualmcpserver_types.goConditionReasonAuthzRequiresUpstreamcmd/thv-operator/controllers/virtualmcpserver_controller.gorunAuthValidationsand addvalidateAuthzUpstreamAvailablecmd/thv-operator/controllers/virtualmcpserver_controller_test.goanonymous + authzConfig + no upstreamscenario that motivated the safety checkpkg/authz/integration_test.goPrimaryUpstreamProviderwith identical identity and policyDoes this introduce a user-facing change?
Yes. VirtualMCPServer deployments that combine an embedded auth server, upstream OIDC providers, and Cedar
authzConfigpolicies referencing upstream claims now work correctly. Previously, those policies failed withdoes not have the attribute claim_*. Separately, VirtualMCPServers that declareauthzConfigwithout any upstream provider are now rejected with aFailedphase rather than silently authorizing against the ToolHive-issued token.Verified
Manual end-to-end verification against a real Okta tenant on a kind cluster.
VirtualMCPServer shape used:
The Cedar policy deliberately references
principal.claim_groups— a claim that lives on the Okta access token but not on the ToolHive-issued AS token. Without the fix, Cedar would read from the AS token (nogroups) and deny. With the fix, Cedar reads from the upstream Okta token (groups: [Everyone, engineering]) and permits.Evidence from the pod logs after completing an OAuth flow through Okta and issuing
tools/list:source=upstream(frompkg/authz/authorizers/cedar/core.go:452) proves the runtime took the upstream branch atcore.go:421— the exact code path enabled by Commit 1. The claim key set matches Okta's access token shape (uid,cid,aws/.../Groups), distinct from the AS-issued token shape which would includetsidandclient_id. Pre-fix this log would have readsource=tokenand Cedar would have evaluated against the AS-issued token.PR 2 validator observed by applying a VirtualMCPServer with
authzConfigset but no upstream providers:Implementation plan
Approved implementation plan
Scope B + D from the planning discussion:
AuthzConfig.PrimaryUpstreamProviderfromSpec.AuthServerConfig.UpstreamProviders[0].Name. Mirrors the existinginjectUpstreamProviderIfNeededhelper on thethv runpath. Leaves the field empty when no embedded AS or no upstreams, so Cedar falls back to ToolHive claims in those modes.AuthzConfigis set but no upstream is available, surfacingConfigurationValid=False, Reason=AuthzRequiresUpstream. Protects against silent authorization-on-wrong-identity when the AS and upstream claim namespaces overlap (sub,aud,tsid).toolhive-rfcsproposing explicitauthn: truediscriminator or per-upstream claim mappers to resolve the modeling gap the oauth-expert flagged (UpstreamProviders[]conflates authN IdP with resource-access OAuth upstreams). Not part of this fix.Selection for this PR is first-upstream-wins, consistent with all four existing call sites (
pkg/runner/middleware.go:391,pkg/runner/middleware.go:355,pkg/vmcp/config/defaults.go:110,cmd/thv-operator/controllers/virtualmcpserver_controller.go:2099). The broader modeling question is tracked for PR 4.Special notes for reviewers
toolhive-rfcswill propose an explicit discriminator; this PR is not the right place to introduce that CRD surface.AuthServerConfigValidatedcondition type rather than introducing a newAuthzConfigValidated. Rationale: the fix the user must make is onspec.authServerConfig.upstreamProviders, so surfacing under that condition points at the right field. A newConditionReasonAuthzRequiresUpstreamdisambiguates.runAuthValidationsis extracted fromrunValidationsbecausegocyclotripped on the added branch. The moved block is byte-for-byte identical to what was there before; no behavior change.Generated with Claude Code