fix(oauth): auto-learn IdP audience and persist as resource for token validation #4404

Merged
brian-hussey merged 4 commits into main from bugfix/oauth-resource-persistence
Apr 27, 2026

Conversation

@madhav165
Collaborator

@madhav165 madhav165 commented Apr 23, 2026

🐛 Bug-fix PR

📌 Summary

OAuth token audience validation fails for IdPs (ServiceNow, Authentik, etc.) that do not honor RFC 8707 and set the aud claim to an abstract identifier (e.g. client_id) rather than the resource URL sent in the authorization request. Users see "Token audience mismatch" errors after a successful OAuth flow, with no way to fix it from the UI.

RFC 8707 Section 2 explicitly allows this behavior:

The authorization server may use the exact resource value as the audience or it may map from that value to a more general URI or abstract identifier for the given resource.

🔁 Reproduction Steps

Closes #4384
Related: #4171

  1. Register a gateway with OAuth (e.g. ServiceNow StreamableHTTP)
  2. Complete OAuth authorization flow — succeeds
  3. Click "Fetch Tools" — fails with: Token audience mismatch: token aud=[<client_id>], expected '<gateway_url>'

🐞 Root Cause

Two disconnected code paths:

  1. /oauth/authorize and /oauth/callback (oauth_router.py) inject resource = _normalize_resource_url(gateway.url) into a transient copy of oauth_config and send it to the IdP. This copy is never persisted.

  2. _validate_audience (token_validation_service.py:155) reads oauth_config.get("resource") from the persisted config (which has no resource), falls back to gateway_url, and fails because the IdP's aud doesn't match the gateway URL.

The system sends a resource to the IdP but never learns what the IdP actually mapped it to.
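The disconnect can be illustrated with a minimal sketch. The config dicts, the `abc123` client ID, and the gateway URL below are hypothetical stand-ins, not values from the codebase; only the `resource or gateway_url` fallback comes from the description above.

```python
import copy

# Hypothetical persisted config: note there is no "resource" key.
persisted_config = {"client_id": "abc123"}
gateway_url = "https://gateway.example.com/mcp"

# Path 1 (authorize/callback): resource is injected into a transient copy only.
transient_config = copy.deepcopy(persisted_config)
transient_config["resource"] = gateway_url

# Path 2 (validation): reads the persisted config, so it falls back to the URL.
expected = persisted_config.get("resource") or gateway_url

# An IdP that maps the resource to its client_id issues aud=[client_id].
token_aud = ["abc123"]
audience_ok = expected in token_aud  # the URL is not in the token's aud list
```

Under these assumptions `audience_ok` is false even though the OAuth flow itself succeeded, which matches the "Token audience mismatch" symptom.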

💡 Fix Description

Auto-learn the audience from the token at callback time and persist it as resource.

  1. oauth_manager.py: Added _extract_token_audience() — decodes the access token (best-effort, no signature verification) and extracts the aud claim. Returns only the claim value, never the raw token. complete_authorization_code_flow now returns token_aud in its result dict.

  2. oauth_router.py: Added _persist_learned_audience() — after complete_authorization_code_flow succeeds in the callback, persists the token_aud as resource in gateway.oauth_config. Skips gracefully for opaque tokens, missing aud, or when resource already matches.

  3. oauth_router.py: Simplified resource injection in both /oauth/authorize and /oauth/callback — if resource is already set (learned from a previous token), use it as-is. If not set (first auth), use _normalize_resource_url(gateway.url).

  4. token_validation_service.py: Simplified _validate_audience: expected = resource or gateway_url, normalize both sides to lists, then a simple membership check. Removed token audience and expected values from warning messages to avoid leaking sensitive identifiers (client IDs, etc.) into logs and error responses.
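A best-effort audience extraction along the lines of item 1 might look like the sketch below. This is not the code from oauth_manager.py; the function name, signature, and error handling are assumptions, and it uses only the standard library to read the JWT payload without verifying the signature.

```python
import base64
import json
from typing import List, Optional, Union


def extract_token_audience(access_token: str) -> Optional[Union[str, List[str]]]:
    """Best-effort read of the `aud` claim. No signature verification."""
    parts = access_token.split(".")
    if len(parts) != 3:
        return None  # opaque (non-JWT) token: nothing to learn

    # JWT segments are base64url without padding; restore the padding first.
    payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON.
        return None

    if not isinstance(claims, dict):
        return None
    # Return only the claim value, never the raw token.
    return claims.get("aud")
```

Skipping signature verification is acceptable here only because the value is used for learning what the IdP emits, not for deciding whether to trust the token.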

🧪 Verification

| Check | Command | Status |
|---|---|---|
| Unit tests (6 affected files) | uv run pytest tests/unit/mcpgateway/routers/test_oauth_router.py tests/unit/mcpgateway/services/test_token_validation_service.py tests/unit/mcpgateway/services/test_gateway_service_oauth_comprehensive.py tests/unit/mcpgateway/test_oauth_manager.py tests/unit/mcpgateway/services/test_oauth_manager_pkce.py | ✅ 385 passed |
| Manual regression (ServiceNow) | OAuth flow + Fetch Tools | ✅ Fixed via DB update, code fix matches |

📐 MCP Compliance (if relevant)

  • Matches current MCP spec
  • No breaking change to MCP clients

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • No secrets/credentials committed
  • Sensitive token values removed from log/error messages

@madhav165 madhav165 force-pushed the bugfix/oauth-resource-persistence branch 4 times, most recently from 0a63fc8 to 25da177 on April 23, 2026 06:13
@madhav165
Collaborator Author

Self-Review Notes

Reviewed the following areas that are likely to come up in review — documenting findings here to avoid re-investigation:

1. Could _persist_learned_audience overwrite a user-configured resource?
No. There is no UI field or documented API parameter for setting resource in oauth_config. The admin UI (both create and edit forms) does not expose it, and the admin backend never assembles it. While oauth_config is a freeform Dict[str, Any] at the schema level, resource is exclusively auto-derived from gateway.url or auto-learned from the IdP's aud claim. Overwrite of user intent is not a realistic concern.

2. Should db.flush() in _persist_learned_audience be wrapped in try/except?
No. The codebase has 70+ db.flush() calls in production code, none of which are wrapped in try/except. The consistent pattern is to let flush failures propagate. Adding a guard here would be inconsistent with the rest of the codebase. If the DB is in a state where flush fails, that indicates a larger issue.

3. Is removal of pre-existing resource normalization a problem?
No. The fallback path (deriving resource from gateway.url) still goes through _normalize_resource_url(), which strips fragments and query per RFC 8707. Pre-existing/learned resource values are intentionally not normalized — they come from the IdP's aud claim and the client expects them verbatim. Re-normalizing an IdP-authoritative value would break the audience match this PR is fixing.

4. Are the redacted error messages in token_validation_service.py a debuggability concern?
The generic message ("token aud does not match expected resource or gateway URL") provides enough context to identify where to investigate without leaking client IDs or other sensitive identifiers into logs and error responses.
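The resource handling described in item 3 can be sketched roughly as follows. The function names and bodies are illustrative assumptions, not the actual helpers in oauth_router.py: a URL derived from the gateway is normalized (query and fragment stripped), while an already persisted value is used verbatim.

```python
from typing import Any, Dict
from urllib.parse import urlsplit, urlunsplit


def normalize_resource_url(url: str) -> str:
    """Drop the query and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def resource_for_request(oauth_config: Dict[str, Any], gateway_url: str) -> Any:
    """Use a learned resource as-is; otherwise derive it from the gateway URL."""
    learned = oauth_config.get("resource")
    if learned:
        return learned  # IdP-authoritative value: do not re-normalize
    return normalize_resource_url(gateway_url)
```

Leaving the learned value untouched is the point of the design: normalizing an identifier such as a client ID as if it were a URL could change it and reintroduce the mismatch.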

@madhav165 madhav165 added the labels bug (Something isn't working), ica (ICA related issues), api (REST API Related item), and release-fix (Critical bugfix required for the release) on Apr 23, 2026
@brian-hussey brian-hussey added the SHOULD label (P2: Important but not vital; high-value items that are not crucial for the immediate release) on Apr 23, 2026
@MohanLaksh MohanLaksh self-assigned this Apr 23, 2026
MohanLaksh previously approved these changes Apr 23, 2026
Collaborator

@MohanLaksh MohanLaksh left a comment

✅ Approve

Excellent fix for a critical OAuth bug. The solution is elegant and well-executed.

What's excellent:

  • Smart approach: Auto-learning the IdP's actual aud claim at callback and persisting it solves the root disconnect cleanly
  • RFC-compliant: Properly implements RFC 8707 Section 2's allowance for IdPs to map resource to different audiences
  • Security-conscious: Removes sensitive token values from error messages (prevents client ID leaks)
  • Code simplification: Replaced 20+ lines of complex resource normalization with simple learned-value logic
  • Outstanding test coverage: 116 new test lines across 6 files, all 385 tests passing, full CI green (28/28 checks)
  • Best-effort pattern: Gracefully handles opaque tokens and missing aud claims without breaking flows

Technical highlights:

  • _extract_token_audience() does best-effort decode without signature verification (appropriate for learning vs. validation)
  • _persist_learned_audience() is idempotent and respects transaction boundaries (flush not commit)
  • Simplified _validate_audience() now handles both string and list forms cleanly
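For readers following along, the simplified check could look something like this sketch. It is an assumption-laden illustration of "expected = resource or gateway_url, normalize both sides to lists, membership check", not the actual _validate_audience implementation; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    """Normalize None, a single string, or an iterable of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def audience_matches(token_aud: Any, oauth_config: Dict[str, Any], gateway_url: str) -> bool:
    """True if any audience in the token is among the expected values."""
    expected = _as_list(oauth_config.get("resource") or gateway_url)
    actual = _as_list(token_aud)
    return any(aud in expected for aud in actual)
```

A caller that logs a failure would report only a generic mismatch message rather than the `actual` or `expected` values, in line with the redaction described in the PR.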

Verdict: Ready to merge.

This will unblock real-world OAuth deployments with ServiceNow, Authentik, and other IdPs that don't honor RFC 8707 strictly. Great work! 🎉

brian-hussey previously approved these changes Apr 27, 2026
Member

@brian-hussey brian-hussey left a comment

Approved

madhav165 and others added 4 commits April 27, 2026 15:11
… validation

OAuth token audience validation fails for IdPs (ServiceNow, Authentik,
etc.) that do not honor RFC 8707 and set the aud claim to an abstract
identifier (e.g. client_id) rather than the resource URL sent in the
authorization request. RFC 8707 Section 2 explicitly allows this: the
AS may map the resource value to a different audience identifier.

After a successful OAuth callback, extract the aud claim from the access
token (best-effort, no signature verification) inside oauth_manager and
return it as token_aud. Persist it as resource in the gateway's
oauth_config. On subsequent flows, use the persisted resource as-is
instead of re-deriving from gateway.url. Update _validate_audience to
accept both resource (string or list) and gateway_url via set
intersection.

Closes #4384
Related: #4171

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
@brian-hussey brian-hussey dismissed stale reviews from MohanLaksh and themself via 65fcd4e April 27, 2026 14:15
@brian-hussey brian-hussey force-pushed the bugfix/oauth-resource-persistence branch from bff9f6e to 65fcd4e on April 27, 2026 14:15
Member

@brian-hussey brian-hussey left a comment

Approved, thank you for the submission.

@brian-hussey
Member

The failed tests are due to an issue on main; I have validated all tests as part of this PR. Merging.

@brian-hussey brian-hussey merged commit 8c87ff3 into main Apr 27, 2026
29 of 30 checks passed
@brian-hussey brian-hussey deleted the bugfix/oauth-resource-persistence branch April 27, 2026 14:39
jonpspri added a commit that referenced this pull request Apr 27, 2026
…4475)

* fix(oauth): preserve opaque audience identifiers during token refresh

The refresh path's normalize_resource() previously dropped any non-URL
resource value as 'invalid', which discarded audiences learned at OAuth
callback time from IdPs that do not honor RFC 8707 (ServiceNow, Authentik,
etc. set aud=client_id rather than the requested resource URL).  This
caused validation to regress to the unfixed-bug state on the first token
refresh after callback, defeating the audience-learning fix.

Treat values lacking a URL scheme as opaque audience identifiers and pass
them through verbatim so they round-trip through refresh.  RFC 8707 §2
explicitly permits the AS to map resource to an abstract identifier.

Update existing 'invalid_resource' tests to assert the new pass-through
behavior on the wire, and rename them to reflect the corrected semantics.

Signed-off-by: Jonathan Springer <jps@s390x.com>

* fix(oauth): restrict learned-audience persistence to first-write only

The /oauth/callback handler runs _persist_learned_audience() after every
successful authorization-code exchange, mutating gateway.oauth_config to
record the IdP's audience claim.  However, the route only enforces
gateway access (read-equivalent), not gateways.update -- so any user with
gateway access could silently overwrite shared global configuration on
behalf of all other users on every callback.  On a public/shared gateway
this is a 'last user to OAuth wins' shared-state mutation.

Restrict persistence to first-write-only semantics: the learned audience
is now written only when oauth_config['resource'] is currently unset.
To re-learn after an IdP change, an admin must clear the resource field
via the gateway update API, which does enforce gateways.update.

Promote the persistence log line to INFO so the privileged mutation is
visible in production logs.

Tests:
- Rename test_skips_when_resource_already_matches to ..._set_to_same_value
  to clarify the new semantics (skip on any set value, not just match).
- Add test_skips_when_resource_already_set_to_different_value as a
  regression guard for the authorization gap closure.

Signed-off-by: Jonathan Springer <jps@s390x.com>

* fix(oauth): treat falsy persisted resource as unset for re-learning

_persist_learned_audience compared the existing resource with 'is not None',
which treated empty strings and empty lists as 'already set' and blocked
re-learning forever.  Use Python truthiness so an empty string or empty
list counts as unset, allowing an admin to clear the field via the gateway
update API and trigger re-learning on the next callback (recovery path
after stale config or IdP migration).

Adds a parametrized regression test for empty string and empty list as
persisted-resource values.

Signed-off-by: Jonathan Springer <jps@s390x.com>

---------

Signed-off-by: Jonathan Springer <jps@s390x.com>
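
The follow-up behavior described in that commit message can be sketched as below. Both functions are hypothetical illustrations of the stated semantics (first-write-only persistence with truthiness, and pass-through of scheme-less values on refresh); the real _persist_learned_audience and normalize_resource also deal with the database session and logging, which are omitted here, and the fragment stripping for URL values is an assumption.

```python
from typing import Any, Dict
from urllib.parse import urlsplit, urlunsplit


def persist_learned_audience(oauth_config: Dict[str, Any], token_aud: Any) -> bool:
    """Write token_aud as resource only if nothing usable is stored yet."""
    if not token_aud:
        return False  # opaque token or missing aud: nothing to learn
    if oauth_config.get("resource"):
        return False  # first-write-only: any truthy value blocks overwrite
    # None, "" and [] all count as unset, so an admin can clear the field
    # to trigger re-learning on the next callback.
    oauth_config["resource"] = token_aud
    return True


def normalize_resource(value: str) -> str:
    """Pass scheme-less identifiers through; strip fragments from URLs."""
    parts = urlsplit(value)
    if not parts.scheme:
        return value  # opaque audience identifier, e.g. a client_id
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
```

With these semantics a second user completing the OAuth flow cannot change a resource that an earlier callback already learned.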

Labels

api (REST API Related item), bug (Something isn't working), ica (ICA related issues), release-fix (Critical bugfix required for the release), SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release)

Development

Successfully merging this pull request may close these issues.

[BUG][AUTH]: OAuth resource field missing from UI causes audience validation failures with real OAuth providers

3 participants