fix(oauth): auto-learn IdP audience and persist as resource for token validation#4404
Conversation
0a63fc8 to
25da177
Compare
Self-Review NotesReviewed the following areas that are likely to come up in review — documenting findings here to avoid re-investigation: 1. Could 2. Should 3. Is removal of pre-existing 4. Are the redacted error messages in |
There was a problem hiding this comment.
✅ Approve
Excellent fix for a critical OAuth bug. The solution is elegant and well-executed.
What's excellent:
- Smart approach: Auto-learning the IdP's actual aud claim at callback and persisting it solves the root disconnect cleanly
- RFC-compliant: Properly implements RFC 8707 Section 2's allowance for IdPs to map resource to different audiences
- Security-conscious: Removes sensitive token values from error messages (prevents client ID leaks)
- Code simplification: Replaced 20+ lines of complex resource normalization with simple learned-value logic
- Outstanding test coverage: 116 new test lines across 6 files, all 385 tests passing, full CI green (28/28 checks)
- Best-effort pattern: Gracefully handles opaque tokens and missing aud claims without breaking flows
Technical highlights:
- _extract_token_audience() does best-effort decode without signature verification (appropriate for learning vs. validation)
- _persist_learned_audience() is idempotent and respects transaction boundaries (flush not commit)
- Simplified _validate_audience() now handles both string and list forms cleanly
Verdict: Ready to merge.
This will unblock real-world OAuth deployments with ServiceNow, Authentik, and other IdPs that don't honor RFC 8707 strictly. Great work! 🎉
… validation OAuth token audience validation fails for IdPs (ServiceNow, Authentik, etc.) that do not honor RFC 8707 and set the aud claim to an abstract identifier (e.g. client_id) rather than the resource URL sent in the authorization request. RFC 8707 Section 2 explicitly allows this: the AS may map the resource value to a different audience identifier. After a successful OAuth callback, extract the aud claim from the access token (best-effort, no signature verification) inside oauth_manager and return it as token_aud. Persist it as resource in the gateway's oauth_config. On subsequent flows, use the persisted resource as-is instead of re-deriving from gateway.url. Update _validate_audience to accept both resource (string or list) and gateway_url via set intersection. Closes #4384 Related: #4171 Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
bff9f6e to
65fcd4e
Compare
brian-hussey
left a comment
There was a problem hiding this comment.
Approved, thank you for the submission.
|
Failed tests are due to issue on main, I have validated all tests as part of this PR. Merging. |
…4475) * fix(oauth): preserve opaque audience identifiers during token refresh The refresh path's normalize_resource() previously dropped any non-URL resource value as 'invalid', which discarded audiences learned at OAuth callback time from IdPs that do not honor RFC 8707 (ServiceNow, Authentik, etc. set aud=client_id rather than the requested resource URL). This caused validation to regress to the unfixed-bug state on the first token refresh after callback, defeating the audience-learning fix. Treat values lacking a URL scheme as opaque audience identifiers and pass them through verbatim so they round-trip through refresh. RFC 8707 §2 explicitly permits the AS to map resource to an abstract identifier. Update existing 'invalid_resource' tests to assert the new pass-through behavior on the wire, and rename them to reflect the corrected semantics. Signed-off-by: Jonathan Springer <jps@s390x.com> * fix(oauth): restrict learned-audience persistence to first-write only The /oauth/callback handler runs _persist_learned_audience() after every successful authorization-code exchange, mutating gateway.oauth_config to record the IdP's audience claim. However, the route only enforces gateway access (read-equivalent), not gateways.update -- so any user with gateway access could silently overwrite shared global configuration on behalf of all other users on every callback. On a public/shared gateway this is a 'last user to OAuth wins' shared-state mutation. Restrict persistence to first-write-only semantics: the learned audience is now written only when oauth_config['resource'] is currently unset. To re-learn after an IdP change, an admin must clear the resource field via the gateway update API, which does enforce gateways.update. Promote the persistence log line to INFO so the privileged mutation is visible in production logs. Tests: - Rename test_skips_when_resource_already_matches to ..._set_to_same_value to clarify the new semantics (skip on any set value, not just match). - Add test_skips_when_resource_already_set_to_different_value as a regression guard for the authorization gap closure. Signed-off-by: Jonathan Springer <jps@s390x.com> * fix(oauth): treat falsy persisted resource as unset for re-learning _persist_learned_audience compared the existing resource with 'is not None', which treated empty strings and empty lists as 'already set' and blocked re-learning forever. Use Python truthiness so an empty string or empty list counts as unset, allowing an admin to clear the field via the gateway update API and trigger re-learning on the next callback (recovery path after stale config or IdP migration). Adds a parametrized regression test for empty string and empty list as persisted-resource values. Signed-off-by: Jonathan Springer <jps@s390x.com> --------- Signed-off-by: Jonathan Springer <jps@s390x.com>
🐛 Bug-fix PR
📌 Summary
OAuth token audience validation fails for IdPs (ServiceNow, Authentik, etc.) that do not honor RFC 8707 and set the
audclaim to an abstract identifier (e.g.client_id) rather than the resource URL sent in the authorization request. Users see "Token audience mismatch" errors after a successful OAuth flow, with no way to fix it from the UI.RFC 8707 Section 2 explicitly allows this behavior:
🔁 Reproduction Steps
Closes #4384
Related: #4171
Token audience mismatch: token aud=[<client_id>], expected '<gateway_url>'🐞 Root Cause
Two disconnected code paths:
/oauth/authorizeand/oauth/callback(oauth_router.py) injectresource = _normalize_resource_url(gateway.url)into a transient copy ofoauth_configand send it to the IdP. This copy is never persisted._validate_audience(token_validation_service.py:155) readsoauth_config.get("resource")from the persisted config (which has noresource), falls back togateway_url, and fails because the IdP'sauddoesn't match the gateway URL.The system sends a
resourceto the IdP but never learns what the IdP actually mapped it to.💡 Fix Description
Auto-learn the audience from the token at callback time and persist it as
resource.oauth_manager.py: Added_extract_token_audience()— decodes the access token (best-effort, no signature verification) and extracts theaudclaim. Returns only the claim value, never the raw token.complete_authorization_code_flownow returnstoken_audin its result dict.oauth_router.py: Added_persist_learned_audience()— aftercomplete_authorization_code_flowsucceeds in the callback, persists thetoken_audasresourceingateway.oauth_config. Skips gracefully for opaque tokens, missing aud, or when resource already matches.oauth_router.py: Simplified resource injection in both/oauth/authorizeand/oauth/callback— ifresourceis already set (learned from a previous token), use it as-is. If not set (first auth), use_normalize_resource_url(gateway.url).token_validation_service.py: Simplified_validate_audience—expected = resource or gateway_url, normalize both sides to lists, simple membership check. Removed token audience and expected values from warning messages to avoid leaking sensitive identifiers (client IDs, etc.) into logs and error responses.🧪 Verification
uv run pytest tests/unit/mcpgateway/routers/test_oauth_router.py tests/unit/mcpgateway/services/test_token_validation_service.py tests/unit/mcpgateway/services/test_gateway_service_oauth_comprehensive.py tests/unit/mcpgateway/test_oauth_manager.py tests/unit/mcpgateway/services/test_oauth_manager_pkce.py📐 MCP Compliance (if relevant)
✅ Checklist
make black isort pre-commit)