fix(oauth): skip audience enforcement for virtual servers when resource is not configured by madhav165 · Pull Request #4410 · IBM/mcp-context-forge

madhav165 · 2026-04-23T09:48:11Z

🐛 Bug-fix PR

📌 Summary

OAuth audience verification always fails for virtual servers when the IdP does not support RFC 8707 Resource Indicators (e.g. Authentik, some Keycloak configurations). These IdPs set the token aud claim to an abstract identifier (typically the OAuth client_id) rather than the resource URL the client requested. Since virtual servers only have authorization_servers in their oauth_config (no resource), the only expected audience is the canonical resource URL derived from APP_DOMAIN — which never matches.

This is the virtual-server counterpart to #4404, which applies the same auto-learn pattern to the gateway (OAuth callback) path.

🔁 Reproduction Steps

Closes #4171

1. Create a virtual server with oauth_enabled=true and an authorization server pointing to an IdP without RFC 8707 support (e.g. Authentik).
2. Set APP_DOMAIN to the gateway's internal domain.
3. Have an MCP client connect to /servers/{server_id}/mcp; the client completes the OAuth flow with the IdP.
4. The IdP issues a token with aud: "my-oauth-client-id" (ignoring the resource parameter).
5. ContextForge builds expected_audiences = ["{APP_DOMAIN}/servers/{server_id}/mcp"].
6. Audience mismatch → 401 on every request.
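
The mismatch in the steps above can be illustrated with a minimal sketch. It is not the gateway's code: `audience_matches`, the domain, the server id, and the issuer are hypothetical names and values chosen for illustration, and the comparison only follows the general RFC 7519 rule that `aud` may be a string or a list of strings.

```python
from typing import Optional, Sequence, Union

Audience = Union[str, Sequence[str]]


def audience_matches(token_aud: Optional[Audience], expected: Sequence[str]) -> bool:
    """Return True if any audience in the token is one of the expected values."""
    if token_aud is None:
        return False
    # RFC 7519 allows aud to be a single string or a list of strings.
    token_values = [token_aud] if isinstance(token_aud, str) else list(token_aud)
    return any(value in expected for value in token_values)


# Hypothetical values mirroring the reproduction steps.
app_domain = "https://gateway.internal.example"
server_id = "abc123"
expected_audiences = [f"{app_domain}/servers/{server_id}/mcp"]

# A token from an IdP that ignores the resource parameter.
token_claims = {"iss": "https://idp.example", "aud": "my-oauth-client-id"}

# The client_id-style audience never equals the canonical resource URL.
mismatch = not audience_matches(token_claims["aud"], expected_audiences)
```

With these values `mismatch` is true for every token the IdP issues, which corresponds to the 401 on every request.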

🐞 Root Cause

_try_oauth_access_token in streamablehttp_transport.py always enforced audience verification using the canonical resource URL from _build_server_resource_url(). When oauth_config has no resource (which is the case for all virtual servers created via the admin UI), the expected audience list contained only this URL. Non-RFC-8707 IdPs set aud to an abstract identifier instead, so every token failed audience verification — even though signature and issuer were valid.

The admin UI for virtual servers does not expose resource fields, so there was no way to configure the expected audience without direct DB manipulation.

💡 Fix Description

Replace the old audience-list-building logic with a two-branch strategy in _try_oauth_access_token:

- When resource IS configured (learned or manually set): Pass it directly to verify_oauth_access_token as the expected audience. No canonical resource URL and no list building are involved; the configured value is used as-is (string or list).
- When resource is NOT configured: Skip audience enforcement (expected_audience=None). Signature + issuer are still verified. After successful verification, extract the aud claim from the verified token and persist it as resource in server.oauth_config via the new _persist_learned_server_audience() helper. All subsequent requests then get strict audience enforcement.
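
The two branches can be sketched as a small decision function. This is an illustrative sketch of the strategy as described in this PR description, not the code in streamablehttp_transport.py; `plan_audience_check` and its return shape are hypothetical, and signature and issuer verification are assumed to happen elsewhere.

```python
from typing import Any, Optional, Tuple


def plan_audience_check(oauth_config: Optional[dict]) -> Tuple[Any, bool]:
    """Return (expected_audience, should_learn) for one request.

    expected_audience=None means "skip the aud check"; signature and issuer
    checks are assumed to run regardless and are not modelled here.
    """
    resource = (oauth_config or {}).get("resource")
    if resource:
        # Branch 1: resource is configured (learned or set manually).
        # It is enforced as-is, whether it is a string or a list.
        return resource, False
    # Branch 2: nothing configured. Skip the aud check for this request and
    # learn the audience from the verified token afterwards.
    return None, True
```

For example, `plan_audience_check({"authorization_servers": ["https://idp.example"]})` yields `(None, True)`, while a config that already carries `resource` yields that value and `False`.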

The old client_id fallback and canonical resource URL derivation (_build_server_resource_url) are removed from the audience path entirely — they were the source of the bug for non-RFC-8707 IdPs.

This is not a security regression: without this change, every non-RFC-8707 IdP token is rejected outright (the feature is completely broken). The fix maintains signature + issuer verification on the first request and adds strict audience enforcement from the second request onward.

Why auto-learn instead of a UI toggle (as suggested in #4171)?

The issue proposed a per-server toggle or UI field to manually configure the expected audience. But the operator only controls which IdP to use (authorization_servers) — the aud claim value is determined entirely by the IdP's behavior, which varies across providers (it could be the client_id, an application URI, or something else entirely). Asking the operator to pre-configure it would require them to know what the IdP will put in aud before any token has been issued. Auto-learning from the first verified token eliminates this guesswork and works correctly regardless of the IdP's audience strategy.

Key implementation details:

- _persist_learned_server_audience() is best-effort: DB failures are caught both internally and at the call site, never breaking the auth flow
- The aud claim is stored as-is: string values stay strings, list values stay lists
- Persist is called on every successful auth so that IdP-side changes (e.g. client_id rotation) are picked up automatically; a no-op early return skips the write when the stored value already matches
- SQLAlchemy mutation tracking is handled correctly (new dict object assigned)
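
A minimal sketch of the persist behaviour listed above, using a plain dataclass in place of the ORM model. `FakeServer` and `persist_learned_audience` are hypothetical stand-ins, not the real `_persist_learned_server_audience()`; the sketch only shows the no-op early return, the as-is storage, the new-dict assignment, and the best-effort error handling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeServer:
    """Stand-in for the ORM row; only the field the sketch needs."""
    oauth_config: Optional[dict] = field(default_factory=dict)


def persist_learned_audience(server: Optional[FakeServer], verified_claims: dict) -> bool:
    """Store the token's aud as oauth_config["resource"]; return True if written."""
    try:
        raw_aud = verified_claims.get("aud")
        if raw_aud is None or server is None or not server.oauth_config:
            return False
        if server.oauth_config.get("resource") == raw_aud:
            return False  # no-op: the stored value already matches
        # Assign a new dict instead of mutating in place, so an ORM that only
        # detects attribute reassignment on JSON columns sees the change.
        updated = dict(server.oauth_config)
        updated["resource"] = raw_aud  # stored as-is: str stays str, list stays list
        server.oauth_config = updated
        return True
    except Exception:
        # Best-effort: a persistence failure must not break authentication.
        return False
```

Calling it twice with the same `aud` writes once and then returns early; calling it with a different `aud` overwrites the stored value, which is the always-overwrite behaviour discussed in the self-review notes.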

🧪 Verification

| Check | Command | Status |
|---|---|---|
| Unit tests (test_auth.py) | `uv run pytest tests/unit/mcpgateway/test_auth.py` | ✅ 237 passed |
| Related OAuth tests | `uv run pytest tests/unit/mcpgateway/test_auth.py tests/unit/mcpgateway/routers/test_oauth_router.py tests/unit/mcpgateway/services/test_token_validation_service.py` | ✅ 362 passed |
| Doctests | `uv run pytest --doctest-modules mcpgateway/transports/streamablehttp_transport.py` | ✅ 22 passed |
| Ruff | `make ruff` | ✅ All checks passed |
| Formatting | `make autoflake isort black` | ✅ Clean |

📐 MCP Compliance (if relevant)

Matches current MCP spec
No breaking change to MCP clients

✅ Checklist

Code formatted (make black isort pre-commit)
No secrets/credentials committed

madhav165 · 2026-04-23T10:44:31Z

Self-Review Notes

Reviewed the following areas that are likely to come up in automated or human review — documenting findings here to avoid re-investigation:

1. Is removing the client_id fallback a regression?
No. The old code used oauth_config.get("resource") or oauth_config.get("client_id") as a static fallback for audience enforcement. This PR replaces that with a dynamic learning mechanism: if no resource is configured, skip audience enforcement on the first request (signature + issuer still verified), learn the IdP's actual aud, and persist it as resource. The client_id fallback was a workaround for the exact problem this PR solves properly. Existing deployments with client_id set will auto-learn the correct audience on the next verified token.

2. Is removing the canonical resource URL (_build_server_resource_url) from audience checks a security regression?
No. The canonical URL derivation was the root cause of the bug — non-RFC-8707 IdPs never set aud to the resource URL, so every token failed. Removing it from the audience path is the fix. After the first successful verification, the learned aud is persisted and enforced strictly on all subsequent requests. This mirrors the approach in #4404 for the gateway path.

3. Does the always-overwrite behavior in _persist_learned_server_audience allow audience drift?
This is a deliberate design choice, consistent with #4404. The persist function overwrites resource when the token's aud differs from the stored value, so that IdP-side changes (client_id rotation, application migration) are picked up automatically. A no-op early return skips the write when the stored value already matches. The threat scenario (attacker with a validly-signed token from the same IdP but different aud) requires the attacker to already have a valid token from the configured authorization server.

4. Is the double DB read for the same server a performance concern?
The first session (line 5067) is intentionally short-lived — it closes before verify_oauth_access_token, which performs async network I/O (JWKS fetch). Holding a DB connection open across that call risks connection pool starvation. The second read inside _persist_learned_server_audience (line 943) needs an active session for the write. This is structurally necessary given the current architecture. The persist path is also effectively a no-op after the first request (early return when resource == raw_aud), so the second read only does real work once per server.

5. Does _persist_learned_server_audience need locking or SELECT ... FOR UPDATE?
No. Concurrent requests that both read resource=None and both write the same learned aud produce the same result — the write is idempotent. There is no TOCTOU risk because the persisted value is the same regardless of which request wins.

brian-hussey

Hey Madhav, thank you for the submission.
There are a few issues with this PR as commented below.
Also, please consider updating docs/docs/architecture/oauth-design.md to reflect the OAuth design changes that come with this PR.

Thanks,
Brian

brian-hussey · 2026-04-27T16:49:53Z

+        verified_claims: Decoded and *signature-verified* JWT claims.
+        db: Active database session.
+    """
+    raw_aud = verified_claims.get("aud")


Can we validate the structure of the 'aud' claim here? At the moment it is accepted and trusted by default.
The token is verified but it could still be complex, empty or an unexpected type.
We could validate that it is a non-empty string or a list of non-empty strings before continuing.
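
The kind of shape check suggested here could look like the following sketch. `is_valid_audience` is a hypothetical name, and treating whitespace-only strings as empty is an assumption made for illustration.

```python
from typing import Any


def is_valid_audience(aud: Any) -> bool:
    """Accept a non-empty string or a non-empty list of non-empty strings."""
    if isinstance(aud, str):
        # Assumption: whitespace-only strings count as empty.
        return bool(aud.strip())
    if isinstance(aud, list):
        return len(aud) > 0 and all(isinstance(a, str) and a.strip() for a in aud)
    # dict, int, bytes, None and any other type are rejected.
    return False
```

A caller would run this on `verified_claims.get("aud")` and skip persisting (with a warning log) when it returns `False`.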

brian-hussey · 2026-04-27T16:55:08Z

+
+    try:
+        server = db.execute(select(DbServer).where(DbServer.id == server_id)).scalar_one_or_none()
+        if server is None or not server.oauth_config:


The function returns early if server.oauth_config is falsy, but this means a server with oauth_enabled=True but no oauth_config will silently skip audience learning. This could hide configuration issues. Consider logging a warning when oauth_enabled is True but oauth_config is missing.

brian-hussey · 2026-04-27T17:34:55Z

+        updated_config = dict(server.oauth_config)
+        updated_config["resource"] = raw_aud
+        server.oauth_config = updated_config
+        db.flush()


There is a potential race condition here: db.flush() is called without explicit transaction isolation.
If multiple requests with different aud claims arrive concurrently for the same server, the last write wins without any conflict detection. Consider adding optimistic locking (a version field) or documenting that the last-verified-token-wins behavior is intentional.
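
To make the optimistic-locking option concrete, here is an in-memory sketch of a version-checked write. `ServerRow`, `StaleWriteError`, and `write_resource` are hypothetical and no database is involved. In a SQLAlchemy model the comparable mechanism would be the mapper's `version_id_col` option, which is mentioned here as a pointer, not as what this PR does.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ServerRow:
    """Stand-in for a DB row that carries a version counter."""
    oauth_config: dict
    version: int = 0


class StaleWriteError(Exception):
    """Raised when the row changed between read and write."""


def write_resource(row: ServerRow, read_version: int, aud: Any) -> None:
    """Compare-and-set: write only if nobody else bumped the version."""
    if row.version != read_version:
        raise StaleWriteError(f"expected version {read_version}, found {row.version}")
    row.oauth_config = {**row.oauth_config, "resource": aud}
    row.version += 1


# Two concurrent requests read the same version, then both try to write.
row = ServerRow(oauth_config={"authorization_servers": ["https://idp.example"]})
version_seen_by_a = row.version
version_seen_by_b = row.version

write_resource(row, version_seen_by_a, "aud-a")  # first writer wins
try:
    write_resource(row, version_seen_by_b, "aud-b")
    second_write_rejected = False
except StaleWriteError:
    second_write_rejected = True  # conflict is detected instead of silently lost
```

With this check the second writer sees a conflict and can re-read the row, instead of silently replacing the audience the first writer stored.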

…ce is not configured

IdPs that do not support RFC 8707 (e.g. Authentik) set the token aud claim to the OAuth client_id rather than the resource URL the client requested. Virtual servers with only authorization_servers configured (no resource or client_id in oauth_config) always fail audience verification because the only expected audience is the canonical resource URL derived from APP_DOMAIN.

When resource is absent from oauth_config, skip audience enforcement (signature + issuer are still verified) and persist the aud claim from the first verified token as resource for strict enforcement on all subsequent requests. The aud value is stored as-is (string or list per RFC 7519) without normalization.

Closes #4171

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

…way in streamablehttp_transport

Move Gateway as DbGateway to top-level import and remove all local reimports of select, DbServer, and DbGateway that shadow the existing module-level imports. Resolves pylint W0404 (reimported) and W0621 (redefined-outer-name) warnings.

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

…onfusion

When the auto-learn audience pattern was introduced, three issues remained that this commit addresses:

* First-request authentication was effectively issuer-only when no resource was configured. Any valid token from an allowed issuer would authenticate and pin its aud as resource, enabling cross-resource token confusion in shared-IdP deployments. The handler now falls back to a list of acceptable audiences derived from the canonical RFC 8707/9728 resource URL plus the legacy client_id field, and fails closed when neither anchor is available.
* The verified aud claim was persisted with no shape validation. A misconfigured IdP minting a token with a malformed aud (dict, int, empty list, etc.) would persist that value as resource and then crash PyJWT inside verify_oauth_access_token on the next request. A new _is_valid_audience helper enforces RFC 7519 §4.1.3 (string or non-empty list of non-empty strings) before persisting.
* The persist policy was 'always overwrite', which could silently collapse an operator-configured multi-audience list down to a single value, and could mask IdP-side audience changes that should surface as an explicit auth failure. The policy is now learn-once: existing resource values (operator-set or previously learned) are preserved. IdP rotation requires the operator to clear the field before the new value is learned.

Adds regression coverage for: stateful learn-then-enforce, fallback audience derivation, malformed aud rejection (parametrized over dict/int/empty/whitespace shapes), multi-audience list preservation, oauth_config=None handling, no-aud-claim handler success, and the fail-closed branch when no audience anchor can be derived.

Signed-off-by: Jonathan Springer <jps@s390x.com>
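
The fallback derivation and learn-once policy described in this commit message can be sketched as follows. All three function names are hypothetical, and the details (for example, that a whitespace-only client_id is ignored) are assumptions drawn from the commit text, not a copy of the merged code.

```python
from typing import Any, List, Optional


def fallback_audiences(canonical_url: Optional[str], oauth_config: Optional[dict]) -> List[str]:
    """Build the acceptable audiences used when no resource is configured."""
    candidates = [canonical_url, (oauth_config or {}).get("client_id")]
    # Keep only usable anchors: non-empty, non-whitespace strings.
    return [c for c in candidates if isinstance(c, str) and c.strip()]


def audience_to_enforce(canonical_url: Optional[str], oauth_config: Optional[dict]) -> Optional[Any]:
    """Return the audience(s) to enforce, or None to signal 'fail closed' (401)."""
    resource = (oauth_config or {}).get("resource")
    if resource:
        return resource
    anchors = fallback_audiences(canonical_url, oauth_config)
    return anchors or None


def learn_once(oauth_config: dict, verified_aud: Any) -> dict:
    """Learn-once: keep any existing resource, otherwise record the verified aud."""
    if oauth_config.get("resource"):
        return oauth_config
    return {**oauth_config, "resource": verified_aud}
```

Note that `None` here means "reject the request", unlike the earlier skip-enforcement design where `expected_audience=None` meant "do not check aud". Under learn-once, an operator-configured list such as `["a", "b"]` is never collapsed to a single learned value.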

Follow-up coverage additions for the audience-learning hardening introduced in the previous commit. Covers branches and assertions that the second-pass review identified as gaps:

* C4: canonical URL unavailable but client_id configured → fallback uses client_id alone (the #4171 use case when app_domain is unset).
* C7/C8: parametrized matrix for non-string and whitespace-only client_id values (whitespace, empty, int, list, dict, None) all excluded from the fallback and producing a 401.
* B4: db.execute(...).scalar_one_or_none() returning None is handled gracefully (no flush, no error).
* B5: empty oauth_config dict at helper level (previously covered only for None).
* B12: db.flush() raising is caught by the outer except, swallowed, and the warning log is asserted via caplog.
* A1-A10: direct parametrized unit test for _is_valid_audience covering every shape (None, empty/whitespace string, valid string, empty list, list with None/empty/int, valid lists, dict, int, bytes).

Existing tests strengthened with previously-missing assertions:

* test_skips_when_aud_is_malformed now asserts the warning log emitted on rejection (previously only verified no DB calls).
* test_db_error_is_swallowed now asserts the warning log emitted on DB execute failure.
* test_no_resource_no_client_id_no_canonical_fails_closed now asserts the 401 status, WWW-Authenticate: Bearer header, and JSON detail body produced by the fail-closed branch (previously only verified the return value).

Signed-off-by: Jonathan Springer <jps@s390x.com>

madhav165 requested review from brian-hussey, crivetimihai and kevalmahajan as code owners April 23, 2026 09:48

madhav165 force-pushed the bugfix/virtual-server-oauth-audience branch 6 times, most recently from e363e1d to 49fb4fc Compare April 23, 2026 10:24

madhav165 mentioned this pull request Apr 23, 2026

[BUG]: OAuth audience verification always fails with IdPs that do not support RFC 8707 Resource Indicators (e.g. Authentik) #4171

Closed

7 tasks

madhav165 added the bug (Something isn't working) and api (REST API Related item) labels Apr 23, 2026

jonpspri added the release-fix (Critical bugfix required for the release) label Apr 23, 2026

brian-hussey added the SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release) label Apr 23, 2026

brian-hussey requested changes Apr 27, 2026

madhav165 and others added 5 commits April 28, 2026 06:24

isort fix

238b672

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

jonpspri force-pushed the bugfix/virtual-server-oauth-audience branch from 7918c4b to a9117ad Compare April 28, 2026 05:54

jonpspri merged commit 6ec1992 into main Apr 28, 2026
30 checks passed

jonpspri deleted the bugfix/virtual-server-oauth-audience branch April 28, 2026 06:04

jonpspri merged 5 commits into main from bugfix/virtual-server-oauth-audience
