Skip to content

[codex] Tighten auth flows and unify live canary coverage#2367

Draft
ilblackdragon wants to merge 9 commits intostagingfrom
codex/auth-oauth-canary-unification
Draft

[codex] Tighten auth flows and unify live canary coverage#2367
ilblackdragon wants to merge 9 commits intostagingfrom
codex/auth-oauth-canary-unification

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

Summary

This branch tightens extension and tool auth handling and unifies the auth live-canary system so future provider coverage follows one path.

It includes:

  • consolidating extension/tool auth ownership under src/auth/
  • fixing remaining auth URL and redirect edge cases
  • partitioning MCP auth/session/client state by user to fix same-server multi-user isolation
  • adding backend and browser coverage for two users interacting with the same MCP server
  • adding seeded-token and browser-consent auth canaries for Google, GitHub, and Notion
  • adding GitHub OAuth browser flow support for the GitHub tool with PAT fallback
  • unifying the auth canary runners behind shared scripts/live_canary/ framework modules and one canonical account/setup guide

Why

We had auth regressions and open MCP multi-user isolation issues, and the live-canary work had started to split across multiple runner shapes. This branch fixes the highest-risk auth isolation bug, expands end-to-end auth coverage, and gives future auth/provider canaries a single setup and extension path.

Validation

Ran targeted checks during the branch work, including:

  • targeted cargo test coverage for auth URL sanitization, redirect handling, MCP session partitioning, factory partitioning, runtime-user MCP wrapper execution, and MCP multi-tenant integration
  • targeted pytest coverage for auth/browser scenarios in tests/e2e/scenarios/test_extensions.py and tests/e2e/scenarios/test_v2_auth_oauth_matrix.py
  • python3 -m py_compile for the shared live-canary modules and auth runners
  • bash -n scripts/live-canary/run.sh scripts/live-canary/scrub-artifacts.sh
  • wrapper discovery checks via scripts/live-canary/run.sh --list-tests and --list-cases

Impact

  • auth behavior is safer for multi-user MCP deployments
  • live-provider auth coverage now has deterministic, seeded, and browser-consent lanes under one wrapper
  • future auth canary additions should go through scripts/live_canary/auth_registry.py and scripts/live-canary/ACCOUNTS.md

@github-actions github-actions bot added size: XL 500+ changed lines scope: agent Agent core (agent loop, router, scheduler) scope: channel/web Web gateway channel scope: tool/mcp MCP client scope: extensions Extension management scope: ci CI/CD workflows scope: docs Documentation risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 12, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive live canary regression system for auth flows, including new browser-based and seeded-token runners. It also refactors the authentication manager and MCP client architecture to support multi-user isolation, ensuring that MCP sessions and tokens are correctly partitioned by user. My review highlights a potential issue with the McpClient::for_user implementation regarding shared state for Stdio transports, which could lead to request ID collisions and redundant handshakes.

@github-actions github-actions bot added the scope: dependencies Dependency updates label Apr 12, 2026
*.png|*.jpg|*.jpeg|*.gif|*.webp|*.sqlite|*.db|*.wasm|*.zip) continue ;;
esac
for pattern in "${patterns[@]}"; do
if grep -nIEi "${pattern}" "${file}" >> "${matches_file}" 2>/dev/null; then
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Severity

This scrubber persists raw secret matches inside the same artifact directory it is supposed to sanitize. grep appends matching lines to artifacts/live-canary/scrub-matches.txt, and the workflow later uploads artifacts/live-canary/; in non-strict lanes it even continues after matches are found. That means a leaked bearer token/PAT in any log gets copied into a new artifact file and uploaded. The console redaction also only handles some prefix forms, so raw ghp_, github_pat_, ya29, etc. matches can still be printed unredacted.

Please write matches to a temp file outside the upload tree, ensure every persisted/printed match is redacted before it exists under the artifact directory, and make secret matches fail or remove the affected upload payload.

"provider": "github",
"oauth": {
"authorization_url": "https://github.com/login/oauth/authorize",
"token_url": "https://github.com/login/oauth/access_token",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

The new GitHub OAuth descriptor points at https://github.com/login/oauth/access_token, but the shared OAuth exchanger parses the token response as JSON and does not set Accept: application/json. GitHub returns a form-encoded access-token response by default and only returns JSON when that Accept header is present, so this browser OAuth path will fail after the user consents.

Please add provider token request headers or a urlencoded fallback for this descriptor, and cover it with a mock GitHub token endpoint regression test.

// the same credential should reuse a single pending entry rather than
// accumulate stale flows. This logic used to live in
// bridge::auth_manager and was lost when the call moved here; without
// crate::auth::extension and was lost when the call moved here; without
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

This pending-flow path was made user-aware here, but the surrounding pending-auth lifecycle is still extension-scoped: pending_auth is keyed only by extension name, and clear_pending_extension_auth(name) / remove() retain gateway flows only by flow.extension_name != name. That breaks multi-user auth isolation: if user A has a pending OAuth flow for github/notion and user B starts auth for the same extension, B's auth call clears A's pending callback state, so A's eventual callback can no longer complete.

Please key pending auth cleanup by (user_id, extension) or thread user_id into clear_pending_extension_auth, and add a two-user same-extension pending OAuth regression test.

Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review covering multi-user MCP isolation, credential handling in live canary infrastructure, and auth URL sanitization.

let url: String = server_url.into();
let name = extract_server_name(&url);
let transport = Arc::new(HttpMcpTransport::new(url.clone(), name.clone()));
let runtime_state = Self::new_runtime_state();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Severity — Multi-user MCP isolation fragility for stdio transports

When for_user() is called on stdio transports, it shares initialized (OnceCell), tools_cache, and next_id across all users (lines 316-321). The first user's initialization wins and InitializeResult (server capabilities) is frozen from that user's perspective. If a future MCP server returns user-scoped capabilities, they leak across users.

The test test_stdio_for_user_shares_initialize_cache_and_request_ids validates this as intentional behavior, which is fine for current MCP servers. However, this shared-state assumption should be documented as a contract on McpClient itself (not just in the Clone impl doc), so future maintainers know adding user-scoped capabilities to stdio transports would require breaking this sharing.

Suggested fix: Add a doc comment on for_user() explicitly stating that stdio transports share initialization/tool-cache across all users by design, and that changing this would require per-user OnceCells.

-e 's/(secret[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
"${matches_file}" | head -200
if [[ "${STRICT_ARTIFACT_SCRUB}" == "true" || "${STRICT_ARTIFACT_SCRUB}" == "1" ]]; then
exit 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — STRICT_ARTIFACT_SCRUB not set for lanes handling real credentials

STRICT_ARTIFACT_SCRUB is only set to true for the private-oauth lane (visible in .github/workflows/live-canary.yml). However, auth-live-seeded and auth-browser-consent lanes handle real provider tokens (Google OAuth, GitHub PAT) but upload artifacts with only the best-effort regex scrub here. If the regex misses a token format, it flows into uploaded CI artifacts.

Line 47: if [[ "${STRICT_ARTIFACT_SCRUB}" == "true" ...]] — the fallthrough prints a warning and continues.

Suggested fix: Set STRICT_ARTIFACT_SCRUB=true for all lanes that handle real provider credentials (auth-live-seeded, auth-browser-consent), not just private-oauth. The regex scrub is best-effort and should not be the only gate for lanes with real tokens.

DEFAULT_VENV = E2E_DIR / ".venv"
DEFAULT_SECRETS_MASTER_KEY = (
"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Hardcoded predictable master key in committed source

DEFAULT_SECRETS_MASTER_KEY = (
    "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
)

This key is committed to the repo. Canary databases created with it (which include real provider tokens for auth-live-seeded) can be trivially decrypted by anyone reading this source. While the canary databases are ephemeral, there is a window during which real OAuth tokens are encrypted with a publicly-known key.

Suggested fix: Use os.urandom(32).hex() per canary run to generate a unique master key. This ensures that even if a canary database leaks (e.g., via CI artifacts), the tokens inside cannot be decrypted. Verify cleanup covers all failure modes so the DB+key are never persisted together.

return None;
}
url::Url::parse(u)
.ok()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Auth URL sanitization returns original string, not parsed URL

sanitize_auth_url parses with url::Url::parse() but returns the original string u.to_owned(). Percent-encoded CRLF sequences (%0d%0a) pass the char::is_control check (which only catches literal control chars in the input string) and flow through to the caller. After the URL is later used in an HTTP context (e.g., Location header), the server or browser may decode the percent-encoding, enabling header injection.

The existing test rejects_invalid_or_control_character_urls only tests literal \n and \r, not their percent-encoded forms.

Suggested fix: Return parsed.to_string() instead of u.to_owned() to get the normalized, canonicalized URL. Also add test cases for %0d%0a sequences.

}

/// Get or create a session for a server.
pub async fn get_or_create(&self, server_name: &str, server_url: &str) -> McpSession {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Unbounded session growth

Session map is now keyed by (user_id, server_name) which increases the cardinality compared to the previous server-only key. cleanup_stale() exists but there is no evidence it is called periodically (no timer, no background task scheduling it). In a multi-user deployment, the session map grows without bound until the process restarts.

Suggested fix: Either call cleanup_stale() on a timer (e.g., every 5 minutes via a tokio interval task) or add a max capacity with LRU eviction. The unbounded growth is more impactful now that the key space is O(users * servers) rather than O(servers).

@@ -63,10 +63,10 @@ pub struct McpClient {
server_name: String,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — No user_id validation in for_user()

for_user() accepts any impl Into<String>. An empty string or a string containing path separators could cause issues in session keys, secret paths, or log parsing. Given that the user_id flows into McpSessionKey, secret lookups, and header values, basic validation would prevent subtle bugs.

Suggested fix: Validate that user_id is non-empty and does not contain path separators (/, \) or control characters. Return a Result or panic-in-debug to surface misuse early.


Ok(Self {
transport,
server_url: config.url.clone(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — New HTTP transport created per tool call

In McpToolWrapper::execute(), self.client.for_user(&ctx.user_id) is called on every tool execution. For HTTP transports, for_user() creates a new HttpMcpTransport with a fresh OnceCell, meaning the MCP initialize handshake is attempted on every single tool call. This adds latency and unnecessary server load.

For stdio transports this is fine (shared runtime state), but HTTP callers pay a per-call initialization tax.

Suggested fix: Cache the per-user client in McpToolWrapper or ExtensionManager so that repeated calls from the same user reuse the same transport and initialization state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: channel/web Web gateway channel scope: ci CI/CD workflows scope: dependencies Dependency updates scope: docs Documentation scope: extensions Extension management scope: tool/mcp MCP client size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants