[codex] Tighten auth flows and unify live canary coverage #2367
ilblackdragon wants to merge 9 commits into staging
Conversation
Code Review
This pull request introduces a comprehensive live canary regression system for auth flows, including new browser-based and seeded-token runners. It also refactors the authentication manager and MCP client architecture to support multi-user isolation, ensuring that MCP sessions and tokens are correctly partitioned by user. My review highlights a potential issue with the McpClient::for_user implementation regarding shared state for Stdio transports, which could lead to request ID collisions and redundant handshakes.
```bash
*.png|*.jpg|*.jpeg|*.gif|*.webp|*.sqlite|*.db|*.wasm|*.zip) continue ;;
esac
for pattern in "${patterns[@]}"; do
  if grep -nIEi "${pattern}" "${file}" >> "${matches_file}" 2>/dev/null; then
```
Critical Severity
This scrubber persists raw secret matches inside the same artifact directory it is supposed to sanitize. grep appends matching lines to artifacts/live-canary/scrub-matches.txt, and the workflow later uploads artifacts/live-canary/; in non-strict lanes it even continues after matches are found. That means a leaked bearer token/PAT in any log gets copied into a new artifact file and uploaded. The console redaction also only handles some prefix forms, so raw ghp_, github_pat_, ya29, etc. matches can still be printed unredacted.
Please write matches to a temp file outside the upload tree, ensure every persisted/printed match is redacted before it exists under the artifact directory, and make secret matches fail or remove the affected upload payload.
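A minimal Python sketch of the suggested flow (function and file names hypothetical): raw matches go to a temp file outside the upload tree, and only redacted lines are ever written under the artifact directory.

```python
import re
import tempfile
from pathlib import Path

# Assumed token patterns for illustration; the real scrubber has a fuller set.
SECRET_RE = re.compile(r"(ghp_|github_pat_|ya29\.)\S+")

def scrub(artifact_dir: Path, lines: list[str]) -> list[str]:
    """Return redacted lines; raw matches never touch artifact_dir."""
    redacted = []
    # Raw matches land in a temp file *outside* the upload tree for triage.
    with tempfile.NamedTemporaryFile("w", delete=False) as raw:
        for line in lines:
            if SECRET_RE.search(line):
                raw.write(line + "\n")                 # raw copy stays out of artifacts
                line = SECRET_RE.sub("<REDACTED>", line)
            redacted.append(line)
    # Only the redacted form exists under the uploaded directory.
    (artifact_dir / "scrub-matches.txt").write_text("\n".join(redacted))
    return redacted
```

In strict lanes the caller would additionally fail the job whenever the temp file is non-empty.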
```json
"provider": "github",
"oauth": {
  "authorization_url": "https://github.com/login/oauth/authorize",
  "token_url": "https://github.com/login/oauth/access_token",
```
High Severity
The new GitHub OAuth descriptor points at https://github.com/login/oauth/access_token, but the shared OAuth exchanger parses the token response as JSON and does not set Accept: application/json. GitHub returns a form-encoded access-token response by default and only returns JSON when that Accept header is present, so this browser OAuth path will fail after the user consents.
Please add provider token request headers or a urlencoded fallback for this descriptor, and cover it with a mock GitHub token endpoint regression test.
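A hedged Python sketch of the fallback idea (helper name hypothetical): try the JSON shape first, then fall back to the form-encoded body GitHub returns when `Accept: application/json` is absent.

```python
import json
from urllib.parse import parse_qs

def parse_token_response(body: str) -> str:
    """Extract access_token from either a JSON or form-encoded token response."""
    try:
        return json.loads(body)["access_token"]
    except (json.JSONDecodeError, KeyError):
        # GitHub's default response shape: access_token=...&scope=...&token_type=...
        return parse_qs(body)["access_token"][0]
```

Sending the `Accept: application/json` header on the token request is still the primary fix; the fallback just makes the exchanger robust to providers that ignore it.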
```diff
 // the same credential should reuse a single pending entry rather than
 // accumulate stale flows. This logic used to live in
-// bridge::auth_manager and was lost when the call moved here; without
+// crate::auth::extension and was lost when the call moved here; without
```
High Severity
This pending-flow path was made user-aware here, but the surrounding pending-auth lifecycle is still extension-scoped: pending_auth is keyed only by extension name, and clear_pending_extension_auth(name) / remove() retain gateway flows only by flow.extension_name != name. That breaks multi-user auth isolation: if user A has a pending OAuth flow for github/notion and user B starts auth for the same extension, B's auth call clears A's pending callback state, so A's eventual callback can no longer complete.
Please key pending auth cleanup by (user_id, extension) or thread user_id into clear_pending_extension_auth, and add a two-user same-extension pending OAuth regression test.
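A small Python analogue of the suggested keying (class and method names hypothetical): once pending state is keyed by (user_id, extension), one user's cleanup cannot drop another user's in-flight callback for the same extension.

```python
class PendingAuth:
    """Pending OAuth flows keyed by (user_id, extension_name)."""

    def __init__(self):
        self._flows = {}  # (user_id, extension_name) -> flow state

    def start(self, user_id, extension, flow):
        self._flows[(user_id, extension)] = flow

    def clear_pending_extension_auth(self, user_id, extension):
        # Removes only this user's flow for this extension.
        self._flows.pop((user_id, extension), None)

    def get(self, user_id, extension):
        return self._flows.get((user_id, extension))
```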
serrrfirat left a comment
Code review covering multi-user MCP isolation, credential handling in live canary infrastructure, and auth URL sanitization.
```rust
let url: String = server_url.into();
let name = extract_server_name(&url);
let transport = Arc::new(HttpMcpTransport::new(url.clone(), name.clone()));
let runtime_state = Self::new_runtime_state();
```
Critical Severity — Multi-user MCP isolation fragility for stdio transports
When for_user() is called on stdio transports, it shares initialized (OnceCell), tools_cache, and next_id across all users (lines 316-321). The first user's initialization wins and InitializeResult (server capabilities) is frozen from that user's perspective. If a future MCP server returns user-scoped capabilities, they leak across users.
The test test_stdio_for_user_shares_initialize_cache_and_request_ids validates this as intentional behavior, which is fine for current MCP servers. However, this shared-state assumption should be documented as a contract on McpClient itself (not just in the Clone impl doc), so future maintainers know adding user-scoped capabilities to stdio transports would require breaking this sharing.
Suggested fix: Add a doc comment on for_user() explicitly stating that stdio transports share initialization/tool-cache across all users by design, and that changing this would require per-user OnceCells.
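The shared-initialization contract can be illustrated with a small Python analogue (all names hypothetical): user-scoped clones alias one init cell, so the first user's initialization wins for everyone.

```python
class StdioClient:
    """Sketch: stdio clients share one initialization result across users."""

    def __init__(self):
        self._init_cell = {}  # once-cell analogue: {"result": ...} set at most once

    def initialize(self, result):
        self._init_cell.setdefault("result", result)  # first caller wins

    def for_user(self, user_id):
        clone = StdioClient()
        clone._init_cell = self._init_cell  # shared across users by design
        return clone
```

Making capabilities user-scoped would mean giving each clone its own cell instead of aliasing the parent's, which is exactly the breaking change the doc comment should warn about.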
```bash
-e 's/(secret[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1<REDACTED>/Ig' \
  "${matches_file}" | head -200
if [[ "${STRICT_ARTIFACT_SCRUB}" == "true" || "${STRICT_ARTIFACT_SCRUB}" == "1" ]]; then
  exit 1
```
High Severity — STRICT_ARTIFACT_SCRUB not set for lanes handling real credentials
STRICT_ARTIFACT_SCRUB is only set to true for the private-oauth lane (visible in .github/workflows/live-canary.yml). However, auth-live-seeded and auth-browser-consent lanes handle real provider tokens (Google OAuth, GitHub PAT) but upload artifacts with only the best-effort regex scrub here. If the regex misses a token format, it flows into uploaded CI artifacts.
Line 47: if [[ "${STRICT_ARTIFACT_SCRUB}" == "true" ...]] — the fallthrough prints a warning and continues.
Suggested fix: Set STRICT_ARTIFACT_SCRUB=true for all lanes that handle real provider credentials (auth-live-seeded, auth-browser-consent), not just private-oauth. The regex scrub is best-effort and should not be the only gate for lanes with real tokens.
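A hypothetical workflow fragment showing the shape of the fix (job names taken from the lanes mentioned above; the real workflow layout may differ):

```yaml
# Strict scrubbing for every lane that handles real provider credentials,
# not only private-oauth.
jobs:
  auth-live-seeded:
    env:
      STRICT_ARTIFACT_SCRUB: "true"
  auth-browser-consent:
    env:
      STRICT_ARTIFACT_SCRUB: "true"
```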
```python
DEFAULT_VENV = E2E_DIR / ".venv"
DEFAULT_SECRETS_MASTER_KEY = (
    "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
)
```
High Severity — Hardcoded predictable master key in committed source
```python
DEFAULT_SECRETS_MASTER_KEY = (
    "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
)
```
This key is committed to the repo. Canary databases created with it (which include real provider tokens for auth-live-seeded) can be trivially decrypted by anyone reading this source. While the canary databases are ephemeral, there is a window during which real OAuth tokens are encrypted with a publicly-known key.
Suggested fix: Use os.urandom(32).hex() per canary run to generate a unique master key. This ensures that even if a canary database leaks (e.g., via CI artifacts), the tokens inside cannot be decrypted. Verify cleanup covers all failure modes so the DB+key are never persisted together.
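The per-run key generation is a one-liner; a sketch (helper name hypothetical) that keeps the existing 64-hex-char key format:

```python
import os

def generate_run_master_key() -> str:
    """Fresh 32-byte key per canary run; 64 hex chars match the old format."""
    return os.urandom(32).hex()
```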
```rust
    return None;
}
url::Url::parse(u)
    .ok()
```
High Severity — Auth URL sanitization returns original string, not parsed URL
sanitize_auth_url parses with url::Url::parse() but returns the original string u.to_owned(). Percent-encoded CRLF sequences (%0d%0a) pass the char::is_control check (which only catches literal control chars in the input string) and flow through to the caller. After the URL is later used in an HTTP context (e.g., Location header), the server or browser may decode the percent-encoding, enabling header injection.
The existing test rejects_invalid_or_control_character_urls only tests literal \n and \r, not their percent-encoded forms.
Suggested fix: Return parsed.to_string() instead of u.to_owned() to get the normalized, canonicalized URL. Also add test cases for %0d%0a sequences.
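A Python analogue of the hardened check (function shape hypothetical, mirroring the Rust `sanitize_auth_url`): validate control characters against the *decoded* URL so `%0d%0a` cannot slip through, and return the parsed/normalized form rather than the caller-supplied string.

```python
from urllib.parse import urlsplit, unquote

def sanitize_auth_url(u: str):
    """Return a normalized http(s) URL, or None if it is unsafe."""
    parts = urlsplit(u)
    if parts.scheme not in ("http", "https"):
        return None
    # Check the decoded URL so percent-encoded CR/LF is caught too.
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in unquote(u)):
        return None
    return parts.geturl()  # normalized form, not the original string
```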
```rust
}

/// Get or create a session for a server.
pub async fn get_or_create(&self, server_name: &str, server_url: &str) -> McpSession {
```
Medium Severity — Unbounded session growth
Session map is now keyed by (user_id, server_name) which increases the cardinality compared to the previous server-only key. cleanup_stale() exists but there is no evidence it is called periodically (no timer, no background task scheduling it). In a multi-user deployment, the session map grows without bound until the process restarts.
Suggested fix: Either call cleanup_stale() on a timer (e.g., every 5 minutes via a tokio interval task) or add a max capacity with LRU eviction. The unbounded growth is more impactful now that the key space is O(users * servers) rather than O(servers).
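The LRU option can be sketched in a few lines of Python (class and key shape hypothetical): cap the `(user_id, server_name)` key space and evict the least recently used session.

```python
from collections import OrderedDict

class SessionMap:
    """Bounded session storage keyed by (user_id, server_name)."""

    def __init__(self, max_entries=1024):
        self._max = max_entries
        self._sessions = OrderedDict()

    def get_or_create(self, user_id, server, factory):
        key = (user_id, server)
        if key in self._sessions:
            self._sessions.move_to_end(key)         # mark as recently used
        else:
            if len(self._sessions) >= self._max:
                self._sessions.popitem(last=False)  # evict least recently used
            self._sessions[key] = factory()
        return self._sessions[key]
```

The timer alternative (a `tokio::time::interval` task calling `cleanup_stale()`) avoids eviction of live sessions but needs a place to own the background task.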
```diff
@@ -63,10 +63,10 @@ pub struct McpClient {
     server_name: String,
```
Medium Severity — No user_id validation in for_user()
for_user() accepts any impl Into<String>. An empty string or a string containing path separators could cause issues in session keys, secret paths, or log parsing. Given that the user_id flows into McpSessionKey, secret lookups, and header values, basic validation would prevent subtle bugs.
Suggested fix: Validate that user_id is non-empty and does not contain path separators (/, \) or control characters. Return a Result or panic-in-debug to surface misuse early.
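A sketch of the suggested validation (helper name hypothetical), rejecting ids that would corrupt session keys, secret paths, or header values:

```python
def validate_user_id(user_id: str) -> str:
    """Reject empty ids and ids with path separators or control characters."""
    if not user_id:
        raise ValueError("user_id must be non-empty")
    if any(c in "/\\" or ord(c) < 0x20 or ord(c) == 0x7F for c in user_id):
        raise ValueError("user_id contains path separators or control characters")
    return user_id
```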
```rust
Ok(Self {
    transport,
    server_url: config.url.clone(),
```
Medium Severity — New HTTP transport created per tool call
In McpToolWrapper::execute(), self.client.for_user(&ctx.user_id) is called on every tool execution. For HTTP transports, for_user() creates a new HttpMcpTransport with a fresh OnceCell, meaning the MCP initialize handshake is attempted on every single tool call. This adds latency and unnecessary server load.
For stdio transports this is fine (shared runtime state), but HTTP callers pay a per-call initialization tax.
Suggested fix: Cache the per-user client in McpToolWrapper or ExtensionManager so that repeated calls from the same user reuse the same transport and initialization state.
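A minimal Python sketch of the caching idea (names hypothetical): memoize the per-user client in the wrapper so repeated tool calls from the same user reuse one initialized transport.

```python
class McpToolWrapper:
    """Sketch: cache per-user clients instead of rebuilding one per tool call."""

    def __init__(self, base_client_factory):
        self._factory = base_client_factory
        self._per_user = {}  # user_id -> client

    def client_for(self, user_id):
        if user_id not in self._per_user:
            # Created once per user; later calls skip the initialize handshake.
            self._per_user[user_id] = self._factory(user_id)
        return self._per_user[user_id]
```

In the real Rust code this would likely be a `DashMap<String, McpClient>` (or similar) on the wrapper or `ExtensionManager`, with eviction tied to session cleanup.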
Summary
This branch tightens extension and tool auth handling and unifies the auth live-canary system so future provider coverage follows one path.
It includes:
- `src/auth`
- `scripts/live_canary/framework` modules
- one canonical account/setup guide
Why
We had auth regressions and open MCP multi-user isolation issues, and the live-canary work had started to split across multiple runner shapes. This branch fixes the highest-risk auth isolation bug, expands end-to-end auth coverage, and gives future auth/provider canaries a single setup and extension path.
Validation
Ran targeted checks during the branch work, including:
- `cargo test` coverage for auth URL sanitization, redirect handling, MCP session partitioning, factory partitioning, runtime-user MCP wrapper execution, and MCP multi-tenant integration
- `pytest` coverage for auth/browser scenarios in `tests/e2e/scenarios/test_extensions.py` and `tests/e2e/scenarios/test_v2_auth_oauth_matrix.py`
- `python3 -m py_compile` for the shared live-canary modules and auth runners
- `bash -n scripts/live-canary/run.sh scripts/live-canary/scrub-artifacts.sh`
- `scripts/live-canary/run.sh --list-tests` and `--list-cases`
Impact
- `scripts/live_canary/auth_registry.py` and `scripts/live-canary/ACCOUNTS.md`