Skip to content

fix(connectors): repair Email Triage Google grant + actionable token errors (#1592)#1599

Merged
itomek merged 3 commits into
mainfrom
fix/1592-email-connect-grant-401
Jun 11, 2026
Merged

fix(connectors): repair Email Triage Google grant + actionable token errors (#1592)#1599
itomek merged 3 commits into
mainfrom
fix/1592-email-connect-grant-401

Conversation

@itomek

@itomek itomek commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Connecting a Google account in the Agent UI left the Email Triage agent unusable: triage failed with a vague "no permissions" message that could only be cleared from the CLI, and after the CLI grant, token refresh failed with a generic "technical issue." Two root causes — the #1520 hub migration renamed the per-agent grant key builtin:emailinstalled:email (orphaning every existing grant), and the in-chat "grant access" CTA never fired because the email tools emitted str(exc) instead of the AGENT_NOT_GRANTED:-prefixed format_connector_error. After this change: legacy grants migrate automatically on startup; the in-chat CTA fires and points the user at the existing in-UI grant control (no CLI step); token-refresh 401s become actionable (names the account, distinguishes reconnect vs. server-side client-secret); and the OAuth client secret is validated at connect time so a "connected" account that can't refresh is caught immediately.

Closes #1592

Test plan

  • pytest tests/unit/connectors/ — 534 passed, 5 skipped (incl. new grant-migration, token-401, secret-validation, and email-tool-envelope regression tests)
  • python util/lint.py --all
  • cd src/gaia/apps/webui && npm run build + vitest (EmailConnectCta fires on the AGENT_NOT_GRANTED: prefix)
  • Real-world (t-nx-strx-halo): connect Google → Email Triage → "triage my last N emails" runs with no CLI grant; a stale builtin:email grant migrates to installed:email on restart — pending live OAuth

Note: auto-confer-on-connect was intentionally NOT added (consent scope) — the in-UI one-click grant + migration satisfy the AC without blanket-granting every agent.

itomek added 2 commits June 11, 2026 14:46
… 401 actionable, connect-time secret check (#1592)

Four related root causes, all verified on-box (t-nx-strx-halo):

1. builtin:email grants orphaned by #1520 hub-rename are now migrated to
   installed:email at UI-server startup (grants.py + server.py).  Idempotent:
   existing installed:email entries are preserved.

2. Email tool ConnectorsError envelopes now go through format_connector_error()
   instead of str(exc), so the NOT_CONNECTED:/AGENT_NOT_GRANTED:/AUTH_REQUIRED:
   prefix is present and the in-chat CTA in EmailConnectCta.tsx fires.

3. _refresh_token now has an explicit 401 branch: no client_secret →
   ConfigurationError (actionable: re-enter credentials); secret present →
   AuthRequiredError REAUTH_REQUIRED (actionable: reconnect).  Previously
   both landed in a generic ConnectorsError.

4. OAuthPkceHandler.configure() validates the provider client_secret before
   starting the PKCE flow, so a bad config surfaces at connect time (AC5)
   rather than as a cryptic 401 on first email triage.

21 new unit tests (4 test files) cover every changed path; 10 vitest tests
cover the isAuthRequiredMessage CTA detector.  Auto-confer-on-connect is
intentionally not added (AC1 satisfied by CTA + migration + existing
Settings grant panel).

Closes #1592
…handlers

EmailSummarizeError (a RuntimeError) was caught alongside ConnectorsError and
routed through format_connector_error, which mapped it to
"UNEXPECTED_ERROR: EmailSummarizeError: …" instead of the plain str(exc) the
user and LLM see. Split the handlers in summarize_tools.py and read_tools.py
so each error type produces the correct output.

Also fixes docstring contradiction in grants.py (said "left as-is" but code
correctly removes the legacy key), narrows the silent except in
oauth_pkce._validate_provider_secret to (ConfigurationError, KeyError), and
strengthens the migration test to assert the stale builtin:email key is removed.
@github-actions github-actions Bot added the tests Test changes label Jun 11, 2026
@itomek itomek self-assigned this Jun 11, 2026
@itomek itomek marked this pull request as ready for review June 11, 2026 19:53
@itomek itomek requested a review from kovtcharov-amd as a code owner June 11, 2026 19:53
@github-actions

Copy link
Copy Markdown
Contributor

Review — fix(connectors): repair Email Triage Google grant + actionable token errors (#1592)

Approve with suggestions. This is a well-structured, well-tested fix that correctly attacks all four root causes behind the broken Email Triage grant flow. The changes are scope-clean, the error paths fail loudly with actionable messages, and the new test coverage is genuinely strong (migration idempotency, the 401 secret-present/absent split, the envelope-prefix contract, and an explicit "this is the bug" regression test). The one thing worth confirming before merge: the startup migration is wired into the UI server only, so CLI-only users with an orphaned builtin:email grant won't auto-migrate.

Issues

🟡 Legacy-grant migration runs on UI startup only, not CLI (src/gaia/ui/server.py:437)
migrate_legacy_agent_grants() is invoked in create_app() init, but the per-agent grant check fires for every email tool call regardless of whether the agent was launched via the UI or gaia email. A user who granted under builtin:email (pre-#1520) and runs the agent purely from the CLI will keep hitting AGENT_NOT_GRANTED until they happen to start the UI once. The migration's own docstring says "Call once at startup (e.g. from the CLI or the UI server init)" — but the CLI half isn't wired. #1592 is UI-scoped so this may be acceptable, but please confirm intent; if CLI is in scope, wiring the same call into the gaia email entry path would close the gap. Migration is idempotent, so double-invocation is safe.

🟢 The actionable 401 reauth message is discarded when surfaced through the email CTA (src/gaia/connectors/tokens.py:226)
The REAUTH_REQUIRED AuthRequiredError carries a carefully-built message ("Token endpoint returned 401 for google … Reconnect from Settings → … See docs/runbooks/google-oauth-client.md"), but format_connector_error ignores the custom message for REAUTH_REQUIRED and returns the canned NOT_CONNECTED: google is not currently connected… string (formatting.py:71-78). So in the email-tool path the user never sees the 401-specific wording or the runbook link — they see a generic "click Connect." It still surfaces verbatim on direct str(exc)/CLI paths, and your test_reauth_required_has_prefix confirms the prefix is intentional, so this isn't a bug — just flagging that the rich message is effectively dead in the CTA path. Consider whether REAUTH_REQUIRED deserves its own prefix that preserves the message.

🟢 PR description says the 401 error "names the account"; the code names only the provider (tokens.py:229-235)
_refresh_token(provider, refresh_token) has no account_email in scope, so neither the ConfigurationError nor the AuthRequiredError includes the account — they name the provider. The tests only assert on the provider name, which matches the code. Minor: tighten the PR description ("names the account, distinguishes reconnect vs. server-side client-secret") to match what's actually emitted, or thread the account through if naming it is a real AC.

🟢 Broad except Exception around the migration (src/gaia/ui/server.py:436)
The noqa: BLE001 — defence in depth swallow is reasonable here — a migration failure shouldn't take down UI startup, and it logs an actionable warning rather than substituting behavior, so it reads as boundary tolerance rather than a silent fallback. No change required; noted only because CLAUDE.md's "fail loudly" rule makes broad catches worth a second look.

Strengths

  • Root-cause tests, not just happy-path. test_str_exc_does_not_have_prefix documents the exact pre-fix bug, and the summarize split has dedicated regression tests proving EmailSummarizeError now yields a plain string while ConnectorsError keeps the AGENT_NOT_GRANTED: prefix — this is precisely the contract that broke the CTA, locked down on both the Python and the vitest side.
  • The ConnectorsError/EmailSummarizeError handler split (read_tools.py:811, summarize_tools.py:250) is the right call. The old combined except (ConnectorsError, EmailSummarizeError) would have routed summarize failures through format_connector_error and mislabeled them UNEXPECTED_ERROR: — good catch separating them.
  • Connect-time secret validation fails loudly at the right seam. _validate_provider_secret runs after credential save but before start_authorization, so a Google account that would 401 on first refresh is rejected at connect time with an actionable message — exactly the "loud error the user can fix over a quiet wrong answer" the project asks for. Restricting it to known-secret-requiring providers and passing unknowns through is a sensible conservative default.
  • Migration concurrency is correct — it reuses _write_lock + load_grants/_save_grants_locked, mutates only the nested agent_grants values (not the outer dict under iteration), and the "new key already present → drop stale legacy key, preserve existing value" branch is the safe choice.

Verdict

Approve with suggestions. No blocking issues. Please confirm the UI-only migration wiring is intentional (🟡 above) — that's the only item that could leave a class of users still broken. The two 🟢 message-fidelity notes are polish; the description tweak is trivial.

@itomek itomek enabled auto-merge June 11, 2026 20:01
@github-actions

Copy link
Copy Markdown
Contributor

🟡 src/gaia/connectors/tokens.py:215 — broad except Exception substituting a placeholder

try:
    err_payload = response.json()
except Exception:
    err_payload = {}

This violates CLAUDE.md's explicit prohibition on except Exception handlers that discard errors and return a placeholder. If response.json() raises something unexpected (e.g. AttributeError on a mock or a memory error), the bug silently collapses to {} and the err_payload.get('error', 'invalid_client') in the message will always say "invalid_client", hiding the real problem.

        try:
            err_payload = response.json()
        except (ValueError, json.JSONDecodeError):
            err_payload = {}

httpx's .json() only raises json.JSONDecodeError (a subclass of ValueError) on malformed bodies — narrowing to those two is sufficient and lets any genuine programming error surface.

@kovtcharov-amd kovtcharov-amd left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Thorough fix with two correctly-diagnosed root causes and strong regression coverage.

Strengths:

  • str(exc) -> format_connector_error(exc) is the right fix; guarantees the AGENT_NOT_GRANTED:/NOT_CONNECTED:/AUTH_REQUIRED: prefix the frontend CTA keys on, applied consistently across tool modules.
  • Good catch splitting EmailSummarizeError out of the ConnectorsError except branch — otherwise it would have been mangled into 'UNEXPECTED_ERROR: EmailSummarizeError: …'. Guarded by tests.
  • Migration is idempotent and conservative (never clobbers an existing installed:email grant), held under _write_lock, with a no-op fast path.
  • 401 handling is correctly placed between the existing 400 branch and the generic !=200 branch, and distinguishes 'secret missing -> ConfigurationError' from 'secret present but rejected -> reauth'.

Non-blocking points to consider:

  1. Migration is only wired into ui/server.py; the docstring mentions the CLI too but the CLI path isn't wired. Acceptable since #1592 is UI-scoped — consider wiring the CLI entrypoint or tightening the docstring.
  2. Wording drift: new messages say 'Settings -> Connections' while other parts (and CTA fixtures) say 'Connectors'. The fuzzy matcher accepts both, but pick one for consistency.
  3. _validate_provider_secret hardcodes the google-only early return; a comment or capability flag would age better when Microsoft needs a secret.

@itomek itomek added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit e188015 Jun 11, 2026
51 checks passed
@itomek itomek deleted the fix/1592-email-connect-grant-401 branch June 11, 2026 21:06
@itomek

itomek commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

This PR merged before these two review items were addressed, so they're tracked as follow-ups:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Agent UI — connected Google account can't be used by Email Triage agent (per-agent grant missing in UI, then token 401)

2 participants