Skip to content

Commit b14e09d

Browse files
authored
fix(email): wire EmailSendRequest.provider fallback + exact pre-scan budget (#1603 review) (#1616)
Follow-up to #1614 (merged) — applies the claude.yml review suggestions plus the **root-cause** enumeration fix that #1614 only worked around. ## Why this matters #1614 shipped connector-derived mailbox selection, but two gaps remained: 1. **`store.list_connections()` was hardcoded to `("google",)`** — so *every generic* connection consumer stayed Microsoft-blind: `api.list_connections()`, `api.get_connection("microsoft")` (returned `None` even when Microsoft was connected), the `/v1/connections` REST endpoint, the UI connectors list, and `tripwire_check()` (never swept the Microsoft connection at startup). #1614's `connected_mailbox_providers()` patched only the email agent's path. This makes the primitive registry-driven (enumerate the registered `oauth_pkce` providers via a function-local import — no layering regression), so the whole system is consistent. 2. **`EmailSendRequest.provider` was a public REST field that did nothing** — with two mailboxes connected and an unbound confirmation token, a caller supplying `provider` to disambiguate still got a 400. Now wired as the unbound-token fallback (the token's own binding still wins when present). Plus a latent-correctness fix to the pre-scan budget guard (counts actual returned messages, not the per-backend cap). ## Test plan - [x] `GAIA_MEMORY_DISABLED=1 python -m pytest tests/unit/connectors/ tests/unit/agents/email/ -x` — green; `util/lint.py --all` clean - [x] New tests: `store.list_connections()` finds microsoft-only / both; `api.list_connections()` + `get_connection("microsoft")` see Microsoft (no secret leak); `TestSendRequestProviderFallback` (unbound→routes there, bound-token wins, unbound+none+both→400); pre-scan budget counts actual returned - [x] **Real-world (Strix Halo, same persisted real Gmail + Outlook connections), before/after:** | Surface | BEFORE (`f768c2e`, merged #1614) | AFTER (`e694f00`, this PR) | |---|---|---| | `store.list_connections()` | `['google']` | `['google','microsoft']` | | `api.list_connections()` | `['google']` | `['google','microsoft']` | | `api.get_connection('microsoft')` | `None` | full metadata (no secrets) | | `GET /v1/connections` | `['google']` | `['google','microsoft']` | Microsoft was connected the whole time — only the enumeration primitive changed. Deferred from the review (own follow-up): surfacing the chosen `from:` mailbox in the `send_now` confirmation prompt — needs a base-agent confirmation hook touching all agents.
1 parent f6d81e3 commit b14e09d

7 files changed

Lines changed: 262 additions & 10 deletions

File tree

hub/agents/python/email/gaia_agent_email/agent.py

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -518,7 +518,15 @@ def _pre_scan_all_backends(self, *, max_messages: int) -> dict:
518518
force_llm=force_llm,
519519
debug=debug_flag,
520520
)
521-
scanned += per_backend
521+
# Count messages actually returned, not the cap — an under-filled
522+
# backend would otherwise trip the budget guard and skip a later one.
523+
backend_totals = out.get("totals", {})
524+
scanned += (
525+
int(backend_totals.get("urgent", 0))
526+
+ int(backend_totals.get("actionable", 0))
527+
+ int(backend_totals.get("suggested_archives", 0))
528+
+ int(out.get("informational_count", 0))
529+
)
522530
merged_prefs_applied = out.get("preferences_applied", merged_prefs_applied)
523531
for item in out.get("urgent", []):
524532
item["mailbox"] = provider

hub/agents/python/email/gaia_agent_email/api_routes.py

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -680,9 +680,11 @@ class EmailSendRequest(_Strict):
680680
provider: Optional[str] = Field(
681681
default=None,
682682
description=(
683-
"Optional provider override ('google' or 'microsoft'). Ignored "
684-
"when the confirmation token carries a provider binding — the "
685-
"token's bound provider always wins."
683+
"Optional provider ('google' or 'microsoft'), used ONLY as the "
684+
"fallback when the confirmation token carries no provider binding. "
685+
"A token's bound provider always wins; with two mailboxes connected "
686+
"and neither a binding nor this field set, the send is rejected as "
687+
"ambiguous (400)."
686688
),
687689
)
688690

@@ -767,9 +769,10 @@ async def send_email(request: EmailSendRequest) -> EmailSendResponse:
767769
# send takes a single 'to' header string. Run off the event loop so
768770
# get_access_token_sync (used inside the token resolvers) does not hit
769771
# the "called from a thread with a running event loop" guard (#1594).
770-
# The token's bound provider (D5) overrides the count-based D2 logic;
771-
# if no binding, fall back to the single-mailbox resolver.
772-
backend = _resolve_backend_for_provider(bound_provider)
772+
# Provider precedence: the token's bound provider (D5) always wins; only an
773+
# unbound token falls back to request.provider; with neither, the
774+
# count-based resolver decides (and 400s when 2+ are connected).
775+
backend = _resolve_backend_for_provider(bound_provider or request.provider)
773776
to_header = ", ".join(_format_address(a) for a in request.to)
774777
result = await asyncio.to_thread(
775778
backend.send_message, to=to_header, subject=request.subject, body=request.body

src/gaia/connectors/store.py

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -367,10 +367,21 @@ def list_connections() -> List[str]:
367367
Best-effort enumeration of stored providers.
368368
369369
The ``keyring`` API does not expose a portable "list all entries for
370-
service" call. v1 returns the providers we know about (currently
371-
just ``google``); future providers extend this.
370+
service" call, so we probe each registered OAuth provider for a stored
371+
connection blob. The provider set is registry-driven (every ``oauth_pkce``
372+
spec), so a newly added provider is enumerated without editing this
373+
function — and generic consumers (``api.list_connections`` /
374+
``get_connection`` / the REST endpoint / ``tripwire_check``) see every
375+
connected mailbox, not just Google.
372376
"""
373-
known = ("google",)
377+
# Function-local import: keep this module free of a module-level registry
378+
# dependency (mirrors ``api._require_mcp_server_for_activation``). Importing
379+
# the catalog registers the built-in specs; the module cache makes repeats a
380+
# no-op.
381+
import gaia.connectors.catalog # noqa: F401 # pylint: disable=unused-import
382+
from gaia.connectors.registry import REGISTRY
383+
384+
known = tuple(spec.id for spec in REGISTRY.all() if spec.type == "oauth_pkce")
374385
found: list[str] = []
375386
for provider in known:
376387
username = _connection_username(provider, DEFAULT_ACCOUNT)

tests/unit/agents/email/test_draft_token_provider_binding.py

Lines changed: 122 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -255,3 +255,125 @@ def test_confirmation_store_none_provider_on_unbound_token(self):
255255
ok, bound_provider = store.consume_with_provider(token, fp)
256256
assert ok is True
257257
assert bound_provider is None
258+
259+
260+
class TestSendRequestProviderFallback:
261+
"""``EmailSendRequest.provider`` is the unbound-token fallback.
262+
263+
Precedence: the token's bound provider always wins; ``request.provider`` is
264+
consulted ONLY when the token carries no binding. Both None + multiple
265+
connected → still 400 ambiguous.
266+
"""
267+
268+
def _draft_token(self, client, *, provider=None):
269+
body = {
270+
"to": [{"email": "bob@example.com"}],
271+
"subject": "Hello",
272+
"body": "World",
273+
}
274+
if provider is not None:
275+
body["provider"] = provider
276+
resp = client.post("/v1/email/draft", json=body)
277+
assert resp.status_code == 200, resp.text
278+
return resp.json()["confirmation_token"]
279+
280+
def _spy_resolver(self, monkeypatch, calls):
281+
"""Patch _resolve_backend_for_provider to record the provider it sees."""
282+
283+
class _Fake:
284+
def __init__(self, name):
285+
self.name = name
286+
287+
def send_message(self, *, to, subject, body, **_kw):
288+
calls.append(self.name)
289+
if self.name == "microsoft":
290+
return {"id": "", "sent": True, "to": to, "subject": subject}
291+
return {"id": f"{self.name}-sent", "to": to, "subject": subject}
292+
293+
def _resolve(provider=None):
294+
# Mirror production: None + 2 connected → 400 ambiguous.
295+
if provider is None:
296+
from fastapi import HTTPException
297+
298+
connected = email_routes.connected_mailbox_providers()
299+
if len(connected) != 1:
300+
raise HTTPException(400, f"ambiguous: {connected}")
301+
provider = connected[0]
302+
return _Fake(provider)
303+
304+
monkeypatch.setattr(email_routes, "_resolve_backend_for_provider", _resolve)
305+
return calls
306+
307+
def test_unbound_token_uses_request_provider(self, monkeypatch):
308+
"""Unbound token + request.provider='microsoft' + both connected → Outlook."""
309+
if app is None:
310+
pytest.skip("gaia.api.openai_server not available")
311+
monkeypatch.setattr(
312+
email_routes,
313+
"connected_mailbox_providers",
314+
lambda: ["google", "microsoft"],
315+
)
316+
calls = self._spy_resolver(monkeypatch, [])
317+
client = TestClient(app)
318+
token = self._draft_token(client) # no binding
319+
resp = client.post(
320+
"/v1/email/send",
321+
json={
322+
"to": [{"email": "bob@example.com"}],
323+
"subject": "Hello",
324+
"body": "World",
325+
"confirmation_token": token,
326+
"provider": "microsoft",
327+
},
328+
)
329+
assert resp.status_code == 200, resp.text
330+
assert calls == ["microsoft"], f"request.provider ignored: {calls}"
331+
332+
def test_token_binding_wins_over_request_provider(self, monkeypatch):
333+
"""Token bound to google + request.provider='microsoft' → google wins."""
334+
if app is None:
335+
pytest.skip("gaia.api.openai_server not available")
336+
monkeypatch.setattr(
337+
email_routes,
338+
"connected_mailbox_providers",
339+
lambda: ["google", "microsoft"],
340+
)
341+
calls = self._spy_resolver(monkeypatch, [])
342+
client = TestClient(app)
343+
token = self._draft_token(client, provider="google") # bound to google
344+
resp = client.post(
345+
"/v1/email/send",
346+
json={
347+
"to": [{"email": "bob@example.com"}],
348+
"subject": "Hello",
349+
"body": "World",
350+
"confirmation_token": token,
351+
"provider": "microsoft", # must be ignored
352+
},
353+
)
354+
assert resp.status_code == 200, resp.text
355+
assert calls == ["google"], f"token binding did not win: {calls}"
356+
357+
def test_unbound_token_no_provider_both_connected_still_400(self, monkeypatch):
358+
"""Unbound token + no request.provider + both connected → 400 ambiguous."""
359+
if app is None:
360+
pytest.skip("gaia.api.openai_server not available")
361+
monkeypatch.setattr(
362+
email_routes,
363+
"connected_mailbox_providers",
364+
lambda: ["google", "microsoft"],
365+
)
366+
calls = self._spy_resolver(monkeypatch, [])
367+
client = TestClient(app)
368+
token = self._draft_token(client) # no binding, no provider on send
369+
resp = client.post(
370+
"/v1/email/send",
371+
json={
372+
"to": [{"email": "bob@example.com"}],
373+
"subject": "Hello",
374+
"body": "World",
375+
"confirmation_token": token,
376+
},
377+
)
378+
assert resp.status_code == 400, resp.text
379+
assert calls == [], f"a send leaked through the ambiguity guard: {calls}"

tests/unit/agents/email/test_multi_inbox_triage.py

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -241,6 +241,33 @@ def test_pre_scan_tags_every_section_item(self, tmp_path, monkeypatch):
241241
finally:
242242
agent.close_db()
243243

244+
def test_pre_scan_under_budget_backend_not_skipped_when_earlier_backend_underfills(
245+
self, tmp_path, monkeypatch
246+
):
247+
# The budget guard must count messages ACTUALLY returned, not the
248+
# per-backend cap. Fails-first scenario (max_messages < n_backends):
249+
# - google inbox EMPTY, microsoft has 1 message
250+
# - pre_scan(max_messages=1) → per_backend = max(1, 1//2) = 1
251+
# - OLD `scanned += per_backend`: empty google bumps scanned to 1 →
252+
# guard `scanned >= 1` trips → microsoft is SKIPPED (the bug)
253+
# - NEW `scanned += actual`: empty google returns 0 → scanned stays 0
254+
# → microsoft is scanned and contributes.
255+
# google is first in registry order, so it is the under-filling backend.
256+
agent, _, _ = _agent_two_backends(
257+
tmp_path, monkeypatch, google_ids=[], microsoft_ids=["m1"]
258+
)
259+
try:
260+
envelope = json.loads(_registered_tool("pre_scan_inbox")(1))
261+
data = envelope["data"]
262+
items = data["urgent"] + data["actionable"] + data["suggested_archives"]
263+
mailboxes = {it["mailbox"] for it in items}
264+
# microsoft was reached despite the empty google scanned first.
265+
assert "microsoft" in mailboxes, mailboxes
266+
# And its message was tagged for downstream routing.
267+
assert agent._message_mailbox.get("m1") == "microsoft"
268+
finally:
269+
agent.close_db()
270+
244271

245272
# ---------------------------------------------------------------------------
246273
# D4 — per-message backend routing

tests/unit/connectors/test_api.py

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@
2626
from gaia.connectors import (
2727
AuthRequiredError,
2828
get_access_token,
29+
get_connection,
2930
grant_agent,
3031
list_agent_grants,
3132
list_connections,
@@ -174,3 +175,48 @@ def test_list_connections_via_public_api(self, seeded):
174175
def test_revoke_connection_via_public_api(self, seeded):
175176
revoke_connection("google")
176177
assert list_connections() == []
178+
179+
def test_microsoft_connection_visible_to_generic_api(self, monkeypatch, tmp_path):
180+
# Root-cause fix (#1603): a stored Microsoft connection with no google
181+
# must be seen by the GENERIC api surface — list_connections() includes
182+
# it and get_connection("microsoft") returns metadata (not None). The
183+
# old store.list_connections() hardcoded ("google",), so every generic
184+
# consumer was Microsoft-blind.
185+
monkeypatch.setenv("GAIA_MICROSOFT_CLIENT_ID", "test-ms-client")
186+
monkeypatch.setattr("gaia.connectors.grants.Path.home", lambda: tmp_path)
187+
_registry.clear()
188+
from gaia.connectors.providers import get as get_provider
189+
190+
ms = get_provider("microsoft")
191+
save_connection(
192+
provider="microsoft",
193+
account_email="user@outlook.com",
194+
refresh_token="ms-rt",
195+
scopes=["Mail.ReadWrite"],
196+
client_id_hash=ms.client_id_hash,
197+
)
198+
199+
rows = list_connections()
200+
providers = {row["provider"] for row in rows}
201+
assert "microsoft" in providers
202+
203+
conn = get_connection("microsoft")
204+
assert conn is not None, "get_connection('microsoft') must not be None"
205+
assert conn["account_email"] == "user@outlook.com"
206+
assert "refresh_token" not in conn
207+
208+
def test_both_providers_visible_to_generic_api(self, monkeypatch, tmp_path, seeded):
209+
# google seeded by the fixture; add microsoft and assert both surface.
210+
monkeypatch.setenv("GAIA_MICROSOFT_CLIENT_ID", "test-ms-client")
211+
from gaia.connectors.providers import get as get_provider
212+
213+
ms = get_provider("microsoft")
214+
save_connection(
215+
provider="microsoft",
216+
account_email="user@outlook.com",
217+
refresh_token="ms-rt",
218+
scopes=["Mail.ReadWrite"],
219+
client_id_hash=ms.client_id_hash,
220+
)
221+
providers = {row["provider"] for row in list_connections()}
222+
assert {"google", "microsoft"} <= providers

tests/unit/connectors/test_store.py

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,41 @@ def test_list_connections(self):
9090
ids = list_connections()
9191
assert "google" in ids
9292

93+
def test_list_connections_finds_microsoft_only(self):
94+
# Registry-driven enumeration (#1603): a stored Microsoft connection
95+
# with NO google must surface — the old hardcoded ("google",) tuple
96+
# made every generic consumer Microsoft-blind.
97+
save_connection(
98+
provider="microsoft",
99+
account_email="m@example.com",
100+
refresh_token=SENTINEL_REFRESH_TOKEN,
101+
scopes=["s1"],
102+
client_id_hash="h",
103+
)
104+
ids = list_connections()
105+
assert ids == ["microsoft"]
106+
107+
def test_list_connections_finds_both_providers(self):
108+
save_connection(
109+
provider="google",
110+
account_email="g@example.com",
111+
refresh_token=SENTINEL_REFRESH_TOKEN,
112+
scopes=["s1"],
113+
client_id_hash="h",
114+
)
115+
save_connection(
116+
provider="microsoft",
117+
account_email="m@example.com",
118+
refresh_token=SENTINEL_REFRESH_TOKEN,
119+
scopes=["s1"],
120+
client_id_hash="h",
121+
)
122+
ids = list_connections()
123+
assert set(ids) == {"google", "microsoft"}
124+
# Only oauth_pkce providers are enumerated — MCP-server connectors have
125+
# no keyring connection and must not appear.
126+
assert "mcp-git" not in ids
127+
93128

94129
class TestSingleBlobAtomicity:
95130
def test_one_keyring_slot_per_connection(self):

0 commit comments

Comments
 (0)