Closed
16 changes: 16 additions & 0 deletions .github/workflows/live-canary.yml
@@ -178,6 +178,22 @@ jobs:
AUTH_LIVE_NOTION_ACCESS_TOKEN: ${{ secrets.AUTH_LIVE_NOTION_ACCESS_TOKEN }}
AUTH_LIVE_NOTION_REFRESH_TOKEN: ${{ secrets.AUTH_LIVE_NOTION_REFRESH_TOKEN }}
AUTH_LIVE_NOTION_QUERY: ${{ secrets.AUTH_LIVE_NOTION_QUERY }}
AUTH_LIVE_LINEAR_ACCESS_TOKEN: ${{ secrets.AUTH_LIVE_LINEAR_ACCESS_TOKEN }}
AUTH_LIVE_LINEAR_REFRESH_TOKEN: ${{ secrets.AUTH_LIVE_LINEAR_REFRESH_TOKEN }}
AUTH_LIVE_LINEAR_QUERY: ${{ secrets.AUTH_LIVE_LINEAR_QUERY }}
AUTH_LIVE_LINEAR_TOOL_NAME: ${{ vars.AUTH_LIVE_LINEAR_TOOL_NAME || 'linear_search_issues' }}
AUTH_LIVE_LINEAR_TOOL_ARGS_JSON: ${{ secrets.AUTH_LIVE_LINEAR_TOOL_ARGS_JSON }}
AUTH_LIVE_BRAVE_API_KEY: ${{ secrets.AUTH_LIVE_BRAVE_API_KEY }}
AUTH_LIVE_SLACK_BOT_TOKEN: ${{ secrets.AUTH_LIVE_SLACK_BOT_TOKEN }}
AUTH_LIVE_COMPOSIO_API_KEY: ${{ secrets.AUTH_LIVE_COMPOSIO_API_KEY }}
AUTH_LIVE_TELEGRAM_API_ID: ${{ secrets.AUTH_LIVE_TELEGRAM_API_ID }}
AUTH_LIVE_TELEGRAM_API_HASH: ${{ secrets.AUTH_LIVE_TELEGRAM_API_HASH }}
AUTH_LIVE_TELEGRAM_SESSION_JSON: ${{ secrets.AUTH_LIVE_TELEGRAM_SESSION_JSON }}
AUTH_LIVE_GOOGLE_DRIVE_QUERY: ${{ vars.AUTH_LIVE_GOOGLE_DRIVE_QUERY || 'trashed = false' }}
AUTH_LIVE_GOOGLE_DOC_ID: ${{ secrets.AUTH_LIVE_GOOGLE_DOC_ID }}
AUTH_LIVE_GOOGLE_SHEET_ID: ${{ secrets.AUTH_LIVE_GOOGLE_SHEET_ID }}
AUTH_LIVE_GOOGLE_SHEET_RANGE: ${{ vars.AUTH_LIVE_GOOGLE_SHEET_RANGE || 'A1:Z10' }}
AUTH_LIVE_GOOGLE_SLIDES_ID: ${{ secrets.AUTH_LIVE_GOOGLE_SLIDES_ID }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
8 changes: 8 additions & 0 deletions scripts/auth_live_canary/README.md
@@ -37,6 +37,13 @@ not model behavior.
- `notion`
Uses `mcp_notion_access_token`
Runs through Responses API
- `linear`
Uses `mcp_linear_access_token`
Runs through Responses API
- `ops_workflow`
Installs Gmail, Calendar, Drive, Docs, Sheets, Slides, GitHub, Web Search,
LLM Context, Slack, Telegram, Composio, Notion, and Linear. It dispatches one
deterministic multi-tool ops brief probe through `/v1/responses`.

## Setup

@@ -76,6 +83,7 @@ Run only selected providers:

```bash
python3 scripts/auth_live_canary/run_live_canary.py --case gmail --case github
python3 scripts/auth_live_canary/run_live_canary.py --case ops_workflow
```

CI-style fresh-machine install:
30 changes: 28 additions & 2 deletions scripts/auth_live_canary/config.example.env
@@ -1,11 +1,16 @@
# Google / Gmail / Calendar
# Google / Gmail / Calendar / Drive / Docs / Sheets / Slides
GOOGLE_OAUTH_CLIENT_ID=
GOOGLE_OAUTH_CLIENT_SECRET=
AUTH_LIVE_GOOGLE_ACCESS_TOKEN=
AUTH_LIVE_GOOGLE_REFRESH_TOKEN=
AUTH_LIVE_GOOGLE_SCOPES=gmail.modify gmail.compose calendar.events
AUTH_LIVE_GOOGLE_SCOPES="https://www.googleapis.com/auth/gmail.modify https://www.googleapis.com/auth/gmail.compose https://www.googleapis.com/auth/calendar.events https://www.googleapis.com/auth/drive https://www.googleapis.com/auth/documents https://www.googleapis.com/auth/spreadsheets https://www.googleapis.com/auth/presentations"
# Set to 0 to skip forced refresh on first probe.
AUTH_LIVE_FORCE_GOOGLE_REFRESH=1
AUTH_LIVE_GOOGLE_DRIVE_QUERY="trashed = false"
AUTH_LIVE_GOOGLE_DOC_ID=
AUTH_LIVE_GOOGLE_SHEET_ID=
AUTH_LIVE_GOOGLE_SHEET_RANGE=A1:Z10
AUTH_LIVE_GOOGLE_SLIDES_ID=

# GitHub
AUTH_LIVE_GITHUB_TOKEN=
@@ -17,3 +22,24 @@ AUTH_LIVE_GITHUB_ISSUE_NUMBER=
AUTH_LIVE_NOTION_ACCESS_TOKEN=
AUTH_LIVE_NOTION_REFRESH_TOKEN=
AUTH_LIVE_NOTION_QUERY=canary

# Linear MCP
AUTH_LIVE_LINEAR_ACCESS_TOKEN=
AUTH_LIVE_LINEAR_REFRESH_TOKEN=
AUTH_LIVE_LINEAR_QUERY=canary
AUTH_LIVE_LINEAR_TOOL_NAME=linear_search_issues
AUTH_LIVE_LINEAR_TOOL_ARGS_JSON=

# Brave-backed tools
AUTH_LIVE_BRAVE_API_KEY=

# Slack
AUTH_LIVE_SLACK_BOT_TOKEN=

# Composio
AUTH_LIVE_COMPOSIO_API_KEY=

# Telegram user-mode tool
AUTH_LIVE_TELEGRAM_API_ID=
AUTH_LIVE_TELEGRAM_API_HASH=
AUTH_LIVE_TELEGRAM_SESSION_JSON=
126 changes: 107 additions & 19 deletions scripts/auth_live_canary/run_live_canary.py
@@ -29,7 +29,12 @@
sys.path.insert(0, str(ROOT))

from scripts.live_canary.auth_registry import SeededProviderCase, configured_seeded_cases
from scripts.live_canary.auth_runtime import activate_extension, install_extension, put_secret
from scripts.live_canary.auth_runtime import (
activate_extension,
install_extension,
put_secret,
write_memory,
)
from scripts.live_canary.common import (
DEFAULT_SECRETS_MASTER_KEY,
DEFAULT_VENV,
@@ -49,7 +54,17 @@

DEFAULT_OUTPUT_DIR = ROOT / "artifacts" / "auth-live-canary"
OWNER_USER_ID = "auth-live-owner"
GOOGLE_SCOPE_DEFAULT = "gmail.modify gmail.compose calendar.events"
GOOGLE_SCOPE_DEFAULT = " ".join(
[
"https://www.googleapis.com/auth/gmail.modify",
"https://www.googleapis.com/auth/gmail.compose",
"https://www.googleapis.com/auth/calendar.events",
"https://www.googleapis.com/auth/drive",
"https://www.googleapis.com/auth/documents",
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/presentations",
]
)


def expire_secret_in_db(db_path: Path, user_id: str, secret_name: str) -> None:
@@ -95,6 +110,7 @@ async def create_response_probe(
response_id = body.get("id")
output = body.get("output", [])
tool_names = [item.get("name") for item in output if item.get("type") == "function_call"]
expected_tool_names = probe.required_tool_names
tool_outputs = [
item.get("output", "")
for item in output
@@ -120,14 +136,14 @@ async def create_response_probe(

success = (
body.get("status") == "completed"
and probe.expected_tool_name in tool_names
and all(tool_name in tool_names for tool_name in expected_tool_names)
and bool(tool_outputs)
and not any(
marker in output_text.lower()
for output_text in tool_outputs
for marker in ("error", "authentication required", "unauthorized", "forbidden")
)
and probe.expected_text in response_text
and (not probe.expected_text or probe.expected_text in response_text)
and fetched_status == 200
)
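The success check now requires every name in `probe.required_tool_names` to appear among the called tools, and treats an empty `expected_text` as "no text check". A minimal sketch of that predicate as a pure function, assuming a hypothetical `ProbeSpec` dataclass in place of the canary's real probe type (only the two fields the check reads are modeled):

```python
from __future__ import annotations

from dataclasses import dataclass

# Same markers the canary scans tool outputs for.
ERROR_MARKERS = ("error", "authentication required", "unauthorized", "forbidden")


@dataclass
class ProbeSpec:
    # Hypothetical stand-in for the real probe object.
    required_tool_names: list[str]
    expected_text: str = ""


def probe_succeeded(
    probe: ProbeSpec,
    status: str | None,
    tool_names: list[str],
    tool_outputs: list[str],
    response_text: str,
    fetched_status: int,
) -> bool:
    """Mirror of the success expression in create_response_probe."""
    return (
        status == "completed"
        # Every required tool must have been called at least once.
        and all(name in tool_names for name in probe.required_tool_names)
        and bool(tool_outputs)
        # No tool output may contain an auth/error marker (case-insensitive).
        and not any(
            marker in output.lower()
            for output in tool_outputs
            for marker in ERROR_MARKERS
        )
        # An empty expected_text disables the response-text check.
        and (not probe.expected_text or probe.expected_text in response_text)
        and fetched_status == 200
    )
```

With this shape, a multi-tool probe fails as soon as one required tool is never called, even when every other condition holds.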

@@ -140,6 +156,7 @@ async def create_response_probe(
"response_id": response_id,
"status": body.get("status"),
"tool_names": tool_names,
"expected_tool_names": expected_tool_names,
"tool_outputs": tool_outputs,
"response_text": response_text,
"get_status_code": fetched_status,
@@ -291,6 +308,64 @@ async def seed_live_credentials(base_url: str, token: str, db_path: Path) -> Non
provider="mcp:notion",
)

linear_access = env_str("AUTH_LIVE_LINEAR_ACCESS_TOKEN")
linear_refresh = env_str("AUTH_LIVE_LINEAR_REFRESH_TOKEN")
if linear_refresh and not linear_access:
raise CanaryError(
"AUTH_LIVE_LINEAR_ACCESS_TOKEN is required when AUTH_LIVE_LINEAR_REFRESH_TOKEN is set"
)
if linear_access:
await put_secret(
base_url,
token,
user_id=OWNER_USER_ID,
name="mcp_linear_access_token",
value=linear_access,
provider="mcp:linear",
)
if linear_refresh:
await put_secret(
base_url,
token,
user_id=OWNER_USER_ID,
name="mcp_linear_access_token_refresh_token",
value=linear_refresh,
provider="mcp:linear",
)

for env_name, secret_name, provider in (
("AUTH_LIVE_BRAVE_API_KEY", "brave_api_key", "brave"),
("AUTH_LIVE_SLACK_BOT_TOKEN", "slack_bot_token", "slack"),
("AUTH_LIVE_COMPOSIO_API_KEY", "composio_api_key", "composio"),
("AUTH_LIVE_TELEGRAM_API_ID", "telegram_api_id", "telegram"),
("AUTH_LIVE_TELEGRAM_API_HASH", "telegram_api_hash", "telegram"),
):
value = env_str(env_name)
if value:
await put_secret(
base_url,
token,
user_id=OWNER_USER_ID,
name=secret_name,
value=value,
provider=provider,
)

telegram_api_id = env_str("AUTH_LIVE_TELEGRAM_API_ID")
telegram_api_hash = env_str("AUTH_LIVE_TELEGRAM_API_HASH")
telegram_session = env_str("AUTH_LIVE_TELEGRAM_SESSION_JSON")
if telegram_api_id:
await write_memory(base_url, token, path="telegram/api_id", content=telegram_api_id)
if telegram_api_hash:
await write_memory(base_url, token, path="telegram/api_hash", content=telegram_api_hash)
if telegram_session:
await write_memory(
base_url,
token,
path="telegram/session.json",
content=telegram_session,
)


def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description=__doc__)
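The new seeding code follows one rule per provider: a Linear refresh token without an access token is a hard error, and every other variable is seeded only when it is non-empty. A sketch of that decision logic as a pure function, assuming `env_str` returns an empty string for unset variables; `plan_secrets` and the local `CanaryError` class are hypothetical stand-ins, and nothing is actually written to the gateway:

```python
from __future__ import annotations

from collections.abc import Mapping

# (env var, secret name, provider) triples copied from seed_live_credentials.
SIMPLE_SECRETS = (
    ("AUTH_LIVE_BRAVE_API_KEY", "brave_api_key", "brave"),
    ("AUTH_LIVE_SLACK_BOT_TOKEN", "slack_bot_token", "slack"),
    ("AUTH_LIVE_COMPOSIO_API_KEY", "composio_api_key", "composio"),
    ("AUTH_LIVE_TELEGRAM_API_ID", "telegram_api_id", "telegram"),
    ("AUTH_LIVE_TELEGRAM_API_HASH", "telegram_api_hash", "telegram"),
)


class CanaryError(RuntimeError):
    """Local stand-in for the script's CanaryError."""


def plan_secrets(env: Mapping[str, str]) -> list[tuple[str, str]]:
    """Return (secret_name, provider) pairs the seeding step would write."""
    plan: list[tuple[str, str]] = []
    access = env.get("AUTH_LIVE_LINEAR_ACCESS_TOKEN", "")
    refresh = env.get("AUTH_LIVE_LINEAR_REFRESH_TOKEN", "")
    # A refresh token alone is rejected: it is stored alongside the access token.
    if refresh and not access:
        raise CanaryError(
            "AUTH_LIVE_LINEAR_ACCESS_TOKEN is required when "
            "AUTH_LIVE_LINEAR_REFRESH_TOKEN is set"
        )
    if access:
        plan.append(("mcp_linear_access_token", "mcp:linear"))
    if refresh:
        plan.append(("mcp_linear_access_token_refresh_token", "mcp:linear"))
    # Unset or empty variables are skipped rather than failing the run.
    for env_name, secret_name, provider in SIMPLE_SECRETS:
        if env.get(env_name):
            plan.append((secret_name, provider))
    return plan
```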
@@ -325,7 +400,14 @@ def parse_args() -> argparse.Namespace:
parser.add_argument(
"--case",
action="append",
choices=("gmail", "google_calendar", "github", "notion"),
choices=(
"gmail",
"google_calendar",
"github",
"notion",
"linear",
"ops_workflow",
),
help="Limit the run to specific providers. Repeat for multiple values.",
)
parser.add_argument(
@@ -374,21 +456,27 @@ async def async_main(args: argparse.Namespace) -> int:
try:
await seed_live_credentials(stack.base_url, stack.gateway_token, stack.db_path)

installed: dict[str, dict[str, Any]] = {}
for probe in probes:
ext = await install_extension(
stack.base_url,
stack.gateway_token,
name=probe.extension_install_name,
expected_display_name=probe.expected_display_name,
install_kind=probe.install_kind,
install_url=probe.install_url,
)
await activate_extension(
stack.base_url,
stack.gateway_token,
extension_name=ext["name"],
expected_display_name=ext.get("display_name") or probe.expected_display_name,
)
for installation in probe.installations:
if installation.name in installed:
continue
ext = await install_extension(
stack.base_url,
stack.gateway_token,
name=installation.name,
expected_display_name=installation.expected_display_name,
install_kind=installation.install_kind,
install_url=installation.install_url,
)
await activate_extension(
stack.base_url,
stack.gateway_token,
extension_name=ext["name"],
expected_display_name=ext.get("display_name")
or installation.expected_display_name,
)
installed[installation.name] = ext

results: list[ProbeResult] = []
for probe in probes:
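Installs are now keyed by installation name, so an extension shared between probes is installed and activated once per run (for example Gmail, if the `gmail` case and `ops_workflow` use the same installation name). The dedup rule in isolation, with hypothetical minimal `Installation` and `Probe` dataclasses standing in for the real probe types:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Installation:
    # Hypothetical minimal shape; the real object also carries kind, URL,
    # and expected display name.
    name: str


@dataclass
class Probe:
    installations: list[Installation]


def install_order(probes: list[Probe]) -> list[str]:
    """Names in the order they would be installed, each at most once."""
    installed: dict[str, Installation] = {}
    for probe in probes:
        for installation in probe.installations:
            # Same guard as the canary loop: skip anything already installed.
            if installation.name in installed:
                continue
            installed[installation.name] = installation
    # Dicts preserve insertion order, so this is the install order.
    return list(installed)
```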
90 changes: 90 additions & 0 deletions scripts/live-canary/ACCOUNTS.md
@@ -70,8 +70,17 @@ Every provider should have one stable, low-risk probe target.

- Gmail: one inbox with at least one readable message or draft
- Google Calendar: one calendar with at least one upcoming event
- Google Drive: one stable fixture query or file set that the account can read
- Google Docs: one readable fixture document
- Google Sheets: one readable fixture spreadsheet/range
- Google Slides: one readable fixture presentation
- GitHub: one dedicated repository with one stable issue
- Brave Search: one low-volume API key shared by Web Search and LLM Context
- Slack: one workspace with a bot token that can list channels
- Telegram: one logged-in user-mode MTProto session
- Composio: one API key with at least one readable connected-account state
- Notion: one test workspace with one searchable page or database row
- Linear: one workspace with one searchable issue

## Seeded Lane Secrets

@@ -100,6 +109,21 @@ Recommended scopes:
- `https://www.googleapis.com/auth/gmail.modify`
- `https://www.googleapis.com/auth/gmail.compose`
- `https://www.googleapis.com/auth/calendar.events`
- `https://www.googleapis.com/auth/drive`
- `https://www.googleapis.com/auth/documents`
- `https://www.googleapis.com/auth/spreadsheets`
- `https://www.googleapis.com/auth/presentations`
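These scopes are written in full-URL form, replacing the short names (`gmail.modify gmail.compose calendar.events`) used previously. If an older env file still carries the short names, a hypothetical helper like the one below could expand them. It is not part of the canary scripts, and it assumes every short name sits under the `https://www.googleapis.com/auth/` prefix, which holds for the scopes listed here:

```python
GOOGLE_SCOPE_PREFIX = "https://www.googleapis.com/auth/"


def normalize_scopes(raw: str) -> str:
    """Expand bare names such as gmail.modify into full Google scope URLs."""
    scopes = []
    for scope in raw.split():
        if scope.startswith("https://"):
            # Already in full-URL form: keep as-is, so the helper is idempotent.
            scopes.append(scope)
        else:
            scopes.append(GOOGLE_SCOPE_PREFIX + scope)
    return " ".join(scopes)
```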

Required only for the combined `ops_workflow` case:

- `AUTH_LIVE_GOOGLE_DOC_ID`
- `AUTH_LIVE_GOOGLE_SHEET_ID`
- `AUTH_LIVE_GOOGLE_SLIDES_ID`

Optional:

- `AUTH_LIVE_GOOGLE_DRIVE_QUERY` (defaults to `trashed = false`)
- `AUTH_LIVE_GOOGLE_SHEET_RANGE` (defaults to `A1:Z10`)
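The two optional values fall back when unset or empty, matching the workflow's `${{ vars.NAME || 'default' }}` expressions, where an empty repository variable also takes the default. A sketch of that resolution, using a hypothetical `resolve_optional` helper:

```python
from __future__ import annotations

from collections.abc import Mapping

GOOGLE_FIXTURE_DEFAULTS = {
    "AUTH_LIVE_GOOGLE_DRIVE_QUERY": "trashed = false",
    "AUTH_LIVE_GOOGLE_SHEET_RANGE": "A1:Z10",
}


def resolve_optional(
    env: Mapping[str, str],
    defaults: Mapping[str, str] = GOOGLE_FIXTURE_DEFAULTS,
) -> dict[str, str]:
    """Return each optional setting, using the default when unset or empty."""
    # `or` treats "" like a missing key, mirroring `vars.NAME || 'default'`.
    return {name: env.get(name) or default for name, default in defaults.items()}
```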

### GitHub

@@ -125,6 +149,72 @@ Optional:

The probe should match a stable test page or database entry.

### Linear

Required:

- `AUTH_LIVE_LINEAR_ACCESS_TOKEN`
- `AUTH_LIVE_LINEAR_QUERY`

Optional:

- `AUTH_LIVE_LINEAR_REFRESH_TOKEN`
- `AUTH_LIVE_LINEAR_TOOL_NAME`
- `AUTH_LIVE_LINEAR_TOOL_ARGS_JSON`

Use `AUTH_LIVE_LINEAR_TOOL_NAME` and `AUTH_LIVE_LINEAR_TOOL_ARGS_JSON` if the
Linear MCP server's tool name or argument schema changes. The default tool name
is `linear_search_issues`, with arguments `{"query": "<AUTH_LIVE_LINEAR_QUERY>"}`.
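A sketch of how the override variables could resolve into a tool call. It assumes that `AUTH_LIVE_LINEAR_TOOL_ARGS_JSON`, when set, replaces the default arguments entirely rather than being merged with them; `linear_tool_call` is a hypothetical helper, not the runner's code:

```python
from __future__ import annotations

import json
from collections.abc import Mapping
from typing import Any

DEFAULT_LINEAR_TOOL = "linear_search_issues"


def linear_tool_call(env: Mapping[str, str]) -> tuple[str, dict[str, Any]]:
    """Resolve the Linear MCP tool name and arguments from canary env vars."""
    name = env.get("AUTH_LIVE_LINEAR_TOOL_NAME") or DEFAULT_LINEAR_TOOL
    raw_args = env.get("AUTH_LIVE_LINEAR_TOOL_ARGS_JSON")
    if raw_args:
        # Assumption: an explicit override replaces the default arguments.
        args = json.loads(raw_args)
        if not isinstance(args, dict):
            raise ValueError("AUTH_LIVE_LINEAR_TOOL_ARGS_JSON must be a JSON object")
    else:
        # Documented default: {"query": "<AUTH_LIVE_LINEAR_QUERY>"}.
        args = {"query": env.get("AUTH_LIVE_LINEAR_QUERY", "")}
    return name, args
```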

### Brave Search

Required for Web Search and LLM Context probes:

- `AUTH_LIVE_BRAVE_API_KEY`

### Slack

Required:

- `AUTH_LIVE_SLACK_BOT_TOKEN`

The combined workflow uses `list_channels` to avoid posting on every scheduled
run.

### Telegram

Required:

- `AUTH_LIVE_TELEGRAM_API_ID`
- `AUTH_LIVE_TELEGRAM_API_HASH`
- `AUTH_LIVE_TELEGRAM_SESSION_JSON`

The seeded runner writes these to `telegram/api_id`, `telegram/api_hash`, and
`telegram/session.json` in the fresh workspace before activating the tool. The
combined workflow uses `get_me` to avoid sending messages on every scheduled
run.
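The mapping from environment variables to workspace memory paths can be sketched with a hypothetical `telegram_memory_writes` helper. The `json.loads` call is an extra fail-fast check that this section does not describe; it assumes the session value is a JSON document, as the `session.json` name suggests:

```python
from __future__ import annotations

import json
from collections.abc import Mapping

# Env var -> workspace memory path, as listed in this section.
TELEGRAM_MEMORY_PATHS = {
    "AUTH_LIVE_TELEGRAM_API_ID": "telegram/api_id",
    "AUTH_LIVE_TELEGRAM_API_HASH": "telegram/api_hash",
    "AUTH_LIVE_TELEGRAM_SESSION_JSON": "telegram/session.json",
}


def telegram_memory_writes(env: Mapping[str, str]) -> dict[str, str]:
    """Return {memory path: content} for each Telegram value that is set."""
    writes: dict[str, str] = {}
    for env_name, path in TELEGRAM_MEMORY_PATHS.items():
        value = env.get(env_name)
        if not value:
            continue  # unset values are skipped, as in the seeded runner
        if path.endswith(".json"):
            # Extra check (not in the runner): reject a malformed session early.
            json.loads(value)
        writes[path] = value
    return writes
```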

### Composio

Required:

- `AUTH_LIVE_COMPOSIO_API_KEY`

The combined workflow uses `connected_accounts`, which is read-only.

### Combined Ops Workflow

Run this after provisioning every fixture above:

```bash
LANE=auth-live-seeded CASES=ops_workflow scripts/live-canary/run.sh
```

It installs and activates Gmail, Google Calendar, Google Drive, Google Docs,
Google Sheets, Google Slides, GitHub, Web Search, LLM Context, Slack, Telegram,
Composio, Notion, and Linear, then dispatches one deterministic `/v1/responses`
turn that calls every tool.
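Because the combined probe passes only when every tool call succeeds, a preflight check can report all missing fixtures at once before launching the run. A sketch with a hypothetical `missing_ops_inputs` helper; it covers only the variables the Google, Linear, Brave Search, Slack, Telegram, and Composio sections above mark as required, and does not check Gmail, Calendar, GitHub, or Notion credentials:

```python
from __future__ import annotations

from collections.abc import Mapping

# Required variables per provider, as listed in the sections above.
OPS_WORKFLOW_REQUIRED: dict[str, tuple[str, ...]] = {
    "google": (
        "AUTH_LIVE_GOOGLE_DOC_ID",
        "AUTH_LIVE_GOOGLE_SHEET_ID",
        "AUTH_LIVE_GOOGLE_SLIDES_ID",
    ),
    "linear": ("AUTH_LIVE_LINEAR_ACCESS_TOKEN", "AUTH_LIVE_LINEAR_QUERY"),
    "brave": ("AUTH_LIVE_BRAVE_API_KEY",),
    "slack": ("AUTH_LIVE_SLACK_BOT_TOKEN",),
    "telegram": (
        "AUTH_LIVE_TELEGRAM_API_ID",
        "AUTH_LIVE_TELEGRAM_API_HASH",
        "AUTH_LIVE_TELEGRAM_SESSION_JSON",
    ),
    "composio": ("AUTH_LIVE_COMPOSIO_API_KEY",),
}


def missing_ops_inputs(env: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each provider to the required variables that are unset or empty."""
    missing: dict[str, list[str]] = {}
    for provider, names in OPS_WORKFLOW_REQUIRED.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[provider] = absent
    return missing
```

Passing `os.environ` and printing the result gives one entry per provider that still needs provisioning.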

## Browser-Consent Lane Secrets

These are read by `scripts/auth_browser_canary/run_browser_canary.py`.
1 change: 1 addition & 0 deletions scripts/live-canary/README.md
@@ -83,6 +83,7 @@ Run selected auth provider cases:

```bash
LANE=auth-live-seeded CASES=gmail,github scripts/live-canary/run.sh
LANE=auth-live-seeded CASES=ops_workflow scripts/live-canary/run.sh
LANE=auth-browser-consent CASES=google,github scripts/live-canary/run.sh
```
