feat(e2e): extensions tab tests, CI parallelization, and 3 production bug fixes (#584)

henrypark133 · claude · web-flow · commit b425213c5382 · 2026-03-06T08:27:38.000Z
* feat(e2e): extensions tab tests, CI parallelization, and 3 bug fixes

## E2E test coverage
- Add tests/e2e/scenarios/test_extensions.py with 57 tests covering all
  extensions tab flows: installed WASM tool/MCP/channel cards, configure
  modal (open, fields, cancel, save, OAuth, error), auth card (token,
  OAuth, submit, cancel, error, multi-extension coexistence), activate
  flow, install/remove flows, WASM channel stepper states, and tab reload
  behaviour. All network calls intercepted via page.route() — no real
  binaries or external registries needed.
- Expand tests/e2e/helpers.py with 50+ new CSS selectors for the
  extensions tab UI.
- Add tests/e2e/README.md documentation on the page.route() mocking
  pattern, LIFO handler ordering, and page.evaluate() injection.

## CI parallelization
- Split .github/workflows/e2e.yml into a build job (compile once,
  upload artifact) and a 3-way parallel test matrix (core / features /
  extensions), matching the pattern in test.yml. Reduces wall-clock time
  from ~15–20 min serial to ~10–12 min. Adds an e2e roll-up job for
  branch protection.

## Bug fixes in app.js (found via test-driven code review)
- Fix null crash: renderExtensionCard() called ext.tools.length without
  a null guard; add ext.tools && check (regression: test_ext_tools_null).
- Fix modal UX: submitConfigureModal() closed the overlay before checking
  success, making failures unrecoverable without reopening; close only on
  success, re-enable buttons and keep modal open on failure
  (regression: test_configure_modal_stays_open_on_save_failure).
- Fix URL injection: all window.open() calls for server-supplied auth_url
  now go through openOAuthUrl() which rejects non-HTTPS schemes
  (regression: test_oauth_url_injection_blocked).
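
The scheme check behind the URL-injection fix can be sketched in Python. This is only an illustrative analogue: the shipped fix is the JavaScript openOAuthUrl() in app.js, and the helper name is_safe_oauth_url and the example URLs below are assumptions made for this sketch.

```python
from urllib.parse import urlparse


def is_safe_oauth_url(url):
    """Return True only for a well-formed absolute https:// URL (hypothetical helper)."""
    # Non-string values (None, dicts, ...) are rejected up front rather than raising.
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    # urlparse lowercases the scheme; also require a host part.
    return parsed.scheme == "https" and bool(parsed.netloc)


checks = {
    "https": is_safe_oauth_url("https://auth.example.com/consent"),
    "javascript": is_safe_oauth_url("javascript:alert(1)"),
    "data": is_safe_oauth_url("data:text/html,hello"),
    "http": is_safe_oauth_url("http://auth.example.com/consent"),
    "none": is_safe_oauth_url(None),
    "object": is_safe_oauth_url({"href": "https://auth.example.com"}),
}
```

Only the "https" entry is accepted; every other input is rejected without raising, which mirrors the intent of validating before opening a popup.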

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): prune extensions tests 57→46 by merging redundant setups

Merge 11 tests that shared identical fixture+navigation overhead:
- Group A: 3 empty-state tests → test_extensions_empty_tab_layout
- Group B: card_renders absorbs ext_tools_list_shown (same _WASM_TOOL fixture)
- Group B: auth_dot_unauthed + unauthed_shows_configure_btn → test_installed_wasm_tool_unauthed_state
- Group D: installed + configured states → test_wasm_channel_setup_states (identical UI)
- Group D: failed_state + stepper_failed_circle → test_wasm_channel_failed_renders
- Group G: 5 field badge tests → test_configure_modal_field_variants (4 fields, one pass)
- Group H: submit_success + enter_key_submits → test_auth_card_submit_success

Coverage preserved: all assertions kept, no unique behaviors removed.
Extensions CI job estimated to drop from ~7 min to ~5 min.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): fix configure_input selector scoping in merged field variants test

modal.locator(".configure-modal input[type='password']") scoped the absolute
selector inside .configure-modal, effectively searching for a nested
.configure-modal which never exists → count() == 0. Use page.locator()
instead, consistent with all other tests in the file.
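
The scoping problem can be reproduced without a browser. The sketch below is an analogy only: it uses the standard library's ElementTree and XPath-style paths instead of Playwright locators and CSS, and the markup is a simplified, hypothetical version of the modal.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real modal has more structure.
DOC = ET.fromstring(
    "<body>"
    "<div class='configure-overlay'>"
    "<div class='configure-modal'><input type='password'/></div>"
    "</div>"
    "</body>"
)

# Selector that names the modal itself, like the SEL["configure_input"] entry.
ABSOLUTE = ".//div[@class='configure-modal']//input[@type='password']"

# Page-level search: starts at the document root, so the modal is a descendant.
from_page = DOC.findall(ABSOLUTE)

# Modal-scoped search: starts inside the modal, so the query looks for a
# second configure-modal nested within it, which does not exist.
modal = DOC.find(".//div[@class='configure-modal']")
from_modal = modal.findall(ABSOLUTE)

# A selector written relative to the modal finds the input as expected.
relative = modal.findall(".//input[@type='password']")

counts = (len(from_page), len(from_modal), len(relative))
```

The modal-scoped search with the absolute selector comes back empty, which is the same shape of failure as the count() == 0 described above.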

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): address PR review comments — replace fixed sleeps with deterministic waits

- Remove unnecessary wait_for_timeout(1000) from test_remove_cancelled_keeps_card
  (window.confirm = () => false is synchronous; DOM is unchanged when click() returns)
- Replace wait_for_timeout(800) with wait_for_function() for window._lastOpenedUrl
  checks in configure_modal_save_oauth and activate_with_auth_url_opens_popup
- Replace wait_for_timeout(300) with nth(1).wait_for(visible) in
  test_auth_card_multiple_extensions_coexist
- Remove wait_for_timeout(800/300) in test_auth_card_submit_empty_noop and
  test_auth_completed_sse_dismisses_card (both check synchronous JS side-effects)
- Add comment in test_oauth_url_injection_blocked explaining why timeout is kept
  (negative assertion — cannot use wait_for_function for absence of event)
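
The difference between a fixed sleep and a deterministic wait can be sketched with plain asyncio. This is not Playwright's wait_for_function(); wait_until, the state dict, and the 50 ms delay are hypothetical stand-ins chosen for the illustration.

```python
import asyncio
import time


async def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll `predicate` until it is truthy, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2fs" % timeout)
        await asyncio.sleep(interval)


async def demo():
    state = {"last_opened_url": None}

    async def side_effect():
        # Stand-in for the app recording the opened URL after a fetch resolves.
        await asyncio.sleep(0.05)
        state["last_opened_url"] = "https://auth.example.com/consent"

    task = asyncio.create_task(side_effect())
    started = time.monotonic()
    # Returns as soon as the condition holds instead of always sleeping 800 ms.
    await wait_until(lambda: state["last_opened_url"] is not None)
    elapsed = time.monotonic() - started
    await task
    return state["last_opened_url"], elapsed


url, elapsed = asyncio.run(demo())
```

A polling wait finishes shortly after the condition becomes true and fails loudly on timeout. It cannot prove that something never happens, which is why the negative assertion in test_oauth_url_injection_blocked keeps its timeout.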

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): address remaining PR review comments

- Remove unused `import pytest` from test_extensions.py
- Fix unawaited coroutine bug: convert lambda route handlers to async def
  in test_extensions_tab_reloads_on_revisit and
  test_auth_completed_sse_triggers_extensions_reload (lambda r: r.fulfill(...)
  returns an unawaited coroutine; requests silently fell through to real server)
- Fix README.md example to use async def handler (same bug in docs)
- Harden openOAuthUrl() in app.js: use URL constructor instead of
  .startsWith() so non-string server-supplied values (objects, null, etc.)
  are safely rejected rather than throwing TypeError
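
The lambda issue comes down to general Python semantics: calling an async method only creates a coroutine object, and its body does not run until something awaits it. The sketch below uses a hypothetical FakeRoute class rather than Playwright's Route, and makes no claim about whether a given Playwright version awaits a handler's return value.

```python
import asyncio
import inspect


class FakeRoute:
    """Hypothetical stand-in for a route object with an async fulfill()."""

    def __init__(self):
        self.fulfilled = False

    async def fulfill(self, **kwargs):
        self.fulfilled = True


async def demo():
    # Lambda handler: calling it only creates a coroutine object.
    lambda_route = FakeRoute()
    lambda_handler = lambda r: r.fulfill(status=200)
    result = lambda_handler(lambda_route)
    returned_coroutine = inspect.iscoroutine(result)
    fulfilled_without_await = lambda_route.fulfilled
    result.close()  # discard explicitly to avoid a "never awaited" warning

    # async def handler: the body awaits fulfill(), so the work happens
    # once the handler itself is awaited.
    async def async_handler(r):
        await r.fulfill(status=200)

    async_route = FakeRoute()
    await async_handler(async_route)
    return returned_coroutine, fulfilled_without_await, async_route.fulfilled


outcome = asyncio.run(demo())
```

Here the lambda path returns a coroutine and leaves the route unfulfilled, while the async def path fulfills it, so an async def handler behaves the same regardless of how the caller treats the return value.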

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): address second round of PR review comments

- Add timeout-minutes to CI build job to prevent hung workflows
- Use parsed.href instead of raw url in openOAuthUrl for safety
- Remove unused MessageEvent variable in auth_completed test
- Replace wait_for_timeout(800) with expect_response in activate test
- Replace wait_for_timeout(300) with tab panel wait_for in reload test

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
diff --git a/.github/workflows/e2e.yml b/.github/workflows/e2e.yml
@@ -9,8 +9,9 @@ on:
       - "tests/e2e/**"
 
 jobs:
-  e2e:
-    name: Browser E2E
+  # ── Step 1: compile once ──────────────────────────────────────────────────
+  build:
+    name: Build ironclaw (libsql)
     runs-on: ubuntu-latest
     timeout-minutes: 30
     steps:
@@ -25,9 +26,44 @@ jobs:
             ~/.cargo/registry
           key: e2e-${{ runner.os }}-${{ hashFiles('Cargo.lock') }}
 
-      - name: Build ironclaw (libsql)
+      - name: Build
         run: cargo build --no-default-features --features libsql
 
+      - name: Upload binary
+        uses: actions/upload-artifact@v4
+        with:
+          name: ironclaw-e2e-binary
+          path: target/debug/ironclaw
+          retention-days: 1
+
+  # ── Step 2: run test slices in parallel ───────────────────────────────────
+  test:
+    name: E2E (${{ matrix.group }})
+    needs: build
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - group: core
+            files: "tests/e2e/scenarios/test_connection.py tests/e2e/scenarios/test_chat.py tests/e2e/scenarios/test_sse_reconnect.py tests/e2e/scenarios/test_html_injection.py"
+          - group: features
+            files: "tests/e2e/scenarios/test_skills.py tests/e2e/scenarios/test_tool_approval.py"
+          - group: extensions
+            files: "tests/e2e/scenarios/test_extensions.py"
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Download binary
+        uses: actions/download-artifact@v4
+        with:
+          name: ironclaw-e2e-binary
+          path: target/debug/
+
+      - name: Make binary executable
+        run: chmod +x target/debug/ironclaw
+
       - uses: actions/setup-python@v5
         with:
           python-version: "3.12"
@@ -38,13 +74,26 @@ jobs:
           pip install -e .
           playwright install --with-deps chromium
 
-      - name: Run E2E tests
-        run: pytest tests/e2e/ -v -x --timeout=120
+      - name: Run E2E tests (${{ matrix.group }})
+        run: pytest ${{ matrix.files }} -v --timeout=120
 
       - name: Upload screenshots on failure
         if: failure()
         uses: actions/upload-artifact@v4
         with:
-          name: e2e-screenshots
+          name: e2e-screenshots-${{ matrix.group }}
           path: tests/e2e/screenshots/
           if-no-files-found: ignore
+
+  # ── Roll-up for branch protection ────────────────────────────────────────
+  e2e:
+    name: E2E Tests
+    runs-on: ubuntu-latest
+    if: always()
+    needs: [test]
+    steps:
+      - run: |
+          if [[ "${{ needs.test.result }}" != "success" ]]; then
+            echo "One or more E2E jobs failed"
+            exit 1
+          fi
diff --git a/src/channels/web/static/app.js b/src/channels/web/static/app.js
@@ -1003,7 +1003,7 @@ function showAuthCard(data) {
     oauthBtn.className = 'auth-oauth';
     oauthBtn.textContent = 'Authenticate with ' + data.extension_name;
     oauthBtn.addEventListener('click', () => {
-      window.open(data.auth_url, '_blank', 'width=600,height=700');
+      openOAuthUrl(data.auth_url);
     });
     links.appendChild(oauthBtn);
   }
@@ -1921,7 +1921,7 @@ function renderAvailableExtensionCard(entry) {
         // OAuth popup if auth started during install (builtin creds)
         if (res.auth_url) {
           showToast('Opening authentication for ' + entry.display_name, 'info');
-          window.open(res.auth_url, '_blank', 'width=600,height=700');
+          openOAuthUrl(res.auth_url);
         }
         loadExtensions();
         // Auto-open configure for WASM channels
@@ -2079,7 +2079,7 @@ function renderExtensionCard(ext) {
     card.appendChild(url);
   }
 
-  if (ext.tools.length > 0) {
+  if (ext.tools && ext.tools.length > 0) {
     const tools = document.createElement('div');
     tools.className = 'ext-tools';
     tools.textContent = 'Tools: ' + ext.tools.join(', ');
@@ -2179,15 +2179,15 @@ function activateExtension(name) {
         // Even on success, the tool may need OAuth (e.g., WASM loaded but no token yet)
         if (res.auth_url) {
           showToast('Opening authentication for ' + name, 'info');
-          window.open(res.auth_url, '_blank', 'width=600,height=700');
+          openOAuthUrl(res.auth_url);
         }
         loadExtensions();
         return;
       }
 
       if (res.auth_url) {
         showToast('Opening authentication for ' + name, 'info');
-        window.open(res.auth_url, '_blank');
+        openOAuthUrl(res.auth_url);
       } else if (res.awaiting_token) {
         showConfigureModal(name);
       } else {
@@ -2329,20 +2329,21 @@ function submitConfigureModal(name, fields) {
     body: { secrets },
   })
     .then((res) => {
-      closeConfigureModal();
       if (res.success) {
+        closeConfigureModal();
         if (res.auth_url) {
           // OAuth flow started — open consent popup. The auth_completed SSE will
           // not arrive immediately (it fires after OAuth callback), so show a toast now.
           showToast('Opening OAuth authorization for ' + name, 'info');
-          window.open(res.auth_url, '_blank', 'width=600,height=700');
+          openOAuthUrl(res.auth_url);
           loadExtensions();
         }
         // For non-OAuth success: the server always broadcasts auth_completed SSE,
         // which will show the toast and refresh extensions — no need to do it here too.
       } else {
+        // Keep modal open so the user can correct their input and retry.
+        btns.forEach(function(b) { b.disabled = false; });
         showToast(res.message || 'Configuration failed', 'error');
-        loadExtensions();
       }
     })
     .catch((err) => {
@@ -2356,6 +2357,25 @@ function closeConfigureModal() {
   if (existing) existing.remove();
 }
 
+// Validate that a server-supplied OAuth URL is HTTPS before opening a popup.
+// Rejects javascript:, data:, and other non-HTTPS schemes to prevent URL-injection.
+// Uses the URL constructor to safely parse and validate the scheme, which also
+// handles non-string values (objects, null, etc.) that would throw on .startsWith().
+function openOAuthUrl(url) {
+  let parsed;
+  try {
+    parsed = new URL(url);
+    if (parsed.protocol !== 'https:') {
+      throw new Error('non-HTTPS protocol: ' + parsed.protocol);
+    }
+  } catch (e) {
+    console.warn('Blocked invalid/non-HTTPS OAuth URL:', url, e.message);
+    showToast('Invalid OAuth URL returned by server', 'error');
+    return;
+  }
+  window.open(parsed.href, '_blank', 'width=600,height=700');
+}
+
 // --- Pairing ---
 
 function loadPairingRequests(channel, container) {
diff --git a/tests/e2e/README.md b/tests/e2e/README.md
@@ -52,10 +52,117 @@ Then Playwright drives a headless Chromium browser against the gateway, making D
 | `test_connection.py` | Auth, tab navigation, connection status |
 | `test_chat.py` | Send message, SSE streaming, response rendering |
 | `test_skills.py` | ClawHub search, skill install/remove |
+| `test_tool_approval.py` | Tool approval overlay (approve, deny, always, params toggle) |
+| `test_sse_reconnect.py` | SSE reconnection handling |
+| `test_html_injection.py` | HTML injection security |
+| `test_extensions.py` | Extensions tab: install, remove, configure, OAuth, auth card, activate |
 
 ## Adding new scenarios
 
 1. Create `tests/e2e/scenarios/test_<name>.py`
 2. Use the `page` fixture for a fresh browser page
 3. Use selectors from `helpers.py` (update `SEL` dict if new elements are needed)
 4. Keep tests deterministic -- use the mock LLM, not real providers
+
+## Mocking API responses with `page.route()`
+
+For tabs that depend on external data (extensions, jobs, memory, routines), use
+Playwright's `page.route()` to intercept the browser's HTTP requests to the
+ironclaw gateway and return deterministic fixture JSON. This avoids needing
+real installed binaries, live external services, or complex database setup.
+
+### Basic pattern
+
+```python
+import json
+
+async def test_something(page):
+    # 1. Set up route intercepts BEFORE navigation triggers the fetch
+    # Always use async def handlers — route.fulfill() is a coroutine and must be awaited.
+    async def handle_tools(route):
+        await route.fulfill(
+            status=200,
+            content_type="application/json",
+            body=json.dumps({"tools": [{"name": "echo", "description": "Echo"}]}),
+        )
+
+    await page.route("**/api/extensions/tools", handle_tools)
+
+    # 2. Navigate / interact to trigger the fetch
+    await page.locator('.tab-bar button[data-tab="extensions"]').click()
+
+    # 3. Assert on the rendered DOM
+    rows = page.locator("#tools-tbody tr")
+    assert await rows.count() == 1
+```
+
+### Matching only the exact path
+
+`**/api/extensions` matches `http://host/api/extensions` but NOT sub-paths
+like `http://host/api/extensions/install`. For the bare list endpoint, add
+a check inside the handler:
+
+```python
+async def handle_ext_list(route):
+    path = route.request.url.split("?")[0]
+    if path.endswith("/api/extensions"):
+        await route.fulfill(json={"extensions": []})
+    else:
+        await route.continue_()   # Let sub-paths through to the real server
+
+await page.route("**/api/extensions*", handle_ext_list)
+```
+
+### Mocking method-specific behaviour (GET vs POST)
+
+```python
+async def handle_setup(route):
+    if route.request.method == "GET":
+        await route.fulfill(json={"secrets": [...]})
+    else:  # POST
+        await route.fulfill(json={"success": True})
+
+await page.route("**/api/extensions/my-ext/setup", handle_setup)
+```
+
+### Counting calls (for reload tests)
+
+```python
+calls = []
+
+async def counting_handler(route):
+    calls.append(1)
+    await route.fulfill(json={"extensions": []})
+
+await page.route("**/api/extensions", counting_handler)
+# ... interact ...
+assert len(calls) == 2   # called twice (initial + after some action)
+```
+
+### Applying the pattern to other tabs
+
+| Tab | Key API endpoints to mock |
+|-----|--------------------------|
+| **Jobs** | `/api/jobs`, `/api/jobs/{id}`, `/api/jobs/{id}/events` |
+| **Memory** | `/api/memory/search`, `/api/memory/tree`, `/api/memory/read` |
+| **Routines** | `/api/routines`, `/api/routines/{id}/runs` |
+
+### Injecting state directly via `page.evaluate()`
+
+For purely client-side UI (components rendered entirely in JS without API calls),
+call the JavaScript function directly to skip the network layer entirely:
+
+```python
+# Show an approval card without needing a real tool execution
+await page.evaluate("""
+    showApproval({
+        request_id: 'test-001',
+        thread_id: currentThreadId,
+        tool_name: 'shell',
+        description: 'Run something',
+    })
+""")
+```
+
+This is the pattern used in `test_tool_approval.py` and parts of
+`test_extensions.py` (auth card, configure modal).
diff --git a/tests/e2e/helpers.py b/tests/e2e/helpers.py
@@ -43,6 +43,58 @@
     "approval_always_btn": ".approval-actions button.always",
     "approval_deny_btn": ".approval-actions button.deny",
     "approval_resolved": ".approval-resolved",
+    # Extensions tab – sections
+    "extensions_list":          "#extensions-list",
+    "available_wasm_list":      "#available-wasm-list",
+    "mcp_servers_list":         "#mcp-servers-list",
+    "tools_tbody":              "#tools-tbody",
+    "tools_empty":              "#tools-empty",
+    # Extensions tab – cards
+    "ext_card_installed":       "#extensions-list .ext-card",
+    "ext_card_available":       "#available-wasm-list .ext-card.ext-available",
+    "ext_card_mcp":             "#mcp-servers-list .ext-card",
+    "ext_name":                 ".ext-name",
+    "ext_kind":                 ".ext-kind",
+    "ext_auth_dot":             ".ext-auth-dot",
+    "ext_auth_dot_authed":      ".ext-auth-dot.authed",
+    "ext_auth_dot_unauthed":    ".ext-auth-dot.unauthed",
+    "ext_active_label":         ".ext-active-label",
+    "ext_pairing_label":        ".ext-pairing-label",
+    "ext_error":                ".ext-error",
+    "ext_tools":                ".ext-tools",
+    # Extensions tab – action buttons
+    "ext_install_btn":          ".btn-ext.install",
+    "ext_remove_btn":           ".btn-ext.remove",
+    "ext_activate_btn":         ".btn-ext.activate",
+    "ext_configure_btn":        ".btn-ext.configure",
+    # Configure modal
+    "configure_overlay":        ".configure-overlay",
+    "configure_modal":          ".configure-modal",
+    "configure_field":          ".configure-field",
+    "configure_input":          ".configure-modal input[type='password']",
+    "configure_save_btn":       ".configure-actions button.btn-ext.activate",
+    "configure_cancel_btn":     ".configure-actions button.btn-ext.remove",
+    "field_provided":           ".field-provided",
+    "field_autogen":            ".field-autogen",
+    "field_optional":           ".field-optional",
+    # Auth card (SSE-triggered, injected into chat-messages)
+    "auth_card":                ".auth-card",
+    "auth_header":              ".auth-header",
+    "auth_instructions":        ".auth-instructions",
+    "auth_oauth_btn":           ".auth-oauth",
+    "auth_token_input":         ".auth-token-input input",
+    "auth_submit_btn":          ".auth-submit",
+    "auth_cancel_btn":          ".auth-cancel",
+    "auth_error":               ".auth-error",
+    # WASM channel progress stepper
+    "ext_stepper":              ".ext-stepper",
+    "stepper_step":             ".stepper-step",
+    "stepper_circle":           ".stepper-circle",
+    # Toast notifications
+    "toast":                    ".toast",
+    "toast_success":            ".toast.toast-success",
+    "toast_error":              ".toast.toast-error",
+    "toast_info":               ".toast.toast-info",
 }
 
 TABS = ["chat", "memory", "jobs", "routines", "extensions", "skills"]
diff --git a/tests/e2e/scenarios/test_extensions.py b/tests/e2e/scenarios/test_extensions.py