Playwright e2e: caching MITM proxy with cross-shard merge #5578

Draft
alisman wants to merge 2 commits into cBioPortal:master from alisman:playwright-response-cache-proxy

@alisman alisman commented May 15, 2026

Collaborator

Summary

Adds an opt-in caching forward proxy in front of the Playwright suite so repeat requests to *.cbioportal.org are served from a SQLite-backed cache instead of hitting the live portal. Each CircleCI shard contributes to its own cache file; a new merge_playwright_cache job unions all 12 and save_caches the result, so the next workflow run starts hot with the union of every previous response.

  • L1 cache: in-process dict inside mitmdump, populated on every response.
  • L2 cache: SQLite file at PW_CACHE_DB. Per-shard during the run; merged at the end.
  • Cross-workflow: save_cache/restore_cache keyed by branch with a master fallback.
  • Cross-shard (within one workflow): a new merge_playwright_cache job INSERT OR IGNOREs all 12 shard caches into one before save_cache.
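
The L1/L2 layering in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in cache_addon.py: the `TwoTierCache` class, the `cache_key` helper (a hash of method and URL), and storing headers as a JSON string are all assumptions made for the example. Only the table name and column names (`cache`, `key`, `status`, `headers`, `body`) come from the queries in this PR.

```python
import hashlib
import sqlite3


def cache_key(method: str, url: str) -> str:
    """Hypothetical key derivation: hash of method + URL (the real key may differ)."""
    return hashlib.sha256(f"{method} {url}".encode()).hexdigest()


class TwoTierCache:
    """L1 is an in-process dict; L2 is a SQLite file that outlives the process."""

    def __init__(self, db_path: str) -> None:
        self.l1: dict = {}
        # isolation_level=None: autocommit, so each INSERT is written immediately.
        self.db = sqlite3.connect(db_path, isolation_level=None)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, status INTEGER, headers TEXT, body BLOB)"
        )
        self.stats = {"l1": 0, "l2": 0, "miss": 0}

    def get(self, key: str):
        # L1 first: cheapest lookup, no disk access.
        if key in self.l1:
            self.stats["l1"] += 1
            return self.l1[key]
        # Fall back to L2 and promote the entry into L1 on a hit.
        row = self.db.execute(
            "SELECT status, headers, body FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            self.stats["miss"] += 1
            return None
        self.stats["l2"] += 1
        self.l1[key] = row
        return row

    def put(self, key: str, status: int, headers: str, body: bytes) -> None:
        # Populate both tiers; OR IGNORE keeps the first stored response per key.
        self.l1[key] = (status, headers, body)
        self.db.execute(
            "INSERT OR IGNORE INTO cache (key, status, headers, body) "
            "VALUES (?, ?, ?, ?)",
            (key, status, headers, body),
        )
```

In this sketch a fresh process starts with an empty L1, so the first request for each key in a warm run counts as an L2 hit and later repeats within the same run count as L1 hits.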

Wired through PW_CACHE_PROXY=1 env var on remote_e2e_shards; off by default in local dev.

Why

We're hammering www.cbioportal.org from CI on every PR — both for the test backend and for the JS bundle / assets / API. Most of those requests are deterministic and repeat dozens of times per shard. Caching them locally (a) cuts wall time on warm runs, (b) reduces load on the public portal, (c) sets us up for fully hermetic e2e if we ever want it.

What this PR is not

  • Not a replacement for the frontend's HTTP cache — this is upstream of the browser, transparent to test code.
  • Not yet doing intelligent cache invalidation; entries live forever in SQLite. If the portal's behaviour for a URL changes, bust the cache by bumping the version in the cache key (playwright-cache-v1-... to playwright-cache-v2-...) in .circleci/config.yml.
  • Not enabled for e2e_localdb_shards (the local-db tests hit localhost:8080, which is outside the allowlist).
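
To illustrate why localhost:8080 falls outside the allowlist, here is a minimal sketch of a host check. The function name `is_cacheable_host` and the use of glob-style patterns are assumptions for the example; how cache_addon.py actually parses PW_CACHE_HOSTS is not shown in this PR description.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Assumed default, taken from the "*.cbioportal.org" wording in the summary.
DEFAULT_PATTERNS = ("*.cbioportal.org",)


def is_cacheable_host(url: str, patterns=DEFAULT_PATTERNS) -> bool:
    """Return True if the URL's host matches any allowlisted glob pattern."""
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port is dropped
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

With these assumed patterns, `is_cacheable_host("https://www.cbioportal.org/")` is true while `is_cacheable_host("http://localhost:8080/")` is false, so local-db traffic would pass through the proxy uncached.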

How to verify locally

# One-time: build the overlay image (canonical CI image gets mitmproxy
# baked in by the Dockerfile change in this PR, but the GHCR rebuild is
# async — overlay lets you iterate without waiting).
cd end-to-end-test-playwright
docker build --platform linux/amd64 \
  -f proxy/Dockerfile.overlay \
  -t cbioportal-frontend-playwright-ci:cache-local \
  proxy

# Then:
LOCALDEV=0 PW_CACHE_PROXY=1 \
  PW_CACHE_IMAGE=cbioportal-frontend-playwright-ci:cache-local \
  ./scripts/docker-test.sh tests/config.spec.ts

Look for the [cache] summary: N L1 hits / M L2 hits / P misses ... line at the end.
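
If you want to compare runs numerically, the summary line can be parsed with a small helper like the one below. This is a sketch that assumes the line looks exactly like the prefix quoted above; anything after the miss count is ignored, and `parse_summary` is a hypothetical name, not part of this PR.

```python
import re

# Matches only the part of the summary line quoted in this description.
SUMMARY_RE = re.compile(
    r"\[cache\] summary: (\d+) L1 hits / (\d+) L2 hits / (\d+) misses"
)


def parse_summary(line: str):
    """Extract hit/miss counts and an overall hit rate, or None if no match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    l1, l2, misses = (int(group) for group in match.groups())
    total = l1 + l2 + misses
    return {
        "l1": l1,
        "l2": l2,
        "misses": misses,
        "hit_rate": (l1 + l2) / total if total else 0.0,
    }
```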

What to watch on CircleCI

  • First workflow run: cold cache. restore_cache step has no match; shards run normally and populate SQLite from scratch. merge_playwright_cache runs serially after shards, unions them, and save_caches the result.
  • Second workflow run (same branch): warm cache. Shards should report [cache] N L2 hits and far fewer misses than on the first run. Wall time should drop noticeably on screenshot-heavy shards.
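
The union step performed by merge_playwright_cache can be sketched as below. This is an illustration of the INSERT OR IGNORE approach under assumptions, not the contents of merge_cache.py: the `merge_caches` function, the column types in `SCHEMA`, and the choice to skip unreadable shard files are all made up for the example.

```python
import sqlite3
from contextlib import closing

# Column names follow the queries in this PR; the types are assumed.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS cache "
    "(key TEXT PRIMARY KEY, status INTEGER, headers TEXT, body BLOB)"
)


def merge_caches(dest_path: str, shard_paths: list) -> int:
    """Union shard caches into dest_path; return the number of rows added."""
    added = 0
    with closing(sqlite3.connect(dest_path)) as dest:
        dest.execute(SCHEMA)
        for path in shard_paths:
            try:
                with closing(sqlite3.connect(path)) as src:
                    rows = src.execute(
                        "SELECT key, status, headers, body FROM cache"
                    ).fetchall()
            except sqlite3.Error as exc:
                # A missing or corrupt shard should not fail the whole merge.
                print(f"skipping {path}: {exc}")
                continue
            before = dest.total_changes
            # OR IGNORE: the first shard to contribute a key wins.
            dest.executemany(
                "INSERT OR IGNORE INTO cache (key, status, headers, body) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )
            added += dest.total_changes - before
        dest.commit()
    return added
```

Because conflicting keys are ignored rather than replaced, the result depends on shard order only when two shards cached different responses for the same key.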

Test plan

  • Verify the new merge_playwright_cache job runs after shards and produces a cache.sqlite artifact.
  • Re-run the workflow on the PR; confirm restore_cache finds the previously-saved cache and the shards report L2 hits.
  • Compare wall time on the second run vs the first.
  • Check that an intentionally bad cached response can be cleared by busting the version key, so it does not keep breaking the test; this sanity-checks the eviction path.
  • Inspect the cache.sqlite artifact size; if it's growing unboundedly, add a per-entry size cap.
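
For the artifact-size check, a quick inspection helper might look like this. It is a sketch that assumes the `cache` table and `body` column seen in the queries in this PR; `cache_stats` is a hypothetical name.

```python
import sqlite3
from contextlib import closing


def cache_stats(db_path: str, top_n: int = 5) -> dict:
    """Report entry count, total body size, and the largest entries."""
    with closing(sqlite3.connect(db_path)) as db:
        # LENGTH() on a BLOB returns its size in bytes.
        entries, body_bytes = db.execute(
            "SELECT COUNT(*), COALESCE(SUM(LENGTH(body)), 0) FROM cache"
        ).fetchone()
        largest = db.execute(
            "SELECT key, LENGTH(body) AS size FROM cache "
            "ORDER BY size DESC LIMIT ?",
            (top_n,),
        ).fetchall()
    return {"entries": entries, "body_bytes": body_bytes, "largest": largest}
```

The `largest` list is the useful part when deciding on a per-entry size cap: it shows whether a handful of big responses dominate the file.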

🤖 Generated with Claude Code


…xy + SQLite

Adds an opt-in caching MITM proxy in front of the Playwright suite that
serves repeat requests from a per-shard in-memory dict (L1) backed by a
shared SQLite file (L2). A new merge_playwright_cache CircleCI job
unions every shard's contributions and save_cache's the result, so the
next workflow run starts hot with the union of all previous responses.

Disabled in local dev by default (PW_CACHE_PROXY=0); turned on
unconditionally in the remote_e2e_shards job. The Run Playwright shard
step intentionally exits 0 either way, so the per-shard cache file is
staged + persist_to_workspace'd regardless of test pass/fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 14:50
netlify Bot commented May 15, 2026

Deploy Preview for cbioportalfrontend ready!

Name Link
🔨 Latest commit fface8f
🔍 Latest deploy log https://app.netlify.com/projects/cbioportalfrontend/deploys/6a07382c2ccbd700089233ba
😎 Deploy Preview https://deploy-preview-5578.preview.cbioportal.org

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces an opt-in caching MITM forward proxy for the Playwright E2E suite, backed by SQLite, and wires CircleCI to restore/save the cache across workflow runs while merging shard caches into a union cache at the end of green runs.

Changes:

  • Add a mitmproxy addon (cache_addon.py) and runner wrapper (run-with-cache-proxy.sh) to cache allowlisted responses (L1 in-memory + optional L2 SQLite).
  • Update Playwright configuration to route Chromium through the proxy and relax TLS checks when proxying.
  • Extend CircleCI and the Playwright CI image to install mitmproxy, restore prior caches, stage per-shard caches, merge them, and save the merged cache for the next run.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| end-to-end-test-playwright/scripts/run-with-cache-proxy.sh | Starts/stops mitmdump and runs Playwright with proxy env vars set. |
| end-to-end-test-playwright/scripts/docker-test.sh | Routes local dockerized runs through the proxy wrapper when enabled. |
| end-to-end-test-playwright/proxy/merge_cache.py | Merges shard SQLite caches into a single union DB. |
| end-to-end-test-playwright/proxy/Dockerfile.overlay | Local-only overlay image to add mitmproxy before the canonical CI image rebuilds. |
| end-to-end-test-playwright/proxy/cache_addon.py | mitmproxy addon implementing L1/L2 caching behavior and SQLite storage. |
| end-to-end-test-playwright/proxy/.gitignore | Ignores Python bytecode artifacts for the new proxy scripts. |
| end-to-end-test-playwright/playwright.config.ts | Adds proxy wiring and TLS relaxation when a proxy is detected. |
| .circleci/images/playwright/Dockerfile | Installs mitmproxy (via pip) into the Playwright CI image. |
| .circleci/config.yml | Restores/saves the cache, stages per-shard DBs, adds a merge job, and enables proxying in shards. |

Comment on lines 88 to +95
-e CBIOPORTAL_URL="${CBIOPORTAL_URL:-https://www.cbioportal.org}" \
-e CI="${CI:-}" \
-e LOCALDEV="${LOCALDEV}" \
-e PW_LOCAL="${PW_LOCAL:-}" \
-e PW_CACHE_PROXY="${PW_CACHE_PROXY}" \
-e PW_CACHE_HOSTS="${PW_CACHE_HOSTS:-}" \
-e PW_CACHE_STATUSES="${PW_CACHE_STATUSES:-}" \
-e PW_CACHE_LOG="${PW_CACHE_LOG:-}" \
Comment on lines +135 to +142
# check_same_thread=False because mitmproxy may dispatch
# hooks from worker threads in some flow types.
# isolation_level=None puts SQLite in autocommit so each INSERT is
# durable without an explicit commit(); cheap insurance
# against losing entries if mitmdump is force-killed.
self._db = sqlite3.connect(
    DB_PATH, check_same_thread=False, isolation_level=None
)
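
The effect of isolation_level=None described in the comment above can be checked with a small experiment like the one below. This is a sketch for illustration only: `rows_visible_to_reader` and the two-column table are hypothetical, and the "no commit" path stands in for a process that is killed before it commits.

```python
import os
import sqlite3
import tempfile


def rows_visible_to_reader(autocommit: bool) -> int:
    """Insert one row without commit(), then count rows via a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
    # isolation_level=None disables the module's implicit transactions.
    kwargs = {"isolation_level": None} if autocommit else {}
    writer = sqlite3.connect(path, check_same_thread=False, **kwargs)
    reader = sqlite3.connect(path)
    try:
        writer.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, body BLOB)")
        writer.execute("INSERT OR IGNORE INTO cache VALUES (?, ?)", ("k", b"v"))
        # Deliberately no writer.commit(): pretend the process dies here.
        return reader.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
    finally:
        reader.close()
        writer.close()
```

With autocommit the second connection sees the row straight away; with the default settings the INSERT sits in an open transaction, so the reader sees none and the row would be lost if the writer were killed at that point.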
Comment on lines +160 to +164
row = self._db.execute(
    "SELECT status, headers, body FROM cache WHERE key = ?", (key,)
).fetchone()
if row is None:
    return None
Comment on lines +44 to +51
// When scripts/run-with-cache-proxy.sh is in play, HTTPS_PROXY points
// at a local mitmdump that caches *.cbioportal.org responses for the
// duration of a single test run. Routing Playwright's browser through
// it requires (a) the proxy server setting and (b) accepting the
// proxy's self-signed CA — easier than installing the CA into Chromium.
const proxyServer = process.env.HTTPS_PROXY || process.env.HTTP_PROXY;
const proxy = proxyServer ? { server: proxyServer } : undefined;
if (proxy) {
Comment on lines +67 to +70
    rows = src.execute(
        "SELECT key, status, headers, body FROM cache"
    ).fetchall()
except sqlite3.Error as exc:
Comment on lines +30 to +35
# release (11.x at time of writing) has a working request/response hook
# pipeline. DEBIAN_FRONTEND=noninteractive prevents tzdata (pulled in
# transitively by python3-pip) from blocking on its interactive prompt.
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends jq python3-pip \
&& pip3 install --no-cache-dir 'mitmproxy>=11,<12' \
Comment thread .circleci/config.yml
Comment on lines +495 to +502
- run:
    name: Stage shard cache for workspace
    command: |
      mkdir -p /tmp/pw-cache-shards
      if [ -f /tmp/pw-cache/cache.sqlite ]; then
        cp /tmp/pw-cache/cache.sqlite \
          "/tmp/pw-cache-shards/cache-${CIRCLE_NODE_INDEX}.sqlite"
        ls -la /tmp/pw-cache-shards/

The image-rebuild workflow only fires on pushes to the main repo, so a
PR from a fork can't update the canonical GHCR image with the new
mitmproxy install. The wrapper now bootstraps mitmproxy at runtime if
it's not already on PATH, so the cache wiring is exercisable on the
first CI run; once the merged Dockerfile change ships, the block is a
no-op (mitmdump is already there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alisman alisman marked this pull request as draft May 15, 2026 19:54