Playwright e2e: caching MITM proxy with cross-shard merge #5578
Draft
alisman wants to merge 2 commits into
…xy + SQLite

Adds an opt-in caching MITM proxy in front of the Playwright suite that serves repeat requests from a per-shard in-memory dict (L1) backed by a shared SQLite file (L2). A new merge_playwright_cache CircleCI job unions every shard's contributions and save_cache's the result, so the next workflow run starts hot with the union of all previous responses.

Disabled in local dev by default (PW_CACHE_PROXY=0); turned on unconditionally in the remote_e2e_shards job. The Run Playwright shard step intentionally exits 0 either way, so the per-shard cache file is staged + persist_to_workspace'd regardless of test pass/fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for cbioportalfrontend ready!
Pull request overview
This PR introduces an opt-in caching MITM forward proxy for the Playwright E2E suite, backed by SQLite, and wires CircleCI to restore/save the cache across workflow runs while merging shard caches into a union cache at the end of green runs.
Changes:
- Add a mitmproxy addon (`cache_addon.py`) and runner wrapper (`run-with-cache-proxy.sh`) to cache allowlisted responses (L1 in-memory + optional L2 SQLite).
- Update Playwright configuration to route Chromium through the proxy and relax TLS checks when proxying.
- Extend CircleCI and the Playwright CI image to install mitmproxy, restore prior caches, stage per-shard caches, merge them, and save the merged cache for the next run.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| end-to-end-test-playwright/scripts/run-with-cache-proxy.sh | Starts/stops mitmdump and runs Playwright with proxy env vars set. |
| end-to-end-test-playwright/scripts/docker-test.sh | Routes local dockerized runs through the proxy wrapper when enabled. |
| end-to-end-test-playwright/proxy/merge_cache.py | Merges shard SQLite caches into a single union DB. |
| end-to-end-test-playwright/proxy/Dockerfile.overlay | Local-only overlay image to add mitmproxy before the canonical CI image rebuilds. |
| end-to-end-test-playwright/proxy/cache_addon.py | mitmproxy addon implementing L1/L2 caching behavior and SQLite storage. |
| end-to-end-test-playwright/proxy/.gitignore | Ignores Python bytecode artifacts for the new proxy scripts. |
| end-to-end-test-playwright/playwright.config.ts | Adds proxy wiring and TLS relaxation when a proxy is detected. |
| .circleci/images/playwright/Dockerfile | Installs mitmproxy (via pip) into the Playwright CI image. |
| .circleci/config.yml | Restores/saves the cache, stages per-shard DBs, adds a merge job, and enables proxying in shards. |
Comment on lines 88 to +95

```sh
-e CBIOPORTAL_URL="${CBIOPORTAL_URL:-https://www.cbioportal.org}" \
-e CI="${CI:-}" \
-e LOCALDEV="${LOCALDEV}" \
-e PW_LOCAL="${PW_LOCAL:-}" \
-e PW_CACHE_PROXY="${PW_CACHE_PROXY}" \
-e PW_CACHE_HOSTS="${PW_CACHE_HOSTS:-}" \
-e PW_CACHE_STATUSES="${PW_CACHE_STATUSES:-}" \
-e PW_CACHE_LOG="${PW_CACHE_LOG:-}" \
```
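The excerpt forwards `PW_CACHE_HOSTS` and `PW_CACHE_STATUSES` into the container, and the `:-` expansions mean an unset host variable arrives as an empty string. A minimal sketch of how an addon might turn these into an allowlist check follows; the comma-separated format, the `*.cbioportal.org` / `200` defaults, and the `should_cache` helper are all assumptions for illustration, not the contents of `cache_addon.py`.

```python
import fnmatch
import os

# Hypothetical defaults; the real addon's defaults may differ.
DEFAULT_HOSTS = "*.cbioportal.org"
DEFAULT_STATUSES = "200"


def _split_env(name: str, default: str) -> list[str]:
    """Read a comma-separated env var (assumed format), treating "" as unset."""
    raw = os.environ.get(name) or default
    return [part.strip() for part in raw.split(",") if part.strip()]


def should_cache(host: str, status: int) -> bool:
    """Hypothetical allowlist check: host glob match plus status membership."""
    hosts = _split_env("PW_CACHE_HOSTS", DEFAULT_HOSTS)
    statuses = {int(s) for s in _split_env("PW_CACHE_STATUSES", DEFAULT_STATUSES)}
    # Lower-case both sides so matching is case-insensitive on every OS.
    host_ok = any(fnmatch.fnmatch(host.lower(), p.lower()) for p in hosts)
    return host_ok and status in statuses
```

Using `os.environ.get(name) or default` rather than `os.environ.get(name, default)` matters here: the latter returns `""` as-is when the wrapper forwards an empty variable. With `fnmatch`, `*` also matches dots, so `www.cbioportal.org` is allowed while the bare apex `cbioportal.org` and `localhost` are not.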
Comment on lines +135 to +142

```python
# check_same_thread=False because mitmproxy may dispatch
# hooks from worker threads in some flow types. isolation_level=None
# puts SQLite in autocommit so each INSERT is durable without an
# explicit commit(); cheap insurance against losing entries if
# mitmdump is force-killed.
self._db = sqlite3.connect(
    DB_PATH, check_same_thread=False, isolation_level=None
)
```
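The durability claim in that comment can be checked with the standard-library `sqlite3` module alone. The sketch below uses a hypothetical `demo_autocommit_visibility` helper and a throwaway two-column table (not the addon's schema): it writes one row through a connection opened like the addon's, one through a default-configured connection, and counts rows from a third connection after each write.

```python
import os
import sqlite3
import tempfile


def demo_autocommit_visibility() -> tuple[int, int]:
    """Return the row count a separate reader sees after each kind of write."""
    path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
    count = "SELECT COUNT(*) FROM cache"

    # Opened like the addon: with isolation_level=None the sqlite3 module never
    # issues an implicit BEGIN, so each INSERT commits as soon as it finishes.
    autocommit = sqlite3.connect(path, check_same_thread=False, isolation_level=None)
    autocommit.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, body BLOB)")
    autocommit.execute("INSERT INTO cache VALUES (?, ?)", ("k1", b"v1"))

    reader = sqlite3.connect(path)
    after_autocommit = reader.execute(count).fetchone()[0]

    # The default isolation_level ("") opens a transaction before the INSERT;
    # until commit() the row is invisible to other connections and is lost
    # if the process dies first.
    legacy = sqlite3.connect(path)
    legacy.execute("INSERT INTO cache VALUES (?, ?)", ("k2", b"v2"))
    after_uncommitted = reader.execute(count).fetchone()[0]

    legacy.rollback()
    for con in (autocommit, reader, legacy):
        con.close()
    return after_autocommit, after_uncommitted
```

The reader sees the autocommitted row immediately, but the count does not move after the default-mode INSERT; that row would only land after a `commit()` that a force-killed `mitmdump` never gets to run.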
Comment on lines +160 to +164

```python
row = self._db.execute(
    "SELECT status, headers, body FROM cache WHERE key = ?", (key,)
).fetchone()
if row is None:
    return None
```
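The excerpt is the L2 half of a lookup. A minimal sketch of the full two-tier path follows, assuming the `cache(key, status, headers, body)` columns shown in the query, headers stored as an already-serialized blob, and a hypothetical `TwoTierCache` class: check the in-memory dict first, fall back to SQLite, and promote L2 hits into L1. The counters mirror the `[cache] summary: ...` line described in the PR body, minus whatever that line prints after the miss count.

```python
import sqlite3
from typing import Optional

# (status, headers, body); how headers are serialized is an assumption.
Entry = tuple[int, bytes, bytes]


class TwoTierCache:
    """Hypothetical sketch: L1 is a per-process dict, L2 a SQLite `cache` table."""

    def __init__(self, db_path: str) -> None:
        self._l1: dict[str, Entry] = {}
        self._db = sqlite3.connect(
            db_path, check_same_thread=False, isolation_level=None
        )
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, status INTEGER, headers BLOB, body BLOB)"
        )
        self.l1_hits = self.l2_hits = self.misses = 0

    def get(self, key: str) -> Optional[Entry]:
        hit = self._l1.get(key)
        if hit is not None:
            self.l1_hits += 1
            return hit
        row = self._db.execute(
            "SELECT status, headers, body FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            self.misses += 1
            return None
        self.l2_hits += 1
        self._l1[key] = row  # promote so later lookups skip SQLite
        return row

    def put(self, key: str, status: int, headers: bytes, body: bytes) -> None:
        self._l1[key] = (status, headers, body)
        # First writer wins, matching the merge job's INSERT OR IGNORE.
        self._db.execute(
            "INSERT OR IGNORE INTO cache (key, status, headers, body) "
            "VALUES (?, ?, ?, ?)",
            (key, status, headers, body),
        )

    def summary(self) -> str:
        return (
            f"[cache] summary: {self.l1_hits} L1 hits / "
            f"{self.l2_hits} L2 hits / {self.misses} misses"
        )
```

Promoting on an L2 hit keeps repeat requests within a shard off SQLite entirely, which is where the L1 dict earns its keep on screenshot-heavy tests.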
Comment on lines +44 to +51

```ts
// When scripts/run-with-cache-proxy.sh is in play, HTTPS_PROXY points
// at a local mitmdump that caches *.cbioportal.org responses for the
// duration of a single test run. Routing Playwright's browser through
// it requires (a) the proxy server setting and (b) accepting the
// proxy's self-signed CA, which is easier than installing the CA into Chromium.
const proxyServer = process.env.HTTPS_PROXY || process.env.HTTP_PROXY;
const proxy = proxyServer ? { server: proxyServer } : undefined;
if (proxy) {
```
Comment on lines +67 to +70

```python
    rows = src.execute(
        "SELECT key, status, headers, body FROM cache"
    ).fetchall()
except sqlite3.Error as exc:
```
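The excerpt reads every row from one shard. A minimal sketch of the whole union follows, assuming the same `cache` schema and a hypothetical `merge_shards` helper (the real script's CLI, logging, and error handling may differ). `INSERT OR IGNORE` keeps whichever row for a key arrives first, so any rows already in the destination win over shard rows, and earlier shards win over later ones.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS cache ("
    "key TEXT PRIMARY KEY, status INTEGER, headers BLOB, body BLOB)"
)


def merge_shards(dest_path: str, shard_paths: list[str]) -> int:
    """Union each shard's `cache` rows into dest_path; return rows added."""
    dest = sqlite3.connect(dest_path)
    dest.execute(SCHEMA)
    added = 0
    for path in shard_paths:
        src = sqlite3.connect(path)
        try:
            rows = src.execute(
                "SELECT key, status, headers, body FROM cache"
            ).fetchall()
        except sqlite3.Error as exc:
            # A missing shard file shows up as "no such table" (connect()
            # creates an empty file) and a corrupt one as DatabaseError.
            print(f"skipping {path}: {exc}")
            continue
        finally:
            src.close()
        before = dest.total_changes
        dest.executemany(
            "INSERT OR IGNORE INTO cache (key, status, headers, body) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
        added += dest.total_changes - before  # ignored duplicates don't count
    dest.commit()
    dest.close()
    return added
```

Skipping a bad shard instead of failing matters because the shard step exits 0 regardless of test results, so a shard that crashed early may stage an empty or partial file.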
Comment on lines +30 to +35

```dockerfile
# release (11.x at time of writing) has a working request/response hook
# pipeline. DEBIAN_FRONTEND=noninteractive prevents tzdata (pulled in
# transitively by python3-pip) from blocking on its interactive prompt.
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends jq python3-pip \
    && pip3 install --no-cache-dir 'mitmproxy>=11,<12' \
```
Comment on lines +495 to +502

```yaml
- run:
    name: Stage shard cache for workspace
    command: |
      mkdir -p /tmp/pw-cache-shards
      if [ -f /tmp/pw-cache/cache.sqlite ]; then
        cp /tmp/pw-cache/cache.sqlite \
          "/tmp/pw-cache-shards/cache-${CIRCLE_NODE_INDEX}.sqlite"
        ls -la /tmp/pw-cache-shards/
```
The image-rebuild workflow only fires on pushes to the main repo, so a PR from a fork can't update the canonical GHCR image with the new mitmproxy install. The wrapper now bootstraps mitmproxy at runtime if it's not already on PATH, so the cache wiring is exercisable on the first CI run; once the merged Dockerfile change ships, the block is a no-op (mitmdump is already there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds an opt-in caching forward proxy in front of the Playwright suite so repeat requests to `*.cbioportal.org` are served from a SQLite-backed cache instead of hitting the live portal. Each CircleCI shard contributes to its own cache file; a new `merge_playwright_cache` job unions all 12 and `save_cache`s the result, so the next workflow run starts hot with the union of every previous response.

- L1: in-memory dict in `mitmdump`, populated on every response.
- L2: SQLite file at `PW_CACHE_DB`. Per-shard during the run; merged at the end.
- Across runs: CircleCI `save_cache`/`restore_cache`, keyed by branch with a master fallback.
- Merge: the `merge_playwright_cache` job `INSERT OR IGNORE`s all 12 shard caches into one before `save_cache`.

Wired through the `PW_CACHE_PROXY=1` env var on `remote_e2e_shards`; off by default in local dev.

Why
We're hammering www.cbioportal.org from CI on every PR, both for the test backend and for the JS bundle, assets, and API. Most of those requests are deterministic and repeat dozens of times per shard. Caching them locally (a) cuts wall time on warm runs, (b) reduces load on the public portal, and (c) sets us up for fully hermetic e2e if we ever want it.

What this PR is not
- … (`playwright-cache-v1-...` → `v2-...`) in `.circleci/config.yml`.
- … `e2e_localdb_shards` (the local-db tests hit `localhost:8080`, which is outside the allowlist).

How to verify locally
Look for the `[cache] summary: N L1 hits / M L2 hits / P misses ...` line at the end.

What to watch on CircleCI
- … `restore_cache` step has no match; shards run normally and populate SQLite from scratch.
- `merge_playwright_cache` runs serially after shards, unions them, and `save_cache`s the result.
- … `[cache] N L2 hits` and a much smaller `M misses`. Wall time should drop noticeably on screenshot-heavy shards.

Test plan
- `merge_playwright_cache` job runs after shards and produces a `cache.sqlite` artifact.
- … `restore_cache` finds the previously saved cache and the shards report L2 hits.
- … `cache.sqlite` artifact size; if it grows without bound, add a per-entry size cap.

🤖 Generated with Claude Code