Optimize Playwright E2E test execution time - Docker caching and browser cache improvements #405

@samjewell

⚠️Warning⚠️ This issue was written by an AI Agent

I (@samjewell) only supervised it and asked it questions.

Context

I've been analyzing CI performance for the grafana-cube-datasource plugin and found several opportunities to speed things up. Faster CI runs would significantly improve my development workflow, allowing for quicker feedback loops and more efficient iteration.

After analyzing a recent CI run (PR https://github.com/grafana/grafana-cube-datasource/pull/109), I discovered optimization opportunities across the entire CI pipeline. I wanted to share these findings with the team so we can improve CI performance for all plugins using these workflows.

Overall CI Performance Summary

Total CI time: 7.4 minutes (443 seconds)

Key Findings

  1. Critical path: "Test and build plugin" job takes 209s (47% of total)

    • Backend build: 110s (52.6% of build job)
    • Frontend build: 40s (19.1% of build job)
  2. Playwright tests: 4 parallel jobs (~3 min each), wall time ~3.6 min

    • Good parallelization (saves ~9 minutes vs sequential)

Build & Test Optimization Opportunities

These optimizations relate to the main CI workflow (.github/workflows/ci.yml):

1. Enable Go Module Caching (HIGH IMPACT - saves ~30-50s)

Problem: Go modules are downloaded every run, even when dependencies haven't changed

Current State: The CI workflow supports go-setup-caching input, but it's not enabled by default

Proposed Solution:
Enable Go module caching by default in the CI workflow, or at minimum document it better so plugin maintainers can easily enable it:

# In plugin-ci-workflows/.github/workflows/ci.yml
# The go-setup-caching input should be enabled by default or better documented

For plugin maintainers: They can enable it by adding to their workflow:

with:
  go-setup-caching: true
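
For context, the `with:` block above belongs on the job that calls the shared workflow. A minimal sketch of such a caller follows; the file name, the triggers, and the `@main` ref are assumptions for illustration, and only the `go-setup-caching` input comes from this issue.

```yaml
# .github/workflows/push.yml in the plugin repository (file name is an assumption)
name: CI
on:
  pull_request:
  push:
    branches: [main]

jobs:
  ci:
    # Assumed reference to the shared workflow; keep whatever ref the plugin already pins.
    uses: grafana/plugin-ci-workflows/.github/workflows/ci.yml@main
    with:
      # Input named in this issue; per the issue it is off unless set explicitly.
      go-setup-caching: true
```

Comparing the setup step's duration in one run before and one run after the change is the simplest way to confirm whether the 30-50s estimate holds for a given plugin.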

Expected Savings: 30-50 seconds per CI run

Location: .github/workflows/ci.yml - Setup step (lines ~365-376)

2. Backend Build Optimization (MEDIUM IMPACT - potential savings ~20-40s)

Problem: Backend build takes 110s (52.6% of the build job)

Current State: Go tests and compilation run sequentially

Potential Optimizations:

  • Review whether all Go tests need to run on every PR
  • Consider test parallelization if not already enabled
  • Check whether there are slow integration tests that could be moved to a separate job
  • Better Go module caching (covered above)
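
One way to act on the bullets above is to keep fast tests on every PR and move slow ones into their own job. The sketch below is an illustration under stated assumptions, not the shared workflow's actual layout: the job names, the action versions, and the `Integration` test-name convention are all hypothetical, and it presumes slow tests either check `testing.Short()` or have `Integration` in their names.

```yaml
jobs:
  backend-unit:            # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4      # assumed version
      - uses: actions/setup-go@v5      # assumed version
        with:
          go-version-file: go.mod
      # -short lets tests that check testing.Short() skip their slow paths.
      - run: go test -short ./...

  backend-integration:     # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      # Runs only tests whose names match the regular expression "Integration".
      - run: go test -run Integration ./...
```

On parallelization: `go test` already builds and tests multiple packages in parallel by default, while tests inside a single package run concurrently only if they call `t.Parallel()`. Any gain therefore depends on how the plugin's tests are written.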

Expected Savings: 20-40 seconds (depending on test suite)

Location: .github/workflows/ci.yml - Backend test/build step

4. Frontend Build Optimization (LOW IMPACT - potential savings ~10-20s)

Problem: Frontend build takes 40s (19.1% of build job)

Current State: The workflow should already cache npm dependencies

Potential Optimizations:

  • Verify node_modules caching is working properly
  • Consider if all frontend tests need to run on every PR
  • Optimize webpack/build configuration if possible

Expected Savings: 10-20 seconds

Location: .github/workflows/ci.yml - Frontend test/build step


Playwright E2E Test Optimization Opportunities

Per-job wall time: ~184s (3.1 minutes)

Each Playwright test job currently takes approximately 3 minutes, with the following breakdown:

| Step | Duration | % of Total | Notes |
|---|---|---|---|
| Start Grafana | 74s | 40.2% | ⚠️ BIGGEST BOTTLENECK - Pulling Docker image |
| Install Playwright Browsers | 31s | 16.8% | Should be cached but might not be working |
| Install npm dependencies | 30s | 16.3% | Reasonable if cache is working |
| Run Playwright tests | 19s | 10.3% | Actual test execution |
| Wait for Grafana to start | 8s | 4.3% | Health check wait |
| Other setup steps | 22s | 12.0% | Checkout, cache checks, etc. |

1. Docker Image Caching (HIGH IMPACT - saves ~50-60s per job)

Problem: Grafana Docker images are being pulled every run (74s - 40% of total time)

The "Start Grafana" step runs:

docker compose ${DOCKER_COMPOSE_FILE:+-f "$DOCKER_COMPOSE_FILE"} up -d

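This pulls the Grafana Docker image from the registry on every run, even if the image hasn't changed, presumably because each job starts on a fresh runner that does not yet have the image in its local store.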

Proposed Solutions:

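  1. Enable Docker BuildKit (note: BuildKit speeds up image builds and does not cache registry pulls, so this helps only if the compose file builds an image rather than pulling one):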

    env:
      DOCKER_BUILDKIT: 1
      COMPOSE_DOCKER_CLI_BUILD: 1
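  2. Pre-pull images:
    Add a step before "Start Grafana" that pulls the image explicitly. On its own this only separates pull time from startup time in the logs; it saves time only if the runner already holds the image layers (for example, restored from a cache, or on a self-hosted runner):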

    - name: Pull Grafana image
      run: |
        docker pull ${GRAFANA_IMAGE}:${GRAFANA_VERSION} || true
      env:
        GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
        GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}
  3. Use Docker registry cache/proxy: Configure a local registry cache to speed up pulls
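
A plain `docker pull` on a fresh runner has no local layers to reuse, so one possible variant of solution 2 is to store the image as a tarball with `actions/cache` and load it before "Start Grafana". The following is a sketch under assumptions, not a tested change to the workflow: the step names, the step `id`, the tarball path, and the `actions/cache` version are hypothetical, while the `matrix.GRAFANA_IMAGE` fields are taken from the snippet above.

```yaml
- name: Restore cached Grafana image
  id: grafana-image-cache            # hypothetical step id
  uses: actions/cache@v4             # assumed version; match the ref the workflow already pins
  with:
    path: /tmp/grafana-image.tar     # hypothetical tarball location
    key: grafana-image-${{ matrix.GRAFANA_IMAGE.NAME }}-${{ matrix.GRAFANA_IMAGE.VERSION }}

- name: Load Grafana image from cache
  if: steps.grafana-image-cache.outputs.cache-hit == 'true'
  run: docker load --input /tmp/grafana-image.tar

- name: Pull and save Grafana image on cache miss
  if: steps.grafana-image-cache.outputs.cache-hit != 'true'
  run: |
    docker pull "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
    # The saved tarball is uploaded by the cache action when the job finishes.
    docker save --output /tmp/grafana-image.tar "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
  env:
    GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
    GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}
```

Two caveats apply. A floating tag such as `latest` would keep serving a stale cached image because the key never changes, so the key may need a date or digest component. Restoring and loading a large tarball is also not guaranteed to beat a registry pull, so the 50-60s estimate should be measured rather than assumed.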

Expected Savings: 50-60 seconds per job

Location: .github/workflows/playwright.yml - "Start Grafana" step (line ~215)

2. Playwright Browser Cache (MEDIUM IMPACT - saves ~20-25s per job)

Problem: Playwright browsers are being installed even though cache is configured (31s - 17% of time)

The workflow currently has:

- name: Cache Playwright
  uses: actions/cache@...
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ steps.version.outputs.version }}

- name: Install Playwright Browsers
  run: npx playwright install --with-deps chromium

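However, the install step still runs on every job. The snippet above has no condition tying it to the cache result, and `--with-deps` also installs system packages that live outside `~/.cache/ms-playwright`, so part of the 31s may be spent even when the browser cache restores correctly.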

Proposed Solutions:

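  1. Add conditional installation (this requires an `id: cache` on the "Cache Playwright" step, which the snippet above does not show):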

    - name: Install Playwright Browsers
      if: steps.cache.outputs.cache-hit != 'true'
      run: npx playwright install --with-deps chromium
  2. Verify cache key stability: Ensure the Playwright version detection is stable

  3. Add cache hit logging: Add debug output to verify cache is working:

    - name: Check cache status
      run: echo "Cache hit: ${{ steps.cache.outputs.cache-hit }}"
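
Putting solutions 1 and 3 together, the cache and install steps might look like the sketch below. It assumes the cache step is given `id: cache` so that `steps.cache.outputs.cache-hit` resolves, and it keeps the `actions/cache@...` ref elided exactly as in the snippet above. The third step is an addition for illustration: the cached directory holds only browser binaries, so system packages still have to be installed on a cache hit.

```yaml
- name: Cache Playwright
  id: cache                          # assumed id; the conditions below depend on it
  uses: actions/cache@...            # ref elided in the issue; keep the workflow's existing pin
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ steps.version.outputs.version }}

- name: Check cache status
  run: echo "Cache hit: ${{ steps.cache.outputs.cache-hit }}"

# Cache miss: download the browser and install its system dependencies.
- name: Install Playwright Browsers
  if: steps.cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

# Cache hit: the browser binaries are restored, but OS packages are not cached.
- name: Install Playwright system dependencies
  if: steps.cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium
```

If most of the 31s turns out to be system package installation rather than the browser download, the saving from the cache alone will be smaller than estimated.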

Expected Savings: 20-25 seconds per job (when cache hits)

Location: .github/workflows/playwright.yml - "Cache Playwright" and "Install Playwright Browsers" steps (lines ~167-175)

3. npm Dependencies in Playwright (LOW IMPACT - saves ~10-15s per job)

Problem: npm install takes 30s

Proposed Solutions:

  1. Verify npm cache is working: Ensure actions/setup-node cache is properly configured

  2. Use npm ci --prefer-offline: If cache exists, prefer offline mode:

    - name: Install npm dependencies
      run: npm ci --prefer-offline || npm ci
  3. Ensure setup-node cache is enabled: Verify the Node.js setup step has caching enabled
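
As a rough illustration of solutions 1 to 3, the Node.js setup and install steps could be combined as below. The `actions/setup-node` version and the `.nvmrc` file are assumptions; `cache` and `node-version-file` are existing inputs of that action.

```yaml
- name: Setup Node.js
  uses: actions/setup-node@v4        # assumed version
  with:
    node-version-file: .nvmrc        # assumption: the plugin keeps its Node version in .nvmrc
    cache: npm                       # caches npm's download cache, keyed on the lockfile

- name: Install npm dependencies
  run: npm ci --prefer-offline || npm ci
```

Note that `cache: npm` stores npm's global download cache rather than `node_modules`, and `npm ci` always removes and reinstalls `node_modules`. The cache therefore shortens the download phase only, which is consistent with the modest 10-15s estimate.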

Expected Savings: 10-15 seconds per job

Location: .github/workflows/playwright.yml - npm install step

Total Potential Impact

Build & Test Optimizations:

  • Go module caching: ~30-50s saved
  • npm caching for lockfile check: ~20-30s saved (plugin-specific)
  • Backend build optimization: ~20-40s saved (potential)
  • Frontend build optimization: ~10-20s saved (potential)

Total build/test savings: ~80-140 seconds per CI run

Playwright E2E Optimizations:

  • With Docker caching: ~50-60s saved per job
  • With Playwright cache fix: ~20-25s saved per job
  • With npm optimization: ~10-15s saved per job

Total Playwright savings: ~80-100 seconds per job (43-54% reduction)

For a typical plugin with 4 parallel Playwright jobs, this would reduce wall time from ~184s to ~84-104s per job.

Combined Impact

Total potential savings: ~160-240 seconds per CI run (2.7-4 minutes)

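This would reduce total CI time from 7.4 minutes (443s) to approximately 3.4-4.7 minutes (203-283s), assuming every estimated saving is realized and falls on the critical path. That would be a significant improvement for developer productivity.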

Recommendations Priority

High Priority:

  1. Enable Go module caching by default - Easy win, saves 30-50s
  2. Fix Docker image caching in Playwright - Biggest single win (saves 50-60s per job)

Medium Priority:

  1. Verify/fix Playwright browser cache - Saves 20-25s per job
  2. Document npm caching for lockfile checks - Helps plugin maintainers save 20-30s

Low Priority:

  1. Optimize npm dependency installation in Playwright - Saves 10-15s per job
  2. Backend/Frontend build optimizations - Requires more investigation

Additional Context

This analysis was done by examining the CI run for grafana-cube-datasource PR #109, which had a total CI time of 7.4 minutes. These optimizations would benefit all plugins using these workflows.

I'm happy to help implement these changes or provide more details if needed!

Metadata

Labels: enhancement (New feature or request)

Status: 💡 Ideation