Description
⚠️ Warning ⚠️ This issue was written by an AI agent.
I (@samjewell) only supervised it and asked it questions.
Context
I've been analyzing CI performance for the grafana-cube-datasource plugin and found several opportunities to speed things up. Faster CI runs would significantly improve my development workflow, allowing for quicker feedback loops and more efficient iteration.
After analyzing a recent CI run (PR https://github.com/grafana/grafana-cube-datasource/pull/109), I discovered optimization opportunities across the entire CI pipeline. I wanted to share these findings with the team so we can improve CI performance for all plugins using these workflows.
Overall CI Performance Summary
Total CI time: 7.4 minutes (443 seconds)
Key Findings
- Critical path: "Test and build plugin" job takes 209s (47% of total)
  - Backend build: 110s (52.6% of build job)
  - Frontend build: 40s (19.1% of build job)
- Playwright tests: 4 parallel jobs (~3 min each), wall time ~3.6 min
  - Good parallelization (saves ~9 minutes vs sequential)
Build & Test Optimization Opportunities
These optimizations relate to the main CI workflow (.github/workflows/ci.yml):
1. Enable Go Module Caching (HIGH IMPACT - saves ~30-50s)
Problem: Go modules are downloaded every run, even when dependencies haven't changed
Current State: The CI workflow supports go-setup-caching input, but it's not enabled by default
Proposed Solution:
Enable Go module caching by default in the CI workflow, or at minimum document it better so plugin maintainers can easily enable it:
    # In plugin-ci-workflows/.github/workflows/ci.yml
    # The go-setup-caching input should be enabled by default or better documented
For plugin maintainers: They can enable it by adding to their workflow:
    with:
      go-setup-caching: true
Expected Savings: 30-50 seconds per CI run
Location: .github/workflows/ci.yml - Setup step (line ~365-376)
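As a rough sketch of what enabling this could look like from a plugin repository, the caller workflow below passes the input to the shared workflow. The `go-setup-caching` input name comes from the description above; the `grafana/` owner prefix, the `@main` ref, the triggers, and the job name are assumptions for illustration and should be checked against the plugin-ci-workflows documentation.

```yaml
# .github/workflows/ci.yml in the plugin repository (hypothetical caller workflow)
name: CI
on:
  pull_request:
  push:
    branches: [main]

jobs:
  ci:
    # Assumed location and ref of the shared workflow; pin to whatever
    # the plugin-ci-workflows documentation recommends.
    uses: grafana/plugin-ci-workflows/.github/workflows/ci.yml@main
    with:
      # Input named in this issue; lets downloaded Go modules be reused
      # between runs instead of being fetched every time.
      go-setup-caching: true
```

If the default were flipped in the shared workflow, plugin repositories would not need the `with:` entry at all.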
2. Backend Build Optimization (MEDIUM IMPACT - potential savings ~20-40s)
Problem: Backend build takes 110s (52.6% of the build job)
Current State: Go tests and compilation run sequentially
Potential Optimizations:
- Review if all Go tests need to run on every PR
- Consider test parallelization if not already enabled
- Check if there are slow integration tests that could be moved to separate job
- Better Go module caching (covered above)
Expected Savings: 20-40 seconds (depending on test suite)
Location: .github/workflows/ci.yml - Backend test/build step
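One way to explore the "tests and compilation run sequentially" point is to split them into two jobs that run side by side. The fragment below is only a sketch: it uses plain `go` commands as a stand-in for whatever build tooling the shared workflow actually invokes, and the job names, the Go version source (`go.mod`), and the action versions are assumptions.

```yaml
jobs:
  backend-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      # -short only skips tests that check testing.Short() themselves;
      # slow integration tests would need that guard added.
      - run: go test -short ./...

  backend-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      # Compiles every package without running tests.
      - run: go build ./...
```

Two jobs duplicate the checkout and Go setup, so this only pays off if the test and build phases are each long enough to outweigh that overhead.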
4. Frontend Build Optimization (LOW IMPACT - potential savings ~10-20s)
Problem: Frontend build takes 40s (19.1% of build job)
Current State: The workflow should already cache npm dependencies
Potential Optimizations:
- Verify node_modules caching is working properly
- Consider if all frontend tests need to run on every PR
- Optimize webpack/build configuration if possible
Expected Savings: 10-20 seconds
Location: .github/workflows/ci.yml - Frontend test/build step
Playwright E2E Test Optimization Opportunities
Per-job wall time: ~184s (3.1 minutes)
Each Playwright test job currently takes approximately 3 minutes, with the following breakdown:
| Step | Duration | % of Total | Notes |
|---|---|---|---|
| Start Grafana | 74s | 40.2% | |
| Install Playwright Browsers | 31s | 16.8% | Should be cached but might not be working |
| Install npm dependencies | 30s | 16.3% | Reasonable if cache is working |
| Run Playwright tests | 19s | 10.3% | Actual test execution |
| Wait for Grafana to start | 8s | 4.3% | Health check wait |
| Other setup steps | 22s | 12.0% | Checkout, cache checks, etc. |
1. Docker Image Caching (HIGH IMPACT - saves ~50-60s per job)
Problem: Grafana Docker images are being pulled every run (74s - 40% of total time)
The "Start Grafana" step runs:
    docker compose ${DOCKER_COMPOSE_FILE:+-f "$DOCKER_COMPOSE_FILE"} up -d
This pulls the Grafana Docker image from the registry every time, even if it hasn't changed.
Proposed Solutions:
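- Enable Docker BuildKit cache mounts (recommended):

      env:
        DOCKER_BUILDKIT: 1
        COMPOSE_DOCKER_CLI_BUILD: 1

  Note that BuildKit caching applies to image builds rather than to pulls of prebuilt images, so this helps only if the compose file builds an image.
- Pre-pull images with caching: Add a step before "Start Grafana" to pull images with Docker's layer caching:

      - name: Pull Grafana image
        run: |
          docker pull "${GRAFANA_IMAGE}:${GRAFANA_VERSION}" || true
        env:
          GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
          GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}

  Hosted runners typically start without the image, so a pre-pull saves time only when combined with a cache that persists between runs.
- Use Docker registry cache/proxy: Configure a local registry cache to speed up pulls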
Expected Savings: 50-60 seconds per job
Location: .github/workflows/playwright.yml - "Start Grafana" step (line ~215)
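A commonly used way to persist a pulled image between runs is to store it as a tarball with `actions/cache` and restore it with `docker load`. The steps below are a minimal sketch of that idea, not the workflow's current behavior: the step ids, the tarball path, and the `actions/cache@v4` version are assumptions, while the `matrix.GRAFANA_IMAGE` fields are the ones quoted above. Whether loading a tarball beats a registry pull depends on image size and cache download speed, so it would need measuring.

```yaml
- name: Restore cached Grafana image
  id: grafana-image-cache   # hypothetical step id
  uses: actions/cache@v4
  with:
    path: /tmp/grafana-image.tar
    key: grafana-image-${{ matrix.GRAFANA_IMAGE.NAME }}-${{ matrix.GRAFANA_IMAGE.VERSION }}

- name: Load Grafana image from cache
  if: steps.grafana-image-cache.outputs.cache-hit == 'true'
  run: docker load --input /tmp/grafana-image.tar

- name: Pull and save Grafana image on cache miss
  if: steps.grafana-image-cache.outputs.cache-hit != 'true'
  run: |
    docker pull "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
    # Written to the cached path so the post-job step of actions/cache uploads it.
    docker save --output /tmp/grafana-image.tar "${GRAFANA_IMAGE}:${GRAFANA_VERSION}"
  env:
    GRAFANA_IMAGE: ${{ matrix.GRAFANA_IMAGE.NAME }}
    GRAFANA_VERSION: ${{ matrix.GRAFANA_IMAGE.VERSION }}
```

One caveat with this design: the cache key uses the tag, so a mutable tag such as `latest` would keep serving a stale image until the key changes.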
2. Playwright Browser Cache (MEDIUM IMPACT - saves ~20-25s per job)
Problem: Playwright browsers are being installed even though cache is configured (31s - 17% of time)
The workflow currently has:
    - name: Cache Playwright
      uses: actions/cache@...
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ steps.version.outputs.version }}
    - name: Install Playwright Browsers
      run: npx playwright install --with-deps chromium
However, browsers are still being installed every run, suggesting the cache might not be restoring properly.
Proposed Solutions:
- Add conditional installation (this requires the "Cache Playwright" step to declare `id: cache`, which the snippet above does not show):

      - name: Install Playwright Browsers
        if: steps.cache.outputs.cache-hit != 'true'
        run: npx playwright install --with-deps chromium

- Verify cache key stability: Ensure the Playwright version detection is stable
- Add cache hit logging: Add debug output to verify cache is working:

      - name: Check cache status
        run: echo "Cache hit: ${{ steps.cache.outputs.cache-hit }}"
Expected Savings: 20-25 seconds per job (when cache hits)
Location: .github/workflows/playwright.yml - "Cache Playwright" and "Install Playwright Browsers" steps (lines ~167-175)
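Putting the pieces together, a conditional install might look like the sketch below. The step id `playwright-cache` and the `actions/cache@v4` version are assumptions; the cache path and key are the ones quoted above. The cached directory holds only browser binaries, so on a cache hit the OS-level packages that `--with-deps` would have installed still need to be installed separately, here with `npx playwright install-deps`.

```yaml
- name: Cache Playwright
  id: playwright-cache   # hypothetical id so later steps can read cache-hit
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ steps.version.outputs.version }}

- name: Install Playwright browsers and OS dependencies
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

- name: Install only OS dependencies
  # Browser binaries came from the cache; system packages did not.
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium

- name: Check cache status
  run: echo "Cache hit: ${{ steps.playwright-cache.outputs.cache-hit }}"
```

Because the OS dependency step still runs on a cache hit, the realistic saving is the browser download time rather than the full 31s.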
3. npm Dependencies in Playwright (LOW IMPACT - saves ~10-15s per job)
Problem: npm install takes 30s
Proposed Solutions:
- Verify npm cache is working: Ensure actions/setup-node cache is properly configured
- Use npm ci --prefer-offline: If cache exists, prefer offline mode:

      - name: Install npm dependencies
        run: npm ci --prefer-offline || npm ci

- Ensure setup-node cache is enabled: Verify the Node.js setup step has caching enabled
Expected Savings: 10-15 seconds per job
Location: .github/workflows/playwright.yml - npm install step
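For reference, enabling the built-in npm cache in `actions/setup-node` is a one-line input. This is a sketch under assumptions: the `@v4` version and the `.nvmrc` file as the Node version source are illustrative, and the workflow may already configure this differently.

```yaml
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version-file: .nvmrc   # assumed; any supported version source works
    # Caches npm's download cache, keyed on the lockfile.
    # It does not cache node_modules itself.
    cache: npm

- name: Install npm dependencies
  # --prefer-offline uses cached packages without revalidating them and
  # still fetches anything missing from the registry.
  run: npm ci --prefer-offline
```

Since `--prefer-offline` already falls back to the network for missing packages, the `|| npm ci` fallback is likely redundant, though harmless.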
Total Potential Impact
Build & Test Optimizations:
- Go module caching: ~30-50s saved
- npm caching for lockfile check: ~20-30s saved (plugin-specific)
- Backend build optimization: ~20-40s saved (potential)
- Frontend build optimization: ~10-20s saved (potential)
Total build/test savings: ~80-140 seconds per CI run
Playwright E2E Optimizations:
- With Docker caching: ~50-60s saved per job
- With Playwright cache fix: ~20-25s saved per job
- With npm optimization: ~10-15s saved per job
Total Playwright savings: ~80-100 seconds per job (43-54% reduction)
For a typical plugin with 4 parallel Playwright jobs: This would reduce wall time from ~184s to ~84-104s per job
Combined Impact
Total potential savings: ~160-240 seconds per CI run (2.7-4 minutes)
This would reduce total CI time from 7.4 minutes (443s) to approximately 3.4-4.7 minutes (203-283s), assuming the savings above add up along the critical path. That would be a significant improvement for developer productivity!
Recommendations Priority
High Priority:
- Enable Go module caching by default - Easy win, saves 30-50s
- Fix Docker image caching in Playwright - Biggest single win (saves 50-60s per job)
Medium Priority:
- Verify/fix Playwright browser cache - Saves 20-25s per job
- Document npm caching for lockfile checks - Helps plugin maintainers save 20-30s
Low Priority:
- Optimize npm dependency installation in Playwright - Saves 10-15s per job
- Backend/Frontend build optimizations - Requires more investigation
Additional Context
This analysis was done by examining the CI run for grafana-cube-datasource PR #109, which had a total CI time of 7.4 minutes. These optimizations would benefit all plugins using these workflows.
I'm happy to help implement these changes or provide more details if needed!