[Research] CI Workflow Optimization #941

@EffortlessSteven

CI Workflow Optimization Research

Research date: 2026-04-04
Baseline CI time: ~25 minutes (latest successful run: 23985278088)
Target CI time: ~6-9 minutes (roughly 65-75% reduction from the ~25m baseline)

This issue consolidates research on CI workflow optimization opportunities for tokmd, building on the testing infrastructure improvements in #915. It analyzes the current state, identifies bottlenecks, and provides concrete recommendations with migration strategies.


Current State Analysis

CI Workflow Structure (.github/workflows/ci.yml)

15+ jobs run concurrently:

| Job | Platform | Duration* | Purpose |
| --- | --- | --- | --- |
| MSRV Check | ubuntu-latest | ~1m | Minimum Rust version compatibility |
| Build & Test | ubuntu-latest | ~9.5m | Build & test all features |
| Build & Test | windows-latest | ~24.8m | Build & test all features |
| Build & Test | macos-latest | ~9.3m | Build & test all features (push only) |
| Feature Boundaries | ubuntu-latest | ~1.2m | Feature flag boundary tests |
| Wasm Compile & Test | ubuntu-latest | ~2.9m | WASM compilation and tests |
| Quality Gate | ubuntu-latest | ~6.7m | Clippy + fmt checks |
| Cargo Deny | ubuntu-latest | ~15s | Dependency security checks |
| Typos | ubuntu-latest | ~5s | Typos linting |
| Proptest Smoke | ubuntu-latest | ~1.7m | Property tests (reduced iterations) |
| Publish Plan | ubuntu-latest | ~35s | Verify publish plan |
| Version consistency | ubuntu-latest | ~27s | Check version alignment |
| Docs Check | ubuntu-latest | ~1m | Documentation drift check |
| Nix PR Package Gate | ubuntu-latest | ~6.1m | Nix flake verification |
| Mutation Testing | ubuntu-latest | N/A | PR-scoped mutation tests |

*Durations from latest successful run (2026-04-04)

Total CI time: ~25 minutes (limited by slowest job: Windows @ 24.8m)

Current Test Configuration

Test execution:

  • Command: cargo test --all-features --verbose
  • Execution: Test binaries run one after another (cargo test parallelizes only the tests inside each binary)
  • Platforms: Ubuntu, Windows, macOS, WASM32
  • Workspace: 67 crates, 928 integration test files

Caching strategy:

  • Rust cache: Swatinem/rust-cache@v2 (target directory caching)
  • Fuzz corpus: GitHub Actions cache (actions/cache@v5)
  • No test result caching

Key observations:

  1. Windows job is the bottleneck (24.8m vs 9.5m on Ubuntu) - about 2.6x slower
  2. Test binaries run one at a time - cargo test only parallelizes within a binary, so there is no parallelism across the 928 integration test files
  3. No test categorization - all tests run every time
  4. No coverage measurement - blind to untested code
  5. macOS runs on push only - good optimization already in place

Optimization Opportunities

Priority 1: Adopt cargo-nextest (Highest Impact) ⚡

Expected impact: 60-70% CI time reduction
Estimated effort: 1-2 days
Risk: Low (well-maintained, near drop-in replacement; doctests still need cargo test --doc)

Current State

# cargo test: test binaries run one after another
- name: Run tests
  run: cargo test --all-features --verbose

Recommended Change

# Parallel test execution with nextest
- name: Install cargo-nextest
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-nextest

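- name: Run tests (nextest)
  run: cargo nextest run --workspace --all-features --verbose

# nextest does not run doctests, so keep them on cargo test
- name: Run doctests
  run: cargo test --doc --workspace --all-features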

Additional Optimization: Test Sharding

test-ubuntu-sharded:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - name: Run tests (shard ${{ matrix.shard }}/4)
      run: cargo nextest run --workspace --all-features --partition count:${{ matrix.shard }}/4
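
To make the sharding idea concrete, here is a rough shell sketch of count-based partitioning. It assumes tests are dealt into shards round-robin, which is only an illustration of the concept and not nextest's actual code; the test names are hypothetical.

```shell
# Illustrative only: deal test names (one per line on stdin) into N shards
# round-robin and print the names that land in shard M (1-based).
# This mimics the idea behind `--partition count:M/N`; it is not nextest's code.
shard_members() {
  shard="$1"
  total="$2"
  awk -v m="$shard" -v n="$total" '(NR - 1) % n == m - 1'
}

# Hypothetical test names, purely for demonstration.
printf '%s\n' test_a test_b test_c test_d test_e test_f | shard_members 2 4
```

Each test lands in exactly one shard, so the four matrix jobs together cover the whole suite. Note that each shard still pays the full compile cost, so sharding helps most when test execution rather than compilation dominates.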

Estimated impact:

  • Ubuntu: ~9.5m → ~2.5m (-74%)
  • Windows: ~24.8m → ~6.5m (-74%)
  • macOS: ~9.3m → ~2.5m (-73%)

Total CI time: ~7 minutes (down from ~25m)
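
The ~7 minute total follows from the slowest job that remains once the test jobs shrink. A minimal sketch of that arithmetic, assuming the non-test jobs keep the durations from the table above and the test jobs hit the estimates in this section (the "projected" labels are mine):

```shell
# Wall-clock time of a fully parallel workflow is the duration of its slowest job.
# Input: "job name|minutes" lines on stdin. Output: "<minutes> <job name>".
slowest_job() {
  awk -F'|' '$2 + 0 > max { max = $2 + 0; name = $1 } END { print max, name }'
}

slowest_job <<'EOF'
Build & Test (windows, projected)|6.5
Build & Test (ubuntu, projected)|2.5
Quality Gate|6.7
Nix PR Package Gate|6.1
Wasm Compile & Test|2.9
EOF
```

Under these assumptions the Quality Gate job (~6.7m), not the test matrix, would set the floor, so further test speedups alone would not push the total much below ~7 minutes.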


Priority 2: Optimize Caching Strategy (Medium Impact) 🗄️

Expected impact: 2-5 minutes saved per run
Estimated effort: 0.5-1 day
Risk: Low

Current State

# Only caches compiled artifacts
- uses: Swatinem/rust-cache@v2
  if: runner.os == 'Linux'
  with:
    cache-directories: ${{ runner.temp }}/target

Recommended Change

# Cache compiled artifacts + sccache as a shared compiler cache
- uses: Swatinem/rust-cache@v2
  if: runner.os == 'Linux'
  with:
    cache-directories: ${{ runner.temp }}/target
    cache-on-failure: true
    shared-key: v1-${{ runner.os }}

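# Add sccache for cross-job compilation caching
# (the sccache binary must be installed on the runner before this step)
- name: Configure sccache
  shell: bash
  run: |
    echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
    echo "SCCACHE_CACHE_SIZE=2G" >> "$GITHUB_ENV"
    echo "SCCACHE_DIR=${RUNNER_TEMP}/sccache" >> "$GITHUB_ENV"
    # sccache cannot cache crates compiled incrementally
    echo "CARGO_INCREMENTAL=0" >> "$GITHUB_ENV"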

- uses: actions/cache@v5
  with:
    path: ${{ runner.temp }}/sccache
    key: sccache-${{ runner.os }}-${{ hashFiles('Cargo.lock') }}
    restore-keys: |
      sccache-${{ runner.os }}-

Estimated impact:

  • First run after cache miss: ~2m saved
  • Subsequent runs: ~3-5m saved (compiler cache hits)
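
To illustrate how the key / restore-keys pair in the sccache cache step behaves after Cargo.lock changes, here is a small shell sketch of the lookup order: exact key first, then the newest cache whose key starts with the restore prefix. The key values are hypothetical, and the function only models the lookup order; it is not how actions/cache is implemented.

```shell
# Sketch of cache lookup. Input on stdin: existing cache keys, newest first.
# $1 is the exact key for this run, $2 is the restore-keys prefix.
pick_cache() {
  key="$1"
  prefix="$2"
  awk -v key="$key" -v prefix="$prefix" '
    $0 == key { exact = $0 }
    fallback == "" && index($0, prefix) == 1 { fallback = $0 }
    END {
      if (exact != "") print "hit", exact
      else if (fallback != "") print "partial", fallback
      else print "miss"
    }'
}

# Cargo.lock changed, so the exact key (hash ddd444) does not exist yet.
printf '%s\n' sccache-Linux-aaa111 sccache-Linux-bbb222 sccache-Windows-ccc333 |
  pick_cache sccache-Linux-ddd444 sccache-Linux-
```

In the "partial" case the job starts from the most recent Linux cache, so only crates affected by the lockfile change should need recompiling.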

Priority 3: Smart Job Distribution (Medium Impact) 🎯

Expected impact: Faster PR feedback
Estimated effort: 1-2 days
Risk: Low

Current State

All jobs except macOS (already push-only) run on every PR, including the slow Windows tests.

Recommended Change

# Split into fast path (PR) and full verification (push)
ci-fast:
  # Runs on every PR - fast feedback
  if: github.event_name == 'pull_request'
  runs-on: ubuntu-latest
  steps:
    # Unit tests only (fast)
    - name: Run unit tests
      run: cargo nextest run --workspace --all-features --lib

ci-full:
  # Runs on push - full verification
  if: github.event_name == 'push'
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest, macos-latest]
  steps:
    # Full test suite
    - name: Run all tests
      run: cargo nextest run --workspace --all-features

Estimated impact:

  • PR feedback time: ~2.5m (Ubuntu unit tests only)
  • Push verification time: ~7m (full matrix)
  • Developer experience: PR feedback in ~2.5m instead of ~25m

Priority 4: Windows Job Optimization (Medium Impact) 🪟

Expected impact: Reduce Windows job from ~24.8m to ~6.5m
Estimated effort: 0.5 day
Risk: Low

Current State

build:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest]
  steps:
    - name: Run tests
      run: cargo test --all-features --verbose

Recommended Change

build:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest]
  # Reduce debuginfo for faster compilation (job-level env applies to every step)
  env:
    RUSTFLAGS: -C debuginfo=0
  steps:
    # Use persistent cache on Windows (slower runner)
    - uses: Swatinem/rust-cache@v2
      with:
        cache-on-failure: true
        prefix-key: ${{ matrix.os }}

    # Install nextest
    - uses: taiki-e/install-action@v2
      with:
        tool: cargo-nextest

    # Run tests in parallel
    - name: Run tests (nextest)
      run: cargo nextest run --workspace --all-features --verbose

Estimated impact:

  • Windows job: ~24.8m → ~6.5m (-74%)

Priority 5: Test Categorization (Lower Priority) 🏷️

Expected impact: Smarter CI with unit/integration/slow tiers
Estimated effort: 3-5 days
Risk: Medium (requires test annotation)

Current State

All tests run in every job.

Recommended Change

// Mark slow or expensive tests
#[test]
#[ignore = "slow - run only in nightly CI"]
fn expensive_integration_test() { /* ... */ }

// Mark integration tests
#[tokio::test]
#[ignore = "integration - run in dedicated job"]
async fn api_integration_test() { /* ... */ }

unit-tests:
  runs-on: ubuntu-latest
  steps:
    - run: cargo nextest run --workspace --all-features --lib

integration-tests:
  needs: unit-tests
  runs-on: ubuntu-latest
  steps:
    # Integration test targets (tests/*.rs) only; #[ignore]d tests are skipped here
    - run: cargo nextest run --workspace --all-features --test '*' --test-threads=1

slow-tests:
  if: github.event_name == 'schedule' || contains(github.event.head_commit.message, '[run-slow]')
  runs-on: ubuntu-latest
  steps:
    # --run-ignored takes a value; "all" runs ignored and non-ignored tests
    - run: cargo nextest run --workspace --all-features --run-ignored all

Estimated impact:

  • PR feedback: ~2-3m (unit tests only)
  • Integration tests: ~4-6m (runs after unit tests pass)
  • Slow tests: ~10-15m (runs nightly or on-demand)

Implementation Priority Order

Phase 1: Quick Wins (Week 1) 🚀

Impact: 60-70% CI time reduction
Effort: 1-2 days
Risk: Low

  1. Adopt cargo-nextest

    • Replace cargo test with cargo nextest run in all test jobs
    • Add nextest configuration to .config/nextest.toml
    • Measure baseline test times
  2. Optimize Windows job

    • Add RUSTFLAGS: -C debuginfo=0 environment variable
    • Enable cache-on-failure for Windows runner
    • Test on fork/branch before merging

Expected outcome:

  • CI time: ~25m → ~7m (-72%)
  • Windows job: ~24.8m → ~6.5m (-74%)

Phase 2: Caching & Smart Jobs (Week 2) 🗄️

Impact: 2-5 minutes saved per run + faster PR feedback
Effort: 1-2 days
Risk: Low

  1. Improve caching strategy

    • Add sccache for cross-job compilation caching
    • Enable cache-on-failure for all runners
    • Add shared cache keys across jobs
  2. Split fast path (PR) and full verification (push)

    • Create ci-fast workflow for PRs (unit tests only)
    • Create ci-full workflow for pushes (full matrix)
    • Update branch protection rules

Expected outcome:

  • PR feedback time: ~2.5m (unit tests on Ubuntu)
  • Push verification time: ~7m (full matrix)
  • Cache hit improvement: 10-20% faster on subsequent runs

Phase 3: Test Categorization (Week 3-4) 🏷️

Impact: Smarter CI with unit/integration/slow tiers
Effort: 3-5 days
Risk: Medium

  1. Create shared test attributes module

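    • Define #[slow], #[integration], #[unit] attributes (custom attributes need a proc-macro crate; standardized #[ignore = "..."] reasons are a lighter alternative)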
    • Document test categorization guidelines
  2. Annotate existing tests

    • Mark slow/integration tests with appropriate attributes
    • Run test categorization audit
  3. Restructure CI jobs

    • Split tests into unit/integration/slow jobs
    • Update CI configuration
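
The categorization audit could start from a simple tally of existing annotations. A minimal sketch, assuming tests follow the ignore = "<category> - <reason>" convention used in the Priority 5 example; the function name is hypothetical:

```shell
# Tally ignore = "<category> - <reason>" annotations in *.rs files under a
# directory, grouped by the category word before " - ".
# Prints "<count> <category>" lines, sorted by category.
ignored_test_tally() {
  dir="$1"
  find "$dir" -type f -name '*.rs' -exec grep -ohE 'ignore = "[^"]*"' {} + |
    awk -F'"' '{ split($2, parts, " - "); tally[parts[1]]++ }
               END { for (k in tally) print tally[k], k }' |
    sort -k2
}

# Usage, from the workspace root:
#   ignored_test_tally .
```

Running it before and after the annotation pass gives a quick check that every ignored test carries one of the agreed category prefixes.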

Expected outcome:

  • PR feedback: ~2-3m (unit tests only)
  • Integration tests: ~4-6m (runs after unit tests pass)
  • Slow tests: ~10-15m (runs nightly or on-demand)

Phase 4: Coverage Measurement (Optional) 📊

Impact: Coverage visibility, not CI speed
Effort: 2-3 days
Risk: Low

  1. Add cargo-tarpaulin or cargo-llvm-cov

    • Create .github/workflows/coverage.yml
    • Configure coverage reporting (Codecov or artifacts)
    • Add coverage badge to README
  2. Set up coverage gating (optional)

    • Require minimum coverage threshold (e.g., 80%)
    • Show coverage impact in PRs

Expected outcome:

  • Coverage visibility: none → reported for every run
  • Coverage trends: Track over time
  • PR comments: Show coverage delta

Migration Strategy

Step 1: Prepare Fork & Test Environment

# 1. Fork the repository
gh repo fork EffortlessMetrics/tokmd --clone
cd tokmd

# 2. Create feature branch
git checkout -b ci-optimization-phase1

# 3. Install cargo-nextest locally
cargo install cargo-nextest --locked

# 4. Test nextest locally
cargo nextest run --workspace --all-features

Step 2: Implement Phase 1 (cargo-nextest)

# 1. Update .github/workflows/ci.yml
# Replace all instances of `cargo test` with `cargo nextest run`

# 2. Create .config/nextest.toml (nextest reads its config from here, not .cargo/config.toml)
mkdir -p .config
cat > .config/nextest.toml << 'EOF'
[profile.default]
slow-timeout = "180s"

[profile.ci]
failure-output = "immediate"
status-level = "pass"
final-status-level = "flaky"
EOF
# Select this profile in CI with: cargo nextest run --profile ci

# 3. Commit changes
git add .github/workflows/ci.yml .config/nextest.toml
git commit -m "ci: adopt cargo-nextest for parallel test execution"

# 4. Push to fork
git push origin ci-optimization-phase1

# 5. Create PR from fork
gh pr create --base main --head ci-optimization-phase1 \
  --title "ci: adopt cargo-nextest for 60-70% CI time reduction" \
  --body "Implements Phase 1 of CI optimization (#XXX). Replaces cargo test with cargo nextest run for parallel test execution."

Step 3: Monitor & Validate

# 1. Watch CI run on PR
gh run watch

# 2. Check job timings
gh run view <run-id> --json jobs

# 3. Compare with baseline
# Baseline: ~25 minutes
# Target: ~7 minutes (70% reduction)
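
For the baseline comparison, a small helper can turn two measured durations into a reduction percentage. This is a sketch with a hypothetical function name; the inputs are the figures quoted in this issue.

```shell
# Percentage reduction from a baseline duration to a measured one (minutes).
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

pct_reduction 25 7      # whole workflow: prints 72
pct_reduction 24.8 6.5  # Windows job: prints 74
```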

Step 4: Merge & Roll Forward

# If CI passes:
# 1. Merge PR
gh pr merge <pr-number> --squash

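# 2. Switch back to main, update it, and delete the merged branch
#    (assumes the `upstream` remote that `gh repo fork --clone` sets up)
git checkout main
git pull upstream main
git branch -D ci-optimization-phase1

# 3. Start Phase 2 (caching & smart jobs)
git checkout -b ci-optimization-phase2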

Risk Assessment & Rollback Procedures

Risk Matrix

| Priority | Change | Risk Level | Mitigation | Rollback |
| --- | --- | --- | --- | --- |
| 1 | cargo-nextest adoption | Low | Well-maintained tool, near drop-in replacement | Revert to cargo test |
| 2 | Caching strategy | Low | Cache misses are non-blocking | Remove sccache config |
| 3 | Smart job distribution | Low | Full matrix can still be triggered manually via workflow_dispatch | Merge fast/full paths |
| 4 | Windows optimization | Low | Windows runner is slowest anyway | Remove RUSTFLAGS override |
| 5 | Test categorization | Medium | Requires test annotation effort | Remove #[ignore] attributes |

Rollback Procedures

cargo-nextest Rollback

# Revert to cargo test
- name: Run tests
  run: cargo test --all-features --verbose

Caching Rollback

# Remove sccache and extra cache config
# Keep basic rust-cache only
- uses: Swatinem/rust-cache@v2

Smart Job Rollback

# Merge fast/full paths back into single job
ci:
  runs-on: ubuntu-latest
  steps:
    - name: Run all tests
      run: cargo nextest run --workspace --all-features

Test Categorization Rollback

# Remove ignore attributes from tests
# Merge unit/integration/slow jobs back into single job
tests:
  run: cargo nextest run --workspace --all-features

Expected Outcomes Summary

CI Time Reduction

| Metric | Current | Phase 1 | Phase 2 | Phase 3 |
| --- | --- | --- | --- | --- |
| PR feedback time | ~25m | ~7m | ~2.5m | ~2-3m |
| Push verification | ~25m | ~7m | ~7m | ~7m |
| Windows job | ~24.8m | ~6.5m | ~6.5m | ~6.5m |
| Ubuntu job | ~9.5m | ~2.5m | ~2.5m | ~2.5m |

Overall reduction: ~72% for push verification (~25m → ~7m) and ~90% for PR feedback (~25m → ~2.5m)

Developer Experience Improvements

  • Faster feedback: PR results in <3 minutes instead of 25 minutes
  • Better visibility: Coverage reports show untested code
  • Smarter CI: Unit tests run first, integration tests after
  • Reduced flakiness: nextest runs each test in its own process, which isolates tests from one another

Quality Improvements

  • Parallel test execution: 3-5x faster on multi-core runners
  • Test timing data: Nextest shows test times
  • Coverage gating: Optional minimum coverage threshold
  • Clear categorization: Tests marked as slow/integration are explicit

Next Actions

Phase 1 (cargo-nextest) - This Week

  • Create fork and feature branch
  • Install cargo-nextest locally for testing
  • Update .github/workflows/ci.yml to use cargo-nextest
  • Create .config/nextest.toml with nextest configuration
  • Test on fork/branch before creating PR
  • Create PR with baseline CI time measurements
  • Monitor CI run and compare with baseline

Phase 2 (caching & smart jobs) - Next Week

  • Add sccache configuration to CI
  • Improve caching strategy across all jobs
  • Create ci-fast workflow for PRs (unit tests only)
  • Create ci-full workflow for pushes (full matrix)
  • Update branch protection rules if needed
  • Test and validate

Phase 3 (test categorization) - Following Weeks

  • Create shared test attributes module
  • Run test categorization audit
  • Annotate existing slow/integration tests
  • Restructure CI jobs by test type
  • Update CI configuration
  • Document test categorization guidelines

Phase 4 (coverage) - Optional

  • Decision: tarpaulin or llvm-cov?
  • Create .github/workflows/coverage.yml
  • Set up Codecov or artifact storage
  • Add coverage badge to README
  • Configure coverage gating (optional)

Questions for Review

  1. Should we adopt cargo-nextest? (High confidence, low risk)
  2. Should we split fast path (PR) and full verification (push)? (Improves developer experience)
  3. Should we categorize tests into unit/integration/slow tiers? (Requires annotation effort)
  4. Should we add coverage measurement? (Improves visibility, not CI speed)
  5. Should we gate on minimum coverage? (Requires consensus on threshold)
