[Research] CI Workflow Optimization #941

## CI Workflow Optimization Research

**Research date:** 2026-04-04  
**Baseline CI time:** ~25 minutes (latest successful run: 23985278088)  
**Target CI time:** ~6-9 minutes (roughly 65-75% reduction)

This issue consolidates research on CI workflow optimization opportunities for tokmd, building on the testing infrastructure improvements in #915. It analyzes the current state, identifies bottlenecks, and provides concrete recommendations with migration strategies.

---

## Current State Analysis

### CI Workflow Structure (`.github/workflows/ci.yml`)

**15+ jobs running concurrently:**

| Job | Platform | Duration* | Purpose |
|-----|----------|-----------|---------|
| MSRV Check | ubuntu-latest | ~1m | Minimum Rust version compatibility |
| Build & Test | ubuntu-latest | ~9.5m | Build & test all features |
| Build & Test | windows-latest | ~24.8m | Build & test all features |
| Build & Test | macos-latest | ~9.3m | Build & test all features (push only) |
| Feature Boundaries | ubuntu-latest | ~1.2m | Feature flag boundary tests |
| Wasm Compile & Test | ubuntu-latest | ~2.9m | WASM compilation and tests |
| Quality Gate | ubuntu-latest | ~6.7m | Clippy + fmt checks |
| Cargo Deny | ubuntu-latest | ~15s | Dependency security checks |
| Typos | ubuntu-latest | ~5s | Typos linting |
| Proptest Smoke | ubuntu-latest | ~1.7m | Property tests (reduced iterations) |
| Publish Plan | ubuntu-latest | ~35s | Verify publish plan |
| Version consistency | ubuntu-latest | ~27s | Check version alignment |
| Docs Check | ubuntu-latest | ~1m | Documentation drift check |
| Nix PR Package Gate | ubuntu-latest | ~6.1m | Nix flake verification |
| Mutation Testing | ubuntu-latest | N/A | PR-scoped mutation tests |

\*Durations from latest successful run (2026-04-04)

**Total CI time:** ~25 minutes (limited by slowest job: Windows @ 24.8m)
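
The "limited by slowest job" figure follows from a simple model: when jobs start together, wall-clock time is the maximum job duration, not the sum. Below is a minimal Rust sketch of that model using the durations from the table above. Mutation Testing is left out because it has no recorded duration, and the sketch assumes no `needs:` dependencies and no runner queueing, which the workflow file may not guarantee.

```rust
// Job durations in minutes, copied from the table above
// (sub-minute jobs converted from seconds).
const JOBS: &[(&str, f64)] = &[
    ("MSRV Check", 1.0),
    ("Build & Test (ubuntu)", 9.5),
    ("Build & Test (windows)", 24.8),
    ("Build & Test (macos)", 9.3),
    ("Feature Boundaries", 1.2),
    ("Wasm Compile & Test", 2.9),
    ("Quality Gate", 6.7),
    ("Cargo Deny", 0.25),
    ("Typos", 5.0 / 60.0),
    ("Proptest Smoke", 1.7),
    ("Publish Plan", 35.0 / 60.0),
    ("Version consistency", 0.45),
    ("Docs Check", 1.0),
    ("Nix PR Package Gate", 6.1),
];

/// With fully parallel jobs, the slowest job determines wall-clock time.
fn slowest(jobs: &[(&'static str, f64)]) -> Option<(&'static str, f64)> {
    jobs.iter().copied().max_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    let (name, minutes) = slowest(JOBS).expect("job list is not empty");
    println!("critical path: {name} at {minutes:.1}m");

    // The sum is what the run costs in runner minutes, not what a developer waits for.
    let runner_minutes: f64 = JOBS.iter().map(|job| job.1).sum();
    println!("total runner time: {runner_minutes:.1}m");
}
```

Under this model, shortening any job other than the slowest one saves runner minutes but does not shorten feedback time.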

### Current Test Configuration

**Test execution:**
- Command: `cargo test --all-features --verbose`
- Execution: test binaries run one after another (`cargo test` parallelizes only the tests inside each binary)
- Platforms: Ubuntu, Windows, macOS (push only), WASM32
- Workspace: 67 crates, 928 integration test files

**Caching strategy:**
- Rust cache: `Swatinem/rust-cache@v2` (target directory caching)
- Fuzz corpus: GitHub Actions cache (`actions/cache@v5`)
- No test result caching

**Key observations:**
1. **Windows job is the bottleneck** (24.8m vs 9.5m on Ubuntu) - 2.6x slower
2. **Test binaries run sequentially** - no parallelism across test binaries within a job
3. **No test categorization** - all tests run every time
4. **No coverage measurement** - blind to untested code
5. **macOS runs on push only** - good optimization already in place

---

## Optimization Opportunities

### Priority 1: Adopt cargo-nextest (Highest Impact) ⚡

**Expected impact:** 60-70% CI time reduction  
**Estimated effort:** 1-2 days  
**Risk:** Low (well-maintained, near drop-in replacement; doctests still need `cargo test --doc` because nextest does not run them)

#### Current State
```yaml
# Test binaries run one after another
- name: Run tests
  run: cargo test --all-features --verbose
```

#### Recommended Change
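```yaml
# Parallel test execution with nextest
- name: Install cargo-nextest
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-nextest

- name: Run tests (nextest)
  run: cargo nextest run --workspace --all-features --verbose

# nextest does not run doctests, so keep them on cargo test
- name: Run doctests
  run: cargo test --workspace --all-features --doc
```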

#### Additional Optimization: Test Sharding
```yaml
test-ubuntu-sharded:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - name: Run tests (shard ${{ matrix.shard }}/4)
      run: cargo nextest run --workspace --all-features --partition count:${{ matrix.shard }}/4
```
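
To make the sharding idea concrete, here is a simplified Rust model of count-based partitioning: tests are dealt out round-robin, so each shard gets roughly `1/N` of the list and no test runs twice. The function name and test names are hypothetical, and nextest's real assignment may differ in detail; this is only a sketch of the principle.

```rust
/// Returns the tests that shard `shard` (1-based) out of `total` would run,
/// assuming tests are dealt out round-robin in list order.
/// This is an illustrative model, not nextest's actual implementation.
fn shard_tests<'a>(tests: &[&'a str], shard: usize, total: usize) -> Vec<&'a str> {
    assert!(
        total > 0 && (1..=total).contains(&shard),
        "shard must be in 1..=total"
    );
    tests
        .iter()
        .enumerate()
        .filter(|(i, _)| i % total == shard - 1)
        .map(|(_, t)| *t)
        .collect()
}

fn main() {
    // Hypothetical test names, for illustration only.
    let tests = [
        "scan::empty_dir",
        "scan::nested_dirs",
        "format::json",
        "format::markdown",
        "cli::help",
        "cli::version",
    ];
    for shard in 1..=4 {
        println!("shard {shard}/4: {:?}", shard_tests(&tests, shard, 4));
    }
}
```

Each shard still compiles the whole workspace, so sharding shortens the test phase rather than the build phase.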

**Estimated impact:**
- Ubuntu: ~9.5m → ~2.5m (-74%)
- Windows: ~24.8m → ~6.5m (-74%)
- macOS: ~9.3m → ~2.5m (-73%)

**Total CI time:** ~7 minutes (down from ~25m)
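
The percentages above can be re-derived from the before/after estimates. A small Rust sketch of that arithmetic follows; the durations are this document's estimates, not measurements.

```rust
/// Percentage reduction when going from `before` to `after` minutes.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // (label, current minutes, estimated minutes) taken from the estimates above.
    let estimates = [
        ("Ubuntu", 9.5, 2.5),
        ("Windows", 24.8, 6.5),
        ("macOS", 9.3, 2.5),
        ("Total CI time", 25.0, 7.0),
    ];
    for (label, before, after) in estimates {
        println!("{label}: -{:.0}%", reduction_pct(before, after));
    }
}
```

If these estimates hold, the Windows job (~6.5m) would drop below Quality Gate (~6.7m), so Quality Gate becomes the slowest job, which is consistent with the ~7 minute total.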

---

### Priority 2: Optimize Caching Strategy (Medium Impact) 🗄️

**Expected impact:** 2-5 minutes saved per run  
**Estimated effort:** 0.5-1 day  
**Risk:** Low

#### Current State
```yaml
# Only caches compiled artifacts
- uses: Swatinem/rust-cache@v2
  if: runner.os == 'Linux'
  with:
    cache-directories: ${{ runner.temp }}/target
```

#### Recommended Change
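```yaml
# Cache compiled artifacts, plus sccache for compiler-output caching across jobs
- uses: Swatinem/rust-cache@v2
  if: runner.os == 'Linux'
  with:
    cache-directories: ${{ runner.temp }}/target
    cache-on-failure: true
    shared-key: v1-${{ runner.os }}

# Install sccache first; nothing above provides it
# (assumption: installed via install-action, any install method works)
- uses: taiki-e/install-action@v2
  with:
    tool: sccache

# Route rustc through sccache. sccache cannot cache incrementally compiled
# crates, so incremental compilation is turned off.
- name: Configure sccache
  shell: bash  # the default shell on Windows runners does not expand "$GITHUB_ENV"
  run: |
    echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
    echo "CARGO_INCREMENTAL=0" >> "$GITHUB_ENV"
    echo "SCCACHE_CACHE_SIZE=2G" >> "$GITHUB_ENV"
    echo "SCCACHE_DIR=${RUNNER_TEMP}/sccache" >> "$GITHUB_ENV"

- uses: actions/cache@v5
  with:
    path: ${{ runner.temp }}/sccache
    key: sccache-${{ runner.os }}-${{ hashFiles('Cargo.lock') }}
    restore-keys: |
      sccache-${{ runner.os }}-
```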
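
The `key` / `restore-keys` pair is what lets a changed `Cargo.lock` still start from a warm cache. The Rust sketch below models the lookup order: an exact key match first, then the newest entry whose key starts with a restore prefix. It is a simplification (branch scoping and eviction are ignored), and the `Entry` type, `lookup` function, and hash suffixes are hypothetical.

```rust
/// One stored cache entry. A larger `created` value means a newer entry.
#[derive(Debug)]
struct Entry {
    key: &'static str,
    created: u64,
}

/// Simplified model of cache restore: exact key first, then prefix fallback.
fn lookup<'a>(entries: &'a [Entry], key: &str, restore_keys: &[&str]) -> Option<&'a Entry> {
    // 1. An exact match on the primary key wins.
    if let Some(hit) = entries.iter().find(|e| e.key == key) {
        return Some(hit);
    }
    // 2. Otherwise try each restore key in order and take the newest prefix match.
    restore_keys.iter().find_map(|prefix| {
        entries
            .iter()
            .filter(|e| e.key.starts_with(*prefix))
            .max_by_key(|e| e.created)
    })
}

fn main() {
    // Hypothetical cache contents; the suffixes stand in for Cargo.lock hashes.
    let entries = [
        Entry { key: "sccache-Linux-aaa", created: 1 },
        Entry { key: "sccache-Linux-bbb", created: 2 },
        Entry { key: "sccache-Windows-ccc", created: 3 },
    ];
    // Cargo.lock changed, so the exact key misses and the newest Linux entry is used.
    let restored = lookup(&entries, "sccache-Linux-zzz", &["sccache-Linux-"]);
    println!("restored from: {:?}", restored.map(|e| e.key));
}
```

A prefix hit restores a slightly stale cache, so sccache recompiles only the crates whose inputs actually changed.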

**Estimated impact:**
- First run after cache miss: ~2m saved
- Subsequent runs: ~3-5m saved (warm sccache hits for unchanged crates)

---

### Priority 3: Smart Job Distribution (Medium Impact) 🎯

**Expected impact:** Faster PR feedback  
**Estimated effort:** 1-2 days  
**Risk:** Low

#### Current State
Every job except the macOS build (push only) runs on every PR, including the slow Windows tests.

#### Recommended Change
```yaml
# Split into fast path (PR) and full verification (push)
ci-fast:
  # Runs on every PR - fast feedback
  if: github.event_name == 'pull_request'
  runs-on: ubuntu-latest
  steps:
    # Unit tests only (fast)
    - name: Run unit tests
      run: cargo nextest run --workspace --all-features --lib

ci-full:
  # Runs on push - full verification
  if: github.event_name == 'push'
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest, macos-latest]
  steps:
    # Full test suite
    - name: Run all tests
      run: cargo nextest run --workspace --all-features
```

**Estimated impact:**
- PR feedback time: ~2.5m (Ubuntu unit tests only)
- Push verification time: ~7m (full matrix)
- Developer experience: Immediate feedback on PRs

---

### Priority 4: Windows Job Optimization (Medium Impact) 🪟

**Expected impact:** Reduce Windows job from ~25m to ~6m  
**Estimated effort:** 0.5 day  
**Risk:** Low

#### Current State
```yaml
build:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest]
  steps:
    - name: Run tests
      run: cargo test --all-features --verbose
```

#### Recommended Change
```yaml
build:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, windows-latest]
  # Reduce debuginfo for faster compilation.
  # env must sit at job level (or inside a step), not as a sibling of the step list.
  env:
    RUSTFLAGS: -C debuginfo=0
  steps:
    # Use persistent cache on Windows (slower runner)
    - uses: Swatinem/rust-cache@v2
      with:
        cache-on-failure: true
        # rust-cache has no `prefix` input; `key` adds a per-OS suffix
        key: ${{ matrix.os }}

    # Install nextest
    - uses: taiki-e/install-action@v2
      with:
        tool: cargo-nextest

    # Run tests in parallel
    - name: Run tests (nextest)
      run: cargo nextest run --workspace --all-features --verbose
```

**Estimated impact:**
- Windows job: ~24.8m → ~6.5m (-74%)

---

### Priority 5: Test Categorization (Lower Priority) 🏷️

**Expected impact:** Smarter CI with unit/integration/slow tiers  
**Estimated effort:** 3-5 days  
**Risk:** Medium (requires test annotation)

#### Current State
All tests run in every job.

#### Recommended Change
```rust
// Mark slow or expensive tests
#[test]
#[ignore = "slow - run only in nightly CI"]
fn expensive_integration_test() { /* ... */ }

// Mark integration tests (needs tokio's `macros` and `rt` features).
// Test functions only compile under cfg(test), so a plain #[ignore] is enough.
#[tokio::test]
#[ignore = "integration - run in dedicated job"]
async fn api_integration_test() { /* ... */ }
```

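```yaml
# Checkout and tool-install steps omitted for brevity
unit-tests:
  runs-on: ubuntu-latest
  steps:
    - run: cargo nextest run --workspace --all-features --lib

integration-tests:
  needs: unit-tests
  runs-on: ubuntu-latest
  steps:
    # kind(test) selects integration test targets (tests/*.rs)
    - run: cargo nextest run --workspace --all-features -E 'kind(test)' --test-threads=1

slow-tests:
  if: github.event_name == 'schedule' || contains(github.event.head_commit.message, '[run-slow]')
  runs-on: ubuntu-latest
  steps:
    # --run-ignored takes a value; `only` runs just the #[ignore]d tests
    - run: cargo nextest run --workspace --all-features --run-ignored only
```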
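
The `slow-tests` condition is easier to reason about as a plain boolean function. The Rust sketch below mirrors it under one assumption worth checking: `github.event.head_commit` exists only in `push` event payloads, so the `[run-slow]` marker has no effect on `pull_request` runs. The function name is hypothetical.

```rust
/// Mirrors the `if:` expression of the slow-tests job.
/// `head_commit_message` is `None` for events whose payload has no head commit
/// (for example `schedule` and `pull_request`).
fn should_run_slow(event_name: &str, head_commit_message: Option<&str>) -> bool {
    // GitHub's `contains` ignores case, so compare in lowercase.
    event_name == "schedule"
        || head_commit_message
            .map_or(false, |msg| msg.to_lowercase().contains("[run-slow]"))
}

fn main() {
    let cases = [
        ("schedule", None),
        ("push", Some("perf: tune scanner [run-slow]")),
        ("push", Some("docs: fix typo")),
        ("pull_request", None),
    ];
    for (event, message) in cases {
        println!("{event:>12} {message:?} -> {}", should_run_slow(event, message));
    }
}
```

If on-demand slow runs are also wanted for pull requests, a different signal (such as a label) would be needed; that is outside what this issue proposes.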

**Estimated impact:**
- PR feedback: ~2-3m (unit tests only)
- Integration tests: ~4-6m (runs after unit tests pass)
- Slow tests: ~10-15m (runs nightly or on-demand)

---

## Implementation Priority Order

### Phase 1: Quick Wins (Week 1) 🚀

**Impact:** 60-70% CI time reduction  
**Effort:** 1-2 days  
**Risk:** Low

1. ✅ **Adopt cargo-nextest**
   - Replace `cargo test` with `cargo nextest run` in all test jobs
   - Add nextest configuration to `.config/nextest.toml` (nextest does not read `.cargo/config.toml`)
   - Measure baseline test times

2. ✅ **Optimize Windows job**
   - Add `RUSTFLAGS: -C debuginfo=0` environment variable
   - Enable cache-on-failure for Windows runner
   - Test on fork/branch before merging

**Expected outcome:**
- CI time: ~25m → ~7m (-72%)
- Windows job: ~24.8m → ~6.5m (-74%)

### Phase 2: Caching & Smart Jobs (Week 2) 🗄️

**Impact:** 2-5 minutes saved per run + faster PR feedback  
**Effort:** 1-2 days  
**Risk:** Low

1. ✅ **Improve caching strategy**
   - Add sccache for cross-job compilation caching
   - Enable cache-on-failure for all runners
   - Add shared cache keys across jobs

2. ✅ **Split fast path (PR) and full verification (push)**
   - Create `ci-fast` workflow for PRs (unit tests only)
   - Create `ci-full` workflow for pushes (full matrix)
   - Update branch protection rules

**Expected outcome:**
- PR feedback time: ~2.5m (unit tests on Ubuntu)
- Push verification time: ~7m (full matrix)
- Cache hit improvement: 10-20% faster on subsequent runs

### Phase 3: Test Categorization (Week 3-4) 🏷️

**Impact:** Smarter CI with unit/integration/slow tiers  
**Effort:** 3-5 days  
**Risk:** Medium

1. ✅ **Create shared test attributes module**
   - Define `#[slow]`, `#[integration]`, `#[unit]` markers (custom attributes need a proc-macro crate; `#[ignore = "..."]` reasons as in Priority 5 avoid that)
   - Document test categorization guidelines

2. ✅ **Annotate existing tests**
   - Mark slow/integration tests with appropriate attributes
   - Run test categorization audit

3. ✅ **Restructure CI jobs**
   - Split tests into unit/integration/slow jobs
   - Update CI configuration

**Expected outcome:**
- PR feedback: ~2-3m (unit tests only)
- Integration tests: ~4-6m (runs after unit tests pass)
- Slow tests: ~10-15m (runs nightly or on-demand)

### Phase 4: Coverage Measurement (Optional) 📊

**Impact:** Coverage visibility, not CI speed  
**Effort:** 2-3 days  
**Risk:** Low

1. ✅ **Add cargo-tarpaulin or cargo-llvm-cov**
   - Create `.github/workflows/coverage.yml`
   - Configure coverage reporting (Codecov or artifacts)
   - Add coverage badge to README

2. ✅ **Set up coverage gating (optional)**
   - Require minimum coverage threshold (e.g., 80%)
   - Show coverage impact in PRs

**Expected outcome:**
- Coverage visibility: none today → a coverage report on every run
- Coverage trends: Track over time
- PR comments: Show coverage delta

---

## Migration Strategy

### Step 1: Prepare Fork & Test Environment
```bash
# 1. Fork the repository and enter the clone
gh repo fork EffortlessMetrics/tokmd --clone
cd tokmd

# 2. Create feature branch
git checkout -b ci-optimization-phase1

# 3. Install cargo-nextest locally (--locked uses the versions nextest was released with)
cargo install cargo-nextest --locked

# 4. Test nextest locally
cargo nextest run --workspace --all-features
```

### Step 2: Implement Phase 1 (cargo-nextest)
```bash
# 1. Update .github/workflows/ci.yml
# Replace all instances of `cargo test` with `cargo nextest run`
# (add `--profile ci` to pick up the [profile.ci] settings below)

# 2. Create .config/nextest.toml (nextest reads its configuration from here,
#    not from .cargo/config.toml)
mkdir -p .config
cat > .config/nextest.toml << 'EOF'
[profile.default]
slow-timeout = "180s"

[profile.ci]
failure-output = "immediate"
status-level = "pass"
final-status-level = "flaky"
EOF

# 3. Commit changes
git add .github/workflows/ci.yml .config/nextest.toml
git commit -m "ci: adopt cargo-nextest for parallel test execution"

# 4. Push to fork
git push origin ci-optimization-phase1

# 5. Create PR from fork
gh pr create --base main --head ci-optimization-phase1 \
  --title "ci: adopt cargo-nextest for 60-70% CI time reduction" \
  --body "Implements Phase 1 of CI optimization (#XXX). Replaces cargo test with cargo nextest run for parallel test execution."
```

### Step 3: Monitor & Validate
```bash
# 1. Watch CI run on PR
gh run watch

# 2. Check job timings
gh run view <run-id> --json jobs

# 3. Compare with baseline
# Baseline: ~25 minutes
# Target: ~7 minutes (70% reduction)
```

### Step 4: Merge & Roll Forward
```bash
# If CI passes:
# 1. Merge PR
gh pr merge <pr-number> --squash

# 2. Delete branch
git branch -D ci-optimization-phase1

# 3. Start Phase 2 (caching & smart jobs)
git checkout -b ci-optimization-phase2
```

---

## Risk Assessment & Rollback Procedures

### Risk Matrix

| Priority | Change | Risk Level | Mitigation | Rollback |
|----------|--------|------------|------------|----------|
| 1 | cargo-nextest adoption | Low | Well-maintained tool, drop-in replacement | Revert to `cargo test` |
| 2 | Caching strategy | Low | Cache misses are non-blocking | Remove sccache config |
| 3 | Smart job distribution | Low | Full matrix can still be run on demand if a `workflow_dispatch` trigger is added | Merge fast/full paths |
| 4 | Windows optimization | Low | Windows runner is slowest anyway | Remove RUSTFLAGS override |
| 5 | Test categorization | Medium | Requires test annotation effort | Remove `#[ignore]` attributes |

### Rollback Procedures

#### cargo-nextest Rollback
```yaml
# Revert to cargo test
- name: Run tests
  run: cargo test --all-features --verbose
```

#### Caching Rollback
```yaml
# Remove sccache and extra cache config
# Keep basic rust-cache only
- uses: Swatinem/rust-cache@v2
```

#### Smart Job Rollback
```yaml
# Merge fast/full paths back into single job
ci:
  runs-on: ubuntu-latest
  steps:
    - name: Run all tests
      run: cargo nextest run --workspace --all-features
```

#### Test Categorization Rollback
```yaml
# Remove ignore attributes from tests
# Merge unit/integration/slow jobs back into single job
tests:
  runs-on: ubuntu-latest
  steps:
    - run: cargo nextest run --workspace --all-features
```

---

## Expected Outcomes Summary

### CI Time Reduction

| Metric | Current | Phase 1 | Phase 2 | Phase 3 |
|--------|---------|---------|---------|---------|
| PR feedback time | ~25m | ~7m | ~2.5m | ~2-3m |
| Push verification | ~25m | ~7m | ~7m | ~7m |
| Windows job | ~24.8m | ~6.5m | ~6.5m | ~6.5m |
| Ubuntu job | ~9.5m | ~2.5m | ~2.5m | ~2.5m |

**Overall reduction:** ~70% for push verification (~25m → ~7m) and ~90% for PR feedback (~25m → ~2.5m)

### Developer Experience Improvements

- **Faster feedback:** PR results in <3 minutes instead of 25 minutes
- **Better visibility:** Coverage reports show untested code
- **Smarter CI:** Unit tests run first, integration tests after
- **Reduced flakiness:** nextest runs each test in its own process, which isolates tests from each other's global state

### Quality Improvements

- **Parallel test execution:** 3-5x faster on multi-core runners
- **Test timing data:** Nextest shows test times
- **Coverage gating:** Optional minimum coverage threshold
- **Clear categorization:** Tests marked as slow/integration are explicit

---

## References

- Testing infrastructure improvements: #915
- Research baseline: #905
- [cargo-nextest documentation](https://nexte.st/)
- [cargo-tarpaulin documentation](https://github.com/xd009642/tarpaulin)
- [cargo-llvm-cov documentation](https://github.com/taiki-e/cargo-llvm-cov)

---

## Next Actions

### Phase 1 (cargo-nextest) - This Week
- [ ] Create fork and feature branch
- [ ] Install cargo-nextest locally for testing
- [ ] Update `.github/workflows/ci.yml` to use cargo-nextest
- [ ] Create `.config/nextest.toml` with nextest configuration
- [ ] Test on fork/branch before creating PR
- [ ] Create PR with baseline CI time measurements
- [ ] Monitor CI run and compare with baseline

### Phase 2 (caching & smart jobs) - Next Week
- [ ] Add sccache configuration to CI
- [ ] Improve caching strategy across all jobs
- [ ] Create `ci-fast` workflow for PRs (unit tests only)
- [ ] Create `ci-full` workflow for pushes (full matrix)
- [ ] Update branch protection rules if needed
- [ ] Test and validate

### Phase 3 (test categorization) - Following Weeks
- [ ] Create shared test attributes module
- [ ] Run test categorization audit
- [ ] Annotate existing slow/integration tests
- [ ] Restructure CI jobs by test type
- [ ] Update CI configuration
- [ ] Document test categorization guidelines

### Phase 4 (coverage) - Optional
- [ ] Decision: tarpaulin or llvm-cov?
- [ ] Create `.github/workflows/coverage.yml`
- [ ] Set up Codecov or artifact storage
- [ ] Add coverage badge to README
- [ ] Configure coverage gating (optional)

---

## Questions for Review

1. **Should we adopt cargo-nextest?** (High confidence, low risk)
2. **Should we split fast path (PR) and full verification (push)?** (Improves developer experience)
3. **Should we categorize tests into unit/integration/slow tiers?** (Requires annotation effort)
4. **Should we add coverage measurement?** (Improves visibility, not CI speed)
5. **Should we gate on minimum coverage?** (Requires consensus on threshold)

