[Research] Testing Infrastructure Improvements

## Overview

This issue summarizes research findings on tokmd's testing infrastructure and recommends concrete improvements to reduce CI time, increase test coverage visibility, and enhance automation.

## Current CI/Test Setup Inventory

### CI Configuration (.github/workflows/ci.yml)

**Jobs (15+):**
- MSRV Check (ubuntu-latest)
- Build & Test (ubuntu-latest, windows-latest - matrix)
- Build & Test (macos-latest - push only)
- Feature Boundaries (ubuntu-latest)
- Wasm Compile & Test (ubuntu-latest)
- Quality Gate (ubuntu-latest)
- Cargo Deny (ubuntu-latest)
- Typos (ubuntu-latest)
- Proptest Smoke (ubuntu-latest)
- Publish Plan (ubuntu-latest)
- Version Consistency (ubuntu-latest)
- Docs Check (ubuntu-latest)
- Nix PR Package Gate (ubuntu-latest)
- Mutation Testing (Required, PR only)
- CI (Required - aggregates all above)

**Additional Workflows:**
- `test-action.yml` - GitHub action testing
- `mutants.yml` - Full mutation testing (on-demand)
- `fuzz.yml` - Nightly fuzz testing (9 targets)
- `cockpit.yml` - PR cockpit report generation

### Test Setup Across Crates

**Scale:**
- 67 workspace members (crates)
- 57 test directories (`crates/*/tests/`)
- 928 integration test files (`crates/*/tests/*.rs`)

**Testing Tools:**
- **Property testing**: proptest (256 cases, 10s timeout per case)
- **Snapshot testing**: insta v1.47.0 (configured in 14+ crates)
- **Mutation testing**: cargo-mutants v26.1.2 (PR-scoped, changed files only)
- **Fuzz testing**: cargo-fuzz (nightly builds, 9 targets)

**Test Types:**
- Unit tests (inline in `src/`)
- Integration tests (`tests/` directories)
- Property-based tests (`tests/properties.rs`)
- Snapshot tests (insta assertions)
- Fuzz targets (`fuzz/` directory)

### Current Test Execution

**Command:** `cargo test --all-features --verbose`

**Platforms:**
- Ubuntu latest (primary)
- Windows latest (matrix)
- macOS latest (push only, slower runner)
- Wasm32 (via wasm-pack)

**Performance:**
- Test binaries run one at a time (`cargo test` parallelizes only the tests inside each binary)
- No test categorization (unit vs integration vs slow)
- No coverage measurement

## Identified Bottlenecks and Gaps

### 1. Test Parallelization (High Impact)

**Problem:**
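- By default each file under `crates/*/tests/` compiles to its own test binary, so the 928 integration test files yield a very large number of binaries per platform
- `cargo test` runs the tests inside one binary on multiple threads, but it executes the binaries one after another
- Large test suite = longer CI time, especially on slower runners (macOS)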

**Evidence:**
```yaml
# Current CI runs a single `cargo test` invocation per platform (3 platforms)
- name: Run tests
  run: cargo test --all-features --verbose
```
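
A rough model shows why the execution strategy matters. The sketch below compares the summed time of jobs run one after another with the makespan when the same jobs are spread over a fixed number of workers (longest job first, always to the least-loaded worker). The durations and the worker count are invented inputs, and the scheduling rule is a textbook heuristic chosen for illustration; it is not a description of how `cargo test` or cargo-nextest schedules work.

```rust
/// Total wall time when jobs run one after another.
fn serial_time(durations: &[u64]) -> u64 {
    durations.iter().sum()
}

/// Makespan when each job goes, longest first, to the least-loaded worker.
fn pooled_time(durations: &[u64], workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let mut sorted = durations.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let mut load = vec![0u64; workers];
    for d in sorted {
        // Pick the worker with the smallest accumulated load so far.
        let idx = (0..workers).min_by_key(|&i| load[i]).unwrap();
        load[idx] += d;
    }
    load.into_iter().max().unwrap_or(0)
}
```

With the hypothetical durations `[30, 20, 20, 10, 10, 5, 5]` (seconds), `serial_time` gives 100 and `pooled_time` with 4 workers gives 30, so the pooled run is bounded by the longest single job.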

### 2. Missing Coverage Measurement (Medium Impact)

**Problem:**
- No code coverage visibility
- No coverage gates (e.g., require 80% coverage)
- Cannot track coverage trends over time
- Mutation testing exists but doesn't provide coverage metrics

**Evidence:**
- No `cargo-tarpaulin` or `cargo-llvm-cov` in workflows
- No coverage artifacts or reports
- No coverage comments on PRs

### 3. Limited Integration Test Automation (Medium Impact)

**Problem:**
- No matrix for testing different feature combinations
- Feature boundary tests are hand-listed per crate rather than generated systematically
- No explicit test categorization (unit/integration/e2e)

**Evidence:**
```yaml
# Only runs "all-features" and "no-default-features" for tokmd-analysis
- name: tokmd-analysis with all features
  run: cargo test -p tokmd-analysis --all-features --verbose
- name: tokmd-analysis with no default features
  run: cargo test -p tokmd-analysis --no-default-features --verbose
```

### 4. Slow Test Identification (Low Impact)

**Problem:**
- No test categorization (slow tests not isolated)
- No test duration tracking
- All tests run in every CI job

**Evidence:**
- No `#[ignore]` attributes or other slow-test markers found (Rust has no built-in `#[slow]` attribute)
- No test timing reports in CI output
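
As an illustration of what duration tracking involves, here is a minimal sketch using only the standard library. The names `Speed`, `classify`, and `timed` are hypothetical, and the threshold is whatever the team decides counts as slow; nothing here reflects existing tokmd code.

```rust
use std::time::{Duration, Instant};

/// Coarse speed bucket for a test, based on a caller-chosen threshold.
#[derive(Debug, PartialEq, Eq)]
enum Speed {
    Fast,
    Slow,
}

/// Classify a measured duration against the slow-test threshold.
fn classify(elapsed: Duration, threshold: Duration) -> Speed {
    if elapsed > threshold {
        Speed::Slow
    } else {
        Speed::Fast
    }
}

/// Run a closure and return its result together with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

A test body wrapped in `timed` yields a duration that `classify` can bucket, which is enough to build a list of candidates for `#[ignore]`.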

### 5. Snapshot Test Workflow Gaps (Low Impact)

**Problem:**
- insta is configured but no CI workflow for snapshot review
- No automated snapshot update process
- Risk of snapshot drift

## Recommended Improvements

### Priority 1: Adopt cargo-nextest for Parallelization ⚡

**Implementation:**
```yaml
# Install nextest
- name: Install cargo-nextest
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-nextest

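# Run tests in parallel
- name: Run tests (nextest)
  run: cargo nextest run --all-features --workspace --verbose

# nextest does not run doctests, so keep a separate doctest step
- name: Run doctests
  run: cargo test --doc --all-features --workspace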
```

**Benefits:**
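- Faster test execution on multi-core runners (3-5x is the working estimate here, not a measurement)
- Better test failure reporting
- Test timing data out of the box
- Test sharding across parallel CI jobs via `--partition`

**Estimated Impact (to be confirmed against a measured baseline):**
- Current: ~10-15 minutes per platform
- With nextest: ~3-5 minutes per platform
- **Overall CI reduction: 60-70%**, assuming test execution rather than compilation dominates job time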

### Priority 2: Add Code Coverage Measurement 📊

**Implementation (Option A - cargo-tarpaulin):**
```yaml
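# Install a prebuilt binary instead of compiling tarpaulin on every run
- name: Install cargo-tarpaulin
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-tarpaulin

- name: Generate coverage report
  run: |
    cargo tarpaulin --workspace --all-features \
      --out Xml --output-dir ./coverage \
      --ignore-tests --timeout 300

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage/cobertura.xml
    token: ${{ secrets.CODECOV_TOKEN }}  # assumes this repository secret is configured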
```

**Implementation (Option B - cargo-llvm-cov):**
```yaml
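- name: Install cargo-llvm-cov
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-llvm-cov

- name: Generate coverage report (llvm-cov)
  run: |
    # --html and --lcov are mutually exclusive, so collect once and report twice
    cargo llvm-cov --no-report --workspace --all-features
    cargo llvm-cov report --html
    cargo llvm-cov report --lcov --output-path lcov.info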

- name: Upload coverage artifacts
  uses: actions/upload-artifact@v4
  with:
    name: coverage-report
    path: target/llvm-cov/html/
```

**Benefits:**
- Visibility into test coverage gaps
- Coverage trends over time
- PR comments showing coverage impact
- Can set minimum coverage thresholds

**Estimated Impact:**
- 15-30% overhead per job (acceptable trade-off)
- Enables coverage gates (e.g., require 80% line coverage)
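
To make the idea of a coverage gate concrete, the sketch below sums the `LF` (lines found) and `LH` (lines hit) records of an LCOV tracefile and compares the resulting percentage with a threshold. The function names are hypothetical and the 80% figure is only the example threshold mentioned above; a real setup would more likely rely on the coverage tool's or Codecov's own threshold support.

```rust
/// Sum the `LF` (lines found) and `LH` (lines hit) records of an LCOV tracefile.
fn line_totals(lcov: &str) -> (u64, u64) {
    let mut found: u64 = 0;
    let mut hit: u64 = 0;
    for line in lcov.lines() {
        if let Some(n) = line.strip_prefix("LF:") {
            // Malformed counts are treated as zero in this sketch.
            found += n.trim().parse::<u64>().unwrap_or(0);
        } else if let Some(n) = line.strip_prefix("LH:") {
            hit += n.trim().parse::<u64>().unwrap_or(0);
        }
    }
    (found, hit)
}

/// Line coverage in percent; an empty report counts as 0%.
fn line_coverage_percent(lcov: &str) -> f64 {
    let (found, hit) = line_totals(lcov);
    if found == 0 {
        0.0
    } else {
        100.0 * (hit as f64) / (found as f64)
    }
}

/// True when line coverage meets the gate (for example 80.0).
fn passes_gate(lcov: &str, min_percent: f64) -> bool {
    line_coverage_percent(lcov) >= min_percent
}
```

A CI step could read `lcov.info`, call `passes_gate(&contents, 80.0)`, and exit non-zero when it returns `false`.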

### Priority 3: Test Categorization & Smart CI 🎯

**Implementation:**
```rust
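// Mark slow or expensive tests; they are skipped unless ignored tests are requested
#[test]
#[ignore = "slow - run only in nightly CI"]
fn expensive_integration_test() { /* ... */ }

// Integration tests need no attribute: files under `tests/` build as separate
// targets, so a dedicated CI job can select them by target kind. Marking them
// `#[ignore]` would hide them from every run that does not request ignored tests.
#[test]
fn api_integration_test() { /* ... */ }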
```

**CI Workflow:**
```yaml
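# Split into faster unit tests and slower integration tests
# (schematic: runs-on, checkout, and tool installation omitted)
unit-tests:
  run: cargo nextest run --workspace --all-features --no-fail-fast --lib

# The kind(test) filter restricts this job to integration test binaries
integration-tests:
  run: cargo nextest run --workspace --all-features -E 'kind(test)' --test-threads=1

# --run-ignored needs a value; `only` runs just the #[ignore]d tests
slow-tests:
  if: github.event_name == 'schedule' || contains(github.event.head_commit.message, '[run-slow]')
  run: cargo nextest run --workspace --all-features --run-ignored only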
```

**Benefits:**
- Faster PR feedback (unit tests run first)
- Reduced resource usage
- Better test organization

### Priority 4: Feature Matrix Testing 🧪

**Implementation:**
```yaml
feature-matrix:
  strategy:
    matrix:
      features:
        - "--all-features"
        - "--no-default-features"
        - "--features git"
        - "--features wasm"
  run: cargo nextest run --workspace ${{ matrix.features }}
```

**Benefits:**
- Catch feature boundary issues earlier
- Ensure feature flags work in isolation
- Prevent bugs that only appear under specific feature combinations
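
The matrix above lists a handful of flag sets by hand. To show how quickly the full space grows, the sketch below enumerates every subset of a feature list and renders each as cargo flags. The function name is hypothetical, and `git` and `wasm` are simply the feature names used in the matrix above; they may not match the features each crate actually defines.

```rust
/// Build one cargo flag string per subset of `features` (the power set).
/// The empty subset maps to a bare `--no-default-features`.
fn feature_flag_sets(features: &[&str]) -> Vec<String> {
    let n = features.len();
    assert!(n < 16, "the power set has 2^n entries; keep the list short");
    (0u32..(1u32 << n))
        .map(|mask| {
            // Bit i of `mask` decides whether feature i is enabled.
            let chosen: Vec<&str> = features
                .iter()
                .enumerate()
                .filter(|(i, _)| (mask & (1u32 << *i)) != 0)
                .map(|(_, f)| *f)
                .collect();
            if chosen.is_empty() {
                "--no-default-features".to_string()
            } else {
                format!("--no-default-features --features {}", chosen.join(","))
            }
        })
        .collect()
}
```

Two features already produce four flag sets, and each extra feature doubles the count, which is why a curated matrix is usually preferred over the full power set.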

### Priority 5: Snapshot Test Automation 📸

**Implementation:**
```yaml
# In CI, check snapshots (don't update); "no" is quoted so YAML cannot read it as a boolean
env:
  INSTA_UPDATE: "no"

# `cargo insta review` is interactive, so CI asserts non-interactively and fails on drift
- name: Check snapshots
  run: cargo insta test --check --unreferenced=reject

# On explicit approval, update snapshots
- name: Update snapshots
  if: github.event_name == 'workflow_dispatch'
  env:
    INSTA_UPDATE: always
  run: cargo insta test --accept --unreferenced=auto

**Benefits:**
- Prevent snapshot drift
- Clear review process for snapshot changes
- Automated snapshot updates with approval
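
The difference between a check-only CI run and an approved update can be modeled in a few lines. This is a deliberately simplified, standard-library-only model with hypothetical names (`Mode`, `Outcome`, `check_snapshot`); it is not insta's implementation and ignores pending `.snap.new` files entirely.

```rust
/// How a mismatch is handled: report it (CI) or overwrite the snapshot (approved update).
#[derive(Clone, Copy)]
enum Mode {
    Check,
    Accept,
}

/// Result of comparing fresh output with the stored snapshot.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Match,
    Mismatch,
    Updated,
}

/// Compare `actual` with the stored snapshot and apply the chosen mode.
fn check_snapshot(stored: &mut Option<String>, actual: &str, mode: Mode) -> Outcome {
    if stored.as_deref() == Some(actual) {
        return Outcome::Match;
    }
    match mode {
        // Check mode never writes; it only reports the drift.
        Mode::Check => Outcome::Mismatch,
        // Accept mode overwrites (or creates) the stored snapshot.
        Mode::Accept => {
            *stored = Some(actual.to_string());
            Outcome::Updated
        }
    }
}
```

Keeping the write path behind an explicit mode is the property the CI workflow is meant to enforce: ordinary runs can only detect drift, never silently absorb it.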

## Estimated CI Time Reduction

### Current State (Estimates)
- Ubuntu Build & Test: ~8-12 minutes
- Windows Build & Test: ~10-15 minutes (slower runner)
- macOS Build & Test: ~12-18 minutes (slowest runner)
- **Total test time per PR: ~30-45 minutes** (sum across all three platforms; macOS runs on push only, so a PR run covers Ubuntu and Windows)

### With cargo-nextest (Priority 1)
- Ubuntu Build & Test: ~2-4 minutes (-70%)
- Windows Build & Test: ~3-5 minutes (-70%)
- macOS Build & Test: ~4-6 minutes (-70%)
- **Total test time per PR: ~9-15 minutes (60-70% lower)**

### With All Improvements (Priorities 1-3)
- Unit tests (fast): ~2-3 minutes total (all platforms)
- Integration tests (medium): ~4-6 minutes total
- Slow tests (nightly): ~10-15 minutes (runs only on schedule)
- **PR feedback time: ~6-9 minutes (70-80% lower)**
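
The totals and percentages in this section can be re-derived from the per-platform estimates. The sketch below does that arithmetic; the `Range` alias and function names are hypothetical, and the inputs are the estimates quoted in this section, not measurements.

```rust
/// (low, high) estimate in minutes.
type Range = (f64, f64);

/// Add ranges component-wise.
fn total(ranges: &[Range]) -> Range {
    ranges
        .iter()
        .fold((0.0, 0.0), |acc, r| (acc.0 + r.0, acc.1 + r.1))
}

/// Percentage reduction from `before` to `after`, comparing low with low and high with high.
fn reduction_percent(before: Range, after: Range) -> Range {
    (
        100.0 * (1.0 - after.0 / before.0),
        100.0 * (1.0 - after.1 / before.1),
    )
}
```

Summing the per-platform estimates gives 30-45 minutes before and 9-15 minutes with nextest, a reduction of roughly 67-70%. Comparing 6-9 minutes with 30-45 minutes gives 80% at both ends of the range.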

## Implementation Roadmap

### Phase 1: Quick Wins (1-2 days)
- [ ] Measure baseline test times with the current `cargo test` setup
- [ ] Install cargo-nextest in CI
- [ ] Replace `cargo test` with `cargo nextest run` (keep `cargo test --doc` for doctests)

### Phase 2: Coverage (2-3 days)
- [ ] Add cargo-tarpaulin or cargo-llvm-cov
- [ ] Configure coverage reporting (Codecov or artifacts)
- [ ] Set up coverage badge in README

### Phase 3: Test Organization (3-5 days)
- [ ] Categorize tests (unit/integration/slow attributes)
- [ ] Split CI jobs by test type
- [ ] Add feature matrix testing

### Phase 4: Advanced (1-2 days)
- [ ] Configure snapshot test review workflow
- [ ] Add coverage gating (optional, requires consensus)
- [ ] Optimize test fixtures for speed

## Next Steps

1. **Decision Point:** Do we want to adopt cargo-nextest? (High confidence, low risk)
2. **Decision Point:** Which coverage tool? (tarpaulin = simpler, llvm-cov = faster/more accurate)
3. **Decision Point:** Should we gate on minimum coverage? (Requires consensus on threshold)
4. **Action:** Start with Phase 1 (nextest adoption) as a proof-of-concept

## Related Issues

- None yet (this is the research baseline)

## References

- [cargo-nextest documentation](https://nexte.st/)
- [cargo-tarpaulin documentation](https://github.com/xd009642/tarpaulin)
- [cargo-llvm-cov documentation](https://github.com/taiki-e/cargo-llvm-cov)
- [Insta snapshot testing guide](https://insta.rs/)

[Research] Testing Infrastructure Improvements #905
