[Research] Testing Infrastructure Improvements #905

@EffortlessSteven

Overview

This issue summarizes research findings on tokmd's testing infrastructure and recommends concrete improvements to reduce CI time, increase test coverage visibility, and enhance automation.

Current CI/Test Setup Inventory

CI Configuration (.github/workflows/ci.yml)

Jobs (15+):

  • MSRV Check (ubuntu-latest)
  • Build & Test (ubuntu-latest, windows-latest - matrix)
  • Build & Test (macos-latest - push only)
  • Feature Boundaries (ubuntu-latest)
  • Wasm Compile & Test (ubuntu-latest)
  • Quality Gate (ubuntu-latest)
  • Cargo Deny (ubuntu-latest)
  • Typos (ubuntu-latest)
  • Proptest Smoke (ubuntu-latest)
  • Publish Plan (ubuntu-latest)
  • Version Consistency (ubuntu-latest)
  • Docs Check (ubuntu-latest)
  • Nix PR Package Gate (ubuntu-latest)
  • Mutation Testing (Required, PR only)
  • CI (Required - aggregates all above)

Additional Workflows:

  • test-action.yml - GitHub action testing
  • mutants.yml - Full mutation testing (on-demand)
  • fuzz.yml - Nightly fuzz testing (9 targets)
  • cockpit.yml - PR cockpit report generation

Test Setup Across Crates

Scale:

  • 67 workspace members (crates)
  • 57 test directories (crates/*/tests/)
  • 928 integration test files (crates/*/tests/*.rs)

Testing Tools:

  • Property testing: proptest (256 cases, 10s timeout per case)
  • Snapshot testing: insta v1.47.0 (configured in 14+ crates)
  • Mutation testing: cargo-mutants v26.1.2 (PR-scoped, changed files only)
  • Fuzz testing: cargo-fuzz (nightly builds, 9 targets)

Test Types:

  • Unit tests (inline in src/)
  • Integration tests (tests/ directories)
  • Property-based tests (tests/properties.rs)
  • Snapshot tests (insta assertions)
  • Fuzz targets (fuzz/ directory)

Current Test Execution

Command: cargo test --all-features --verbose

Platforms:

  • Ubuntu latest (primary)
  • Windows latest (matrix)
  • macOS latest (push only, slower runner)
  • Wasm32 (via wasm-pack)

Performance:

  • Test binaries run one after another (cargo test parallelizes only within each binary)
  • No test categorization (unit vs integration vs slow)
  • No coverage measurement

Identified Bottlenecks and Gaps

1. Test Parallelization (High Impact)

Problem:

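  • The 928 integration test files each compile to their own test binary, and cargo test runs those binaries one after another on each platform
  • Tests inside a binary run on multiple threads by default, but that parallelism does not span binaries
  • Large test suite = longer CI time, especially on slower runners (macOS)

Evidence:

# Current CI runs plain cargo test on 3 platforms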
- name: Run tests
  run: cargo test --all-features --verbose

2. Missing Coverage Measurement (Medium Impact)

Problem:

  • No code coverage visibility
  • No coverage gates (e.g., require 80% coverage)
  • Cannot track coverage trends over time
  • Mutation testing exists but doesn't provide coverage metrics

Evidence:

  • No cargo-tarpaulin or cargo-llvm-cov in workflows
  • No coverage artifacts or reports
  • No coverage comments on PRs

3. Limited Integration Test Automation (Medium Impact)

Problem:

  • No matrix for testing different feature combinations
  • Feature boundary checks are hand-written steps for specific crates, not generated systematically
  • No explicit test categorization (unit/integration/e2e)

Evidence:

# Only runs "all-features" and "no-default-features" for tokmd-analysis
- name: tokmd-analysis with all features
  run: cargo test -p tokmd-analysis --all-features --verbose
- name: tokmd-analysis with no default features
  run: cargo test -p tokmd-analysis --no-default-features --verbose

4. Slow Test Identification (Low Impact)

Problem:

  • No test categorization (slow tests not isolated)
  • No test duration tracking
  • All tests run in every CI job

Evidence:

  • No #[ignore] attributes or custom slow-test markers found (#[slow] is not a built-in Rust attribute)
  • No test timing reports in CI output

5. Snapshot Test Workflow Gaps (Low Impact)

Problem:

  • insta is configured but no CI workflow for snapshot review
  • No automated snapshot update process
  • Risk of snapshot drift

Recommended Improvements

Priority 1: Adopt cargo-nextest for Parallelization ⚡

Implementation:

# Install nextest
- name: Install cargo-nextest
  uses: taiki-e/install-action@v2
  with:
    tool: cargo-nextest

# Run tests in parallel
- name: Run tests (nextest)
  run: cargo nextest run --all-features --workspace --verbose

Benefits:

  • Faster test execution on multi-core runners (estimated 3-5x, to be confirmed by measurement)
  • Better test failure reporting
  • Test timing data out of the box
  • Smart test sharding for parallel CI jobs
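
To make the sharding idea concrete, here is a minimal sketch of count-based partitioning, where each CI job takes every N-th test from a sorted list. The function name `shard` and the test names are hypothetical, and nextest's own partitioning logic may differ in detail.

```rust
/// Returns the tests assigned to shard `index` (1-based) out of `total` shards,
/// using round-robin assignment over the sorted test names.
fn shard<'a>(tests: &[&'a str], index: usize, total: usize) -> Vec<&'a str> {
    assert!(total > 0 && (1..=total).contains(&index), "index must be in 1..=total");
    let mut sorted: Vec<&'a str> = tests.to_vec();
    sorted.sort_unstable(); // same ordering in every job, so the shards never overlap
    sorted
        .into_iter()
        .enumerate()
        .filter(|&(i, _)| i % total == index - 1)
        .map(|(_, name)| name)
        .collect()
}

fn main() {
    // Hypothetical test names, for illustration only.
    let tests = ["scan::walks_dirs", "cli::help", "format::json", "format::md", "cli::version"];
    for index in 1..=2 {
        println!("shard {index}/2: {:?}", shard(&tests, index, 2));
    }
}
```

In CI, each matrix job would pass its own index. nextest exposes a comparable split through its `--partition` option (for example `count:1/2`), so a helper like this would not need to be written by hand.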

Estimated Impact:

  • Current: ~8-18 minutes per platform, depending on the runner
  • With nextest: ~2-6 minutes per platform
  • Overall CI reduction: 60-70%

Priority 2: Add Code Coverage Measurement 📊

Implementation (Option A - cargo-tarpaulin):

- name: Generate coverage report
  run: |
    cargo install cargo-tarpaulin
    cargo tarpaulin --workspace --all-features \
      --out Xml --output-dir ./coverage \
      --ignore-tests --timeout 300

- name: Upload coverage to Codecov
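  uses: codecov/codecov-action@v4
  with:
    files: ./coverage/cobertura.xml
    # v4 generally needs an upload token; assumes a CODECOV_TOKEN repository secret
    token: ${{ secrets.CODECOV_TOKEN }}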

Implementation (Option B - cargo-llvm-cov):

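# --html and --lcov cannot be combined in one invocation: collect once, report twice
- name: Generate coverage report (llvm-cov)
  run: |
    cargo install cargo-llvm-cov
    cargo llvm-cov --workspace --all-features --no-report
    cargo llvm-cov report --html
    cargo llvm-cov report --lcov --output-path lcov.info

- name: Upload coverage artifacts
  uses: actions/upload-artifact@v4
  with:
    name: coverage-report
    path: target/llvm-cov/html/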

Benefits:

  • Visibility into test coverage gaps
  • Coverage trends over time
  • PR comments showing coverage impact
  • Can set minimum coverage thresholds

Estimated Impact:

  • 15-30% overhead per job (acceptable trade-off)
  • Enables coverage gates (e.g., require 80% line coverage)
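
As an illustration of what a coverage gate checks, the sketch below sums the `LF:` (lines found) and `LH:` (lines hit) records of an LCOV report and compares the result to a threshold. The function names, the sample report, and the 80% figure are assumptions for illustration. In practice the coverage tool or the reporting service would normally enforce the gate.

```rust
/// Computes line coverage (percent) from LCOV tracefile text by summing
/// the per-file `LF:` (lines found) and `LH:` (lines hit) records.
/// Returns None if a record is malformed or no lines were instrumented.
fn line_coverage(lcov: &str) -> Option<f64> {
    let (mut found, mut hit) = (0u64, 0u64);
    for line in lcov.lines() {
        if let Some(n) = line.strip_prefix("LF:") {
            found += n.trim().parse::<u64>().ok()?;
        } else if let Some(n) = line.strip_prefix("LH:") {
            hit += n.trim().parse::<u64>().ok()?;
        }
    }
    if found == 0 {
        None // no instrumented lines: coverage is undefined
    } else {
        Some(100.0 * hit as f64 / found as f64)
    }
}

/// Returns true when line coverage meets or exceeds `min_percent`.
fn passes_gate(lcov: &str, min_percent: f64) -> bool {
    line_coverage(lcov).map_or(false, |pct| pct >= min_percent)
}

fn main() {
    // Hypothetical two-file report: 95/100 and 30/50 lines hit.
    let report = "SF:src/a.rs\nLF:100\nLH:95\nend_of_record\n\
                  SF:src/b.rs\nLF:50\nLH:30\nend_of_record\n";
    match line_coverage(report) {
        Some(pct) => println!("line coverage: {pct:.1}%"),
        None => println!("no coverage data"),
    }
    println!("passes 80% gate: {}", passes_gate(report, 80.0));
}
```

Treating an empty report as a failed gate is a deliberate choice here, so that a broken coverage run does not pass silently.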

Priority 3: Test Categorization & Smart CI 🎯

Implementation:

// Mark slow or expensive tests
#[test]
#[ignore = "slow - run only in nightly CI"]
fn expensive_integration_test() { /* ... */ }

// Mark integration tests (assumes the crate depends on tokio for async tests).
// A plain #[ignore] is enough: cfg_attr(test, ...) adds nothing inside test code.
#[tokio::test]
#[ignore = "integration - run in dedicated job"]
async fn api_integration_test() { /* ... */ }

CI Workflow:

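# Split into faster unit tests and slower integration tests
# (job bodies abbreviated: runs-on, checkout, and nextest install steps omitted)
unit-tests:
  steps:
    - run: cargo nextest run --workspace --all-features --no-fail-fast --lib

integration-tests:
  steps:
    # --test '*' selects integration test targets only; #[ignore]d tests are skipped here
    - run: cargo nextest run --workspace --all-features --test '*' --test-threads=1

slow-tests:
  # head_commit is only populated on push events
  if: github.event_name == 'schedule' || contains(github.event.head_commit.message, '[run-slow]')
  steps:
    # --run-ignored needs a value; "only" runs just the #[ignore]d tests
    - run: cargo nextest run --workspace --all-features --run-ignored only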

Benefits:

  • Faster PR feedback (unit tests run first)
  • Reduced resource usage
  • Better test organization

Priority 4: Feature Matrix Testing 🧪

Implementation:

feature-matrix:
  strategy:
    matrix:
      features:
        - "--all-features"
        - "--no-default-features"
        - "--features git"
        - "--features wasm"
  run: cargo nextest run --workspace ${{ matrix.features }}

Benefits:

  • Catch feature boundary issues earlier
  • Ensure feature flags work in isolation
  • Prevent feature combinatorial bugs
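
The matrix above lists hand-picked combinations. To show how quickly the full space grows, the following sketch enumerates every subset of a feature list as cargo arguments. The function name `feature_powerset` is hypothetical, and the feature names are taken from the matrix above without confirming that they exist in every crate.

```rust
/// Builds one cargo argument string per subset of `features` (the power set).
/// The empty subset maps to `--no-default-features` alone.
fn feature_powerset(features: &[&str]) -> Vec<String> {
    let n = features.len();
    assert!(n < 16, "the power set has 2^n entries; keep the list small");
    (0..(1u32 << n))
        .map(|mask| {
            // Bit i of `mask` decides whether feature i is included.
            let chosen: Vec<&str> = features
                .iter()
                .enumerate()
                .filter(|&(i, _)| mask & (1u32 << i) != 0)
                .map(|(_, f)| *f)
                .collect();
            if chosen.is_empty() {
                "--no-default-features".to_string()
            } else {
                format!("--no-default-features --features {}", chosen.join(","))
            }
        })
        .collect()
}

fn main() {
    for args in feature_powerset(&["git", "wasm"]) {
        println!("cargo nextest run {args}");
    }
}
```

Two features give four combinations and ten features give 1024, which is why a curated matrix is usually preferred over the full power set for a large workspace.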

Priority 5: Snapshot Test Automation 📸

Implementation:

# In CI, check snapshots (don't update)
env:
  INSTA_UPDATE: no

# cargo insta review is interactive, so it cannot run in CI.
# Fail the job on any snapshot mismatch instead and review the diff locally.
- name: Check snapshots
  run: cargo insta test --check

# On explicit approval, update snapshots
- name: Update snapshots
  if: github.event_name == 'workflow_dispatch'
  env:
    INSTA_UPDATE: always
  run: cargo insta test --accept --unreferenced=auto

Benefits:

  • Prevent snapshot drift
  • Clear review process for snapshot changes
  • Automated snapshot updates with approval

Estimated CI Time Reduction

Current State (Estimates)

  • Ubuntu Build & Test: ~8-12 minutes
  • Windows Build & Test: ~10-15 minutes (slower runner)
  • macOS Build & Test: ~12-18 minutes (slowest runner)
  • Total test time per PR (summed across platforms): ~30-45 minutes

With cargo-nextest (Priority 1)

  • Ubuntu Build & Test: ~2-4 minutes (-70%)
  • Windows Build & Test: ~3-5 minutes (-70%)
  • macOS Build & Test: ~4-6 minutes (-70%)
  • Total test time per PR (summed across platforms): ~9-15 minutes (-67-70%)

With All Improvements (Priorities 1-3)

  • Unit tests (fast): ~2-3 minutes total (all platforms)
  • Integration tests (medium): ~4-6 minutes total
  • Slow tests (nightly): ~10-15 minutes (runs only on schedule)
  • PR feedback time: ~6-9 minutes (-70-80%)
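
The totals and percentages in this section follow from the per-platform ranges. The sketch below recomputes them for the nextest scenario. All figures are the estimates quoted in this issue, not measurements, and the helper names are illustrative.

```rust
/// Percentage reduction going from `before` to `after` (both in minutes).
fn reduction_pct(before: f64, after: f64) -> f64 {
    100.0 * (before - after) / before
}

/// Sums the low and high ends of a list of (low, high) estimates.
fn total(ranges: &[(f64, f64)]) -> (f64, f64) {
    ranges.iter().fold((0.0, 0.0), |acc, r| (acc.0 + r.0, acc.1 + r.1))
}

fn main() {
    // (low, high) estimates in minutes for Ubuntu, Windows, macOS.
    let current = [(8.0, 12.0), (10.0, 15.0), (12.0, 18.0)];
    let nextest = [(2.0, 4.0), (3.0, 5.0), (4.0, 6.0)];

    let (cur_lo, cur_hi) = total(&current);
    let (new_lo, new_hi) = total(&nextest);

    println!("current total: {cur_lo}-{cur_hi} min");
    println!("nextest total: {new_lo}-{new_hi} min");
    println!(
        "reduction: {:.0}% to {:.0}%",
        reduction_pct(cur_hi, new_hi),
        reduction_pct(cur_lo, new_lo)
    );
}
```

The summed reduction comes out at roughly 67% to 70%. These totals add up the platform jobs, so if the jobs run concurrently the wall-clock time for a PR is closer to the slowest single platform than to the sum.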

Implementation Roadmap

Phase 1: Quick Wins (1-2 days)

  • Measure baseline test times with the current cargo test setup
  • Install cargo-nextest in CI
  • Replace cargo test with cargo nextest run and compare timings against the baseline

Phase 2: Coverage (2-3 days)

  • Add cargo-tarpaulin or cargo-llvm-cov
  • Configure coverage reporting (Codecov or artifacts)
  • Set up coverage badge in README

Phase 3: Test Organization (3-5 days)

  • Categorize tests (unit/integration/slow attributes)
  • Split CI jobs by test type
  • Add feature matrix testing

Phase 4: Advanced (1-2 days)

  • Configure snapshot test review workflow
  • Add coverage gating (optional, requires consensus)
  • Optimize test fixtures for speed

Next Steps

  1. Decision Point: Do we want to adopt cargo-nextest? (High confidence, low risk)
  2. Decision Point: Which coverage tool? (tarpaulin = simpler, llvm-cov = faster/more accurate)
  3. Decision Point: Should we gate on minimum coverage? (Requires consensus on threshold)
  4. Action: Start with Phase 1 (nextest adoption) as a proof-of-concept

Related Issues

  • None yet (this is the research baseline)
