Tests Module - Agent Scaffolding

Module Overview

Purpose: Comprehensive test suite execution and validation for the GNN processing pipeline

Pipeline Step: Step 2: Test suite execution (2_tests.py)

Category: Testing / Quality Assurance

Status: ✅ Production Ready

Version: 1.0.0

Last Updated: 2026-03-24


Core Functionality

Primary Responsibilities

  1. Comprehensive test suite execution
  2. Test result collection and analysis
  3. Coverage analysis and reporting
  4. Performance testing and benchmarking
  5. Test environment management and validation

Key Capabilities

  • Multi-level test execution (unit, integration, performance)
  • Comprehensive test reporting and analysis
  • Coverage analysis and optimization
  • Performance benchmarking and profiling
  • Test environment validation and setup

API Reference

Public Functions

run_tests(logger, output_dir, verbose=False, include_slow=False, fast_only=True, comprehensive=False, generate_coverage=False, auto_fallback=True) -> bool

Description: Main test execution function called by orchestrator (2_tests.py). Routes to appropriate test execution mode based on parameters.

Parameters:

  • logger (logging.Logger): Logger instance for progress reporting
  • output_dir (Path): Output directory for test results
  • verbose (bool): Enable verbose output (default: False)
  • include_slow (bool): Include slow tests (deprecated, use comprehensive instead) (default: False)
  • fast_only (bool): Run only fast tests (default: True)
  • comprehensive (bool): Run comprehensive test suite - all tests (default: False)
  • generate_coverage (bool): Generate coverage reports (default: False)
  • auto_fallback (bool): If fast mode collects zero tests, retry with comprehensive mode (default: True)

Returns: True if tests passed, False otherwise

Behavior:

  • If comprehensive=True: Runs all tests via run_comprehensive_tests()
  • If fast_only=True and comprehensive=False: Runs fast tests via run_fast_pipeline_tests(). If that fails and auto_fallback=True and the execution report shows zero tests collected, falls back to run_comprehensive_tests()
  • Otherwise: Runs reliable fast tests via run_fast_reliable_tests()

Strict markers: pyproject.toml enables --strict-markers, so any unregistered marker (for example anyio without pytest-anyio) breaks collection. When the environment may omit dev extras, prefer sync tests that wrap short async checks in asyncio.run(); with uv sync --extra dev, pytest-asyncio is installed and @pytest.mark.asyncio is available.
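
As a minimal sketch of the sync-wrapper pattern recommended above (the coroutine name is hypothetical):

import asyncio

async def _short_async_check():
    """Hypothetical short async check used only for illustration."""
    await asyncio.sleep(0)
    return True

def test_short_async_check_sync_wrapper():
    # A sync test body avoids unregistered async markers under --strict-markers.
    assert asyncio.run(_short_async_check()) is True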

Example:

from tests import run_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True,
    fast_only=True,
    comprehensive=False
)

run_fast_pipeline_tests(logger, output_dir, verbose=False) -> bool

Description: Run fast test suite for quick pipeline validation

Parameters:

  • logger (logging.Logger): Logger instance for progress reporting
  • output_dir (Path): Output directory for test results
  • verbose (bool): Enable verbose output

Returns: True if tests passed, or if collection errors were detected and reported; False otherwise

Features:

  • Automatic detection of collection errors (import errors, syntax errors)
  • Clear error messages with actionable suggestions
  • Fast test execution (skips slow tests)
  • Comprehensive error reporting
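
A direct call might look like the following (output path is illustrative):

from tests.runner import run_fast_pipeline_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_fast_pipeline_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True
)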

run_comprehensive_tests(logger, output_dir, verbose=False, generate_coverage=False) -> bool

Description: Run comprehensive test suite with all tests enabled. Includes slow tests, performance tests, and full coverage analysis.

Parameters:

  • logger (logging.Logger): Logger instance for progress reporting
  • output_dir (Path): Output directory for test results
  • verbose (bool): Enable verbose output (default: False)
  • generate_coverage (bool): Generate coverage reports (default: False)

Returns: True if tests passed, False otherwise

Features:

  • Executes all test categories from MODULAR_TEST_CATEGORIES
  • Includes slow and performance tests
  • Generates comprehensive coverage reports if enabled
  • Uses category-based execution with resource monitoring

run_fast_reliable_tests(logger, output_dir, verbose=False) -> bool

Description: Run a reliable subset of fast tests with improved error handling. Focuses on essential tests that should always pass.

Parameters:

  • logger (logging.Logger): Logger instance for progress reporting
  • output_dir (Path): Output directory for test results
  • verbose (bool): Enable verbose output (default: False)

Returns: True if tests passed, False otherwise

Features:

  • Runs only essential test files: test_core_modules.py, test_fast_suite.py, test_main_orchestrator.py
  • 90-second timeout for reliability
  • Improved error handling and reporting
  • Used as recovery when fast pipeline tests are not suitable

_extract_collection_errors(stdout, stderr) -> List[str]

Description: Extract and parse collection errors from pytest output. Detects import errors, syntax errors, and other collection failures.

Parameters:

  • stdout (str): Standard output from pytest
  • stderr (str): Standard error from pytest

Returns: List of unique error messages (strings)

Error Types Detected:

  • ERROR collecting - Test file collection failures
  • NameError - Missing variable/import names
  • ImportError - Module import failures
  • SyntaxError - Code syntax issues

Example:

errors = _extract_collection_errors(pytest_stdout, pytest_stderr)
# Returns: ["test_file.py: ImportError: No module named 'missing_module'"]
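
For intuition, a simplified extractor along these lines would produce that kind of list (a sketch only; the real helper in runner.py may parse pytest output differently):

import re
from typing import List

def _extract_collection_errors_sketch(stdout: str, stderr: str) -> List[str]:
    """Illustrative parser for pytest collection failures."""
    patterns = (r"ERROR collecting", r"\b(NameError|ImportError|SyntaxError)\b")
    errors: List[str] = []
    for line in (stdout + "\n" + stderr).splitlines():
        if any(re.search(p, line) for p in patterns):
            msg = line.strip()
            if msg not in errors:  # keep messages unique, as documented above
                errors.append(msg)
    return errors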

Dependencies

Required Dependencies

  • pytest - Test framework
  • pytest-cov - Coverage analysis
  • pathlib - Path manipulation

Optional Dependencies

  • pytest-xdist - Parallel test execution
  • pytest-benchmark - Performance benchmarking
  • pytest-html - HTML test reports
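
A lightweight way to detect the optional plugins before enabling the corresponding features (import names are the plugins' standard module names):

import importlib.util

# Map plugin features to their importable module names and check availability.
optional_plugins = {"parallel": "xdist", "benchmark": "pytest_benchmark", "html_report": "pytest_html"}
available = {feature: importlib.util.find_spec(mod) is not None for feature, mod in optional_plugins.items()}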

Internal Dependencies

  • utils.pipeline_template - Pipeline utilities

Configuration

Test Settings

TEST_CONFIG = {
    'fast_tests': True,
    'standard_tests': True,
    'slow_tests': False,
    'performance_tests': False,
    'coverage_analysis': True,
    'parallel_execution': True,
    'timeout': 300
}

Test Categories

TEST_CATEGORIES = {
    'unit': ['test_*unit*.py'],
    'integration': ['test_*integration*.py'],
    'performance': ['test_*performance*.py'],
    'slow': ['test_*slow*.py']
}

Usage Examples

Run Test Suite

from tests.runner import run_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True,
    comprehensive=True
)

Run Fast Tests Only

from tests import run_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True,
    fast_only=True,
    comprehensive=False
)

Run Comprehensive Test Suite

from tests import run_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True,
    comprehensive=True,
    generate_coverage=True
)

Run Fast Reliable Tests

from tests.runner import run_fast_reliable_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_fast_reliable_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True
)

Run Comprehensive Tests Directly

from tests.runner import run_comprehensive_tests
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
success = run_comprehensive_tests(
    logger=logger,
    output_dir=Path("output/2_tests_output"),
    verbose=True,
    generate_coverage=True
)

Output Specification

Output Products

  • test_results.json - Test execution results
  • coverage.xml - Coverage analysis report
  • test_report.html - HTML test report
  • performance_report.json - Performance analysis
  • test_summary.md - Human-readable test summary

Output Directory Structure

output/2_tests_output/
├── test_results.json
├── coverage.xml
├── test_report.html
├── performance_report.json
├── test_summary.md
└── test_details/
    ├── unit_tests/
    ├── integration_tests/
    └── performance_tests/
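
Downstream steps can post-process these artifacts; a minimal sketch for reading the JSON results (the key names below are assumptions, since the schema is not documented here):

import json
from pathlib import Path

results_path = Path("output/2_tests_output/test_results.json")
if results_path.exists():
    results = json.loads(results_path.read_text())
    # "passed" / "failed" keys are assumed for illustration only.
    print(f"passed={results.get('passed')} failed={results.get('failed')}")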

Performance Characteristics

Latest Execution

  • Duration: ~5-15 minutes for comprehensive suite
  • Memory: ~100-300MB during test execution
  • Status: ✅ Production Ready

Expected Performance

  • Fast Tests: 1-3 minutes
  • Standard Tests: 3-8 minutes
  • Slow Tests: 5-15 minutes
  • Performance Tests: 10-30 minutes

Error Handling

Test Errors

  1. Test Failures: Individual test case failures
  2. Collection Errors: Import errors, syntax errors, or missing dependencies during test collection
  3. Setup Errors: Test environment setup failures
  4. Dependency Errors: Missing test dependencies
  5. Timeout Errors: Test execution timeouts
  6. Coverage Errors: Coverage analysis failures

Recovery Strategies

  • Collection Error Detection: Automatic detection and reporting of import/syntax errors during test collection
  • Test Isolation: Run tests in isolation on failure
  • Environment Reset: Reset test environment
  • Dependency Installation: Install missing dependencies
  • Timeout Adjustment: Adjust test timeouts
  • Error Reporting: Comprehensive error documentation with actionable suggestions
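
Taken together, the fast-to-comprehensive fallback could be wired roughly as follows (a sketch; the actual logic lives in runner.py, and the report file name and "collected" key are assumptions):

import json
from pathlib import Path
from tests.runner import run_fast_pipeline_tests, run_comprehensive_tests

def _zero_tests_collected(output_dir: Path) -> bool:
    """Hypothetical check of the execution report; the real schema may differ."""
    report = output_dir / "test_results.json"
    if not report.exists():
        return True
    return json.loads(report.read_text()).get("collected", 0) == 0

def run_with_fallback(logger, output_dir: Path, auto_fallback: bool = True) -> bool:
    ok = run_fast_pipeline_tests(logger, output_dir, verbose=False)
    if not ok and auto_fallback and _zero_tests_collected(output_dir):
        logger.warning("Fast suite collected zero tests; retrying with comprehensive mode")
        ok = run_comprehensive_tests(logger, output_dir, verbose=False)
    return ok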

Integration Points

Orchestrated By

  • Script: 2_tests.py (Step 2)
  • Function: run_tests()

Imports From

  • utils.pipeline_template - Pipeline utilities

Imported By

  • main.py - Pipeline orchestration
  • tests.test_* - Individual test modules

Architecture

The test infrastructure follows the thin orchestrator pattern:

flowchart TD
    A["2_tests.py<br/>(Thin Orchestrator)"] -->|"Calls run_tests()"| B["runner.run_tests()"]
    B -->|"fast_only=True"| C["run_fast_pipeline_tests()"]
    B -->|"comprehensive=True"| D["run_comprehensive_tests()"]
    B -->|"recovery"| E["run_fast_reliable_tests()"]
    
    C --> F["ModularTestRunner"]
    D --> F
    E --> F
    
    F -->|"Category execution"| G["pytest execution"]
    G -->|"Test discovery"| H["Test files"]
    G -->|"Test execution"| I["Test results"]
    I -->|"Result collection"| J["Reporting & Analysis"]
    
    J --> K["JSON reports"]
    J --> L["Markdown summaries"]
    J --> M["Coverage reports"]
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style F fill:#e8f5e9
    style G fill:#f3e5f5
    style J fill:#fce4ec

Component Responsibilities:

  • 2_tests.py: Thin orchestrator that handles CLI arguments, logging setup, and delegates to test runner
  • runner.run_tests(): Main entry point that routes to appropriate test execution mode
  • run_fast_pipeline_tests(): Default mode - fast tests for quick pipeline validation
  • run_comprehensive_tests(): Comprehensive mode - all tests with full coverage
  • run_fast_reliable_tests(): Recovery mode - essential tests only
  • ModularTestRunner: Category-based test execution with resource monitoring
  • pytest: Test framework for actual test discovery and execution

Module Relationships

2_tests.py (Thin Orchestrator):

  • Handles command-line arguments
  • Sets up logging and output directories
  • Delegates to tests.run_tests() from tests/__init__.py
  • Returns standardized exit codes
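
In spirit, the thin orchestrator reduces to something like the sketch below (flag names and exit-code handling are simplified; the real script parses more options):

import argparse
import logging
import sys
from pathlib import Path

from tests import run_tests

def main() -> int:
    parser = argparse.ArgumentParser(description="Step 2: test suite execution")
    parser.add_argument("--output-dir", type=Path, default=Path("output/2_tests_output"))
    parser.add_argument("--comprehensive", action="store_true")
    args = parser.parse_args()

    logger = logging.getLogger("2_tests")
    ok = run_tests(logger=logger, output_dir=args.output_dir, comprehensive=args.comprehensive)
    return 0 if ok else 1

if __name__ == "__main__":
    sys.exit(main())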

runner.py (Core Implementation):

  • Contains all test execution logic
  • Provides run_tests(), run_fast_pipeline_tests(), run_comprehensive_tests(), etc.
  • Implements ModularTestRunner for category-based execution
  • Handles resource monitoring and error recovery

test_utils.py (Shared Utilities):

  • Provides test fixtures and helper functions
  • Defines test categories and markers
  • Provides test data creation utilities
  • Used by both test files and runner

conftest.py (Pytest Fixtures):

  • Defines pytest fixtures for all tests
  • Configures pytest markers
  • Provides shared test setup/teardown
  • Handles test environment configuration
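
A representative fixture shape (illustrative only, not the project's actual fixtures):

# src/tests/conftest.py (illustrative excerpt)
import pytest
from pathlib import Path

@pytest.fixture
def test_output_dir(tmp_path: Path) -> Path:
    """Provide an isolated output directory for a single test."""
    out = tmp_path / "2_tests_output"
    out.mkdir()
    return out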

Data Flow

Test Discovery → Environment Setup → Test Execution → Result Collection → Report Generation

Adding New Test Categories

To add a new test category to MODULAR_TEST_CATEGORIES in runner.py:

MODULAR_TEST_CATEGORIES["new_module"] = {
    "name": "New Module Tests",
    "description": "Tests for the new module",
    "files": [
        "test_template_overall.py",
        "test_new_module_integration.py"
    ],
    "markers": ["new_module"],  # Optional pytest markers
    "timeout_seconds": 120,      # Category timeout
    "max_failures": 8,           # Max failures before stopping
    "parallel": True              # Allow parallel execution
}

Creating New Test Files

Follow the naming convention:

  • test_MODULENAME_overall.py - Comprehensive module tests
  • test_MODULENAME_area.py - Specific area tests (e.g., test_gnn_parsing.py)
  • test_MODULENAME_integration.py - Integration tests

Example:

# src/tests/test_template_overall.py
import pytest
from pathlib import Path

@pytest.mark.fast
def test_new_module_basic():
    """Test basic functionality."""
    # Test implementation
    pass

@pytest.mark.slow
def test_new_module_complex():
    """Test complex scenarios."""
    # Test implementation
    pass

Testing

Test Files

  • 120+ test_*.py modules under src/tests/ (exact count drifts; use find src/tests -maxdepth 1 -name 'test_*.py' | wc -l)
  • ~1,906 tests in a typical full run with standard Ollama ignores (see root CLAUDE.md / README.md for the exact pytest command)
  • 20+ test categories for organized execution
  • 25+ test markers for selective execution

Test Coverage Statistics

  • Scale: Treat pytest --collect-only -q as ground truth for item counts; marker filters (e.g., -m "not slow") and --ignore options change what runs in the pipeline fast suite.
  • Fast Tests: Many functions use @pytest.mark.fast
  • Integration Tests: @pytest.mark.integration
  • Unit Tests: @pytest.mark.unit
  • Performance Tests: @pytest.mark.performance
  • Safe-to-Fail Tests: @pytest.mark.safe_to_fail
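
Markers can also drive selection programmatically through pytest's public API; a minimal sketch (test path assumed):

import pytest

# Run only fast-marked tests under src/tests, skipping anything marked slow.
exit_code = pytest.main(["-m", "fast and not slow", "src/tests", "-q"])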

Test Execution Modes

  1. Fast Tests (--fast-only): 1-3 minutes, essential validation
  2. Comprehensive Tests (--comprehensive): 5-15 minutes, all tests including slow/performance
  3. Reliable Tests (recovery): Essential tests only, 90-second timeout

Key Test Scenarios

  1. Module Import Validation: All modules can be imported and have expected structure
  2. Core Functionality: All core functions execute correctly with real data
  3. Integration Testing: Cross-module integration with real data flow
  4. Error Handling: Comprehensive error scenario testing with real failure modes
  5. Performance Benchmarking: Performance regression detection
  6. Coverage Analysis: Code coverage tracking and reporting
  7. Test Suite Execution: Test runner functionality and management

Test Quality Standards

  • No Simulated Usage: All tests use real implementations per testing policy
  • Real Data: All tests use real, representative data (no synthetic/placeholder data)
  • Real Dependencies: Tests use real dependencies (skip if unavailable, never simulated)
  • File-Based Assertions: Tests assert on real file outputs and artifacts
  • Error Recovery: Tests validate error handling with real failure modes
  • Performance Monitoring: Built-in timing and resource usage tracking

MCP Integration

Tools Registered

  • tests.run_suite - Run test suite
  • tests.run_fast - Run fast tests
  • tests.get_coverage - Get coverage report
  • tests.get_performance - Get performance metrics

Tool Endpoints

@mcp_tool("tests.run_suite")
def run_test_suite_tool(output_dir):
    """Run comprehensive test suite"""
    # Implementation

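One way such a tool body could delegate to the documented runner (a sketch; the actual MCP registration and wiring are module-specific):

import logging
from pathlib import Path

from tests.runner import run_comprehensive_tests

def run_test_suite_tool_sketch(output_dir: str) -> bool:
    """Illustrative tests.run_suite handler body."""
    logger = logging.getLogger("mcp.tests")
    return run_comprehensive_tests(logger=logger, output_dir=Path(output_dir), verbose=False)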

Documentation

  • README: Module Overview
  • AGENTS: Agentic Workflows
  • SPEC: Architectural Specification
  • SKILL: Capability API