
Test Suite Audit Report

Date: 2024-12-17
Auditor: Claude (with Aditya)
Scope: tpot-analyzer/tests/ (7,315 lines across 27 files)


Executive Summary

The test suite has strong algorithmic coverage but suffers from:

  1. Test theater — 6 placeholder tests that just pytest.skip()
  2. Misleading names — "integration" tests that mock everything
  3. Weak assertions — tests that only check status == 200
  4. Production data dependencies — 7 files skip when cache.db missing
  5. Missing E2E coverage — no tests for cluster routes (831 lines) or frontend

Recommendation: Delete placeholders, rename misleading files, add contract tests, create E2E fixtures.


Part 1: Tests to DELETE

test_list_scraping.py lines 57-158

Six placeholder tests that provide zero value:

# These are marked @pytest.mark.skip and just call pytest.skip() again
class TestFetchListMembersIntegration:
    def test_fetch_real_list_members(self):
        pytest.skip("Integration test not implemented yet")
    
    def test_fetch_list_with_lazy_loading(self):
        pytest.skip("Integration test not implemented yet")
    
    def test_fetch_private_list_requires_auth(self):
        pytest.skip("Integration test not implemented yet")
    
    def test_list_scraping_end_to_end_workflow(self):
        pytest.skip("Integration test not implemented yet")

class TestListIDDetectionEndToEnd:
    def test_numeric_center_triggers_list_mode(self):
        pytest.skip("Integration test not implemented yet")
    
    def test_alphanumeric_center_triggers_username_mode(self):
        pytest.skip("Integration test not implemented yet")

Why delete: These are test plans masquerading as tests. They inflate the test count and create a false sense of coverage; the embedded "Test Plan" comments belong in documentation instead.

Action:

# Delete lines 57-158 from test_list_scraping.py
# OR move the test plans to docs/test-plans/list-scraping-tests.md

Part 2: Tests to RENAME

test_shadow_enrichment_integration.py → test_enricher_orchestration.py

Problem: The file name says "integration", but the file contains 201 mock references:

$ grep -c "mock\|Mock\|patch\|MagicMock" tests/test_shadow_enrichment_integration.py
201

Evidence of mocking everything:

# From the file - this is not integration testing
@pytest.fixture
def enricher(mock_shadow_store, mock_enrichment_policy, mock_enrichment_config, mock_selenium_worker):
    """Create enricher with all dependencies mocked."""
    enricher = ShadowEnricher(...)
    enricher._selenium = mock_selenium_worker  # Mock
    enricher._store = mock_shadow_store  # Mock
    return enricher

What it actually tests: Orchestration logic, i.e. that the enricher calls the right methods in the right order with the right arguments. This is valuable, but calling it "integration" is misleading.

Action:

git mv tests/test_shadow_enrichment_integration.py tests/test_enricher_orchestration.py
# Update any imports
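
For contrast, an honestly named orchestration test makes its scope explicit. A minimal sketch (enrich_account, fetch_profile, and upsert_account are illustrative names, not necessarily the real API; the fixtures are the ones shown above):

# tests/test_enricher_orchestration.py (sketch; method names are hypothetical)
def test_enrich_fetches_profile_before_storing(enricher, mock_selenium_worker, mock_shadow_store):
    """Pin the orchestration: fetch the profile, then persist it."""
    enricher.enrich_account("user_1")  # hypothetical entry point

    # Asserting on mocks is appropriate here because the call sequence
    # *is* the behavior under test (contrast with Part 7).
    mock_selenium_worker.fetch_profile.assert_called_once_with("user_1")
    mock_shadow_store.upsert_account.assert_called_once()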

Part 3: Tests to STRENGTHEN

test_api.py - Weak Assertions

Current state (problematic patterns found on these lines):

# Line 41, 57, 80, 118, 194 - Only checks status code
assert response.status_code == 200

# Line 66-67, 136-137, 140, 199 - Only checks non-empty
assert len(data["nodes"]) > 0
assert len(data["edges"]) > 0

Problem: These tests pass if the API returns {"nodes": ["garbage"], "edges": [1]}. They don't validate:

  • Schema correctness
  • Required fields
  • Data types
  • Value ranges

Recommended fix - Add JSON schema validation:

# tests/schemas/api_schemas.py (new file)
CLUSTERS_RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["clusters", "edges", "positions", "meta"],
    "properties": {
        "clusters": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id", "size", "label"],
                "properties": {
                    "id": {"type": "string", "pattern": "^d_\\d+$"},
                    "size": {"type": "integer", "minimum": 1},
                    "label": {"type": "string"},
                    "parentId": {"type": ["string", "null"]},
                    "childrenIds": {"type": ["array", "null"]},
                }
            }
        },
        "positions": {
            "type": "object",
            "additionalProperties": {
                "type": "array",
                "items": {"type": "number"},
                "minItems": 2,
                "maxItems": 2
            }
        },
        "meta": {
            "type": "object",
            "required": ["budget", "budget_remaining"],
            "properties": {
                "budget": {"type": "integer", "minimum": 1},
                "budget_remaining": {"type": "integer", "minimum": 0}
            }
        }
    }
}

# In test_api.py
import json

from jsonschema import validate
from tests.schemas.api_schemas import CLUSTERS_RESPONSE_SCHEMA

def test_clusters_response_shape(test_client):
    response = test_client.get('/api/clusters?n=10')
    assert response.status_code == 200
    data = json.loads(response.data)
    validate(data, CLUSTERS_RESPONSE_SCHEMA)  # Fails on schema violations
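
To reach the "one contract test per route" target in Part 8, the same pattern can be parametrized over routes and schemas. A sketch, assuming each route gets a schema defined alongside CLUSTERS_RESPONSE_SCHEMA (the commented-out schema name is hypothetical):

import json

import pytest
from jsonschema import validate

from tests.schemas.api_schemas import CLUSTERS_RESPONSE_SCHEMA

@pytest.mark.parametrize("route,schema", [
    ("/api/clusters?n=10", CLUSTERS_RESPONSE_SCHEMA),
    # ("/api/discovery", DISCOVERY_RESPONSE_SCHEMA),  # hypothetical, once defined
])
def test_route_contract(test_client, route, schema):
    response = test_client.get(route)
    assert response.status_code == 200
    # validate() raises ValidationError with the JSON path of the violation
    validate(json.loads(response.data), schema)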

Part 4: Tests to ISOLATE (Production Data Dependencies)

These 7 files skip tests when production data (cache.db) or live credentials are missing:

| File | Lines | Skips When |
| --- | --- | --- |
| test_api.py | 316 | cache.db missing |
| test_connection.py | 67 | Supabase credentials missing |
| test_shadow_archive_consistency.py | 165 | cache.db or Supabase missing |
| test_shadow_coverage.py | 127 | cache.db missing |
| test_shadow_enricher_utils.py | 234 | cache.db missing |
| test_shadow_enrichment_integration.py | 486 | Production fixtures missing |
| conftest.py | 213 | Defines skip markers |

Problem: Tests should use fixtures, not production data. Depending on live data creates:

  • Non-deterministic test results across machines
  • CI failures when secrets aren't configured
  • False confidence from green tests on dev machines

Recommended fix: Create deterministic test fixtures:

# tests/fixtures/create_test_db.py
import sqlite3
from pathlib import Path

def create_test_cache_db(path: Path) -> None:
    """Create a minimal but realistic test database."""
    conn = sqlite3.connect(path)
    
    # Create tables matching production schema
    conn.executescript("""
        CREATE TABLE shadow_account (
            account_id TEXT PRIMARY KEY,
            username TEXT NOT NULL,
            display_name TEXT,
            bio TEXT,
            followers_total INTEGER DEFAULT 0,
            following_total INTEGER DEFAULT 0,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        
        CREATE TABLE shadow_edge (
            source_id TEXT NOT NULL,
            target_id TEXT NOT NULL,
            edge_type TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (source_id, target_id, edge_type)
        );
        
        CREATE INDEX idx_edge_source ON shadow_edge(source_id);
        CREATE INDEX idx_edge_target ON shadow_edge(target_id);
    """)
    
    # Insert deterministic test data (50 accounts, predictable relationships)
    accounts = [(f"id_{i}", f"user_{i}", f"User {i}", f"Bio {i}", i * 100, i * 50) 
                for i in range(50)]
    conn.executemany(
        "INSERT INTO shadow_account VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)",
        accounts
    )
    
    # Create a small-world network
    edges = []
    for i in range(50):
        # Each account follows next 3 accounts (wrapping)
        for j in range(1, 4):
            target = (i + j) % 50
            edges.append((f"id_{i}", f"id_{target}", "following"))
    
    conn.executemany(
        "INSERT INTO shadow_edge VALUES (?, ?, ?, CURRENT_TIMESTAMP)",
        edges
    )
    
    conn.commit()
    conn.close()
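
Tests can then consume the database through a session-scoped fixture instead of skipping. A sketch for conftest.py (the fixture name is illustrative):

# tests/conftest.py (sketch)
import pytest

from tests.fixtures.create_test_db import create_test_cache_db

@pytest.fixture(scope="session")
def test_cache_db(tmp_path_factory):
    """Build the deterministic 50-account database once per session."""
    db_path = tmp_path_factory.mktemp("fixtures") / "cache.db"
    create_test_cache_db(db_path)
    return db_path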

Part 5: Tests That Are Actually Good ✓

These test files demonstrate proper testing practices:

test_selenium_worker_unit.py (566 lines) ✓

  • Tests HTML parsing with real HTML fixtures
  • No mocks on the code under test
  • Edge cases covered (malformed HTML, missing fields)

test_selenium_extraction.py (554 lines) ✓

  • Tests DOM extraction with realistic payloads
  • Covers UserCell, ProfileOverview, pagination
  • Uses fixture files, not production data

test_scoring.py (528 lines) ✓

  • Tests PageRank, composite scoring, HITS
  • Uses fixture NetworkX graphs
  • Validates mathematical properties

test_x_api_client.py (458 lines) ✓

  • Tests rate limiting with time mocking
  • Tests retry logic with mock responses
  • Good isolation of API client behavior

test_graph_metrics.py (187 lines) ✓

  • Tests centrality algorithms on small graphs
  • Mathematical assertions (betweenness sums to N-1 choose 2)

test_clusters.py (145 lines) ✓

  • Tests cluster assignment, soft membership
  • Uses fixture linkage matrices
  • Validates properties (rows sum to 1)

test_spectral.py (72 lines) ✓

  • Tests Laplacian construction
  • Tests eigenvalue properties
  • Save/load round-trip

Part 6: Missing Test Coverage

Critical Gaps

| Component | Lines | Tests | Priority |
| --- | --- | --- | --- |
| src/api/cluster_routes.py | 831 | 0 | 🔴 HIGH |
| src/api/discovery.py | 553 | 0 | 🔴 HIGH |
| src/graph/hierarchy/builder.py | 500+ | 4 | 🟡 MEDIUM |
| graph-explorer/src/ClusterView.jsx | 800+ | 0 E2E | 🔴 HIGH |
| graph-explorer/src/Discovery.jsx | 400+ | 0 E2E | 🟡 MEDIUM |
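
The two backend gaps are the cheapest to close: the contract-test pattern from Part 3 applies directly to cluster_routes.py and discovery.py. The frontend gaps require new Playwright specs alongside the existing ones listed below.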

Existing E2E Tests

| File | Status | Notes |
| --- | --- | --- |
| e2e/cluster_mock.spec.ts | ✅ 8/8 passing | Uses mocked API |
| e2e/cluster_real.spec.ts | ⚠️ Requires real backend | Runs against scripts.start_api_server |
| e2e/teleport_tagging_mock.spec.ts | ✅ Works | Mocked API |

Part 7: Circular Mock Assertions (Anti-Pattern)

Found in test_shadow_enricher_utils.py and test_shadow_enrichment_integration.py:

# This pattern tests that code calls a mock, not that it works correctly
mock_store.get_shadow_list.return_value = fresh_list
# ... code runs ...
mock_store.get_shadow_list.assert_called_once_with("list123")

Why this is weak: If the implementation changes to use a different method that achieves the same result, the test fails even though behavior is correct.

Better approach: Test outcomes, not implementation:

# Instead of asserting mock calls, assert the result
result = enricher.process_list("list123")
assert result.members == expected_members
assert result.processed_at is not None
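
One way to get outcome-style assertions without real infrastructure is a small in-memory fake instead of a Mock. A sketch (the store methods and the ShadowEnricher wiring are assumptions, not the real signatures):

class FakeShadowStore:
    """In-memory stand-in that holds real state, so tests can assert outcomes."""
    def __init__(self):
        self.lists = {}

    def get_shadow_list(self, list_id):
        return self.lists.get(list_id)

    def save_shadow_list(self, list_id, members):
        self.lists[list_id] = members

store = FakeShadowStore()
store.save_shadow_list("list123", fresh_list)  # seed state; fresh_list as above
enricher = ShadowEnricher(store=store)         # hypothetical constructor wiring
result = enricher.process_list("list123")
assert result.members == expected_members      # outcome, not call counts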

Part 8: Recommended Test Metrics

Current State

| Metric | Value |
| --- | --- |
| Total test files | 27 |
| Total lines | 7,315 |
| Placeholder tests | 6 |
| Files requiring prod data | 7 |
| Mock references in "integration" test | 201 |
| API routes with 0 tests | 2 (1,384 lines) |

Target State

| Metric | Target |
| --- | --- |
| Placeholder tests | 0 |
| Files requiring prod data | 0 |
| "Integration" tests that mock everything | 0 (renamed) |
| Contract tests for API routes | 1 per route |
| E2E coverage | All major views |

Appendix A: File-by-File Summary

| File | Lines | Verdict |
| --- | --- | --- |
| test_account_status_tracking.py | 89 | ✓ Good |
| test_account_tags_store.py | 156 | ✓ Good |
| test_accounts_search_teleport_tags.py | 234 | ⚠️ Some prod deps |
| test_api.py | 316 | ⚠️ Weak assertions |
| test_api_autocomplete.py | 98 | ✓ Good |
| test_api_json_serialization.py | 67 | ✓ Good |
| test_cluster_tag_summary.py | 145 | ✓ Good |
| test_clusters.py | 145 | ✓ Good |
| test_connection.py | 67 | ⚠️ Requires Supabase |
| test_graph_builder.py | 234 | ✓ Good |
| test_graph_metrics.py | 187 | ✓ Good |
| test_hierarchy_focus_leaf.py | 89 | ✓ Good |
| test_hierarchy_math.py | 180 | ✓ Good (new) |
| test_hierarchy_members.py | 45 | ✓ Good |
| test_list_scraping.py | 158 | ❌ 6 placeholders |
| test_parse_compact_count.py | 56 | ✓ Good |
| test_scoring.py | 528 | ✓ Good |
| test_seeds.py | 123 | ✓ Good |
| test_selenium_extraction.py | 554 | ✓ Good |
| test_selenium_worker_unit.py | 566 | ✓ Good |
| test_shadow_archive_consistency.py | 165 | ⚠️ Requires prod |
| test_shadow_coverage.py | 127 | ⚠️ Requires prod |
| test_shadow_enricher_utils.py | 234 | ⚠️ Mock overuse |
| test_shadow_enrichment_integration.py | 486 | ❌ Rename needed |
| test_shadow_store_retry.py | 58 | ✓ Good |
| test_shadow_store_unit.py | 884 | ✓ Good |
| test_spectral.py | 72 | ✓ Good |
| test_x_api_client.py | 458 | ✓ Good |

Appendix B: Commands to Run Audit

# Count lines per file
wc -l tests/*.py | sort -n

# Find mock usage
grep -c "mock\|Mock\|patch" tests/*.py | sort -t: -k2 -n

# Find skip markers
grep -r "pytest.skip\|@pytest.mark.skip" tests/

# Find prod data dependencies
grep -l "cache.db\|SUPABASE" tests/*.py

# Find weak assertions
grep -n "assert.*status_code.*200" tests/test_api.py
grep -n "assert.*len.*>" tests/test_api.py