Date: 2024-12-17
Auditor: Claude (with Aditya)
Scope: tpot-analyzer/tests/ (7,315 lines across 27 files)
The test suite has strong algorithmic coverage but suffers from:

- Test theater — 6 placeholder tests that just call pytest.skip()
- Misleading names — "integration" tests that mock everything
- Weak assertions — tests that only check status == 200
- Production data dependencies — 7 files skip when cache.db is missing
- Missing E2E coverage — no tests for cluster routes (831 lines) or the frontend
Recommendation: Delete placeholders, rename misleading files, add contract tests, create E2E fixtures.
Six placeholder tests that provide zero value:

```python
# These are marked @pytest.mark.skip and just call pytest.skip() again
class TestFetchListMembersIntegration:
    def test_fetch_real_list_members(self):
        pytest.skip("Integration test not implemented yet")

    def test_fetch_list_with_lazy_loading(self):
        pytest.skip("Integration test not implemented yet")

    def test_fetch_private_list_requires_auth(self):
        pytest.skip("Integration test not implemented yet")

    def test_list_scraping_end_to_end_workflow(self):
        pytest.skip("Integration test not implemented yet")

class TestListIDDetectionEndToEnd:
    def test_numeric_center_triggers_list_mode(self):
        pytest.skip("Integration test not implemented yet")

    def test_alphanumeric_center_triggers_username_mode(self):
        pytest.skip("Integration test not implemented yet")
```

Why delete: These are test plans masquerading as tests. They inflate the test count, give a false sense of coverage, and their "Test Plan" comments belong in documentation.
Action:

```bash
# Delete lines 57-158 from test_list_scraping.py
# OR move the test plans to docs/test-plans/list-scraping-tests.md
```

Problem: The file name says "integration", but the file contains 201 mock references:

```bash
$ grep -c "mock\|Mock\|patch\|MagicMock" tests/test_shadow_enrichment_integration.py
201
```

Evidence of mocking everything:

```python
# From the file - this is not integration testing
@pytest.fixture
def enricher(mock_shadow_store, mock_enrichment_policy, mock_enrichment_config, mock_selenium_worker):
    """Create enricher with all dependencies mocked."""
    enricher = ShadowEnricher(...)
    enricher._selenium = mock_selenium_worker  # Mock
    enricher._store = mock_shadow_store  # Mock
    return enricher
```

What it actually tests: orchestration logic, i.e. that the enricher calls the right methods in the right order with the right arguments. This is valuable, but calling it "integration" is misleading.
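
For contrast, a true integration test would exercise real collaborators and assert on persisted state. A minimal sketch, assuming ShadowStore can be built over a temporary SQLite file and a stub Selenium worker can stand in at the network boundary (the constructor signatures and the StubSeleniumWorker helper are hypothetical, not the project's confirmed API):

```python
# Sketch only - ShadowStore/ShadowEnricher signatures and StubSeleniumWorker
# are assumptions to adapt to the real project API.
import sqlite3

def test_enrichment_persists_members(tmp_path):
    db_path = tmp_path / "cache.db"
    store = ShadowStore(db_path)  # real store over a temp SQLite file
    worker = StubSeleniumWorker(members=["alice", "bob"])  # stub the network edge only
    enricher = ShadowEnricher(store=store, selenium=worker)

    enricher.process_list("list123")

    # Assert on what was persisted, not on which methods were called
    rows = sqlite3.connect(db_path).execute(
        "SELECT username FROM shadow_account ORDER BY username"
    ).fetchall()
    assert [r[0] for r in rows] == ["alice", "bob"]
```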
Action:

```bash
git mv tests/test_shadow_enrichment_integration.py tests/test_enricher_orchestration.py
# Update any imports
```

Current state (problematic patterns found on these lines):

```python
# Lines 41, 57, 80, 118, 194 - only check the status code
assert response.status_code == 200

# Lines 66-67, 136-137, 140, 199 - only check non-empty
assert len(data["nodes"]) > 0
assert len(data["edges"]) > 0
```

Problem: These tests pass even if the API returns {"nodes": ["garbage"], "edges": [1]}. They don't validate:
- Schema correctness
- Required fields
- Data types
- Value ranges
Recommended fix - Add JSON schema validation:

```python
# tests/schemas/api_schemas.py (new file)
CLUSTERS_RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["clusters", "edges", "positions", "meta"],
    "properties": {
        "clusters": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id", "size", "label"],
                "properties": {
                    "id": {"type": "string", "pattern": "^d_\\d+$"},
                    "size": {"type": "integer", "minimum": 1},
                    "label": {"type": "string"},
                    "parentId": {"type": ["string", "null"]},
                    "childrenIds": {"type": ["array", "null"]},
                },
            },
        },
        "positions": {
            "type": "object",
            "additionalProperties": {
                "type": "array",
                "items": {"type": "number"},
                "minItems": 2,
                "maxItems": 2,
            },
        },
        "meta": {
            "type": "object",
            "required": ["budget", "budget_remaining"],
            "properties": {
                "budget": {"type": "integer", "minimum": 1},
                "budget_remaining": {"type": "integer", "minimum": 0},
            },
        },
    },
}
```

```python
# In test_api.py
import json

from jsonschema import validate
from tests.schemas.api_schemas import CLUSTERS_RESPONSE_SCHEMA

def test_clusters_response_shape(test_client):
    response = test_client.get('/api/clusters?n=10')
    assert response.status_code == 200
    data = json.loads(response.data)
    validate(data, CLUSTERS_RESPONSE_SCHEMA)  # Fails on schema violations
```

These 7 files skip tests when production data (data/cache.db or Supabase credentials) is missing:

| File | Lines | Skips When |
|---|---|---|
| test_api.py | 316 | cache.db missing |
| test_connection.py | 67 | Supabase credentials missing |
| test_shadow_archive_consistency.py | 165 | cache.db or Supabase missing |
| test_shadow_coverage.py | 127 | cache.db missing |
| test_shadow_enricher_utils.py | 234 | cache.db missing |
| test_shadow_enrichment_integration.py | 486 | Production fixtures missing |
| conftest.py | 213 | Defines skip markers |
Problem: Tests should use fixtures, not production data. This creates:
- Non-deterministic test results across machines
- CI failures when secrets aren't configured
- False confidence from green tests on dev machines
Recommended fix: Create deterministic test fixtures:

```python
# tests/fixtures/create_test_db.py
import sqlite3
from pathlib import Path

def create_test_cache_db(path: Path) -> None:
    """Create a minimal but realistic test database."""
    conn = sqlite3.connect(path)

    # Create tables matching the production schema
    conn.executescript("""
        CREATE TABLE shadow_account (
            account_id TEXT PRIMARY KEY,
            username TEXT NOT NULL,
            display_name TEXT,
            bio TEXT,
            followers_total INTEGER DEFAULT 0,
            following_total INTEGER DEFAULT 0,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE shadow_edge (
            source_id TEXT NOT NULL,
            target_id TEXT NOT NULL,
            edge_type TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (source_id, target_id, edge_type)
        );
        CREATE INDEX idx_edge_source ON shadow_edge(source_id);
        CREATE INDEX idx_edge_target ON shadow_edge(target_id);
    """)

    # Insert deterministic test data (50 accounts, predictable relationships)
    accounts = [(f"id_{i}", f"user_{i}", f"User {i}", f"Bio {i}", i * 100, i * 50)
                for i in range(50)]
    conn.executemany(
        "INSERT INTO shadow_account VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)",
        accounts
    )

    # Create a small-world network: each account follows the next 3 (wrapping)
    edges = []
    for i in range(50):
        for j in range(1, 4):
            target = (i + j) % 50
            edges.append((f"id_{i}", f"id_{target}", "following"))
    conn.executemany(
        "INSERT INTO shadow_edge VALUES (?, ?, ?, CURRENT_TIMESTAMP)",
        edges
    )

    conn.commit()
    conn.close()
```
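
To consume this, a conftest.py fixture can build the database into pytest's tmp_path and hand each test the path, replacing the current skip markers. A minimal sketch (the fixture name test_cache_db is illustrative):

```python
# tests/conftest.py - sketch; the fixture name is illustrative
from pathlib import Path

import pytest

from tests.fixtures.create_test_db import create_test_cache_db

@pytest.fixture
def test_cache_db(tmp_path: Path) -> Path:
    """Build a fresh, deterministic cache.db for each test."""
    db_path = tmp_path / "cache.db"
    create_test_cache_db(db_path)
    return db_path
```

Tests then take test_cache_db as an argument instead of probing for a production data/cache.db, which removes the skip-on-missing-data behavior entirely.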

These test files demonstrate proper testing practices:

- Tests HTML parsing with real HTML fixtures
- No mocks on the code under test
- Edge cases covered (malformed HTML, missing fields)
- Tests DOM extraction with realistic payloads
- Covers UserCell, ProfileOverview, pagination
- Uses fixture files, not production data
- Tests PageRank, composite scoring, HITS
- Uses fixture NetworkX graphs
- Validates mathematical properties
- Tests rate limiting with time mocking
- Tests retry logic with mock responses
- Good isolation of API client behavior
- Tests centrality algorithms on small graphs
- Mathematical assertions (betweenness sums to N-1 choose 2)
- Tests cluster assignment, soft membership
- Uses fixture linkage matrices
- Validates properties (rows sum to 1)
- Tests Laplacian construction
- Tests eigenvalue properties
- Save/load round-trip

Components with little or no test coverage:

| Component | Lines | Tests | Priority |
|---|---|---|---|
| src/api/cluster_routes.py | 831 | 0 | 🔴 HIGH |
| src/api/discovery.py | 553 | 0 | 🔴 HIGH |
| src/graph/hierarchy/builder.py | 500+ | 4 | 🟡 MEDIUM |
| graph-explorer/src/ClusterView.jsx | 800+ | 0 E2E | 🔴 HIGH |
| graph-explorer/src/Discovery.jsx | 400+ | 0 E2E | 🟡 MEDIUM |
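
The fastest way to close the API gap is one parametrized contract test per route, reusing the schema approach above. A sketch; the route list and per-route schemas are assumptions to fill in from the actual routes in cluster_routes.py and discovery.py:

```python
# Hypothetical route/schema pairs - replace with the real routes from
# src/api/cluster_routes.py and src/api/discovery.py.
import json

import pytest
from jsonschema import validate

from tests.schemas.api_schemas import CLUSTERS_RESPONSE_SCHEMA

ROUTE_SCHEMAS = [
    ("/api/clusters?n=10", CLUSTERS_RESPONSE_SCHEMA),
    # add one (route, schema) pair per route in cluster_routes.py / discovery.py
]

@pytest.mark.parametrize("route,schema", ROUTE_SCHEMAS)
def test_route_contract(test_client, route, schema):
    response = test_client.get(route)
    assert response.status_code == 200
    validate(json.loads(response.data), schema)
```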

Existing E2E specs:

| File | Status | Notes |
|---|---|---|
| e2e/cluster_mock.spec.ts | ✅ 8/8 passing | Uses mocked API |
| e2e/cluster_real.spec.ts | | Runs against scripts.start_api_server |
| e2e/teleport_tagging_mock.spec.ts | ✅ Works | Mocked API |
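
For cluster_real.spec.ts and future real-API suites, a session-scoped fixture can boot the server and tear it down afterwards. A sketch assuming the server is launchable as a module and listens on port 5000 (both assumptions to verify against scripts.start_api_server):

```python
# Hypothetical E2E server fixture - module invocation and port are assumptions.
import socket
import subprocess
import time

import pytest

@pytest.fixture(scope="session")
def api_server():
    proc = subprocess.Popen(["python", "-m", "scripts.start_api_server"])
    try:
        # Wait up to ~10s for the port to accept connections
        for _ in range(100):
            try:
                with socket.create_connection(("127.0.0.1", 5000), timeout=0.1):
                    break
            except OSError:
                time.sleep(0.1)
        else:
            pytest.fail("API server did not start")
        yield "http://127.0.0.1:5000"
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```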
Found in test_shadow_enricher_utils.py and test_shadow_enrichment_integration.py:

```python
# This pattern tests that code calls a mock, not that it works correctly
mock_store.get_shadow_list.return_value = fresh_list
# ... code runs ...
mock_store.get_shadow_list.assert_called_once_with("list123")
```

Why this is weak: if the implementation changes to use a different method that achieves the same result, the test fails even though the behavior is correct.
Better approach: Test outcomes, not implementation:

```python
# Instead of asserting mock calls, assert the result
result = enricher.process_list("list123")
assert result.members == expected_members
assert result.processed_at is not None
```

| Metric | Value |
|---|---|
| Total test files | 27 |
| Total lines | 7,315 |
| Placeholder tests | 6 |
| Files requiring prod data | 7 |
| Mock references in "integration" test | 201 |
| API routes with 0 tests | 2 (1,384 lines) |

| Metric | Target |
|---|---|
| Placeholder tests | 0 |
| Files requiring prod data | 0 |
| "Integration" tests that mock everything | 0 (renamed) |
| Contract tests for API routes | 1 per route |
| E2E coverage | All major views |

| File | Lines | Verdict |
|---|---|---|
| test_account_status_tracking.py | 89 | ✓ Good |
| test_account_tags_store.py | 156 | ✓ Good |
| test_accounts_search_teleport_tags.py | 234 | |
| test_api.py | 316 | ⚠️ Skips without prod data |
| test_api_autocomplete.py | 98 | ✓ Good |
| test_api_json_serialization.py | 67 | ✓ Good |
| test_cluster_tag_summary.py | 145 | ✓ Good |
| test_clusters.py | 145 | ✓ Good |
| test_connection.py | 67 | ⚠️ Skips without prod data |
| test_graph_builder.py | 234 | ✓ Good |
| test_graph_metrics.py | 187 | ✓ Good |
| test_hierarchy_focus_leaf.py | 89 | ✓ Good |
| test_hierarchy_math.py | 180 | ✓ Good (new) |
| test_hierarchy_members.py | 45 | ✓ Good |
| test_list_scraping.py | 158 | ❌ 6 placeholders |
| test_parse_compact_count.py | 56 | ✓ Good |
| test_scoring.py | 528 | ✓ Good |
| test_seeds.py | 123 | ✓ Good |
| test_selenium_extraction.py | 554 | ✓ Good |
| test_selenium_worker_unit.py | 566 | ✓ Good |
| test_shadow_archive_consistency.py | 165 | ⚠️ Skips without prod data |
| test_shadow_coverage.py | 127 | ⚠️ Skips without prod data |
| test_shadow_enricher_utils.py | 234 | ⚠️ Skips without prod data |
| test_shadow_enrichment_integration.py | 486 | ❌ Rename needed |
| test_shadow_store_retry.py | 58 | ✓ Good |
| test_shadow_store_unit.py | 884 | ✓ Good |
| test_spectral.py | 72 | ✓ Good |
| test_x_api_client.py | 458 | ✓ Good |

Commands used for this audit:

```bash
# Count lines per file
wc -l tests/*.py | sort -n

# Find mock usage
grep -c "mock\|Mock\|patch" tests/*.py | sort -t: -k2 -n

# Find skip markers
grep -r "pytest.skip\|@pytest.mark.skip" tests/

# Find prod data dependencies
grep -l "cache.db\|SUPABASE" tests/*.py

# Find weak assertions
grep -n "assert.*status_code.*200" tests/test_api.py
grep -n "assert.*len.*>" tests/test_api.py
```