Testing

Test Pyramid

          /\
         /  \      Security (127)
        /    \     - 3 files: injection, audit, invariants
       /------\
      /        \   Governance (30+)
     /          \  - policy, approval, operator, channel
    /------------\
   /              \ Routing & Adversarial (40+)
  /                \ - eval, confusion, regression
 /------------------\
|                    | E2E Effectiveness (44)
|                    | - full agent flow, wiring
|--------------------|
|                    | Integration (34)
|                    | - cross-module, finance
|--------------------|
|                    | Unit (~580)
|                    | - individual modules
+--------------------+

Layer	Tests	Files	Token Cost
Unit	~580	28 test files	$0.00
Integration	34	`test_integration.py`	$0.00
E2E	44	`test_e2e_effectiveness.py`	$0.00
Security	127	`test_security.py`, `test_security_audit.py`, `test_security_invariants.py`	$0.00
Routing	40+	`test_routing_eval.py`, `test_routing_adversarial.py`, `test_routing_confusion.py`	$0.00
Governance	30+	`test_tool_governance.py`, `test_policy_regression.py`, `test_policy_simulation.py`, `test_approval_queue.py`, `test_multi_step_approval.py`, `test_operator_controls.py`, `test_channel_policy.py`	$0.00
Memory	30+	`test_provenance.py`, `test_memory_conflicts.py`, `test_memory_consolidation.py`, `test_memory_separation.py`, `test_memory_inspection.py`	$0.00
Finance	20+	`test_budget_policy.py`, `test_risk_templates.py`, `test_finance_approval.py`, `test_proposal_lifecycle.py`	$0.00
Workspace	15+	`test_workspace_persistence.py`, `test_workspace_recovery.py`, `test_workspace_limits.py`	$0.00
Regression	14	`test_audit_v2_regressions.py`	$0.00
Other	30+	`test_smoke.py`, `test_action_envelope.py`, `test_agent_status.py`, `test_explanation.py`, etc.	$0.00
Total	1,260+	current suite	$0.00

All tests run offline: no API calls, no network access, and no Docker required. This is why the Token Cost column is $0.00 for every layer.

Recent Coverage Additions (v1.5.0)

persisted JobPlan handoff record and execution trace coverage
workspace join and builder delivery lifecycle coverage
repo-aware verification discovery coverage
policy-driven post-build review coverage

Running Tests

# All tests
.venv/bin/python -m pytest tests/ -q

# Specific layer
.venv/bin/python -m pytest tests/test_security_audit.py -v
.venv/bin/python -m pytest tests/test_policy_regression.py -v
.venv/bin/python -m pytest tests/test_audit_v2_regressions.py -v

# Single test class
.venv/bin/python -m pytest tests/test_integration.py::TestFinanceIntegration -v

# With coverage
.venv/bin/python -m pytest tests/ --cov=agent --cov-report=term-missing

CI Quality Gates (Hard)

All gates are hard failures. No step is masked with `|| true`:

Gate	Tool	Threshold
Lint	ruff	0 errors
Type check	mypy	0 errors (10 core modules)
Tests	pytest	0 failures
Test count	pytest	>= 1000 tests
Performance	timeout	Test run completes in < 60 seconds
Architecture	grep	No duplicate persona, no hardcoded paths, sandbox default = "1"
Security	pytest	`test_security_audit.py` passes
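
The test-count and performance gates can be expressed as a small check over the output of `pytest --collect-only -q` and a measured run time. The following is a minimal sketch, not the project's actual CI script: the thresholds come from the table above, while `collected_count` and `gate_failures` are hypothetical helpers, and the parser assumes the usual `N tests collected` summary line with no deselected tests.

```python
import re

MIN_TESTS = 1000      # "Test count" gate from the table above
MAX_SECONDS = 60.0    # "Performance" gate from the table above

# Assumed summary format, e.g. "1260 tests collected in 1.05s".
_COLLECTED = re.compile(r"(\d+) tests? collected")


def collected_count(output: str) -> int:
    """Extract the number of collected tests from collect-only output."""
    match = _COLLECTED.search(output)
    if match is None:
        raise ValueError("no 'N tests collected' summary line found")
    return int(match.group(1))


def gate_failures(test_count: int, elapsed_seconds: float) -> list[str]:
    """Return a message for each violated gate (empty list means pass)."""
    failures = []
    if test_count < MIN_TESTS:
        failures.append(f"test count {test_count} is below {MIN_TESTS}")
    if elapsed_seconds >= MAX_SECONDS:
        failures.append(f"run took {elapsed_seconds:.1f}s, limit is {MAX_SECONDS:.0f}s")
    return failures


if __name__ == "__main__":
    sample = "1260 tests collected in 1.05s"
    problems = gate_failures(collected_count(sample), elapsed_seconds=42.0)
    # A hard gate exits non-zero on any failure instead of swallowing it.
    raise SystemExit(1 if problems else 0)
```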

Key Test Categories

Security Tests (127)

Prompt injection: 10+ attack patterns in EN + SK
Hardcoded secrets: AST scan of all .py files
SQL safety: parameterized queries only
Sandbox isolation: Docker flags, image whitelist
API auth: mutation endpoints require Bearer token
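
As an illustration of the AST-based secret scan, the sketch below flags assignments of non-empty string literals to names that look like credentials. It is an assumption about how such a scan might work, not the project's implementation; `SUSPICIOUS_NAMES` and `find_hardcoded_secrets` are hypothetical.

```python
import ast

# Hypothetical markers; a real scan would likely use a longer list.
SUSPICIOUS_NAMES = ("secret", "password", "token", "api_key")


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for suspicious literal assignments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only non-empty string literals count; lookups such as
        # os.environ["TOKEN"] are Subscript nodes and are skipped.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                marker in target.id.lower() for marker in SUSPICIOUS_NAMES
            ):
                findings.append((node.lineno, target.id))
    return findings
```

A test built on this idea could walk every `.py` file in the repository, pass its contents to the scanner, and assert that the combined list of findings is empty.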

Governance Tests (30+)

Tool policy: deny-by-default, channel restriction, owner-only, safe mode
Approval: multi-step, TTL expiry, same-person dedup
Operator controls: disable/enable, lockdown/unlock
Channel policy: trust levels, response filtering
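
The approval behaviors listed above (multi-step, TTL expiry, same-person dedup) can be sketched with a small data class. This is a hypothetical model written for illustration; `ApprovalRequest`, `ApprovalExpired`, and their fields are assumptions rather than the project's API.

```python
from dataclasses import dataclass, field


class ApprovalExpired(Exception):
    """Raised when someone approves a request after its TTL has elapsed."""


@dataclass
class ApprovalRequest:
    required_approvals: int   # multi-step: how many distinct approvers are needed
    created_at: float         # timestamp in seconds
    ttl_seconds: float
    approvers: set[str] = field(default_factory=set)

    def is_expired(self, now: float) -> bool:
        return now - self.created_at >= self.ttl_seconds

    def approve(self, approver: str, now: float) -> bool:
        """Record an approval; return True only if it was newly counted."""
        if self.is_expired(now):
            raise ApprovalExpired("approval request expired")
        if approver in self.approvers:
            return False      # same-person dedup: repeat approvals do not count
        self.approvers.add(approver)
        return True

    def is_approved(self, now: float) -> bool:
        return (not self.is_expired(now)
                and len(self.approvers) >= self.required_approvals)
```

Passing `now` explicitly instead of calling the clock inside the class keeps tests deterministic and offline, which matches the zero-cost test layers described earlier.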

Regression Tests (14)

Restricted channel enforcement (5 tests)
Approval enforcement (2 tests)
Deny-by-default completeness (2 tests)
Status lifecycle: no stuck states (3 tests)
Channel file access policy (1 test)
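
A "no stuck states" regression check can be framed as a reachability question: every status must be able to reach a terminal status. The sketch below assumes a hypothetical transition table; the state names and the `stuck_states` helper are illustrative and not taken from the project.

```python
# Hypothetical lifecycle used only for illustration.
TRANSITIONS = {
    "queued": {"running", "cancelled"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}
TERMINAL = {"completed", "failed", "cancelled"}


def stuck_states(transitions: dict[str, set[str]], terminal: set[str]) -> set[str]:
    """Return states from which no terminal state can ever be reached."""
    can_finish = set(terminal)
    changed = True
    # Grow the set of states that can finish until it stops changing.
    while changed:
        changed = False
        for state, targets in transitions.items():
            if state not in can_finish and targets & can_finish:
                can_finish.add(state)
                changed = True
    return set(transitions) - can_finish


def test_no_stuck_states():
    assert stuck_states(TRANSITIONS, TERMINAL) == set()
```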

