Testing
Daniel Babjak edited this page Mar 27, 2026 · 21 revisions
              /\
             /  \           Security (127)
            /    \          - 3 files: injection, audit, invariants
           /------\
          /        \        Governance (30+)
         /          \       - policy, approval, operator, channel
        /------------\
       /              \     Routing & Adversarial (40+)
      /                \    - eval, confusion, regression
     /------------------\
    |                    |  E2E Effectiveness (44)
    |                    |  - full agent flow, wiring
    |--------------------+
    |                    |  Integration (34)
    |                    |  - cross-module, finance
    |--------------------+
    |                    |  Unit (~580)
    |                    |  - individual modules
    +--------------------+

| Layer | Tests | Files | Token Cost |
|---|---|---|---|
| Unit | ~580 | 28 test files | $0.00 |
| Integration | 34 | test_integration.py | $0.00 |
| E2E | 44 | test_e2e_effectiveness.py | $0.00 |
| Security | 127 | test_security.py, test_security_audit.py, test_security_invariants.py | $0.00 |
| Routing | 40+ | test_routing_eval.py, test_routing_adversarial.py, test_routing_confusion.py | $0.00 |
| Governance | 30+ | test_tool_governance.py, test_policy_regression.py, test_policy_simulation.py, test_approval_queue.py, test_multi_step_approval.py, test_operator_controls.py, test_channel_policy.py | $0.00 |
| Memory | 30+ | test_provenance.py, test_memory_conflicts.py, test_memory_consolidation.py, test_memory_separation.py, test_memory_inspection.py | $0.00 |
| Finance | 20+ | test_budget_policy.py, test_risk_templates.py, test_finance_approval.py, test_proposal_lifecycle.py | $0.00 |
| Workspace | 15+ | test_workspace_persistence.py, test_workspace_recovery.py, test_workspace_limits.py | $0.00 |
| Regression | 14 | test_audit_v2_regressions.py | $0.00 |
| Other | 30+ | test_smoke.py, test_action_envelope.py, test_agent_status.py, test_explanation.py, etc. | $0.00 |
| Total | 1,260+ | current suite | $0.00 |

All tests are offline. No API calls, no network, no Docker needed.
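
One way to keep a suite offline is to make any attempted connection fail loudly instead of silently reaching the network. The sketch below is an assumption about how such a guard could look, not the project's actual mechanism; `network_blocked` and `NetworkBlockedError` are hypothetical names introduced for illustration.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open a network connection."""


def _refuse(*args, **kwargs):
    # Fail fast so an accidental API call shows up as a test error.
    raise NetworkBlockedError("network access is disabled in tests")


@contextmanager
def network_blocked():
    """Hypothetical guard: patch the stdlib entry points for outbound connections."""
    with mock.patch.object(socket.socket, "connect", _refuse), \
         mock.patch.object(socket, "create_connection", _refuse):
        yield


if __name__ == "__main__":
    with network_blocked():
        try:
            socket.create_connection(("127.0.0.1", 9))
        except NetworkBlockedError as exc:
            print("blocked:", exc)
```

In a pytest suite the same guard could be wrapped in an autouse fixture in `conftest.py`, so every test runs under it without opting in.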
- persisted `JobPlan` handoff record and execution trace coverage
- workspace join and builder delivery lifecycle coverage
- repo-aware verification discovery coverage
- policy-driven post-build review coverage

    # All tests
    .venv/bin/python -m pytest tests/ -q

    # Specific layer
    .venv/bin/python -m pytest tests/test_security_audit.py -v
    .venv/bin/python -m pytest tests/test_policy_regression.py -v
    .venv/bin/python -m pytest tests/test_audit_v2_regressions.py -v

    # Single test class
    .venv/bin/python -m pytest tests/test_integration.py::TestFinanceIntegration -v

    # With coverage
    .venv/bin/python -m pytest tests/ --cov=agent --cov-report=term-missing

All gates are hard failures (no `|| true`):

| Gate | Tool | Threshold |
|---|---|---|
| Lint | ruff | 0 errors |
| Type check | mypy | 0 errors (10 core modules) |
| Tests | pytest | 0 failures |
| Test count | pytest | >= 1000 tests |
| Performance | timeout | < 60 seconds |
| Architecture | grep | No duplicate persona, no hardcoded paths, sandbox default = "1" |
| Security | pytest | test_security_audit.py passes |
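
The test-count gate in the table can be illustrated with a small check on pytest's collection summary. This is a sketch only: it assumes the summary line printed by `pytest --collect-only -q` looks like `1263 tests collected in 1.20s`, and the helper names `collected_count` and `passes_count_gate` are hypothetical, not taken from the project's CI scripts.

```python
import re

MIN_TESTS = 1000  # threshold from the gate table


def collected_count(collect_output: str) -> int:
    """Extract N from a summary line such as '1263 tests collected in 1.20s'."""
    match = re.search(r"(\d+) tests? collected", collect_output)
    if match is None:
        raise ValueError("no collection summary found in pytest output")
    return int(match.group(1))


def passes_count_gate(collect_output: str, minimum: int = MIN_TESTS) -> bool:
    """Return True when the collected test count meets the minimum."""
    return collected_count(collect_output) >= minimum


if __name__ == "__main__":
    sample = "tests/test_smoke.py: 3\n\n1263 tests collected in 1.20s\n"
    print(collected_count(sample), passes_count_gate(sample))
```

A CI step could feed the captured collection output to `passes_count_gate` and exit non-zero on `False`, which keeps the gate a hard failure.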
- Prompt injection: 10+ attack patterns in EN + SK
- Hardcoded secrets: AST scan of all .py files
- SQL safety: parameterized queries only
- Sandbox isolation: Docker flags, image whitelist
- API auth: mutation endpoints require Bearer token
- Tool policy: deny-by-default, channel restriction, owner-only, safe mode
- Approval: multi-step, TTL expiry, same-person dedup
- Operator controls: disable/enable, lockdown/unlock
- Channel policy: trust levels, response filtering
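
As an illustration of the "AST scan" item above, the following sketch flags assignments of non-empty string literals to secret-looking names. It is a minimal example under stated assumptions: the name pattern and the function `find_hardcoded_secrets` are hypothetical, and the real scan in `test_security_audit.py` may use different rules.

```python
import ast
import re

# Hypothetical pattern for variable names that suggest a credential.
SECRET_NAME = re.compile(r"(secret|token|password|api_key)", re.IGNORECASE)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs where a secret-like name gets a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only non-empty string literals count; env lookups and calls are ignored.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                findings.append((node.lineno, target.id))
    return sorted(findings)


if __name__ == "__main__":
    sample = 'API_KEY = "sk-123"\nTOKEN = os.environ["TOKEN"]\n'
    print(find_hardcoded_secrets(sample))
```

Working on the parsed tree rather than raw text avoids false positives from comments and docstrings that merely mention a secret name.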
- Restricted channel enforcement (5 tests)
- Approval enforcement (2 tests)
- Deny-by-default completeness (2 tests)
- Status lifecycle — no stuck states (3 tests)
- Channel file access policy (1 test)
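
The "no stuck states" invariant can be checked as a reachability property: every status must be able to reach some terminal status. The sketch below shows that idea on a made-up state machine; the state names, `TRANSITIONS`, `TERMINAL`, and `stuck_states` are all hypothetical and do not describe the project's actual lifecycle.

```python
# Hypothetical status graph: state -> set of states it may move to.
TRANSITIONS = {
    "pending": {"approved", "denied", "expired"},
    "approved": {"running"},
    "running": {"done", "failed"},
    "denied": set(),
    "expired": set(),
    "done": set(),
    "failed": set(),
}
TERMINAL = {"denied", "expired", "done", "failed"}


def stuck_states(transitions: dict[str, set[str]], terminal: set[str]) -> set[str]:
    """Return states from which no terminal state is reachable."""
    can_finish = set(terminal)
    changed = True
    while changed:
        changed = False
        for state, nexts in transitions.items():
            # A state can finish if any successor can finish.
            if state not in can_finish and nexts & can_finish:
                can_finish.add(state)
                changed = True
    return set(transitions) - can_finish


if __name__ == "__main__":
    print(stuck_states(TRANSITIONS, TERMINAL))
```

A test built on this would assert that `stuck_states(...)` is empty, so adding a status without an exit path fails the suite.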