Testing

Agent Life Space is built test-first. The whole suite is offline (no API calls, no Docker required to run the unit/integration layers, no network), runs in under 30 seconds locally, and is enforced as a hard CI gate.

Total: 1762 passed, 4 skipped, 0 failures as of v1.35.0. The 4 skips are legacy semantic-router tests that need the optional sentence_transformers model — they're skipped by default, not failing.

Test pyramid

                       ▲
                       │
                       │   Security (129)
                       │   - injection, audit, invariants, vault, telegram guards, headless CLI
                       │
                       │   Architecture invariants
                       │   - no hardcoded paths, sandbox default = "1", no orchestrator imports back
                       │
                       │   Governance (60+)
                       │   - tool policy, approval queue, multi-step approval, channel policy,
                       │     operator controls, control plane, deployment contracts
                       │
                       │   Routing & Adversarial (40+)
                       │   - eval, confusion matrix, regression, semantic guard
                       │
                       │   E2E effectiveness (44)
                       │   - full agent wiring, e2e flows
                       │
                       │   Integration (34)
                       │   - cross-module flows, finance, control plane jobs
                       │
                       │   Domain (300+)
                       │   - build, review, brain, memory, finance, control plane, vault
                       │
                       │   Unit (~800)
                       │   - individual modules
                       │
                       ▼

| Layer | Tests | Files |
|---|---|---|
| Unit | ~800 | 30+ test files (one per source module) |
| Domain | ~300 | `test_build_domain.py`, `test_review_domain.py`, `test_brain_core.py`, ... |
| Integration | 34 | `test_integration.py` |
| E2E | 44 | `test_e2e_effectiveness.py` |
| Security | 129 | `test_security.py`, `test_security_audit.py`, `test_security_invariants.py` |
| Routing | 40+ | `test_routing_eval.py`, `test_routing_adversarial.py`, `test_routing_confusion.py` |
| Governance | 60+ | `test_tool_governance.py`, `test_policy_regression.py`, `test_approval_queue.py`, `test_multi_step_approval.py`, `test_operator_controls.py`, `test_channel_policy.py`, `test_control_plane*.py`, `test_deployment_contracts.py` |
| Memory | 30+ | `test_provenance.py`, `test_memory_*.py` |
| Finance | 20+ | `test_budget_policy.py`, `test_risk_templates.py`, `test_finance_approval.py`, `test_proposal_lifecycle.py` |
| Workspace | 15+ | `test_workspace_persistence.py`, `test_workspace_recovery.py`, `test_workspace_limits.py` |
| Logging | 50+ | `test_logger.py`, `test_log_retention.py` |
| Vault | 31 | `test_vault.py` (v2 format, legacy migration, wrong-key fail-fast, crash safety) |
| Operator API | 50+ | `test_operator_api.py`, `test_dashboard_settlement.py` |
| Telegram | 50+ | `test_telegram_operator.py`, `test_telegram_handler.py` |
| Architecture | 14 | `test_architecture_invariants.py` |
| Total | 1762 | across the whole `tests/` tree |

All tests are offline. No API calls. Full suite token cost: $0.00.

Running tests

# Whole suite — under 30s
.venv/bin/python -m pytest tests/ -q

# A single test file
.venv/bin/python -m pytest tests/test_vault.py -v

# A single class
.venv/bin/python -m pytest tests/test_brain_core.py::TestTelegramCliProgrammingDenyGuard -v

# A single test
.venv/bin/python -m pytest tests/test_log_retention.py::TestRetentionEnvContractIsUnified::test_legacy_days_env_is_promoted_to_hours -v

# With coverage
.venv/bin/python -m pytest tests/ --cov=agent --cov-report=term-missing

# Skip slow / network tests (none in this repo, but the marker exists)
.venv/bin/python -m pytest tests/ -q -m "not slow"

In `-q` (quiet) mode pytest prints one character per test: a dot for each pass. `tests/conftest.py` configures pytest to:

- treat `DeprecationWarning` as an error (matching the CI level)
- enable asyncio mode (so `async def test_*` works without decorators)
- use a temp directory for every test that needs filesystem state (no cross-test pollution)
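To make the first item concrete, here is a minimal sketch of what promoting `DeprecationWarning` to an error means in practice. It uses only the standard `warnings` module; `legacy_helper` and `call_with_strict_warnings` are hypothetical names for illustration and are not part of the repository.

```python
import warnings


def legacy_helper() -> int:
    """Hypothetical function that still emits a deprecation warning."""
    warnings.warn("legacy_helper is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_with_strict_warnings() -> str:
    """Call legacy_helper with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        # Same effect as `-W error::DeprecationWarning`, scoped to this block.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            legacy_helper()
        except DeprecationWarning as exc:
            return f"failed: {exc}"
    return "passed"
```

Under this filter, a test that touches a deprecated code path fails immediately instead of logging a warning that nobody reads.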

CI gates (hard, no `|| true`)

GitHub Actions runs everything below on every push and PR. Any failure blocks merge.

| Gate | Tool | Threshold |
|---|---|---|
| Lint | `ruff check agent/ tests/` | 0 errors |
| Auto-fix imports | `ruff check --fix --select I` | applied automatically |
| Type check | `mypy agent --ignore-missing-imports` | 0 errors across 112 source files |
| Tests | `pytest tests/ -q -W error::DeprecationWarning` | 0 failures |
| Test count | `pytest -q` final line | ≥ 1300 (currently 1762) |
| Performance | `timeout 90 pytest tests/` | Under 90 seconds wallclock |
| Architecture invariants | `pytest tests/test_architecture_invariants.py -v` | All pass |
| Security audit | `pytest tests/test_security_audit.py -v` | All pass |
| Review eval (smoke) | `pytest tests/test_review_eval_smoke.py -q` | All pass |
| Review eval (golden) | `pytest tests/test_review_eval_golden.py -q` | All pass |
| Release readiness | `python -m agent --release-readiness ...` | `ready: true` (with `AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1` on CI runners) |
| Operator typecheck | `cd operator && npm ci && npm run typecheck` | 0 errors |

The release-readiness gate skips the live LLM probe on CI because the GitHub Actions runners don't have the Claude CLI installed. The skipped probe is still recorded in the readiness payload — see Logging and the AGENT_RELEASE_READINESS_SKIP_LLM_PROBE env var.
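The test-count gate reads the final line of `pytest -q` and compares it against a floor. The actual CI script is not shown on this page, so the following is only a sketch of how such a check could work; `parse_summary` and `count_gate_ok` are hypothetical names, and the timing in the sample line is illustrative.

```python
import re

MIN_TESTS = 1300  # floor taken from the CI gates table


def parse_summary(line: str) -> dict[str, int]:
    """Pull counts such as {'passed': 1762, 'skipped': 4} out of a summary line."""
    pairs = re.findall(r"(\d+) (passed|failed|skipped|errors?)", line)
    return {label: int(count) for count, label in pairs}


def count_gate_ok(line: str, minimum: int = MIN_TESTS) -> bool:
    """True when nothing failed or errored and enough tests passed."""
    counts = parse_summary(line)
    broken = (
        counts.get("failed", 0)
        + counts.get("error", 0)
        + counts.get("errors", 0)
    )
    return broken == 0 and counts.get("passed", 0) >= minimum
```

For example, `count_gate_ok("1762 passed, 4 skipped in 24.10s")` is true, while a summary reporting only 1200 passed tests is rejected. A floor like this guards against a change that silently deselects or deletes a large part of the suite.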

Architecture invariants

tests/test_architecture_invariants.py enforces structural rules at CI time. These are the rules that "obviously" should be true but degrade silently if you don't test them:

| Test | Invariant |
|---|---|
| `test_no_hardcoded_paths` | No `os.path.expanduser("~")` or hardcoded `/Users/` / `/home/` paths in `agent/` |
| `test_no_duplicate_persona` | The system prompt template appears in exactly one place |
| `test_sandbox_default_is_one` | `AGENT_SANDBOX_ONLY` defaults to `"1"` in `llm_provider.py` |
| `test_skip_permissions_is_guarded` | `--dangerously-skip-permissions` is always behind a sandbox check |
| `test_no_orchestrator_import_back` | No module under `agent/build/`, `agent/review/`, `agent/control/`, etc. imports `AgentOrchestrator` directly |
| `test_security_model_doc_exists` | `docs/SECURITY_MODEL.md` is present and covers required sections |
| `test_no_circular_imports` | The dependency graph has no cycles at module level |
| `test_storage_layers_use_parameterized_queries` | No raw `f"SELECT ... {var}"` patterns in storage layers |

These tests catch architecture drift early. If a refactor accidentally introduces a circular import or a hardcoded path, CI tells you on the first push.
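As an illustration of how an invariant like `test_no_hardcoded_paths` might be checked, the sketch below scans source text line by line for the forbidden patterns. The regexes and the `find_hardcoded_paths` helper are assumptions made for this example; the real test may work differently (for instance, by walking the AST).

```python
import re

# Assumed patterns, modelled on the invariant's description in the table above.
FORBIDDEN = [
    re.compile(r"""os\.path\.expanduser\(\s*["']~["']\s*\)"""),
    re.compile(r"""["']/(?:Users|home)/"""),
]


def find_hardcoded_paths(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line matching a forbidden pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FORBIDDEN):
            hits.append((lineno, line.strip()))
    return hits
```

A test built on this would read every `.py` file under `agent/` and assert that the combined list of hits is empty, so the failure message points at the exact file and line.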

Security tests (129)

Three files, all run on every commit:

`tests/test_security.py` (66 tests)

| Class | Tests |
|---|---|
| `TestPromptInjection` | 10+ attack patterns (EN + SK), hard block + soft block |
| `TestSafeMode` | Non-owner restrictions in groups, command list filtering |
| `TestOwnerEnforcement` | Whitelist enforcement, owner-only commands, identity capture |
| `TestChannelPolicy` | Trust levels, response filtering per channel |
| `TestApprovalGate` | Approval-required tools, multi-step paths |

`tests/test_security_audit.py` (50+ tests)

| Class | Tests |
|---|---|
| `TestNoHardcodedSecrets` | AST scan for embedded API keys, tokens, passwords |
| `TestSqlSafety` | All queries use parameterization; dynamic DDL uses whitelist |
| `TestEvalExecBan` | No `eval()` / `exec()` in `agent/` |
| `TestVaultIntegration` | Vault is encrypted at rest, fail-fast without key |
| `TestSandboxIsolation` | Docker flags present, image whitelist enforced |
| `TestApiAuth` | Mutation endpoints require Bearer; no `?key=` fallback |
| `TestLogRedaction` | Secrets never reach log output |
| `TestSubprocessSafety` | All subprocess calls quoted, no shell injection |
| `TestEnvVarSecurity` | No `os.environ[KEY]` (use `.get(KEY, default)`) |
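Several of these audits are static scans of the source tree. As a rough sketch of the idea behind `TestEvalExecBan`, the snippet below uses the standard `ast` module to find direct calls to `eval` or `exec`. The `find_banned_calls` helper is hypothetical and written for this example; the real test's implementation is not shown here.

```python
import ast

BANNED_CALLS = {"eval", "exec"}


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, name) for every direct call to a banned builtin."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls count, e.g. eval(...), not the string "eval".
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

Working on the AST rather than on raw text avoids false positives from comments, docstrings, and string literals that merely mention the banned names.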

`tests/test_security_invariants.py` (13 tests)

Architecture-level invariants pulled out from the broader audit suite:

- Sandbox default = `"1"`
- `--dangerously-skip-permissions` is guarded
- Tool capability manifest is complete (every executor tool has an entry)
- HIGH risk tools are owner-only AND safe-mode-blocked
- EXTERNAL side-effect tools are owner-only
- Read-only tools have NONE side effects
- Persona module is the only place that defines the system prompt
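The manifest rules in the list above lend themselves to a single table-driven check. The sketch below is an assumption about how such a check could be structured: `ToolCapability`, `Risk`, `SideEffect`, and `manifest_violations` are hypothetical names, not the project's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


class SideEffect(Enum):
    NONE = "none"
    LOCAL = "local"
    EXTERNAL = "external"


@dataclass(frozen=True)
class ToolCapability:
    name: str
    risk: Risk
    side_effect: SideEffect
    owner_only: bool
    safe_mode_blocked: bool
    read_only: bool = False


def manifest_violations(manifest: list[ToolCapability], executor_tools: list[str]) -> list[str]:
    """Return one message per broken invariant; an empty list means the manifest is sound."""
    problems = []
    declared = {cap.name for cap in manifest}
    # Completeness: every executor tool needs a manifest entry.
    for tool in sorted(set(executor_tools) - declared):
        problems.append(f"{tool}: missing from manifest")
    for cap in manifest:
        if cap.risk is Risk.HIGH and not (cap.owner_only and cap.safe_mode_blocked):
            problems.append(f"{cap.name}: HIGH risk must be owner-only and safe-mode-blocked")
        if cap.side_effect is SideEffect.EXTERNAL and not cap.owner_only:
            problems.append(f"{cap.name}: EXTERNAL side effect must be owner-only")
        if cap.read_only and cap.side_effect is not SideEffect.NONE:
            problems.append(f"{cap.name}: read-only tool must have NONE side effects")
    return problems
```

A test would then assert `manifest_violations(...) == []`, which gives a readable failure listing every offending tool rather than stopping at the first one.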

Notable test files added in v1.35.0

| File | Purpose |
|---|---|
| `tests/test_vault.py` | v2 format spec, legacy v1 migration, wrong-key fail-fast, crash safety (31 tests) |
| `tests/test_log_retention.py` | tier resolver, retention manager, env contract unification (50+ tests) |
| `tests/test_llm_runtime.py` | runtime override resolver, brain backend resolution (8 tests) |
| `tests/test_brain_core.py::TestTelegramCliProgrammingDenyGuard` | 5 scenarios for the brain fail-closed guard |
| `tests/test_brain_core.py::TestExplicitWorkQueueDetector` | 10 scenarios for the anti-echo work-queue guard |
| `tests/test_brain_core.py::TestShortFollowupGetsHistory` | regression for "ano" losing conversation history |
| `tests/test_telegram_operator.py::TestTypingIndicatorCleanup` | typing task leak detection |
| `tests/test_telegram_operator.py::TestAgentCronStop` | cron stop awaits cancelled tasks |
| `tests/test_review_domain.py::TestReviewStorageSqlHardening` | SQL injection guard for review storage |

Test fixtures

All test fixtures use neutral hostnames (acme-host-*, example.com, agent-test) — no operator-specific identifiers. A fresh clone of the repo has zero personal data baked into tests.

This was a deliberate cleanup in v1.35.0 after the credential leak post-mortem. See docs/SECURITY_INCIDENT_2026-04-07.md.

What we don't test

- Live LLM calls. Every test uses a `MagicMock` or a recorded fixture. The release-readiness gate runs a single live probe on operator hosts, not in CI.
- Docker container execution. Sandbox tests verify the flags passed to `docker run`, not that an actual container runs. Container-level tests are out of scope.
- Cross-process integration. SQLite WAL behaviour with two processes is tested at the unit level (locking, retry) but not via real concurrent processes.
- Network ops at scale. Rate limiting is tested with mocked clocks. Real load testing is the operator's job.
- Telegram Bot API. Mocked. We test our handler, not Telegram's wire protocol.
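To show what "verify the flags, not the container" can look like, here is a small sketch using `unittest.mock`. The `run_in_sandbox` launcher, the image name, and the specific flag set (`--rm`, `--network=none`, `--read-only`) are assumptions chosen for illustration; the project's real sandbox flags are not listed on this page.

```python
import subprocess
from unittest import mock


def run_in_sandbox(image: str, command: list[str]) -> None:
    """Hypothetical sandbox launcher; the flag set here is illustrative only."""
    subprocess.run(
        ["docker", "run", "--rm", "--network=none", "--read-only", image, *command],
        check=True,
    )


def test_sandbox_passes_isolation_flags() -> None:
    # Replace subprocess.run so no Docker daemon is ever contacted.
    with mock.patch("subprocess.run") as fake_run:
        run_in_sandbox("python:3.12-slim", ["python", "-c", "print(1)"])

    argv = fake_run.call_args.args[0]
    assert argv[:2] == ["docker", "run"]
    assert "--network=none" in argv
    assert "--read-only" in argv
    assert fake_run.call_args.kwargs == {"check": True}
```

The trade-off is the one stated in the list above: this proves the command line is built correctly, but says nothing about how Docker behaves once it receives that command.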

Adding a new test

1. Find the right file. A module maps to `tests/test_<module>.py`. A new cross-cutting concern gets a new file with a descriptive name (e.g. `tests/test_log_retention.py`).
2. Use the existing fixtures. pytest's built-in `tmp_path` and `monkeypatch` are available everywhere, and `conftest.py` adds async support. Don't reinvent them.
3. Name the class after the behaviour. `TestVaultV2Format`, not `TestVault`. Each class is one specific contract.
4. Make it deterministic. No real time (`time.time()`), no real network, no real Docker. Use `freezegun` or `monkeypatch` if you need clock control.
5. Run it: `pytest tests/test_<your_file>.py -v`.
6. Confirm the full suite still passes: `pytest tests/ -q`. CI runs this for you, but it is better to know locally first.
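Put together, a test that follows these steps might look like the sketch below. `is_expired` is a hypothetical function under test, and the clock is frozen with the standard library's `unittest.mock.patch` rather than `freezegun` or `monkeypatch`, purely to keep the example dependency-free.

```python
import time
from unittest import mock


def is_expired(created_at: float, ttl_seconds: float) -> bool:
    """Hypothetical function under test: compares an entry's age to its TTL."""
    return time.time() - created_at >= ttl_seconds


class TestExpiryHonoursTtlBoundary:
    """One class, one contract: an entry expires exactly when its TTL elapses."""

    def test_not_expired_one_second_before_ttl(self) -> None:
        # Freeze the clock at t=1000 so the result never depends on wall time.
        with mock.patch("time.time", return_value=1_000.0):
            assert not is_expired(created_at=941.0, ttl_seconds=60)

    def test_expired_exactly_at_ttl(self) -> None:
        with mock.patch("time.time", return_value=1_000.0):
            assert is_expired(created_at=940.0, ttl_seconds=60)
```

The class name states the behaviour, each method pins one side of the boundary, and neither test can flake because no real clock is read.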

Test count history

| Release | Tests |
|---|---|
| v1.35.0 | 1762 |
| v1.34.0 | ~1734 |
| v1.33.0 | ~1700 |
| v1.32.0 | ~1670 |
| v1.31.0 | 1631 |
| v1.30.0 | 1627 |
| v1.29.0 | 1617 |
| v1.25.0 | 1541 |

Tests grow with features. We don't delete tests when features change — we update them. The suite is the contract, not the documentation.
