Security

This page describes the threat model, the security boundaries, and the layered controls that enforce them. Code is the source of truth: the claims here are backed by tests, primarily tests/test_security*.py, with subsystem-specific tests cited in the relevant sections.

Core principle: deny-by-default. Unknown tools are blocked. Restricted channels cannot reach high-risk operations. Vault writes with the wrong key are refused. Programming tasks from untrusted-permission channels never get host file access. There is no AGENT_DEV_MODE bypass.

Threat model

Adversary	Goal	Mitigation
Curious LAN attacker	Probe `/api/*` for soft endpoints	Bearer token on mutation, rate limiting, no `?key=` query string fallback, replay protection (HMAC + nonce cache)
Compromised dependency	Run code with agent privileges	Sandbox (Docker, no-network, read-only FS, image whitelist), tool policy deny-by-default, capability manifest
Operator typo on master key	Re-encrypt vault with wrong key, lose secrets	Wrong-key writes raise `VaultDecryptionError` (fail-fast), the encrypted blob is never touched
Power loss mid-write	Vault on disk in inconsistent state	Single-file v2 format (`ALSv2` magic + embedded salt + Fernet token), atomic `os.replace` + `fsync` (file + parent dir)
Prompt injection	Make the agent ignore instructions or leak secrets	Hard-block + soft-block patterns in EN + SK, secrets never reach prompt context, `redact_secrets()` on log records, channel-aware response filtering
Echoed agent suggestions	Make the operator's own message accidentally schedule destructive jobs	`_detect_explicit_work_queue` anti-echo guard rejects pasted assistant text
Unattended group chat	Non-owner runs privileged commands	Owner whitelist (`TELEGRAM_USER_ID`), safe mode for non-owners in groups, owner-only commands enforced before dispatch
Concurrent finance approve	Two `approve()` calls on the same tx_id both succeed and double-spend	Per-transaction `asyncio.Lock`
Unbounded state growth	Disk fill / replay nonce explosion	Bounded ring buffers (audit log, explanation log), age-based eviction (`request_identity._seen_nonces`), tier-based log retention sweep
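
The "Concurrent finance approve" row relies on one lock per transaction. The following is a minimal sketch of that idea, not the project's finance code: the `Approvals` class, its `payouts` counter, and the `asyncio.sleep(0)` stand-in for real I/O are all hypothetical.

```python
import asyncio
from collections import defaultdict


class Approvals:
    """Hypothetical approval store guarded by one asyncio.Lock per tx_id."""

    def __init__(self) -> None:
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._approved: set[str] = set()
        self.payouts = 0

    async def approve(self, tx_id: str) -> bool:
        # Only one coroutine at a time may run the check-then-act section
        # for a given transaction; other transactions are not blocked.
        async with self._locks[tx_id]:
            if tx_id in self._approved:
                return False  # second caller sees the first caller's result
            await asyncio.sleep(0)  # stands in for awaitable work (e.g. a DB write)
            self._approved.add(tx_id)
            self.payouts += 1
            return True
```

Without the lock, both callers could pass the membership check before either records the approval, because the `await` in the middle yields control.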

Layered controls

1. Input sanitization

Every user-supplied text passes through _sanitize_input() (in agent/social/telegram_handler.py) before any pipeline runs.

Hard block (returns None, request rejected):

ignore all previous instructions
forget all previous context
you are now DAN
<system> override
override your rules/instructions
Slovak: zabudni na všetko ("forget everything"), ignoruj všetky pravidlá ("ignore all rules"), nové inštrukcie ("new instructions")

Soft block (logged, redacted, but allowed):

pretend you are, act as if, system:, teraz si (Slovak for "you are now")

Tested in tests/test_security.py::TestPromptInjection.
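
The hard-block / soft-block split can be sketched as follows. This is an illustration under assumptions, not the contents of _sanitize_input(): the function name `sanitize_input`, the regexes, and the `[redacted]` marker are hypothetical, and only a few of the listed phrases are included.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger("sanitize_sketch")

# Hypothetical pattern lists modelled on the phrases listed above.
HARD_BLOCK = [
    r"ignore\s+all\s+previous\s+instructions",
    r"forget\s+all\s+previous\s+context",
    r"you\s+are\s+now\s+dan",
    r"zabudni\s+na\s+všetko",
]
SOFT_BLOCK = [r"pretend\s+you\s+are", r"act\s+as\s+if", r"teraz\s+si"]

_HARD = [re.compile(p, re.IGNORECASE) for p in HARD_BLOCK]
_SOFT = [re.compile(p, re.IGNORECASE) for p in SOFT_BLOCK]


def sanitize_input(text: str) -> Optional[str]:
    """Return None on a hard-block match; otherwise redact soft-block matches."""
    if any(p.search(text) for p in _HARD):
        return None  # request rejected outright
    for p in _SOFT:
        if p.search(text):
            logger.warning("soft-block pattern matched: %s", p.pattern)
            text = p.sub("[redacted]", text)  # allowed, with the phrase removed
    return text
```

Returning None for a hard block keeps the rejection unambiguous for the caller: there is no partially cleaned string that could be forwarded by mistake.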

2. Owner identification + safe mode

Telegram user IDs are whitelisted via TELEGRAM_USER_ID (comma-separated).
The owner gets full owner-name resolution and unrestricted command access.
Non-owners in group chats drop into safe mode: only /start, /help, /status, and /health are available. Everything else returns "Príkaz X je dostupný len pre ownera" ("Command X is available only to the owner"). The check runs before command dispatch (tests/test_security.py::TestSafeMode).
Non-owners in private chats are rejected with "Unauthorized. This bot only responds to its owner."

3. Tool policy (deterministic, deny-by-default)

Every tool has an entry in TOOL_CAPABILITIES (agent/core/tool_policy.py):

Tool	Risk	Side effect	Owner only	Safe mode	Approval
`query_memory`	LOW	none	no	allowed	never
`store_memory`	LOW	internal	no	allowed	never
`list_tasks`	LOW	none	no	allowed	never
`check_health`	LOW	none	no	allowed	never
`get_status`	LOW	none	no	allowed	never
`search_knowledge`	LOW	none	no	allowed	never
`create_task`	MEDIUM	internal	yes	blocked	safe_mode
`web_fetch`	MEDIUM	external	yes	blocked	safe_mode
`run_code`	HIGH	external	yes	blocked	safe_mode
`run_tests`	HIGH	external	yes	blocked	safe_mode

Decision flow:

Unknown tool → always blocked (UNKNOWN_TOOL denial)
Restricted channel + high-risk tool → blocked (RESTRICTED_CHANNEL)
Safe mode + blocked tool → blocked (SAFE_MODE)
Non-owner + owner-only tool → blocked (OWNER_ONLY)
approval=ALWAYS and not approved → blocked (APPROVAL_REQUIRED)
Otherwise → allowed

Every decision is recorded in PolicyAuditLog (ring buffer, max 1000) and surfaced through ActionEnvelope.
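
The decision flow can be read as an ordered chain of checks in which the first match wins. The sketch below assumes a `Capability` dataclass and a `decide` function that are hypothetical; only the two sample rows and the denial codes come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    risk: str               # "LOW" | "MEDIUM" | "HIGH"
    owner_only: bool
    safe_mode_allowed: bool
    approval: str           # "never" | "safe_mode" | "always"


# Two rows copied from the capability table; the container shape is assumed.
CAPS = {
    "query_memory": Capability("LOW", False, True, "never"),
    "run_code": Capability("HIGH", True, False, "safe_mode"),
}


def decide(tool: str, *, is_owner: bool, safe_mode: bool,
           restricted_channel: bool, approved: bool = False) -> Optional[str]:
    """Return a denial code, or None when the call is allowed."""
    cap = CAPS.get(tool)
    if cap is None:
        return "UNKNOWN_TOOL"            # deny-by-default
    if restricted_channel and cap.risk == "HIGH":
        return "RESTRICTED_CHANNEL"      # applies even to the owner
    if safe_mode and not cap.safe_mode_allowed:
        return "SAFE_MODE"
    if not is_owner and cap.owner_only:
        return "OWNER_ONLY"
    if cap.approval == "always" and not approved:
        return "APPROVAL_REQUIRED"
    return None
```

Because the lookup failure is the first branch, a tool that was never registered cannot reach any of the later, more permissive checks.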

4. Channel policy

Channels carry different trust levels:

Channel	Trust	File access	High-risk tools
`telegram` (owner, private)	FULL	yes	yes
`telegram` (group, non-owner)	SAFE_MODE	no	no
`agent_api`	RESTRICTED	no	no
`webhook`	RESTRICTED	no	no
`public`	RESTRICTED	no	no
`internal`	FULL	yes	yes

Enforcement happens at two levels:

Tool policy — restricted channels block high-risk tools even for the owner.
CLI provider — restricted channels never set allow_file_access=True.

5. Telegram + Claude CLI deny guard

Programming tasks sent from Telegram cannot use the Claude CLI backend in default sandbox-only mode. The CLI requires an interactive permission prompt that is unreachable from Telegram, so the request would hang behind a typing indicator until error_max_turns kills it.

The guard fires when all four conditions are true:

message.channel_type == "telegram"
task_type            == "programming"
effective_backend    == "cli"          (resolved via runtime LLM control)
AGENT_SANDBOX_ONLY   != "0"

When triggered, the brain returns a deterministic operator-friendly Slovak message that names the two unblock paths:

Switch the runtime to API backend via /runtime or POST /api/operator/llm. ToolUseLoop in API mode does not need an interactive approval.
Set AGENT_SANDBOX_ONLY=0 on the server (explicit host opt-in). The CLI then runs with --dangerously-skip-permissions.

Conversational tasks (status, memory, finance, plain Q&A) on the CLI backend continue to work — the guard is task-specific. See agent/core/brain.py::_process_inner (Layer 5.1) and tests/test_brain_core.py::TestTelegramCliProgrammingDenyGuard.
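
The four-condition guard reduces to a single predicate. A minimal sketch follows; the function name `cli_programming_guard` and the injectable `env` mapping are assumptions made for testability, and the default of "1" for an unset AGENT_SANDBOX_ONLY follows the sandbox default stated on this page.

```python
import os
from typing import Mapping, Optional


def cli_programming_guard(channel_type: str, task_type: str,
                          effective_backend: str,
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the request should be refused instead of sent to the CLI."""
    env = os.environ if env is None else env
    # Anything other than an explicit "0" counts as sandbox-only.
    sandbox_only = env.get("AGENT_SANDBOX_ONLY", "1") != "0"
    return (
        channel_type == "telegram"
        and task_type == "programming"
        and effective_backend == "cli"
        and sandbox_only
    )
```

Changing any one input (a conversational task, the API backend, or the explicit host opt-in) makes the predicate false, which matches the task-specific behaviour described above.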

6. Headless CLI auto-approve (server deployments)

When the agent runs as a daemon (systemd/Docker/nohup), there is no operator clicking "Allow" on Claude Code permission prompts. The CLI provider auto-detects the missing TTY (or honours an explicit AGENT_CLI_AUTO_APPROVE env var) and adds:

--dangerously-skip-permissions
--disallowed-tools "Bash,Edit,Write,NotebookEdit"

The first flag skips the interactive prompt. The second one preserves sandbox isolation: even with permissions bypassed, the LLM can read and search but never mutate the host filesystem. Tested in tests/test_llm_provider.py::TestClaudeCliProvider.
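
A rough sketch of how the extra arguments might be assembled is shown below. The function `headless_flags` is hypothetical, and the interpretation of AGENT_CLI_AUTO_APPROVE ("1" forces auto-approve, any other explicit value disables it, unset falls back to TTY detection) is an assumption; this page does not specify the accepted values.

```python
import os
import sys
from typing import Mapping, Optional

DISALLOWED_TOOLS = "Bash,Edit,Write,NotebookEdit"


def headless_flags(env: Optional[Mapping[str, str]] = None,
                   stdin_is_tty: Optional[bool] = None) -> list[str]:
    """Return the extra CLI arguments for a headless run, or [] if interactive."""
    env = os.environ if env is None else env
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()

    explicit = env.get("AGENT_CLI_AUTO_APPROVE")  # value semantics are assumed
    auto_approve = (explicit == "1") if explicit is not None else not stdin_is_tty
    if not auto_approve:
        return []
    # The two flags always travel together: skipping the prompt without the
    # tool deny-list would remove the filesystem protection.
    return ["--dangerously-skip-permissions", "--disallowed-tools", DISALLOWED_TOOLS]
```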

7. Docker sandbox

Code submitted via /sandbox runs inside a Docker container with the following flags:

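# Wall-clock limit: 120s by default, 300s maximum. docker run has no
# --timeout option, so this limit has to be enforced outside docker run.
docker run --rm \
  --network=none \
  --memory=256m --cpus=1.0 \
  --read-only \
  --security-opt=no-new-privileges \
  --pids-limit=50 \
  python:3.12-slim      # image whitelist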

Image whitelist: python:3.12-slim, node:20-slim, alpine:latest, ruby:3.2-slim. No other images can be invoked. AGENT_SANDBOX_ONLY=1 is the default and is verified by tests/test_security_invariants.py::TestSandboxDefault.
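
One way to combine the image whitelist with a wall-clock limit is to build the argument list in code and apply the timeout from the calling process, since docker run itself has no timeout option. The helper names below are hypothetical, and this is a sketch rather than the project's sandbox runner.

```python
import subprocess

ALLOWED_IMAGES = frozenset(
    {"python:3.12-slim", "node:20-slim", "alpine:latest", "ruby:3.2-slim"}
)
DEFAULT_TIMEOUT_S = 120
MAX_TIMEOUT_S = 300


def build_sandbox_argv(image: str, command: list[str]) -> list[str]:
    """Build the docker argv; refuse any image outside the whitelist."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image not whitelisted: {image!r}")
    return [
        "docker", "run", "--rm",
        "--network=none",
        "--memory=256m", "--cpus=1.0",
        "--read-only",
        "--security-opt=no-new-privileges",
        "--pids-limit=50",
        image, *command,
    ]


def run_sandboxed(image: str, command: list[str],
                  timeout_s: int = DEFAULT_TIMEOUT_S) -> subprocess.CompletedProcess:
    """Run the command with the timeout capped at MAX_TIMEOUT_S.

    Caveat: subprocess's timeout kills the docker client process; a real
    runner would also need to stop the container itself.
    """
    timeout_s = min(timeout_s, MAX_TIMEOUT_S)
    return subprocess.run(build_sandbox_argv(image, command),
                          capture_output=True, text=True, timeout=timeout_s)
```

Passing an argument list instead of a shell string means the submitted command is never interpreted by a host shell.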

8. Encrypted vault (single-file v2)

Cipher: Fernet (AES-128-CBC + HMAC-SHA256), audited primitives, no DIY crypto.
KDF: PBKDF2-HMAC-SHA256 with 480 000 iterations (below the 600 000 that the OWASP Password Storage Cheat Sheet currently lists for this KDF).
Format: b"ALSv2\n" magic header + 16-byte random salt + Fernet token.
Atomic writes: every save is tmp.write → fsync(fd) → os.replace → fsync(parent) — no temp leftover, no partial state.
Wrong-key writes fail-fast: _load() raises VaultDecryptionError on InvalidToken. The encrypted blob on disk is never touched.
Wrong-key reads tolerant: get_secret, list_secrets, has_secret return empty/None so the agent can boot, log a warning, and let the operator fix .env without crashing.
Legacy v1 migration: the old salt.bin sidecar layout is auto-detected on first open and atomically migrated to v2. salt.bin is removed afterwards (best effort, non-fatal).

Full spec: Vault. Tests: tests/test_vault.py.
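
The file framing and the atomic write sequence can be sketched with the standard library alone. This is an assumption-laden illustration, not the vault implementation: the function names are hypothetical, the Fernet token is treated as opaque bytes, and the directory fsync as written is POSIX-only.

```python
import hashlib
import os
import tempfile

MAGIC = b"ALSv2\n"
SALT_LEN = 16


def derive_key(master_key: str, salt: bytes) -> bytes:
    """32 raw key bytes; Fernet expects them urlsafe-base64 encoded."""
    return hashlib.pbkdf2_hmac("sha256", master_key.encode(), salt, 480_000)


def pack(salt: bytes, token: bytes) -> bytes:
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 16 bytes")
    return MAGIC + salt + token


def unpack(blob: bytes) -> tuple[bytes, bytes]:
    if not blob.startswith(MAGIC):
        raise ValueError("not an ALSv2 vault file")
    body = blob[len(MAGIC):]
    return body[:SALT_LEN], body[SALT_LEN:]


def atomic_write(path: str, data: bytes) -> None:
    """tmp.write -> fsync(fd) -> os.replace -> fsync(parent directory)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # no temp leftover on failure
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable
    finally:
        os.close(dir_fd)
```

A reader that opens the path at any moment sees either the complete old file or the complete new one, because the temporary file only takes the final name after its contents are flushed.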

9. SQL & database safety

Every agent/*/storage.py uses parameterized queries (? placeholders) for runtime data. SQLite cannot bind identifiers (table or column names) as parameters, so dynamic DDL (ALTER TABLE) goes through:

Whitelist of tables (_ALLOWED_TABLES: ClassVar[frozenset[str]])
Identifier regex for columns (^[A-Za-z_][A-Za-z0-9_]*$)
Single-quote escape on default literals (safe_default = default.replace("'", "''"))

Implemented in agent/build/storage.py::BuildStorage._ensure_text_column and agent/review/storage.py::ReviewStorage._ensure_text_column. Any new storage layer that needs ALTER TABLE must follow the same pattern.
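
A minimal sketch of the three-step pattern is shown below. It is modelled on the description above rather than copied from _ensure_text_column: the module-level function, the table names in `ALLOWED_TABLES`, and the PRAGMA-based existence check are assumptions.

```python
import re
import sqlite3

ALLOWED_TABLES = frozenset({"builds", "reviews"})  # hypothetical table names
IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ensure_text_column(conn: sqlite3.Connection, table: str,
                       column: str, default: str = "") -> None:
    """Add a TEXT column if missing, validating every interpolated identifier."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    # fullmatch avoids the "$ matches before a trailing newline" corner case.
    if not IDENT_RE.fullmatch(column):
        raise ValueError(f"invalid column name: {column!r}")

    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return

    safe_default = default.replace("'", "''")  # SQL single-quote escape
    conn.execute(
        f"ALTER TABLE {table} ADD COLUMN {column} TEXT DEFAULT '{safe_default}'"
    )
```

Every value that reaches the f-string has passed either a set-membership test or a strict character-class match, so no caller-controlled punctuation can enter the statement.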

10. HTTP API authentication

All mutation endpoints require Authorization: Bearer <AGENT_API_KEY>. There is no ?key= query string fallback.
Without AGENT_API_KEY configured, every mutation request is rejected. There is no dev mode.
Rate limiting: 10 req/min for external IPs, 60 req/min for 127.0.0.1/localhost/::1.
Replay protection: HMAC-SHA256 signing on operator endpoints, nonce cache with age-based eviction.
Invalid JSON on operator endpoints returns 400 (no silent fallback to {}).
Auth failures log a SHA256 hash of the offered token (not a prefix), so log readers can grep for repeated bad tokens without seeing the value.
Dashboard renders user-controlled fields through an esc() helper that builds a text node and returns the escaped innerHTML, so those values cannot inject markup into element content.
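
Replay protection with HMAC-SHA256 and an age-evicted nonce cache can be sketched as follows. The message layout (`nonce.timestamp.body`), the 300-second window, and all names here are assumptions for illustration; the project's request_identity module may differ.

```python
import hashlib
import hmac
import time
from typing import Optional


class NonceCache:
    """Remembers nonces for max_age_s seconds, then evicts them."""

    def __init__(self, max_age_s: float = 300.0) -> None:
        self.max_age_s = max_age_s
        self._seen: dict[str, float] = {}

    def check_and_store(self, nonce: str, now: float) -> bool:
        cutoff = now - self.max_age_s
        self._seen = {n: t for n, t in self._seen.items() if t >= cutoff}
        if nonce in self._seen:
            return False  # replay
        self._seen[nonce] = now
        return True


def sign(secret: bytes, nonce: str, timestamp: float, body: bytes) -> str:
    msg = f"{nonce}.{int(timestamp)}.".encode() + body
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify(secret: bytes, nonce: str, timestamp: float, body: bytes,
           signature: str, cache: NonceCache, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    # Reject requests older than the cache window; otherwise an evicted
    # nonce could be replayed once the cache has forgotten it.
    if abs(now - timestamp) > cache.max_age_s:
        return False
    expected = sign(secret, nonce, timestamp, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False  # constant-time comparison
    return cache.check_and_store(nonce, now)
```

The timestamp check is what keeps the cache bounded without weakening the guarantee: a nonce only needs to be remembered for as long as a request carrying it would still be accepted.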

11. Approval queue

Risk-sensitive actions go through structured approval (agent/core/approval.py):

Categories: FINANCE, TOOL, EXTERNAL, HOST
Lifecycle: PROPOSED → APPROVED / DENIED / PARTIALLY_APPROVED → COMPLETED / EXPIRED
Multi-step: required_approvals field for risky paths (default 1)
TTL expiry: stale proposals are auto-cancelled after the dead-man-switch window (14 days)
Same-person dedup: the same operator cannot approve a step twice in a multi-step path

Persisted in SQLite. Queryable via /api/operator/approvals and the dashboard.
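
The multi-step path and the same-person rule can be illustrated with a small in-memory model. The `Proposal` class below is hypothetical and omits persistence, denial, and TTL expiry; only the state names and the required_approvals field come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    required_approvals: int = 1
    approvers: set[str] = field(default_factory=set)
    status: str = "PROPOSED"

    def approve(self, operator: str) -> str:
        if self.status not in ("PROPOSED", "PARTIALLY_APPROVED"):
            raise ValueError(f"cannot approve in state {self.status}")
        if operator in self.approvers:
            # Same-person dedup: one operator counts once per proposal.
            raise ValueError("operator has already approved this proposal")
        self.approvers.add(operator)
        if len(self.approvers) >= self.required_approvals:
            self.status = "APPROVED"
        else:
            self.status = "PARTIALLY_APPROVED"
        return self.status
```

For example, a proposal with `required_approvals=2` moves to PARTIALLY_APPROVED after the first operator and to APPROVED only after a different operator confirms.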

12. Operator runtime controls

Runtime intervention without restart (agent/core/operator.py):

disable(tool_name) / enable(tool_name) — per-tool override (in-memory, resets on restart)
lockdown() / unlock() — global kill switch that disables every external tool

Combined with the dashboard panel and the /api/operator/lockdown HTTP endpoint, an operator can take the agent out of the live loop in under five seconds.

13. Status model

AgentStatusModel is a state machine that prevents stuck states:

States: IDLE, THINKING, EXECUTING, WAITING_APPROVAL, BLOCKED, DEGRADED, MAINTENANCE
process() is wrapped in try/finally so the brain always resets to IDLE on exit (no orphaned THINKING states after a crash).
Tool denials transition to BLOCKED with the reason exposed in /status.
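
The try/finally reset is small enough to show in full. This sketch uses a hypothetical `AgentStatus` wrapper and models only the THINKING to IDLE path, not the BLOCKED or WAITING_APPROVAL transitions.

```python
from enum import Enum
from typing import Any, Callable


class State(Enum):
    IDLE = "idle"
    THINKING = "thinking"


class AgentStatus:
    def __init__(self) -> None:
        self.state = State.IDLE

    def process(self, handler: Callable[[str], Any], message: str) -> Any:
        self.state = State.THINKING
        try:
            return handler(message)
        finally:
            # Runs on normal return and on any exception, so the status
            # can never be left in THINKING after the call ends.
            self.state = State.IDLE
```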

14. Log redaction & audit trail

All log output passes through redact_secrets() (agent/logs/logger.py), so keys, tokens, passwords, and OAuth secrets never appear in logs.
ActionEnvelope records the full request/policy/execute/result lifecycle for every tool call.
PolicyAuditLog is a 1000-entry ring buffer of recent policy decisions.
ExplanationLog records why every decision was made (routing signals, policy verdicts, learning context, memory provenance breakdown).
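
Two of the mechanisms above, redaction at the logging layer and a fixed-size ring buffer, can be sketched with the standard library. The regex, the `redact` helper, and `RedactingFilter` are hypothetical stand-ins and are far simpler than a production redact_secrets() would need to be.

```python
import logging
import re
from collections import deque

# Hypothetical pattern: key-like name, separator, then the value to hide.
SECRET_RE = re.compile(
    r"(?i)(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)"
)


def redact(text: str) -> str:
    return SECRET_RE.sub(r"\1\2[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record so handlers only ever see the redacted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # format first, then redact
        record.args = ()
        return True


# A deque with maxlen behaves as a ring buffer: appending to a full
# buffer silently drops the oldest entry.
audit_ring: deque = deque(maxlen=1000)
```

Formatting the message before redacting matters: a secret passed as a `%s` argument would otherwise slip past a filter that only inspects the format string.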

15. Tiered logging retention

Long-tier file (~30 days default) retains lifecycle, build, finance, audit, security, vault, and ERROR/CRITICAL/AUDIT events. Short-tier file (~6 hours default) catches verbose pipeline diagnostics and gets pruned aggressively. The cron loop runs LogRetentionManager.prune_all() hourly. See Tiered logging for the full event-prefix tables.

Automated security tests

At least 129 tests across 3 files run on every commit:

File	Tests	Coverage
`tests/test_security.py`	66	Prompt injection (EN + SK), safe mode, owner enforcement, channel policy
`tests/test_security_audit.py`	50+	Hardcoded secrets scan (AST), SQL safety, eval/exec ban, vault integration, sandbox isolation, API auth, log redaction
`tests/test_security_invariants.py`	13	Architecture invariants (no hardcoded paths, no duplicate persona, sandbox default = "1", tool policy completeness)

Plus 27+ regression tests added in the v1.35.0 release across vault, finance race lock, telegram cleanup, log retention, brain conversation, runtime LLM resolver.

Run the security suite directly:

.venv/bin/python -m pytest tests/test_security.py tests/test_security_audit.py tests/test_security_invariants.py -v

What the agent NEVER does without explicit approval

Send money. Wallet balance/receive only — no autonomous outflow. No smart contracts. No DeFi. No trading.
Print, log, transmit, or otherwise reveal a private key, mnemonic, or wallet address.
Execute code on the host filesystem unless AGENT_SANDBOX_ONLY=0 is explicitly set.
Comply with prompt-injection attempts.
Share internal system information (CPU/RAM/budget/secrets) with non-owners or restricted channels.
Bypass the sandbox for untrusted code.
Modify its own security rules at runtime.

Known limits

Static tool policy. No runtime learning on security rules. Capability manifest changes require a code change + test update + redeploy. (This is intentional.)
Audit log is a ring buffer. PolicyAuditLog and ExplanationLog cap at 1000 entries each in memory; long-term audit lives in the long-tier log file. We do not currently persist the in-memory rings to a separate audit store.
No formal red-team suite yet. Multi-step escalation and cross-channel attacks are tested ad hoc. A structured red-team test suite is on the roadmap.

Disclosing a vulnerability

See SECURITY.md at the repo root for the disclosure policy. Brief: open a private security advisory on GitHub. Critical bugs get a fix ASAP, high-severity within 7 days, medium within 30 days.
