Security

Daniel Babjak edited this page Apr 8, 2026 · 4 revisions

This page describes the threat model, the security boundaries, and the layered controls that enforce them. Code is the source of truth — every claim here is backed by tests in tests/test_security*.py.

Core principle: deny-by-default. Unknown tools are blocked. Restricted channels cannot reach high-risk operations. Vault writes with the wrong key are refused. Programming tasks from restricted channels never get host file access. There is no AGENT_DEV_MODE bypass.


Threat model

| Adversary | Goal | Mitigation |
| --- | --- | --- |
| Curious LAN attacker | Probe /api/* for soft endpoints | Bearer token on mutation, rate limiting, no ?key= query string fallback, replay protection (HMAC + nonce cache) |
| Compromised dependency | Run code with agent privileges | Sandbox (Docker, no-network, read-only FS, image whitelist), tool policy deny-by-default, capability manifest |
| Operator typo on master key | Re-encrypt vault with wrong key, lose secrets | Wrong-key writes raise VaultDecryptionError (fail-fast), the encrypted blob is never touched |
| Power loss mid-write | Vault on disk in inconsistent state | Single-file v2 format (ALSv2 magic + embedded salt + Fernet token), atomic os.replace + fsync (file + parent dir) |
| Prompt injection | Make the agent ignore instructions or leak secrets | Hard-block + soft-block patterns in EN + SK, secrets never reach prompt context, redact_secrets() on log records, channel-aware response filtering |
| Echoed agent suggestions | Make the operator's own message accidentally schedule destructive jobs | _detect_explicit_work_queue anti-echo guard rejects pasted assistant text |
| Unattended group chat | Non-owner runs privileged commands | Owner whitelist (TELEGRAM_USER_ID), safe mode for non-owners in groups, owner-only commands enforced before dispatch |
| Concurrent finance approve | Two approve() calls on the same tx_id both succeed and double-spend | Per-transaction asyncio.Lock |
| Unbounded state growth | Disk fill / replay nonce explosion | Bounded ring buffers (audit log, explanation log), age-based eviction (request_identity._seen_nonces), tier-based log retention sweep |
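
The "Concurrent finance approve" mitigation can be illustrated with a minimal sketch. The class `ApprovalLedger` and its attributes are hypothetical and do not come from the project's finance code; the sketch only shows how one asyncio.Lock per tx_id serializes competing approve() calls.

```python
import asyncio
from collections import defaultdict


class ApprovalLedger:
    """Hypothetical stand-in for the finance approval path."""

    def __init__(self) -> None:
        # One lock per transaction id, created lazily on first use.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._approved: set[str] = set()

    async def approve(self, tx_id: str) -> bool:
        """Return True only for the first approval of a given tx_id."""
        async with self._locks[tx_id]:
            if tx_id in self._approved:
                return False  # a concurrent caller already committed
            await asyncio.sleep(0)  # stands in for an awaited DB write
            self._approved.add(tx_id)
            return True


async def _demo() -> list[bool]:
    ledger = ApprovalLedger()
    # Two concurrent approvals of the same transaction.
    return await asyncio.gather(ledger.approve("tx-1"), ledger.approve("tx-1"))


results = asyncio.run(_demo())
```

Without the lock, both coroutines could pass the membership check before either one records the approval.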

Layered controls

1. Input sanitization

All user-supplied text passes through _sanitize_input() (in agent/social/telegram_handler.py) before any pipeline runs.

Hard block (returns None, request rejected):

  • ignore all previous instructions
  • forget all previous context
  • you are now DAN
  • <system> override
  • override your rules/instructions
  • Slovak: zabudni na všetko ("forget everything"), ignoruj všetky pravidlá ("ignore all rules"), nové inštrukcie ("new instructions")

Soft block (logged, redacted, but allowed):

  • pretend you are, act as if, system:, teraz si (Slovak: "now you are")

Tested in tests/test_security.py::TestPromptInjection.
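
The hard-block / soft-block split might look roughly like the sketch below. The pattern lists are a small illustrative subset of the phrases above, and the "[redacted]" replacement is an assumption; the real _sanitize_input() may differ in both.

```python
import re
from typing import Optional

# Illustrative subset only; the real lists live in
# agent/social/telegram_handler.py.
HARD_BLOCK = [
    re.compile(r"ignore\s+all\s+previous\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+dan", re.IGNORECASE),
    re.compile(r"zabudni\s+na\s+všetko", re.IGNORECASE),
]
SOFT_BLOCK = [
    re.compile(r"pretend\s+you\s+are", re.IGNORECASE),
    re.compile(r"teraz\s+si", re.IGNORECASE),
]


def sanitize_input(text: str) -> Optional[str]:
    """Return None on a hard block; redact soft-block phrases otherwise."""
    if any(pattern.search(text) for pattern in HARD_BLOCK):
        return None  # request rejected
    for pattern in SOFT_BLOCK:
        # Assumed redaction marker; a real handler would also log the hit.
        text = pattern.sub("[redacted]", text)
    return text
```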

2. Owner identification + safe mode

  • Telegram user IDs are whitelisted via TELEGRAM_USER_ID (comma-separated).
  • The owner gets full owner-name resolution and unrestricted command access.
  • Non-owners in group chats drop into safe mode: only /start, /help, /status, and /health are available. Everything else returns "Príkaz X je dostupný len pre ownera" ("Command X is available only to the owner"). The check runs before command dispatch (tests/test_security.py::TestSafeMode).
  • Non-owners in private chats are rejected with "Unauthorized. This bot only responds to its owner."
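
A minimal sketch of the gate described above, assuming hypothetical helper names (`parse_owner_ids`, `gate_command`); only the command list and the two denial messages are taken from this page.

```python
from typing import Optional

SAFE_MODE_COMMANDS = frozenset({"/start", "/help", "/status", "/health"})


def parse_owner_ids(raw: str) -> frozenset[int]:
    """Parse a comma-separated TELEGRAM_USER_ID value."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())


def gate_command(command: str, user_id: int, owner_ids: frozenset[int],
                 is_group: bool) -> Optional[str]:
    """Return None when the command may run, else the denial message."""
    if user_id in owner_ids:
        return None
    if not is_group:
        return "Unauthorized. This bot only responds to its owner."
    if command in SAFE_MODE_COMMANDS:
        return None
    return f"Príkaz {command} je dostupný len pre ownera"


owners = parse_owner_ids("111, 222")
```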

3. Tool policy (deterministic, deny-by-default)

Every tool has an entry in TOOL_CAPABILITIES (agent/core/tool_policy.py):

| Tool | Risk | Side effect | Owner only | Safe mode | Approval |
| --- | --- | --- | --- | --- | --- |
| query_memory | LOW | none | no | allowed | never |
| store_memory | LOW | internal | no | allowed | never |
| list_tasks | LOW | none | no | allowed | never |
| check_health | LOW | none | no | allowed | never |
| get_status | LOW | none | no | allowed | never |
| search_knowledge | LOW | none | no | allowed | never |
| create_task | MEDIUM | internal | yes | blocked | safe_mode |
| web_fetch | MEDIUM | external | yes | blocked | safe_mode |
| run_code | HIGH | external | yes | blocked | safe_mode |
| run_tests | HIGH | external | yes | blocked | safe_mode |

Decision flow:

  1. Unknown tool → always blocked (UNKNOWN_TOOL denial)
  2. Restricted channel + high-risk tool → blocked (RESTRICTED_CHANNEL)
  3. Safe mode + blocked tool → blocked (SAFE_MODE)
  4. Non-owner + owner-only tool → blocked (OWNER_ONLY)
  5. approval=ALWAYS and not approved → blocked (APPROVAL_REQUIRED)
  6. Otherwise → allowed

Every decision is recorded in PolicyAuditLog (ring buffer, max 1000) and surfaced through ActionEnvelope.
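
The decision flow above can be sketched as an ordered chain of checks. The `Capability` dataclass, the `decide` function, and the string verdicts are illustrative assumptions rather than the actual contents of agent/core/tool_policy.py; only two tools from the table are included.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    risk: str                 # "LOW", "MEDIUM" or "HIGH"
    owner_only: bool
    safe_mode_allowed: bool
    approval: str             # "never", "safe_mode" or "always"


TOOL_CAPABILITIES = {
    "query_memory": Capability("LOW", False, True, "never"),
    "run_code": Capability("HIGH", True, False, "safe_mode"),
}

# Ring buffer: the oldest entries fall off once 1000 are stored.
audit_log: deque = deque(maxlen=1000)


def decide(tool: str, *, is_owner: bool, safe_mode: bool,
           restricted_channel: bool, approved: bool = False) -> str:
    """Apply the six steps in order; the first matching rule wins."""
    cap = TOOL_CAPABILITIES.get(tool)
    if cap is None:
        verdict = "UNKNOWN_TOOL"
    elif restricted_channel and cap.risk == "HIGH":
        verdict = "RESTRICTED_CHANNEL"
    elif safe_mode and not cap.safe_mode_allowed:
        verdict = "SAFE_MODE"
    elif not is_owner and cap.owner_only:
        verdict = "OWNER_ONLY"
    elif cap.approval == "always" and not approved:
        verdict = "APPROVAL_REQUIRED"
    else:
        verdict = "ALLOWED"
    audit_log.append((tool, verdict))
    return verdict
```

Because unknown tools are rejected in the first branch, adding a tool without a capability entry fails closed.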

4. Channel policy

Channels carry different trust levels:

| Channel | Trust | File access | High-risk tools |
| --- | --- | --- | --- |
| telegram (owner, private) | FULL | yes | yes |
| telegram (group, non-owner) | SAFE_MODE | no | no |
| agent_api | RESTRICTED | no | no |
| webhook | RESTRICTED | no | no |
| public | RESTRICTED | no | no |
| internal | FULL | yes | yes |

Enforcement happens at two levels:

  • Tool policy — restricted channels block high-risk tools even for the owner.
  • CLI provider — restricted channels never set allow_file_access=True.
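
As a rough illustration of the channel table, the sketch below maps a channel to a trust level and derives file access from it. `Trust`, `resolve_trust`, and `allow_file_access` are hypothetical names. Treating every Telegram context other than owner-in-private as SAFE_MODE, and every unlisted channel as RESTRICTED, are assumptions made here in the spirit of deny-by-default.

```python
from enum import Enum


class Trust(Enum):
    FULL = "full"
    SAFE_MODE = "safe_mode"
    RESTRICTED = "restricted"


def resolve_trust(channel: str, *, is_owner: bool = False,
                  is_group: bool = False) -> Trust:
    if channel == "internal":
        return Trust.FULL
    if channel == "telegram":
        # Assumption: only the owner in a private chat gets FULL trust.
        return Trust.FULL if is_owner and not is_group else Trust.SAFE_MODE
    # agent_api, webhook, public, and anything unrecognized.
    return Trust.RESTRICTED


def allow_file_access(trust: Trust) -> bool:
    """Only FULL-trust channels may ever set allow_file_access=True."""
    return trust is Trust.FULL
```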

5. Telegram + Claude CLI deny guard

Programming tasks sent from Telegram cannot use the Claude CLI backend in default sandbox-only mode. The CLI requires an interactive permission prompt that is unreachable from Telegram, so the request would hang on the typing indicator until error_max_turns kills it.

The guard fires when all four conditions are true:

message.channel_type == "telegram"
task_type            == "programming"
effective_backend    == "cli"          (resolved via runtime LLM control)
AGENT_SANDBOX_ONLY   != "0"

When triggered, the brain returns a deterministic operator-friendly Slovak message that names the two unblock paths:

  1. Switch the runtime to API backend via /runtime or POST /api/operator/llm. ToolUseLoop in API mode does not need an interactive approval.
  2. Set AGENT_SANDBOX_ONLY=0 on the server (explicit host opt-in). The CLI then runs with --dangerously-skip-permissions.

Conversational tasks (status, memory, finance, plain Q&A) on the CLI backend continue to work — the guard is task-specific. See agent/core/brain.py::_process_inner (Layer 5.1) and tests/test_brain_core.py::TestTelegramCliProgrammingDenyGuard.
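
The four-condition check can be expressed as a single predicate. This is a sketch with a hypothetical function name, not the code in _process_inner; it assumes an unset AGENT_SANDBOX_ONLY behaves like the documented default of "1".

```python
import os
from typing import Mapping, Optional


def cli_programming_guard_fires(channel_type: str, task_type: str,
                                effective_backend: str,
                                env: Optional[Mapping[str, str]] = None) -> bool:
    """True when a Telegram programming task would hit the CLI prompt."""
    env = os.environ if env is None else env
    return (
        channel_type == "telegram"
        and task_type == "programming"
        and effective_backend == "cli"
        # Assumption: missing variable means sandbox-only ("1").
        and env.get("AGENT_SANDBOX_ONLY", "1") != "0"
    )
```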

6. Headless CLI auto-approve (server deployments)

When the agent runs as a daemon (systemd/Docker/nohup), there is no operator clicking "Allow" on Claude Code permission prompts. The CLI provider auto-detects the missing TTY (or honours an explicit AGENT_CLI_AUTO_APPROVE env var) and adds:

--dangerously-skip-permissions
--disallowed-tools "Bash,Edit,Write,NotebookEdit"

The first flag skips the interactive prompt. The second one preserves sandbox isolation: even with permissions bypassed, the LLM can read and search but never mutate the host filesystem. Tested in tests/test_llm_provider.py::TestClaudeCliProvider.
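
A minimal sketch of how the flag selection could work. The function name is hypothetical, and the interpretation of AGENT_CLI_AUTO_APPROVE ("1" enables, any other value disables, unset falls back to TTY detection) is an assumption; this page does not specify the accepted values.

```python
import os
import sys
from typing import Mapping, Optional


def cli_permission_flags(env: Optional[Mapping[str, str]] = None,
                         stdin_is_tty: Optional[bool] = None) -> list[str]:
    """Return the extra CLI flags for headless runs, or an empty list."""
    env = os.environ if env is None else env
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()
    explicit = env.get("AGENT_CLI_AUTO_APPROVE")
    # Assumed semantics: an explicit value overrides TTY detection.
    auto_approve = (explicit == "1") if explicit is not None else not stdin_is_tty
    if not auto_approve:
        return []
    return [
        "--dangerously-skip-permissions",
        "--disallowed-tools", "Bash,Edit,Write,NotebookEdit",
    ]
```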

7. Docker sandbox

Code submitted via /sandbox runs inside a Docker container with the following flags:

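# Execution timeout: 120 s by default, 300 s maximum. It is applied outside
# this command, since docker run has no --timeout flag.
docker run --rm \
  --network=none \
  --memory=256m --cpus=1.0 \
  --read-only \
  --security-opt=no-new-privileges \
  --pids-limit=50 \
  python:3.12-slim      # image must be on the whitelist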

Image whitelist: python:3.12-slim, node:20-slim, alpine:latest, ruby:3.2-slim. No other images can be invoked. AGENT_SANDBOX_ONLY=1 is the default and is verified by tests/test_security_invariants.py::TestSandboxDefault.
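
The whitelist check and the timeout bounds can be sketched as follows. `build_sandbox_command` and `clamp_timeout` are hypothetical helpers; how the agent actually enforces the timeout is not described on this page, so the sketch only computes the effective value.

```python
from typing import Optional

ALLOWED_IMAGES = frozenset({
    "python:3.12-slim", "node:20-slim", "alpine:latest", "ruby:3.2-slim",
})
DEFAULT_TIMEOUT_S = 120.0
MAX_TIMEOUT_S = 300.0


def build_sandbox_command(image: str, argv: list[str]) -> list[str]:
    """Build the docker argv; refuse any image outside the whitelist."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image not whitelisted: {image!r}")
    return [
        "docker", "run", "--rm",
        "--network=none",
        "--memory=256m", "--cpus=1.0",
        "--read-only",
        "--security-opt=no-new-privileges",
        "--pids-limit=50",
        image, *argv,
    ]


def clamp_timeout(requested: Optional[float]) -> float:
    """Default to 120 s and never exceed 300 s."""
    if requested is None:
        return DEFAULT_TIMEOUT_S
    return min(requested, MAX_TIMEOUT_S)
```

Passing the command as a list (rather than a shell string) avoids shell interpretation of user-supplied arguments.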

8. Encrypted vault (single-file v2)

  • Cipher: Fernet (AES-128-CBC + HMAC-SHA256), audited primitives, no DIY crypto.
  • KDF: PBKDF2-HMAC-SHA256 with 480 000 iterations. (The OWASP Password Storage Cheat Sheet currently recommends 600 000 iterations for PBKDF2-HMAC-SHA256.)
  • Format: b"ALSv2\n" magic header + 16 bytes random salt + Fernet token.
  • Atomic writes: every save is tmp.write → fsync(fd) → os.replace → fsync(parent) — no temp leftover, no partial state.
  • Wrong-key writes fail fast: _load() raises VaultDecryptionError on InvalidToken. The encrypted blob on disk is never touched.
  • Wrong-key reads are tolerant: get_secret, list_secrets, and has_secret return empty/None so the agent can boot, log a warning, and let the operator fix .env without crashing.
  • Legacy v1 migration: the old salt.bin sidecar layout is auto-detected on first open and atomically migrated to v2. salt.bin is removed afterwards (best effort, non-fatal).

Full spec: Vault. Tests: tests/test_vault.py.
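
The file framing, key derivation, and atomic-write sequence described above can be sketched with the standard library alone. All function names are hypothetical. The Fernet encryption step is omitted because it needs the third-party cryptography package, so the token is treated as opaque bytes. The directory fsync assumes a POSIX system.

```python
import base64
import hashlib
import os
import tempfile

MAGIC = b"ALSv2\n"
SALT_LEN = 16
ITERATIONS = 480_000


def derive_fernet_key(master_key: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256 to 32 bytes, URL-safe base64 as Fernet expects."""
    raw = hashlib.pbkdf2_hmac("sha256", master_key.encode(), salt,
                              ITERATIONS, dklen=32)
    return base64.urlsafe_b64encode(raw)


def pack(salt: bytes, token: bytes) -> bytes:
    return MAGIC + salt + token


def unpack(blob: bytes) -> tuple[bytes, bytes]:
    if not blob.startswith(MAGIC):
        raise ValueError("not an ALSv2 vault file")
    body = blob[len(MAGIC):]
    return body[:SALT_LEN], body[SALT_LEN:]


def atomic_write(path: str, data: bytes) -> None:
    """tmp.write -> fsync(fd) -> os.replace -> fsync(parent dir)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # leave no temp file behind on failure
        raise
    dir_fd = os.open(directory, os.O_RDONLY)  # POSIX only
    try:
        os.fsync(dir_fd)  # persist the rename itself
    finally:
        os.close(dir_fd)
```

Creating the temporary file in the target directory keeps os.replace on a single filesystem, which is what makes the rename atomic.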

9. SQL & database safety

Every agent/*/storage.py uses parameterized queries (? placeholders) for runtime data. SQLite does not parameterize identifiers, so dynamic DDL (ALTER TABLE) goes through:

  • Whitelist of tables (_ALLOWED_TABLES: ClassVar[frozenset[str]])
  • Identifier regex for columns (^[A-Za-z_][A-Za-z0-9_]*$)
  • Single-quote escape on default literals (safe_default = default.replace("'", "''"))

Implemented in agent/build/storage.py::BuildStorage._ensure_text_column and agent/review/storage.py::ReviewStorage._ensure_text_column. Any new storage layer that needs ALTER TABLE must follow the same pattern.
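
A minimal sketch of the pattern, using the standard sqlite3 module. The table names in `_ALLOWED_TABLES` are hypothetical, and the function is a simplified stand-in for _ensure_text_column rather than a copy of it.

```python
import re
import sqlite3

_ALLOWED_TABLES = frozenset({"builds", "reviews"})  # hypothetical names
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ensure_text_column(conn: sqlite3.Connection, table: str, column: str,
                       default: str = "") -> None:
    """Add a TEXT column if it is missing, validating every identifier."""
    if table not in _ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    # fullmatch also rejects a trailing newline, which "$" alone accepts.
    if not _IDENT_RE.fullmatch(column):
        raise ValueError(f"invalid column name: {column!r}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return
    safe_default = default.replace("'", "''")
    conn.execute(
        f"ALTER TABLE {table} ADD COLUMN {column} TEXT DEFAULT '{safe_default}'"
    )
```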

10. HTTP API authentication

  • All mutation endpoints require Authorization: Bearer <AGENT_API_KEY>. There is no ?key= query string fallback.
  • Without AGENT_API_KEY configured, every mutation request is rejected. There is no dev mode.
  • Rate limiting: 10 req/min for external IPs, 60 req/min for 127.0.0.1/localhost/::1.
  • Replay protection: HMAC-SHA256 signing on operator endpoints, nonce cache with age-based eviction.
  • Invalid JSON on operator endpoints returns 400 (no silent fallback to {}).
  • Auth failures log a SHA256 hash of the offered token (not a prefix), so log readers can grep for repeated bad tokens without seeing the value.
  • Dashboard renders user-controlled fields through an esc() helper that builds a text node and returns the escaped innerHTML — no XSS vector.
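
The replay protection and token-hash logging can be sketched as below. The message layout (timestamp, nonce, body), the 300-second window, and the `ReplayGuard` class are assumptions for illustration; the project's actual signing scheme is not specified on this page.

```python
import hashlib
import hmac
import time
from typing import Optional


class ReplayGuard:
    def __init__(self, secret: bytes, max_age_s: float = 300.0) -> None:
        self._secret = secret
        self._max_age_s = max_age_s
        self._seen_nonces: dict[str, float] = {}  # nonce -> expiry time

    def sign(self, body: bytes, nonce: str, timestamp: float) -> str:
        msg = f"{timestamp:.0f}.{nonce}.".encode() + body
        return hmac.new(self._secret, msg, hashlib.sha256).hexdigest()

    def verify(self, body: bytes, nonce: str, timestamp: float,
               signature: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Age-based eviction keeps the nonce cache bounded.
        self._seen_nonces = {
            n: exp for n, exp in self._seen_nonces.items() if exp >= now
        }
        if abs(now - timestamp) > self._max_age_s:
            return False  # too old or too far in the future
        if nonce in self._seen_nonces:
            return False  # replay
        expected = self.sign(body, nonce, timestamp)
        if not hmac.compare_digest(expected, signature):
            return False
        # Keep the nonce until its timestamp can no longer pass the age check.
        self._seen_nonces[nonce] = timestamp + self._max_age_s
        return True


def token_fingerprint(token: str) -> str:
    """SHA256 hex digest that is safe to log in place of the token."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Storing an expiry derived from the request timestamp means an evicted nonce can never be replayed, because the age check already rejects it.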

11. Approval queue

Risk-sensitive actions go through structured approval (agent/core/approval.py):

  • Categories: FINANCE, TOOL, EXTERNAL, HOST
  • Lifecycle: PROPOSED → APPROVED / DENIED / PARTIALLY_APPROVED → COMPLETED / EXPIRED
  • Multi-step: required_approvals field for risky paths (default 1)
  • TTL expiry: stale proposals are auto-cancelled after the dead-man-switch window (14 days)
  • Same-person dedup: the same operator cannot approve a step twice in a multi-step path

Persisted in SQLite. Queryable via /api/operator/approvals and the dashboard.
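
The multi-step path and same-person dedup could be modeled roughly as follows. The `Proposal` dataclass is hypothetical and keeps state in memory only, unlike the SQLite-backed queue in agent/core/approval.py.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    required_approvals: int = 1
    status: str = "PROPOSED"
    approvers: set = field(default_factory=set)

    def approve(self, operator: str) -> str:
        """Record one approval and return the resulting status."""
        if self.status not in ("PROPOSED", "PARTIALLY_APPROVED"):
            raise ValueError(f"cannot approve in state {self.status}")
        if operator in self.approvers:
            # Same-person dedup: one operator counts once.
            raise ValueError(f"{operator} already approved this proposal")
        self.approvers.add(operator)
        if len(self.approvers) >= self.required_approvals:
            self.status = "APPROVED"
        else:
            self.status = "PARTIALLY_APPROVED"
        return self.status
```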

12. Operator runtime controls

Runtime intervention without restart (agent/core/operator.py):

  • disable(tool_name) / enable(tool_name) — per-tool override (in-memory, resets on restart)
  • lockdown() / unlock() — global kill switch that disables every external tool

Combined with the dashboard panel and the /api/operator/lockdown HTTP endpoint, an operator can take the agent out of the live loop in under five seconds.
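
A small sketch of how per-tool overrides and the global kill switch might compose. `OperatorControls` and `is_enabled` are hypothetical names, and the set of external tools is passed in explicitly as an assumption.

```python
class OperatorControls:
    """In-memory overrides; state is lost on restart by design."""

    def __init__(self, external_tools: set[str]) -> None:
        self._external = frozenset(external_tools)
        self._disabled: set[str] = set()
        self._locked = False

    def disable(self, tool_name: str) -> None:
        self._disabled.add(tool_name)

    def enable(self, tool_name: str) -> None:
        self._disabled.discard(tool_name)

    def lockdown(self) -> None:
        self._locked = True

    def unlock(self) -> None:
        self._locked = False

    def is_enabled(self, tool_name: str) -> bool:
        if tool_name in self._disabled:
            return False
        # Lockdown disables every external tool at once.
        return not (self._locked and tool_name in self._external)
```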

13. Status model

AgentStatusModel is a state machine that prevents stuck states:

  • States: IDLE, THINKING, EXECUTING, WAITING_APPROVAL, BLOCKED, DEGRADED, MAINTENANCE
  • process() is wrapped in try/finally so the brain always resets to IDLE on exit (no orphaned THINKING states after a crash).
  • Tool denials transition to BLOCKED with the reason exposed in /status.
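
The try/finally guarantee can be shown in a few lines. This is a reduced sketch: the class below is a hypothetical stand-in that uses only two of the seven states.

```python
from typing import Callable


class AgentStatusModel:
    """Hypothetical minimal stand-in for the real state machine."""

    def __init__(self) -> None:
        self.state = "IDLE"


def process(status: AgentStatusModel, handler: Callable[[str], str],
            message: str) -> str:
    status.state = "THINKING"
    try:
        return handler(message)
    finally:
        # Runs on normal return and on any exception alike.
        status.state = "IDLE"
```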

14. Log redaction & audit trail

  • All log output passes through redact_secrets() (agent/logs/logger.py). Keys, tokens, passwords, and OAuth secrets never appear in logs.
  • ActionEnvelope records the full request/policy/execute/result lifecycle for every tool call.
  • PolicyAuditLog is a 1000-entry ring buffer of recent policy decisions.
  • ExplanationLog records why every decision was made (routing signals, policy verdicts, learning context, memory provenance breakdown).
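
One way to apply redaction to every record is a logging.Filter. The regular expression below is a simplified assumption and is not the pattern set used by the real redact_secrets(); `RedactingFilter` is a hypothetical name.

```python
import logging
import re

# Simplified assumption: "name=value" or "name: value" pairs only.
_SECRET_RE = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"
)


def redact_secrets(text: str) -> str:
    return _SECRET_RE.sub(r"\1\2[REDACTED]", text)


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-args are redacted too.
        record.msg = redact_secrets(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it
```

Attaching the filter to a handler (for example `handler.addFilter(RedactingFilter())`) covers every logger that writes through that handler.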

15. Tiered logging retention

The long-tier file (~30 days by default) retains lifecycle, build, finance, audit, security, vault, and ERROR/CRITICAL/AUDIT events. The short-tier file (~6 hours by default) catches verbose pipeline diagnostics and is pruned aggressively. The cron loop runs LogRetentionManager.prune_all() hourly. See Tiered logging for the full event-prefix tables.


Automated security tests

At least 129 tests across 3 files run on every commit:

| File | Tests | Coverage |
| --- | --- | --- |
| tests/test_security.py | 66 | Prompt injection (EN + SK), safe mode, owner enforcement, channel policy |
| tests/test_security_audit.py | 50+ | Hardcoded secrets scan (AST), SQL safety, eval/exec ban, vault integration, sandbox isolation, API auth, log redaction |
| tests/test_security_invariants.py | 13 | Architecture invariants (no hardcoded paths, no duplicate persona, sandbox default = "1", tool policy completeness) |

In addition, the v1.35.0 release added 27+ regression tests covering the vault, the finance race lock, Telegram cleanup, log retention, brain conversation, and the runtime LLM resolver.

Run the security suite directly:

.venv/bin/python -m pytest tests/test_security.py tests/test_security_audit.py tests/test_security_invariants.py -v

What the agent NEVER does without explicit approval

  1. Send money. Wallet balance/receive only — no autonomous outflow. No smart contracts. No DeFi. No trading.
  2. Print, log, transmit, or otherwise reveal a private key, mnemonic, or wallet address.
  3. Execute code on the host filesystem unless AGENT_SANDBOX_ONLY=0 is explicitly set.
  4. Comply with prompt-injection attempts.
  5. Share internal system information (CPU/RAM/budget/secrets) with non-owners or restricted channels.
  6. Bypass the sandbox for untrusted code.
  7. Modify its own security rules at runtime.

Known limits

  • Static tool policy. No runtime learning on security rules. Capability manifest changes require a code change + test update + redeploy. (This is intentional.)
  • Audit log is a ring buffer. PolicyAuditLog and ExplanationLog cap at 1000 entries each in memory; long-term audit lives in the long-tier log file. We do not currently persist the in-memory rings to a separate audit store.
  • No formal red-team suite yet. Multi-step escalation and cross-channel attacks are tested ad hoc. A structured red-team test suite is on the roadmap.

Disclosing a vulnerability

See SECURITY.md at the repo root for the disclosure policy. In brief: open a private security advisory on GitHub. Critical bugs get a fix ASAP, high-severity bugs within 7 days, and medium-severity bugs within 30 days.
