Security
This page describes the threat model, the security boundaries, and the layered controls that enforce them. Code is the source of truth — every claim here is backed by tests in tests/test_security*.py.
Core principle: deny-by-default. Unknown tools are blocked. Restricted channels cannot reach high-risk operations. Vault writes with the wrong key are refused. Programming tasks from untrusted-permission channels never get host file access. There is no `AGENT_DEV_MODE` bypass.
| Adversary | Goal | Mitigation |
|---|---|---|
| Curious LAN attacker | Probe `/api/*` for soft endpoints | Bearer token on mutation, rate limiting, no `?key=` query string fallback, replay protection (HMAC + nonce cache) |
| Compromised dependency | Run code with agent privileges | Sandbox (Docker, no-network, read-only FS, image whitelist), tool policy deny-by-default, capability manifest |
| Operator typo on master key | Re-encrypt vault with wrong key, lose secrets | Wrong-key writes raise VaultDecryptionError (fail-fast), the encrypted blob is never touched |
| Power loss mid-write | Vault on disk in inconsistent state | Single-file v2 format (ALSv2 magic + embedded salt + Fernet token), atomic os.replace + fsync (file + parent dir) |
| Prompt injection | Make the agent ignore instructions or leak secrets | Hard-block + soft-block patterns in EN + SK, secrets never reach prompt context, redact_secrets() on log records, channel-aware response filtering |
| Echoed agent suggestions | Make the operator's own message accidentally schedule destructive jobs | `_detect_explicit_work_queue` anti-echo guard rejects pasted assistant text |
| Unattended group chat | Non-owner runs privileged commands | Owner whitelist (TELEGRAM_USER_ID), safe mode for non-owners in groups, owner-only commands enforced before dispatch |
| Concurrent finance approve | Two `approve()` calls on the same `tx_id` both succeed and double-spend | Per-transaction `asyncio.Lock` |
| Unbounded state growth | Disk fill / replay nonce explosion | Bounded ring buffers (audit log, explanation log), age-based eviction (request_identity._seen_nonces), tier-based log retention sweep |
Every user-supplied text passes through _sanitize_input() (in agent/social/telegram_handler.py) before any pipeline runs.
Hard block (returns None, request rejected):
- `ignore all previous instructions`
- `forget all previous context`
- `you are now DAN`
- `<system> override`
- `override your rules/instructions`
- Slovak: `zabudni na všetko`, `ignoruj všetky pravidlá`, `nové inštrukcie`
Soft block (logged, redacted, but allowed):
- `pretend you are`, `act as if`, `system:`, `teraz si`
Tested in tests/test_security.py::TestPromptInjection.
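The hard-block / soft-block split above can be sketched as follows. This is a minimal illustration, not the code in `agent/social/telegram_handler.py`: the function name `sanitize_input`, the regex encoding of each phrase, and the `[redacted]` placeholder are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern lists built from the phrases listed above.
# The real patterns live in agent/social/telegram_handler.py and may differ.
HARD_BLOCK = [
    r"ignore all previous instructions",
    r"forget all previous context",
    r"you are now dan",
    r"<system>\s*override",
    r"override your (rules|instructions)",
    r"zabudni na všetko",
    r"ignoruj všetky pravidlá",
    r"nové inštrukcie",
]
SOFT_BLOCK = [r"pretend you are", r"act as if", r"system:", r"teraz si"]

_HARD_RE = re.compile("|".join(HARD_BLOCK), re.IGNORECASE)
_SOFT_RE = re.compile("|".join(SOFT_BLOCK), re.IGNORECASE)


def sanitize_input(text: str) -> Optional[str]:
    """Return None on a hard block; otherwise redact soft-block phrases."""
    if _HARD_RE.search(text):
        return None  # request rejected before any pipeline runs
    # Soft block: the message is allowed, but the suspicious phrase is removed.
    return _SOFT_RE.sub("[redacted]", text)
```

Checking the hard patterns first means a message containing both kinds of phrase is rejected outright rather than partially redacted.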
- Telegram user IDs are whitelisted via `TELEGRAM_USER_ID` (comma-separated).
- Owner gets the full owner-name resolution and unlimited command access.
- Non-owners in group chats drop into safe mode: only `/start`, `/help`, `/status`, `/health`. Everything else returns "Príkaz X je dostupný len pre ownera" ("Command X is available only to the owner"), checked before command dispatch (`tests/test_security.py::TestSafeMode`).
- Non-owners in private chats are rejected with "Unauthorized. This bot only responds to its owner."
Every tool has an entry in TOOL_CAPABILITIES (agent/core/tool_policy.py):
| Tool | Risk | Side effect | Owner only | Safe mode | Approval |
|---|---|---|---|---|---|
| `query_memory` | LOW | none | no | allowed | never |
| `store_memory` | LOW | internal | no | allowed | never |
| `list_tasks` | LOW | none | no | allowed | never |
| `check_health` | LOW | none | no | allowed | never |
| `get_status` | LOW | none | no | allowed | never |
| `search_knowledge` | LOW | none | no | allowed | never |
| `create_task` | MEDIUM | internal | yes | blocked | safe_mode |
| `web_fetch` | MEDIUM | external | yes | blocked | safe_mode |
| `run_code` | HIGH | external | yes | blocked | safe_mode |
| `run_tests` | HIGH | external | yes | blocked | safe_mode |
Decision flow:
- Unknown tool → always blocked (`UNKNOWN_TOOL` denial)
- Restricted channel + high-risk tool → blocked (`RESTRICTED_CHANNEL`)
- Safe mode + blocked tool → blocked (`SAFE_MODE`)
- Non-owner + owner-only tool → blocked (`OWNER_ONLY`)
- `approval=ALWAYS` and not approved → blocked (`APPROVAL_REQUIRED`)
- Otherwise → allowed
Every decision is recorded in PolicyAuditLog (ring buffer, max 1000) and surfaced through ActionEnvelope.
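The decision flow can be read as an ordered chain of checks. The sketch below is illustrative only: `Capability`, `decide`, `CAPS`, and `AUDIT` are hypothetical names, "high-risk" is assumed to mean `risk == "HIGH"`, and only two rows of the table are reproduced. The real registry is `TOOL_CAPABILITIES` in `agent/core/tool_policy.py`.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Denial(Enum):
    UNKNOWN_TOOL = "UNKNOWN_TOOL"
    RESTRICTED_CHANNEL = "RESTRICTED_CHANNEL"
    SAFE_MODE = "SAFE_MODE"
    OWNER_ONLY = "OWNER_ONLY"
    APPROVAL_REQUIRED = "APPROVAL_REQUIRED"


@dataclass(frozen=True)
class Capability:
    risk: str                # "LOW" | "MEDIUM" | "HIGH"
    owner_only: bool
    safe_mode_allowed: bool
    approval: str            # "never" | "safe_mode" | "always"


# Two rows copied from the table above (hypothetical in-code shape).
CAPS = {
    "query_memory": Capability("LOW", False, True, "never"),
    "run_code": Capability("HIGH", True, False, "safe_mode"),
}

# Ring buffer of recent decisions, mirroring the 1000-entry cap.
AUDIT: deque = deque(maxlen=1000)


def decide(tool: str, *, restricted: bool = False, safe_mode: bool = False,
           is_owner: bool = False, approved: bool = False) -> Optional[Denial]:
    """Return the denial reason, or None when the call is allowed."""
    cap = CAPS.get(tool)
    if cap is None:
        verdict = Denial.UNKNOWN_TOOL            # deny-by-default
    elif restricted and cap.risk == "HIGH":
        verdict = Denial.RESTRICTED_CHANNEL      # applies even to the owner
    elif safe_mode and not cap.safe_mode_allowed:
        verdict = Denial.SAFE_MODE
    elif cap.owner_only and not is_owner:
        verdict = Denial.OWNER_ONLY
    elif cap.approval == "always" and not approved:
        verdict = Denial.APPROVAL_REQUIRED
    else:
        verdict = None
    AUDIT.append((tool, verdict))                # every decision is recorded
    return verdict
```

The order of the branches matters: an unknown tool never reaches the owner check, so adding a tool without a capability entry cannot accidentally grant access.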
Channels carry different trust levels:
| Channel | Trust | File access | High-risk tools |
|---|---|---|---|
| `telegram` (owner, private) | FULL | yes | yes |
| `telegram` (group, non-owner) | SAFE_MODE | no | no |
| `agent_api` | RESTRICTED | no | no |
| `webhook` | RESTRICTED | no | no |
| `public` | RESTRICTED | no | no |
| `internal` | FULL | yes | yes |
Enforcement happens at two levels:
- Tool policy: restricted channels block high-risk tools even for the owner.
- CLI provider: restricted channels never set `allow_file_access=True`.
Programming tasks sent from Telegram cannot use the Claude CLI backend in default sandbox-only mode. The CLI requires an interactive permission prompt that is unreachable from Telegram, so the request would hang behind a typing indicator until `error_max_turns` kills it.
The guard fires when all four conditions are true:
message.channel_type == "telegram"
task_type == "programming"
effective_backend == "cli" (resolved via runtime LLM control)
AGENT_SANDBOX_ONLY != "0"
When triggered, the brain returns a deterministic operator-friendly Slovak message that names the two unblock paths:
- Switch the runtime to API backend via `/runtime` or `POST /api/operator/llm`. ToolUseLoop in API mode does not need an interactive approval.
- Set `AGENT_SANDBOX_ONLY=0` on the server (explicit host opt-in). The CLI then runs with `--dangerously-skip-permissions`.
Conversational tasks (status, memory, finance, plain Q&A) on the CLI backend continue to work — the guard is task-specific. See agent/core/brain.py::_process_inner (Layer 5.1) and tests/test_brain_core.py::TestTelegramCliProgrammingDenyGuard.
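The four-condition guard can be expressed as a single predicate. This is a sketch under assumptions: the function name `cli_programming_guard` and the `env` parameter are hypothetical, and an unset `AGENT_SANDBOX_ONLY` is treated as the default `"1"`.

```python
import os
from typing import Mapping, Optional


def cli_programming_guard(channel_type: str, task_type: str,
                          effective_backend: str,
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the request must be refused with the operator message."""
    env = os.environ if env is None else env
    return (
        channel_type == "telegram"
        and task_type == "programming"
        and effective_backend == "cli"
        # Sandbox-only is the default, so a missing variable counts as "1".
        and env.get("AGENT_SANDBOX_ONLY", "1") != "0"
    )
```

Because all four conditions are joined with `and`, a conversational task on the CLI backend, or a programming task on the API backend, passes straight through.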
When the agent runs as a daemon (systemd/Docker/nohup), there is no operator clicking "Allow" on Claude Code permission prompts. The CLI provider auto-detects the missing TTY (or honours an explicit AGENT_CLI_AUTO_APPROVE env var) and adds:
--dangerously-skip-permissions
--disallowed-tools "Bash,Edit,Write,NotebookEdit"
The first flag skips the interactive prompt. The second one preserves sandbox isolation: even with permissions bypassed, the LLM can read and search but never mutate the host filesystem. Tested in tests/test_llm_provider.py::TestClaudeCliProvider.
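A sketch of how the extra arguments might be assembled. The helper name `cli_extra_args` is hypothetical, and treating `AGENT_CLI_AUTO_APPROVE=1` as "force on" is an assumption; the source only says the variable is honoured.

```python
import os
import sys
from typing import List, Mapping, Optional

DISALLOWED_TOOLS = "Bash,Edit,Write,NotebookEdit"


def cli_extra_args(env: Optional[Mapping[str, str]] = None,
                   stdin=None) -> List[str]:
    """Return the flags to append when no operator can answer prompts."""
    env = os.environ if env is None else env
    stream = sys.stdin if stdin is None else stdin
    # Daemons may have no stdin at all, which also counts as "no TTY".
    has_tty = stream is not None and stream.isatty()
    # Assumption: the value "1" forces auto-approve regardless of the TTY.
    forced = env.get("AGENT_CLI_AUTO_APPROVE") == "1"
    if has_tty and not forced:
        return []
    return [
        "--dangerously-skip-permissions",
        "--disallowed-tools", DISALLOWED_TOOLS,
    ]
```

The two flags are always emitted together, so skipping the prompt can never happen without the mutation tools being disallowed.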
Code submitted via /sandbox runs inside a Docker container with the following flags:
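# Wall-clock limit: 120s by default, 300s maximum. This is not a `docker run` flag, so it is kept out of the command.
docker run --rm \
--network=none \
--memory=256m --cpus=1.0 \
--read-only \
--security-opt=no-new-privileges \
--pids-limit=50 \
python:3.12-slim   # image must be on the whitelist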
Image whitelist: python:3.12-slim, node:20-slim, alpine:latest, ruby:3.2-slim. No other images can be invoked. AGENT_SANDBOX_ONLY=1 is the default and is verified by tests/test_security_invariants.py::TestSandboxDefault.
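One way to enforce the image whitelist and the timeout bounds before Docker is ever invoked is to validate while building the argument list. The names `ALLOWED_IMAGES` and `build_docker_argv` are hypothetical; the flags and limits are the ones listed above.

```python
from typing import List, Sequence

ALLOWED_IMAGES = frozenset({
    "python:3.12-slim", "node:20-slim", "alpine:latest", "ruby:3.2-slim",
})
DEFAULT_TIMEOUT_S = 120
MAX_TIMEOUT_S = 300


def build_docker_argv(image: str, command: Sequence[str],
                      timeout_s: int = DEFAULT_TIMEOUT_S) -> List[str]:
    """Validate the request and return the argv for a sandboxed run."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image not on whitelist: {image!r}")
    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be in 1..{MAX_TIMEOUT_S} seconds")
    return [
        "docker", "run", "--rm",
        "--network=none",
        "--memory=256m", "--cpus=1.0",
        "--read-only",
        "--security-opt=no-new-privileges",
        "--pids-limit=50",
        image, *command,
    ]
```

Returning a list rather than a shell string keeps user-supplied code out of shell parsing; the validated `timeout_s` would then be applied by whatever launches the process.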
- Cipher: Fernet (AES-128-CBC + HMAC-SHA256), audited primitives, no DIY crypto.
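- KDF: PBKDF2-HMAC-SHA256 with 480 000 iterations. (The OWASP Password Storage Cheat Sheet currently recommends 600 000 for PBKDF2-HMAC-SHA256, so this value sits below that guidance.)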
- Format: `b"ALSv2\n"` magic header + 16 bytes random salt + Fernet token.
- Atomic writes: every save is `tmp.write → fsync(fd) → os.replace → fsync(parent)`, with no temp leftover and no partial state.
- Wrong-key writes fail-fast: `_load()` raises `VaultDecryptionError` on `InvalidToken`. The encrypted blob on disk is never touched.
- Wrong-key reads tolerant: `get_secret`, `list_secrets`, `has_secret` return empty/None so the agent can boot, log a warning, and let the operator fix `.env` without crashing.
- Legacy v1 migration: the old `salt.bin` sidecar layout is auto-detected on first open and atomically migrated to v2. `salt.bin` is removed afterwards (best effort, non-fatal).
Full spec: Vault. Tests: tests/test_vault.py.
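The atomic-write sequence and the v2 layout can be sketched with the standard library alone. `atomic_write` and `split_blob` are hypothetical names, and the token is treated as opaque bytes here (no encryption is performed in this example). The directory `fsync` as written assumes a POSIX system.

```python
import os
import tempfile
from typing import Tuple

MAGIC = b"ALSv2\n"
SALT_LEN = 16


def atomic_write(path: str, salt: bytes, token: bytes) -> None:
    """Write magic + salt + token so readers see the old or new file, never a mix."""
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 16 bytes")
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".vault-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(MAGIC + salt + token)
            f.flush()
            os.fsync(f.fileno())          # data reaches disk before the rename
        os.replace(tmp, path)             # atomic swap
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)                # no temp leftover on failure
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)                  # persist the rename itself
    finally:
        os.close(dir_fd)


def split_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a v2 blob into (salt, token)."""
    if not blob.startswith(MAGIC):
        raise ValueError("not an ALSv2 vault")
    body = blob[len(MAGIC):]
    return body[:SALT_LEN], body[SALT_LEN:]
```

Embedding the salt in the same file is what removes the v1 failure mode where `salt.bin` and the blob could get out of sync after a crash.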
Every agent/*/storage.py uses parameterized queries (? placeholders) for runtime data. SQLite does not parameterize identifiers, so dynamic DDL (ALTER TABLE) goes through:
- Whitelist of tables (`_ALLOWED_TABLES: ClassVar[frozenset[str]]`)
- Identifier regex for columns (`^[A-Za-z_][A-Za-z0-9_]*$`)
- Single-quote escape on default literals (`safe_default = default.replace("'", "''")`)
Implemented in agent/build/storage.py::BuildStorage._ensure_text_column and agent/review/storage.py::ReviewStorage._ensure_text_column. Any new storage layer that needs ALTER TABLE must follow the same pattern.
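A minimal sketch of the three-step pattern, assuming a free function and a two-table whitelist invented for the example (the real implementations are methods on `BuildStorage` and `ReviewStorage`):

```python
import re
import sqlite3

# Hypothetical whitelist; each real storage class defines its own.
_ALLOWED_TABLES = frozenset({"builds", "reviews"})
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def ensure_text_column(conn: sqlite3.Connection, table: str,
                       column: str, default: str = "") -> None:
    """Add a TEXT column if missing, validating every interpolated part."""
    if table not in _ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    # fullmatch also rejects a trailing newline, which `$` alone would accept.
    if not _IDENT_RE.fullmatch(column):
        raise ValueError(f"invalid column name: {column!r}")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return
    safe_default = default.replace("'", "''")
    conn.execute(
        f"ALTER TABLE {table} ADD COLUMN {column} TEXT DEFAULT '{safe_default}'"
    )
```

String interpolation is unavoidable here because SQLite placeholders cannot stand in for identifiers, so each interpolated value is constrained before it reaches the statement.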
- All mutation endpoints require `Authorization: Bearer <AGENT_API_KEY>`. There is no `?key=` query string fallback.
- Without `AGENT_API_KEY` configured, every mutation request is rejected. There is no dev mode.
- Rate limiting: 10 req/min for external IPs, 60 req/min for `127.0.0.1`/`localhost`/`::1`.
- Replay protection: HMAC-SHA256 signing on operator endpoints, nonce cache with age-based eviction.
- Invalid JSON on operator endpoints returns `400` (no silent fallback to `{}`).
- Auth failures log a SHA256 hash of the offered token (not a prefix), so log readers can grep for repeated bad tokens without seeing the value.
- Dashboard renders user-controlled fields through an `esc()` helper that builds a text node and returns the escaped innerHTML, which closes the XSS vector.
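Replay protection and token fingerprinting can be sketched as below. Everything here is an assumption made for illustration: the `timestamp.nonce.body` signing layout, the 300-second window, and the names `NonceCache`, `sign`, `verify_signed`, and `token_fingerprint` are not taken from the codebase.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


class NonceCache:
    """Remembers recent nonces and evicts entries older than max_age_s."""

    def __init__(self, max_age_s: float = 300.0) -> None:
        self.max_age_s = max_age_s
        self._seen: Dict[str, float] = {}

    def check_and_store(self, nonce: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        cutoff = now - self.max_age_s
        # Age-based eviction keeps the cache bounded.
        self._seen = {n: t for n, t in self._seen.items() if t >= cutoff}
        if nonce in self._seen:
            return False                      # replayed request
        self._seen[nonce] = now
        return True


def sign(secret: bytes, body: bytes, nonce: str, timestamp: int) -> str:
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_signed(secret: bytes, body: bytes, nonce: str, timestamp: int,
                  signature: str, cache: NonceCache,
                  now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > cache.max_age_s:
        return False                          # too old to still be in the cache
    expected = sign(secret, body, nonce, timestamp)
    if not hmac.compare_digest(expected, signature):
        return False                          # constant-time comparison
    return cache.check_and_store(nonce, now)


def token_fingerprint(token: str) -> str:
    """Loggable SHA256 of a rejected token; never log the token itself."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Rejecting timestamps older than the eviction window is what makes eviction safe: a nonce can only be forgotten once a request carrying it would be refused anyway.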
Risk-sensitive actions go through structured approval (agent/core/approval.py):
- Categories: FINANCE, TOOL, EXTERNAL, HOST
- Lifecycle: PROPOSED → APPROVED / DENIED / PARTIALLY_APPROVED → COMPLETED / EXPIRED
- Multi-step: `required_approvals` field for risky paths (default 1)
- TTL expiry: stale proposals are auto-cancelled after the dead-man-switch window (14 days)
- Same-person dedup: the same operator cannot approve a step twice in a multi-step path
Persisted in SQLite. Queryable via /api/operator/approvals and the dashboard.
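The multi-step and same-person rules can be illustrated with a small in-memory model. The `Proposal` class is hypothetical and skips persistence and TTL handling; silently ignoring a duplicate approval (rather than raising) is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class State(Enum):
    PROPOSED = "PROPOSED"
    PARTIALLY_APPROVED = "PARTIALLY_APPROVED"
    APPROVED = "APPROVED"
    DENIED = "DENIED"
    COMPLETED = "COMPLETED"
    EXPIRED = "EXPIRED"


@dataclass
class Proposal:
    required_approvals: int = 1
    approvers: Set[str] = field(default_factory=set)
    state: State = State.PROPOSED

    def approve(self, operator: str) -> State:
        if self.state not in (State.PROPOSED, State.PARTIALLY_APPROVED):
            raise ValueError(f"cannot approve in state {self.state.name}")
        # Same-person dedup: a repeat approval does not count twice.
        self.approvers.add(operator)
        if len(self.approvers) >= self.required_approvals:
            self.state = State.APPROVED
        else:
            self.state = State.PARTIALLY_APPROVED
        return self.state

    def deny(self) -> State:
        if self.state not in (State.PROPOSED, State.PARTIALLY_APPROVED):
            raise ValueError(f"cannot deny in state {self.state.name}")
        self.state = State.DENIED
        return self.state
```

Storing approvers in a set makes the dedup rule structural: the approval count is the number of distinct operators, not the number of calls.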
Runtime intervention without restart (agent/core/operator.py):
- `disable(tool_name)` / `enable(tool_name)`: per-tool override (in-memory, resets on restart)
- `lockdown()` / `unlock()`: global kill switch that disables every external tool
Combined with the dashboard panel and the /api/operator/lockdown HTTP endpoint, an operator can take the agent out of the live loop in under five seconds.
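A sketch of how the per-tool override and the kill switch could combine. The class `OperatorControl` and its `is_enabled` method are hypothetical; the set of external tools is assumed to be the ones marked "external" in the capability table.

```python
from typing import Iterable, Set


class OperatorControl:
    """In-memory overrides; state is intentionally lost on restart."""

    def __init__(self, external_tools: Iterable[str]) -> None:
        self._external = frozenset(external_tools)
        self._disabled: Set[str] = set()
        self._locked = False

    def disable(self, tool_name: str) -> None:
        self._disabled.add(tool_name)

    def enable(self, tool_name: str) -> None:
        self._disabled.discard(tool_name)

    def lockdown(self) -> None:
        self._locked = True

    def unlock(self) -> None:
        self._locked = False

    def is_enabled(self, tool_name: str) -> bool:
        if tool_name in self._disabled:
            return False
        # Lockdown only affects tools with external side effects.
        return not (self._locked and tool_name in self._external)
```

Keeping lockdown as a separate flag, instead of adding every external tool to the disabled set, means `unlock()` restores exactly the per-tool overrides that were in place before.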
AgentStatusModel is a state machine that prevents stuck states:
- States: `IDLE`, `THINKING`, `EXECUTING`, `WAITING_APPROVAL`, `BLOCKED`, `DEGRADED`, `MAINTENANCE`
- `process()` is wrapped in `try/finally` so the brain always resets to `IDLE` on exit (no orphaned `THINKING` states after a crash).
- Tool denials transition to `BLOCKED` with the reason exposed in `/status`.
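The `try/finally` reset is the part that prevents stuck states, and it is easy to show in isolation. `StatusModel` and its `process(handler)` signature are simplified stand-ins for illustration, not the real `AgentStatusModel` API.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class AgentState(Enum):
    IDLE = "IDLE"
    THINKING = "THINKING"
    EXECUTING = "EXECUTING"
    WAITING_APPROVAL = "WAITING_APPROVAL"
    BLOCKED = "BLOCKED"
    DEGRADED = "DEGRADED"
    MAINTENANCE = "MAINTENANCE"


class StatusModel:
    def __init__(self) -> None:
        self.state = AgentState.IDLE

    def process(self, handler: Callable[[], T]) -> T:
        """Run one request; the state returns to IDLE even if handler raises."""
        self.state = AgentState.THINKING
        try:
            return handler()
        finally:
            # Runs on success, on exception, and on early return alike.
            self.state = AgentState.IDLE
```

The exception still propagates to the caller; `finally` only guarantees that the status reported afterwards is not a stale `THINKING`.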
- All log output passes through `redact_secrets()` (`agent/logs/logger.py`). Keys, tokens, passwords, and OAuth secrets never appear in logs.
- `ActionEnvelope` records the full request/policy/execute/result lifecycle for every tool call.
- `PolicyAuditLog` is a 1000-entry ring buffer of recent policy decisions.
- `ExplanationLog` records why every decision was made (routing signals, policy verdicts, learning context, memory provenance breakdown).
Long-tier file (~30 days default) retains lifecycle, build, finance, audit, security, vault, and ERROR/CRITICAL/AUDIT events. Short-tier file (~6 hours default) catches verbose pipeline diagnostics and gets pruned aggressively. The cron loop runs LogRetentionManager.prune_all() hourly. See Tiered logging for the full event-prefix tables.
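Routing all log output through a redaction step can be done with a standard `logging.Filter`. This is a sketch: the regex below is an illustrative guess at what counts as a secret, and the real `redact_secrets()` in `agent/logs/logger.py` may use different rules. `RedactingFilter` is a hypothetical name.

```python
import logging
import re

# Illustrative pattern: a secret-looking key, a separator, then the value.
_SECRET_RE = re.compile(
    r"(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact_secrets(text: str) -> str:
    """Replace the value part of key=value / key: value pairs."""
    return _SECRET_RE.sub(r"\1\2[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record in place before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render msg % args first so secrets passed as arguments are caught.
        record.msg = redact_secrets(record.getMessage())
        record.args = ()
        return True  # never drop the record, only clean it
```

A filter attached to a handler sees every record that handler emits, so attaching it to each handler (for example with `handler.addFilter(RedactingFilter())`) covers both the long-tier and short-tier files.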
129 tests across 3 files run on every commit:
| File | Tests | Coverage |
|---|---|---|
| `tests/test_security.py` | 66 | Prompt injection (EN + SK), safe mode, owner enforcement, channel policy |
| `tests/test_security_audit.py` | 50+ | Hardcoded secrets scan (AST), SQL safety, eval/exec ban, vault integration, sandbox isolation, API auth, log redaction |
| `tests/test_security_invariants.py` | 13 | Architecture invariants (no hardcoded paths, no duplicate persona, sandbox default = "1", tool policy completeness) |
Plus 27+ regression tests added in the v1.35.0 release across vault, finance race lock, telegram cleanup, log retention, brain conversation, runtime LLM resolver.
Run the security suite directly:
.venv/bin/python -m pytest tests/test_security.py tests/test_security_audit.py tests/test_security_invariants.py -v
The agent will never:
- Send money. Wallet balance/receive only, with no autonomous outflow. No smart contracts. No DeFi. No trading.
- Print, log, transmit, or otherwise reveal a private key, mnemonic, or wallet address.
- Execute code on the host filesystem unless `AGENT_SANDBOX_ONLY=0` is explicitly set.
- Comply with prompt-injection attempts.
- Share internal system information (CPU/RAM/budget/secrets) with non-owners or restricted channels.
- Bypass the sandbox for untrusted code.
- Modify its own security rules at runtime.
- Static tool policy. No runtime learning on security rules. Capability manifest changes require a code change + test update + redeploy. (This is intentional.)
- Audit log is a ring buffer. PolicyAuditLog and ExplanationLog cap at 1000 entries each in memory; long-term audit lives in the long-tier log file. We do not currently persist the in-memory rings to a separate audit store.
- No formal red-team suite yet. Multi-step escalation and cross-channel attacks are tested ad hoc. A structured red-team test suite is on the roadmap.
See SECURITY.md at the repo root for the disclosure policy. Brief: open a private security advisory on GitHub. Critical bugs get a fix ASAP, high-severity within 7 days, medium within 30 days.