Architecture

Daniel Babjak edited this page Apr 8, 2026 · 4 revisions

This page describes how Agent Life Space is structured at runtime, what the data flow looks like, and why each layer exists. It is the canonical reference for the overall design: per-module details are on the Modules page, security boundaries are on Security, and on-disk formats are on Vault and Logging.


System overview

                                  ┌──────────────────────────────────┐
                                  │       AgentOrchestrator          │
                                  │   (lifecycle + dependency wire)  │
                                  └────────────┬─────────────────────┘
                                               │
        ┌──────────────┬──────────────────────┼──────────────────────┬───────────────┐
        │              │                      │                      │               │
   ┌────▼────┐   ┌─────▼────┐         ┌───────▼────────┐      ┌──────▼──────┐  ┌─────▼─────┐
   │ Brain   │   │ Memory   │         │ Tasks / Work   │      │ Build       │  │ Review    │
   │ pipeline│   │ store +  │         │ queue + proj   │      │ pipeline    │  │ pipeline  │
   │ (9 layer│   │ RAG +    │         │                │      │ (codegen+   │  │ (audit +  │
   │  cascade│   │ persist  │         │                │      │  Docker)    │  │  PR + rls)│
   └────┬────┘   └────┬─────┘         └────────┬───────┘      └──────┬──────┘  └─────┬─────┘
        │             │                        │                     │               │
        └─────────────┴────────┬───────────────┴──────┬──────────────┴───────────────┘
                               │                      │
                       ┌───────▼──────┐       ┌───────▼──────────┐
                       │ Control Plane│       │  Governance      │
                       │  ─ policy    │       │  ─ ToolPolicy    │
                       │  ─ intake    │       │  ─ ApprovalQueue │
                       │  ─ gateway   │       │  ─ OperatorCtrl  │
                       │  ─ state     │       │  ─ StatusModel   │
                       │  ─ reporting │       │  ─ ExplanationLog│
                       └───────┬──────┘       └──────────────────┘
                               │
              ┌────────────────┼─────────────────┐
              │                │                 │
       ┌──────▼─────┐   ┌──────▼─────┐    ┌──────▼─────┐
       │ Vault (v2) │   │  Logs      │    │  Finance   │
       │  AES-128   │   │  tiered    │    │  budget +  │
       │  +HMAC-256 │   │  long+short│    │  approval  │
       └────────────┘   └────────────┘    └────────────┘

The orchestrator (agent/core/agent.py::AgentOrchestrator) is the only place that wires modules together. Everything else holds references that were injected at construction time, never reaches across boundaries with getattr hacks, and never imports the orchestrator back. This is enforced by architecture invariant tests.
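
The wiring rule above can be illustrated with a minimal sketch of constructor injection. The class names and methods here (MemoryStore, TaskManager, Brain, Orchestrator, status_line) are hypothetical stand-ins, not the real signatures in agent/core/agent.py; the point is only that dependencies flow one way, from the orchestrator into the modules.

```python
from dataclasses import dataclass


class MemoryStore:
    """Stand-in for the real memory module (hypothetical, minimal)."""

    def recall(self, query: str) -> list[str]:
        return []


class TaskManager:
    """Stand-in for the real task module (hypothetical, minimal)."""

    def pending(self) -> int:
        return 0


@dataclass
class Brain:
    # Dependencies arrive through the constructor. Brain never imports
    # or looks up the orchestrator.
    memory: MemoryStore
    tasks: TaskManager

    def status_line(self) -> str:
        return f"{self.tasks.pending()} pending task(s)"


class Orchestrator:
    """The only place where modules are constructed and wired together."""

    def __init__(self) -> None:
        self.memory = MemoryStore()
        self.tasks = TaskManager()
        self.brain = Brain(memory=self.memory, tasks=self.tasks)


orchestrator = Orchestrator()
print(orchestrator.brain.status_line())
```

Because Brain only sees what it was handed, a test can construct it with fakes and never touch the orchestrator.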


The 9-layer brain pipeline

The brain (agent/core/brain.py::AgentBrain) is channel-agnostic: it takes an IncomingMessage and returns a string. Every message enters the same nine-layer cascade, in order (the 5.5 facts-injection step is not counted as a separate layer). Layers 1 to 4 may return early; a message that gets past them runs layers 5 to 9 in full.

process(IncomingMessage)
  │
  ├─ try / finally — status always resets to IDLE on exit
  │
  └─ _process_inner(message)
        │
        Layer 1   Multi-task detection → work queue
        ─────    Strict rules: explicit intent header (urob:, todo:, ...) OR
                  clean numbered list with no surrounding prose. Anti-echo guard
                  rejects pasted assistant text. → early return if multi-task
        │
        Layer 2   Internal dispatcher (deterministic, no LLM)
        ─────    status / health / tasks / budget / identity / skills.
                  → early return if handled
        │
        Layer 3   Semantic cache lookup
        ─────    sentence-transformers similarity ≥ 0.90 → early return on hit
        │
        Layer 4   RAG retrieval
        ─────    knowledge base embedding search.
                  "direct" → early return.  "augment" → context injected into prompt.
        │
        Layer 5   Task classification + model selection
        ─────    classify_task() → tier (FAST/BALANCED/POWERFUL) → model.
                  Learning-based override (adapt_model). Channel enforcement.
                  Telegram + CLI + sandbox-only deny guard (fail-closed).
        │
        Layer 5.5 Runtime facts injection (anti-confabulation)
        ─────    Real CPU/RAM/uptime/budget injected so the model has verified
                  ground truth even when it can't call agent tools.
        │
        Layer 6   LLM call via provider abstraction
        ─────    API backend → ToolUseLoop (multi-turn function calling).
                  CLI backend → direct generate (with channel-enforced file access).
        │
        Layer 7   Post-routing quality escalation
        ─────    assess_quality(). If response is generic and budget allows,
                  re-run with stronger model. Skipped for tool-loop responses
                  to preserve tool context.
        │
        Layer 8   Learning feedback + skill auto-update
        ─────    process_outcome(model, task, reply) →
                  confidence adjustment, prompt augmentation hints, skill discovery.
        │
        Layer 9   Channel policy filter + explanation log
        ─────    classify_response() → can_send_response().
                  ExplanationLog records routing signals, policy decisions,
                  learning context, memory provenance breakdown.
        │
        return reply

Key invariant: the LLM call (layer 6) is the most expensive step, so every cheaper layer that can answer runs before it. Most messages never reach layer 6 because the dispatcher, the cache, or RAG has already handled them.
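
The early-return cascade and the try / finally status reset can be sketched as follows. This is a simplified illustration, not the code in agent/core/brain.py: CascadeBrain, dispatcher, cache_lookup, and the "BUSY" status value are assumptions made for the example, and only two cheap layers are shown.

```python
from typing import Callable, Optional

# Each cheap layer returns a reply, or None to pass the message on.
Layer = Callable[[str], Optional[str]]


def dispatcher(text: str) -> Optional[str]:
    # Deterministic commands are answered without any model call.
    return "status: ok" if text.strip() == "status" else None


def cache_lookup(text: str) -> Optional[str]:
    # Hypothetical exact-match cache standing in for the semantic cache.
    return {"hello": "hi there"}.get(text.strip().lower())


class CascadeBrain:
    def __init__(self, cheap_layers: list[Layer], llm: Callable[[str], str]) -> None:
        self.cheap_layers = cheap_layers
        self.llm = llm
        self.status = "IDLE"
        self.llm_calls = 0

    def process(self, text: str) -> str:
        self.status = "BUSY"
        try:
            return self._process_inner(text)
        finally:
            # Runs on return and on exception, so status cannot stick.
            self.status = "IDLE"

    def _process_inner(self, text: str) -> str:
        for layer in self.cheap_layers:
            reply = layer(text)
            if reply is not None:
                return reply  # early return: the LLM is never called
        self.llm_calls += 1
        return self.llm(text)


brain = CascadeBrain([dispatcher, cache_lookup], llm=lambda t: f"LLM answer to: {t}")
print(brain.process("status"))
print(brain.process("explain WAL mode"))
print(brain.llm_calls, brain.status)
```

Ordering the layers from cheapest to most expensive is what makes the invariant hold: the model is called only when every cheaper layer has declined.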


Module map

| Path | Purpose | LOC (approx) |
|---|---|---|
| agent/core/ | Orchestrator, brain pipeline, LLM provider, tool policy, approval, status, explanation, models, sandbox executor, cron loops, paths | ~9,400 |
| agent/build/ | Build service, codegen, capabilities, verification, acceptance criteria, storage, models, Docker executor | ~6,200 |
| agent/social/ | Telegram bot + handler, Agent HTTP API, dashboard, channel policy, request identity | ~5,800 |
| agent/control/ | Policy, intake, gateway, state, reporting, evidence export, llm_runtime, settlement, recurring workflows, pipelines, storage | ~5,400 |
| agent/brain/ | Internal dispatcher, semantic router, programmer, learning, decision engine, tool router, skills, knowledge | ~3,200 |
| agent/review/ | Review service, analyzers, verifier, redaction, quality, storage, models | ~2,900 |
| agent/memory/ | 4-type store + provenance, persistent conversation, RAG, semantic cache, consolidation, inspection | ~2,400 |
| agent/finance/ | Tracker, budget policy, risk templates, approval flow, settlement requests | ~1,300 |
| agent/logs/ | Structured logging, secret redaction, tiered routing, retention manager | ~620 |
| agent/tasks/ | Task lifecycle (CRUD, priority queue) | ~410 |
| agent/work/ | SQLite-backed workspaces, audit trail, recovery, hash chain | ~470 |
| agent/projects/ | Project scoping | ~330 |
| agent/vault/ | Encrypted secrets (v2 single-file format) | ~470 |
| operator/ | TypeScript contracts for the operator dashboard (separate package) | |

Total Python in agent/: ~70,000 LOC across 112 source files. Full per-file inventory: Modules.


Data flow

Brain pipeline

This is the 9-layer cascade described above. The entry point is agent/core/brain.py::AgentBrain.process and the body is _process_inner. Each layer is unit-tested in tests/test_brain_core.py.
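
As one concrete layer, the semantic cache lookup (layer 3) can be sketched as a nearest-neighbour search with a 0.90 cosine-similarity cutoff. The SemanticCache class and the three-dimensional toy vectors are assumptions for illustration; the real layer gets its embeddings from sentence-transformers, which is left out here so the example runs with the standard library alone.

```python
import math
from typing import Optional, Sequence

SIMILARITY_THRESHOLD = 0.90  # the cutoff named in layer 3


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache keyed by embedding vectors (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: Sequence[float], reply: str) -> None:
        self._entries.append((list(embedding), reply))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_score, best_reply = 0.0, None
        for stored, reply in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_reply = score, reply
        # Below the threshold the lookup is a miss and the cascade continues.
        return best_reply if best_score >= SIMILARITY_THRESHOLD else None


cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "cached answer")
print(cache.get([0.95, 0.05, 0.0]))  # near-duplicate query: hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated query: miss (None)
```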

Build pipeline

operator → /build or /intake (telegram or HTTP)
  │
intake.qualify → plan → submit
  │
BuildService.run_build()
  │
  ├─ workspace setup (isolated, hash-chained audit trail)
  │
  ├─ codegen (LLM call → BuildOperation[])
  │     │
  │     └─ AUDIT_MARKER_ONLY guard: refuse to pass verify if codegen failed
  │
  ├─ apply mutations (10 types: create_file, edit_file, copy_file, ...)
  │
  ├─ verification (test/lint/typecheck plan, discovered or explicit)
  │
  ├─ Docker isolation (256MB, no network, image whitelist)
  │
  ├─ acceptance evaluation (auto + verify + review)
  │
  ├─ artifacts persisted via BuildStorage (WAL SQLite)
  │
  └─ delivery package (preview → approve → handed off)

Full detail: Build pipeline.
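
The Docker isolation step can be sketched as a function that builds the docker run command line. The function name docker_argv, the example image, and the /workspace mount point are hypothetical; the flags themselves (--network none, --memory, --read-only) are standard docker run options that match the limits listed above. The real executor in agent/build/ may assemble its command differently.

```python
ALLOWED_IMAGES = frozenset({"python:3.11-slim"})  # hypothetical whitelist


def docker_argv(image: str, command: list[str], workspace: str) -> list[str]:
    """Build a `docker run` command line for one isolated verification step."""
    if image not in ALLOWED_IMAGES:
        # Fail closed: an image that is not whitelisted never runs.
        raise PermissionError(f"image not whitelisted: {image}")
    return [
        "docker", "run", "--rm",
        "--network", "none",       # no network access
        "--memory", "256m",        # memory cap from the pipeline description
        "--read-only",             # read-only root filesystem
        "--volume", f"{workspace}:/workspace",
        "--workdir", "/workspace",
        image, *command,
    ]


print(" ".join(docker_argv("python:3.11-slim", ["pytest", "-q"], "/tmp/ws")))
```

Returning an argument list rather than a shell string keeps workspace paths and commands from being reinterpreted by a shell.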

Review pipeline

operator → /review or /intake
  │
ReviewIntake (validated)
  │
ReviewService.run_review() →
  │
  ├─ repo audit (RepoStructureAnalyzer + SecurityAnalyzer)
  ├─ pr_review  (DiffAnalyzer + security pass on changed files)
  └─ release_review (audit + release-specific checks)
  │
verifier → false-positive reduction
  │
ReviewReport (verdict, findings, executive summary, open questions, assumptions)
  │
artifacts (markdown report, finding list JSON, reviewer handoff pack)
  │
evidence_export (internal or client_safe; redacts paths/secrets dynamically)
  │
delivery package

Full detail: Review pipeline.

Control plane

intake (/intake)
  │
qualify_operator_intake → preview_operator_intake → submit_operator_intake
  │
policy.evaluate_runtime_action (deterministic, deny-by-default)
  │
budget check (hard cap / stop-loss / approval cap)
  │
  ├─ approved → product job (build or review or analysis)
  └─ blocked  → structured denial with category + reason
  │
on completion:
  - control_plane.record_trace (RELEASE | BUILD | REVIEW | DELIVERY)
  - cost ledger entry
  - operator inbox surface
  - settlement attention if 402 was triggered

Policies live in agent/control/policy.py. Intake routing in agent/control/intake.py. Trace + cost storage in agent/control/state.py.
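
A deny-by-default policy check followed by a budget check might look like the sketch below. The Decision dataclass, the evaluate function, and the single hard_cap parameter are assumptions for illustration; the real evaluate_runtime_action in agent/control/policy.py also handles stop-loss and approval caps, which are omitted here.

```python
from dataclasses import dataclass

ALLOWED_ACTIONS = frozenset({"build", "review", "analysis"})  # product job types


@dataclass(frozen=True)
class Decision:
    allowed: bool
    category: str = ""
    reason: str = ""


def evaluate(action: str, estimated_cost: float, spent: float, hard_cap: float) -> Decision:
    # Deny-by-default: anything not explicitly listed is blocked.
    if action not in ALLOWED_ACTIONS:
        return Decision(False, "policy", f"unknown action: {action}")
    # The budget check runs only after policy has approved the action.
    if spent + estimated_cost > hard_cap:
        return Decision(False, "budget", "hard cap would be exceeded")
    return Decision(True)


print(evaluate("build", 2.0, spent=5.0, hard_cap=10.0))
print(evaluate("shell", 0.0, spent=0.0, hard_cap=10.0))
print(evaluate("review", 6.0, spent=5.0, hard_cap=10.0))
```

A blocked request carries its category and reason with it, which is what lets the caller surface a structured denial instead of a bare failure.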

Vault writes

Every vault write is one atomic operation:

set_secret(name, value)
  │
_load() → fail-fast on InvalidToken (VaultDecryptionError)
  │
secrets[name] = value
  │
_save(secrets):
  │
  ├─ token = self._fernet.encrypt(orjson.dumps(secrets))
  │
  ├─ v2_blob = b"ALSv2\n" + self._current_salt + token
  │
  └─ _atomic_write(secrets_file, v2_blob):
        │
        ├─ open secrets.enc.tmp with O_CREAT|O_WRONLY|O_TRUNC mode 0600
        │
        ├─ os.write all bytes
        │
        ├─ os.fsync(fd)        ← contents durable
        │
        ├─ os.close(fd)
        │
        ├─ os.replace(tmp, secrets_file)   ← POSIX atomic rename
        │
        └─ os.fsync(parent_dir)            ← rename durable

A SIGKILL between any two of these steps leaves the vault in exactly one of two states: the previous good blob or the new good blob, never a partial or mismatched mix. At worst a stale secrets.enc.tmp remains, which the next write truncates and overwrites. Full spec: Vault.
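
The write sequence above maps almost one-to-one onto standard-library calls. The following is a minimal sketch of that sequence for POSIX systems, not the vault's actual _atomic_write; the function name atomic_write and the temporary-directory demo are illustrative.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # file contents are on disk before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename on POSIX
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "secrets.enc")
    atomic_write(target, b"ALSv2\n" + b"old")
    atomic_write(target, b"ALSv2\n" + b"new")
    with open(target, "rb") as fh:
        print(fh.read())
    print(os.path.exists(target + ".tmp"))
```

The order matters: the data fsync must precede the rename, otherwise a crash could leave the final name pointing at a file whose contents never reached disk.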

Tiered logging

structlog event
  │
processors: add_log_level + TimeStamper + StackInfoRenderer + format_exc_info + JSONRenderer
  │
stdlib LoggerFactory (BoundLogger)
  │
root logger handlers: _TierRouter
  │
  ├─ resolve_tier(level, event)  → "long" or "short"
  │
  ├─ long  → TimedRotatingFileHandler (daily, agent-long.log)
  │
  └─ short → TimedRotatingFileHandler (hourly, agent-short.log)
  │
cron loop (hourly):
  │
  └─ LogRetentionManager.prune_all()
        │
        ├─ long  files older than AGENT_LOG_LONG_RETENTION_HOURS  → delete
        └─ short files older than AGENT_LOG_SHORT_RETENTION_HOURS → delete

Full spec: Tiered logging.
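
The routing step can be sketched with the standard logging module. The rule inside resolve_tier (warnings and above go to "long") is an assumption, since this page does not state the real rule, and ListHandler is an in-memory stand-in for the two TimedRotatingFileHandler instances.

```python
import logging


def resolve_tier(level: int, event: str) -> str:
    # Assumed rule: warnings and above are kept long, the rest short.
    return "long" if level >= logging.WARNING else "short"


class TierRouter(logging.Handler):
    """Forward each record to the handler that owns its tier."""

    def __init__(self, long_handler: logging.Handler, short_handler: logging.Handler) -> None:
        super().__init__()
        self._targets = {"long": long_handler, "short": short_handler}

    def emit(self, record: logging.LogRecord) -> None:
        tier = resolve_tier(record.levelno, record.getMessage())
        self._targets[tier].handle(record)


class ListHandler(logging.Handler):
    """In-memory stand-in for TimedRotatingFileHandler."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


long_sink, short_sink = ListHandler(), ListHandler()
logger = logging.getLogger("tier-demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(TierRouter(long_sink, short_sink))

logger.info("cache hit")
logger.error("vault decryption failed")
print(long_sink.messages, short_sink.messages)
```

Swapping ListHandler for logging.handlers.TimedRotatingFileHandler with when="midnight" and when="H" would give the daily and hourly rotation shown in the diagram.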


Technology stack

| Layer | Choice | Why |
|---|---|---|
| Language | Python 3.11+ | Async first-class, structural pattern matching, mature crypto |
| LLM | Provider-agnostic (Claude CLI, Anthropic API, OpenAI-compatible API) | No lock-in. Operator picks per session. |
| Database | SQLite (aiosqlite + sqlite3 with WAL mode) | Single file per concern, no separate server, durable, fast enough |
| Serialization | orjson | 5–10× faster than stdlib json, strict UTF-8 |
| Validation | Pydantic v2 + jsonschema | Pydantic for runtime models, jsonschema for LLM-output validation |
| Logging | structlog (JSON via stdlib) | Structured events, tier-routable, secret-redactable |
| Encryption | cryptography (Fernet AES-128-CBC + HMAC-SHA256, PBKDF2 480K iterations) | Audited primitives, no DIY crypto |
| Sandbox | Docker (read-only, no-network, resource limits, image whitelist) | Real isolation, well-understood blast radius |
| Embeddings | sentence-transformers (paraphrase-multilingual-MiniLM-L12-v2) | Local, no API, multilingual (EN + SK) |
| HTTP | aiohttp (server + client) | One library for both sides, async-native |
| Scheduling | Plain asyncio loops with await asyncio.sleep | No APScheduler footgun, deterministic, observable |
| Process supervision | psutil | Cross-platform, battle-tested |
| Type checking | mypy strict on the whole agent/ tree | Catch wiring bugs at CI time |
| Lint / format | ruff | Fast, opinionated, replaces flake8 + isort |

We deliberately avoid APScheduler, Celery, Redis, RabbitMQ, Kubernetes, and vendor SDKs that pull in 50+ transitive deps. The whole agent installs with pip install -e . from a tiny pyproject.toml.
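
The "plain asyncio loops" choice in the Scheduling row can be sketched in a few lines. The run_every helper and the stop event are hypothetical names; the real cron loops in agent/core/ may handle errors and shutdown differently.

```python
import asyncio
from typing import Awaitable, Callable


async def run_every(interval: float, job: Callable[[], Awaitable[None]],
                    stop: asyncio.Event) -> None:
    """Run `job` repeatedly, sleeping `interval` seconds between runs."""
    while not stop.is_set():
        try:
            await job()
        except Exception as exc:  # one failing run must not kill the loop
            print(f"job failed: {exc!r}")
        await asyncio.sleep(interval)


async def main() -> int:
    runs = 0
    stop = asyncio.Event()

    async def tick() -> None:
        nonlocal runs
        runs += 1
        if runs == 3:
            stop.set()

    await run_every(0.01, tick, stop)
    return runs


print(asyncio.run(main()))
```

The whole scheduler is a while loop that can be read top to bottom, which is the "deterministic, observable" property the table refers to.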


Storage layout

<AGENT_DATA_DIR>/                  ← .agent_runtime/ by default
├── memory/
│   ├── memories.db                ← 4-type memory store + provenance
│   ├── conversations.db           ← persistent conversation context
│   └── rag/                       ← embedding index cache
├── tasks/
│   └── tasks.db
├── finance/
│   └── finance.db                 ← propose/approve/complete + budget snapshots
├── projects/
│   └── projects.db
├── workspaces/
│   ├── <workspace_id>/            ← per-job isolated workspace
│   └── workspaces.db              ← audit trail with hash chain
├── build/
│   └── builds.db                  ← jobs + artifacts (WAL mode)
├── review/
│   └── reviews.db                 ← jobs + artifacts (WAL mode)
├── control/
│   ├── control.db                 ← plans, traces, cost ledger, settlement
│   └── llm_runtime.json           ← operator runtime LLM override
├── approval/
│   └── approvals.db               ← multi-step approval queue
├── identity/
│   └── owner_profile.json         ← agent + owner identity (post-onboarding)
└── logs/                          ← AGENT_LOG_DIR (default: <data_dir>/logs)
    ├── long/
    │   └── agent-long.log[.YYYY-MM-DD]
    └── short/
        └── agent-short.log[.YYYY-MM-DD-HH]

<AGENT_PROJECT_ROOT>/agent/vault/
└── secrets.enc                    ← v2 single-file (header + salt + Fernet token)

AGENT_DATA_DIR defaults to .agent_runtime/ for fresh installs and agent/ for legacy installs (so existing operators don't have data move under their feet). The vault deliberately stays in the project tree because it's the only file that's both encrypted and required at boot.
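
The resolution order (environment variable, then legacy detection, then the default) can be sketched as below. The function name resolve_data_dir and the legacy marker it checks (an existing agent/memory/memories.db) are assumptions; this page does not say how legacy installs are actually detected.

```python
import tempfile
from pathlib import Path


def resolve_data_dir(project_root: Path, env: dict[str, str]) -> Path:
    """Resolve the data directory: env > legacy detection > default."""
    override = env.get("AGENT_DATA_DIR")
    if override:
        return Path(override)
    # Assumed legacy marker: a memory database already under agent/.
    legacy = project_root / "agent"
    if (legacy / "memory" / "memories.db").exists():
        return legacy
    return project_root / ".agent_runtime"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(resolve_data_dir(root, {}).name)  # fresh install
    (root / "agent" / "memory").mkdir(parents=True)
    (root / "agent" / "memory" / "memories.db").touch()
    print(resolve_data_dir(root, {}).name)  # legacy install
    print(resolve_data_dir(root, {"AGENT_DATA_DIR": "/srv/als"}))  # env wins
```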


Boot sequence

python -m agent
  │
1.  load .env (operator-managed, gitignored)
2.  resolve data_dir (env > legacy detection > .agent_runtime)
3.  pin AGENT_DATA_DIR + AGENT_LOG_DIR + AGENT_PIDFILE_PATH into env
4.  setup_tiered_logging() — installs _TierRouter on root logger,
     switches structlog to stdlib BoundLogger
5.  _check_pidfile() — refuse to start if another instance is running
6.  AgentOrchestrator(data_dir).initialize()
       │
       ├─ memory store (open SQLite, replay WAL)
       ├─ task manager
       ├─ finance tracker (asyncio.Lock per tx)
       ├─ project manager
       ├─ workspace manager (recover orphaned workspaces from SQLite)
       ├─ build storage (WAL mode)
       ├─ review storage
       ├─ approval queue
       ├─ control plane state (plans/traces/cost/settlement)
       ├─ runtime model + LLM runtime control
       ├─ gateway (provider routes)
       ├─ build service + review service
       ├─ intake router
       ├─ recurring workflows + pipeline orchestrator
       ├─ settlement service
       ├─ reporting service
       ├─ vault (open secrets.enc, migrate v1→v2 if needed)
       ├─ message router
       ├─ watchdog
       ├─ job runner (12 cron jobs registered)
       ├─ tool executor (with operator controls)
       ├─ agent brain (wires tool executor)
       ├─ telegram bot + handler
       └─ HTTP API + dashboard
  │
7.  signal handlers (SIGINT, SIGTERM)
8.  enter run loop

If any step fails, the process exits with a clear error message and a non-zero exit code. There is no silent degradation.
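
Step 5 and the fail-fast rule can be illustrated with a pidfile check that exits non-zero when another instance is alive. This is a POSIX-only sketch using os.kill with signal 0; the names pid_alive and check_pidfile are hypothetical, the real _check_pidfile may rely on psutil instead, and the check-then-write here is not race-free.

```python
import os
import sys
import tempfile
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def check_pidfile(path: Path) -> None:
    """Exit non-zero if the pidfile names a live process, else claim it."""
    if path.exists():
        try:
            pid = int(path.read_text().strip())
        except ValueError:
            pid = -1  # an unreadable pidfile is treated as stale
        if pid > 0 and pid != os.getpid() and pid_alive(pid):
            # sys.exit with a string prints it to stderr and exits with code 1.
            sys.exit(f"another instance is running (pid {pid})")
    path.write_text(str(os.getpid()))


with tempfile.TemporaryDirectory() as d:
    pidfile = Path(d) / "agent.pid"
    check_pidfile(pidfile)  # first start: claims the pidfile
    print(pidfile.read_text() == str(os.getpid()))
```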


Design principles (and the tests that enforce them)

| Principle | Enforced by |
|---|---|
| Anti-stochastic (anti-stochastika): LLM only when no cheaper layer answered | test_brain_core.py, test_routing_eval.py |
| Deny-by-default: unknown tools and unknown channels are blocked | test_tool_governance.py, test_policy_regression.py, test_security_invariants.py |
| Fail-fast: wrong vault key, missing config, corrupt state surface immediately | test_vault.py::TestVaultWrongKeyWriteFailFast, test_security_audit.py |
| Human-in-the-loop for money + host access + external writes | test_finance_approval.py, test_multi_step_approval.py, test_approval_queue.py |
| Persistent state: SQLite everywhere, survives crashes | test_workspace_recovery.py, test_persistent_conversation.py, test_control_plane.py |
| Crash-safe vault writes: single atomic os.replace per write | test_vault.py::TestVaultV2Format, TestVaultV2MigrationCrashSafety |
| Explainability: every decision recorded | test_explanation.py, test_action_envelope.py |
| Sovereign by default: no telemetry leaks | test_security_audit.py::TestNoSecretsInLogs, gateway audit |

Where to read next