Roadmap

Daniel Babjak edited this page Mar 28, 2026 · 22 revisions

Completed (v1.0.0 to v1.9.0)

  • Core architecture (orchestrator, router, watchdog, job runner)
  • 4-type memory system (working, episodic, semantic, procedural) + provenance model
  • Epistemic memory: observed/user_asserted/inferred/verified/stale + MemoryKind
  • Memory consolidation (episodic → semantic/procedural) with conflict detection
  • Persistent conversation (SQLite + FTS5 full-text search)
  • Telegram bot with 17 commands
  • 9-layer processing pipeline (dispatch → cache → RAG → classify → LLM → escalation → learning → filter → explain)
  • Model routing (Haiku/Sonnet/Opus based on task type + learning escalation)
  • Docker sandbox (read-only, no-network, resource limits)
  • Encrypted vault (Fernet: AES-128-CBC with HMAC-SHA256; key derived via PBKDF2, 480K iterations)
  • Agent-to-Agent API (HTTP, port 8420, replay protection)
  • Learning system (skill tracking, model escalation, prompt augmentation, skill auto-update)
  • Security hardening (safe mode, input sanitization, PID lockfile)
  • Tool governance — capability manifest, deny-by-default policy, 4-step action pipeline
  • Channel policy — per-channel trust levels, file-access enforcement
  • Approval queue — propose → approve → execute, multi-step, TTL, categories
  • Operator controls — disable/enable tools, lockdown/unlock
  • Status model — 7 states, try/finally lifecycle (no stuck states)
  • Explanation log — full decision context (routing, policy, learning, memory)
  • Workspace persistence — SQLite, audit trail, owner_id recovery, TTL cleanup
  • Finance — budget policy (hard/soft caps), risk templates, stale proposal detection
  • CI quality gates — mypy, ruff, 1000+ test count, 60s performance budget, architecture invariants
  • 1,286+ automated tests across the current suite, running at $0.00 token cost
  • Documentation: SECURITY_MODEL.md, LEARNING_MODEL.md, OPERATOR_HANDBOOK.md, PRODUCT_IDENTITY.md
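
As a rough illustration of the 9-layer processing pipeline listed above, the sketch below runs the layers in the documented order and records each one in a trace. Only the layer names come from this page; `Context`, `run_pipeline`, and `fake_llm` are hypothetical names, and the project's real interfaces may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Layer names and order are taken from the roadmap; everything else is hypothetical.
LAYERS = ["dispatch", "cache", "RAG", "classify", "LLM",
          "escalation", "learning", "filter", "explain"]


@dataclass
class Context:
    """Hypothetical per-request state handed from layer to layer."""
    message: str
    reply: Optional[str] = None
    trace: list[str] = field(default_factory=list)


Handler = Callable[[Context], None]


def run_pipeline(ctx: Context, handlers: dict[str, Handler]) -> Context:
    """Visit every layer in order; layers without a handler are no-ops."""
    for name in LAYERS:
        ctx.trace.append(name)
        handler = handlers.get(name)
        if handler is not None:
            handler(ctx)
    return ctx


def fake_llm(ctx: Context) -> None:
    """Stand-in for the LLM layer: only answers if no earlier layer did."""
    if ctx.reply is None:
        ctx.reply = f"echo: {ctx.message}"


result = run_pipeline(Context("Ahoj"), {"LLM": fake_llm})
print(result.reply)
print(result.trace)
```

Keeping the layer order in a single list makes the sequence easy to assert in an architecture-invariant test, which is one plausible reason to structure a pipeline this way.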

Newly closed in v1.9.0

  • Build execution now resolves explicit source-aware execution policies before mutable workspace work starts and records those decisions as control-plane traces
  • Runtime model now exposes higher-level local_owner, operator_controlled, and enterprise_hardened operating profiles layered over the lower-level execution environment profiles
  • Builder verification now persists suite-level plus per-step verification artifacts and surfaces them through the build delivery bundle
  • Acceptance reports now flow into delivery as richer operator handoff summaries grouped by criterion status
  • Release workflow repeated successfully with GitHub merge, verified merge commit, GitHub Release, and wiki update

Next — Product Closure (not scope expansion)

Priority 1 — Review And Policy Boundary Closure

  • Route Telegram/API review entrypoints through ReviewService
  • Bring repo/diff analysis under the shared execution and policy boundary
  • Support approvals for risky execution and external delivery

Priority 2 — Runtime Budget Control

  • Add hard budget, soft budget, and stop-loss behavior
  • Surface cost and margin hints to the operator
  • Make escalation budget-aware
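
One way the three budget behaviors and budget-aware escalation could fit together is sketched below. This is not the project's implementation: `RuntimeBudget`, `BudgetDecision`, and `escalate` are hypothetical names, and treating stop-loss as a per-job spending cap is an assumption, since this page does not define it. The model tier names follow the Haiku/Sonnet/Opus routing mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetDecision(Enum):
    ALLOW = "allow"
    WARN = "warn"  # soft budget crossed: continue, but tell the operator
    STOP = "stop"  # hard budget or stop-loss hit: refuse the next call


@dataclass
class RuntimeBudget:
    soft_cap_usd: float
    hard_cap_usd: float
    stop_loss_usd: float  # assumption: maximum spend allowed on a single job
    spent_usd: float = 0.0

    def check(self, job_spent_usd: float, next_cost_usd: float) -> BudgetDecision:
        """Classify the next call before it is made."""
        projected_total = self.spent_usd + next_cost_usd
        if projected_total > self.hard_cap_usd:
            return BudgetDecision.STOP
        if job_spent_usd + next_cost_usd > self.stop_loss_usd:
            return BudgetDecision.STOP
        if projected_total > self.soft_cap_usd:
            return BudgetDecision.WARN
        return BudgetDecision.ALLOW


MODEL_TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most expensive


def escalate(current: str, decision: BudgetDecision) -> str:
    """Move one tier up only while the budget decision is ALLOW."""
    idx = MODEL_TIERS.index(current)
    if decision is BudgetDecision.ALLOW and idx < len(MODEL_TIERS) - 1:
        return MODEL_TIERS[idx + 1]
    return current


budget = RuntimeBudget(soft_cap_usd=5.0, hard_cap_usd=10.0,
                       stop_loss_usd=2.0, spent_usd=4.0)
decision = budget.check(job_spent_usd=0.5, next_cost_usd=1.5)
print(decision, escalate("haiku", decision))
```

Checking the budget before the call, rather than after, is what lets the hard cap act as a cap instead of a report.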

Priority 3 — Honest Operator Workflow

  • Reject unsupported remote-acquisition work cleanly and honestly
  • Track richer job failures, retries, and durations
  • Add compliance-friendly retention and evidence packaging
  • Build a live operator backend/UI on top of the current CLI and TS contracts

Priority 4 — External Identity

  • Email for John (Gmail/Proton; Daniel must create it manually because of the captcha)
  • X.com account (Daniel must create it manually because of phone verification)
  • Claim John on Moltbook (already registered as john-b2jk, pending_claim)

Blocked by: Daniel needs to create email and X.com accounts manually.

Priority 5 — Earning Module

  • Agent finds work opportunities
  • Agent proposes them to Daniel
  • Daniel approves (human in the loop)
  • Agent executes the approved work (inside the sandbox)
  • Revenue tracked in finance module

Goal: "Agent earned $1" milestone.
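
The propose, approve, execute, track flow above could be gated roughly as follows. This is a minimal sketch under stated assumptions: `Proposal` and `EarningLedger` are hypothetical names, the TTL check mirrors the approval queue's TTL only in spirit, and sandboxed execution is represented by a plain callable rather than a real Docker sandbox.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Proposal:
    """Hypothetical work proposal awaiting human approval."""
    description: str
    action: Callable[[], float]  # stand-in for sandboxed work; returns revenue in USD
    ttl_seconds: float = 3600.0
    created_at: float = field(default_factory=time.monotonic)
    approved: bool = False

    def is_stale(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.created_at > self.ttl_seconds


class EarningLedger:
    """Hypothetical revenue tracker standing in for the finance module."""

    def __init__(self) -> None:
        self.revenue_usd = 0.0

    def execute(self, proposal: Proposal, now: Optional[float] = None) -> float:
        # Nothing runs without explicit human approval.
        if not proposal.approved:
            raise PermissionError("proposal has not been approved by a human")
        # Approvals do not outlive the proposal's TTL.
        if proposal.is_stale(now):
            raise TimeoutError("proposal expired before execution")
        earned = proposal.action()
        self.revenue_usd += earned
        return earned


ledger = EarningLedger()
job = Proposal("write a short product description", action=lambda: 1.0)
job.approved = True  # the human approval step
ledger.execute(job)
print(ledger.revenue_usd)
```

Raising on an unapproved or stale proposal, instead of silently skipping it, keeps the failure visible to the operator.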

Known Limits

  • CLI token overhead: a simple "Ahoj" ("Hi") message costs 26k tokens because Claude adds CLAUDE.md context
  • No operator dashboard UI yet (CLI + mock TS surface only)
  • Shared persistence, retention, runtime budget enforcement, and review-focused evidence packaging now exist; broader execution-policy unification and live operator/backend surfaces are still incomplete
  • Planner output is now durable and queryable, but not yet a full distributed execution history
  • Semantic model uses ~1.5GB RAM
  • trycloudflare URLs change on restart (no named tunnel)