Roadmap

Daniel Babjak edited this page Apr 8, 2026 · 22 revisions

What's done, what's open, and what's intentionally out of scope. Updated for v1.35.0 (2026-04-08).

This page tracks the direction. For specific in-flight tickets see the GitHub issues. For the per-release breakdown see the CHANGELOG.


Phases

| Phase | Theme | Status |
|---|---|---|
| Phase 0 | Foundation | ✅ Closed (v1.0 → v1.7) |
| Phase 1 | Reviewer | ✅ Closed (v1.8 → v1.16) |
| Phase 2 | Builder | ✅ Closed (v1.17 → v1.24) |
| Phase 3 | Operatorization | ✅ Closed (v1.25 → v1.28) |
| Phase 4 | Enterprise hardening | ✅ Closed (v1.29 → v1.34) |
| Phase 5 | Reliability + crash safety | ✅ Closed (v1.35) |
| Phase 6 | God-object refactor | 🔵 Planned |
| Phase 7 | Multi-channel | 🔵 Future |
| Phase 8 | Earning | 🔵 Future |

Phase 5 — Reliability + Crash Safety (v1.35.0, current)

Closed in v1.35.0:

  • ✅ Tiered structured logging with deterministic per-tier retention
  • ✅ Vault single-file v2 format (atomic writes, embedded salt, crash-safe)
  • ✅ Wrong-key vault writes fail-fast (VaultDecryptionError)
  • ✅ Runtime LLM operator control (persistent override, dashboard + HTTP + CLI)
  • ✅ Brain conversation history fix on short follow-ups ("ano" no longer loses context)
  • ✅ Anti-echo work-queue detector (pasted assistant text doesn't spawn duplicate jobs)
  • ✅ Telegram + Claude CLI fail-closed deny guard (programming task in sandbox-only mode)
  • ✅ Headless CLI auto-approve (AGENT_CLI_AUTO_APPROVE, sandbox lockdown via --disallowed-tools)
  • ✅ Build pipeline AUDIT_MARKER_ONLY fail-closed (codegen failure no longer passes verify)
  • ✅ Dashboard XSS escapes + Bearer-only auth (no ?key= query string fallback)
  • ✅ SQL injection hardening in agent/build/storage.py and agent/review/storage.py
  • ✅ Finance per-tx asyncio.Lock against concurrent approve race
  • ✅ Telegram in-flight task tracking (no GC mid-execution)
  • ✅ Nonce cache age-based eviction (replay-protection state cannot grow unbounded)
  • ✅ PID file handle leak fix
  • ✅ CI release-readiness skip env (AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1 for runners without Claude CLI)
  • ✅ Log retention env contract unification (_HOURS is canonical, _DAYS deprecated)
  • ✅ mypy 147 errors → 0 across 112 source files
  • ✅ Operator setup guide + security incident post-mortem
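
As an illustration of the per-transaction `asyncio.Lock` item above, here is a minimal sketch of how a lock keyed by transaction id can prevent two concurrent approvals from both succeeding. `ApprovalRegistry`, `approve`, and `demo` are hypothetical names invented for this example; the actual finance module may be structured quite differently.

```python
import asyncio
from collections import defaultdict


class ApprovalRegistry:
    """Hypothetical stand-in for the finance approval store."""

    def __init__(self) -> None:
        # One lock per transaction id, created lazily on first use.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._approved: set[str] = set()

    async def approve(self, tx_id: str) -> bool:
        """Return True only for the first caller that approves tx_id."""
        async with self._locks[tx_id]:
            if tx_id in self._approved:
                return False  # another caller already approved this transaction
            # Stands in for the real awaited work (DB write, ledger update, ...).
            await asyncio.sleep(0)
            self._approved.add(tx_id)
            return True


async def demo() -> list[bool]:
    registry = ApprovalRegistry()
    # Two concurrent approvals of the same transaction: only one may win.
    return list(
        await asyncio.gather(registry.approve("tx-1"), registry.approve("tx-1"))
    )
```

Without the lock, both callers could pass the "already approved?" check before either records the approval. Keying the lock by transaction id keeps unrelated transactions from blocking each other.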

Phase 6 — God-object refactor (planned)

Several control-plane classes are over their LOC budget and need extraction. None of these are immediate bugs — they're medium-term regression risk.

| Target | Current LOC | Risk | Notes |
|---|---|---|---|
| agent/build/service.py::BuildService | 4103 LOC, 103 methods | 🔴 Critical | Hardest seam to read; verification, codegen, mutation, acceptance, delivery all in one file |
| agent/social/telegram_handler.py::TelegramHandler | 2364 LOC, 42 methods | 🔴 Critical | 22 /cmd_* methods + 504-line _handle_text |
| agent/core/agent.py::AgentOrchestrator | ~2000 LOC, 68 methods | 🟠 High | __init__ is 240 lines; intake submission 200+ lines |
| agent/social/agent_api.py::AgentAPI | 1211 LOC, 37 methods | 🟠 High | Per-route handlers in one file |
| agent/core/brain.py::AgentBrain | ~1080 LOC, 15 methods | 🟠 High | 9-layer pipeline + helpers, hard to extend without refactor |

Planned extractions (in order of risk, lowest first):

  1. ExternalServicesBootstrap — extract from agent/core/agent.py:100-238 (init phase 1: query services + module wiring). Single caller (AgentOrchestrator.__init__), no business logic, just wiring. Low risk.
  2. BuildVerificationOrchestrator — extract from agent/build/service.py:1420-2020 (verification step discovery + execution + artifact capture). 2-3 callsites, clear seam. Medium risk.
  3. OperatorIntakeProcessor — extract from agent/core/agent.py:574-885 (qualify + preview + submit + budget logic). HIGH risk because of tight coupling to finance, approval, control plane. Needs new unit tests first.

These are tracked but not yet scheduled. Doing them right needs a quiet patch window with no security backlog.
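
To make the shape of the first extraction concrete, here is a rough sketch of moving wiring out of `__init__` into a dedicated bootstrap object. Only the names `ExternalServicesBootstrap` and `AgentOrchestrator` come from this page; the `ExternalServices` bundle, the factory mapping, and the `build()` method are assumptions made for illustration and do not describe the real code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class ExternalServices:
    """Bundle of wired services handed to the orchestrator (hypothetical shape)."""

    services: Mapping[str, Any]


class ExternalServicesBootstrap:
    """Owns the wiring that would otherwise live inline in __init__."""

    def __init__(self, factories: Mapping[str, Callable[[], Any]]) -> None:
        self._factories = dict(factories)

    def build(self) -> ExternalServices:
        # Pure wiring: call each factory once, no business logic.
        return ExternalServices(
            {name: make() for name, make in self._factories.items()}
        )


class AgentOrchestrator:
    """Stripped-down stand-in for the real orchestrator."""

    def __init__(self, bootstrap: ExternalServicesBootstrap) -> None:
        # The wiring block collapses into a single delegation.
        self.external = bootstrap.build()
```

Because the bootstrap has a single caller and no business logic, it can be unit-tested by passing stub factories, which is part of why this extraction is rated low risk.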


Phase 7 — Multi-channel (future)

Today the production channels are Telegram + dashboard + HTTP API. Other channels exist as interfaces but are not implemented:

| Channel | Status |
|---|---|
| Telegram | ✅ Production |
| HTTP API | ✅ Production |
| Dashboard | ✅ Production |
| Discord | 🔵 Interface only |
| Email | 🔵 Interface only |
| Matrix | 🔵 Not started |
| Slack | 🔵 Not started |
| Voice | 🔵 Not started |

The brain pipeline is already channel-agnostic: IncomingMessage carries channel_type and the policy enforces accordingly. Adding a channel is a matter of writing a new adapter that emits IncomingMessage and consumes the response. The hard parts (auth, safe mode, channel policy, file access enforcement) are already in place.
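
A minimal sketch of what such an adapter could look like follows. Only `IncomingMessage` and `channel_type` are named on this page; the other fields, the `ChannelAdapter` protocol, `DiscordAdapter`, and `handle` are hypothetical and are not the project's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class IncomingMessage:
    # Only channel_type is named in the roadmap; the other fields are assumed.
    channel_type: str
    sender_id: str
    text: str


class ChannelAdapter(Protocol):
    """Hypothetical contract: translate inbound payloads, deliver responses."""

    def to_incoming(self, raw: dict) -> IncomingMessage: ...

    def send_response(self, sender_id: str, response: str) -> None: ...


class DiscordAdapter:
    """Illustrative adapter; a real one would call the Discord API."""

    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def to_incoming(self, raw: dict) -> IncomingMessage:
        # Tag the message with its channel so policy can be enforced downstream.
        return IncomingMessage("discord", str(raw["author_id"]), raw["content"])

    def send_response(self, sender_id: str, response: str) -> None:
        # Recorded in memory here instead of being sent over the network.
        self.outbox.append((sender_id, response))


def handle(
    adapter: ChannelAdapter,
    raw: dict,
    brain: Callable[[IncomingMessage], str],
) -> None:
    """Run one message through a channel-agnostic brain callable."""
    message = adapter.to_incoming(raw)
    adapter.send_response(message.sender_id, brain(message))
```

The brain callable never sees the raw channel payload, only the normalized message, so adding a channel in this sketch means adding one adapter class and nothing else.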


Phase 8 — Earning (future)

This is the agent module that finds work, proposes a price, gets approval, executes, and collects payment. It is the long-tail vision, but it's not on the immediate roadmap because not all of the upstream pieces are in place yet:

  1. ✅ Build pipeline (Phase 2)
  2. ✅ Review pipeline (Phase 1)
  3. ✅ Approval queue (Phase 0)
  4. ✅ Cost ledger (Phase 3)
  5. ✅ Settlement workflow (Phase 4)
  6. ⬜ Outbound work discovery — agent searches for matching tasks
  7. ⬜ Pricing engine — agent estimates cost + margin
  8. ⬜ Customer-facing delivery surface — invoices, receipts, contract artifacts
  9. ⬜ Multi-tenant identity — separate operator profiles per customer

Each of the four open items is a real piece of work. The earning loop closes when all 9 are in production.


Open backlog (immediate)

Tracked in GitHub issues:

  • [refactor] Extract ExternalServicesBootstrap from AgentOrchestrator.__init__ (Phase 6, low risk)
  • [ops] Add log retention smoke test to CI (currently only unit-tested)
  • [docs] Wiki page for the recurring workflow + multi-job pipeline subsystems (covered in Build/Review pipelines, but they deserve their own page)
  • [security] Red-team test suite (multi-step escalation, cross-channel attack) (Phase 9 candidate)
  • [ops] Telegram alert when CI release-readiness gate fails on main

Known limits

What the agent does NOT currently do, and won't until somebody opens an issue with a real use case:

| Area | Limit |
|---|---|
| Memory | Conflict detection is tag-based, not semantic. No automatic merge of contradictory facts. |
| Tool governance | Build execution is governed, but not yet by a single unified engine; review/intake have a tighter loop. |
| Workspace | No automatic cleanup scheduler beyond the dead-man switch. Manual --prune-expired-retained-artifacts for now. |
| Routing | Keyword + signal heuristics. No ML classification beyond the optional sentence-transformers fallback. |
| Learning | Model failure tracking is per-process. Resets on restart. No persistent eval set. |
| Finance | Approval + settlement work, but multi-user/multi-currency/tax accounting are out of scope. |
| Multi-channel | Telegram is the only chat channel in production (alongside the dashboard and HTTP API). Discord/email are stubs. |
| Dashboard | API-protected operator UI, but it's still single-operator. Multi-operator with role separation is future work. |
| Gateway | Only one external provider (obolos.tech). Seller-side publishing is on Phase 8's TBD list. |
| Semantic model | ~1.5 GB RAM resident. There's no smaller alternative wired in. |
| CLI | ~26k tokens overhead per call (Claude Code adds context). Use API backend for cost-sensitive workloads. |
| Cloudflare tunnel | Uses trycloudflare quick tunnels, which change URL on restart. Named tunnels work but require operator setup. |

Things that will NEVER be on the roadmap

These are out-of-scope by design:

  • Managed SaaS hosting. The whole point is sovereignty.
  • Closed-source enterprise tier. MIT forever.
  • Telemetry / phone home. No call-home, no usage stats, no opt-in analytics.
  • Auto-merge to main. Builds produce delivery packages; merging is human-only.
  • Autonomous money sending. Wallets are read-only inside the agent. Always.
  • DeFi / trading / smart contracts. Out of scope.
  • Replacing the operator. This is a power tool for an operator, not an operator replacement.

How to influence the roadmap

  1. Open a GitHub issue describing what you need and why.
  2. If the use case is real and reproducible, it gets a milestone.
  3. Security findings get prioritized over features. Always.
  4. Operator UX over architecture purity. The wiki and CHANGELOG are the contract.
  5. Backwards compatibility unless we're explicit about a breaking change in the CHANGELOG. We hate forced migrations as much as you do.
