Roadmap
What's done, what's open, and what's intentionally out of scope. Updated for v1.35.0 (2026-04-08).
This page tracks the direction. For specific in-flight tickets see the GitHub issues. For the per-release breakdown see the CHANGELOG.
| Phase | Theme | Status |
|---|---|---|
| Phase 0 | Foundation | ✅ Closed (v1.0 → v1.7) |
| Phase 1 | Reviewer | ✅ Closed (v1.8 → v1.16) |
| Phase 2 | Builder | ✅ Closed (v1.17 → v1.24) |
| Phase 3 | Operatorization | ✅ Closed (v1.25 → v1.28) |
| Phase 4 | Enterprise hardening | ✅ Closed (v1.29 → v1.34) |
| Phase 5 | Reliability + crash safety | ✅ Closed (v1.35) |
| Phase 6 | God-object refactor | 🔵 Planned |
| Phase 7 | Multi-channel | 🔵 Future |
| Phase 8 | Earning | 🔵 Future |
Closed in v1.35.0:
- ✅ Tiered structured logging with deterministic per-tier retention
- ✅ Vault single-file v2 format (atomic writes, embedded salt, crash-safe)
- ✅ Wrong-key vault writes fail-fast (`VaultDecryptionError`)
- ✅ Runtime LLM operator control (persistent override, dashboard + HTTP + CLI)
- ✅ Brain conversation history fix on short follow-ups ("ano" no longer loses context)
- ✅ Anti-echo work-queue detector (pasted assistant text doesn't spawn duplicate jobs)
- ✅ Telegram + Claude CLI fail-closed deny guard (programming task in sandbox-only mode)
- ✅ Headless CLI auto-approve (`AGENT_CLI_AUTO_APPROVE`, sandbox lockdown via `--disallowed-tools`)
- ✅ Build pipeline `AUDIT_MARKER_ONLY` fail-closed (codegen failure no longer passes verify)
- ✅ Dashboard XSS escapes + Bearer-only auth (no `?key=` query string fallback)
- ✅ SQL injection hardening in `agent/build/storage.py` and `agent/review/storage.py`
- ✅ Finance per-tx `asyncio.Lock` against concurrent approve race
- ✅ Telegram in-flight task tracking (no GC mid-execution)
- ✅ Nonce cache age-based eviction (replay-protection state cannot grow unbounded)
- ✅ PID file handle leak fix
- ✅ CI release-readiness skip env (`AGENT_RELEASE_READINESS_SKIP_LLM_PROBE=1` for runners without Claude CLI)
- ✅ Log retention env contract unification (`_HOURS` is canonical, `_DAYS` deprecated)
- ✅ mypy 147 errors → 0 across 112 source files
- ✅ Operator setup guide + security incident post-mortem
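To make the per-transaction lock item concrete, here is a minimal sketch of how an `asyncio.Lock` keyed by transaction id can stop two concurrent approvals from both succeeding. `ApprovalStore`, `propose`, `approve`, and `approve_concurrently` are hypothetical names invented for this illustration; the actual finance module may be structured quite differently.

```python
import asyncio
from collections import defaultdict


class ApprovalStore:
    """Hypothetical in-memory store; not the project's real finance API."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}
        # One lock per transaction id, created lazily on first use.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.settlements = 0

    def propose(self, tx_id: str) -> None:
        self._status[tx_id] = "pending"

    async def approve(self, tx_id: str) -> bool:
        """Approve a pending transaction at most once; return False otherwise."""
        async with self._locks[tx_id]:
            if self._status.get(tx_id) != "pending":
                return False
            # Yield to the event loop between the check and the write,
            # which is where an unlocked version would race.
            await asyncio.sleep(0)
            self._status[tx_id] = "approved"
            self.settlements += 1
            return True


async def approve_concurrently(store: ApprovalStore, tx_id: str, attempts: int) -> list[bool]:
    """Fire several approvals for the same transaction at once."""
    return await asyncio.gather(*(store.approve(tx_id) for _ in range(attempts)))
```

Locking per transaction rather than globally keeps unrelated approvals from queuing behind each other, at the cost of holding one small lock object per transaction id.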
Several control-plane classes are over their LOC budget and need extraction. None of these are immediate bugs — they're medium-term regression risk.
| Target | Current size | Risk | Notes |
|---|---|---|---|
| `agent/build/service.py::BuildService` | 4103 LOC, 103 methods | 🔴 Critical | Hardest seam to read; verification, codegen, mutation, acceptance, delivery all in one file |
| `agent/social/telegram_handler.py::TelegramHandler` | 2364 LOC, 42 methods | 🔴 Critical | 22 `/cmd_*` methods + 504-line `_handle_text` |
| `agent/core/agent.py::AgentOrchestrator` | ~2000 LOC, 68 methods | 🟠 High | `__init__` is 240 lines; intake submission 200+ lines |
| `agent/social/agent_api.py::AgentAPI` | 1211 LOC, 37 methods | 🟠 High | Per-route handlers in one file |
| `agent/core/brain.py::AgentBrain` | ~1080 LOC, 15 methods | 🟠 High | 9-layer pipeline + helpers, hard to extend without refactor |
Planned extractions (in order of risk, lowest first):
- `ExternalServicesBootstrap`: extract from `agent/core/agent.py:100-238` (init phase 1: query services + module wiring). Single caller (`AgentOrchestrator.__init__`), no business logic, just wiring. Low risk.
- `BuildVerificationOrchestrator`: extract from `agent/build/service.py:1420-2020` (verification step discovery + execution + artifact capture). 2-3 callsites, clear seam. Medium risk.
- `OperatorIntakeProcessor`: extract from `agent/core/agent.py:574-885` (qualify + preview + submit + budget logic). High risk because of tight coupling to finance, approval, and the control plane. Needs new unit tests first.
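As a rough illustration of the lowest-risk extraction, the sketch below moves service wiring out of the orchestrator constructor into a dedicated bootstrap object. Only the names `ExternalServicesBootstrap` and `AgentOrchestrator` come from this page; the `ExternalServices` bundle, the factory mapping, and every method signature are assumptions made for the example and do not reflect the current code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class ExternalServices:
    """Hypothetical bundle of wired services handed to the orchestrator."""

    services: Mapping[str, Any]

    def get(self, name: str) -> Any:
        return self.services[name]


class ExternalServicesBootstrap:
    """Owns wiring only: no business logic, a single caller."""

    def __init__(self, factories: Mapping[str, Callable[[], Any]]) -> None:
        # Copy so later changes to the caller's mapping do not affect wiring.
        self._factories = dict(factories)

    def build(self) -> ExternalServices:
        # Construct services in declaration order so wiring stays deterministic.
        return ExternalServices({name: make() for name, make in self._factories.items()})


class AgentOrchestrator:
    """Stand-in for the real class; only the constructor seam is shown."""

    def __init__(self, bootstrap: ExternalServicesBootstrap) -> None:
        self.services = bootstrap.build()
```

With the wiring behind one seam, a test can pass a bootstrap built from stub factories instead of patching a 240-line constructor.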
These are tracked but not yet scheduled. Doing them right needs a quiet patch window with no security backlog.
Today the production channels are Telegram + dashboard + HTTP API. Other channels exist as interfaces but are not implemented:
| Channel | Status |
|---|---|
| Telegram | ✅ Production |
| HTTP API | ✅ Production |
| Dashboard | ✅ Production |
| Discord | 🔵 Interface only |
| Email | 🔵 Interface only |
| Matrix | 🔵 Not started |
| Slack | 🔵 Not started |
| Voice | 🔵 Not started |
The brain pipeline is already channel-agnostic: `IncomingMessage` carries `channel_type` and the policy enforces accordingly. Adding a channel is a matter of writing a new adapter that emits `IncomingMessage` and consumes the response. The hard parts (auth, safe mode, channel policy, file access enforcement) are already in place.
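A new adapter could look roughly like the sketch below. Only `IncomingMessage` and its `channel_type` field are taken from the description above; the `sender_id` and `text` fields, the `ChannelAdapter` protocol, the `MatrixAdapter` class, and the `handle` helper are hypothetical and exist only to show the shape of the contract.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class IncomingMessage:
    # channel_type is named in the text; sender_id and text are assumed fields.
    channel_type: str
    sender_id: str
    text: str


class ChannelAdapter(Protocol):
    """Assumed adapter contract: emit IncomingMessage, consume the response."""

    def to_incoming(self, raw: dict) -> IncomingMessage: ...

    def send(self, recipient_id: str, reply: str) -> None: ...


class MatrixAdapter:
    """Hypothetical adapter mapping a Matrix-like event dict to IncomingMessage."""

    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def to_incoming(self, raw: dict) -> IncomingMessage:
        return IncomingMessage(channel_type="matrix", sender_id=raw["sender"], text=raw["body"])

    def send(self, recipient_id: str, reply: str) -> None:
        # A real adapter would call the channel's API here; this one records the reply.
        self.outbox.append((recipient_id, reply))


def handle(adapter: ChannelAdapter, raw: dict, brain: Callable[[IncomingMessage], str]) -> None:
    """Translate a raw event, run the channel-agnostic brain, send the reply back."""
    message = adapter.to_incoming(raw)
    adapter.send(message.sender_id, brain(message))
```

Because the brain only ever sees `IncomingMessage`, channel policy can key off `channel_type` without the adapter knowing anything about policy.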
Phase 8 (Earning) is the agent module that finds work, proposes a price, gets approval, executes, and collects payment. This is the long-term vision, but it is not on the immediate roadmap because not all of the upstream pieces are in place yet:
- ✅ Build pipeline (Phase 2)
- ✅ Review pipeline (Phase 1)
- ✅ Approval queue (Phase 0)
- ✅ Cost ledger (Phase 3)
- ✅ Settlement workflow (Phase 4)
- ⬜ Outbound work discovery — agent searches for matching tasks
- ⬜ Pricing engine — agent estimates cost + margin
- ⬜ Customer-facing delivery surface — invoices, receipts, contract artifacts
- ⬜ Multi-tenant identity — separate operator profiles per customer
Each of these is a real piece of work. The earning loop closes when all 9 are in production.
Tracked in GitHub issues:
- `[refactor] Extract ExternalServicesBootstrap from AgentOrchestrator.__init__` (Phase 6, low risk)
- `[ops] Add log retention smoke test to CI` (currently only unit-tested)
- `[docs] Wiki page for the recurring workflow + multi-job pipeline subsystems` (covered in Build/Review pipelines, but they deserve their own page)
- `[security] Red-team test suite (multi-step escalation, cross-channel attack)` (Phase 9 candidate)
- `[ops] Telegram alert when CI release-readiness gate fails on main`
What the agent does NOT currently do, and won't until somebody opens an issue with a real use case:
| Area | Limit |
|---|---|
| Memory | Conflict detection is tag-based, not semantic. No automatic merge of contradictory facts. |
| Tool governance | Build execution is governed but not yet by a single unified engine — review/intake have a tighter loop. |
| Workspace | No automatic cleanup scheduler beyond the dead-man switch. Manual `--prune-expired-retained-artifacts` for now. |
| Routing | Keyword + signal heuristics. No ML classification beyond the optional sentence-transformers fallback. |
| Learning | Model failure tracking is per-process. Resets on restart. No persistent eval set. |
| Finance | Approval + settlement work, but multi-user/multi-currency/tax accounting are out of scope. |
| Multi-channel | Telegram is the only messaging channel in production (alongside the dashboard and HTTP API). Discord/email are stubs. |
| Dashboard | API-protected operator UI, but it's still single-operator. Multi-operator with role separation is future work. |
| Gateway | Only one external provider (obolos.tech). Seller-side publishing is on Phase 8's TBD list. |
| Semantic model | ~1.5 GB RAM resident. There's no smaller alternative wired in. |
| CLI | ~26k tokens overhead per call (Claude Code adds context). Use API backend for cost-sensitive workloads. |
| Cloudflare tunnel | Uses trycloudflare quick tunnels which change URL on restart. Named tunnels work but require operator setup. |
These are out-of-scope by design:
- Managed SaaS hosting. The whole point is sovereignty.
- Closed-source enterprise tier. MIT forever.
- Telemetry / phone home. No call-home, no usage stats, no opt-in analytics.
- Auto-merge to main. Builds produce delivery packages; merging is human-only.
- Autonomous money sending. Wallets are read-only inside the agent. Always.
- DeFi / trading / smart contracts. Out of scope.
- Replacing the operator. This is a power tool for an operator, not an operator replacement.
- Open a GitHub issue describing what you need and why.
- If the use case is real and reproducible, it gets a milestone.
- Security findings get prioritized over features. Always.
- Operator UX over architecture purity. The wiki and CHANGELOG are the contract.
- Backwards compatibility unless we're explicit about a breaking change in the CHANGELOG. We hate forced migrations as much as you do.