Build Pipeline

The build pipeline is the bounded context that turns "implement this feature" into actual files on disk inside an isolated workspace, verifies them, runs acceptance evaluation, and packages a delivery the operator can ship.

Code: agent/build/. Tests: tests/test_build_domain.py.

Design contract. A build is a deterministic transformation of a workspace. The codegen step (LLM) is the only stochastic component. Everything around it — workspace setup, mutation apply, verification, acceptance, delivery — is deterministic and resumable.

Lifecycle

operator (telegram /build, /intake, or HTTP /api/build)
    │
    ▼
intake.qualify_operator_intake()
    │   ├─ validate work_type, repo, description, focus_areas
    │   └─ structured denial if blocked
    ▼
intake.preview_operator_intake()  ← optional dry-run
    │
    ▼
intake.submit_operator_intake()
    │   ├─ budget check (hard cap, stop-loss, single-tx approval cap)
    │   ├─ multi-step approval if required
    │   └─ schedules BuildJob
    ▼
BuildService.run_build()
    │
    ├─ 1. workspace setup
    │     - WorkspaceManager creates an isolated dir under workspaces/<id>/
    │     - hash-chained audit trail records every file mutation
    │     - workspace_id linked to build job in SQLite
    │
    ├─ 2. capability validation
    │     - check the requested capability is in the catalog
    │     - block unsupported scopes early
    │
    ├─ 3. discover or accept implementation plan
    │     - if operator provided plan_file → use it
    │     - else call codegen (LLM → BuildOperation[])
    │     - AUDIT_MARKER_ONLY guard: refuse if codegen failed or plan is empty
    │
    ├─ 4. apply mutations
    │     - 10 supported mutation types (see below)
    │     - each mutation hashed into the workspace audit chain
    │     - capability scope enforced per-mutation
    │
    ├─ 5. verification
    │     - discover or accept verification plan
    │     - run test/lint/typecheck steps in Docker (build/docker_executor.py)
    │     - 512 MB RAM, 1 CPU, no network, image whitelist
    │     - capture artifacts: stdout, stderr, exit code, duration
    │
    ├─ 6. acceptance evaluation
    │     - acceptance criteria from operator (or auto-discovered)
    │     - 3 evaluator types: auto, verify, review
    │     - each criterion scored independently
    │
    ├─ 7. artifact persistence
    │     - BuildStorage (SQLite WAL mode) saves job + every artifact
    │     - artifacts: patch.diff, build_report.md, verification.json, ...
    │     - whitelisted DDL identifiers, parameterized writes
    │
    └─ 8. delivery package
          - prepare bundle (workspace tarball + artifacts)
          - awaiting_approval state
          - operator approves via /deliver, /api/operator/deliveries, or dashboard
          - handed_off → operator can rsync/scp/git push the bundle
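The "deterministic and resumable" property of the stages around codegen can be illustrated with a small runner. This is a minimal sketch, not the project's `BuildService`: the stage names mirror the eight lifecycle steps, while `run_stages`, the `completed` list, and the handler functions are hypothetical.

```python
from typing import Callable

# Stage names mirror the eight lifecycle steps; the runner itself is hypothetical.
STAGES = [
    "workspace_setup",
    "capability_validation",
    "implementation_plan",
    "apply_mutations",
    "verification",
    "acceptance",
    "artifact_persistence",
    "delivery_package",
]


def run_stages(handlers: dict[str, Callable[[], bool]], completed: list[str]) -> str:
    """Run stages in order, skipping those already completed."""
    for name in STAGES:
        if name in completed:
            continue  # resumed run: this stage already succeeded
        if not handlers[name]():
            return f"failed:{name}"
        completed.append(name)
    return "ok"


# Usage: the first run fails at verification, the retry resumes from there.
calls: list[str] = []
outcome = {"verification": False}


def make(name: str) -> Callable[[], bool]:
    def handler() -> bool:
        calls.append(name)
        return outcome.get(name, True)
    return handler


handlers = {name: make(name) for name in STAGES}
done: list[str] = []
first = run_stages(handlers, done)
outcome["verification"] = True
second = run_stages(handlers, done)
```

On the second call the four stages that already succeeded are skipped, so workspace setup runs only once.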

Build job model

agent/build/models.py::BuildJob:

Field	Type	Purpose
`id`	str	UUID-like, used everywhere as the join key
`build_type`	enum	`IMPLEMENT_FROM_SCRATCH`, `IMPLEMENT_FROM_PLAN`, `BOUNDED_REFACTOR`, ...
`status`	enum	`PROPOSED`, `RUNNING`, `VERIFYING`, `COMPLETED`, `FAILED`, `BLOCKED`, `DELIVERED`
`implementation_mode`	enum	`LLM_GENERATED`, `OPERATOR_PLAN`, `AUDIT_MARKER_ONLY`
`repo_path`	str	Where the source code lives
`description`	str	Operator's natural-language ask
`target_files`	list[str]	Optional scoping hint
`implementation_plan`	list[BuildOperation]	The mutation plan (from codegen or operator)
`implementation_results`	list[BuildOperationResult]	What actually happened per mutation
`verification_plan`	list[VerificationStep]	test/lint/typecheck steps to run
`verification_results`	list[VerificationResult]	Per-step outcome
`acceptance_criteria`	list[AcceptanceCriterion]	Operator's "is this done?" definition
`acceptance_results`	list[AcceptanceResult]	Per-criterion verdict
`workspace_id`	str	Link to `agent/work/workspace.py`
`delivery_state`	enum	`NONE`, `PREPARED`, `AWAITING_APPROVAL`, `APPROVED`, `HANDED_OFF`, `REJECTED`
`timing`	object	created_at, started_at, completed_at, durations
`usage`	object	input_tokens, output_tokens, cost_usd
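To make the table concrete, here is a reduced sketch of how such a model could be declared. It is not the contents of `agent/build/models.py`: only a subset of fields is shown, the enum class name `DeliveryState`, the use of `auto()` values, and the defaults are assumptions, and `implementation_plan` is typed as plain dicts in place of `BuildOperation`.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class BuildJobStatus(Enum):
    PROPOSED = auto()
    RUNNING = auto()
    VERIFYING = auto()
    COMPLETED = auto()
    FAILED = auto()
    BLOCKED = auto()
    DELIVERED = auto()


class DeliveryState(Enum):  # hypothetical name for the delivery_state enum
    NONE = auto()
    PREPARED = auto()
    AWAITING_APPROVAL = auto()
    APPROVED = auto()
    HANDED_OFF = auto()
    REJECTED = auto()


@dataclass
class BuildJob:
    id: str
    repo_path: str
    description: str
    # Defaults below are assumptions made for this sketch.
    status: BuildJobStatus = BuildJobStatus.PROPOSED
    delivery_state: DeliveryState = DeliveryState.NONE
    target_files: list[str] = field(default_factory=list)
    implementation_plan: list[dict] = field(default_factory=list)  # BuildOperation in the real model
    workspace_id: str = ""


job = BuildJob(id="job-1", repo_path=".", description="add a health endpoint")
```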

Mutation types

Defined in BuildOperation.kind:

Kind	What it does
`create_file`	Write a new file (no overwrite — fails if exists)
`overwrite_file`	Replace an existing file's contents
`edit_file`	Targeted edit (find + replace, with context match)
`delete_file`	Remove a file (capability scope must allow it)
`copy_file`	Copy from `source_path` to `target_path`
`move_file`	Atomic rename within the workspace
`create_dir`	mkdir -p
`append_file`	Append text to an existing file
`prepend_file`	Insert text at the top
`apply_patch`	Apply a unified diff
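Two of these semantics map directly onto standard-library behaviour, which the sketch below illustrates. The function names and signatures are hypothetical, and the real handlers may be implemented differently; the point is only that exclusive-create mode gives the "fails if exists" guarantee and that `os.replace` renames atomically when source and destination are on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def create_file(root: Path, rel: str, text: str) -> int:
    """Write a new file; mode "x" raises FileExistsError if it already exists."""
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "x", encoding="utf-8") as fh:
        return fh.write(text)  # number of characters written


def move_file(root: Path, src: str, dst: str) -> None:
    """Rename inside the workspace; atomic on the same filesystem."""
    os.replace(root / src, root / dst)


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    written = create_file(ws, "pkg/a.py", "x = 1\n")
    try:
        create_file(ws, "pkg/a.py", "x = 2\n")
        second_create = "overwrote"
    except FileExistsError:
        second_create = "refused"
    move_file(ws, "pkg/a.py", "pkg/b.py")
    moved = (ws / "pkg/b.py").read_text(encoding="utf-8")
    source_gone = not (ws / "pkg/a.py").exists()
```

Note that `os.replace` silently overwrites an existing destination, so a real `move_file` handler would need its own existence check if that is not wanted.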

Every mutation:

Validates its target_path against the capability scope (allowed prefixes/suffixes per build type).
Records itself in the workspace audit chain (each entry hashed with the previous, so tampering is detectable).
Returns a structured BuildOperationResult with success, error, bytes_written, path.
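The tamper-detection property of a hash chain can be shown in a few lines. This is a minimal sketch under assumptions: SHA-256, the JSON serialization, the all-zero genesis value, and the entry layout are illustrative choices, not the format used by the workspace audit trail.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_entry(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited entry breaks the chain from that point on."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain: list[dict] = []
append_entry(chain, {"kind": "create_file", "path": "app/main.py"})
append_entry(chain, {"kind": "edit_file", "path": "app/main.py"})
intact = verify_chain(chain)
chain[0]["payload"]["path"] = "app/evil.py"  # tamper with an early entry
tampered_ok = verify_chain(chain)
```

Because each hash covers the previous one, rewriting an early entry would require recomputing every later hash as well.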

If any mutation fails and stop_on_first_failure=True, the rest of the plan is skipped and the job moves to FAILED with the failed operation listed in artifacts.
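The interaction between scope checks and `stop_on_first_failure` can be sketched as follows. `OpResult`, `in_scope`, and `apply_plan` are hypothetical stand-ins, the prefix-only scope rule is a simplification of "allowed prefixes/suffixes per build type", and operations are plain dicts rather than `BuildOperation` objects.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass
class OpResult:  # stand-in for BuildOperationResult
    path: str
    success: bool
    error: str = ""


def in_scope(target_path: str, allowed_prefixes: tuple[str, ...]) -> bool:
    """Reject absolute paths and '..' segments, then require an allowed prefix."""
    path = PurePosixPath(target_path)
    if path.is_absolute() or ".." in path.parts:
        return False
    for prefix in allowed_prefixes:
        prefix_parts = PurePosixPath(prefix).parts
        if path.parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False


def apply_plan(
    plan: list[dict],
    apply_one: Callable[[dict], OpResult],
    allowed_prefixes: tuple[str, ...],
    stop_on_first_failure: bool = True,
) -> list[OpResult]:
    results: list[OpResult] = []
    for op in plan:
        if not in_scope(op["target_path"], allowed_prefixes):
            result = OpResult(op["target_path"], False, "outside capability scope")
        else:
            result = apply_one(op)
        results.append(result)
        if not result.success and stop_on_first_failure:
            break  # the rest of the plan is skipped
    return results


plan = [
    {"kind": "create_file", "target_path": "src/app.py"},
    {"kind": "delete_file", "target_path": "../etc/passwd"},
    {"kind": "create_file", "target_path": "src/util.py"},
]


def always_ok(op: dict) -> OpResult:
    return OpResult(op["target_path"], True)


stopped = apply_plan(plan, always_ok, ("src",))
all_results = apply_plan(plan, always_ok, ("src",), stop_on_first_failure=False)
```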

The `AUDIT_MARKER_ONLY` fail-closed guard

Before v1.35.0, a build with implementation_mode == AUDIT_MARKER_ONLY (used for builds that only emit a marker file rather than mutate code) could pass verification even if the codegen LLM call failed and returned no real plan. The job would silently complete with an empty result.

The fix in agent/build/service.py rejects this combination explicitly:

if (
    job.implementation_mode == BuildImplementationMode.AUDIT_MARKER_ONLY
    and (codegen_failed or no_real_plan)
):
    job.status = BuildJobStatus.FAILED
    job.fail_reason = "AUDIT_MARKER_ONLY: codegen failed or returned no real plan"
    return job

Tested in tests/test_build_domain.py::TestCodegenFallbackGuard.
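For readers who want to exercise the behaviour in isolation, the guard can be reproduced with minimal stand-ins. This sketch is not the code in `agent/build/service.py` or the test class named above: the `Job` dataclass, the string statuses, the enum values, and the interpretation of "no real plan" as an empty plan are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class BuildImplementationMode(Enum):
    LLM_GENERATED = "llm_generated"
    OPERATOR_PLAN = "operator_plan"
    AUDIT_MARKER_ONLY = "audit_marker_only"


@dataclass
class Job:  # minimal stand-in for BuildJob
    implementation_mode: BuildImplementationMode
    implementation_plan: list = field(default_factory=list)
    status: str = "RUNNING"
    fail_reason: str = ""


def guard(job: Job, codegen_failed: bool) -> Job:
    # Assumed meaning of "no real plan": the plan list is empty.
    no_real_plan = len(job.implementation_plan) == 0
    if job.implementation_mode is BuildImplementationMode.AUDIT_MARKER_ONLY and (
        codegen_failed or no_real_plan
    ):
        job.status = "FAILED"
        job.fail_reason = "AUDIT_MARKER_ONLY: codegen failed or returned no real plan"
    return job


blocked = guard(Job(BuildImplementationMode.AUDIT_MARKER_ONLY), codegen_failed=True)
passed = guard(
    Job(BuildImplementationMode.LLM_GENERATED, [{"kind": "create_file"}]),
    codegen_failed=False,
)
```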

Verification

Verification steps live in agent/build/verification.py. Each step has:

kind: pytest, ruff, mypy, docker_build, custom_command, ...
command: shell command (run inside Docker if requires_isolation=True)
working_dir: relative to the workspace root
timeout_seconds: hard cap
success_signal: regex or exit-code condition
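The fields above could be modelled roughly as below. This is a sketch, not the definition in `agent/build/verification.py`: the defaults, the `exit_code==N` string encoding of the exit-code condition, and the `step_passed` helper are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class VerificationStep:
    kind: str
    command: str
    working_dir: str = "."
    timeout_seconds: int = 300
    success_signal: str = "exit_code==0"  # hypothetical encoding
    requires_isolation: bool = True


def step_passed(step: VerificationStep, exit_code: int, stdout: str) -> bool:
    """Treat success_signal as 'exit_code==N', otherwise as a regex on stdout."""
    match = re.fullmatch(r"exit_code==(\d+)", step.success_signal)
    if match:
        return exit_code == int(match.group(1))
    return re.search(step.success_signal, stdout) is not None


pytest_step = VerificationStep(kind="pytest", command="pytest -q")
ruff_step = VerificationStep(
    kind="ruff", command="ruff check .", success_signal=r"All checks passed"
)

pytest_ok = step_passed(pytest_step, 0, "")
pytest_bad = step_passed(pytest_step, 1, "")
ruff_ok = step_passed(ruff_step, 0, "All checks passed!")
```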

By default the verification plan is discovered by auto-detecting project files (pyproject.toml, package.json, Cargo.toml, ...). The operator can override it by passing an explicit plan with --build-verification-file.
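Auto-detection of this kind usually amounts to looking for marker files in the workspace root. The sketch below shows the idea only; the `MARKERS` mapping and the commands in it are illustrative and are not the project's discovery table.

```python
import tempfile
from pathlib import Path

# Marker file -> default steps. The mapping is illustrative only.
MARKERS = {
    "pyproject.toml": ["pytest", "ruff", "mypy"],
    "package.json": ["npm test"],
    "Cargo.toml": ["cargo test"],
}


def discover_plan(workspace: Path) -> list[str]:
    """Collect default steps for every marker file found in the workspace root."""
    steps: list[str] = []
    for marker, commands in MARKERS.items():
        if (workspace / marker).is_file():
            steps.extend(commands)
    return steps


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_plan = discover_plan(root)
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n", encoding="utf-8")
    python_plan = discover_plan(root)
```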

Verification results are persisted as BuildArtifact(kind=VERIFICATION).

Docker isolation

Build commands marked requires_isolation=True run inside a Docker container managed by agent/build/docker_executor.py:

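# Flags: no network, 512 MB / 1 CPU per build, read-only root filesystem,
# writable tmpfs at /work for the workspace copy, image from the whitelist
# (same as sandbox). Comments cannot follow a trailing "\", so they sit here.
# docker run has no --timeout flag; the 5-minute hard cap is shown with
# coreutils timeout and may be enforced differently by docker_executor.py.
timeout 300 docker run --rm \
  --network=none \
  --memory=512m --cpus=1.0 \
  --read-only \
  --tmpfs /work:size=512m \
  --security-opt=no-new-privileges \
  --pids-limit=50 \
  python:3.12-slim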

The workspace is mounted read-only and the build container gets a writable tmpfs copy. After verification, the diff between the original workspace and the tmpfs copy is captured as a patch.diff artifact and recorded in the workspace audit trail.

This isolation is what makes builds safe to run without operator approval per-mutation. The blast radius is contained by the kernel.
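An executor typically assembles these flags as an argument list rather than a shell string, which avoids quoting problems. The following is a sketch under assumptions, not the code in `agent/build/docker_executor.py`: `docker_argv` and `ALLOWED_IMAGES` are hypothetical names, and the whitelist contents are a guess based on the single image shown above.

```python
import shlex

ALLOWED_IMAGES = {"python:3.12-slim"}  # whitelist contents are an assumption


def docker_argv(image: str, command: list[str]) -> list[str]:
    """Build the docker run argument list; refuse images outside the whitelist."""
    if image not in ALLOWED_IMAGES:
        raise ValueError(f"image not whitelisted: {image}")
    return [
        "docker", "run", "--rm",
        "--network=none",
        "--memory=512m", "--cpus=1.0",
        "--read-only",
        "--tmpfs", "/work:size=512m",
        "--security-opt=no-new-privileges",
        "--pids-limit=50",
        image, *command,
    ]


argv = docker_argv("python:3.12-slim", ["python", "-m", "pytest", "-q"])
printable = shlex.join(argv)  # for logging only; execution uses the list

try:
    docker_argv("ubuntu:latest", ["true"])
    rejected = False
except ValueError:
    rejected = True

# To execute: subprocess.run(argv, capture_output=True, text=True, timeout=300)
```

The `timeout` argument of `subprocess.run` stops the docker client process; a real executor would probably also need to stop the container itself when the cap is hit.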

Acceptance criteria

AcceptanceCriterion has three evaluator types:

Evaluator	What it does
`auto`	Static check on the workspace (file exists, function exists, line count below threshold, ...)
`verify`	Re-uses a verification step's result (e.g. "all pytest tests pass")
`review`	Requires operator approval — blocks `COMPLETED` until manually accepted

Example acceptance file:

{
  "criteria": [
    {
      "id": "tests-green",
      "evaluator": "verify",
      "verification_step_id": "pytest",
      "expected": "passed"
    },
    {
      "id": "no-todo-comments",
      "evaluator": "auto",
      "check": "grep_count",
      "pattern": "TODO|FIXME",
      "max_count": 0
    },
    {
      "id": "operator-sign-off",
      "evaluator": "review",
      "description": "operator confirms the new endpoint behaves correctly"
    }
  ]
}

A criterion that fails marks the whole job as BLOCKED (not FAILED) so the operator can either fix it manually or override.
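The three evaluators and the BLOCKED rule can be sketched against the example file above. This is illustrative only: `evaluate`, `grep_count`, and `job_status` are hypothetical helpers, an in-memory dict of file contents stands in for the workspace, and `PENDING_REVIEW` is a made-up label for "not yet COMPLETED because a review criterion is outstanding".

```python
import re


def grep_count(files: dict[str, str], pattern: str) -> int:
    """Count lines matching the pattern across in-memory file contents."""
    rx = re.compile(pattern)
    return sum(1 for text in files.values() for line in text.splitlines() if rx.search(line))


def evaluate(criterion: dict, files: dict[str, str], step_results: dict[str, str]) -> str:
    kind = criterion["evaluator"]
    if kind == "verify":
        ok = step_results.get(criterion["verification_step_id"]) == criterion["expected"]
        return "passed" if ok else "failed"
    if kind == "auto" and criterion.get("check") == "grep_count":
        ok = grep_count(files, criterion["pattern"]) <= criterion["max_count"]
        return "passed" if ok else "failed"
    if kind == "review":
        return "pending"  # waits for the operator
    return "failed"  # unknown evaluator or check: fail closed


def job_status(verdicts: list[str]) -> str:
    if "failed" in verdicts:
        return "BLOCKED"  # not FAILED, so the operator can fix or override
    if "pending" in verdicts:
        return "PENDING_REVIEW"  # hypothetical label, not a documented status
    return "COMPLETED"


criteria = [
    {"id": "tests-green", "evaluator": "verify",
     "verification_step_id": "pytest", "expected": "passed"},
    {"id": "no-todo-comments", "evaluator": "auto",
     "check": "grep_count", "pattern": "TODO|FIXME", "max_count": 0},
    {"id": "operator-sign-off", "evaluator": "review"},
]
files = {"api.py": "def handler():\n    return 1  # TODO tidy\n"}
verdicts = [evaluate(c, files, {"pytest": "passed"}) for c in criteria]
status = job_status(verdicts)
```

Each criterion is scored independently, and a single failed verdict is enough to block the job.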

Delivery packaging

After acceptance, the build moves into the delivery lifecycle:

PREPARED  ← bundle is built (workspace tarball + manifest + artifacts)
   │
   ▼
AWAITING_APPROVAL  ← operator must approve via /deliver, /api/operator/deliveries, or dashboard
   │
   ├─ APPROVED  → HANDED_OFF (operator picks up the bundle)
   │
   └─ REJECTED  → operator can re-run, modify, or close the job

Delivery state changes are recorded as control plane traces and surfaced in the operator inbox (agent --report).
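The lifecycle above is a small state machine, and one way to enforce it is a table of allowed transitions. This is a sketch, not the project's implementation: the `ALLOWED` table and `transition` helper are hypothetical, the `NONE` to `PREPARED` edge is inferred from the `delivery_state` enum, and treating `REJECTED` as terminal is an assumption (the source says the operator can re-run, modify, or close the job).

```python
ALLOWED: dict[str, set[str]] = {
    "NONE": {"PREPARED"},  # assumed entry edge
    "PREPARED": {"AWAITING_APPROVAL"},
    "AWAITING_APPROVAL": {"APPROVED", "REJECTED"},
    "APPROVED": {"HANDED_OFF"},
    "HANDED_OFF": set(),
    "REJECTED": set(),  # treated as terminal in this sketch
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal delivery transition: {current} -> {target}")
    return target


state = "NONE"
for nxt in ("PREPARED", "AWAITING_APPROVAL", "APPROVED", "HANDED_OFF"):
    state = transition(state, nxt)

try:
    transition("PREPARED", "HANDED_OFF")  # skipping approval is refused
    skipped_approval = True
except ValueError:
    skipped_approval = False
```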

Storage layout (`agent/build/storage.py`)

CREATE TABLE build_jobs (
    id TEXT PRIMARY KEY,
    data TEXT NOT NULL,           -- full BuildJob JSON
    status TEXT,
    build_type TEXT,
    created_at TEXT
);

CREATE TABLE build_artifacts (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL,
    artifact_kind TEXT,
    content TEXT,
    content_json TEXT,
    format TEXT DEFAULT 'text',
    created_at TEXT
);

WAL mode (PRAGMA journal_mode=WAL) for safe concurrent reads.
check_same_thread=False because the asyncio loop hops between threads when running blocking work in executors.
Dynamic DDL (_ensure_text_column) goes through the whitelist + identifier regex + escape pattern — same approach as agent/review/storage.py.
Artifact content is capped at 5 MB per row; oversized payloads are truncated with a warning event.
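The storage notes above can be tried out with the standard `sqlite3` module. The sketch below is not `agent/build/storage.py`: `ensure_text_column` and `cap_content` are simplified stand-ins, the `ALLOWED_COLUMNS` whitelist is hypothetical, and the table is reduced to two columns. A file-backed database is used because WAL mode does not apply to in-memory databases.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

ALLOWED_COLUMNS = {"fail_reason", "workspace_id"}  # hypothetical whitelist
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_ARTIFACT_BYTES = 5 * 1024 * 1024


def ensure_text_column(conn: sqlite3.Connection, table: str, column: str) -> None:
    """Add a TEXT column only if both names pass the whitelist and regex checks."""
    if column not in ALLOWED_COLUMNS or not IDENTIFIER.fullmatch(column):
        raise ValueError(f"column not allowed: {column!r}")
    if not IDENTIFIER.fullmatch(table):
        raise ValueError(f"bad table name: {table!r}")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" TEXT')


def cap_content(content: str) -> tuple[str, bool]:
    """Truncate to the byte cap; the flag says whether truncation happened."""
    raw = content.encode("utf-8")
    if len(raw) <= MAX_ARTIFACT_BYTES:
        return content, False
    return raw[:MAX_ARTIFACT_BYTES].decode("utf-8", errors="ignore"), True


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(Path(tmp) / "build.db", check_same_thread=False)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("CREATE TABLE build_jobs (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
    ensure_text_column(conn, "build_jobs", "fail_reason")
    # Values always go through placeholders, never string formatting.
    conn.execute(
        "INSERT INTO build_jobs (id, data, fail_reason) VALUES (?, ?, ?)",
        ("job-1", "{}", ""),
    )
    conn.commit()
    columns = [row[1] for row in conn.execute("PRAGMA table_info(build_jobs)")]
    try:
        ensure_text_column(conn, "build_jobs", "x; DROP TABLE build_jobs")
        injection_blocked = False
    except ValueError:
        injection_blocked = True
    conn.close()

small, small_cut = cap_content("abc")
big, big_cut = cap_content("é" * MAX_ARTIFACT_BYTES)
```

Identifiers cannot be bound as SQL parameters, which is why dynamic DDL needs the whitelist and regex while ordinary values use `?` placeholders.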

Operator surface

Channel	Command	What
Telegram	`/build <description>`	Quick build with auto-plan
Telegram	`/intake build <description>`	Unified intake (qualify → preview → submit)
Telegram	`/jobs`	List recent build jobs with status
Telegram	`/deliver <job_id>`	Show delivery state, approve/reject
HTTP	`POST /api/operator/intake`	Same as `/intake build`
HTTP	`GET /api/operator/jobs`	Same as `/jobs`
HTTP	`POST /api/operator/deliveries/<id>/approve`	Approve delivery
CLI	`python -m agent --build-repo . --build-description "..."`	Direct build invocation
CLI	`python -m agent --build-repo . ... --build-plan-file plan.json`	Use a pre-built plan
CLI	`python -m agent --build-repo . ... --build-acceptance-file acceptance.json`	Custom acceptance criteria
Dashboard	Build panel	Job listing, delivery preview, approve/reject buttons

Things the build pipeline does NOT do

It doesn't push to git. The build produces a workspace bundle and a delivery package. The operator picks it up and pushes (or runs git apply, or rsyncs to a deploy host). This is intentional — pushing is irreversible and human-only.
It doesn't run untrusted shell commands on the host. Every command marked requires_isolation=True runs inside Docker. Verification commands that operate on the workspace (e.g. grep) can run on the host because they're read-only.
It doesn't auto-merge with main. Build deliveries are bundles, not PRs. The operator decides how to integrate.
It doesn't bypass the budget. Every codegen LLM call goes through the budget check. A build that would exceed the daily hard cap is blocked at intake.
It doesn't remember failed plans. If codegen fails twice in a row, the operator has to either provide a manual plan or wait for the rate-limit/budget reset. There is no auto-retry loop on stochastic failures.
