| date | 2026-04-12 | ||||
|---|---|---|---|---|---|
| title | Verge MVP Implementation Plan | ||||
| tags |
|
This document turns the control-plane idea in Verge CI/CD Control Plane into a practical first implementation plan.
The goal of the MVP is not to build a perfect scheduler or a giant workflow engine. The goal is to prove the core loop:
event in -> plan -> run -> record evidence -> query current repository health
If the first version cannot persist evidence and answer useful health questions from it, then it is only a job runner.
The intended public runtime model is:
run -> step -> process -> observation
In plain terms:
- a
runis one commit-level, PR-level, or manual evaluation - a
stepis a major check inside a run, likebuild,test, orlint - a
processis the smallest meaningful thing Verge tracks as its own result - an
observationis the recorded result
A process should not be an execution chunk, group, or shard. It should be the real unit of evidence, such as one test, one lint target, one build target, or one document check.
The first MVP should let a single repository:
- ingest GitHub push and pull request events
- register a small set of steps and processes
- decide which steps and processes to run for a change
- execute those steps and processes through workers
- record evidence, logs, artifacts, and run lifecycle
- show current run progress and current repository health
- reuse at least one safe cached result
- resume at least one cooperative process from a checkpoint
- run Verge's own processes against the Verge repository itself
The MVP should be able to answer these queries reliably:
- what steps and processes ran for commit X?
- what is still running right now?
- what repo areas were observed by the latest runs?
- what evidence is fresh, stale, unknown, or failed?
- what work was reused instead of rerun?
- what evidence exists for this pull request?
If a proposed feature does not materially improve one of those answers, it should probably wait.
- single-repository support
- self-hosting on the Verge repository as the primary test target
- GitHub App webhook ingestion
- static step registry
- Postgres-backed planner and work queue
- worker execution with leases and heartbeats
- process-level evidence for all processes
- optional finer-grained evidence for cooperative process types when needed
- logs and artifacts in an object storage abstraction
- live run status in an API and dashboard
- one safe cache reuse path
- one cooperative checkpoint/resume path
The MVP should standardize on VoidZero-aligned frontend and JavaScript tooling wherever that choice is available.
At minimum, the implementation should use:
pnpmworkspaces for package management- Vite for frontend development and build
- Oxlint for linting
- Oxfmt for formatting
Where adjacent tooling choices are needed, prefer the same ecosystem and a minimal toolchain surface over mixing multiple overlapping tools.
This matters for two reasons:
- Verge should dogfood a modern, fast toolchain in its own repository
- the first managed steps should exercise those tools directly on the Verge repo
All application code should be written in valid TypeScript. Plain .js application files should not be introduced.
UI and visual design work should use the Impeccable skill set from https://impeccable.style/ as the default design guidance layer.
This should be used for:
- page and dashboard design
- typography and spacing decisions
- layout refinement
- visual polish and anti-pattern review
For Verge, this means frontend implementation should not stop at functional correctness. UI work should be reviewed and refined through the Impeccable workflow before it is treated as complete.
The MVP should keep object storage as an explicit product concept, but local development should not depend on a separately running S3-compatible service.
The initial approach should be:
- define a storage interface for logs, artifacts, and checkpoints
- use a filesystem-backed adapter for local development and early MVP work
- preserve object-store-style semantics in the metadata model
- add a real S3-compatible adapter later without changing the domain model
This keeps Phase 0 simple while still designing the system around durable artifact storage from the start.
The repo should use a small pnpm workspace with a strict separation between deployable apps, shared packages, infrastructure assets, and docs.
The package setup should follow the same broad shape as ../openclaw-1:
- a root workspace package for orchestration scripts and shared config
apps/*for deployable servicespackages/*for reusable libraries- no feature code living at the repository root
The initial layout should be:
/
apps/
api/
web/
worker/
packages/
contracts/
core/
db/
infra/
local/
k8s/
scripts/
docs/
package.json
pnpm-workspace.yaml
tsconfig.base.json
vitest.config.ts
Layout rules:
apps/contains only deployable servicespackages/contains shared TypeScript libraries with no deployment-specific concernsinfra/local/contains local development infrastructure such as Compose files and seed helpersinfra/k8s/contains deployment manifests or Helm/Kustomize material when Kubernetes support is addedscripts/contains small repository automation scripts only- the repository root should hold workspace-level config, not feature code
- the root
package.jsonshould act as a workspace orchestrator, not as the main application package pnpm-workspace.yamlshould include.,apps/*, andpackages/*
The first workspace packages should have these responsibilities:
apps/api
- Fastify control-plane API
- webhook ingestion
- run planning orchestration
- SSE endpoints
- GitHub integration
apps/web
- React + Vite dashboard
- repository health views
- run detail views
- live updates
apps/worker
- process execution
- lease claiming
- heartbeat and observation emission
- log and artifact reporting
packages/contracts
- shared Zod schemas
- API request and response contracts
- worker protocol payloads
- event payload schemas
packages/core
- domain types that are not transport-specific
- planning rules
- evidence and freshness logic
- step and process helpers
- fingerprinting and reuse decision helpers
packages/db
- Kysely database types
- migrations
- query helpers
- transaction boundaries for core persistence paths
The first storage adapter should be filesystem-backed for local development. A real S3-compatible adapter should be added after the core persistence and query paths are stable.
This is intentionally small. New packages should only be added when a dependency direction problem appears, not preemptively.
The following decisions should be treated as fixed for the first implementation pass.
- package manager:
pnpm - workspace shape: root package plus
apps/*andpackages/* - root package responsibility: shared scripts, dependency policy, and workspace-level config only
- root workspace file:
packages:
- .
- apps/*
- packages/*- root TypeScript config:
tsconfig.base.json - root test config:
vitest.config.ts - package naming convention:
@verge/api@verge/web@verge/worker@verge/contracts@verge/core@verge/db
- Phase 0 and Phase 1 worker execution target: local subprocess runner
- Kubernetes execution remains a later deployment target, not the first bootstrap path
- initial event ingestion order:
- manual runs
- GitHub webhook ingestion
- unit and integration test runner:
vitest - docs validation: lightweight frontmatter and link validation, not a heavyweight docs toolchain
- formatting:
oxfmt - linting:
oxlint
- Verge should provide a generic step and process materialization model in TypeScript for all projects that use the library
- each project should define its own concrete processes in TypeScript config
- each process should map to the smallest meaningful tracked unit for that step
- the initial materialization kinds should be:
singleProcessnamedProcessesdiscoveredProcesses
- process metadata may include a file path when that exists, but file paths should not be required for every process
- raw glob-heavy process selection should not be the primary long-term API
- local infrastructure tool:
docker compose - local services for Phase 0: Postgres only
- artifact and checkpoint storage in local development: filesystem-backed storage adapter
- query builder and migrations: Kysely in
packages/db - schema changes should be migration-driven from the first commit
The root package should expose these workspace-level scripts:
devbuildlintformatformat:checktypechecktestdocs:validate
The first implementation may add narrower scripts such as dev:api or test:watch, but the commands above should exist from the start.
- general multi-repo tenancy
- sophisticated information-gain scheduling
- universal checkpointing for arbitrary scripts
- full flaky-signal scoring
- deep historical identity repair
- replacing test runners or build tools
- Temporal or another heavyweight workflow engine
The first MVP should start with a tiny process catalog. Choose processes that exercise the model without requiring too much adapter complexity.
Suggested initial processes:
lintunit-testsbuilddocs-validate
Suggested cooperative process for checkpoint support:
e2e-smokeor another scenario-based process that can emit scenario-level progress and resume from explicit phase boundaries
For the first real proving ground, these processes should run on the Verge repository itself. The system should dogfood its own control plane instead of relying on a separate sample repository.
For the Verge repo specifically, the initial process set should map to real project commands built around the selected toolchain, including oxlint, oxfmt, and vite-backed build and dev validation where applicable.
The first implementation should use a boring control-plane architecture:
- React + Vite dashboard
- Fastify API
- Postgres as source of truth
- an object storage abstraction for logs, artifacts, and checkpoints
- worker service for process execution
- Kubernetes Jobs or Deployments for workers
- GitHub App integration for webhook ingestion and commit status updates
The orchestration model should stay simple:
- GitHub sends an event.
- The API stores a top-level run trigger record.
- The planner creates the steps and concrete processes for that run.
- The planner either marks a step as reused or enqueues it.
- A worker claims queued work using a lease.
- The worker executes the process and streams heartbeats and progress.
- The API stores evidence, events, logs, artifacts, and checkpoint metadata.
- The dashboard and query API read from Postgres.
The MVP should implement the following core records.
repositoriesstep_specsprocesses
event_ingestionsruns
step_runsprocess_runsrun_eventsartifactscheckpoints
observationsrepo_areasrepo_area_state
The public naming should stay run -> step -> process -> observation.
step_specs
- stable step key
- display name
- step kind
- execution config
- reuse policy
- checkpoint capability
- declared observed areas
processes
- step spec
- stable process key
- display name
- process metadata
runs
- source event type
- repository
- commit SHA
- pull request number, if any
- changed files snapshot
- overall status
- started/finished timestamps
step_runs
- run id
- step key
- step snapshot
- config fingerprint
- execution fingerprint
- current status
- plan reason
- started/finished timestamps
- reused-from step run id, if any
- checkpoint source step run id, if any
process_runs
- step run id
- process key
- process display name
- process kind
- status
- selection payload
- started/finished timestamps
- attempt count
observations
- step run id
- process run id, nullable for step-level evidence
- process key, nullable for step-level evidence
- execution scope fields
- status
- observed at
- summary payload
area_freshness_state
- repo area
- last observed at
- last successful observation at
- freshness bucket
- latest status
The MVP should not wait for a perfect identity system. Implement the minimum durable version:
- Accept explicit stable process IDs from cooperative adapters.
- Otherwise derive a canonical string from step kind, config key, path, logical path, title, and parameterization.
- Hash that canonical string for storage and joins.
For MVP, do not implement aggressive history-repair heuristics. Store enough metadata to add that later.
Each observation should record an execution scope separate from process identity. The initial scope should include:
- commit SHA
- step version or config hash
- runtime version
- platform or runner class
- dependency lock hash, if available
This is enough to make reuse decisions auditable.
The planner should be deterministic and rule-based.
Inputs:
- event type
- changed files
- step definitions
- observed areas per process
- existing evidence freshness
- reuse policy
Outputs:
- a top-level run plus its planned steps
- the processes inside each planned step, when the step materializes more than one process
- a decision reason for each planned step
- a reuse decision, if applicable
The initial planning rules should be simple:
- always run required baseline processes on pull requests
- run area-specific processes when changed files match observed areas
- reuse a recent compatible result when declared inputs and execution scope still match
- mark untouched areas as still stale or unknown rather than pretending they were validated
Do not attempt probabilistic scheduling in the MVP.
Verge should use a simple runtime model:
run -> step -> process -> observation
A step is a major check inside the run. A process is one standalone computation inside that step with a stable ID.
The library should provide the generic materialization mechanism. Each project should provide the actual step and process materialization rules in TypeScript.
That means:
- Verge defines materialization kinds and process lifecycle rules
- a repository defines its own process names and boundaries in TypeScript
- the planner materializes concrete steps and processes from those definitions for each run
For MVP, the supported materialization kinds should be:
singleProcessnamedProcessesdiscoveredProcesses
The preferred default should be whatever maps cleanly to the real processes a step needs to track. For some steps that may be namedProcesses. For others, especially tests, that should be discoveredProcesses.
For tests, that means a project should define processes as the real tests Verge wants to track, not repo-level groupings. For other steps, the same rule applies: define the smallest meaningful tracked process for that step, such as one lint target, one build target, or one document check.
This is important for checkpointing. Checkpoints should record which processes finished, failed, or remain pending. In practice, that means Verge checkpoints completed processes, not raw process memory.
The worker contract should be explicit and narrow.
Workers must be able to:
- claim queued work using a lease
- start the process with the resolved execution config
- send a heartbeat at a fixed interval
- emit lifecycle events such as
started,passed,failed,timed_out,interrupted - upload logs and artifact metadata
- emit zero or more process observations
- publish checkpoint metadata if the process supports it
Workers should not contain planning logic. They execute resolved work and report what happened.
Implement one narrow, auditable reuse path.
Suggested reuse policy:
- the step explicitly allows reuse
- execution scope hash matches
- declared input fingerprint matches
- prior run status is successful
- prior result age is within a configured freshness window
When reuse happens, Verge should still create a run record for the current request. It should be marked as reused and linked to the source run so the decision is visible in the UI and API.
Implement checkpointing only for a cooperative process type.
The initial checkpoint contract should include:
- checkpoint key
- run id
- completed process keys
- pending process keys
- serialized payload location in object storage
- creation timestamp
- resumable-until timestamp
The planner can prefer resume over fresh execution only when:
- the step supports checkpoints
- the checkpoint is still valid
- the input fingerprint still matches the resumable boundary rules
Do not try to snapshot arbitrary shell state.
The Fastify API should expose a small control-plane surface.
POST /webhooks/githubPOST /run-requests/manual
GET /run-requests/:idGET /runs/:idGET /runs/:id/processesGET /runs/:id/eventsPOST /workers/claimPOST /workers/:runId/heartbeatPOST /workers/:runId/eventsPOST /workers/:runId/observationsPOST /workers/:runId/artifactsPOST /workers/:runId/checkpoints
GET /repositories/:repo/healthGET /repositories/:repo/areasGET /repositories/:repo/commits/:shaGET /repositories/:repo/pull-requests/:numberGET /process-specs
These endpoint names can stay implementation-oriented for MVP even though the public model should be described as runs, steps, and processes.
GET /streams/runs/:idGET /streams/repositories/:repo/health
The dashboard should prove the model, not try to be a full observability product.
It needs four screens:
- repository commit list
- commit or pull request detail
- runs history
- run detail
- step registry
The repository commit list should show:
- commits newest first
- commit message
- short SHA
- current commit status
- commit health coverage
- attempt count
The commit or pull request detail should show:
- the converged commit state
- step summary for the commit
- reused versus executed work
- current status
- linked observations and artifacts
- attempt history for that commit
The runs history should show:
- attempt rows
- execution debugging information
- filters for recent run behavior
The run detail should show:
- top-level run summary
- step list
- links into step detail
The step detail should show:
- process status
- logs and artifacts
- process observations, if available
- checkpoint creation and resume information
- search, filtering, or pagination when there are many processes
The MVP should validate itself by running Verge on the Verge repo.
That means the first supported repository should be this repository, with steps that execute Verge's own:
- lint checks
- type checks
- tests
- build
- docs validation
Those steps should be implemented using the chosen VoidZero-oriented toolchain for this repository, with oxlint and oxfmt as the default lint/format layer and vite as the frontend build foundation.
This requirement matters because it forces the product to handle real iteration loops instead of a toy demo path.
The self-hosting bar for MVP should be:
- a commit to the Verge repo triggers Verge
- Verge plans work for the Verge repo
- Verge executes at least one real Verge step through its own worker path
- Verge records the resulting evidence and exposes it in its own UI
The Verge repo should remain the self-hosting sanity check, but it should not be the only proving ground.
Verge should also validate itself against a separate managed fixture repository such as dutifulbob/verge-testbed.
The roles should be:
- the Verge repo proves self-hosting on the actual product codebase
verge-testbedproves deterministic failure, flake, slow, and checkpoint-resume behavior under controlled conditions
New behavior should be expected to hold up against both repositories, not only one of them.
Create the initial monorepo or workspace layout:
apps/apiapps/webapps/workerpackages/contractspackages/corepackages/dbinfra/localinfra/k8sscripts
Bootstrap:
pnpm-workspace.yaml- root
package.jsonfor workspace scripts only - shared
tsconfig.base.json - TypeScript project config
- linting and formatting wired through
oxlintandoxfmt - Fastify app skeleton
- React + Vite app skeleton
- worker app skeleton
- shared contracts package
- shared core package
- Postgres migration setup
- local dev stack with Postgres
- filesystem-backed local artifact and checkpoint storage
- local self-hosting steps for the Verge repo
Exit criteria:
- all apps boot locally
- migrations run
- local artifact and checkpoint storage is reachable through the storage interface
- the repo has working
oxlint,oxfmt, andvite-based commands - the workspace dependency graph is clean, with shared logic living in
packages/instead of cross-importing between apps
Implement:
- GitHub webhook receiver
- signature validation
- repository registration
- static step storage
- basic top-level run creation
- initial Verge-on-Verge steps
Exit criteria:
- a GitHub push or pull request event creates a stored top-level run record
- step definitions can be listed from the API
- the Verge repository is registered as the first managed repository
- step definitions exist for
oxlint,oxfmtvalidation, andvite-based build validation where relevant
Implement:
- repo areas
- changed-file ingestion
- deterministic planner
- planned step records
- queueing and lease model
- planning decision records
Exit criteria:
- a top-level run generates planned steps
- planned steps are marked as
run,reuse, orskip - workers can claim queued work safely
Implement:
- worker claim endpoint
- heartbeats
- lifecycle events
- status transitions
- stdout and stderr capture
- artifact upload metadata
Exit criteria:
- a worker can execute a real process end to end
- run state is visible through the API
- lease expiry and stale heartbeat detection work
- at least one real Verge repo step runs through the worker path
Implement:
- process-level observations
- optional finer-grained observations for at least one cooperative process
- area freshness rollups
- repository health queries
- commit and pull request detail queries
Exit criteria:
- the system can answer what was observed for a commit
- the system can show stale, fresh, failed, and unknown areas
Implement:
- repository commit list page
- commit or pull request detail page
- runs history page
- run detail page
- SSE-driven live updates
Exit criteria:
- users can inspect commit health from a commit-first repository page
- users can watch a run progress live
- users can inspect current health and attempt history without looking at raw database rows
- users can inspect a real Verge-on-Verge run from the dashboard
Implement:
- input fingerprinting for one process family
- compatible-run lookup
- reused run records
- UI and API visibility for reuse decisions
Exit criteria:
- the planner can safely reuse one prior result
- the dashboard clearly shows reused versus freshly executed work
Implement:
- cooperative checkpoint contract
- checkpoint storage metadata
- resume-aware planning for one process type
- resume visibility in run detail
Exit criteria:
- one cooperative process can resume from a saved checkpoint
- resume decisions are auditable
The recommended order is:
- bootstrap
- event ingestion
- planner and queue
- worker execution
- evidence model
- health queries
- dashboard
- reuse
- checkpointing
This order matters because the product value comes from stored state and queryable evidence, not from advanced scheduling tricks.
The MVP should be validated in layers, from local developer checks to full self-hosting.
The repository itself should have working root commands for:
pnpm lintpnpm format:checkpnpm typecheckpnpm testpnpm build
These are the baseline health checks for the Verge repo and should stay green before higher-level validation is trusted.
Unit tests should cover the core logic directly, without requiring the full control plane to run.
At minimum, cover:
- process materialization rules
- planning rules
- reuse decisions
- checkpoint resume decisions
- repo area rollups
Integration tests should run against a real local Postgres instance and verify the main control-plane write paths.
At minimum, cover:
- creating a manual run
- creating step runs from step definitions
- claiming work
- recording observations
- updating repo area state
Worker integration tests should execute real local commands through the worker path.
For Verge, the first commands should include:
oxlintoxfmt --checkvitestvite build, where relevant
The most important end-to-end test is Verge running on the Verge repository itself.
That means:
- create a run for the current Verge repo state
- materialize real Verge steps such as
lint,test,build, anddocs:validate - execute them through the normal worker path
- persist observations, artifacts, and health state
- expose the results through the API and dashboard
Verge should also run against a separate managed fixture repository such as dutifulbob/verge-testbed.
That repository should be used to verify controlled scenarios that are awkward to prove on the Verge repo itself, including:
- deterministic failure
- flaky behavior
- slow processes
- checkpoint and resume flows where some processes pass before another fails
After the basic self-hosting path works, verify:
- safe reuse on repeated requests with matching inputs
- resume from a saved checkpoint for one cooperative process type
- only unfinished processes continue after resume
The acceptance bar is not just that commands ran. The acceptance bar is that Verge decided what to run, executed it, stored what it learned, and exposed that state back through its own interfaces.
The MVP is done when a user can:
- connect one GitHub repository
- use the Verge repository itself as one managed repository
- use at least one separate fixture repository as another managed repository
- receive a pull request event
- see Verge plan a small set of Verge repo processes
- watch at least one real Verge repo process execute live
- inspect logs, artifacts, and observations
- see repository areas marked fresh, stale, failed, or unknown
- see one run reused safely
- see one cooperative process resumed from a checkpoint
- verify tricky behavior such as deterministic failure and resume on the separate fixture repository
The MVP should also include a minimal but real operational baseline:
- structured logs
- OpenTelemetry traces for API and worker flows
- idempotent webhook ingestion
- retry-safe worker event writes
- lease expiry recovery
- object storage key conventions
- migration-based schema management
Without these, debugging the control plane will be harder than building it.
Mitigation:
- force every run to produce stored observations
- make repository health queries a first-class deliverable
Mitigation:
- start with explicit IDs and deterministic derived IDs
- defer history-repair heuristics
Mitigation:
- support one cooperative process only
- require explicit boundaries and payloads
Mitigation:
- keep the planner rule-based in v1
- store decision reasons so the behavior is explainable
After the MVP works, the next useful expansions are:
- richer area mapping and deeper process extraction
- flaky-signal tracking
- broader reuse support across more process types
- multi-repo support
- policy-based planning rules
- more agent-oriented query surfaces
The first version of Verge should be a narrow but complete control plane for one repository.
It should ingest change events, plan a small set of processes, execute them through workers, store evidence in a durable model, and expose repository health through a UI and API.
That is enough to validate the core thesis before investing in more advanced scheduling, broader adapters, or heavyweight orchestration.