Pi-native thin workflow harness for explicit, evidence-backed workflows.
Shout out to Ouroboros, OMX, pi-mono, and Chedex. Kapi borrows the spirit of durable workflow discipline from those projects while staying Pi-native and thin.
Kapi keeps ordinary Pi turns lightweight. It adds state, artifacts, worker awareness, and verification only after an explicit /kapi-* command or matching agent tool. The design is Chedex-inspired, but Pi-scoped: Kapi uses Pi extension surfaces instead of importing Chedex's Codex install/runtime machinery.
Use Kapi when a task needs one or more of these:
- resumable requirements, planning, execution, or validation state;
- durable artifacts such as context.md, interview.md, run-contract-draft.md, IMPLEMENTATION_PLAN.md, handoff.json, contract.md, benchmark.sh, ledger.jsonl, merge-plan.md, integration-report.md, and verify.md;
- evidence-gated completion instead of narrative claims;
- explicit worker planning for tmux terminals or isolated git worktrees;
- lightweight safety rails around Ralph validation evidence and Integrate merge boundaries.
The operating model is simple:
- ordinary work stays transparent;
- one non-terminal workflow owns a workspace at a time;
- human slash workflow commands intentionally switch workflows and hand off previous context;
- service/tool workflow starts require explicit replace or /kapi-clear;
- selected or terminal workflow inspection does not steal active ownership;
- artifacts are durable checkpoints, not conversational scratchpads.
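The ownership rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Kapi's implementation: the StartRequest shape, the decideStart name, and the decision labels are all hypothetical.

```typescript
// Hypothetical sketch of the single-owner rule; names are illustrative, not Kapi's API.
type StartSource = "human-slash" | "tool";

interface StartRequest {
  source: StartSource;
  // Only meaningful for tool/service starts: the caller explicitly asked to replace the owner.
  replace: boolean;
}

type StartDecision = "start" | "switch-with-handoff" | "replace" | "reject";

// activeSlug is null when no non-terminal workflow owns the workspace,
// for example after /kapi-clear detached the previous owner.
function decideStart(activeSlug: string | null, request: StartRequest): StartDecision {
  if (activeSlug === null) return "start";
  // Human slash commands intentionally switch workflows and hand off previous context.
  if (request.source === "human-slash") return "switch-with-handoff";
  // Tool/service starts must opt in to replacing the current owner.
  return request.replace ? "replace" : "reject";
}
```

Keeping the decision in one pure function makes the asymmetry between human and tool starts easy to test without any Pi runtime.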
- /kapi-deep-interview: high-rigor requirements work with context.md, interview.md, run-contract-draft.md, decision-report.md, and verify.md; it owns context extraction and a Markdown RunContractDraft, not implementation specs or durable contract.json authority.
- /kapi-autoresearch: governed optimization setup and experiment ledger work with contract.md, benchmark.sh, ledger.jsonl, ideas.md, checks.sh, decision-report.md, and verify.md.
- /kapi-ralph: skill-driven planning/build execution that interviews when context is insufficient, keeps state.json.ralphState as the loop state, performs one-task iterations, and records hard verification evidence.
- /kapi-integrate: Kapi-owned branch/worktree integration toward dev with dependency, conflict, merge-plan, and verification reporting.
- /kapi-status [workflow|slug]: inspect active or selected workflow status without taking active ownership; /kapi-status all lists recorded workflows in the current project.
- /kapi-status validate [workflow|slug]: check lifecycle, evidence, worker, and artifact consistency.
- /kapi-status artifacts [list [workflow|slug]] | read <name> [workflow|slug] | write <name> [--replace] [--required] -- <content>: inspect or update workflow artifacts.
- /kapi-status evidence [list [workflow|slug]] | add ...: inspect or record validation evidence.
- /kapi-status complete ...: complete the active workflow with evidence.
- /kapi-status fail --reason <reason> ...: fail the active workflow with evidence.
- /kapi-status prepare-worker --tmux: tmux-focused worker preparation shortcut.
- /kapi-status prepare-worker --worktree: git-worktree-focused worker preparation shortcut.
- /kapi-clear [--target workflow|slug] [--reason reason]: detach the active workflow attachment; with --target, detach only when the selected workflow is the active one.
Run /kapi-status help for compact command syntax. Use focused support commands for artifact, evidence, validation, completion, failure, and worker operations.
The package exposes ilchul as the canonical public runtime CLI for explicit runtime control outside Pi turns. From a repo checkout, use the package-local command without global setup: npm exec -- ilchul --help. Internally, the runtime entrypoint uses runctl naming so reusable implementation modules stay product-neutral.
For a normal shell command during local development, run the setup helper once so npm installs the portable bin shim on PATH:
npm run setup:cli
ilchul --help
For an isolated install prefix, pass --prefix; POSIX npm shims live under <prefix>/bin, while Windows shims live at <prefix>.
The bin shim runs the runtime through the repo-local TypeScript entrypoint, so the old direct workaround (node --import tsx ./src/cli/runctl-cli.ts ...) is not required for normal use.
ilchul start ralph "<goal>" [--from <repo>] [--slug <slug>] [--prompt-append-file <file>] [--dry-run] [--json]
ilchul start autoresearch "<goal>" [--from <repo>] [--slug <slug>] [--prompt-append-file <file>] [--dry-run] [--json]
ilchul status [slug] [--from <repo>] [--json]
ilchul list [--from <repo>] [--json]
ilchul attach <slug> [--from <repo>] [--json]
ilchul probe <slug> [--from <repo>] [--json]
ilchul report <slug> [--from <repo>] [--json] [--lines <n>]
ilchul events [slug] [--from <repo>] [--since <cursor>] [--stale-minutes <n>] [--json]
ilchul watch [slug] [--from <repo>] --once [--since <cursor>] [--stale-minutes <n>] [--json]
ilchul retain <slug> [--from <repo>] [--reason <reason>] [--json]
ilchul release <slug> [--from <repo>] [--json]
ilchul cleanup <slug> [--from <repo>] --safe [--json]
ilchul doctor [--from <repo>] [--json]
start creates a same-slug worktree, branch, tmux session, and registry entry under .ilchul/registry/, launches Pi, waits for the tmux readiness marker, and dispatches the mode-specific planning prompt. The registry in the base repository is a control-plane pointer; execution truth and workflow artifacts live in the created worktree.
Kapi rejects recorded slug, worktree, branch, and tmux collisions while the recorded owner is still non-terminal and has not been probed as stale-registry. Entries with lifecycle completed, failed, cancelled, or inactive, and entries whose Pi launch status is stale-registry, remain inspectable until a same-slug retry replaces that registry record, so they do not permanently block retries.
list --from <repo> --json shows all recorded workers for that supervised repo, while each slug remains independently inspectable through status, report, probe, and attach.
probe refreshes Pi readiness from tmux capture output. It preserves the exact KAPI_READY <slug> <nonce> success marker while distinguishing missing tmux sessions (stale-registry) from live panes without the marker (alive-but-unverified, running-with-output, or completed-output-present). attach prints the tmux attach command, and doctor checks registry consistency, such as recorded worktree paths and prompt dispatch state, without deleting retained worktrees.
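The collision rule for same-slug retries can be read as a single predicate over a registry entry. The sketch below assumes a simplified entry shape; the RegistryEntry fields and the blocksRetry name are hypothetical and do not reflect the real registry schema.

```typescript
// Hypothetical registry entry; field names are assumptions for illustration only.
type Lifecycle =
  | "inactive" | "active" | "blocked" | "verifying"
  | "completed" | "failed" | "cancelled";

interface RegistryEntry {
  slug: string;
  lifecycle: Lifecycle;
  // Launch status as last refreshed by probe; only "stale-registry" matters here.
  piLaunchStatus: string;
}

// Lifecycles the text lists as never blocking a same-slug retry.
const NON_BLOCKING: ReadonlySet<Lifecycle> = new Set<Lifecycle>([
  "completed", "failed", "cancelled", "inactive",
]);

// True when an existing entry should cause a start with the same slug to be rejected.
function blocksRetry(entry: RegistryEntry): boolean {
  if (entry.piLaunchStatus === "stale-registry") return false;
  return !NON_BLOCKING.has(entry.lifecycle);
}
```

An entry that does not block a retry is still inspectable; this predicate only answers whether a new start may replace it.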
Terminal worker tmux sessions are cleanup candidates by default. Use retain to mark a terminal live session as intentionally held for manual inspection, release to remove that hold, and cleanup --safe to close only terminal, unretained Kapi-owned tmux sessions. Safe cleanup never kills active registry entries and does not delete retained worktrees or branches.
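As a rough illustration of the safe-cleanup boundary, the filter below selects only sessions that are terminal, unretained, and Kapi-owned. The WorkerSession shape and the function name are hypothetical; the real command works from the registry and tmux state.

```typescript
// Hypothetical view of a recorded worker session; not the real registry schema.
interface WorkerSession {
  slug: string;
  terminal: boolean;  // the workflow reached a terminal lifecycle
  retained: boolean;  // set by retain, cleared by release
  kapiOwned: boolean; // the tmux session was created by Kapi
}

// Returns the slugs whose tmux sessions a safe cleanup would be allowed to close.
// Worktrees and branches are intentionally not part of the result.
function safeCleanupCandidates(sessions: WorkerSession[]): string[] {
  return sessions
    .filter((s) => s.terminal && !s.retained && s.kapiOwned)
    .map((s) => s.slug);
}
```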
Kapi's CLI runtime is the execution surface for coding work supervised by hermes, openclaw; it is not the final authority for accepting changes:
User request
→ hermes, openclaw classifies the work and starts bounded Kapi workers when useful
→ Kapi creates isolated worktrees, tmux sessions, registry entries, and workflow artifacts
→ workers produce candidate diffs, artifacts, and evidence
→ hermes, openclaw inspects status, tmux output, artifacts, git diff, and verification
→ hermes, openclaw accepts, rejects, integrates, or asks for another bounded worker slice
→ the human/project owner remains the final PR/merge/deploy authority
Supervisor contract:
- --from <repo> selects the repository being supervised; Kapi behavior must not depend on this repository being Kapi itself.
- Registry state is a control-plane pointer; source diffs and workflow artifacts in the worker worktree are the candidate output to inspect.
- status, list, probe, doctor, attach, and future reporting commands are read-only supervisor inspection surfaces unless a command explicitly says otherwise.
- Kapi worker output is incomplete until hermes, openclaw reviews the diff, checks evidence, runs verification, and decides how to integrate it.
- Kapi does not create final PR decisions, merge, deploy, or grant review bots implementation authority.
Issue #37 is intentionally split into reviewable child slices: executable CLI setup (#38), repo-generic planning (#39), probe/readiness reliability (#40), doctor diagnostics (#41), worker reporting (#42), and multi-worker orchestration (#43).
kapi-agent owns the deterministic Ilchul structured review adapter and GitHub formal review/check publication from the github-bot deployment. Ilchul keeps the branch protection contract and revision-comment rules documented here, but it no longer exposes a package ilchul-review or kapi-review bin for review automation. GitHub merge enforcement for formal kapi-agent approval lives in .github/workflows/kapi-agent-formal-approval-gate.yml; require formal kapi-agent approval plus kapi-agent/review in branch protection/rulesets. Re-review requests after stale or non-approving kapi-agent reviews must put @kapi-agent review, the current head SHA, What changed, Why this closes the prior feedback, and Verification in the same author comment; see docs/kapi-agent-approval-gate.md.
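A re-review comment can be pre-checked locally before posting. The helper below is only a sketch: it assumes plain substring matching, which may be looser than what the actual gate enforces, and the function name is hypothetical.

```typescript
// Hypothetical pre-flight check for a re-review request comment.
// Assumption: each required part only needs to appear somewhere in the comment text.
function missingReReviewParts(comment: string, headSha: string): string[] {
  const required = [
    "@kapi-agent review",
    headSha,
    "What changed",
    "Why this closes the prior feedback",
    "Verification",
  ];
  // Return the parts that are absent so the author can fix the comment in one pass.
  return required.filter((part) => !comment.includes(part));
}
```

An empty result means all five required parts are present in the same comment; docs/kapi-agent-approval-gate.md remains the authority on the exact format.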
Kapi exposes the same conceptual operations as tools for AI agents:
- kapi_start_workflow
- kapi_get_status
- kapi_resume_workflow
- kapi_get_workflow_contract
- kapi_list_workflows
- kapi_list_artifacts
- kapi_list_evidence
- kapi_list_workers
- kapi_validate_workflow
- kapi_update_workflow
- kapi_record_evidence
- kapi_complete_workflow
- kapi_fail_workflow
- kapi_write_artifact
- kapi_write_artifacts: batch checkpoint writes for multiple active-workflow artifacts.
- kapi_read_artifact
- kapi_get_worker_capabilities
- kapi_plan_worker_strategy
- kapi_prepare_worker
- kapi_prepare_tmux_worker
- kapi_prepare_worktree_worker
- kapi_dispatch_worker_task
- kapi_refresh_worker_status
- kapi_clear_workflow
Artifact names reject / and \ path separators, tracked artifact paths must stay under the workflow artifact root, and reads distinguish missing files from existing empty artifacts.
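The three artifact rules above (no path separators, containment under the artifact root, and missing versus empty reads) could look roughly like this. The function names and the ReadResult shape are hypothetical; only the Node.js path and fs calls are real APIs.

```typescript
import * as path from "node:path";
import { readFileSync } from "node:fs";

// Hypothetical helper: resolve an artifact name under a root, rejecting unsafe names.
function resolveArtifactPath(artifactRoot: string, name: string): string {
  if (name.length === 0 || name.includes("/") || name.includes("\\")) {
    throw new Error(`invalid artifact name: ${name}`);
  }
  const root = path.resolve(artifactRoot);
  const resolved = path.resolve(root, name);
  const rel = path.relative(root, resolved);
  // Reject names that resolve to the root itself or climb out of it (for example "." or "..").
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`artifact path escapes root: ${name}`);
  }
  return resolved;
}

// A read result that keeps "missing" distinct from "present but empty".
type ReadResult = { kind: "missing" } | { kind: "found"; content: string };

function readArtifact(artifactRoot: string, name: string): ReadResult {
  const target = resolveArtifactPath(artifactRoot, name);
  try {
    return { kind: "found", content: readFileSync(target, "utf8") };
  } catch (err) {
    // ENOENT means the file does not exist; any other error is rethrown unchanged.
    if ((err as { code?: string }).code === "ENOENT") return { kind: "missing" };
    throw err;
  }
}
```

With this shape an empty artifact reads as kind "found" with an empty content string, so callers never confuse it with a file that was never written.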
Kapi does not own subagent orchestration. Use pi-subagents for agent delegation, chains, parallel subagents, and async/forked-context work. Kapi may record those results as artifacts or evidence, but it does not create or manage subagents.
The graph-execution preset treats TaskGraph as the concrete execute-phase runtime primitive for single-agent, sequential, DAG-parallel, and team-parallel work. Policy selection can provide advisory PolicyGraphSketch inputs, but the execute phase owns concrete task ids, titles/descriptions, dependencies, topological ordering, readiness reasons, blocked/downstream status projection, claim/lease ownership, worker dispatch state, heartbeat/staleness transitions, structured worker reports, stale-claim recovery, evidence expectations, and task-graph/readiness/claim event records.
Phase presets serialize as a versioned schemaVersion: 1 catalog. Legacy arrays migrate explicitly; unsupported versions, malformed catalogs, and gate evaluation with missing top-level required evidence refs or artifact ids fail closed.
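Fail-closed catalog parsing might be sketched as follows. The PhasePreset fields are placeholders (the real preset shape is richer), and the function names are hypothetical; the point is that legacy arrays are migrated explicitly and everything else must match schemaVersion 1 or throw.

```typescript
// Hypothetical minimal preset; the real preset carries more fields.
interface PhasePreset {
  id: string;
}

interface PresetCatalog {
  schemaVersion: 1;
  presets: PhasePreset[];
}

function toPreset(value: unknown): PhasePreset {
  if (typeof value !== "object" || value === null) {
    throw new Error("malformed preset");
  }
  const id = (value as Record<string, unknown>)["id"];
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("malformed preset: missing id");
  }
  return { id };
}

function parsePresetCatalog(raw: unknown): PresetCatalog {
  // Legacy form: a bare array, migrated explicitly into the versioned envelope.
  if (Array.isArray(raw)) {
    return { schemaVersion: 1, presets: raw.map((item) => toPreset(item)) };
  }
  if (typeof raw !== "object" || raw === null) {
    throw new Error("malformed catalog");
  }
  const obj = raw as Record<string, unknown>;
  // Fail closed: any version other than 1 is rejected rather than best-effort parsed.
  if (obj["schemaVersion"] !== 1) {
    throw new Error(`unsupported schemaVersion: ${String(obj["schemaVersion"])}`);
  }
  const presets = obj["presets"];
  if (!Array.isArray(presets)) {
    throw new Error("malformed catalog: presets must be an array");
  }
  return { schemaVersion: 1, presets: presets.map((item) => toPreset(item)) };
}
```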
Agent execution stays adapter-neutral. The AgentAdapterContract describes required launch/send/capture/health/readiness/report/substrate behavior for Pi, Codex, and Claude Code compatible workers without coupling the domain layer to any one CLI. Readiness requires a nonce-equivalent proof, worker reports require taskId, status, evidenceRefs, and summary, and health can be supported or best-effort but not absent.
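The required worker report fields lend themselves to a runtime type guard. This sketch assumes status is a free-form string and evidenceRefs is an array of strings; the contract may constrain both further.

```typescript
// Assumed report shape built from the four required fields named in the text.
interface WorkerReport {
  taskId: string;
  status: string;
  evidenceRefs: string[];
  summary: string;
}

// Narrow an unknown value (for example parsed JSON from a worker) to WorkerReport.
function isWorkerReport(value: unknown): value is WorkerReport {
  if (typeof value !== "object" || value === null) return false;
  const report = value as Record<string, unknown>;
  const refs = report["evidenceRefs"];
  return (
    typeof report["taskId"] === "string" &&
    typeof report["status"] === "string" &&
    typeof report["summary"] === "string" &&
    Array.isArray(refs) &&
    refs.every((ref) => typeof ref === "string")
  );
}
```

A guard like this keeps adapter-specific parsing at the edge: whichever CLI produced the report, the domain layer only sees values that already satisfy the shared shape.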
The runtime state schema is separately versioned as RuntimeState.schemaVersion: 1. It defines additive boundaries for RunObjective, PolicySelection, TaskGraph refs, WorkerState, EvidenceRef, EvaluationResult, RewardRecord, and IntegrationCandidate data; unknown newer versions fail closed, and RunContract-facing artifact refs expose only objective, policy-selection, and evaluation artifacts.
The domain boundary is explicit: Decomposer creates the concrete graph, Scheduler computes ready tasks, WorkerRuntime dispatches and heartbeats work, WorkerExecutionState records dispatch/report progress for claimed tasks, Verifier validates evidence refs, and GateEngine decides transitions.
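To make the Scheduler role concrete, here is a minimal sketch of ready-task computation and topological ordering over a task graph. The Task shape and status values are assumptions, and the ordering uses Kahn's algorithm as a stand-in; the source does not say which algorithm the real Scheduler uses.

```typescript
// Hypothetical task shape; the real TaskGraph also tracks claims, leases, and evidence.
type TaskStatus = "pending" | "claimed" | "done";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// A task is ready when it is still pending and every dependency is done.
function readyTasks(tasks: Task[]): string[] {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  return tasks
    .filter((t) => t.status === "pending" && t.dependsOn.every((dep) => done.has(dep)))
    .map((t) => t.id);
}

// Kahn's algorithm: repeatedly emit tasks whose remaining dependency count is zero.
function topologicalOrder(tasks: Task[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.id, 0);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`unknown dependency: ${dep}`);
      list.push(t.id);
      indegree.set(t.id, (indegree.get(t.id) ?? 0) + 1);
    }
  }
  const queue = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any task left unvisited is part of a dependency cycle.
  if (order.length !== tasks.length) throw new Error("dependency cycle detected");
  return order;
}
```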
Runtime events are append-only schemaVersion: 1 envelopes with monotonic seq, stable eventId, idempotencyKey, type, category, timestamp, run id, and a typed payload. The event taxonomy covers run lifecycle, objective/policy decisions, graph readiness, claim/lease transitions, worker readiness/heartbeat/retention/safe-close, evidence, evaluation, repair, integration, and reward records. Replay applies events by ascending sequence into derived state; duplicate event ids must be byte-equivalent, stale leases and missing workers recover only through explicit events, and run.sealed is terminal.
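The replay rules can be illustrated with a small reducer. This is a sketch under several assumptions: the event envelope is reduced to a few fields, JSON.stringify equality stands in for byte equivalence, and "terminal" is interpreted as rejecting any new event after run.sealed. The DerivedState shape is hypothetical.

```typescript
// Reduced event envelope; the real envelope also carries idempotencyKey, category, and more.
interface RuntimeEvent {
  schemaVersion: 1;
  seq: number;
  eventId: string;
  type: string;
  payload: unknown;
}

// Hypothetical derived state: just enough to show ordering and sealing.
interface DerivedState {
  sealed: boolean;
  appliedTypes: string[];
}

function replay(events: RuntimeEvent[]): DerivedState {
  const seen = new Map<string, string>();
  const state: DerivedState = { sealed: false, appliedTypes: [] };
  // Apply by ascending sequence without mutating the caller's array.
  const ordered = [...events].sort((a, b) => a.seq - b.seq);
  for (const event of ordered) {
    const encoded = JSON.stringify(event);
    const previous = seen.get(event.eventId);
    if (previous !== undefined) {
      // A repeated event id is tolerated only when the record is identical.
      if (previous !== encoded) throw new Error(`conflicting duplicate event: ${event.eventId}`);
      continue;
    }
    if (state.sealed) throw new Error(`event after run.sealed: ${event.eventId}`);
    seen.set(event.eventId, encoded);
    state.appliedTypes.push(event.type);
    if (event.type === "run.sealed") state.sealed = true;
  }
  return state;
}
```

Because the derived state is rebuilt purely from the log, a supervisor can recompute it at any time and compare it with stored state to detect drift.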
- Direct Pi turns: no Kapi state unless explicitly requested.
- /kapi-deep-interview: durable requirements trail under .ilchul/workflows/deep-interview/<slug>/; typically context.md, interview.md, run-contract-draft.md, decision-report.md, and verify.md; downstream Ralph or Autoresearch owns implementation specs/plans. run-contract-draft.md is a Markdown draft handoff with GoalSpec, ConstraintSpec, EvidenceStandard, DoneCriteria, and recommendedPreset sections; it is not a durable contract.json source of truth and should be referenced rather than duplicated by downstream modes. Completion is proposal-gated by the Deep Interview readiness judge, which can run inline or through KAPI_DEEP_INTERVIEW_JUDGE=child-rpc to isolate snapshot review in a child process.
- /kapi-autoresearch: durable optimization contract under .ilchul/workflows/autoresearch/<slug>/; prepares contract.md, benchmark.sh, ledger.jsonl, ideas.md, checks.sh, decision-report.md, and verify.md for bounded experiment loops.
- /kapi-ralph: durable planning/build state under .ilchul/workflows/ralph/<slug>/; uses AGENTS.md, IMPLEMENTATION_PLAN.md, handoff.json, decision-report.md, and verify.md.
- /kapi-integrate: durable integration state under .ilchul/workflows/integrate/<slug>/; uses merge-plan.md, conflict-matrix.md, integration-report.md, decision-report.md, and verify.md.
- /kapi-autoresearch: governed optimization loop with stable comparison boundaries and a durable ledger.
- /kapi-ralph: uses the kapi-ralph skill to assess context, interview when needed, then plan/build exactly one highest-priority unfinished task at a time with verifier closeout.
- /kapi-integrate: governed integration for Kapi-owned branches/worktrees, with dependency, conflict, and verification reporting before any merge toward dev.
Artifacts are durable checkpoints, not conversational scratchpads. Workflow contracts expose an artifact cadence so agents write context/spec/plan/progress/verify files at meaningful checkpoints, phase boundaries, blockers, verification results, or completion gates instead of after every turn.
Typical cadence:
- requirements workflows write after question batches or decision-critical answers;
- planning workflows write after the plan stabilizes or is approved;
- governed execution writes at phase boundaries, milestones, blockers, and verification points;
- autoresearch loops write one bounded experiment result per iteration;
- Ralph writes RED/GREEN or verifier evidence at task, blocker, validation, and closeout boundaries.
Use kapi_write_artifacts when one checkpoint updates multiple files.
All workflows share the lifecycle vocabulary from GOAL.md:
inactive -> active -> blocked|verifying|completed|failed|cancelled
Workflow-specific phases live under the shared lifecycle. Artifacts are stored under .ilchul/workflows/<short-workflow-name>/<slug>/, with the kapi- prefix removed from folder names. For example, /kapi-ralph uses .ilchul/workflows/ralph/<slug>/, while /kapi-autoresearch uses .ilchul/workflows/autoresearch/<slug>/. .ilchul/active.json points at the current non-terminal workflow.
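The folder naming rule can be expressed in a few lines. The function name is hypothetical, and the sketch assumes the slug has already been sanitized, since the sanitization rules are not described here.

```typescript
// Hypothetical helper: derive the artifact root for a workflow command and slug.
// Assumes `slug` was sanitized upstream.
function workflowArtifactRoot(workflowCommand: string, slug: string): string {
  // Drop the leading slash of the command, then the kapi- prefix.
  const shortName = workflowCommand.replace(/^\//, "").replace(/^kapi-/, "");
  return `.ilchul/workflows/${shortName}/${slug}/`;
}

// workflowArtifactRoot("/kapi-ralph", "fix-login") returns ".ilchul/workflows/ralph/fix-login/"
```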
State behavior:
- explicit slugs are sanitized before artifact paths are built;
- active-workflow conflicts and mutation failures report as Kapi UI/tool feedback;
- stale active pointers to terminal workflows, missing state files, corrupt JSON, or out-of-workspace state paths are cleared transparently;
- corrupt recorded workflow state is skipped during history listing;
- terminal workflow inspection does not reactivate the workflow or steal unrelated active ownership.
Context behavior:
- workflow start injects a project context snapshot into the prompt;
- Kapi persists that snapshot to verify.md only for workflows that track verify.md;
- brownfield discovery includes guidance headings, package scripts, verification scripts, dependencies, tsconfig, git state, source/test counts, architecture signals, sample source files, and common source/test directories;
- generated/reference directories such as references/, .omx/, .ilchul/, legacy .kapi/, dist/, build/, and coverage/ are excluded from active project guidance;
- greenfield work receives a checklist for purpose, users/stakeholders, constraints, success criteria, architecture direction, acceptance criteria, initial execution plan, and non-goals.
Governed artifacts:
- verify.md is the resumable progress and closeout verification record for governed workflows that track it;
- handoff.json is the plan-to-execution ratchet for governed work;
- verify.md is the durable evidence log;
- governed handoff.json follows Chedex-style admission fields including execution_workflow, approved_at, acceptance_criteria, verification_targets, delegation_roster, source_artifacts, and structured approvals;
- governed verify.md records closeout verification state plus a verifier review record when completion is satisfied;
- /kapi-ralph uses the shared Kapi state.json with nested ralphState as its loop state source of truth instead of a separate Ralph-specific progress file.
Validation checks lifecycle schema consistency, required artifacts, state artifact freshness, evidence shape, worker consistency, and workflow-specific closeout rules. Completed governed workflows require verifier pass evidence plus required architect/verifier handoff approvals when handoff artifacts are tracked. Completed /kapi-ralph workflows require a pi-subagents reviewer subagent verdict, passing validation command evidence, and structured closeout fields such as changed files, acceptance criteria, artifact references, and command references. RED/GREEN phase evidence may be recorded, but current validation no longer treats RED/GREEN evidence as a hard completion gate.
RunContract Harness is a generic, pre-workflow projection over existing Kapi run state. It does not create a second source of truth and it does not persist a durable contract.json; it reads the current WorkflowState plus WorkflowDefinition and exposes a compact run/contract/evidence/artifact/completion/quality view for supervisors.
Layer split:
- Kapi Core / RunContract Harness owns generic run state projection, evidence records, artifact references, contract preset shape, completion criteria, advisory quality hints, and generic steering hints.
- Runtime and presentation adapters decide how to show that generic status through CLI, tools, widgets, reports, or other read-only supervisor surfaces.
- External workflow adapters may interpret generic status for a specific operating environment, but those meanings stay outside core.
Core does not own repository review assumptions, GitHub issue or PR semantics, Discord lane semantics, kapi-agent policy, merge/tracker cleanup, or Kade/Ragna authority rules. Those are adapter interpretations layered on top of the generic run contract when needed.
The current GitHub workflow adapter is read-only and additive. It maps a projected RunContract plus existing worker registry issue/PR inspection into supervisor hints for linked issue context, PR state, kapi-agent review freshness, dev merge readiness, and post-merge tracker reconciliation. It must not mutate GitHub issues, merge PRs, close trackers, or write external workflow state from core; those actions remain explicit supervisor/runtime operations outside the generic RunContract model.
Candidate vocabulary is deliberately small and additive: ContractPreset, EvidenceExpectation, CompletionCriteria, and ScoringHint. RunContract must not start as a PolicyModule or plugin runtime. Ilchul remains product/documentation branding only; reusable code, API, tool, and serialized identifiers should use semantic names such as run, contract, preset, harness, evidence, score, and steer.
Naming and storage decisions are governed by Ilchul naming and compatibility policy. In short: do not perform broad kapi -> ilchul replacements; keep serialized workflow identifiers and integration names stable unless a scoped migration issue explicitly changes them; active workflow/storage routing uses .ilchul / ~/.ilchul; legacy .kapi folders are preserved as historical local state and are not active fallback roots.
Implementation rhythm for the RunContract track is behavior-preserving: document the boundary first, add the generic projection second, add evidence/completion primitives third, add advisory quality hints fourth, render compact supervisor status fifth, and only then map optional external workflow adapter semantics. Each slice should keep existing workflow APIs, WorkflowState, WorkflowDefinition, artifacts, validation gates, and CLI output backward-compatible except for intentional additive fields or sections.
RunContract scoring, preset, and governance changes should use the docs/runcontract-harness-evaluator.md checklist to separate real harness quality from visible metric optimization. The checklist is advisory and does not add completion authority, runtime gates, kapi-agent policy, or score hard-blocking behavior.
Ralph, Integrate, and Autoresearch preset contracts are defined in docs/runcontract-presets.md. That document records each preset's goal shape, required inputs, artifacts, evidence standard, done criteria, repair/rollback criteria, and adapter-neutral boundaries without adding a durable contract.json, scheduler, merge bot, or hidden score authority.
Harness module evolution is tracked by the metadata-only docs/harness-module-registry.md registry. It records module purpose, owner surface, quality signals, retirement signals, replacement notes, and regression evidence for surfaces such as intent parsing, state tracking, evidence extraction, quality scoring, replanning, verifier assistance, and report formatting. The registry is observational; it must not load modules, retire code automatically, or affect runtime behavior without a separate approved implementation issue.
Runtime storage, adapter configuration, and worker retention are described in docs/ilchul-runtime-config.md. That document defines the .ilchul/ layout, adapter defaults, lease/readiness/worker cap settings, retention states, and safe-cleanup boundary while preserving legacy .kapi folders as non-routing historical state.
Future learning-runtime boundaries are designed in docs/learning-runtime-boundaries.md. That document connects existing WorkflowState and RunContract projection responsibilities to a future RunState execution envelope while separating completion authority, runtime readiness authority, and advisory evaluation/learning signals.
The domain PolicySelector in src/domain/policy-selector.ts is an advisory pre-dispatch primitive. It generates a fixed initial policy set across conservative, balanced, aggressive, high-assurance, and learning-exploration strategies; simulates objective-weighted candidate outcomes from task complexity, expected module touch count, dependency depth, adapter mix, isolation mode, verification depth, historical success, and recent reward calibration; records estimator outputs for conflict risk, regression risk, repair likelihood, elapsed/tool cost, review burden, learning value, confidence, and utility; and emits prediction ids that reward-ledger entries can later calibrate. Human overrides are explicit (selector: "human" plus reason) and remain bounded by exploration/conflict/regression safety caps. The selector does not launch agents, mutate workflow state, or hard-block execution from simulated score alone.
The objective/reward domain in src/domain/objective.ts converts evidence-backed EvaluationResult records into append-only reward-ledger.v1 events. Reward records include outcome status, prediction-vs-actual delta, penalty taxonomy, anti-Goodhart checks tied to docs/runcontract-harness-evaluator.md, advisory PolicyHint values, and calibration metadata. Reward data may inform future PolicySelection records, but it must not silently mutate objective weights, selected policy, worker count, adapter choices, or completion authority; human-approved objective calibration must be recorded explicitly.
Kapi is evaluated as a thin harness, not just a feature surface. When no workflow is active, Kapi should stay transparent: no hidden workflow activation, no workflow artifacts, no workers, no tool blocking, and no heavy UI ownership.
Thinness checks used during review:
- default transparency for ordinary Pi turns;
- explicit workflow activation only;
- artifact-on-demand rather than universal logging;
- proportional enforcement by workflow depth;
- ownership restraint over only Kapi workflow state and Kapi-created workers;
- Pi-native minimalism instead of external orchestration weight.
Kapi uses Pi extension surfaces as thin safety rails rather than a separate orchestration runtime:
- resources_discover exposes Kapi's local skills/ and prompts/ resources.
- before_agent_start injects hidden active-workflow context only while a workflow is active and preserves loaded Pi skill/prompt intent from systemPromptOptions.
- tool_call currently keeps Kapi thin: it observes active workflow state without blocking tools. Mutation control is enforced through workflow prompts, artifact discipline, and completion validation rather than live tool-call blocking.
- agent_end sends a follow-up message when active-workflow validation has blocking issues.
- session_before_switch, session_before_fork, session_before_tree, session_before_compact, and session_shutdown keep workflow ownership visible across Pi session lifecycle operations without taking over ordinary turns; governed workflows get soft warnings on switch and shutdown.
- Session labels mark workflow start/resume checkpoints for /tree, and @kapi: autocomplete suggests workflow references.
- src/domain: pure workflow definitions, lifecycle transitions, artifact naming, validation, and worker strategy. No Pi, filesystem, tmux, git, or process dependencies.
- src/application: Kapi use cases, state artifact builders, and ports. Deep Interview readiness logic stays protocol-independent here so inline and child-RPC adapters share one authority core.
- src/adapters: filesystem store, project context discovery, local tmux/git worktree worker substrate, worker closeout, and optional child-process adapters such as the Deep Interview readiness RPC reviewer.
- src/presentation: Pi command/tool registration, parsers, messages, status UI, hooks, and hidden active-workflow context injection.
- README.md: human-facing overview and operating model.
- GOAL.md: completeness objective and P0-P5 gates.
- docs/chedex-completeness.md: Chedex comparison boundary and intentional Pi-native differences.
- docs/runcontract-harness-evaluator.md: evaluator and anti-Goodhart checklist for RunContract scoring, presets, and harness-governance changes.
- docs/runcontract-presets.md: adapter-neutral Ralph, Integrate, and Autoresearch preset contracts for goals, inputs, artifacts, evidence, done criteria, and repair/rollback boundaries.
- docs/harness-module-registry.md: metadata-only harness module registry for module purpose, quality signals, retirement signals, and regression evidence.
- docs/ilchul-naming-policy.md: product naming, compatibility, and active .ilchul storage policy.
- docs/ilchul-runtime-config.md: design contract for .ilchul/ runtime layout, adapter config defaults, worker retention states, and safe cleanup boundaries.
- docs/learning-runtime-boundaries.md: design contract for future RunState, objective, policy selection, task graph, worker runtime, evidence/evaluation, integration/repair, and reward-ledger boundaries.
- docs/learning-runtime-verification-matrix.md: verification matrix for schema/events/DAG/claims/workers/policy/reward/integration/retention/storage readiness before learning-runtime default claims.
- docs/ralph-live-qa.md: operator live QA checklist for proving /kapi-ralph start, planning, approval, build, evidence, closeout, and resume behavior in a real Pi/Kapi runtime.
- skills/kapi-workflow/SKILL.md: active-workflow behavior reminders for agents.
- prompts/: Kapi prompt resources exposed to Pi.
- src/domain: pure workflow definitions, lifecycle transitions, artifact naming, validation, and worker strategy. No Pi, filesystem, tmux, git, or process dependencies.
- src/application: Kapi use cases, state artifact builders, and ports.
- src/adapters: filesystem store, project context discovery, local tmux/git worktree worker substrate, and worker closeout.
- src/presentation: Pi command/tool registration, parsers, messages, status UI, hooks, and hidden active-workflow context injection.
- test/: node:test coverage for workflow contracts, state, hooks, presentation, workers, quality checks, and validation.
- scripts/: quality reporting and verification helpers.
- references/chedex/: local Chedex reference snapshot; not active Kapi runtime guidance.
- Keep ordinary Pi behavior transparent by default.
- Add or refine workflow contracts in src/domain/workflows.ts.
- Keep lifecycle and validation rules in src/domain pure.
- Add use-case behavior through src/application ports before touching Pi presentation code.
- Expose human commands in src/presentation/commands.ts and agent tools in src/presentation/tools.ts.
- Keep prompt/skill guidance aligned with skills/kapi-workflow/SKILL.md.
- Update tests and run verification before review.
Presentation code should stay split by Pi-facing responsibility:
- src/presentation/commands.ts for human slash command registration;
- src/presentation/tools.ts for agent tool registration;
- src/presentation/hooks.ts for Pi lifecycle/tool hooks;
- src/presentation/parsers.ts for human command parsing;
- src/presentation/schemas.ts for tool parameter schemas;
- src/presentation/messages.ts for formatting;
- src/presentation/pi-extension.ts as the composition root.
Refactors should preserve command/tool behavior, artifact formats, state compatibility, and thin-default semantics. They should reduce coupling and duplication without introducing a heavy runtime manager or command framework.
Kapi tracks Chedex-inspired completeness without copying Chedex's Codex-specific install/runtime machinery. See docs/chedex-completeness.md for the comparison boundary, intentional differences, and P0-P5 completeness gates used by autoresearch.
Chedex-like concepts that Kapi keeps:
- explicit workflows;
- durable context/spec/plan/progress/handoff/verify artifacts where useful;
- governed closeout with verifier evidence;
- architect/verifier approval provenance for governed handoffs;
- stop/switch visibility for broad governed work.
Chedex concepts that Kapi intentionally does not copy:
- Codex global install/mirror machinery;
- external tmux team runtime ownership;
- global HUD/mailbox/linked-mode overlays;
- Kapi-managed subagent orchestration.
Guarded workflows enforce correctness-critical boundaries primarily through durable state, prompts, artifacts, and completion validation:
- /kapi-ralph planning guidance tells agents to plan, review, and approve before building.
- /kapi-ralph can record RED/GREEN test evidence, but the current Pi hook layer does not block production mutation while waiting for RED evidence.
- after critic, architect, and human approval, /kapi-ralph may perform the single scoped build iteration selected from the approved plan; live QA can use a tracked docs fixture for that bounded mutation.
- governed closeout remains evidence-gated instead of treating narrative claims as proof.
npm run verify
npm run verify runs the standard local gate: npm test, npm run check, npm run check:unused, and npm run quality:budgets.
Completeness autoresearch uses the stricter reproducible scorer, `bash autoresearch.sh`.

Autoresearch verification also tracks maintainability quality metrics as secondary monitors:
- Code coverage (`code_coverage_pct`): coverage percentage when a coverage summary is available.
- Cyclomatic complexity (`max_cyclomatic_complexity`): proxy for highest function-level control-flow complexity.
- Duplicated code (`duplicated_code_pct`): proxy for repeated three-line source blocks across the codebase, so common single-line TypeScript syntax does not dominate the budget.
- Code smells (`code_smells`): proxy count for long files, long functions, and oversized parameter lists.
- Dependency and coupling (`coupling_max_imports`, `module_dependency_edges`, `facade_dependency_files`): proxies for the highest import or re-export fanout in a single source file, total module edges, and multi-edge facade count, so facade/barrel modules cannot hide true dependency breadth. The architecture score gives small headroom credit only when total edges and facade-file counts stay under the documented Kapi thresholds, with additional edge-count credit below stricter tiers at 170, 160, 158, 157, 155, 149, 148, 147, 145, 144, 142, 140, 139, 138, 135, and 133 edges and at every count from 129 down to 100 edges, plus stricter facade-file credit at every count from 47 down to 26 files.
- Budget warning count (`budget_warn_count`), budget pass count (`budget_pass_count`), and budget not-configured count (`budget_not_configured_count`): summary counts for optional maintainability budget status.
- Semantic Autoresearch consistency (`semantic_consistency_score`, `bridge_term_misuse_count`, `root_autoresearch_dependency_count`, `autoresearch_artifact_mismatch_count`, `source_of_truth_conflict_count`): diagnostic checks that Kapi Autoresearch is described and implemented as an embedded durable engine, not an ambiguous bridge to root-level `autoresearch.*` files such as `autoresearch.md`, `autoresearch.sh`, `autoresearch.checks.sh`, `autoresearch.jsonl`, `autoresearch.ideas.md`, or `autoresearch.config.json`.
- pi-autoresearch reference coverage (`pi_autoresearch_reference_score`, `expected_pi_autoresearch_role_coverage`, `pi_autoresearch_metric_parsing_role`, `pi_autoresearch_resume_reconstruction_role`): diagnostic checks that Kapi maps the reference loop roles into durable-mode artifacts and behavior: contract, benchmark, checks, ledger, ideas, keep/discard/crash/checks_failed, resume/reconstruction, and metric parsing.
- Runtime Autoresearch start diagnostics (`runtime_autoresearch_probe_executed`, `runtime_autoresearch_start_pass`, `runtime_autoresearch_start_contract_pass`): executable probes from `bash autoresearch.sh` that start `/kapi-autoresearch` in a temporary workspace and verify the Kapi-owned durable artifact contract is actually created on disk.
- Cross-mode runtime readiness (`runtime_deep_interview_start_contract_pass`, `runtime_ralph_start_contract_pass`, `runtime_integrate_start_contract_pass`, `mode_runtime_probe_coverage`): temp-workspace probes that prove non-Autoresearch durable modes start cleanly and create their declared artifacts.
- Event/snapshot semantics (`event_log_jsonl_parse_pass`, `snapshot_json_parse_pass`, `state_json_parse_pass`): runtime probes that parse the created `events.jsonl`, `snapshot.json`, and `state.json` instead of accepting filename presence.
- Human command-surface diagnostics (`command_surface_probe_executed`, `exact_command_surface_pass`, `extra_human_command_count`, `missing_mode_subcommand_count`, `mode_subcommand_behavior_pass`): source-of-truth command inventory plus static subcommand checks for the durable mode command surface and the required `status|resume|approve` mode subcommands.
- Readiness/blocker diagnostics (`kapi_readiness_score`, `ship_blocker_count`, `runtime_blocker_count`, `semantic_blocker_count`): separate ship-readiness indicators that aggregate runtime, event/snapshot, command-surface, semantic ownership, artifact, and source-of-truth blockers without hiding them behind the maintainability-inflated architecture score.
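
To make the duplicated-code proxy in the list above concrete, here is one way a three-line block check could work. This is a sketch under stated assumptions, not the scorer behind `duplicated_code_pct`: the function name, the whitespace normalization, and the choice to report duplicated windows as a share of all windows are illustrative guesses.

```typescript
// Sketch of a three-line duplication proxy. The normalization and the
// percentage definition are assumptions, not Kapi's actual scoring rules.
function duplicatedCodePct(files: string[]): number {
  const counts: { [block: string]: number } = {};
  const blocks: string[] = [];

  for (const source of files) {
    // Trim each line and drop blank lines so formatting alone does not matter.
    const lines = source
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);

    // Slide a three-line window over the file and count each window.
    for (let i = 0; i + 3 <= lines.length; i++) {
      const block = lines.slice(i, i + 3).join("\n");
      blocks.push(block);
      counts[block] = (counts[block] || 0) + 1;
    }
  }

  if (blocks.length === 0) {
    return 0;
  }
  // A window counts as duplicated when the same three lines occur more than once.
  const duplicated = blocks.filter((block) => counts[block] > 1).length;
  return (duplicated / blocks.length) * 100;
}
```

Comparing three-line windows instead of single lines is what keeps ubiquitous lines such as a lone closing brace from registering as duplication by themselves.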
Use `node scripts/code-quality-report.mjs --help` to inspect quality-report options. Convenience scripts are available as `npm run quality:json`, `npm run quality:markdown`, `npm run quality:budgets`, and `npm run quality:strict`. Use `node scripts/code-quality-report.mjs --json` when CI or another tool needs the same metrics in machine-readable form. Use `node scripts/code-quality-report.mjs --markdown` for a human-readable table in issue comments or release notes.

Add `--budgets` to include pass/warn budget status against Kapi's lightweight maintainability targets; add `--fail-on-warn` or run `npm run quality:strict` when a CI job should fail on budget warnings. Budget thresholds can be tuned with `KAPI_QUALITY_COVERAGE_MIN`, `KAPI_QUALITY_COMPLEXITY_MAX`, `KAPI_QUALITY_DUPLICATION_MAX`, `KAPI_QUALITY_SMELLS_MAX`, `KAPI_QUALITY_COUPLING_MAX`, `KAPI_QUALITY_MODULE_EDGES_MAX`, and `KAPI_QUALITY_FACADE_FILES_MAX`.
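
The budget behavior described above can be pictured as a threshold lookup followed by a pass/warn comparison. The sketch below is illustrative only: the pairing of each metric with an environment variable is inferred from the names, the `fallback` values are placeholders and not Kapi's real targets, and the report script's actual logic and output format are not shown in this document.

```typescript
// Illustrative only: metric-to-variable pairing is inferred from the names,
// and the fallback numbers are placeholders, not Kapi's real targets.
type Env = { [name: string]: string | undefined };
type BudgetStatus = "pass" | "warn";

interface Budget {
  metric: string;
  envVar: string;
  direction: "min" | "max"; // "min": value must be >= limit; "max": value must be <= limit
  fallback: number;
}

const BUDGETS: Budget[] = [
  { metric: "code_coverage_pct", envVar: "KAPI_QUALITY_COVERAGE_MIN", direction: "min", fallback: 80 },
  { metric: "max_cyclomatic_complexity", envVar: "KAPI_QUALITY_COMPLEXITY_MAX", direction: "max", fallback: 15 },
  { metric: "duplicated_code_pct", envVar: "KAPI_QUALITY_DUPLICATION_MAX", direction: "max", fallback: 5 },
];

// Use the environment override when it parses as a number, else the fallback.
function resolveLimit(budget: Budget, env: Env): number {
  const raw = env[budget.envVar];
  if (raw === undefined || raw.trim() === "") {
    return budget.fallback;
  }
  const parsed = Number(raw);
  return isNaN(parsed) ? budget.fallback : parsed;
}

// Metrics that were not measured get no status at all.
function evaluateBudgets(
  metrics: { [metric: string]: number | undefined },
  env: Env
): { [metric: string]: BudgetStatus } {
  const statuses: { [metric: string]: BudgetStatus } = {};
  for (const budget of BUDGETS) {
    const value = metrics[budget.metric];
    if (value === undefined) {
      continue;
    }
    const limit = resolveLimit(budget, env);
    const ok = budget.direction === "min" ? value >= limit : value <= limit;
    statuses[budget.metric] = ok ? "pass" : "warn";
  }
  return statuses;
}

// Mirrors the idea of a fail-on-warn switch: warnings fail only when asked to.
function shouldFail(statuses: { [metric: string]: BudgetStatus }, failOnWarn: boolean): boolean {
  if (!failOnWarn) {
    return false;
  }
  return Object.keys(statuses).some((metric) => statuses[metric] === "warn");
}
```

Passing the environment in as a plain object keeps the check testable; in a Node script the caller could hand over `process.env` directly.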
- `kapi_write_artifacts` prevalidates artifact names and paths before writing, but it is not a full filesystem transaction if a mid-write I/O failure occurs.
- Kapi quality metrics are lightweight proxy checks, not full coverage or complexity analysis.
- `docs/chedex-completeness.md` defines the Chedex comparison boundary, but not every Chedex feature is a Kapi goal.
- Removed subagent-specific Kapi surfaces should be called out in release notes for users who previously invoked `/kapi-subagent` or `kapi_plan_subagent_strategy`.
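
The first limitation above is easier to see in a small model of "prevalidate, then write". The sketch below is not the `kapi_write_artifacts` implementation: the `Artifact` shape, the validation rules, and the injected `write` callback are hypothetical stand-ins for whatever the real tool does on disk.

```typescript
// Hypothetical model; the real tool's validation rules and I/O may differ.
interface Artifact {
  name: string;
  content: string;
}

// Returns a reason when the name is unsafe, or null when it is acceptable.
function artifactNameError(name: string): string | null {
  if (name.length === 0) {
    return "empty name";
  }
  if (name.charAt(0) === "/" || name.indexOf("\\") !== -1) {
    return "absolute or non-portable path";
  }
  for (const segment of name.split("/")) {
    if (segment === "" || segment === "." || segment === "..") {
      return "empty or traversal segment";
    }
  }
  return null;
}

function writeArtifacts(artifacts: Artifact[], write: (artifact: Artifact) => void): string[] {
  // Phase 1: validate every name first, so a bad name stops the batch
  // before anything is written.
  const errors: string[] = [];
  for (const artifact of artifacts) {
    const reason = artifactNameError(artifact.name);
    if (reason !== null) {
      errors.push(`${artifact.name}: ${reason}`);
    }
  }
  if (errors.length > 0) {
    throw new Error(`invalid artifacts: ${errors.join("; ")}`);
  }

  // Phase 2: sequential writes with no rollback. If write() throws here,
  // artifacts written earlier in the loop stay in place.
  const written: string[] = [];
  for (const artifact of artifacts) {
    write(artifact);
    written.push(artifact.name);
  }
  return written;
}
```

Validation failures are all-or-nothing in this model, while an I/O failure in the second phase can leave a partial set of artifacts behind, which is the gap the limitation describes.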