Roadmap: objective-driven learning parallel runtime harness #167

@devkade

Roadmap: objective-driven learning parallel runtime harness

Last updated: 2026-05-18 · Owner: Kade/Ragna/coding · Parent context: #114 · Related: #120, #121, #148

Purpose

This roadmap is the index and dependency map for Ilchul's next runtime direction: a portable, objective-driven, learning E2E execution harness.

Ilchul should decompose work, select an execution policy, run work through a safe graph-based runtime, evaluate outcomes against explicit objectives, integrate only gated results, record reward/prediction deltas, and improve future policy selection through advisory calibration.

#167 owns structure, boundaries, dependency order, MVP scope, and child-issue routing. Detailed schemas and phase semantics live in sub-issues.

North Star

Run
  -> standard-intake
  -> objective-approval
  -> policy-selection
  -> graph-execution
  -> objective-evaluation
  -> gated-integration
  -> record-and-calibrate
  -> evidence-sealed-close

RunContract
  -> RunObjective
  -> PolicySelection
  -> TaskGraph
  -> Worker Execution
  -> Evidence Collection
  -> Evaluation
  -> Integration
  -> Reward Ledger
  -> Policy Calibration Hints
  -> Sealed Close

Target identity:

Ilchul is a portable, objective-driven, learning E2E execution harness. It treats agent work as policy-driven experiments, runs selected strategies through a safe TaskGraph runtime, evaluates results against explicit objectives, and records prediction-vs-actual outcomes so future execution policies improve.

Core Decisions

  1. Unified Run lifecycle. Deep Interview, Ralph, Autoresearch, and Integrate are compatibility bundles over one Run lifecycle, not separate top-level runtime models.
  2. Every Run has a RunObjective. Intake may start with a draft RunObjective, but Execute cannot start until the RunObjective is approved.
  3. Objective taxonomy is split. RunObjective is this run's success/failure/repair criteria. LearningObjective is long-term policy improvement. PolicyUtility is pre-dispatch strategy utility.
  4. Full-lifecycle Strong PhasePreset. Intake through Close all have phase presets. Presets define protocol; RunObjective may strengthen but not weaken them.
  5. Learn follows Integrate. Reward/calibration must observe integration cost, conflicts, cleanup/retention, and final evidence before learning.
  6. TaskGraph is a runtime primitive. Single-agent, sequential, DAG-parallel, and team-parallel runs all execute through TaskGraph. Team/parallel is policy, not a different runtime model.
  7. PolicySelection chooses strategy. It may create graph sketches for simulation, but concrete TaskGraph creation belongs to graph-execution.
  8. Graph-execution owns concrete TaskGraph creation. It uses approved RunObjective plus selected PolicySelection, then applies readiness, claim, lease, worker, and evidence gates.
  9. GateEngine and Verifier are separate. Verifier validates evidence. GateEngine decides whether transitions are allowed.
  10. Three-layer gates. Transitions require HardInvariantGate, PhasePresetGate, and RunObjectiveGate to pass.
  11. Evidence-required completion. Agent claims are not authority. Task, phase, and run completion require evidence.
  12. Fail closed. Gate failure denies without mutation unless a later explicit recovery transition records the repair path.
  13. Policy hints are advisory only. Reward/calibration may produce hints, but actual strategy changes must be recorded by PolicySelection events.
  14. Run closes by sealing. A run is not complete until evidence, artifacts, RewardRecord, cleanup/retention state, and replay/audit events are sealed.
  15. MVP boundary is runtime Phase 3. MVP reaches worker execution, heartbeat, and evidence-gated completion (#196). Integration/repair hardening is post-MVP (#195, #190).
  16. RL-shaped roadmap, MVP-shaped implementation. The conceptual structure should point toward reinforcement-learning-style policy improvement, but MVP implementation should remain a minimal calibration-ready graph runtime.
  17. Thin uniform phase engines. The eight lifecycle phases may have one engine each, but every PhaseEngine must follow the same thin contract and produce phase outputs only.
  18. Authority stays in shared runtime. Phase engines do not own RunState mutation, transition authority, or evidence authority. RunOrchestrator, GateEngine, Verifier, EventStore, and RunStateStore own those responsibilities.
  19. Side effects are runtime-owned. Phase engines return SideEffectRequest values; SideEffectRunner executes allowlisted actions only after durable transition intent is committed.
  20. Index-first Runtime State. RunState, PhaseStatus, TaskGraph, RuntimeTask, WorkerState, ClaimLease, and SideEffectRecord are operational indexes, not payload stores.
  21. Do not create an engine for every noun. Separate runtime components only when behavior, state, or failure modes truly differ. Phase engines are acceptable when they share one contract and stay thin.
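
Decisions 9 to 12 can be illustrated with a minimal sketch. It assumes a gate is a plain callable over a state mapping and a transition name; `GateResult`, `evaluate_gates`, `apply_transition`, and the state keys (`run_id`, `phase`, `objective_status`) are hypothetical names for illustration, not the schema owned by #185 or #198.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    failed_gates: tuple[str, ...] = ()


# Hypothetical gate shape: inspect current state and the requested transition.
Gate = Callable[[Mapping[str, object], str], bool]


def evaluate_gates(state: Mapping[str, object], transition: str,
                   gates: Mapping[str, Gate]) -> GateResult:
    """Run every gate; the transition is allowed only if all of them pass."""
    failed = tuple(name for name, gate in gates.items()
                   if not gate(state, transition))
    return GateResult(allowed=not failed, failed_gates=failed)


def apply_transition(state: dict, transition: str, patch: dict,
                     gates: Mapping[str, Gate]) -> tuple[dict, GateResult]:
    """Fail closed: on denial, return the original state object untouched."""
    result = evaluate_gates(state, transition, gates)
    if not result.allowed:
        return state, result
    return {**state, **patch}, result


# Illustrative gate rules only; real rules belong to the presets and objective.
gates = {
    "HardInvariantGate": lambda s, t: s.get("run_id") is not None,
    "PhasePresetGate": lambda s, t: (t != "graph-execution"
                                     or s.get("phase") == "policy-selection"),
    "RunObjectiveGate": lambda s, t: (t != "graph-execution"
                                      or s.get("objective_status") == "approved"),
}

draft = {"run_id": "run-1", "phase": "policy-selection",
         "objective_status": "draft"}
denied_state, denied = apply_transition(
    draft, "graph-execution", {"phase": "graph-execution"}, gates)

approved = {**draft, "objective_status": "approved"}
next_state, ok = apply_transition(
    approved, "graph-execution", {"phase": "graph-execution"}, gates)
```

With a draft objective only `RunObjectiveGate` fails, and the caller gets back the same state object it passed in. Once the objective is approved, all three layers pass and the patch is applied.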

Canonical Runtime Terms

Use these exact terms in child issues, docs, schemas, tests, and implementation:

| Canonical term | Meaning |
| --- | --- |
| RunObjective | Approved run success/failure/repair criteria. |
| TaskGraph | Execute-phase graph primitive for single-agent, sequential, DAG-parallel, and team-parallel runs. |
| RuntimeTask.dependsOn | RuntimeTask dependency field. |
| WorkerState | Worker operational state object. |
| ClaimLease | Task ownership and lease record. |
| EvidenceRef | Evidence reference required for task, phase, and run completion. |
| EvaluationResult | Objective evaluation output. |
| IntegrationCandidate | Ref-backed candidate for gated integration. |
| RewardRecord | Individual reward/prediction/calibration record. |
| PolicyHint | Advisory policy signal. Policy changes must still be recorded by PolicySelection. |
| SideEffectRequest | Requested external action returned by a PhaseEngine. |
| SideEffectRecord | Runtime-owned operational state for side-effect execution and result. |

Canonical worker status strings:

ready
busy
unhealthy
completed-retained
safe-to-close
stale-registry
cleanup-released
closed
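
One way to keep these strings canonical in code is a closed enumeration that rejects anything else. The sketch below is only an illustration; `WorkerStatus` and `parse_worker_status` are hypothetical names, and the status values are the eight strings listed above.

```python
from enum import Enum


class WorkerStatus(str, Enum):
    """The eight canonical worker status strings from this roadmap."""
    READY = "ready"
    BUSY = "busy"
    UNHEALTHY = "unhealthy"
    COMPLETED_RETAINED = "completed-retained"
    SAFE_TO_CLOSE = "safe-to-close"
    STALE_REGISTRY = "stale-registry"
    CLEANUP_RELEASED = "cleanup-released"
    CLOSED = "closed"


def parse_worker_status(raw: str) -> WorkerStatus:
    """Reject any string that is not one of the canonical statuses."""
    try:
        return WorkerStatus(raw)
    except ValueError:
        raise ValueError(f"non-canonical worker status: {raw!r}") from None
```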

Ownership Map

| Area | Owner Issue | #167 Keeps |
| --- | --- | --- |
| Roadmap structure, track order, MVP scope | #167 | This issue |
| RunContract-centered architecture boundaries | #170 | Boundary reference and dependency |
| RunState schemas, versions, migration | #185 | Blocker status |
| Runtime events, replay, recovery | #186 | Blocker status |
| Full-lifecycle PhasePreset and runtime spec schemas | #198 | Structural requirement only |
| Full-lifecycle preset semantics and graph-execution components | #199 | Catalog and boundary decision only |
| .ilchul storage, config, worker retention | #169, #181 | Storage direction and safety gate |
| Portable agent/substrate adapters | #188 | Adapter dependency |
| RunObjective/evaluation/reward loop | #172 | Objective taxonomy and lifecycle dependency |
| Reward calculation, penalties, PolicyHint, calibration | #189 | Learning boundary and anti-Goodhart constraint |
| Policy simulation selector | #171 | Strategy-selection dependency |
| Simulator features, estimators, exploration safety | #187 | Calibration dependency |
| TaskGraph/readiness implementation | #194 | Runtime Phase 1 |
| Claim/lease/stale ownership | #197 | Runtime Phase 2 |
| Worker execution/heartbeat/evidence gate | #196 | Runtime Phase 3 / MVP boundary |
| Integration candidates and repair loop | #195, #190 | Post-MVP hardening |
| Verification matrix and rollout readiness | #191 | Default-change gate |

Track Status

| Track | Status | Gate / Next |
| --- | --- | --- |
| A - Architecture and state contract | Design split opened: #170, #185, #186, #198 | Define RunState schemas, event replay, recovery, and runtime spec schemas. |
| B - Storage, config, adapters | Design / implementation split: #169, #181, #188 | .ilchul is forward storage; avoid unsafe .kapi mutation; define portable adapters. |
| C - Objective, evaluation, reward | Design split opened: #172, #189 | Finalize RunObjective lifecycle, evaluation outputs, reward formula, calibration, anti-Goodhart rules. |
| D - Policy simulation | Design split opened: #171, #187 | Define candidate policies, estimator features, prediction ids, override trail, exploration safety. |
| E - Graph/DAG runtime | Runtime-first sub-roadmap: #199, #194, #197, #196, #195, #168, #190 | #199 semantics → Phase 1 readiness → Phase 2 claim/lease → Phase 3 worker/evidence MVP → Phase 4 integration/repair. |
| F - Verification/readiness | Design split opened: #191 | Define evidence matrix before implementation PRs claim runtime readiness or defaults change. |

Issue Tiers

The number of child issues should not imply that all of them must be implemented at full strength for MVP. Tiers define implementation pressure.

MVP-critical open

These must be resolved enough to build the compact runtime MVP:

Design records / already closed

These are decision records or seed designs. They guide implementation but should not be read as additional full-strength MVP work:

Shallow-first

These should start as record-only, rule-based, or advisory behavior. They should not become heavy engines before #196 produces real runtime evidence:

Post-MVP hardening

These matter, but should not block the runtime MVP:

Tiering rule:

Closed design issues define the language.
MVP-critical open issues define the first runtime.
Shallow-first issues record signals without autonomous behavior.
Post-MVP issues harden convergence after #196 exists.

Dependency Order

Recommended order:

  1. Track A: #170 (RunContract-centered architecture boundaries) -> #185 (RunState schemas, versions, and migration policy) -> #186 (runtime event taxonomy, replay, and recovery semantics) -> #198 (full-lifecycle PhasePreset and runtime spec schemas)
  2. Track B: #169 (.ilchul storage, adapter config, and worker retention policy) and #188 (portable agent adapter and execution substrate contracts) in parallel; #181 (route workflow storage to .ilchul only) proceeds only within storage migration rules
  3. Track E runtime MVP: #199 (full-lifecycle preset catalog and graph-execution runtime components) -> #194 (Phase 1: task graph and readiness) -> #197 (Phase 2: claim, lease, and stale ownership) -> #196 (Phase 3: worker execution, heartbeat, and evidence-gated completion)
  4. Track C shallow-first: #172 (objective evaluation and reward-ledger learning loop) seeds the language; #189 (reward calculation, penalties, PolicyHint, and calibration) starts record-only and becomes richer once #196 produces real evidence
  5. Track D shallow-first: #171 (policy simulation selector) seeds policy selection; #187 (policy simulator features, estimators, and exploration safety) starts rule-based/deterministic and is calibrated after #196
  6. Track E post-MVP: #195 (Phase 4: IntegrationCandidate and repair loop) and #190 (integration, repair, and supersession semantics)
  7. Track F: #191 (verification matrix for learning parallel runtime) finalizes verification criteria across tracks

Development Order

This is the implementation order for PR planning. It is more binding than the conceptual track list above when deciding what to build next.

Step 0 — Keep closed design issues as language, not new MVP scope

Do not reopen broad implementation from closed design records unless a later open issue requires it.

Step 1 — Build the thin runtime spine (#185 + #186)

Before adding more behavior, define the smallest shared runtime authority surface:

  • RunState snapshot as compact operational index;
  • runtime schema/version rules and unknown-newer-version behavior;
  • RuntimeEvent taxonomy for run, phase, task, worker, evidence, gate, and side-effect transitions;
  • commitTransition(patch, event) semantics with version check and event append;
  • snapshot + append-only event relationship, without requiring full event sourcing in MVP.

Exit gate: runtime state and event examples exist for one successful run and one repair/stale-worker path.
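
The `commitTransition(patch, event)` semantics above could look roughly like the following in-memory sketch. It assumes optimistic concurrency on an integer `version` field and appends the event before replacing the snapshot. `RunStateStore`, `VersionConflict`, and the explicit `expected_version` argument are hypothetical; a real store would also need the two writes to be durable and atomic.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


@dataclass
class RunStateStore:
    # Compact operational snapshot (hypothetical fields for illustration).
    snapshot: dict = field(
        default_factory=lambda: {"version": 0, "phase": "standard-intake"})
    # Append-only event list used for audit and replay.
    events: list = field(default_factory=list)

    def commit_transition(self, expected_version: int,
                          patch: dict, event: dict) -> int:
        """Check the version, append the event, then apply the patch."""
        current = self.snapshot["version"]
        if expected_version != current:
            raise VersionConflict(
                f"expected version {expected_version}, found {current}")
        new_version = current + 1
        self.events.append({**event, "version": new_version})
        self.snapshot = {**self.snapshot, **patch, "version": new_version}
        return new_version


store = RunStateStore()
v1 = store.commit_transition(
    0, {"phase": "objective-approval"}, {"type": "phase-advanced"})
```

A second writer that still holds version 0 gets a `VersionConflict` and leaves both the snapshot and the event list unchanged, which is the behavior the version check is meant to guarantee.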

Step 2 — Lock adapter/substrate boundary (#188)

Define the minimum portable worker contract before worker execution:

  • AgentAdapter for launch/send/capture/interrupt/report parsing;
  • ExecutionSubstrate for tmux, process, native-subagent, and future substrates;
  • worker capability matching;
  • readiness nonce / health-check expectations;
  • structured worker report contract;
  • Codex, Pi, and Claude Code compatibility notes.

Exit gate: fake adapter/substrate can satisfy the contract without depending on tmux or a real agent.
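
As a rough illustration of the exit gate, the sketch below defines an `AgentAdapter` protocol with the five operations listed above and a fake that satisfies it in memory. The method signatures, the `READY <nonce>` readiness convention, the JSON report shape, and `run_one` are all assumptions made for this example; `ExecutionSubstrate` is not modelled here.

```python
import json
from typing import Protocol


class AgentAdapter(Protocol):
    """Hypothetical minimal contract: launch/send/capture/interrupt/parse."""
    def launch(self, worker_id: str, nonce: str) -> None: ...
    def send(self, worker_id: str, prompt: str) -> None: ...
    def capture(self, worker_id: str) -> str: ...
    def interrupt(self, worker_id: str) -> None: ...
    def parse_report(self, raw: str) -> dict: ...


class FakeAdapter:
    """In-memory adapter: no tmux and no real agent involved."""

    def __init__(self) -> None:
        self.output: dict[str, list[str]] = {}

    def launch(self, worker_id: str, nonce: str) -> None:
        # Echo the readiness nonce so the caller can confirm liveness.
        self.output[worker_id] = [f"READY {nonce}"]

    def send(self, worker_id: str, prompt: str) -> None:
        # Pretend the agent finished and wrote a structured report.
        report = {"task": prompt, "status": "done",
                  "evidence": [f"log://{worker_id}/{prompt}"]}
        self.output[worker_id].append(json.dumps(report))

    def capture(self, worker_id: str) -> str:
        return self.output[worker_id][-1]

    def interrupt(self, worker_id: str) -> None:
        self.output[worker_id].append("INTERRUPTED")

    def parse_report(self, raw: str) -> dict:
        return json.loads(raw)


def run_one(adapter: AgentAdapter, worker_id: str,
            task: str, nonce: str) -> dict:
    """Launch, verify the readiness nonce, send one task, parse the report."""
    adapter.launch(worker_id, nonce)
    if adapter.capture(worker_id) != f"READY {nonce}":
        raise RuntimeError("worker failed readiness check")
    adapter.send(worker_id, task)
    return adapter.parse_report(adapter.capture(worker_id))


report = run_one(FakeAdapter(), "w1", "task-a", nonce="n-123")
```

Because `run_one` depends only on the protocol, a Codex, Pi, or Claude Code adapter could in principle be swapped in without changing the caller.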

Step 3 — Finish DAG phase 1: task graph and readiness (#194)

Implement or tighten:

  • TaskGraph and RuntimeTask schemas with graph id/version and inspectable task metadata;
  • validation for duplicate ids, missing dependsOn targets, cycles, and invalid status;
  • readiness calculation and readiness reasons;
  • failed-dependency downstream blocking;
  • graph-created and readiness-changed events.

Exit gate: 5-task fixture with two parallel branches passes validation and readiness tests.
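
A minimal sketch of the validation and readiness rules above, using a 5-task fixture with two parallel branches. The task statuses (`pending`, `done`, `failed`), the readiness reasons (`ready`, `waiting`, `blocked`), and the function names are assumptions for illustration; `depends_on` stands in for the canonical `RuntimeTask.dependsOn` field.

```python
from dataclasses import dataclass

VALID_STATUS = {"pending", "done", "failed"}  # hypothetical status set


@dataclass(frozen=True)
class RuntimeTask:
    id: str
    depends_on: tuple[str, ...] = ()  # mirrors RuntimeTask.dependsOn
    status: str = "pending"


def validate(tasks: list[RuntimeTask]) -> list[str]:
    """Report duplicate ids, invalid status, missing dependencies, cycles."""
    errors = []
    ids = [t.id for t in tasks]
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        errors.append(f"duplicate id: {dup}")
    known = set(ids)
    for t in tasks:
        if t.status not in VALID_STATUS:
            errors.append(f"invalid status on {t.id}: {t.status}")
        for dep in t.depends_on:
            if dep not in known:
                errors.append(f"{t.id} depends on missing task {dep}")
    # Kahn-style peeling: tasks that can never be freed form a cycle.
    remaining = {t.id: {d for d in t.depends_on if d in known} for t in tasks}
    while True:
        free = [i for i, deps in remaining.items() if not deps]
        if not free:
            break
        for i in free:
            del remaining[i]
        for deps in remaining.values():
            deps.difference_update(free)
    if remaining:
        errors.append("cycle among: " + ", ".join(sorted(remaining)))
    return errors


def readiness(tasks: list[RuntimeTask]) -> dict[str, str]:
    """Readiness reason per task; assumes validate() returned no errors."""
    by_id = {t.id: t for t in tasks}
    reasons: dict[str, str] = {}

    def reason(task_id: str) -> str:
        if task_id not in reasons:
            task = by_id[task_id]
            deps = [reason(d) for d in task.depends_on]
            if task.status != "pending":
                reasons[task_id] = task.status
            elif any(r in ("failed", "blocked") for r in deps):
                reasons[task_id] = "blocked"  # failure propagates downstream
            elif all(r == "done" for r in deps):
                reasons[task_id] = "ready"
            else:
                reasons[task_id] = "waiting"
        return reasons[task_id]

    for t in tasks:
        reason(t.id)
    return reasons


fixture = [
    RuntimeTask("plan", status="done"),
    RuntimeTask("a1", ("plan",)),
    RuntimeTask("a2", ("a1",)),
    RuntimeTask("b1", ("plan",)),
    RuntimeTask("merge", ("a2", "b1")),
]
snapshot = readiness(fixture)
```

In this fixture `a1` and `b1` are ready at the same time, which is the parallelism the exit gate asks for. If `a1` fails, both `a2` and `merge` become blocked while `b1` stays ready.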

Step 4 — Finish DAG phase 2: claim, lease, and stale ownership (#197)

Implement or tighten:

  • ClaimLease as an operational index or explicitly justified task-local representation;
  • claim token generation and active-claim rejection;
  • lease renewal, expiry, release, and explicit recovery;
  • completion requiring a valid unexpired claim token;
  • claim/lease events.

Exit gate: duplicate claim race, expired lease, and stale recovery tests pass.
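
The claim and lease rules above might be sketched as follows. This assumes a single-process table with time passed in explicitly so tests are deterministic; `LeaseTable`, `ClaimRejected`, and `recover_stale` are hypothetical names. An expired lease is not taken over silently here: a new claim is rejected until an explicit recovery step removes the stale lease.

```python
import secrets
from dataclasses import dataclass


class ClaimRejected(Exception):
    """Raised when a claim, renewal, completion, or recovery is not allowed."""


@dataclass
class ClaimLease:
    task_id: str
    worker_id: str
    token: str
    expires_at: float


class LeaseTable:
    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.leases: dict[str, ClaimLease] = {}

    def claim(self, task_id: str, worker_id: str, now: float) -> ClaimLease:
        # Any existing lease blocks the claim, even an expired one.
        if task_id in self.leases:
            holder = self.leases[task_id].worker_id
            raise ClaimRejected(f"{task_id} is held by {holder}")
        lease = ClaimLease(task_id, worker_id,
                           secrets.token_hex(8), now + self.ttl)
        self.leases[task_id] = lease
        return lease

    def _require_valid(self, task_id: str, token: str,
                       now: float) -> ClaimLease:
        lease = self.leases.get(task_id)
        if lease is None or lease.token != token or lease.expires_at <= now:
            raise ClaimRejected(f"no valid unexpired claim on {task_id}")
        return lease

    def renew(self, task_id: str, token: str, now: float) -> None:
        self._require_valid(task_id, token, now).expires_at = now + self.ttl

    def complete(self, task_id: str, token: str, now: float) -> ClaimLease:
        """Completion requires the valid, unexpired claim token."""
        self._require_valid(task_id, token, now)
        return self.leases.pop(task_id)

    def recover_stale(self, task_id: str, now: float) -> ClaimLease:
        """Explicit recovery: remove an expired lease and return it."""
        lease = self.leases.get(task_id)
        if lease is None or lease.expires_at > now:
            raise ClaimRejected(f"{task_id} has no expired lease to recover")
        return self.leases.pop(task_id)


table = LeaseTable(ttl_seconds=30)
lease = table.claim("task-a", "w1", now=0)
```

Returning the removed lease from `recover_stale` gives the caller what it needs to record a claim/lease event for the recovery path.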

Step 5 — Finish DAG phase 3: worker execution MVP (#196)

This is the MVP completion boundary.

Implement:

  • worker registry/state persistence;
  • dispatch of claimed ready tasks through the adapter/substrate boundary;
  • heartbeat tracking and stale/unhealthy projection;
  • structured worker report capture;
  • EvidenceRef extraction from reports, logs, test output, diffs, or artifacts;
  • evidence-required task completion;
  • operator-visible task/worker status.

Exit gate: smoke run executes at least two independent tasks in parallel through fake workers or a documented local substitute and completes only with evidence refs.
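
Two of the items above, stale/unhealthy projection and evidence-required completion, can be sketched under simple assumptions: heartbeats are timestamps compared against a fixed threshold, and a worker report is a dict with `status` and `evidence` keys. The function names, the task dict shape, and the `claimed` task status are hypothetical; `busy` and `unhealthy` are the canonical worker statuses.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    worker_id: str
    last_heartbeat: float
    status: str = "busy"


def project_health(workers: list[WorkerState], now: float,
                   stale_after: float) -> dict[str, str]:
    """Project busy workers with an old heartbeat as unhealthy.

    This is a read-only projection; it does not mutate WorkerState.
    """
    projected = {}
    for w in workers:
        is_stale = w.status == "busy" and now - w.last_heartbeat > stale_after
        projected[w.worker_id] = "unhealthy" if is_stale else w.status
    return projected


def complete_task(task: dict, report: dict) -> dict:
    """Mark a task done only when the report carries evidence refs.

    A worker saying it is done is a claim, not authority, so a report
    without evidence leaves the task status unchanged and adds a blocker.
    """
    evidence = [ref for ref in report.get("evidence", []) if ref]
    if report.get("status") != "done" or not evidence:
        return {**task, "blockers": ["missing evidence"]}
    return {**task, "status": "done", "evidence_refs": evidence}


workers = [
    WorkerState("w1", last_heartbeat=95),
    WorkerState("w2", last_heartbeat=50),
    WorkerState("w3", last_heartbeat=0, status="ready"),
]
health = project_health(workers, now=100, stale_after=30)
```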

Step 6 — Define verification matrix before runtime readiness claims (#191)

After Steps 1-5 are implemented enough to be concrete, define the matrix that must pass before changing defaults or claiming runtime readiness:

  • schema validation;
  • event replay/snapshot reconstruction;
  • DAG readiness;
  • claim/lease race prevention;
  • worker heartbeat/stale detection;
  • evidence-required completion;
  • adapter fake integration;
  • RewardRecord shape;
  • storage compatibility under .ilchul.

Exit gate: every MVP invariant in #167 maps to at least one unit, fixture, integration, or smoke test category.

Step 7 — Add shallow-first learning and policy records (#189 + #187)

Only after #196 produces real execution evidence:

Exit gate: prediction-vs-actual records can be emitted from MVP smoke evidence without changing future behavior automatically.
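
A record-only sketch of a prediction-vs-actual record follows. The field names, the metric names (`duration_s`, `conflicts`), and the single numeric threshold are assumptions for illustration; the actual reward formula and PolicyHint shape are owned by #189. Hints are plain data marked advisory, and nothing in the sketch changes policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardRecord:
    run_id: str
    prediction_id: str
    predicted: dict
    actual: dict

    @property
    def deltas(self) -> dict:
        """Actual minus predicted, for metrics present on both sides."""
        return {name: self.actual[name] - self.predicted[name]
                for name in self.predicted if name in self.actual}


def policy_hints(record: RewardRecord, threshold: float) -> list[dict]:
    """Emit advisory hints for metrics whose absolute delta is large.

    One threshold for every metric is a deliberate simplification.
    """
    return [
        {"kind": "PolicyHint", "advisory": True, "metric": name,
         "delta": delta, "prediction_id": record.prediction_id}
        for name, delta in sorted(record.deltas.items())
        if abs(delta) > threshold
    ]


record = RewardRecord(
    run_id="run-1",
    prediction_id="pred-7",
    predicted={"duration_s": 600, "conflicts": 0},
    actual={"duration_s": 900, "conflicts": 2},
)
hints = policy_hints(record, threshold=10)
```

Here the run took 300 seconds longer than predicted, so one hint is produced for `duration_s`; the `conflicts` delta of 2 stays below the threshold and is only recorded.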

Step 8 — Post-MVP integration and repair convergence (#190 + #195)

Do not block the MVP on this step.

Implement after the worker/evidence MVP is proven:

  • repair task generation and supersession;
  • repair budget;
  • IntegrationCandidate records;
  • dry-run/conflict checks;
  • final integration gate evidence;
  • cleanup/retention finalization.

Exit gate: clean integration and conflict repair fixtures both produce sealed evidence.

Development rule:

Build the spine first.
Then graph readiness.
Then ownership.
Then worker execution.
Then verification.
Then learning records.
Then integration/repair.

Conceptual dependency:

Architecture + Storage/Adapter Boundary
  -> Phase/Schema Contracts
  -> Graph Runtime MVP through worker execution
  -> Objective/Evaluation/Reward
  -> Policy Simulation
  -> Integration/Repair
  -> Verification/Rollout

Runtime Discipline

Ilchul should follow OMX-style discipline in Ilchul terms:

  1. RunState is the source of truth.
  2. Agents produce evidence, not authority.
  3. Every phase transition is gated.
  4. Gate failure denies without mutation.
  5. Durable state is written before side effects.
  6. Parallel work requires claim and lease.
  7. Readiness is an explicit snapshot.
  8. Completion requires evidence.
  9. Strong PhasePreset defines phase protocol.
  10. RunObjective can strengthen but not weaken.
  11. Symbolic plans remap to concrete runtime ids.
  12. Recovery is explicit and inspectable.

MVP Minimality Rule

The roadmap defines the conceptual contract. MVP implementation should collapse concepts into the smallest runtime surface that preserves:

  • RunObjective approval before execution;
  • TaskGraph as the execute primitive;
  • gate/evidence-required transitions;
  • claim/lease for parallel work;
  • RewardRecord records;
  • sealed close.

MVP learning is calibration-ready, not fully self-optimizing. It records predictions, actual outcomes, deltas, and advisory PolicyHint values. It does not automatically mutate policy or objective weights.

Phase count does not imply heavyweight components. The full lifecycle may have eight thin PhaseEngines if they all share one contract and delegate common concerns to shared runtime services.

Minimality rule for phase engines:

  • PhaseEngine produces phase outputs, evidence refs, blockers, and proposed patches.
  • PhaseEngine does not mutate RunState directly.
  • PhaseEngine does not decide transition authority.
  • PhaseEngine does not verify its own completion.
  • RunOrchestrator records events and advances phases.
  • Verifier validates evidence.
  • GateEngine decides transitions.
  • RunState snapshot is the MVP operational source of truth; EventStore is append-only audit/replay support.
  • commitTransition(patch, event) records durable transition intent before external side effects.
  • SideEffectRunner executes only allowlisted, idempotency-keyed side effects.
  • Operational state stores status/version/refs/blockers/timestamps; payloads live behind ArtifactRef, EvidenceRef, EventStore, or external refs.
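
The minimality rule can be sketched as a pure phase function plus a runtime-owned runner. All shapes here are assumptions: `PhaseOutput`, the fields on `SideEffectRequest`, the record dicts, and the `transition_committed` flag are hypothetical stand-ins for the schemas in #198 and #199.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SideEffectRequest:
    action: str
    idempotency_key: str
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class PhaseOutput:
    proposed_patch: dict
    evidence_refs: tuple = ()
    blockers: tuple = ()
    side_effects: tuple = ()


def close_phase_engine(run_state: dict) -> PhaseOutput:
    """Thin engine: reads state, returns outputs, mutates nothing."""
    run_id = run_state["run_id"]
    return PhaseOutput(
        proposed_patch={"phase": "evidence-sealed-close"},
        evidence_refs=("artifact://seal-manifest",),
        side_effects=(SideEffectRequest(
            "notify", f"{run_id}:close:notify", {"msg": "sealed"}),),
    )


class SideEffectRunner:
    """Runs allowlisted actions at most once per idempotency key."""

    def __init__(self, handlers: dict[str, Callable[[dict], object]],
                 allowlist: set[str]) -> None:
        self.handlers = handlers
        self.allowlist = set(allowlist)
        self.records: dict[str, dict] = {}  # SideEffectRecord-like dicts

    def run(self, request: SideEffectRequest,
            transition_committed: bool) -> dict:
        if not transition_committed:
            raise RuntimeError("commit transition intent before side effects")
        if request.idempotency_key in self.records:
            return self.records[request.idempotency_key]  # replay, no re-run
        if request.action not in self.allowlist:
            return {"status": "denied", "action": request.action}
        result = self.handlers[request.action](request.payload)
        record = {"status": "done", "action": request.action, "result": result}
        self.records[request.idempotency_key] = record
        return record


calls: list[dict] = []
runner = SideEffectRunner(
    handlers={"notify": lambda payload: calls.append(payload) or "sent"},
    allowlist={"notify"},
)
output = close_phase_engine({"run_id": "run-1", "phase": "record-and-calibrate"})
first = runner.run(output.side_effects[0], transition_committed=True)
second = runner.run(output.side_effects[0], transition_committed=True)
```

Running the same request twice invokes the handler once; the second call returns the stored record. The engine itself never touches RunState or the outside world.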

Acceptance Gates

Before runtime implementation becomes authoritative:

  1. RunContract core remains free of GitHub/PR/Ragna/Discord/kapi-agent semantics.
  2. Runtime state schemas and event replay behavior are defined.
  3. Phase/runtime schema details are owned by #198.
  4. Preset semantics and graph-execution component boundaries are owned by #199.
  5. Worker adapter contracts cover at least Codex, Pi, and Claude Code compatibility assumptions.
  6. .ilchul storage behavior is explicit and does not silently mutate existing .kapi state.
  7. Objective/evaluation scores remain advisory unless a separate issue authorizes stronger gates.
  8. RewardRecord and PolicyHint values cannot silently change behavior without a recorded PolicySelection decision.
  9. Human-approved calibration is required for objective-weight changes.
  10. Runtime MVP reaches #196 (DAG runtime Phase 3: worker execution, heartbeat, and evidence-gated completion) before learning/policy claims rely on real execution evidence.
  11. Verification matrix exists before implementation PRs claim runtime readiness.
  12. MVP implementation demonstrates the minimality rule: phase engines are thin and uniform, while shared runtime services own state, gate, evidence, event, and persistence authority.
  13. Issue tiering is respected: closed design records guide language, shallow-first issues do not introduce autonomous behavior, and post-MVP issues do not block #196.
  14. External actions are not hidden inside PhaseEngines; they are represented as SideEffectRequest records and executed through SideEffectRunner after transition commit.

Open Questions

These should become issues only if the need remains after active design tracks mature.

  1. Security / permission / sandbox policy: worker permissions, command allow/deny, secret redaction, network policy, evidence sanitization.
  2. Observability / operator UX: status, graph, workers, events, evidence, policy decision, reward history, recovery hints.
  3. Human override / approval governance: policy override, objective calibration approval, repair budget extension, audit trail.
  4. Rollout / compatibility plan: feature flags, dry-run mode, opt-in runtime, default-flip gates, existing Kapi compatibility.
  5. Benchmark / evaluation corpus: fixtures for good/bad/repair-heavy/conflict/stale-worker/policy-calibration runs.
  6. Scheduler fairness / resource budgeting: starvation, task priority, max parallelism, token/tool budgets, backpressure.
  7. Failure taxonomy / recovery runbook: readiness timeout, stale lease, worker crash, corrupted state, failed integration, bad reward event, adapter unavailable.

Current Next Actions

  1. Start with Step 1: implement the thin runtime spine from #185 (RunState schemas, versions, and migration policy) and #186 (runtime event taxonomy, replay, and recovery semantics).
  2. In parallel or immediately after, complete Step 2: the adapter/substrate contract in #188.
  3. Drive runtime MVP in order: #194 (Phase 1: task graph and readiness) -> #197 (Phase 2: claim, lease, and stale ownership) -> #196 (Phase 3: worker execution, heartbeat, and evidence-gated completion).
  4. Use #191 to lock the verification matrix before claiming runtime readiness or changing defaults.
  5. Keep #189 (reward calculation, penalties, PolicyHint, and calibration) and #187 (policy simulator features, estimators, and exploration safety) shallow-first until #196 produces real runtime evidence.
  6. Keep #190 (integration, repair, and supersession semantics) and #195 (Phase 4: IntegrationCandidate and repair loop) post-MVP unless integration/repair becomes necessary to complete #196 evidence.

Verification

Changelog
