This document defines the Ilchul runtime storage/configuration surface for issue #169. The implementation now routes active workflow and worker storage to `.ilchul` / `~/.ilchul`; legacy `.kapi` folders are preserved only as historical local state and are not active fallback roots.
- Define the canonical `.ilchul` runtime shape without deleting or renaming existing `.kapi` state.
- Define a portable adapter configuration surface for Codex, Pi, Claude Code, and future worker substrates.
- Define worker retention states so supervisors can distinguish audit retention from safe cleanup and stale leaks.
- Identify additive implementation slices that can be reviewed independently.
- No broad `kapi -> ilchul` rename.
- No filesystem mutation in this design issue.
- No destructive `.kapi` migration or local-folder cleanup.
- No automatic tmux/worktree deletion without an explicit safe cleanup command.
- No runtime plugin system or dynamic adapter loading authority.
The canonical active root is `.ilchul/` inside the supervised workspace, with user-level defaults under `~/.ilchul/`. The naming policy in `docs/ilchul-naming-policy.md` remains authoritative for storage safety:
- new runtime state is written under `.ilchul/`;
- default worker worktrees are created under `~/.ilchul/worktrees`;
- existing `.kapi` state must be preserved but is not detected as an active fallback root;
- both-present roots use `.ilchul` for normal routing and must not trigger implicit cleanup.
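
The routing policy above can be sketched as a pure function. This is a minimal illustration, not the project's implementation: `StorageRoots` and `resolve_storage_roots` are hypothetical names, and the function only inspects paths without creating, reading, or deleting anything.

```python
from dataclasses import dataclass
from pathlib import Path

ACTIVE_ROOT_NAME = ".ilchul"
LEGACY_ROOT_NAME = ".kapi"


@dataclass(frozen=True)
class StorageRoots:
    """Hypothetical result type for this sketch."""
    active_root: Path     # always <workspace>/.ilchul, even when .kapi exists
    legacy_present: bool  # diagnostic flag only; never used for routing
    worktree_root: Path   # default location for worker worktrees


def resolve_storage_roots(workspace: Path, home: Path) -> StorageRoots:
    """Select roots without treating .kapi as a fallback and without mutating disk."""
    return StorageRoots(
        active_root=workspace / ACTIVE_ROOT_NAME,
        legacy_present=(workspace / LEGACY_ROOT_NAME).is_dir(),
        worktree_root=home / ACTIVE_ROOT_NAME / "worktrees",
    )
```

A workspace that contains only `.kapi` still resolves to `.ilchul` as its active root; the legacy folder is reported and left untouched.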
```text
.ilchul/
  active.json
  config.json
  runs/
    <run-id>/
      run-contract.json
      objective.json
      policy-selection.json
      task-graph.json
      workers.json
      claims.json
      events.jsonl
      evidence/
      evaluations/
      integration/
      learning-summary.md
  learning/
    reward-ledger.jsonl
    policy-hints.json
    strategy-stats.json
    simulator-calibration.json
  migrations/
    storage-root.json
    recovery.log
```
| File or directory | Responsibility | Authority |
|---|---|---|
| `active.json` | Points at the active non-terminal run, if any. | Runtime pointer only; not a deletion authority. |
| `config.json` | Workspace-local adapter/substrate/defaults config. | Configuration input after validation. |
| `runs/<run-id>/run-contract.json` | RunContract projection/contract snapshot for the run. | Run meaning and evidence expectation source. |
| `runs/<run-id>/objective.json` | Objective/evaluation intent. | Advisory until a later issue grants stronger authority. |
| `runs/<run-id>/policy-selection.json` | Records why an execution strategy was chosen. | Audit trail for strategy choice. |
| `runs/<run-id>/task-graph.json` | DAG task ids, dependencies, attempts, and statuses. | Task readiness source after scheduler implementation. |
| `runs/<run-id>/workers.json` | Worker lifecycle/retention state. | Supervisor inspection and safe-cleanup input. |
| `runs/<run-id>/claims.json` | Claim tokens, leases, and claim owner records. | Duplicate-execution guard after scheduler implementation. |
| `runs/<run-id>/events.jsonl` | Append-only runtime event stream for replay/recovery. | Recovery/audit evidence; malformed events fail closed. |
| `evidence/` | Evidence refs and command/artifact proof. | Completion support; exact authority depends on workflow contract. |
| `evaluations/` | Objective/evaluator outputs. | Advisory unless a future design grants hard gates. |
| `integration/` | Merge/repair/integration records. | Explicit integration state, not hidden mutation. |
| `learning/` | Cross-run reward and policy-learning data. | Learning input only; policy changes must be recorded in `policy-selection.json`. |
| `migrations/` | Storage-root selection and recovery records. | Migration audit evidence; cleanup still requires explicit authorization. |
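
The "malformed events fail closed" rule for `events.jsonl` can be illustrated with a small reader. This is a sketch under assumptions: `read_events` and `MalformedEventError` are hypothetical names, and the event schema itself is not defined here, so the reader only checks that each line is a JSON object.

```python
import json
from pathlib import Path


class MalformedEventError(ValueError):
    """Raised when the event stream cannot be trusted for replay (hypothetical)."""


def read_events(path: Path) -> list[dict]:
    """Read every event, or raise on the first bad line instead of skipping it."""
    events = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            try:
                event = json.loads(line)
            except json.JSONDecodeError as exc:
                # Fail closed: a partial replay could hide a lost claim or worker.
                raise MalformedEventError(f"line {line_number}: {exc}") from exc
            if not isinstance(event, dict):
                raise MalformedEventError(f"line {line_number}: expected a JSON object")
            events.append(event)
    return events
```

Raising rather than skipping means recovery never proceeds from a silently truncated history; the caller decides how to report the failure.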
The future config should be JSON-serializable, validated before use, and conservative by default:
```json
{
  "schemaVersion": 1,
  "storage": {
    "root": ".ilchul",
    "legacyRootPolicy": "preserve-without-active-fallback"
  },
  "adapters": {
    "codex": { "enabled": true, "defaultSubstrate": "tmux" },
    "pi": { "enabled": true, "defaultSubstrate": "tmux" },
    "claudeCode": { "enabled": false, "defaultSubstrate": "process" }
  },
  "runtime": {
    "worktreeRoot": ".ilchul/worktrees",
    "maxWorkers": 3,
    "readinessTimeoutSeconds": 60,
    "leaseDurationSeconds": 900,
    "heartbeatTimeoutSeconds": 120
  },
  "retention": {
    "completedDefault": "completed-retained",
    "safeCleanupRequiresCommand": true,
    "retainLogsDays": 7
  },
  "verification": {
    "defaultDepth": "standard",
    "requireEvidenceForTaskCompletion": true
  }
}
```

Config rules:
- Unknown adapters are rejected or ignored with a warning until a scoped adapter issue accepts them.
- `worktreeRoot` must stay inside the trusted workspace or approved user-level Ilchul root.
- `maxWorkers`, lease, heartbeat, and readiness values must have bounded minimums/maximums.
- `safeCleanupRequiresCommand` defaults to `true`; normal status/report/start commands must not delete sessions, worktrees, branches, or storage roots.
- Learning and objective fields may advise strategy, but selected runtime behavior must be recorded in `policy-selection.json`.
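
Two of these rules, the adapter allow-list and the bounded runtime values, can be sketched as a validator. The numeric bounds in `RUNTIME_BOUNDS` are illustrative assumptions, not values from this design, and `validate_config` is a hypothetical name. The sketch takes the "rejected" branch for unknown adapters and does not cover the `worktreeRoot` containment check.

```python
from typing import Any

KNOWN_ADAPTERS = {"codex", "pi", "claudeCode"}

# Assumed (minimum, maximum) bounds; the design only requires that bounds exist.
RUNTIME_BOUNDS = {
    "maxWorkers": (1, 16),
    "readinessTimeoutSeconds": (5, 600),
    "leaseDurationSeconds": (60, 86400),
    "heartbeatTimeoutSeconds": (10, 3600),
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return validation errors; an empty list means the checked rules pass."""
    errors = []
    if config.get("schemaVersion") != 1:
        errors.append("schemaVersion must be 1")
    for name in config.get("adapters", {}):
        if name not in KNOWN_ADAPTERS:
            errors.append(f"unknown adapter: {name}")
    runtime = config.get("runtime", {})
    for key, (low, high) in RUNTIME_BOUNDS.items():
        value = runtime.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"runtime.{key} must be an integer in [{low}, {high}]")
    return errors
```

Returning a list of errors instead of raising on the first one lets a read-only `doctor`-style surface report every problem in a single pass.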
Worker state is separate from task status. A run can be terminal while one or more worker sessions remain retained for audit.
```text
active
  -> completed-retained
  -> safe-to-close
  -> cleanup-released
  -> closed

active
  -> stale-registry
  -> safe-to-close
```

| State | Meaning | Supervisor action |
|---|---|---|
| `active` | Worker is assigned to non-terminal work or still expected to produce output. | Do not close. Inspect heartbeat/readiness first. |
| `completed-retained` | Work is terminal but the session/log remains intentionally available for audit. | Safe to leave alone; report as expected retention. |
| `safe-to-close` | Worker is terminal, unretained, and no active claim depends on it. | Eligible for explicit safe cleanup command. |
| `stale-registry` | Registry says the worker exists, but runtime evidence is missing, contradictory, or expired. | Report recovery/cleanup options; do not guess ownership. |
| `cleanup-released` | A safe cleanup command has released the retention hold or requested shutdown. | Await closure confirmation; preserve audit event. |
| `closed` | Runtime handle is gone and closure was observed. | Keep metadata for history; no runtime action needed. |

Transition rules:
- Active workers cannot become `safe-to-close` while they hold an unexpired claim or pending evidence obligation.
- Completed workers default to `completed-retained` when audit inspection is useful.
- Only an explicit safe cleanup command may move `completed-retained` or `safe-to-close` toward `cleanup-released`.
- `stale-registry` is a diagnostic state, not proof that deletion is safe.
- Cleanup must target Kapi/Ilchul-owned handles only and must not delete user-owned worktrees or branches.
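
The transition rules can be sketched as a guarded lookup table. The edge set in `ALLOWED_TRANSITIONS` is one reading of the state diagram and the rules above, not a confirmed specification; `TransitionContext` and `can_transition` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed edge set, derived from the diagram and transition rules in this section.
ALLOWED_TRANSITIONS = {
    "active": {"completed-retained", "safe-to-close", "stale-registry"},
    "completed-retained": {"safe-to-close", "cleanup-released"},
    "safe-to-close": {"cleanup-released"},
    "stale-registry": {"safe-to-close"},
    "cleanup-released": {"closed"},
    "closed": set(),
}


@dataclass(frozen=True)
class TransitionContext:
    has_unexpired_claim: bool = False
    has_pending_evidence: bool = False
    explicit_safe_cleanup: bool = False  # True only inside a safe cleanup command


def can_transition(current: str, target: str, ctx: TransitionContext) -> bool:
    """Return True only if the edge exists and its guard is satisfied."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        return False
    if current == "active" and target == "safe-to-close":
        # Active workers may not become closable while obligations remain.
        return not (ctx.has_unexpired_claim or ctx.has_pending_evidence)
    if target == "cleanup-released":
        # Only an explicit safe cleanup command may release a worker.
        return ctx.explicit_safe_cleanup
    return True
```

Note that `stale-registry` has no edge to `cleanup-released` or `closed` in this sketch, which reflects the rule that it is a diagnostic state rather than proof that deletion is safe.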
Safe cleanup may close an unretained terminal tmux/process handle after ownership is verified. It must not:
- delete `.kapi`, `.ilchul`, worktrees, branches, evidence, learning ledgers, or run directories;
- kill active or uncertain workers;
- collapse both-present `.kapi`/`.ilchul` migration decisions into cleanup behavior;
- run as a side effect of status, report, doctor, verify, or workflow start.
Destructive cleanup requires a separate issue, explicit scope, rollback/recovery notes, and Kade authorization for that slice.
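
The safe cleanup boundary can be sketched as an eligibility check that returns a decision and performs no action itself. `WorkerHandle` and `may_close_handle` are hypothetical names, and modelling the invoking command as a plain string is an assumption made for illustration.

```python
from dataclasses import dataclass

SAFE_CLEANUP_COMMAND = "cleanup --safe"
CLOSABLE_KINDS = {"tmux", "process"}  # runtime handles only, never storage or worktrees


@dataclass(frozen=True)
class WorkerHandle:
    state: str             # worker retention state, e.g. "safe-to-close"
    owned_by_ilchul: bool  # True only after ownership has been verified
    kind: str              # "tmux" or "process" in this sketch


def may_close_handle(handle: WorkerHandle, command: str) -> bool:
    """Decide whether a handle is eligible for closing; never closes anything."""
    if command != SAFE_CLEANUP_COMMAND:
        return False  # status, report, doctor, verify, and start never clean up
    if not handle.owned_by_ilchul:
        return False  # do not touch user-owned or unverified handles
    if handle.kind not in CLOSABLE_KINDS:
        return False  # worktrees, branches, and directories are out of scope
    return handle.state == "safe-to-close"  # active or uncertain workers are kept
```

Keeping the decision separate from the action makes it possible to surface the same answer through read-only report surfaces without any risk of side effects.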
- Add read-only legacy `.kapi` diagnostics without using it as a routing root.
- Add config schema validation and a `config.json` reader with bounded defaults.
- Persist worker retention states and expose them through read-only report/doctor/status surfaces.
- Add explicit `cleanup --safe` behavior for verified terminal unretained runtime handles only.
- Add migration recovery records only if a future issue authorizes importing or archiving legacy `.kapi` state.
- Design any destructive legacy cleanup as a separately authorized migration slice.
- The `.ilchul` layout documents `.kapi` as legacy local state rather than an active fallback root.
- Adapter config covers enabled adapters, default substrates, worktree root, worker caps, readiness timeout, lease duration, cleanup/retention defaults, and verification depth.
- Worker retention distinguishes active, completed-retained, safe-to-close, stale-registry, cleanup-released, and closed.
- Safe cleanup is explicit and non-destructive.
- Follow-up implementation slices are additive and independently reviewable.