feat: add task claim leases#233
Conversation
|
@kapi-agent review Revision explanation for current head What changed:
Why this closes the issue:
Verification:
|
kapi-agent review completedThe formal GitHub PR review and required check were submitted separately. This request comment is kept concise to avoid duplicating the full review body. kapi-agent request lifecycle:
|
There was a problem hiding this comment.
kapi-agent review
Verdict: APPROVE
Final approval summary
Review journey
This PR adds phase-2 TaskGraph claim/lease lifecycle support. The semantic scope is bounded to the domain layer plus focused tests and README boundary wording: no worker dispatch, tmux/worktree runtime wiring, or durable event-envelope persistence is included.
There was no prior kapi-agent review on this head. The author provided a current-head revision explanation, and the change stays within the configured size threshold.
What changed
src/domain/task-graph.ts- Adds
ClaimLease, deterministic lease token creation, claim metadata, renew/release/recover operations, active-claim gating, and claim/lease runtime event shapes.
- Adds
test/task-graph.test.ts- Adds regression coverage for deterministic tokens, non-ready claim rejection, duplicate claim rejection, renew/release, expired completion blocking, explicit recovery, and event records.
README.md- Updates graph-execution boundary wording to include claim/lease ownership, stale-claim recovery, and claim events.
Why this is correct
The implementation enforces the key ownership contract: only ready tasks can be claimed, existing claims cannot be overwritten by normal claim flow, completion requires a matching unexpired claim token, and expired work is recoverable only through recoverExpiredClaim(). The recovery path preserves lineage through recoveredFromToken and increments attempts, which supports later worker/runtime phases without coupling this domain slice to dispatch or persistence.
Evidence
- Verifier gate: PASS —
npm ci && npm run verifyexited 0. - Size gate: PASS — 179 changed lines, below the 200-line semantic review threshold.
- Revision-explanation status: not required, but found.
- Ilchul harness gate: PASS — STANDARD risk, no blocking findings.
- Inspected files:
src/domain/task-graph.ts,test/task-graph.test.ts,README.md.
Remaining risks and approval rationale
Remaining risks are bounded and documented: durable event-envelope persistence and live worker dispatch are intentionally deferred, and deterministic tokens are not cryptographic secrets. These are acceptable for this domain-only phase because callers can provide explicit opaque tokens later, and the current contract is covered by focused tests plus the full verification suite.
Blocking issues
None.
Warnings / risks
src/domain/task-graph.ts:createClaimLeaseRuntimeEvent("claim.recovered", lease)can emitrecoveredFromToken: ""if called with a non-recovery lease. This is not blocking for the current PR becauserecoverExpiredClaim()creates the expected metadata and tests cover the normal path, but a future hardening pass could fail closed for malformed recovered events.- Durable persistence and worker dispatch are still out of scope, so runtime consumers must not assume these domain events are automatically persisted or emitted by workers yet.
Suggestions
- Consider adding a small test that
recoverExpiredClaim()rejects a new lease expiry that is not afternow, since that behavior is currently delegated throughcreateClaimLease(). - Consider making
createClaimLeaseRuntimeEvent("claim.recovered", ...)requirerecoveredFromTokenat runtime instead of defaulting to an empty string.
Looks good
- The explicit recovery path cleanly separates stale-claim recovery from normal duplicate-claim rejection.
- Completion is consistently gated through
requireActiveClaim(), reducing duplicated token/expiry logic. - Tests cover the main lifecycle transitions and regression-sensitive acceptance criteria.
- README boundary wording remains aligned with the actual implementation scope.
Verification notes
Verifier gate status: PASS — npm ci && npm run verify exited 0.
Size gate status: PASS — 179 changed lines below threshold.
Revision-explanation status: not required, found.
Ilchul review harness: PASS — STANDARD risk profile, no blocking findings.
Engine: pi
Summary
Linked issue
Closes #197
Problem
Issue #197 asks for DAG runtime phase 2: ready tasks must be claimable by exactly one worker, active leases must gate normal completion, expired leases must become recoverable only through an explicit path, and claim/lease transitions need event records.
Before this PR,
src/domain/task-graph.tshad a basicTaskClaimshape and completion token check, but the lease lifecycle was incomplete: token generation was caller-only, renew/release/recovery were absent, expired claims could not be represented as first-class recovery transitions, and claim/lease events were not exposed.Options considered
claimTask()/completeTask()checks and document recovery as future work.TaskGraph.Selected approach
Selected option: 2 — focused domain-only claim/lease lifecycle.
Why this one: phase 2 is about the execution contract, not live worker dispatch. Keeping it in the domain layer gives later worker/runtime phases a stable surface for ownership semantics while preserving the phase boundary.
Risks/trade-offs: the token generator is deterministic and domain-local, not a cryptographic secret generator. Callers that need stronger entropy can still provide an explicit token. Event records are lightweight domain records, not persisted
RuntimeEventEnvelopes.Implementation by file/surface
src/domain/task-graph.tsClaimLeaseand optional claim metadata (claimedAt,recoveredFromToken).createClaimLease()and deterministiccreateClaimLeaseToken().claimTask()to validate graph shape, claim onlyreadytasks, and reject any existing claim unless the explicit recovery path is used.renewClaimLease(),releaseClaim(), andrecoverExpiredClaim().completeTask()gated by a matching unexpired claim token and evidence refs.test/task-graph.test.tsREADME.mdWhy this fixes it
Ready tasks now move into claimed ownership with a lease token. Pending, blocked, and completed tasks reject claims; duplicate claims reject; completion without a matching unexpired token rejects; expired leases block normal completion; and stale work can be recovered only via
recoverExpiredClaim(). The domain also exposes transition event records for the phase-2 claim lifecycle.QA / Verification
npm ci— pass; installed local dependencies becausetsxwas absent in this fresh worktree.npm test -- test/task-graph.test.ts— pass; package script ran the fulltest/*.test.tssuite plus the explicit file argument (532tests,521pass,11skipped).npm run check— pass.npm run check:unused— pass.npm run quality:budgets— pass with configured non-failingcode_smellswarning.npm run verify— pass (532tests,521pass,11skipped; then check, unused check, quality budgets).git diff --check— pass.Anomalies observed
npm test -- test/task-graph.test.tsdoes not run only the named file becausepackage.jsondefinestsx --test test/*.test.ts; it executed the full test suite.[DEP0205] module.register()deprecation warnings during tests.code_smells=61; I reduced one source-level smell from the new code and reran the gates successfully withcode_smells=60.npm run quality:budgetsretains the existing warning:code_smells=60vs target<=20; the command exits 0 under the configured budget policy.Risks / Follow-up
kapi-agent review expectations and current-head merge gate
e9026e46dc51420052629d974823363ff6e7237a(full SHA available from PR metadata).origin/dev(168insertions,11deletions), under the review-size gate.kapi-agentshould verify claim lifecycle semantics, explicit stale recovery, event records, and README alignment before merge.