m0n0x41d · May 13, 2026 · May 4, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+## [7.1.0] — 2026-05-13
+
+Maintenance release on top of 7.0.0. Adds an explicit CLI completion path for externally-run WorkCommissions, lays the foundation of the v8 agent stack alongside (not replacing) the v7 production paths, fixes a silent-misroute defect on `haft_decision(measure|baseline|apply)` that corrupted the artifact graph under typical LLM-client usage, and refreshes the embedded FPF corpus.
+
+### Added
+
+- **`haft commission complete-external <wc-id>` CLI** ([#78](https://github.com/m0n0x41d/haft/issues/78), [#79](https://github.com/m0n0x41d/haft/pull/79)) — operator path to close a WorkCommission lifecycle after an external runner (anything outside Haft Harness) has produced local runtime evidence. The command auto-records `start_after_preflight` if the WC is still `preflighting`, then records the terminal `complete_or_block` lifecycle event. Accepts `--verdict completed|pass|failed|blocked` and inline (`--payload '{...}'`) or file-based (`--payload-file evidence.json` / `--payload @evidence.json`) evidence payloads. Refuses queued/terminal/non-running states explicitly rather than abusing `cancel`; does NOT apply, merge, or publish workspace diffs (those remain on `haft harness apply`). Before this, downstream `depends_on` WorkCommissions stayed blocked unless the operator had access to hidden MCP/tool-only lifecycle calls. Contributed by @karabelaselias.
+- **v8 agent stack — foundation M1 (`internal/agentcore`, `internal/agentproto`, `internal/agentstore`)** — Layers G2/P/G3 of the new agent runtime planned in `.context/v8_plan.md` §2. `agentcore` defines pure algebraic Session/Turn/Part/Permission/SubAgentLink/ModelChoice types with sealed sum variants, opaque typed IDs (SessionID, TurnID, PartID, PermissionID), and transitions that always return new Session values (no field mutation). Mutating an existing Turn/Part, recording a Part without a Turn, completing a terminal Turn, attaching a SubAgent without naming its parent Turn, or resolving a never-requested Permission are inexpressible by construction. `agentproto` is the wire format shared with the upcoming TS TUI: 18 AgentEvent variants (session.\*, turn.\*, part.\*.delta, part.tool_use.\*, permission.\*, subagent.\*, model.switched, auth.expired) and 9 RPCCommand verbs. Tagged-envelope JSON with `kind` discriminator; `timeStamp` wrapper pins RFC3339Nano UTC; unknown kinds rejected with typed errors so forward-compat surface failures stay at the boundary. `agentstore` is an append-only per-session JSONL journal at `<store_root>/<id>/events.jsonl` with `meta.json` index. Pure Apply/Replay reconstruct a Session byte-for-byte; streaming deltas are wire-only via `IsJournalEvent` predicate so the journal stays compact. M1 acceptance bar — 1000 mixed events through both pure replay and disk round-trip producing `reflect.DeepEqual` Sessions — is asserted by `TestStore_Replay1000ViaDisk`. Coexists with legacy `internal/agent` / `internal/agentloop`; no v7 production code path is touched.
+- **v8 agent stack — foundation M2 (`internal/agentserver`, `internal/agentdriver`)** — Layers G4/G5: HTTP+SSE transport and a real turn driver. `agentserver` exposes RPC verbs as POST endpoints (`/session`, `/session/:id/turn|cancel|rename|model`, `/permission/:id`), reads as GET (`/session`, `/session/:id`, `/healthz`), and fans every published AgentEvent to subscribers through a single `/event/global` SSE channel that filters by `session_id` on the client side. Server binds `127.0.0.1:0` and reports the chosen port back for env-var handoff to the TUI process. Pluggable `Dispatcher` interface keeps transport and engine independent; `StoreDispatcher` (test/no-LLM path) and `DriverDispatcher` (real engine) both implement it. `agentdriver`'s `Driver.Drive(ctx, Session, userText)` opens a turn, streams the Provider's events, dispatches tools through `ToolDispatcher`, synchronously gates permission-required tools through a `PermissionGate` (per-driver sync primitive — driver Opens, blocks on chan, HTTP handler Resolves on operator POST; ctx cancellation cleans up pending entries with no leaked channels), and finishes with `turn.completed` or `turn.failed`. Pure orchestration: no global state, no implicit clock, no implicit ID source — `IDGen` and `Now` injected for deterministic tests. `Provider`, `ToolDispatcher`, `EventSink` interfaces decouple from real LLM clients; production wiring for `internal/provider` and `internal/tools` lands in a later slice. `CombinedSink` wraps Store + Hub: state-mutating events go through `agentstore.Append` (journaled) AND publish to Hub (broadcast); streaming deltas are broadcast-only via the `IsJournalEvent` gate. M2 acceptance bar from `.context/v8_plan.md` — `curl /event/global` shows live stream during `haft agent` — is asserted by integration tests covering the full path (POST /session → SSE session.created → POST /turn → SSE turn.started/text deltas/tool_use/turn.completed) plus journal replay producing the same state via `Store.Load`. Deliberately deferred to next slice: materialized assistant TextPart/ReasoningPart events, in-driver SubAgent runner, GET /auth/status, real provider/tool adapters. Legacy `internal/agent` / `internal/agentloop` paths stay alive unchanged in parallel.
+- **v8 agent driver code-review hardening pass** — multiple `fix(review)` follow-ups landed against the M1/M2 foundation before release: cancel-aware turn completion with stale-cancel HTTP status differentiation, `store.Load` serialized against concurrent `Append` writes, permission errors mapped to HTTP 400/404 instead of generic 500, wire-safe `SessionPayload` serialization on resume, `model.set` serialized against `turn.submit` Load, `ErrTurnAlreadyRunning` mapped to 409 Conflict, deltas flushed on provider error with `model.set` rejected mid-turn, replayed running turns rejected synchronously, per-session Append serialized with safe Hub cancel, tool args sent as raw JSON instead of base64, turn-failed on tool/flush errors with assistant text/reasoning journaled, journal append validation with deduped streamed part IDs, tightened permission validation with cancel matched to turn ID and shared hub, `turn.started` ordered after `StartTurn` with closed-stream-plus-canceled-ctx treated as canceled, Codex P1/P2 driver findings addressed. Net effect: the v8 foundation merges with the wire-protocol invariants verified under concurrent and adversarial conditions, not just happy-path tests.
+
+### Fixed
+
+- **`haft_decision(measure|baseline|apply)` silently misrouted on `artifact_ref`** ([#77](https://github.com/m0n0x41d/haft/issues/77)) — when an LLM client passed `artifact_ref` (the universal target key in `haft_refresh`, and the only documented ref on `haft_decision(evidence)`), the explicit ID was silently dropped and the call resolved to whichever DecisionRecord came back first from `store.ListByKind(KindDecisionRecord, 1)` — most commonly the most-recently-touched one. Baselines snapshotted the wrong files; measurements landed against the wrong claim chain; `haft_refresh action=scan` still flagged the intended target as `no baseline`. The reporter saw the stale-scan count drop from 42 to 27 in one round after manually switching their integration to `decision_ref`. Both serve-mode (`internal/cli/serve.go`) and tools-mode (`internal/tools/haft.go`) handlers had the same defect. Fix: `measure` and `baseline` now accept either `decision_ref` or `artifact_ref` and refuse to proceed without one — the silent `ListByKind(...,1)` fallback is gone for these two actions because corrupting authoritative state is worse than refusing to act. `apply` accepts both keys and keeps the auto-detect fallback since it is a read-only "generate brief" path with no persistent side effect. The FPF-guardrail `bindDecisionRef` helper on the tools path also honors the alias so guarded flows are not bypassed. Schema descriptions for `decision_ref` and `artifact_ref` updated to list every action that accepts each key, so future LLM clients see the right map at registration time. Three regression tests pin the bug shape: two DecisionRecords exist, the caller names the older one via `artifact_ref`, and the test fails if the implementation reaches for the newer one; a third test guards the new "no ref provided" guidance path.
+- **`install.sh` picked the wrong archive on CLI installs** — install script could grab the desktop tarball over the CLI archive depending on release asset ordering; now selects the CLI archive deterministically.
+
+### Chore
+
+- **FPF corpus refresh to `ee40821c`** — submodule `data/FPF` bumped from `b18acde` through seven upstream commits to `ee40821` ("formatting for GitHub", 2026-05-12). Substantive content additions, not housekeeping: a new architectural region for causal evidence (C.27 Causal-use calculus + C.28 `CounterfactualSamplingRealizabilityProfile` with controlled `CausalEvidenceSupportBasis` vocabulary: observational / interventional / realized-counterfactual / identified-estimate / simulation-only) and a new A.15 cluster member A.15.4 "Work-Relevant Source Restoration" governing the recover-source-before-reliance step when an encountered item (dashboard, generated explanation, credential view, projection output, copied approval, schema/API wording, composed source chain) is about to support a work or reliance claim by appearance. Two terminology cleanups (A.6.P boundary norm square + counterfactuality) refine wording that the embedded index surfaces automatically via search; the skill prompt cites pattern IDs only, so no skill-side rewording is needed for those. Index regenerated via `task fpf-refresh`: indexed_chunks 4972 → 5062 (+90; 4996 spec + 66 patterns), fpf_commit matches submodule HEAD. Index and submodule pointer move together — the release workflow rebuilds the index from the locked submodule SHA on tag, so a drift between these two files would surface as either stale search results in dev or a mismatch on next release build.
+- **`h-reason` skill prompt minor update for new FPF concepts** — two changes:
+  - New cross-cutting pattern `X-SOURCE-RESTORATION` (A.15.4) added to `internal/fpf/patterns/cross-cutting.md` and surfaced in the skill floor's Cross-cutting block. The detection rule "Object ≠ Description ≠ Carrier" was already in the skill; the operational rule "before reliance, recover the project source that makes the action admissible" was not. Practical sweep targets named explicitly include dashboards, generated explanations, credential views, projection outputs (`/h-view brief|rationale|audit`), copied approvals, provenance labels, schema/API text, and composed source chains. Anti-pattern: skipping restoration because the carrier "looks authoritative" — the failure A.15.4 names.
+  - `DEC-06 Predictions` extended with one sentence: predictions are causal claims; check realizability (can you sample the target distribution under physical / ethical / operational constraints?) before committing them as acceptance gates. Pointer to C.27 / C.28 added so the agent can pull the full calculus on demand without bloating the L1 floor.
+- **Drop `darwin-amd64` (Intel Mac) from the CLI release matrix** — no longer built.
+
 ## [7.0.0] — 2026-04-29
 
 v7 promotes specs to authoritative artifacts. The product is no longer "decision governance plus task execution"; it is **project harnessability**. A repository becomes harnessable only after it carries a parseable ProjectSpecificationSet (TargetSystemSpec + EnablingSystemSpec + TermMap), and Decisions / WorkCommissions / RuntimeRuns / Evidence flow downstream as consequences of that spec. The product surface model is also clearer: one Haft Core (semantic authority) under two production surfaces — MCP Plugin (embedded host-agent surface for Claude Code and Codex) and CLI Harness (operator/runtime surface). Desktop remains an alpha track and is not part of the v7 production envelope. Surfaces dispatch typed actions; they do not invent semantics.
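
Review note on the `artifact_ref` fix listed under 7.1.0 "Fixed": the changed behavior for `measure` and `baseline` (accept either key, refuse when neither is present) might be sketched roughly as below. This is an illustration only. `resolveDecisionRef`, `ErrNoRef`, and the flat `map[string]string` argument shape are hypothetical names, not taken from `internal/cli/serve.go` or `internal/tools/haft.go`. The precedence of `decision_ref` over `artifact_ref` when both are supplied is also an assumption the changelog does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoRef is a hypothetical error returned when the caller names no target.
var ErrNoRef = errors.New("decision_ref or artifact_ref is required")

// resolveDecisionRef is a hypothetical helper, not Haft's actual code.
// It accepts decision_ref, honors artifact_ref as an alias, and refuses
// to guess a target when neither key is present.
func resolveDecisionRef(args map[string]string) (string, error) {
	if ref := args["decision_ref"]; ref != "" {
		return ref, nil
	}
	if ref := args["artifact_ref"]; ref != "" {
		return ref, nil
	}
	// No silent "first record from the store" fallback: refusing is
	// safer than writing a baseline against the wrong DecisionRecord.
	return "", ErrNoRef
}

func main() {
	// The caller names the older record through the alias key.
	ref, err := resolveDecisionRef(map[string]string{"artifact_ref": "dec-older"})
	fmt.Println(ref, err)

	// No ref at all: the call is refused instead of auto-detected.
	_, err = resolveDecisionRef(map[string]string{})
	fmt.Println(err)
}
```

Under this sketch, `apply` would be the only action that may still fall back to auto-detection, since the changelog describes it as a read-only path.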

diff --git a/README.md b/README.md
@@ -230,6 +230,7 @@ haft commission list --selector stale
 haft commission show wc-...
 haft commission requeue wc-... --reason stale_operator_recovery
 haft commission cancel wc-... --reason no_longer_relevant
+haft commission complete-external wc-... --runner external-runner --reason external_runtime_succeeded --payload-file runtime-evidence.json
 haft commission list-runnable
 haft commission claim wc-...
 ```

diff --git a/data/FPF b/data/FPF
diff --git a/internal/agentcore/doc.go b/internal/agentcore/doc.go
@@ -0,0 +1,25 @@
+// Package agentcore is Layer G2 of the v8 agent stack: pure algebraic types
+// for Session/Turn/Part/Permission/SubAgentLink/ModelChoice and the
+// pure transitions that move a Session from one state to the next.
+//
+// All values in this package are immutable. Every transition function takes
+// a Session and returns a NEW Session — no field mutation, no shared slice
+// state. Errors are returned, never thrown. Side effects (disk, network,
+// time) are forbidden here; they live at G0/G1/G3/G5.
+//
+// This package coexists with the legacy [internal/agent] package during the
+// v8 migration. Legacy types remain authoritative for the current coordinator
+// (internal/agentloop). Once M2 cuts the coordinator over to G4, legacy
+// agent.Session/Message will be deprecated.
+//
+// Inexpressible (by design):
+//   - Mutating an existing Turn or Part.
+//   - Recording a Part without a Turn.
+//   - Recording a Turn without a Session.
+//   - Resolving a Permission that was never requested.
+//   - Completing a Turn that is already complete.
+//   - Attaching a SubAgent without naming the parent Turn.
+//
+// Each is rejected by the type system (sealed interfaces, opaque IDs) or by
+// the transition function returning a typed error.
+package agentcore
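
Review note on the package doc above: the "every transition returns a NEW Session" rule and the "completing a Turn that is already complete" rejection could look roughly like the following. This is a minimal sketch with simplified stand-in types. The real `Session` and `Turn` fields, the `CompleteTurn` name, and both error values are assumptions, since the diff shown here does not include the transition code.

```go
package main

import (
	"errors"
	"fmt"
)

type TurnID string

// Turn and Session are simplified stand-ins for the agentcore types.
type Turn struct {
	ID   TurnID
	Done bool
}

type Session struct {
	Turns []Turn
}

// Hypothetical typed errors returned by the transition.
var (
	ErrTurnAlreadyComplete = errors.New("turn already complete")
	ErrUnknownTurn         = errors.New("unknown turn")
)

// CompleteTurn returns a new Session with the named turn marked done.
// The input Session and its backing slice are never modified.
func CompleteTurn(s Session, id TurnID) (Session, error) {
	for i, t := range s.Turns {
		if t.ID != id {
			continue
		}
		if t.Done {
			return s, ErrTurnAlreadyComplete
		}
		// Copy the slice so the caller's Session shares no state with the result.
		turns := make([]Turn, len(s.Turns))
		copy(turns, s.Turns)
		turns[i].Done = true
		return Session{Turns: turns}, nil
	}
	return s, ErrUnknownTurn
}

func main() {
	before := Session{Turns: []Turn{{ID: "t1"}}}
	after, err := CompleteTurn(before, "t1")
	fmt.Println(before.Turns[0].Done, after.Turns[0].Done, err)

	// Completing the same turn again is rejected with a typed error.
	_, err = CompleteTurn(after, "t1")
	fmt.Println(err)
}
```

Copying the slice before writing is what keeps an older Session value usable after a transition, at the cost of an allocation per transition.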
diff --git a/internal/agentcore/ids.go b/internal/agentcore/ids.go
@@ -0,0 +1,21 @@
+package agentcore
+
+// Typed identifiers prevent cross-domain ID confusion at compile time.
+// SessionID, TurnID, PartID, PermissionID, and SubAgentID are distinct named
+// types; the compiler rejects passing a value of one where another is expected.
+
+type SessionID string
+
+type TurnID string
+
+type PartID string
+
+type PermissionID string
+
+type SubAgentID string
+
+func (s SessionID) String() string    { return string(s) }
+func (t TurnID) String() string       { return string(t) }
+func (p PartID) String() string       { return string(p) }
+func (p PermissionID) String() string { return string(p) }
+func (s SubAgentID) String() string   { return string(s) }
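
Review note on `ids.go`: a short sketch of what the named string types do and do not prevent. `describeTurn` is a hypothetical function introduced only for this illustration. The two ID types are copied from the diff.

```go
package main

import "fmt"

type SessionID string
type TurnID string

func (s SessionID) String() string { return string(s) }
func (t TurnID) String() string    { return string(t) }

// describeTurn is a hypothetical function that only accepts a TurnID.
func describeTurn(id TurnID) string { return "turn:" + id.String() }

func main() {
	sid := SessionID("s-1")
	tid := TurnID("t-1")

	fmt.Println(describeTurn(tid))

	// describeTurn(sid) // does not compile: SessionID is not assignable to TurnID

	// An explicit conversion still compiles, so the types guard against
	// accidental mix-ups rather than deliberate casts.
	fmt.Println(describeTurn(TurnID(sid)))

	// Untyped string constants convert implicitly to any of the ID types.
	fmt.Println(describeTurn("t-literal"))
}
```

Because untyped literals convert implicitly, the protection applies to typed values passed between functions, which is where cross-domain mix-ups usually occur.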
diff --git a/internal/agentcore/model.go b/internal/agentcore/model.go
@@ -0,0 +1,30 @@
+package agentcore
+
+// ProviderKind enumerates the LLM provider families haft can speak to.
+// Adding a kind is a deliberate breaking change to G1 — the wire format
+// in Layer P encodes ProviderKind as a string discriminator and consumers
+// will reject unknown values rather than silently ignoring them.
+type ProviderKind string
+
+const (
+	ProviderOpenAI    ProviderKind = "openai"
+	ProviderAnthropic ProviderKind = "anthropic"
+	// ProviderCodex is the ChatGPT-Sub OAuth path that reuses the OpenAI
+	// API surface but carries chatgpt_account_id auth. It is preserved
+	// from internal/cli/login.go and remains the only auth flow that
+	// requires a device-code exchange.
+	ProviderCodex ProviderKind = "codex"
+)
+
+// ModelChoice is the immutable triple a Session pins itself to. Switching
+// model mid-session is modeled as the runtime emitting a model.switched
+// event and the next Turn binding to a new ModelChoice; the current Turn
+// keeps the choice it started with.
+type ModelChoice struct {
+	Provider ProviderKind `json:"provider"`
+	Model    string       `json:"model"` // provider-native model id (e.g. "gpt-5.4", "claude-sonnet-4-6")
+	// CredentialKey identifies which stored credential to use without
+	// embedding the secret value here. The G1 provider layer dereferences
+	// the key against the credential store at call time.
+	CredentialKey string `json:"credential_key,omitempty"`
+}
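
Review note on `model.go`: the comment on `ProviderKind` says consumers reject unknown values instead of ignoring them. One way to get that behavior at decode time is a strict `UnmarshalJSON` on the type, sketched below. This is an assumption about where validation could live. The diff does not show the real check, and `decodeModelChoice` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ProviderKind string

const (
	ProviderOpenAI    ProviderKind = "openai"
	ProviderAnthropic ProviderKind = "anthropic"
	ProviderCodex     ProviderKind = "codex"
)

// UnmarshalJSON is a hypothetical strict decoder: it accepts only the
// three kinds declared above and returns an error for anything else.
func (k *ProviderKind) UnmarshalJSON(data []byte) error {
	var raw string
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	switch ProviderKind(raw) {
	case ProviderOpenAI, ProviderAnthropic, ProviderCodex:
		*k = ProviderKind(raw)
		return nil
	}
	return fmt.Errorf("unknown provider kind %q", raw)
}

// ModelChoice mirrors the struct in the diff.
type ModelChoice struct {
	Provider      ProviderKind `json:"provider"`
	Model         string       `json:"model"`
	CredentialKey string       `json:"credential_key,omitempty"`
}

// decodeModelChoice is a hypothetical helper for this example.
func decodeModelChoice(data []byte) (ModelChoice, error) {
	var mc ModelChoice
	err := json.Unmarshal(data, &mc)
	return mc, err
}

func main() {
	mc, err := decodeModelChoice([]byte(`{"provider":"anthropic","model":"claude-sonnet-4-6"}`))
	fmt.Println(mc.Provider, mc.Model, err)

	// An unrecognized provider is rejected at the decoding boundary.
	_, err = decodeModelChoice([]byte(`{"provider":"mystery","model":"x"}`))
	fmt.Println(err)
}
```

With this shape, adding a provider requires touching the `switch`, which matches the comment's framing of a new kind as a deliberate breaking change.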