Merged
35 commits
cf1c0aa
chore(fpf): refresh corpus to 34b4d63 — temporal claim adequacy
m0n0x41d May 4, 2026
d3ac730
feat(v8): foundation M1 — algebraic types, wire protocol, append-only…
m0n0x41d May 4, 2026
a3253aa
feat(v8): foundation M2 — HTTP+SSE server + turn driver
m0n0x41d May 4, 2026
8e5f704
Add external WorkCommission completion CLI
May 11, 2026
a74d085
fix(serve): accept artifact_ref as alias for decision_ref in measure/…
m0n0x41d May 13, 2026
14a60ab
docs(changelog): record #77 fix under Unreleased
m0n0x41d May 13, 2026
b7c2ec9
fix(review): address Codex P1/P2 findings on agent driver
m0n0x41d May 13, 2026
ea68b72
fix(review): honor artifact_ref alias in bindDecisionRef; unique tmp …
m0n0x41d May 13, 2026
20fb979
fix(review): order turn.started after StartTurn; treat closed-stream-…
m0n0x41d May 13, 2026
d79be83
fix(review): tighten permission validation, match cancel to turn id, …
m0n0x41d May 13, 2026
c6edf0a
fix(review): validate journal appends, dedupe streamed part ids
m0n0x41d May 13, 2026
d2867a7
fix(review): fail turn on tool/flush errors; journal assistant text/r…
m0n0x41d May 13, 2026
5c63eb0
fix(review): send tool args as raw JSON instead of base64
m0n0x41d May 13, 2026
3dfa545
fix(review): serialize per-session Append; safe Hub cancel
m0n0x41d May 13, 2026
86fa47d
fix(review): reject replayed running turns synchronously
m0n0x41d May 13, 2026
024de7f
fix(review): flush deltas on provider error; reject model.set mid-turn
m0n0x41d May 13, 2026
162326b
fix(review): map ErrTurnAlreadyRunning to 409 Conflict
m0n0x41d May 13, 2026
0b80cf6
fix(review): serialize model.set against turn.submit Load
m0n0x41d May 13, 2026
58e6f4b
fix(review): serialize wire-safe SessionPayload on resume
m0n0x41d May 13, 2026
17bd63c
fix(review): map permission errors to 400/404 in HTTP transport
m0n0x41d May 13, 2026
5bc6248
fix(review): serialize store.Load with concurrent Append writes
m0n0x41d May 13, 2026
e2b9cf8
fix(review): cancel-aware turn completion and stale-cancel HTTP status
m0n0x41d May 13, 2026
4b9a839
Merge pull request #79 from karabelaselias/feature/commission-complet…
m0n0x41d May 13, 2026
6cf91b5
docs(changelog): cut [7.1.0] — complete-external CLI + v8 foundation …
m0n0x41d May 13, 2026
a8d72b8
fix(review): journal canceled permissions and surface failTurn publis…
m0n0x41d May 13, 2026
8535cf4
chore(fpf): refresh corpus to ee40821 + add X-SOURCE-RESTORATION pattern
m0n0x41d May 13, 2026
d5001d6
fix(review): tag ModelChoice and SessionMeta for snake_case JSON wire
m0n0x41d May 13, 2026
e5a5b25
docs(changelog): expand 7.1.0 v8 hardening list + artifact store meta…
m0n0x41d May 13, 2026
247697e
feat(reff): cap R at 0.5 for simulation-only and nonrealizable causal…
m0n0x41d May 13, 2026
e80ff87
feat(artifact): add C.28 CausalEvidenceSupportBasis and Realizability…
m0n0x41d May 13, 2026
0dbbf65
feat(surface): C.28 schemas + soft warning + A.15.4 carrier footer on…
m0n0x41d May 13, 2026
e3cad19
test(agentdriver): sync fakeTools.calls and drain SSE until turn.comp…
m0n0x41d May 13, 2026
d45a6e7
test(agentdriver): drain in-flight dispatcher goroutines before store…
m0n0x41d May 13, 2026
5d4954e
fix(review): keep journal authoritative and normalise empty tool args
m0n0x41d May 13, 2026
2750032
docs(changelog): document post-cut v8 P2 fixes and -race CI test-side…
m0n0x41d May 13, 2026
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

## [7.1.0] — 2026-05-13

Maintenance release on top of 7.0.0. Adds an explicit CLI completion path for externally-run WorkCommissions, lays the foundation of the v8 agent stack alongside (not replacing) the v7 production paths, fixes a silent-misroute defect on `haft_decision(measure|baseline|apply)` that corrupted the artifact graph under typical LLM-client usage, and refreshes the embedded FPF corpus.

### Added

- **`haft commission complete-external <wc-id>` CLI** ([#78](https://github.com/m0n0x41d/haft/issues/78), [#79](https://github.com/m0n0x41d/haft/pull/79)) — operator path to close a WorkCommission lifecycle after an external runner (anything outside Haft Harness) has produced local runtime evidence. The command auto-records `start_after_preflight` if the WC is still `preflighting`, then records the terminal `complete_or_block` lifecycle event. Accepts `--verdict completed|pass|failed|blocked` and inline (`--payload '{...}'`) or file-based (`--payload-file evidence.json` / `--payload @evidence.json`) evidence payloads. Refuses queued/terminal/non-running states explicitly rather than abusing `cancel`; does NOT apply, merge, or publish workspace diffs (those remain on `haft harness apply`). Before this, downstream `depends_on` WorkCommissions stayed blocked unless the operator had access to hidden MCP/tool-only lifecycle calls. Contributed by @karabelaselias.
- **v8 agent stack — foundation M1 (`internal/agentcore`, `internal/agentproto`, `internal/agentstore`)** — Layers G2/P/G3 of the new agent runtime planned in `.context/v8_plan.md` §2. `agentcore` defines pure algebraic Session/Turn/Part/Permission/SubAgentLink/ModelChoice types with sealed sum variants, opaque typed IDs (SessionID, TurnID, PartID, PermissionID), and transitions that always return new Session values (no field mutation). Mutating an existing Turn/Part, recording a Part without a Turn, completing a terminal Turn, attaching a SubAgent without naming its parent Turn, or resolving a never-requested Permission are inexpressible by construction. `agentproto` is the wire format shared with the upcoming TS TUI: 18 AgentEvent variants (session.\*, turn.\*, part.\*.delta, part.tool_use.\*, permission.\*, subagent.\*, model.switched, auth.expired) and 9 RPCCommand verbs. Tagged-envelope JSON with `kind` discriminator; `timeStamp` wrapper pins RFC3339Nano UTC; unknown kinds rejected with typed errors so forward-compat surface failures stay at the boundary. `agentstore` is an append-only per-session JSONL journal at `<store_root>/<id>/events.jsonl` with `meta.json` index. Pure Apply/Replay reconstruct a Session byte-for-byte; streaming deltas are wire-only via `IsJournalEvent` predicate so the journal stays compact. M1 acceptance bar — 1000 mixed events through both pure replay and disk round-trip producing `reflect.DeepEqual` Sessions — is asserted by `TestStore_Replay1000ViaDisk`. Coexists with legacy `internal/agent` / `internal/agentloop`; no v7 production code path is touched.
- **v8 agent stack — foundation M2 (`internal/agentserver`, `internal/agentdriver`)** — Layers G4/G5: HTTP+SSE transport and a real turn driver. `agentserver` exposes RPC verbs as POST endpoints (`/session`, `/session/:id/turn|cancel|rename|model`, `/permission/:id`), reads as GET (`/session`, `/session/:id`, `/healthz`), and fans every published AgentEvent to subscribers through a single `/event/global` SSE channel that filters by `session_id` on the client side. Server binds `127.0.0.1:0` and reports the chosen port back for env-var handoff to the TUI process. Pluggable `Dispatcher` interface keeps transport and engine independent; `StoreDispatcher` (test/no-LLM path) and `DriverDispatcher` (real engine) both implement it. `agentdriver`'s `Driver.Drive(ctx, Session, userText)` opens a turn, streams the Provider's events, dispatches tools through `ToolDispatcher`, synchronously gates permission-required tools through a `PermissionGate` (per-driver sync primitive — driver Opens, blocks on chan, HTTP handler Resolves on operator POST; ctx cancellation cleans up pending entries with no leaked channels), and finishes with `turn.completed` or `turn.failed`. Pure orchestration: no global state, no implicit clock, no implicit ID source — `IDGen` and `Now` injected for deterministic tests. `Provider`, `ToolDispatcher`, `EventSink` interfaces decouple from real LLM clients; production wiring for `internal/provider` and `internal/tools` lands in a later slice. `CombinedSink` wraps Store + Hub: state-mutating events go through `agentstore.Append` (journaled) AND publish to Hub (broadcast); streaming deltas are broadcast-only via the `IsJournalEvent` gate. 
M2 acceptance bar from `.context/v8_plan.md` — `curl /event/global` shows live stream during `haft agent` — is asserted by integration tests covering the full path (POST /session → SSE session.created → POST /turn → SSE turn.started/text deltas/tool_use/turn.completed) plus journal replay producing the same state via `Store.Load`. Deliberately deferred to next slice: materialized assistant TextPart/ReasoningPart events, in-driver SubAgent runner, GET /auth/status, real provider/tool adapters. Legacy `internal/agent` / `internal/agentloop` paths stay alive unchanged in parallel.
- **v8 agent driver code-review hardening pass** — multiple `fix(review)` follow-ups landed against the M1/M2 foundation before release: cancel-aware turn completion with stale-cancel HTTP status differentiation, `store.Load` serialized against concurrent `Append` writes, permission errors mapped to HTTP 400/404 instead of generic 500, wire-safe `SessionPayload` serialization on resume, `model.set` serialized against `turn.submit` Load, `ErrTurnAlreadyRunning` mapped to 409 Conflict, deltas flushed on provider error with `model.set` rejected mid-turn, replayed running turns rejected synchronously, per-session Append serialized with safe Hub cancel, tool args sent as raw JSON instead of base64, turn-failed on tool/flush errors with assistant text/reasoning journaled, journal append validation with deduped streamed part IDs, tightened permission validation with cancel matched to turn ID and shared hub, `turn.started` ordered after `StartTurn` with closed-stream-plus-canceled-ctx treated as canceled, Codex P1/P2 driver findings addressed. Net effect: the v8 foundation merges with the wire-protocol invariants verified under concurrent and adversarial conditions, not just happy-path tests.
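
The journaled-versus-broadcast split described for `CombinedSink` above can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/agentdriver`: the `Event` struct, the `isJournalEvent` rule (treating any kind that ends in `.delta` as wire-only), and the in-memory `journal` / `hub` slices are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stand-in for an AgentEvent; only the kind
// discriminator and the session id matter for this sketch.
type Event struct {
	Kind      string
	SessionID string
}

// isJournalEvent is an assumed predicate: streaming deltas are wire-only,
// every other event mutates state and must be journaled.
func isJournalEvent(e Event) bool {
	return !strings.HasSuffix(e.Kind, ".delta")
}

// combinedSink journals state-mutating events and broadcasts every event.
type combinedSink struct {
	journal []Event // stands in for the append-only events.jsonl
	hub     []Event // stands in for the SSE fan-out
}

// Publish appends to the journal only when the gate allows it, then
// broadcasts unconditionally.
func (s *combinedSink) Publish(e Event) {
	if isJournalEvent(e) {
		s.journal = append(s.journal, e)
	}
	s.hub = append(s.hub, e)
}

func main() {
	sink := &combinedSink{}
	kinds := []string{"turn.started", "part.text.delta", "part.text.delta", "turn.completed"}
	for _, k := range kinds {
		sink.Publish(Event{Kind: k, SessionID: "s1"})
	}
	fmt.Println("journaled:", len(sink.journal), "broadcast:", len(sink.hub))
}
```

Keeping the gate in one predicate means replay only ever sees state-mutating events, which is what lets the journal stay compact while subscribers still receive every delta.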

### Fixed

- **`haft_decision(measure|baseline|apply)` silently misrouted on `artifact_ref`** ([#77](https://github.com/m0n0x41d/haft/issues/77)) — when an LLM client passed `artifact_ref` (the universal target key in `haft_refresh`, and the only documented ref on `haft_decision(evidence)`), the explicit ID was silently dropped and the call resolved to whichever DecisionRecord came back first from `store.ListByKind(KindDecisionRecord, 1)` — most commonly the most-recently-touched one. Baselines snapshotted the wrong files; measurements landed against the wrong claim chain; `haft_refresh action=scan` still flagged the intended target as `no baseline`. The reporter saw the stale-scan count drop from 42 to 27 in one round after manually switching their integration to `decision_ref`. Both serve-mode (`internal/cli/serve.go`) and tools-mode (`internal/tools/haft.go`) handlers had the same defect. Fix: `measure` and `baseline` now accept either `decision_ref` or `artifact_ref` and refuse to proceed without one — the silent `ListByKind(...,1)` fallback is gone for these two actions because corrupting authoritative state is worse than refusing to act. `apply` accepts both keys and keeps the auto-detect fallback since it is a read-only "generate brief" path with no persistent side effect. The FPF-guardrail `bindDecisionRef` helper on the tools path also honors the alias so guarded flows are not bypassed. Schema descriptions for `decision_ref` and `artifact_ref` updated to list every action that accepts each key, so future LLM clients see the right map at registration time. Three regression tests pin the bug shape: two DecisionRecords exist, the caller names the older one via `artifact_ref`, and the test fails if the implementation reaches for the newer one; a third test guards the new "no ref provided" guidance path.
- **`install.sh` picked the wrong archive on CLI installs** — install script could grab the desktop tarball over the CLI archive depending on release asset ordering; now selects the CLI archive deterministically.
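
The ref-resolution rule from the `artifact_ref` fix above can be sketched as a small pure function. This is an illustration under stated assumptions, not the handler code in `internal/cli/serve.go` or `internal/tools/haft.go`: the function name, the `errNoRef` error, the `latest` parameter standing in for the old `ListByKind(..., 1)` result, and the choice to prefer `decision_ref` when both keys are present are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRef is a hypothetical error for calls that name no target.
var errNoRef = errors.New("decision_ref or artifact_ref is required")

// resolveDecisionRef sketches the post-fix rule. args holds the raw tool
// arguments; latest is the ID the old auto-detect fallback would have picked.
func resolveDecisionRef(action string, args map[string]string, latest string) (string, error) {
	if ref := args["decision_ref"]; ref != "" {
		return ref, nil
	}
	if ref := args["artifact_ref"]; ref != "" {
		return ref, nil // alias is honored instead of being silently dropped
	}
	// Only the read-only "apply" action keeps the auto-detect fallback.
	if action == "apply" && latest != "" {
		return latest, nil
	}
	return "", fmt.Errorf("%s: %w", action, errNoRef)
}

func main() {
	// Two records exist; the caller names the older one via artifact_ref.
	ref, err := resolveDecisionRef("measure", map[string]string{"artifact_ref": "dec-older"}, "dec-newer")
	fmt.Println(ref, err)

	// No ref on a state-mutating action: refuse rather than guess.
	_, err = resolveDecisionRef("baseline", map[string]string{}, "dec-newer")
	fmt.Println(err)
}
```

The first call mirrors the shape of the regression tests described above: the result must be the explicitly named record, never the most recently touched one.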

### Chore

- **FPF corpus refresh to `ee40821c`** — submodule `data/FPF` bumped from `b18acde` through seven upstream commits to `ee40821` ("formatting for GitHub", 2026-05-12). Substantive content additions, not housekeeping: a new architectural region for causal evidence (C.27 Causal-use calculus + C.28 `CounterfactualSamplingRealizabilityProfile` with controlled `CausalEvidenceSupportBasis` vocabulary: observational / interventional / realized-counterfactual / identified-estimate / simulation-only) and a new A.15 cluster member A.15.4 "Work-Relevant Source Restoration" governing the recover-source-before-reliance step when an encountered item (dashboard, generated explanation, credential view, projection output, copied approval, schema/API wording, composed source chain) is about to support a work or reliance claim by appearance. Two terminology cleanups (A.6.P boundary norm square + counterfactuality) refine wording that the embedded index surfaces automatically via search; the skill prompt cites pattern IDs only, so no skill-side rewording is needed for those. Index regenerated via `task fpf-refresh`: indexed_chunks 4972 → 5062 (+90; 4996 spec + 66 patterns), fpf_commit matches submodule HEAD. Index and submodule pointer move together — the release workflow rebuilds the index from the locked submodule SHA on tag, so a drift between these two files would surface as either stale search results in dev or a mismatch on next release build.
- **`h-reason` skill prompt minor update for new FPF concepts** — two changes:
  - New cross-cutting pattern `X-SOURCE-RESTORATION` (A.15.4) added to `internal/fpf/patterns/cross-cutting.md` and surfaced in the skill floor's Cross-cutting block. The detection rule "Object ≠ Description ≠ Carrier" was already in the skill; the operational rule "before reliance, recover the project source that makes the action admissible" was not. Practical sweep targets named explicitly include dashboards, generated explanations, credential views, projection outputs (`/h-view brief|rationale|audit`), copied approvals, provenance labels, schema/API text, and composed source chains. Anti-pattern: skipping restoration because the carrier "looks authoritative", which is the failure A.15.4 names.
  - `DEC-06 Predictions` extended with one sentence: predictions are causal claims; check realizability (can you sample the target distribution under physical / ethical / operational constraints?) before committing them as acceptance gates. Pointer to C.27 / C.28 added so the agent can pull the full calculus on demand without bloating the L1 floor.
- **Drop `darwin-amd64` (Intel Mac) from the CLI release matrix** — no longer built.

## [7.0.0] — 2026-04-29

v7 promotes specs to authoritative artifacts. The product is no longer "decision governance plus task execution"; it is **project harnessability**. A repository becomes harnessable only after it carries a parseable ProjectSpecificationSet (TargetSystemSpec + EnablingSystemSpec + TermMap), and Decisions / WorkCommissions / RuntimeRuns / Evidence flow downstream as consequences of that spec. The product surface model is also clearer: one Haft Core (semantic authority) under two production surfaces — MCP Plugin (embedded host-agent surface for Claude Code and Codex) and CLI Harness (operator/runtime surface). Desktop remains an alpha track and is not part of the v7 production envelope. Surfaces dispatch typed actions; they do not invent semantics.
1 change: 1 addition & 0 deletions README.md
@@ -230,6 +230,7 @@ haft commission list --selector stale
haft commission show wc-...
haft commission requeue wc-... --reason stale_operator_recovery
haft commission cancel wc-... --reason no_longer_relevant
haft commission complete-external wc-... --runner external-runner --reason external_runtime_succeeded --payload-file runtime-evidence.json
haft commission list-runnable
haft commission claim wc-...
```
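
The state handling that the changelog describes for `haft commission complete-external` can be sketched as a pure function from the current state to the lifecycle events that get recorded. This is a rough illustration only: the `State` type, its constant names, and `errNotCompletable` are assumptions inferred from the changelog wording, and the real command also handles verdicts, runners, and evidence payloads, which are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// State names are assumptions based on the changelog wording.
type State string

const (
	Queued       State = "queued"
	Preflighting State = "preflighting"
	Running      State = "running"
	Terminal     State = "terminal"
)

// errNotCompletable is a hypothetical error for refused states.
var errNotCompletable = errors.New("commission is not preflighting or running")

// completeExternal returns the lifecycle events that would be recorded for a
// commission in state s, or an error for states the command refuses.
func completeExternal(s State) ([]string, error) {
	switch s {
	case Preflighting:
		// Still preflighting: record the start first, then the terminal event.
		return []string{"start_after_preflight", "complete_or_block"}, nil
	case Running:
		return []string{"complete_or_block"}, nil
	default:
		// Queued and terminal commissions are refused explicitly.
		return nil, fmt.Errorf("state %q: %w", s, errNotCompletable)
	}
}

func main() {
	for _, s := range []State{Queued, Preflighting, Running, Terminal} {
		events, err := completeExternal(s)
		fmt.Println(s, events, err)
	}
}
```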
2 changes: 1 addition & 1 deletion data/FPF
Submodule FPF updated 2 files
+8,669 −3,686 FPF-Spec.md
+164 −93 Readme.md
25 changes: 25 additions & 0 deletions internal/agentcore/doc.go
@@ -0,0 +1,25 @@
// Package agentcore is Layer G2 of the v8 agent stack: pure algebraic types
// for Session/Turn/Part/Permission/SubAgentLink/ModelChoice and the
// pure transitions that move a Session from one state to the next.
//
// All values in this package are immutable. Every transition function takes
// a Session and returns a NEW Session — no field mutation, no shared slice
// state. Errors are returned, never thrown. Side effects (disk, network,
// time) are forbidden here; they live at G0/G1/G3/G5.
//
// This package coexists with the legacy [internal/agent] package during the
// v8 migration. Legacy types remain authoritative for the current coordinator
// (internal/agentloop). Once M2 cuts the coordinator over to G4, legacy
// agent.Session/Message will be deprecated.
//
// Inexpressible (by design):
// - Mutating an existing Turn or Part.
// - Recording a Part without a Turn.
// - Recording a Turn without a Session.
// - Resolving a Permission that was never requested.
// - Completing a Turn that is already complete.
// - Attaching a SubAgent without naming the parent Turn.
//
// Each is rejected by the type system (sealed interfaces, opaque IDs) or by
// the transition function returning a typed error.
package agentcore
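
The "returns a NEW Session" rule in the package comment above can be sketched with one transition. The `Session` and `Turn` shapes, the `CompleteTurn` function, and both error values are simplified, hypothetical stand-ins chosen for the example; they are not the types this package defines.

```go
package main

import (
	"errors"
	"fmt"
)

type TurnID string

// Turn and Session are simplified, hypothetical shapes for illustration.
type Turn struct {
	ID   TurnID
	Done bool
}

type Session struct {
	Turns []Turn
}

// Hypothetical typed errors for the two rejected cases.
var (
	ErrTurnAlreadyComplete = errors.New("turn already complete")
	ErrUnknownTurn         = errors.New("unknown turn")
)

// CompleteTurn returns a new Session with the named turn marked done.
// The input Session and its Turns slice are left untouched.
func CompleteTurn(s Session, id TurnID) (Session, error) {
	for i, t := range s.Turns {
		if t.ID != id {
			continue
		}
		if t.Done {
			return s, ErrTurnAlreadyComplete
		}
		next := make([]Turn, len(s.Turns)) // copy: no shared slice state
		copy(next, s.Turns)
		next[i].Done = true
		return Session{Turns: next}, nil
	}
	return s, ErrUnknownTurn
}

func main() {
	s0 := Session{Turns: []Turn{{ID: "t1"}}}
	s1, err := CompleteTurn(s0, "t1")
	fmt.Println(s0.Turns[0].Done, s1.Turns[0].Done, err)

	// Completing the same turn again is rejected with a typed error.
	_, err = CompleteTurn(s1, "t1")
	fmt.Println(err)
}
```

Copying the slice before writing is what keeps the earlier Session value valid after the transition; appending to or indexing into the shared backing array would silently mutate it.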
21 changes: 21 additions & 0 deletions internal/agentcore/ids.go
@@ -0,0 +1,21 @@
package agentcore

// Typed identifiers prevent accidental cross-domain ID confusion at compile
// time. SessionID, TurnID, PartID, PermissionID, and SubAgentID are distinct
// types: the compiler rejects passing a value of one where another is
// expected. Explicit conversions and untyped string constants still compile.

type SessionID string

type TurnID string

type PartID string

type PermissionID string

type SubAgentID string

func (s SessionID) String() string { return string(s) }
func (t TurnID) String() string { return string(t) }
func (p PartID) String() string { return string(p) }
func (p PermissionID) String() string { return string(p) }
func (s SubAgentID) String() string { return string(s) }
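
A short sketch of what the typed IDs above do and do not prevent. The `describeTurn` function and the ID values are hypothetical; the type declarations repeat two of the ones in this file so the example compiles on its own.

```go
package main

import "fmt"

type SessionID string
type TurnID string

func (s SessionID) String() string { return string(s) }
func (t TurnID) String() string { return string(t) }

// describeTurn is a hypothetical function that accepts only a TurnID.
func describeTurn(id TurnID) string {
	return "turn:" + id.String()
}

func main() {
	sid := SessionID("s-1")
	tid := TurnID("t-1")

	fmt.Println(describeTurn(tid))

	// describeTurn(sid) // does not compile: SessionID is not a TurnID

	// An explicit conversion still compiles, so the guarantee covers
	// accidental mix-ups, not deliberate casts.
	fmt.Println(describeTurn(TurnID(sid)))

	// Untyped string constants are assignable to any of the ID types.
	fmt.Println(describeTurn("t-2"))
}
```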
30 changes: 30 additions & 0 deletions internal/agentcore/model.go
@@ -0,0 +1,30 @@
package agentcore

// ProviderKind enumerates the LLM provider families haft can speak to.
// Adding a kind is a deliberate breaking change to G1 — the wire format
// in Layer P encodes ProviderKind as a string discriminator and consumers
// will reject unknown values rather than silently ignoring them.
type ProviderKind string

const (
	ProviderOpenAI    ProviderKind = "openai"
	ProviderAnthropic ProviderKind = "anthropic"
	// ProviderCodex is the ChatGPT-Sub OAuth path that reuses the OpenAI
	// API surface but carries chatgpt_account_id auth. It is preserved
	// from internal/cli/login.go and remains the only auth flow that
	// requires a device-code exchange.
	ProviderCodex ProviderKind = "codex"
)

// ModelChoice is the immutable triple a Session pins itself to. Switching
// model mid-session is modeled as the runtime emitting a model.switched
// event and the next Turn binding to a new ModelChoice; the current Turn
// keeps the choice it started with.
type ModelChoice struct {
	Provider ProviderKind `json:"provider"`
	Model    string       `json:"model"` // provider-native model id (e.g. "gpt-5.4", "claude-sonnet-4-6")
	// CredentialKey identifies which stored credential to use without
	// embedding the secret value here. The G1 provider layer dereferences
	// the key against the credential store at call time.
	CredentialKey string `json:"credential_key,omitempty"`
}
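
A minimal sketch of how `ModelChoice` might look on the wire and how a consumer could reject an unknown provider kind, as the `ProviderKind` comment describes. The struct and constants repeat the declarations above; `encodeModelChoice` and `decodeModelChoice` are hypothetical helpers written for this example, and the real Layer P decoding may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ProviderKind string

const (
	ProviderOpenAI    ProviderKind = "openai"
	ProviderAnthropic ProviderKind = "anthropic"
	ProviderCodex     ProviderKind = "codex"
)

type ModelChoice struct {
	Provider      ProviderKind `json:"provider"`
	Model         string       `json:"model"`
	CredentialKey string       `json:"credential_key,omitempty"`
}

// encodeModelChoice is a hypothetical helper that returns the JSON wire form.
func encodeModelChoice(mc ModelChoice) (string, error) {
	b, err := json.Marshal(mc)
	return string(b), err
}

// decodeModelChoice is a hypothetical helper: it unmarshals the wire form
// and rejects provider kinds this build does not know.
func decodeModelChoice(raw []byte) (ModelChoice, error) {
	var mc ModelChoice
	if err := json.Unmarshal(raw, &mc); err != nil {
		return ModelChoice{}, err
	}
	switch mc.Provider {
	case ProviderOpenAI, ProviderAnthropic, ProviderCodex:
		return mc, nil
	default:
		return ModelChoice{}, fmt.Errorf("unknown provider kind %q", mc.Provider)
	}
}

func main() {
	// credential_key is omitted from the output when it is empty.
	out, err := encodeModelChoice(ModelChoice{Provider: ProviderAnthropic, Model: "claude-sonnet-4-6"})
	fmt.Println(out, err)

	// "mistral" is not one of the declared kinds, so decoding fails.
	_, err = decodeModelChoice([]byte(`{"provider":"mistral","model":"x"}`))
	fmt.Println(err)
}
```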