Conversation
Submodule data/FPF bumped one commit upstream:
b18acde → 34b4d63 "temporal claim adequacy"
Embedded index regenerated against the new HEAD via `task fpf-refresh` (which chains fpf-pull → fpf-index):
indexed_chunks: 4961 → 4972 (+11; 4907 spec + 65 patterns)
fpf_commit: b18acde → 34b4d63 (matches submodule HEAD)
Index and submodule pointer must move together: the release workflow rebuilds the index from the locked submodule SHA on tag, so a drift between these two files would surface as either stale search results in dev or a mismatch on the next release build.
No code change. Search and lookup paths (`haft fpf search`, `haft_query(action="fpf")`, MCP plugin reasoning hints) pick up the upstream wording change automatically.
… journal
Lays the bottom three layers of the v8 agent stack as planned in .context/v8_plan.md §2. Coexists with legacy internal/agent and internal/agentloop: zero changes to v7 production code paths (MCP server, harness, spec onboarding, init, check, verify). M2 foundation (HTTP+SSE server + turn driver) lands separately.
internal/agentcore - Layer G2: pure algebraic types
Session, Turn, Part (sum: text/reasoning/tool_use/tool_result/file_ref/step_boundary), Permission, SubAgentLink, ModelChoice. All transitions return a new Session: no field mutation, no shared slice/map state. Sealed sum types via unexported markers; opaque typed IDs (SessionID, TurnID, PartID, PermissionID) prevent cross-domain confusion at compile time.
Inexpressible by construction: mutating an existing Turn or Part, recording a Part without a Turn, completing a Turn that's already terminal, attaching a SubAgent without naming the parent Turn, resolving a Permission that was never requested.
19/19 tests, including tampering tests that prove withPart and withPermissions copy underlying slices/maps even when capacity would permit aliasing.
internal/agentproto - Layer P: wire format shared with the TS TUI
18 AgentEvent variants (session.*, turn.*, part.*.delta, part.tool_use.*, permission.*, subagent.*, model.switched, auth.expired) and 9 RPCCommand verbs (session.create/list/resume/rename, turn.submit/cancel, permission.respond, model.set, subagent.attach). The PartPayload tagged envelope encodes a materialized Part for transport; deltas use per-variant streaming events instead.
Tagged-envelope JSON with kind discriminator. timeStamp wrapper pins RFC3339Nano UTC so the TS SDK's new Date(...) round-trips losslessly. Decoders refuse unknown kinds with typed errors, so forward-compat failures surface at the boundary instead of passing silently.
8/8 round-trip + unknown-kind rejection tests.
internal/agentstore - Layer G3: append-only journal + replay
Per-session JSONL journal at <store_root>/<id>/events.jsonl with meta.json index for fast List() without replay. Pure Apply/Replay reconstruct an agentcore.Session from a journal byte-for-byte. Streaming deltas are wire-only (IsJournalEvent predicate gates routing); only state-mutating events get journaled. Compaction and delta-coalescing are M2/G4 concerns.
15/15 tests including the M1 acceptance bar: TestStore_Replay1000ViaDisk runs 1000 mixed events through both pure in-memory replay and disk round-trip and asserts the two reconstructed Sessions are reflect.DeepEqual.
Layer N talks only to N-1; skip-level access is forbidden by package import graph. Side effects only at G0 (db/fs) and G3 (journal write); G2 is pure.
Migration plan: legacy internal/agent.Session/Message stays authoritative for the existing coordinator until M2 cuts over. Both type families coexist; no auto-migration runs.
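The copy-on-write rule that the tampering tests pin can be illustrated with a minimal sketch. Session and Part below are simplified, hypothetical stand-ins rather than the real agentcore types; only the name withPart comes from the commit message, and its body here is an assumption about how such a copy might look.

```go
package main

import "fmt"

// Part and Session are simplified, hypothetical stand-ins for the
// agentcore types; the real definitions are not shown in this thread.
type Part struct{ Text string }

type Session struct{ parts []Part }

// withPart returns a new Session backed by a freshly allocated slice, so
// the receiver is never affected even when its slice has spare capacity.
func (s Session) withPart(p Part) Session {
	next := make([]Part, len(s.parts), len(s.parts)+1)
	copy(next, s.parts)
	return Session{parts: append(next, p)}
}

func main() {
	// Spare capacity is exactly the case where a plain append would alias.
	base := Session{parts: make([]Part, 1, 8)}
	base.parts[0] = Part{Text: "a"}

	a := base.withPart(Part{Text: "b"})
	b := base.withPart(Part{Text: "c"})

	// With a plain append, a.parts[1] would have been overwritten by "c".
	fmt.Println(len(base.parts), a.parts[1].Text, b.parts[1].Text)
}
```

With a plain `append(s.parts, p)`, both derived sessions would share one backing array and the second call would overwrite the first call's part.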
Builds on M1 (agentcore + agentproto + agentstore) to give the v8
stack a real over-the-wire surface and a real turn loop. Still no
changes to legacy v7 paths: MCP server (haft serve) is a separate
binary command, separate code path, with no shared HTTP listener
or transitive coupling. The TS TUI in M3 will speak this surface
verbatim.
internal/agentserver — Layer G5: HTTP + SSE
/event/global is the single SSE channel that fans every published
AgentEvent to every subscriber; the TUI client filters by session_id
on its side. RPC verbs land as POST endpoints
(/session, /session/:id/turn|cancel|rename|model, /permission/:id);
reads land as GET (/session, /session/:id, /healthz). Server binds
127.0.0.1:0 by default — the chosen port is returned to the parent
process for env-var handoff to the TUI process.
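A minimal sketch of the ephemeral-port bind described above, using only the standard library. The helper name listenLoopback and the HAFT_AGENT_PORT variable name are hypothetical; the real handoff mechanism is not shown in this thread.

```go
package main

import (
	"fmt"
	"net"
)

// listenLoopback binds an ephemeral loopback port and reports which port
// the OS picked. The helper name is an assumption for illustration.
func listenLoopback() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenLoopback()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	// HAFT_AGENT_PORT is a hypothetical variable name for the handoff.
	fmt.Printf("HAFT_AGENT_PORT=%d\n", port)
}
```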
Pluggable Dispatcher interface keeps transport and engine
independent. StoreDispatcher (in-package) handles session lifecycle
CRUD without an LLM and is what test code uses; DriverDispatcher
(in agentdriver) plugs in the real turn engine.
Hub is a deliberately small unbounded fan-out — backpressure on a
slow listener is treated as the correct behaviour, not a reason to
silently drop events. The only consumer is the local TUI process,
which drains as fast as the loopback can carry it.
10/10 tests including a real HTTP+SSE round-trip that subscribes,
triggers an event from another goroutine, and verifies the wire
bytes decode back to the right typed event.
internal/agentdriver — Layer G4: turn loop + permission gate
Driver.Drive(ctx, Session, userText) opens a turn, streams the
Provider's events, dispatches tools through ToolDispatcher,
synchronously gates permission-required tools through PermissionGate,
and finishes with turn.completed or turn.failed. Pure orchestration:
no global state, no implicit clock, no implicit ID source — IDGen
and Now are injected so tests are deterministic.
Provider, ToolDispatcher, EventSink interfaces decouple the driver
from real LLM clients and tool implementations. Production wiring
(M2c follow-up) adapts internal/provider and internal/tools to
these contracts; legacy internal/agentloop coordinator stays alive
unchanged in parallel.
PermissionGate is a per-driver synchronisation primitive: the
driver Open()s a permission, blocks on the chan, and the HTTP
handler Resolve()s on the operator's POST. Context cancellation
cleans up pending entries — no leaked channels.
Dispatcher implements agentserver.Dispatcher with an in-flight
per-session cancel map: a fresh turn cancels the previous one
before starting (agentcore also rejects concurrent turns at append
time, so this is belt-and-braces).
CombinedSink wraps Store + Hub: state-mutating events go through
agentstore.Append (journaled) and every event publishes to the Hub
(broadcast). Streaming deltas are broadcast-only, never journaled —
IsJournalEvent gates routing so the journal stays compact and the
TUI still sees live text.
12/12 tests: 9 unit (text-only turn, tool granted, tool requires-
prompt-approved/denied, provider error, ctx cancel, validate, gate
unknown-resolve, gate ctx-cancel-cleanup) plus 3 integration tests
that boot the real server, spawn the driver, exercise the full turn
via real HTTP — including the synchronous permission round-trip.
M2 acceptance bar from .context/v8_plan.md: dogfood `curl /event/global`
shows live stream during haft agent. The integration tests assert
exactly that path (POST /session → SSE session.created → POST /turn →
SSE turn.started/text deltas/tool_use/turn.completed) plus the
journal reproduces the same state via Store.Load.
Known M2 backfill items deliberately deferred to next slice:
- Materialized assistant TextPart/ReasoningPart events (currently
deltas are wire-only and journaled text comes from in-memory
accumulation; a PartTextAppendedEvent variant in Layer P would
let the journal carry materialized text too).
- SubAgent runner inside the driver (the Layer-P events and
agentcore.SubAgentLink type exist; driver does not yet call
AttachSubAgent).
- GET /auth/status endpoint (placeholder in Layer P spec; not yet
wired through the dispatcher).
- Real provider/tool adapters (driver runs against fake/scripted
providers in tests; M2c wires internal/provider and internal/tools).
…baseline/apply (#77)
`haft_decision(measure|baseline|apply)` only read args["decision_ref"] and silently fell through to a most-recent-decision auto-detect when the key didn't match. LLM clients naturally generalise from `haft_refresh` (where artifact_ref is universal) and from the schema docs (which described artifact_ref as the evidence target with no mention that decision_ref is the magic word for these other actions). Reported repros show evidence/baselines landing on the wrong DecisionRecord, the kind of silent corruption the artifact graph cannot detect after the fact.
Fix applies to both code paths (MCP serve handler and the registry-gated tool runner):
- measure, baseline now accept either `decision_ref` or `artifact_ref`; if neither resolves, they return guidance text asking for the explicit ref instead of auto-detecting. The silent ListByKind(KindDecisionRecord, 1) fallback is gone for these two actions: corrupting authoritative state is worse than refusing to proceed.
- apply also accepts both keys but keeps the auto-detect fallback, since apply is a read-only "generate brief" path with no persistent side effect.
- Schema descriptions for decision_ref / artifact_ref updated to list every action that accepts each key.
Regression tests pin the bug shape: two DecisionRecords exist, the caller asks to baseline/measure the older one via artifact_ref, and the test fails if the implementation reaches for the newer one. A third test guards the new error path when no ref is supplied.
Audit: no production caller relies on the removed auto-detect. Documentation everywhere already shows the explicit decision_ref. Auto-baseline after decide uses the freshly-created DecisionRecord ID directly and never went through the buggy path. FPF guardrails bind decision_ref from the cycle when the registry is active, so guarded flows are unaffected.
Out of scope: the side observation about a stale `needs_onboard` readiness signal in haft_decision responses; filing separately as reported.
Fixes #77
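The measure/baseline behaviour (accept either key, refuse to guess) might look roughly like the sketch below. The helper name explicitDecisionRef, the error text, and the example IDs are hypothetical; the real handlers return guidance text rather than a Go error.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoRef = errors.New("decision_ref or artifact_ref is required")

// explicitDecisionRef is a hypothetical helper: it accepts either key and
// returns an error instead of falling back to the most recent decision.
func explicitDecisionRef(args map[string]any) (string, error) {
	for _, key := range []string{"decision_ref", "artifact_ref"} {
		if ref, ok := args[key].(string); ok && ref != "" {
			return ref, nil
		}
	}
	return "", errNoRef
}

func main() {
	// "dec-old" is an illustrative ID, not a real DecisionRecord.
	ref, err := explicitDecisionRef(map[string]any{"artifact_ref": "dec-old"})
	fmt.Println(ref, err)

	_, err = explicitDecisionRef(map[string]any{})
	fmt.Println(err)
}
```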
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a74d0857bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The fix landed in a74d085 but Unreleased was empty — adding the entry now so the upcoming release notes don't miss it.
Reviewed commit: 14a60ab89a
- Reject concurrent turn.submit instead of cancel-and-replace. The old
goroutine still owns the journal until it appends a terminal event;
starting a new Drive immediately appended turn.started while replay
still saw a live turn, producing an unreplayable journal.
- Register the permission gate entry before publishing
permission.requested. A fast operator could POST /permission/{id}
before Open ran, causing ErrUnknownPermission on a valid request.
Adds PermissionGate.Discard for publish-failure rollback.
- Carry project_id in SessionCreatedEvent so replay reconstructs the
same Session value that Create returned. Updated dispatchers, store,
and the test header to populate it.
Reviewed commit: b7c2ec9e5d
…for meta writes
P1: bindDecisionRef only inspected `decision_ref`, so an `artifact_ref` pointing at a foreign DecisionRecord was silently overwritten with the active-cycle ref, and measure/baseline recorded against the wrong decision. Treat artifact_ref as an alias before deciding whether to inject the active ref; foreign refs now hit the guardrail.
P2: writeMeta used a fixed `meta.json.tmp` filename, racing under the Store's documented concurrent-safe contract. Switch to os.CreateTemp so each writer gets a unique temp file, then rename.
Regression test pins the P1 bug shape: artifact_ref pointing outside the active cycle must produce the alignment guardrail, not a silent swap.
Reviewed commit: ea68b72f09
…with-canceled-ctx as canceled
P1: Drive used to publish turn.started before validating the in-memory transition. After a server restart that replays to a Running turn while the dispatcher's running map is empty, a new submit would journal a second turn.started and only then StartTurn would reject it, leaving an unreplayable journal. Validate first, then publish.
P2: When the provider closes its event channel as a documented reaction to ctx cancellation, both ctx.Done() and the closed-channel receive can be ready, and select picks at random. If the !open branch won we would journal turn.completed for a turn that was actually canceled. Check ctx.Err() before completing on channel close.
Regression tests cover both paths.
Reviewed commit: 20fb979813
…share hub
Address Codex P1/P2 findings on PR #80.
- Driver: treat any non-"approved" permission decision as denial. The prior check only matched the exact "denied" value, so a client posting decision: "approve" (typo) or any malformed string would silently fall through to Tools.Run.
- PermissionGate.Resolve: reject decisions other than approved/denied at the boundary as defense-in-depth.
- Dispatcher: pre-allocate the turn ID before launching Drive and record it alongside the cancel func; turn.cancel must now match the active turn or be rejected. Add Driver.DriveTurn(turnID) for the pre-allocated case; Drive keeps its old signature for tests.
- agentserver.NewServer: accept a Hub explicitly so callers cannot accidentally subscribe SSE clients to a different hub than the one the dispatcher publishes to.
Reviewed commit: d79be837d4
Three Codex P1/P2 findings:
- agentstore.Append now Load+Apply-validates before writing. A rejected transition (e.g. model.switched during a running turn) no longer poisons the journal with an unreplayable event.
- agentstore.Append refuses missing sessions via Load → ErrSessionNotFound rather than letting openJournal materialize an empty events.jsonl under a fresh directory.
- agentdriver allocates one part_id per accumulating text/reasoning buffer and reuses it for every delta AND the materialized part, so protocol-compliant clients can reconstruct one logical part by id.
Reviewed commit: c6edf0a61c
…easoning
Codex P1 findings on PR #80:
- handleToolCall (and the surrounding flushText/flushReasoning) used to return their error directly from streamTurn. When an error landed after turn.started (e.g. context.Canceled while gatePermission was waiting on the PermissionGate), the journal kept the turn in Running state forever, blocking the next submit until manual repair. Route every post-start error through failTurn so every started turn gets a terminal event.
- Assistant text and reasoning parts only existed in the in-memory Session value Drive returned; the dispatcher reloads from Store.Load on the next turn, so the prior assistant response was lost. Add PartTextCompletedEvent and PartReasoningCompletedEvent to Layer P, journal+apply them in agentstore, and have the driver publish them before AppendPart. Replay now reconstructs the same Session value Drive returned.
Reviewed commit: d2867a70ec
[]byte fields serialize to base64 in encoding/json, so wire consumers (SSE clients, TS SDK) received an opaque blob in place of the tool arguments. Switch the wire-format Args fields on PartToolUseStartedEvent, PermissionRequestedEvent and the toolUseBody PartPayload to json.RawMessage so the original JSON shape passes through unchanged.
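The difference is visible with two tiny structs. The types withBytes and withRaw and the argument payload are hypothetical stand-ins for the real event types, which are not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withBytes and withRaw are hypothetical stand-ins for an event's Args
// field before and after the change.
type withBytes struct {
	Args []byte `json:"args"`
}

type withRaw struct {
	Args json.RawMessage `json:"args"`
}

// encode marshals v and panics on error to keep the example short.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	args := []byte(`{"path":"a.txt"}`)
	fmt.Println(encode(withBytes{Args: args})) // args arrives as a base64 string
	fmt.Println(encode(withRaw{Args: args}))   // args arrives as the original JSON object
}
```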
Reviewed commit: 5c63eb0aec
agentstore: hold a per-session mutex across Load/Apply/Append so two concurrent mutations cannot validate against the same snapshot and then write events that are individually valid but jointly unreplayable (e.g. model.switched admitted on top of a turn that another goroutine just started). agentserver: drop the channel-close path on Hub Subscribe cancel and signal shutdown via a per-subscriber done channel instead. Publish selects on done so a snapshot of a since-cancelled subscriber no longer panics on send to a closed channel.
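A rough sketch of the done-channel approach for the Hub follows. The hub and subscriber types, the string events, and the buffer size are simplified assumptions, not the real agentserver implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// subscriber is a hypothetical reduction of a Hub listener. The events
// channel is never closed; cancellation is signalled through done.
type subscriber struct {
	events chan string
	done   chan struct{}
}

type hub struct {
	mu   sync.Mutex
	subs map[*subscriber]struct{}
}

func newHub() *hub { return &hub{subs: map[*subscriber]struct{}{}} }

// Subscribe registers a listener and returns an idempotent cancel func.
func (h *hub) Subscribe() (*subscriber, func()) {
	s := &subscriber{events: make(chan string, 16), done: make(chan struct{})}
	h.mu.Lock()
	h.subs[s] = struct{}{}
	h.mu.Unlock()

	var once sync.Once
	cancel := func() {
		once.Do(func() {
			h.mu.Lock()
			delete(h.subs, s)
			h.mu.Unlock()
			close(s.done)
		})
	}
	return s, cancel
}

// Publish snapshots the subscriber set, then sends outside the lock. A
// subscriber cancelled after the snapshot is skipped via done, so there
// is never a send on a closed channel.
func (h *hub) Publish(ev string) {
	h.mu.Lock()
	snapshot := make([]*subscriber, 0, len(h.subs))
	for s := range h.subs {
		snapshot = append(snapshot, s)
	}
	h.mu.Unlock()

	for _, s := range snapshot {
		select {
		case s.events <- ev:
		case <-s.done:
		}
	}
}

func main() {
	h := newHub()
	_, cancelA := h.Subscribe()
	b, cancelB := h.Subscribe()
	defer cancelB()

	cancelA()
	h.Publish("turn.started")
	fmt.Println(<-b.events)
}
```

A full buffer on a live subscriber still blocks Publish in this sketch, which matches the backpressure stance described for the Hub earlier in the thread.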
Reviewed commit: 3dfa5459a3
After a process restart, Store.Load can return a session whose journal still has a Running turn from the previous instance. The in-memory running map is empty, so handleTurnSubmit was accepting the submit and spawning a goroutine that then failed StartTurn with ErrTurnAlreadyRunning and dropped the error — leaving the client with an accepted turn_id that never produced events. Check session.HasLiveTurn() right after Load and reject with ErrTurnAlreadyRunning before allocating a new turn id.
Reviewed commit: 86fa47dc39
ProviderError now flushes pending text/reasoning before turn.failed to match the provider.go contract — replay was dropping partial output operators had already seen over SSE. handleModelSet now rejects with ErrTurnAlreadyRunning when a turn is in flight (in-memory map and post-restart HasLiveTurn check). Previously a model_switched event could land between handleTurnSubmit registering d.running and DriveTurn journaling turn.started, recording the switch ahead of a turn the provider was still driving with the old model.
Double-submitting a turn while one is already running is a client-visible state conflict, not a server failure. writeResult was returning 500 for the wrapped ErrTurnAlreadyRunning from the dispatcher; map it to 409 so clients can distinguish retryable turn conflicts from server errors.
Reviewed commit: 024de7fb8f
handleModelSet sampled d.running and released d.mu before appending model_switched, while handleTurnSubmit loaded the session before locking. A model.set that won the running-empty check could append between submit's Load and its running registration — the goroutine then drove the provider with the old session.Model while the journal already showed model.switched preceding turn.started. Move submit's Store.Load and HasLiveTurn check inside d.mu, and hold d.mu across handleModelSet's Store.Append. The two paths now serialize on the same lock: either the switch lands and the next submit reads the new model, or the submit registers first and the switch is rejected.
Reviewed commit: 0b80cf60db
handleSessionResume returned agentcore.Session directly through
writeJSON, but agentcore.Part is a sealed interface whose concrete
structs keep ID/CreatedAt in an unexported partBase. encoding/json
produced entries like {"Text":"..."} with no kind discriminator, so
clients resuming after any completed turn could not reconstruct part
IDs or kinds. Route the response through agentproto.EncodeSession,
which wraps every Part in PartPayload — same envelope shape the event
stream already uses.
Reviewed commit: 58e6f4bac8
PermissionGate.Resolve returns ErrInvalidDecision for malformed payloads and ErrUnknownPermission for stale IDs; writeResult was bucketing both into 500. Lift the sentinels to agentcore (ErrPermissionDecision, reusing existing ErrPermissionNotFound), wrap the agentdriver-local ones via %w, and map them to 400 / 404 in writeResult.
Reviewed commit: 17bd63cda3
A journal Append on a session larger than a single page-write can be observed mid-write by a concurrent reader using a separate file handle, so the bufio scanner in ReadAll can decode a truncated line and fail. Switch sessLock to RWMutex; Load takes RLock, Append keeps the write lock. Append's internal replay uses a non-locking helper to avoid re-entering the lock.
Reviewed commit: 5bc6248110
Driver.streamTurn now checks ctx.Err() before completing a turn on ProviderTurnDone, mirroring the closed-channel guard. Both branches can be ready in the same select cycle (provider had queued done while handleToolCall was running and ctx was canceled); without the check the race could journal turn.completed for a canceled turn. Dispatcher.handleTurnCancel now returns ErrTurnNotRunning / ErrTurnMismatch sentinels (defined in agentserver), and writeResult maps them to 404 / 409 respectively. Stale cancels and mistyped turn IDs no longer surface as 500.
Reviewed commit: e2b9cf8530
…e-external feat(commission): CLI complete-external for externally-run WorkCommissions (#78)
…#77 fix
7.1.0 is a SemVer minor bump on top of 7.0.0:
- feat: haft commission complete-external CLI (#79)
- feat: v8 agent stack foundation M1/M2 + review hardening pass
- fix: artifact_ref alias on haft_decision measure/baseline/apply (#77)
- fix: install.sh CLI archive selection
- chore: FPF corpus refresh to 34b4d63; drop darwin-amd64 from CLI matrix
The earlier "Unreleased — #77 only" stanza is folded into 7.1.0; the v8 foundation and complete-external entries are added so the release notes don't lose those contributions silently.
…h errors
Two correctness fixes in the agent driver:
- gatePermission: when Wait returns ctx.Err(), emit a permission.resolved
(denied) event before propagating the error. Without it, the journal
has permission.requested with no resolution while the in-memory gate
entry is already gone — replay surfaces a stuck Pending permission
that no POST /permission/{id} can resolve.
- failTurn: capture and wrap Sink.Publish errors on the turn.failed
event instead of silently dropping them. A failed journal write left
the durable record as Running while the local session moved to Failed,
silently blocking every future submit via HasLiveTurn until manual
repair.
Reviewed commit: a8d72b8903
FPF submodule moves from 34b4d63 → ee40821 ("formatting for GitHub",
2026-05-12). The seven upstream commits in between are not cosmetic:
9cb163b Causality and Realizability (+1524 net, C.27 + C.28)
f73766d functional descriptions + A.15.4 (+684 net, new pattern)
1f7c9e5 authority-looking reliance tuning (+552 net)
1f21daa counterfactuality terminology (cleanup)
136be3b A.6.P terminology cleanup (cleanup)
2b3e0f4 admissible action in problematic situations (Readme)
ee40821 formatting for GitHub (style)
Substantive content: a new architectural region for causal evidence
(C.27 Causal-use calculus + C.28 CounterfactualSamplingRealizabilityProfile
with controlled CausalEvidenceSupportBasis vocabulary), and a new A.15
cluster member A.15.4 "Work-Relevant Source Restoration" governing the
recover-source-before-reliance step when an encountered item is about
to support a work/reliance claim by appearance.
Index rebuild via task fpf-index --force:
indexed_chunks 4972 → 5062 (+90; 4996 spec + 66 patterns)
fpf_commit ee40821c (matches submodule HEAD)
Skill floor update — two minimal additions, not a rewrite:
internal/fpf/patterns/cross-cutting.md
+ X-SOURCE-RESTORATION pattern. The detection rule
"Object ≠ Description ≠ Carrier" was already in the floor; the
operational rule ("before reliance, recover the project source
that makes the action admissible") was not. Names dashboards,
generated explanations, credential views, projection outputs
(/h-view brief|rationale|audit), copied approvals, provenance
labels, schema/API text, composed source chains explicitly.
internal/cli/skill/h-reason/SKILL.md
+ DEC-06 Predictions sentence: predictions are causal claims;
check realizability under physical / ethical / operational
constraints before committing them as acceptance gates. Points
to C.27 / C.28 on demand — full calculus stays out of the
L1 floor to avoid ceremony creep.
+ Cross-cutting block adds X-SOURCE-RESTORATION reference.
Terminology cleanups in A.6.P and counterfactuality propagate through
fpf search automatically because the skill cites pattern IDs (CHR-10
etc) and never the surface text of the spec section.
ModelChoice and agentstore.SessionMeta were serialized with their Go field names — REST clients sending `credential_key` could not populate ModelChoice.CredentialKey (case-insensitive matching does not bridge snake_case to camelCase), and `GET /session` responses leaked `ProjectID`/`CreatedAt`/`EventCount` while the rest of the protocol speaks snake_case. Add explicit json tags so the wire shape matches the rest of agentproto.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5001d600a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…-write fix
Two gaps after the initial 7.1.0 cut:
- a8d72b8 and d5001d6 landed AFTER the changelog was cut. The v8
hardening bullet now explicitly mentions journaled-canceled-permissions
+ failTurn publish errors and the snake_case JSON wire tagging on
ModelChoice + agentstore.SessionMeta. Reorganized the bullet into
themed groups (concurrency / HTTP surface / wire protocol / turn
lifecycle / permission gate) so the 17-item list is readable instead of
a wall of commas.
- ea68b72 bundled two fixes; only the bindDecisionRef alias half was
captured (as a follow-on note on the #77 bullet). The other half, a
writeMeta race on the fixed meta.json.tmp filename in
internal/artifact/store, is a legitimate concurrent-safe Store contract
violation that affects v7 paths too, not just v8. Surfaced as its own
Fixed entry.
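The commit text does not say how the writeMeta race was fixed. One common way to avoid collisions on a fixed temp filename is a unique temp file per writer followed by a rename. The sketch below shows that technique under that assumption; `writeMetaAtomic` and `concurrentWrites` are hypothetical names, not the store's API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// writeMetaAtomic writes data to a uniquely named temp file in dir and then
// renames it over meta.json, so concurrent writers never share a temp path.
func writeMetaAtomic(dir string, data []byte) error {
	tmp, err := os.CreateTemp(dir, "meta-*.json.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmpName)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmpName)
		return err
	}
	if err := os.Rename(tmpName, filepath.Join(dir, "meta.json")); err != nil {
		os.Remove(tmpName)
		return err
	}
	return nil
}

// concurrentWrites runs n writers against one scratch directory and returns
// the final meta.json content.
func concurrentWrites(n int) (string, error) {
	dir, err := os.MkdirTemp("", "meta-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	var wg sync.WaitGroup
	errs := make(chan error, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- writeMetaAtomic(dir, []byte(`{"event_count":1}`))
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(filepath.Join(dir, "meta.json"))
	return string(b), err
}

func main() {
	got, err := concurrentWrites(8)
	fmt.Println(got, err)
}
```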
… basis (CC-B3.9)
Add ScoreEvidenceWithCausalBasis wrapping ScoreTypedEvidence with the
C.28 causal-support-basis cap per CC-B3.9: when the basis is
simulation-only OR the linked claim's realizability verdict is
"nonrealizable", the resulting R is capped at 0.5 regardless of verdict,
evidence type, or CL. "unknown" realizability does NOT cap; bounded use
under C.28 may still be admissible. Expired evidence remains at 0.1: the
cap is a ceiling at 0.5 and never raises weaker scores. Empty
basis/realizability falls through to ScoreTypedEvidence unchanged so
legacy callers preserve pre-7.1.0 semantics bit-for-bit.
Add CausalSupportBasis and forward-compat ClaimRealizability to
reff.Evidence; ComputeClaimAssurance now scores via
ScoreEvidenceWithCausalBasis so the assurance engine honors the cap when
callers populate the new fields. Per-claim realizability resolution from
ClaimRefs is marked TODO(post-7.1) and degrades gracefully: empty
realizability behaves identically to the legacy path.
5 new reff tests pin the contract: cap fires on simulation_only and
nonrealizable, no-op on unknown, parity with ScoreTypedEvidence for
legacy empty-basis inputs, expired evidence not raised by cap.
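The cap rule above can be sketched as a pure function over an already-computed score. This is a minimal illustration, not the reff implementation: `capForCausalBasis` is a hypothetical name, and the real function takes typed evidence rather than bare strings.

```go
package main

import "fmt"

// causalCap is the 0.5 ceiling described for CC-B3.9.
const causalCap = 0.5

// capForCausalBasis applies the ceiling to a base score (hypothetical helper).
// base stands in for whatever ScoreTypedEvidence would have returned.
func capForCausalBasis(base float64, basis, realizability string) float64 {
	// Legacy callers: both fields empty, score passes through unchanged.
	if basis == "" && realizability == "" {
		return base
	}
	capped := basis == "simulation_only" || realizability == "nonrealizable"
	if capped && base > causalCap {
		return causalCap
	}
	// "unknown" realizability and weak scores are left as they are.
	return base
}

func main() {
	fmt.Println(capForCausalBasis(0.9, "simulation_only", ""))        // capped
	fmt.Println(capForCausalBasis(0.1, "simulation_only", ""))        // not raised
	fmt.Println(capForCausalBasis(0.9, "observational", "unknown"))   // no cap
	fmt.Println(capForCausalBasis(0.9, "observational", "nonrealizable")) // capped
}
```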
…Verdict types
Embed the new FPF ee40821 C.28 controlled vocabulary structurally on the
artifact graph, not only as corpus text.
CausalEvidenceSupportBasis is a typed string with 5 canonical values:
observationalAssociationSupportBasis, interventionalActionSupportBasis,
realizedCounterfactualSampleSupportBasis,
identifiedCounterfactualEstimateSupportBasis, and
simulationOnlyCounterfactualOutputBasis. ParseCausalSupportBasis
canonicalizes operator-friendly aliases (observational, simulation_only,
etc.) on read; the long FPF names are also accepted. Unknown values are
rejected at the artifact boundary, not silently dropped.
RealizabilityVerdict carries the C.28
CounterfactualSamplingRealizability profile verdict with three values:
realizable, nonrealizable, unknown.
EvidenceItem gains causal_support_basis (omitempty); AttachEvidence
parses and validates the value at ingest. DecisionClaim and
DecisionPrediction gain realizability (omitempty), plumbed through
newDecisionClaims, decisionClaimsFromPredictions,
normalizeDecisionClaims, and decisionPredictionsFromClaims, mirroring the
existing verify_after pattern step for step so the round-trip is
invariant under any reshape sequence. PredictionInput on the wire-input
layer gains realizability too.
WLNK scoreEvidence in decision.go now routes through
reff.ScoreEvidenceWithCausalBasis so the C.28 cap (CC-B3.9) fires in the
artifact-summary path in lockstep with the assurance engine.
Realizability plumbing per-claim through WLNK is TODO(post-7.1); the cap
fires on CausalSupportBasis alone here.
Legacy decisions/evidence without the new fields round-trip unchanged
and score identically to pre-7.1.0 (verified by the reff parity test
landed in the prior commit).
6 new artifact tests pin the contract: alias canonicalization for both
basis and realizability, realizability preservation through every
normalize step, EvidenceItem JSON round-trip honors omitempty.
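Alias canonicalization with rejection of unknown values might look roughly like the sketch below. Only the two aliases named in the commit text are included; the rest of the alias list is elided there and is not guessed here. The lowercase function name and the treatment of an empty string as "not set" are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalBases lists the five canonical values named in the commit text.
var canonicalBases = []string{
	"observationalAssociationSupportBasis",
	"interventionalActionSupportBasis",
	"realizedCounterfactualSampleSupportBasis",
	"identifiedCounterfactualEstimateSupportBasis",
	"simulationOnlyCounterfactualOutputBasis",
}

// basisAliases holds only the aliases the commit text names explicitly.
var basisAliases = map[string]string{
	"observational":   "observationalAssociationSupportBasis",
	"simulation_only": "simulationOnlyCounterfactualOutputBasis",
}

// parseCausalSupportBasis is a hypothetical sketch of the parse step:
// aliases are canonicalized, long names pass through, anything else errors.
func parseCausalSupportBasis(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		// Assumption: an absent basis is allowed (the field is omitempty).
		return "", nil
	}
	if canonical, ok := basisAliases[strings.ToLower(s)]; ok {
		return canonical, nil
	}
	for _, c := range canonicalBases {
		if s == c {
			return c, nil
		}
	}
	return "", fmt.Errorf("unknown causal support basis %q", raw)
}

func main() {
	for _, in := range []string{"simulation_only", "interventionalActionSupportBasis", "vibes"} {
		got, err := parseCausalSupportBasis(in)
		fmt.Printf("%q -> %q, err=%v\n", in, got, err)
	}
}
```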
… /h-view projections
MCP and CLI transports both expose the new C.28 fields symmetrically:
haft_decision(action="evidence") accepts causal_support_basis (with the
short alias list and the CC-B3.9 cap callout in the description), and
predictions[] on the decide action accepts realizability with the three
canonical verdicts. internal/tools/haft.go, internal/cli/serve.go, and
internal/fpf/server.go all carry the schema entries so the parity test
stays in lockstep.
Soft warning lands on both transports: if EvidenceItem.Content reads like
a causal-use claim (causal, intervention, counterfactual, uplift,
treatment effect, ...) but CausalSupportBasis is empty, append a C.28 /
CC-B3.9 advisory to the result text. Warning, not reject — legacy ingest
continues unchanged; the surface signals to the LLM caller that an
undeclared basis cannot raise R for a causal-ladder climb.
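The soft-warning check can be sketched as a keyword scan that only ever appends text and never rejects. The marker list below contains just the terms the commit names (the original list is longer and elided); `causalBasisAdvisory` and the advisory wording are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// causalMarkers holds only the terms named in the commit text.
var causalMarkers = []string{
	"causal", "intervention", "counterfactual", "uplift", "treatment effect",
}

// causalBasisAdvisory returns an advisory string when content reads like a
// causal-use claim and no basis was declared; otherwise it returns "".
// It never returns an error: the behaviour is warn, not reject.
func causalBasisAdvisory(content, basis string) string {
	if strings.TrimSpace(basis) != "" {
		return ""
	}
	lower := strings.ToLower(content)
	for _, marker := range causalMarkers {
		if strings.Contains(lower, marker) {
			return "advisory (C.28 / CC-B3.9): content reads like a causal-use claim but causal_support_basis is empty"
		}
	}
	return ""
}

func main() {
	fmt.Println(causalBasisAdvisory("Observed uplift of 4% after rollout", ""))
	fmt.Println(causalBasisAdvisory("Observed uplift of 4% after rollout", "observational") == "")
}
```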
A.15.4 Work-Relevant Source Restoration footer renders on every
/h-view projection (engineer, manager, audit, compare, delegated-agent,
change-rationale). Section "## Carrier — Not Source of Truth (A.15.4)"
lists every underlying artifact ref the projection was built from
(problems, portfolios, decisions, deterministic ordering), plus the
haft_query(action="get", ref="<id>") recovery path and the on-disk
.haft/{decisions|problems|solutions|evidence}/<id>.md source locations.
Empty graphs still render the section with an informational fallback
line so the carrier semantics stay visible even with no sources. Lives
in internal/present (flow layer) — Core boundary unaffected.
7 new tests: 1 in internal/fpf/server_test.go asserts the MCP-advertised
schema exposes causal_support_basis with the C.28 description and
predictions[].realizability; 6 in internal/present/projection_carrier_test.go
pin the footer across all four primary views plus the no-sources
fallback and the recovery-path citation.
CHANGELOG [7.1.0] gains two new Added entries documenting the C.28
typed binding (with the cap semantics and TODO note) and the A.15.4
carrier footer.
…leted before asserting
Two related races surfaced under -race on the v8 driver integration test
TestIntegration_PermissionRoundTrip:
1. fakeTools.calls was an unsynchronized slice appended from the
dispatcher goroutine (Driver.Drive → ToolDispatcher.Run) while the test
goroutine read len(tools.calls) in a busy-wait loop. Guard the slice
with a mutex and expose a Calls() snapshot accessor; the test reads
through Calls() so the read is properly synchronized.
2. Even with calls synchronized, the integration test could observe
tools.Calls() != 0 and exit while the dispatcher was still mid-flight
journaling turn.completed to the store. The deferred TempDir cleanup
then races with the still-running goroutine writing into the store
directory and surfaces as a cleanup failure. Replace the
len(tools.calls) busy-wait with an SSE drain that consumes stream lines
until "turn.completed" is observed. The dispatcher always journals
turn.completed after the tool result, so seeing it on the wire is a
synchronization point: by then the goroutine has finished writing to the
store and the deferred cleanup is safe.
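The first fix, a mutex-guarded slice with a snapshot accessor, follows a standard pattern. A minimal sketch is below; the element type (`string`) and the `record` and `recordConcurrently` helpers are assumptions, since the commit only names `fakeTools.calls` and `Calls()`.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeTools records tool invocations from a dispatcher goroutine.
// The string element type is an assumption for illustration.
type fakeTools struct {
	mu    sync.Mutex
	calls []string
}

// record appends under the lock (hypothetical helper name).
func (f *fakeTools) record(name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls = append(f.calls, name)
}

// Calls returns a copy, so readers never alias the guarded slice.
func (f *fakeTools) Calls() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := make([]string, len(f.calls))
	copy(out, f.calls)
	return out
}

// recordConcurrently fires n recorders in parallel and waits for all of them.
func recordConcurrently(f *fakeTools, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			f.record(fmt.Sprintf("tool-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	f := &fakeTools{}
	recordConcurrently(f, 50)
	fmt.Println(len(f.Calls()))
}
```

Returning a copy rather than the slice itself keeps a later append in the dispatcher goroutine from racing with a reader that is still iterating.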
….Close to fix -race CI
CI's race-test job kept failing on TestIntegration_TurnCancel and
TestDispatcher_ModelSet_RejectsDuringRunningTurn with the same root
cause: srv.Shutdown waits for HTTP handlers but NOT for goroutines
spawned inside them — handleTurnSubmit fires a detached DriveTurn
goroutine, and the test exits while that goroutine is still journaling
events. Result on Linux runners under -race:
WARNING: DATA RACE
Write at ... by Store.Close → Journal.Close (j.w = nil)
Previous read at ... by DriveTurn → CombinedSink.Publish →
Store.Append → Journal.Append (j.w.Write)
Cancel + drain pattern, test-only, no production v8 change:
- drainRunningTurn(t, dispatcher, sessionID) issues TurnCancel and polls
the dispatcher until it returns agentserver.ErrTurnNotRunning. That
error is the synchronization point: the dispatcher's deferred
delete(d.running, ...) clears the map AFTER DriveTurn returns and the
goroutine finishes, so seeing ErrTurnNotRunning guarantees the journal
Append goroutine has fully unwound.
- bootIntegrationServer.cleanup now drains every session listed by
store.List("") before calling store.Close. Covers any integration test
routed through the helper (currently TurnRoundTrip, TurnCancel,
PermissionRoundTrip) without per-test plumbing.
- TestDispatcher_ModelSet_RejectsDuringRunningTurn uses its own
Dispatcher (not the boot helper), so it gets an explicit defer
drainRunningTurn call before its store.Close defer.
10x -race runs on the agentdriver package now pass clean. Full repo
under -race is green.
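The cancel + drain helper can be sketched as a bounded poll. Everything here is a stand-in: `errTurnNotRunning` substitutes for agentserver.ErrTurnNotRunning, the `canceler` interface and `fakeDispatcher` are hypothetical, and the real helper takes a `*testing.T` rather than returning an error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTurnNotRunning stands in for agentserver.ErrTurnNotRunning.
var errTurnNotRunning = errors.New("turn not running")

// drainTimeout bounds the poll so a stuck turn fails the test instead of hanging.
const drainTimeout = 2 * time.Second

// canceler is a hypothetical slice of the dispatcher surface used here.
type canceler interface {
	TurnCancel(sessionID string) error
}

// drainRunningTurn keeps cancelling until the dispatcher reports that no turn
// is running, which the commit treats as the synchronization point.
func drainRunningTurn(d canceler, sessionID string) error {
	deadline := time.Now().Add(drainTimeout)
	for {
		err := d.TurnCancel(sessionID)
		if errors.Is(err, errTurnNotRunning) {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("turn for session %s still running after %s", sessionID, drainTimeout)
		}
		time.Sleep(time.Millisecond)
	}
}

// fakeDispatcher pretends a turn stays alive for a few more polls.
type fakeDispatcher struct{ remaining int }

func (f *fakeDispatcher) TurnCancel(string) error {
	if f.remaining > 0 {
		f.remaining--
		return nil
	}
	return errTurnNotRunning
}

func main() {
	fd := &fakeDispatcher{remaining: 3}
	fmt.Println(drainRunningTurn(fd, "s1"), fd.remaining)
}
```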
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d45a6e7000
Two Codex P2 findings on PR #80:
- agentstore.Store.Append: writeMeta running after a successful journal
commit could surface a "publish failed" error to the driver for an event
already durable on disk. In the turn.started path that made the driver
skip emitting turn.failed for a turn the journal would replay as
Running, leaving HasLiveTurn permanently blocked. The journal is the
authoritative record; meta.json is a denormalised cache for List. Treat
the meta refresh as best-effort post-commit (Create and Append).
- agentdriver/agentcore: a provider tool call with non-nil empty []byte
for Args would marshal as invalid JSON via json.RawMessage, failing
PartToolUseStartedEvent encoding and aborting the turn before the tool
runs. Normalise empty/nil inputs to a nil Args in NewToolUsePart so
RawMessage encodes as JSON null; have the driver emit the part's
normalised Args on the wire.
Tests:
- TestStore_AppendSucceedsWhenMetaWriteFails locks the session dir
read-only so writeMeta cannot create a tempfile, while the open journal
FD keeps accepting Appends.
- TestDriver_ToolCall_EmptyArgsEncodes drives an empty-args tool call
and round-trips the resulting event through EncodeEvent.
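The empty-args failure comes from how encoding/json treats json.RawMessage: a nil value encodes as `null`, while a non-nil empty slice is emitted verbatim and fails validation. A small sketch of the normalisation follows; `toolUse`, `normalizeArgs`, and `encodeToolUse` are hypothetical names, not the agentcore types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolUse is a hypothetical stand-in for the wire shape of a tool_use part.
type toolUse struct {
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// normalizeArgs maps nil and empty inputs to a nil RawMessage, which
// encoding/json encodes as JSON null.
func normalizeArgs(args []byte) json.RawMessage {
	if len(args) == 0 {
		return nil
	}
	return json.RawMessage(args)
}

// encodeToolUse marshals a tool call, optionally normalising its args first.
func encodeToolUse(name string, args []byte, normalize bool) (string, error) {
	raw := json.RawMessage(args)
	if normalize {
		raw = normalizeArgs(args)
	}
	out, err := json.Marshal(toolUse{Name: name, Args: raw})
	return string(out), err
}

func main() {
	_, err := encodeToolUse("ls", []byte{}, false)
	fmt.Println("without normalisation, error:", err != nil)

	out, err := encodeToolUse("ls", []byte{}, true)
	fmt.Println(out, err)
}
```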
… stability
Three commits landed after the 7.1.0 changelog cut (e5a5b25) and were
not yet documented:
- 5d4954e fix(review): journal authoritative + empty tool args
- d45a6e7 test(agentdriver): drain SSE before cleanup race
- e3cad19 test(agentdriver): sync fakeTools.calls
Extend the v8 hardening Added bullet with the two P2 fixes from 5d4954e
(journal-authoritative writeMeta, empty tool args -> JSON null), and add
a new Fixed bullet for the test-side -race stability work that was
blocking CI on PR #80.
Release 7.1.0 — 2026-05-13
Maintenance release. SemVer minor bump on top of 7.0.0: adds `feat:` items (complete-external CLI + v8 agent stack foundation) plus a `fix:` for the silently-misrouted `haft_decision(measure|baseline|apply)` defect that corrupted the artifact graph.
Highlights
Features
- `haft commission complete-external <wc-id>` (#78, #79): operator CLI to close a WorkCommission lifecycle after an external runner produces local runtime evidence. Auto-records `start_after_preflight`, then terminal `complete_or_block`. Supports `--verdict completed|pass|failed|blocked` and inline/file evidence payloads. Refuses queued/terminal/non-running states explicitly; does not apply/merge workspace diffs. Contributed by @karabelaselias.
- `agentcore` (pure algebraic Session/Turn/Part types, sealed sum variants, opaque IDs, transitions return new Session; mutation, partial states, double-completion inexpressible by construction), `agentproto` (wire format for the upcoming TS TUI: 18 AgentEvent variants + 9 RPC verbs, tagged-envelope JSON with `kind` discriminator, unknown kinds rejected at boundary), `agentstore` (append-only per-session JSONL journal + meta.json index; 1000-event replay round-trip verified), `agentserver` (HTTP+SSE on 127.0.0.1, single `/event/global` fan-out, pluggable Dispatcher interface), `agentdriver` (turn loop, synchronous PermissionGate primitive, ctx-cancellation cleanup, Provider/ToolDispatcher/EventSink interfaces for production wiring later). Coexists with v7 paths; zero changes to MCP server, harness, spec onboarding, init, check, verify.
- `fix(review)` follow-ups on the M1/M2 foundation before release: cancel-aware turn completion, store.Load serialized vs concurrent Append, typed HTTP status mapping (400/404/409), wire-safe serialization on resume, model.set serialized against turn.submit, delta flush on provider error with mid-turn model.set rejection, replayed-running-turn synchronous rejection, per-session Append serialization with safe Hub cancel, raw-JSON tool args, journal append validation with streamed-part-id dedup, tightened permission validation with cancel matched to turn id, turn.started ordering after StartTurn. Foundation merges with concurrent/adversarial conditions verified, not just happy-path.
Fixes
- `haft_decision(measure|baseline|apply)` silently misrouted on `artifact_ref`: LLM clients naturally passed `artifact_ref` (universal in `haft_refresh`, and the only documented key on `haft_decision(evidence)`); both handlers only read `decision_ref` and fell through to `ListByKind(KindDecisionRecord, 1)` auto-detect, corrupting evidence/baselines on the wrong DecisionRecord. Reporter saw stale-scan count drop 42 → 27 in one round after manually switching to `decision_ref`. Both code paths (`internal/cli/serve.go` MCP + `internal/tools/haft.go` tools) fixed: `measure`/`baseline` accept either key and refuse to proceed without one (silent auto-detect on writes is gone; corrupting authoritative state is worse than refusing to act). `apply` accepts either, keeps auto-detect (read-only). FPF-guardrail `bindDecisionRef` also honors the alias. Schema descriptions updated to list every action per key. Three regression tests pin the bug shape.
- `install.sh` picked the desktop tarball over CLI archive; fixed.
Chore
- `34b4d63` ("temporal claim adequacy"). Submodule + embedded index moved atomically via `task fpf-refresh`. Indexed chunks 4961 → 4972.
- `darwin-amd64` (Intel Mac) from the CLI release matrix.
Versioning
7.1.0 by SemVer rules: presence of `feat:` commits (complete-external + v8 foundation) requires a minor bump on top of 7.0.0. No breaking changes; v6 artifacts and v7 production surfaces unchanged.
Test plan
- `go build ./...` clean
- `go test ./...`: pre-push hook ran the full suite (agentcore, agentdriver, agentproto, agentserver, agentstore, artifact, cli, fpf, tools, workcommission, project, specflow, scopeauth, implementationplan, autonomyenvelope and others; all green)
- `golangci-lint` 0 issues
- `gofmt`, `vet`, `mod-tidy` clean
- `haft_decision(baseline, artifact_ref=<id>)` lands on `<id>` (not most-recent); empty-ref call returns guidance, not silent auto-detect
- `haft commission complete-external` against an externally-run WC closes lifecycle and unblocks `depends_on` chain
- `curl /event/global` against `haft agent` (v8 path) for the M2 acceptance bar
- `v7.1.0` after merge; release pipeline rebuilds embedded FPF index from locked submodule SHA
Closes