
Release 7.1.0: complete-external CLI, v8 foundation, #77 fix #80

Merged
m0n0x41d merged 35 commits into main from dev
May 13, 2026

Conversation

Owner

@m0n0x41d m0n0x41d commented May 13, 2026

Release 7.1.0 — 2026-05-13

Maintenance release. SemVer minor bump on top of 7.0.0: adds feat: items (complete-external CLI + v8 agent stack foundation) plus a fix: for the silently-misrouted haft_decision(measure|baseline|apply) defect that corrupted the artifact graph.

Highlights

Features

  • haft commission complete-external <wc-id> (#78, #79) — operator CLI to close a WorkCommission lifecycle after an external runner produces local runtime evidence. Auto-records start_after_preflight, then terminal complete_or_block. Supports --verdict completed|pass|failed|blocked and inline/file evidence payloads. Refuses queued/terminal/non-running states explicitly; does not apply/merge workspace diffs. Contributed by @karabelaselias.
  • v8 agent stack foundation M1/M2 — new internal packages agentcore (pure algebraic Session/Turn/Part types, sealed sum variants, opaque IDs, transitions return new Session — mutation, partial states, double-completion inexpressible by construction), agentproto (wire format for the upcoming TS TUI: 18 AgentEvent variants + 9 RPC verbs, tagged-envelope JSON with kind discriminator, unknown kinds rejected at boundary), agentstore (append-only per-session JSONL journal + meta.json index; 1000-event replay round-trip verified), agentserver (HTTP+SSE on 127.0.0.1, single /event/global fan-out, pluggable Dispatcher interface), agentdriver (turn loop, synchronous PermissionGate primitive, ctx-cancellation cleanup, Provider/ToolDispatcher/EventSink interfaces for production wiring later). Coexists with v7 paths; zero changes to MCP server, harness, spec onboarding, init, check, verify.
  • v8 review hardening pass — multiple fix(review) follow-ups on the M1/M2 foundation before release: cancel-aware turn completion, store.Load serialized vs concurrent Append, typed HTTP status mapping (400/404/409), wire-safe serialization on resume, model.set serialized against turn.submit, delta flush on provider error with mid-turn model.set rejection, replayed-running-turn synchronous rejection, per-session Append serialization with safe Hub cancel, raw-JSON tool args, journal append validation with streamed-part-id dedup, tightened permission validation with cancel matched to turn id, turn.started ordering after StartTurn. Foundation merges with concurrent/adversarial conditions verified, not just happy-path.

Fixes

  • #77: haft_decision(measure|baseline|apply) silently misrouted on artifact_ref (issue title: "haft_decision actions measure and baseline silently fall through to most-recent decision when artifact_ref is passed instead of decision_ref"). LLM clients naturally passed artifact_ref (universal in haft_refresh, and the only documented key on haft_decision(evidence)); both handlers only read decision_ref and fell through to ListByKind(KindDecisionRecord, 1) auto-detect, corrupting evidence/baselines on the wrong DecisionRecord. Reporter saw stale-scan count drop 42 → 27 in one round after manually switching to decision_ref. Both code paths (internal/cli/serve.go MCP + internal/tools/haft.go tools) fixed: measure/baseline accept either key and refuse to proceed without one (silent auto-detect on writes is gone: corrupting authoritative state is worse than refusing to act). apply accepts either, keeps auto-detect (read-only). FPF-guardrail bindDecisionRef also honors the alias. Schema descriptions updated to list every action per key. Three regression tests pin the bug shape.
  • install.sh picked the desktop tarball over CLI archive — fixed.

Chore

  • FPF corpus refresh to 34b4d63 ("temporal claim adequacy"). Submodule + embedded index moved atomically via task fpf-refresh. Indexed chunks 4961 → 4972.
  • Drop darwin-amd64 (Intel Mac) from the CLI release matrix.

Versioning

7.1.0 by SemVer rules — presence of feat: commits (complete-external + v8 foundation) requires a minor bump on top of 7.0.0. No breaking changes; v6 artifacts and v7 production surfaces unchanged.

Test plan

  • go build ./... clean
  • go test ./... — pre-push hook ran the full suite (agentcore, agentdriver, agentproto, agentserver, agentstore, artifact, cli, fpf, tools, workcommission, project, specflow, scopeauth, implementationplan, autonomyenvelope and others — all green)
  • golangci-lint 0 issues
  • gofmt, vet, mod-tidy clean
  • CI green on PR
  • Spot-check on a real project: haft_decision(baseline, artifact_ref=<id>) lands on <id> (not most-recent); empty-ref call returns guidance, not silent auto-detect
  • Spot-check: haft commission complete-external against an externally-run WC closes lifecycle and unblocks depends_on chain
  • Smoke curl /event/global against haft agent (v8 path) for the M2 acceptance bar
  • Tag v7.1.0 after merge; release pipeline rebuilds embedded FPF index from locked submodule SHA

Closes

m0n0x41d and others added 5 commits May 4, 2026 09:58
Submodule data/FPF bumped one commit upstream:
  b18acde → 34b4d63   "temporal claim adequacy"

Embedded index regenerated against the new HEAD via
`task fpf-refresh` (which chains fpf-pull → fpf-index):
  indexed_chunks: 4961 → 4972 (+11; 4907 spec + 65 patterns)
  fpf_commit:     b18acde → 34b4d63   (matches submodule HEAD)

Index and submodule pointer must move together — the release
workflow rebuilds the index from the locked submodule SHA on
tag, so a drift between these two files would surface as either
stale search results in dev or a mismatch on next release build.

No code change. Search and lookup paths (`haft fpf search`,
`haft_query(action="fpf")`, MCP plugin reasoning hints) pick
up the upstream wording change automatically.
… journal

Lays the bottom three layers of the v8 agent stack as planned in
.context/v8_plan.md §2. Coexists with legacy internal/agent and
internal/agentloop — zero changes to v7 production code paths
(MCP server, harness, spec onboarding, init, check, verify). M2
foundation (HTTP+SSE server + turn driver) lands separately.

internal/agentcore — Layer G2: pure algebraic types
  Session, Turn, Part (sum: text/reasoning/tool_use/tool_result/
  file_ref/step_boundary), Permission, SubAgentLink, ModelChoice.
  All transitions return new Session — no field mutation, no shared
  slice/map state. Sealed sum types via unexported markers; opaque
  typed IDs (SessionID, TurnID, PartID, PermissionID) prevent cross-
  domain confusion at compile time.
  Inexpressible by construction: mutating an existing Turn or Part,
  recording a Part without a Turn, completing a Turn that's already
  terminal, attaching a SubAgent without naming the parent Turn,
  resolving a Permission that was never requested.
  19/19 tests, including tampering tests that prove withPart and
  withPermissions copy underlying slices/maps even when capacity
  would permit aliasing.
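
The copy-on-write rule described above can be illustrated with a minimal sketch. The `Session` and `Part` types and both method names below are hypothetical stand-ins, not the agentcore definitions; the sketch only shows why a plain `append` is unsafe when the slice has spare capacity.

```go
package main

import "fmt"

// Part and Session are hypothetical stand-ins for the agentcore types.
type Part struct{ Text string }

type Session struct{ parts []Part }

// withPart returns a new Session backed by a fresh array, so the receiver's
// backing array is never written to.
func (s Session) withPart(p Part) Session {
	next := make([]Part, len(s.parts), len(s.parts)+1)
	copy(next, s.parts)
	return Session{parts: append(next, p)}
}

// naiveWithPart shows the hazard: append reuses spare capacity when it can.
func (s Session) naiveWithPart(p Part) Session {
	return Session{parts: append(s.parts, p)}
}

func main() {
	base := Session{parts: make([]Part, 1, 4)} // spare capacity invites aliasing
	base.parts[0] = Part{Text: "hello"}

	a := base.naiveWithPart(Part{Text: "a"})
	_ = base.naiveWithPart(Part{Text: "b"}) // writes into the slot a.parts[1] uses
	fmt.Println(a.parts[1].Text)

	c := base.withPart(Part{Text: "c"})
	_ = base.withPart(Part{Text: "d"}) // separate array, c is untouched
	fmt.Println(c.parts[1].Text)
}
```

The naive variant prints `b` because both appends land in the same backing-array slot; the copying variant prints `c`.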

internal/agentproto — Layer P: wire format shared with the TS TUI
  18 AgentEvent variants (session.*, turn.*, part.*.delta, part.tool_use.*,
  permission.*, subagent.*, model.switched, auth.expired) and 9
  RPCCommand verbs (session.create/list/resume/rename, turn.submit/cancel,
  permission.respond, model.set, subagent.attach). PartPayload tagged-
  envelope encodes a materialized Part for transport; deltas use
  per-variant streaming events instead.
  Tagged-envelope JSON with kind discriminator. timeStamp wrapper
  pins RFC3339Nano UTC so the TS SDK's new Date(...) round-trips
  losslessly. Decoders refuse unknown kinds with typed errors, so
  forward-compat failures surface at the boundary, not silently.
  8/8 round-trip + unknown-kind rejection tests.

internal/agentstore — Layer G3: append-only journal + replay
  Per-session JSONL journal at <store_root>/<id>/events.jsonl with
  meta.json index for fast List() without replay. Pure Apply/Replay
  reconstruct an agentcore.Session from a journal byte-for-byte.
  Streaming deltas are wire-only (IsJournalEvent predicate gates
  routing); only state-mutating events get journaled. Compaction and
  delta-coalescing are M2/G4 concerns.
  15/15 tests including the M1 acceptance bar: TestStore_Replay1000
  ViaDisk runs 1000 mixed events through both pure in-memory replay
  and disk round-trip and asserts the two reconstructed Sessions
  are reflect.DeepEqual.

Layer N talks only to N-1; skip-level access is forbidden by package
import graph. Side effects only at G0 (db/fs) and G3 (journal write);
G2 is pure.

Migration plan: legacy internal/agent.Session/Message stays
authoritative for the existing coordinator until M2 cuts over. Both
type families coexist; no auto-migration runs.
Builds on M1 (agentcore + agentproto + agentstore) to give the v8
stack a real over-the-wire surface and a real turn loop. Still no
changes to legacy v7 paths: MCP server (haft serve) is a separate
binary command, separate code path, with no shared HTTP listener
or transitive coupling. The TS TUI in M3 will speak this surface
verbatim.

internal/agentserver — Layer G5: HTTP + SSE
  /event/global is the single SSE channel that fans every published
  AgentEvent to every subscriber; the TUI client filters by session_id
  on its side. RPC verbs land as POST endpoints
  (/session, /session/:id/turn|cancel|rename|model, /permission/:id);
  reads land as GET (/session, /session/:id, /healthz). Server binds
  127.0.0.1:0 by default — the chosen port is returned to the parent
  process for env-var handoff to the TUI process.
  Pluggable Dispatcher interface keeps transport and engine
  independent. StoreDispatcher (in-package) handles session lifecycle
  CRUD without an LLM and is what test code uses; DriverDispatcher
  (in agentdriver) plugs in the real turn engine.
  Hub is a deliberately small unbounded fan-out — backpressure on a
  slow listener is treated as the correct behaviour, not a reason to
  silently drop events. The only consumer is the local TUI process,
  which drains as fast as the loopback can carry it.
  10/10 tests including a real HTTP+SSE round-trip that subscribes,
  triggers an event from another goroutine, and verifies the wire
  bytes decode back to the right typed event.

internal/agentdriver — Layer G4: turn loop + permission gate
  Driver.Drive(ctx, Session, userText) opens a turn, streams the
  Provider's events, dispatches tools through ToolDispatcher,
  synchronously gates permission-required tools through PermissionGate,
  and finishes with turn.completed or turn.failed. Pure orchestration:
  no global state, no implicit clock, no implicit ID source — IDGen
  and Now are injected so tests are deterministic.
  Provider, ToolDispatcher, EventSink interfaces decouple the driver
  from real LLM clients and tool implementations. Production wiring
  (M2c follow-up) adapts internal/provider and internal/tools to
  these contracts; legacy internal/agentloop coordinator stays alive
  unchanged in parallel.
  PermissionGate is a per-driver synchronisation primitive: the
  driver Open()s a permission, blocks on the chan, and the HTTP
  handler Resolve()s on the operator's POST. Context cancellation
  cleans up pending entries — no leaked channels.
  Dispatcher implements agentserver.Dispatcher with an in-flight
  per-session cancel map: a fresh turn cancels the previous one
  before starting (agentcore also rejects concurrent turns at append
  time, so this is belt-and-braces).
  CombinedSink wraps Store + Hub: state-mutating events go through
  agentstore.Append (journaled) and every event publishes to the Hub
  (broadcast). Streaming deltas are broadcast-only, never journaled —
  IsJournalEvent gates routing so the journal stays compact and the
  TUI still sees live text.
  12/12 tests: 9 unit (text-only turn, tool granted, tool requires-
  prompt-approved/denied, provider error, ctx cancel, validate, gate
  unknown-resolve, gate ctx-cancel-cleanup) plus 3 integration tests
  that boot the real server, spawn the driver, exercise the full turn
  via real HTTP — including the synchronous permission round-trip.

M2 acceptance bar from .context/v8_plan.md: dogfood `curl /event/global`
shows live stream during haft agent. The integration tests assert
exactly that path (POST /session → SSE session.created → POST /turn →
SSE turn.started/text deltas/tool_use/turn.completed) plus the
journal reproduces the same state via Store.Load.

Known M2 backfill items deliberately deferred to next slice:
  - Materialized assistant TextPart/ReasoningPart events (currently
    deltas are wire-only and journaled text comes from in-memory
    accumulation; a PartTextAppendedEvent variant in Layer P would
    let the journal carry materialized text too).
  - SubAgent runner inside the driver (the Layer-P events and
    agentcore.SubAgentLink type exist; driver does not yet call
    AttachSubAgent).
  - GET /auth/status endpoint (placeholder in Layer P spec; not yet
    wired through the dispatcher).
  - Real provider/tool adapters (driver runs against fake/scripted
    providers in tests; M2c wires internal/provider and internal/tools).
…baseline/apply (#77)

`haft_decision(measure|baseline|apply)` only read args["decision_ref"]
and silently fell through to a most-recent-decision auto-detect when
the key didn't match. LLM clients naturally generalise from
`haft_refresh` (where artifact_ref is universal) and from the schema
docs (which described artifact_ref as the evidence target with no
mention that decision_ref is the magic word for these other actions).
Reported repros show evidence/baselines landing on the wrong
DecisionRecord — the kind of silent corruption the artifact graph
cannot detect after the fact.

Fix applies to both code paths (MCP serve handler and the registry-
gated tool runner):

  - measure, baseline now accept either `decision_ref` or
    `artifact_ref`; if neither resolves, they return guidance text
    asking for the explicit ref instead of auto-detecting. The
    silent ListByKind(KindDecisionRecord, 1) fallback is gone for
    these two actions — corrupting authoritative state is worse
    than refusing to proceed.
  - apply also accepts both keys but keeps the auto-detect fallback,
    since apply is a read-only "generate brief" path with no
    persistent side effect.
  - Schema descriptions for decision_ref / artifact_ref updated to
    list every action that accepts each key.

Regression tests pin the bug shape: two DecisionRecords exist, the
caller asks to baseline/measure the older one via artifact_ref, and
the test fails if the implementation reaches for the newer one. A
third test guards the new error path when no ref is supplied.

Audit: no production caller relies on the removed auto-detect.
Documentation everywhere already shows the explicit decision_ref.
Auto-baseline after decide uses the freshly-created DecisionRecord
ID directly and never went through the buggy path. FPF guardrails
bind decision_ref from the cycle when the registry is active, so
guarded flows are unaffected.

Out of scope: the side observation about a stale `needs_onboard`
readiness signal in haft_decision responses — filing separately as
reported.

Fixes #77

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a74d0857bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/agentdriver/dispatcher.go Outdated
Comment thread internal/agentdriver/driver.go Outdated
Comment thread internal/agentstore/apply.go Outdated
The fix landed in a74d085 but Unreleased was empty — adding the
entry now so the upcoming release notes don't miss it.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14a60ab89a

Comment thread internal/tools/haft.go
Comment thread internal/agentstore/meta.go Outdated
- Reject concurrent turn.submit instead of cancel-and-replace. The old
  goroutine still owns the journal until it appends a terminal event;
  starting a new Drive immediately appended turn.started while replay
  still saw a live turn, producing an unreplayable journal.
- Register the permission gate entry before publishing
  permission.requested. A fast operator could POST /permission/{id}
  before Open ran, causing ErrUnknownPermission on a valid request.
  Adds PermissionGate.Discard for publish-failure rollback.
- Carry project_id in SessionCreatedEvent so replay reconstructs the
  same Session value that Create returned. Updated dispatchers, store,
  and the test header to populate it.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7c2ec9e5d

Comment thread internal/agentdriver/driver.go Outdated
Comment thread internal/agentdriver/driver.go
…for meta writes

P1 — bindDecisionRef only inspected `decision_ref`, so an `artifact_ref`
pointing at a foreign DecisionRecord was silently overwritten with the
active-cycle ref, and measure/baseline recorded against the wrong
decision. Treat artifact_ref as an alias before deciding whether to
inject the active ref — foreign refs now hit the guardrail.

P2 — writeMeta used a fixed `meta.json.tmp` filename, racing under the
Store's documented concurrent-safe contract. Switch to os.CreateTemp so
each writer gets a unique temp file, then rename.

Regression test pins the P1 bug shape: artifact_ref pointing outside
the active cycle must produce the alignment guardrail, not a silent
swap.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea68b72f09

Comment thread internal/agentdriver/driver.go Outdated
Comment thread internal/agentdriver/dispatcher.go Outdated
Comment thread internal/agentdriver/dispatcher.go
…with-canceled-ctx as canceled

P1: Drive used to publish turn.started before validating the in-memory
transition. After a server restart that replays to a Running turn while
the dispatcher's running map is empty, a new submit would journal a
second turn.started and only then StartTurn would reject it — leaving
an unreplayable journal. Validate first, then publish.

P2: When the provider closes its event channel as a documented reaction
to ctx cancellation, both ctx.Done() and the closed-channel receive can
be ready, and select picks at random. If the !open branch won we would
journal turn.completed for a turn that was actually canceled. Check
ctx.Err() before completing on channel close.

Regression tests cover both paths.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 20fb979813

Comment thread internal/agentdriver/dispatcher.go
Comment thread internal/agentstore/store.go
Comment thread internal/agentdriver/driver.go Outdated
…share hub

Address Codex P1/P2 findings on PR #80.

* Driver: treat any non-"approved" permission decision as denial. The
  prior check only matched the exact "denied" value, so a client posting
  decision: "approve" (typo) or any malformed string would silently
  fall through to Tools.Run.
* PermissionGate.Resolve: reject decisions other than approved/denied
  at the boundary as defense-in-depth.
* Dispatcher: pre-allocate the turn ID before launching Drive and
  record it alongside the cancel func; turn.cancel must now match the
  active turn or be rejected. Add Driver.DriveTurn(turnID) for the
  pre-allocated case; Drive keeps its old signature for tests.
* agentserver.NewServer: accept a Hub explicitly so callers cannot
  accidentally subscribe SSE clients to a different hub than the one
  the dispatcher publishes to.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d79be837d4

Comment thread internal/agentdriver/driver.go Outdated
Comment thread internal/agentdriver/driver.go Outdated
Three Codex P1/P2 findings:

- agentstore.Append now Load+Apply-validates before writing. A rejected
  transition (e.g. model.switched during a running turn) no longer
  poisons the journal with an unreplayable event.
- agentstore.Append refuses missing sessions via Load → ErrSessionNotFound
  rather than letting openJournal materialize an empty events.jsonl
  under a fresh directory.
- agentdriver allocates one part_id per accumulating text/reasoning
  buffer and reuses it for every delta AND the materialized part, so
  protocol-compliant clients can reconstruct one logical part by id.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6edf0a61c

Comment thread internal/agentproto/event.go Outdated
…easoning

Codex P1 findings on PR #80:

- handleToolCall (and the surrounding flushText/flushReasoning) used to
  return their error directly from streamTurn. When an error landed
  after turn.started — e.g. context.Canceled while gatePermission was
  waiting on the PermissionGate — the journal kept the turn in Running
  state forever, blocking the next submit until manual repair. Route
  every post-start error through failTurn so every started turn gets a
  terminal event.

- Assistant text and reasoning parts only existed in the in-memory
  Session value Drive returned; the dispatcher reloads from Store.Load
  on the next turn, so the prior assistant response was lost. Add
  PartTextCompletedEvent and PartReasoningCompletedEvent to Layer P,
  journal+apply them in agentstore, and have the driver publish them
  before AppendPart. Replay now reconstructs the same Session value
  Drive returned.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2867a70ec

Comment thread internal/agentstore/store.go Outdated
Comment thread internal/agentserver/hub.go Outdated
[]byte fields serialize to base64 in encoding/json, so wire consumers
(SSE clients, TS SDK) received an opaque blob in place of the tool
arguments. Switch the wire-format Args fields on PartToolUseStartedEvent,
PermissionRequestedEvent and the toolUseBody PartPayload to
json.RawMessage so the original JSON shape passes through unchanged.
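
The difference is visible with two small structs. The struct and function names here are invented for the example; the encoding behaviour is that of the standard `encoding/json` package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argsAsBytes mirrors the old shape: []byte is encoded as a base64 string.
type argsAsBytes struct {
	Args []byte `json:"args"`
}

// argsAsRaw mirrors the fix: json.RawMessage is embedded as-is.
type argsAsRaw struct {
	Args json.RawMessage `json:"args"`
}

func encodeBytes(args []byte) string {
	out, _ := json.Marshal(argsAsBytes{Args: args})
	return string(out)
}

func encodeRaw(args []byte) string {
	out, _ := json.Marshal(argsAsRaw{Args: args})
	return string(out)
}

func main() {
	args := []byte(`{"path":"a.txt"}`)
	fmt.Println(encodeBytes(args)) // args arrives as an opaque base64 string
	fmt.Println(encodeRaw(args))   // args arrives as a nested JSON object
}
```

One consequence worth noting: `json.RawMessage` must hold valid JSON, otherwise `json.Marshal` returns an error instead of an opaque blob.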

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c63eb0aec

Comment thread internal/agentdriver/dispatcher.go
agentstore: hold a per-session mutex across Load/Apply/Append so two
concurrent mutations cannot validate against the same snapshot and then
write events that are individually valid but jointly unreplayable
(e.g. model.switched admitted on top of a turn that another goroutine
just started).

agentserver: drop the channel-close path on Hub Subscribe cancel and
signal shutdown via a per-subscriber done channel instead. Publish
selects on done so a snapshot of a since-cancelled subscriber no longer
panics on send to a closed channel.
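
The done-channel approach can be sketched as below. The `Hub` and `subscriber` shapes are assumptions for illustration; the key property is that the event channel is never closed, so a `Publish` working from a stale snapshot cannot panic.

```go
package main

import (
	"fmt"
	"sync"
)

type subscriber struct {
	ch   chan string
	done chan struct{} // closed on cancel; ch itself is never closed
}

// Hub is a hypothetical fan-out sketch, not the agentserver type.
type Hub struct {
	mu   sync.Mutex
	subs map[*subscriber]struct{}
}

func NewHub() *Hub { return &Hub{subs: map[*subscriber]struct{}{}} }

// Subscribe returns the event channel and an idempotent cancel function.
func (h *Hub) Subscribe() (<-chan string, func()) {
	s := &subscriber{ch: make(chan string), done: make(chan struct{})}
	h.mu.Lock()
	h.subs[s] = struct{}{}
	h.mu.Unlock()

	var once sync.Once
	cancel := func() {
		once.Do(func() {
			h.mu.Lock()
			delete(h.subs, s)
			h.mu.Unlock()
			close(s.done)
		})
	}
	return s.ch, cancel
}

// Publish snapshots the subscribers, then sends outside the lock.
func (h *Hub) Publish(ev string) {
	h.mu.Lock()
	snapshot := make([]*subscriber, 0, len(h.subs))
	for s := range h.subs {
		snapshot = append(snapshot, s)
	}
	h.mu.Unlock()

	for _, s := range snapshot {
		select {
		case s.ch <- ev: // blocks on a slow listener instead of dropping
		case <-s.done: // subscriber cancelled after the snapshot was taken
		}
	}
}

func main() {
	h := NewHub()
	events, cancel := h.Subscribe()
	go h.Publish("turn.started")
	fmt.Println(<-events)
	cancel()
	h.Publish("ignored") // no subscribers left, returns immediately
}
```

Readers in this sketch must also watch for their own cancellation, since the event channel will not signal shutdown by closing.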

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3dfa5459a3

Comment thread internal/agentdriver/dispatcher.go
Comment thread internal/agentdriver/driver.go
After a process restart, Store.Load can return a session whose journal
still has a Running turn from the previous instance. The in-memory
running map is empty, so handleTurnSubmit was accepting the submit and
spawning a goroutine that then failed StartTurn with
ErrTurnAlreadyRunning and dropped the error — leaving the client with
an accepted turn_id that never produced events.

Check session.HasLiveTurn() right after Load and reject with
ErrTurnAlreadyRunning before allocating a new turn id.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86fa47dc39

Comment thread internal/agentserver/server.go
m0n0x41d added 2 commits May 13, 2026 19:16
ProviderError now flushes pending text/reasoning before turn.failed to
match the provider.go contract — replay was dropping partial output
operators had already seen over SSE.

handleModelSet now rejects with ErrTurnAlreadyRunning when a turn is in
flight (in-memory map and post-restart HasLiveTurn check). Previously
a model_switched event could land between handleTurnSubmit registering
d.running and DriveTurn journaling turn.started, recording the switch
ahead of a turn the provider was still driving with the old model.
Double-submitting a turn while one is already running is a client-visible
state conflict, not a server failure. writeResult was returning 500 for
the wrapped ErrTurnAlreadyRunning from the dispatcher; map it to 409 so
clients can distinguish retryable turn conflicts from server errors.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 024de7fb8f

Comment thread internal/agentdriver/dispatcher.go Outdated
handleModelSet sampled d.running and released d.mu before appending
model_switched, while handleTurnSubmit loaded the session before
locking. A model.set that won the running-empty check could append
between submit's Load and its running registration — the goroutine
then drove the provider with the old session.Model while the journal
already showed model.switched preceding turn.started.

Move submit's Store.Load and HasLiveTurn check inside d.mu, and hold
d.mu across handleModelSet's Store.Append. The two paths now serialize
on the same lock: either the switch lands and the next submit reads
the new model, or the submit registers first and the switch is
rejected.
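
The serialization described above can be sketched as follows. This is a simplified illustration, not the repository's code: the `dispatcher` type, its fields, and the in-memory `model` map (standing in for the journaled session) are assumptions made for the example. Only `ErrTurnAlreadyRunning` and the idea of a single `d.mu` come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrTurnAlreadyRunning mirrors the sentinel named in the commit message.
var ErrTurnAlreadyRunning = errors.New("turn already running")

// dispatcher is a simplified, hypothetical stand-in for the real Dispatcher.
type dispatcher struct {
	mu      sync.Mutex
	running map[string]bool   // session ID -> a turn is in flight
	model   map[string]string // stands in for the model stored in the journal
}

func newDispatcher() *dispatcher {
	return &dispatcher{running: map[string]bool{}, model: map[string]string{}}
}

// submit reads the session model and registers the turn under one lock,
// so no model switch can land between the read and the registration.
func (d *dispatcher) submit(session string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.running[session] {
		return "", ErrTurnAlreadyRunning
	}
	d.running[session] = true
	return d.model[session], nil
}

// setModel holds the same lock across the running check and the write.
func (d *dispatcher) setModel(session, model string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.running[session] {
		return ErrTurnAlreadyRunning
	}
	d.model[session] = model
	return nil
}

func main() {
	d := newDispatcher()
	fmt.Println(d.setModel("s1", "model-b")) // switch lands first
	fmt.Println(d.submit("s1"))              // submit sees the new model
	fmt.Println(d.setModel("s1", "model-c")) // rejected: turn in flight
}
```

Because both paths take the same mutex, only two orderings are possible: the switch completes before the submit reads the model, or the submit registers first and the switch is rejected.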

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b80cf60db

Comment thread internal/agentdriver/dispatcher.go Outdated
handleSessionResume returned agentcore.Session directly through
writeJSON, but agentcore.Part is a sealed interface whose concrete
structs keep ID/CreatedAt in an unexported partBase. encoding/json
produced entries like {"Text":"..."} with no kind discriminator, so
clients resuming after any completed turn could not reconstruct part
IDs or kinds. Route the response through agentproto.EncodeSession,
which wraps every Part in PartPayload — same envelope shape the event
stream already uses.
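
A minimal sketch of why direct marshalling loses information and how a tagged envelope restores it. The types below (`partBase`, `TextPart`, `partPayload`) are simplified stand-ins written for this example, and the envelope field names are assumptions; the real `agentproto.EncodeSession` and `PartPayload` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// partBase keeps its fields unexported, so encoding/json cannot see them.
type partBase struct {
	id string
}

// TextPart is a simplified stand-in for one concrete Part variant.
type TextPart struct {
	partBase
	Text string
}

// partPayload is a hypothetical wire envelope with an explicit discriminator.
type partPayload struct {
	Kind string `json:"kind"`
	ID   string `json:"id"`
	Text string `json:"text,omitempty"`
}

func newText(id, text string) TextPart {
	return TextPart{partBase: partBase{id: id}, Text: text}
}

// naive marshals the domain type directly: the ID and the kind are lost.
func naive(p TextPart) string {
	b, _ := json.Marshal(p)
	return string(b)
}

// enveloped copies the unexported state into the envelope before encoding.
func enveloped(p TextPart) string {
	b, _ := json.Marshal(partPayload{Kind: "text", ID: p.id, Text: p.Text})
	return string(b)
}

func main() {
	p := newText("p1", "hi")
	fmt.Println(naive(p))     // {"Text":"hi"}
	fmt.Println(enveloped(p)) // {"kind":"text","id":"p1","text":"hi"}
}
```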

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58e6f4bac8

Comment thread internal/agentserver/server.go
PermissionGate.Resolve returns ErrInvalidDecision for malformed payloads
and ErrUnknownPermission for stale IDs; writeResult was bucketing both
into 500. Lift the sentinels to agentcore (ErrPermissionDecision,
reusing existing ErrPermissionNotFound), wrap the agentdriver-local
ones via %w, and map them to 400 / 404 in writeResult.
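
The wrap-and-map pattern described above might look like the sketch below. The sentinel names come from the commit message; `errInvalidDecision`, `errUnknownPermission`, and `statusFor` are hypothetical names for this example, and the real `writeResult` presumably handles more cases.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Shared sentinels, named after the commit message. In the repository they
// live in agentcore; they are redeclared here to keep the example standalone.
var (
	ErrPermissionDecision = errors.New("invalid permission decision")
	ErrPermissionNotFound = errors.New("permission not found")
)

// Hypothetical driver-local errors that wrap the shared sentinels via %w.
var (
	errInvalidDecision   = fmt.Errorf("agentdriver: malformed decision: %w", ErrPermissionDecision)
	errUnknownPermission = fmt.Errorf("agentdriver: stale permission id: %w", ErrPermissionNotFound)
)

// statusFor maps an error to an HTTP status; errors.Is sees through %w wraps.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrPermissionDecision):
		return http.StatusBadRequest
	case errors.Is(err, ErrPermissionNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(statusFor(errInvalidDecision))   // 400
	fmt.Println(statusFor(errUnknownPermission)) // 404
	fmt.Println(statusFor(errors.New("boom")))   // 500
}
```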

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17bd63cda3

Comment thread internal/agentstore/store.go
A journal Append on a session larger than a single page-write can be
observed mid-write by a concurrent reader using a separate file handle,
so the bufio scanner in ReadAll can decode a truncated line and fail.
Switch sessLock to RWMutex; Load takes RLock, Append keeps the write
lock. Append's internal replay uses a non-locking helper to avoid
re-entering the lock.
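
A rough sketch of the locking shape, assuming an in-memory slice in place of the on-disk journal. The `store` type and `replayLocked` helper are illustrative names, not the repository's; the point is only that `Append` must not call `Load` while it already holds the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a simplified, hypothetical stand-in for one session's journal.
type store struct {
	mu     sync.RWMutex
	events []string
}

// Load takes the read lock, so it never observes a half-finished Append.
func (s *store) Load() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.replayLocked()
}

// Append keeps the exclusive lock for the whole write.
func (s *store) Append(ev string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Replay through the non-locking helper: calling Load here would try
	// to RLock a mutex this goroutine already holds for writing and deadlock.
	current := s.replayLocked()
	s.events = append(s.events, ev)
	return len(current) + 1
}

// replayLocked copies the events; the caller must already hold s.mu.
func (s *store) replayLocked() []string {
	out := make([]string, len(s.events))
	copy(out, s.events)
	return out
}

func main() {
	s := &store{}
	s.Append("turn.started")
	s.Append("turn.completed")
	fmt.Println(s.Load()) // [turn.started turn.completed]
}
```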

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bc6248110

Comment thread internal/agentdriver/driver.go
Comment thread internal/agentdriver/dispatcher.go Outdated
Driver.streamTurn now checks ctx.Err() before completing a turn on
ProviderTurnDone, mirroring the closed-channel guard. Both branches can
be ready in the same select cycle (provider had queued done while
handleToolCall was running and ctx was canceled); without the check the
race could journal turn.completed for a canceled turn.

Dispatcher.handleTurnCancel now returns ErrTurnNotRunning /
ErrTurnMismatch sentinels (defined in agentserver), and writeResult maps
them to 404 / 409 respectively. Stale cancels and mistyped turn IDs no
longer surface as 500.
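
The select race in the first paragraph can be reproduced in a few lines. When several `select` cases are ready, Go picks one at random, so the done branch needs its own cancellation check. The `outcome` function and its return strings are illustrative, not the driver's actual code.

```go
package main

import (
	"context"
	"fmt"
)

// outcome decides how a turn ends once the provider signals done or ctx ends.
func outcome(ctx context.Context, done <-chan struct{}) string {
	select {
	case <-ctx.Done():
		return "canceled"
	case <-done:
		// Both cases may have been ready; select chose this one at random.
		// Re-check the context so a canceled turn is never marked completed.
		if ctx.Err() != nil {
			return "canceled"
		}
		return "completed"
	}
}

// cancelAfterDone sets up the racy state: done is queued, then ctx is canceled.
func cancelAfterDone() string {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{}, 1)
	done <- struct{}{}
	cancel()
	return outcome(ctx, done)
}

// completeNormally is the uncontended path: done is ready, ctx is still live.
func completeNormally() string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	done := make(chan struct{}, 1)
	done <- struct{}{}
	return outcome(ctx, done)
}

func main() {
	fmt.Println(cancelAfterDone())  // canceled
	fmt.Println(completeNormally()) // completed
}
```

Without the `ctx.Err()` check, `cancelAfterDone` would report a completed turn on roughly half of its runs.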

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2b9cf8530

Comment thread internal/agentdriver/driver.go
Comment thread internal/agentdriver/driver.go Outdated
m0n0x41d added 3 commits May 13, 2026 20:03
…e-external

feat(commission): CLI complete-external for externally-run WorkCommissions (#78)
…#77 fix

7.1.0 is a SemVer minor bump on top of 7.0.0:
- feat: haft commission complete-external CLI (#79)
- feat: v8 agent stack foundation M1/M2 + review hardening pass
- fix: artifact_ref alias on haft_decision measure/baseline/apply (#77)
- fix: install.sh CLI archive selection
- chore: FPF corpus refresh to 34b4d63; drop darwin-amd64 from CLI matrix

The earlier "Unreleased — #77 only" stanza is folded into 7.1.0;
the v8 foundation and complete-external entries are added so the
release notes don't lose those contributions silently.
…h errors

Two correctness fixes in the agent driver:

- gatePermission: when Wait returns ctx.Err(), emit a permission.resolved
  (denied) event before propagating the error. Without it, the journal
  has permission.requested with no resolution while the in-memory gate
  entry is already gone — replay surfaces a stuck Pending permission
  that no POST /permission/{id} can resolve.

- failTurn: capture and wrap Sink.Publish errors on the turn.failed
  event instead of silently dropping them. A failed journal write left
  the durable record as Running while the local session moved to Failed,
  silently blocking every future submit via HasLiveTurn until manual
  repair.
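
As an illustration of the second fix, the sketch below surfaces a publish failure alongside the original cause instead of dropping it. The signature of `failTurn`, the sink functions, and the use of `errors.Join` (Go 1.20+) are assumptions for this example; the commit message only says the error is captured and wrapped.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errProvider = errors.New("provider failed")      // hypothetical turn failure
	errJournal  = errors.New("journal write failed") // hypothetical sink failure
)

// failTurn publishes turn.failed and reports a publish error to the caller
// together with the original cause, so neither is silently lost.
func failTurn(publish func(event string) error, cause error) error {
	if err := publish("turn.failed"); err != nil {
		return errors.Join(cause, fmt.Errorf("publish turn.failed: %w", err))
	}
	return cause
}

func brokenSink(string) error  { return errJournal }
func workingSink(string) error { return nil }

// reportsBoth checks that both errors remain visible to errors.Is.
func reportsBoth() bool {
	err := failTurn(brokenSink, errProvider)
	return errors.Is(err, errProvider) && errors.Is(err, errJournal)
}

func main() {
	fmt.Println(reportsBoth())                             // true
	fmt.Println(failTurn(workingSink, errProvider) == errProvider) // true
}
```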
@m0n0x41d m0n0x41d changed the title Merge dev to main: v8 foundation M1/M2, #77 fix, hygiene Release 7.1.0: complete-external CLI, v8 foundation, #77 fix May 13, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8d72b8903

Comment thread internal/agentcore/model.go Outdated
Comment thread internal/agentstore/store.go Outdated
m0n0x41d added 2 commits May 13, 2026 20:13
FPF submodule moves from 34b4d63 → ee40821 ("formatting for GitHub",
2026-05-12). The seven upstream commits in between are not cosmetic:

  9cb163b  Causality and Realizability  (+1524 net, C.27 + C.28)
  f73766d  functional descriptions + A.15.4  (+684 net, new pattern)
  1f7c9e5  authority-looking reliance tuning  (+552 net)
  1f21daa  counterfactuality terminology  (cleanup)
  136be3b  A.6.P terminology cleanup  (cleanup)
  2b3e0f4  admissible action in problematic situations  (Readme)
  ee40821  formatting for GitHub  (style)

Substantive content: a new architectural region for causal evidence
(C.27 Causal-use calculus + C.28 CounterfactualSamplingRealizabilityProfile
with controlled CausalEvidenceSupportBasis vocabulary), and a new A.15
cluster member A.15.4 "Work-Relevant Source Restoration" governing the
recover-source-before-reliance step when an encountered item is about
to support a work/reliance claim by appearance.

Index rebuild via task fpf-index --force:
  indexed_chunks 4972 → 5062  (+90; 4996 spec + 66 patterns)
  fpf_commit ee40821c  (matches submodule HEAD)

Skill floor update — two minimal additions, not a rewrite:

internal/fpf/patterns/cross-cutting.md
  + X-SOURCE-RESTORATION pattern. The detection rule
    "Object ≠ Description ≠ Carrier" was already in the floor; the
    operational rule ("before reliance, recover the project source
    that makes the action admissible") was not. Names dashboards,
    generated explanations, credential views, projection outputs
    (/h-view brief|rationale|audit), copied approvals, provenance
    labels, schema/API text, composed source chains explicitly.

internal/cli/skill/h-reason/SKILL.md
  + DEC-06 Predictions sentence: predictions are causal claims;
    check realizability under physical / ethical / operational
    constraints before committing them as acceptance gates. Points
    to C.27 / C.28 on demand — full calculus stays out of the
    L1 floor to avoid ceremony creep.
  + Cross-cutting block adds X-SOURCE-RESTORATION reference.

Terminology cleanups in A.6.P and counterfactuality propagate through
fpf search automatically because the skill cites pattern IDs (CHR-10
etc) and never the surface text of the spec section.

ModelChoice and agentstore.SessionMeta were serialized with their Go
field names — REST clients sending `credential_key` could not populate
ModelChoice.CredentialKey (case-insensitive matching does not bridge
snake_case to camelCase), and `GET /session` responses leaked
`ProjectID`/`CreatedAt`/`EventCount` while the rest of the protocol
speaks snake_case. Add explicit json tags so the wire shape matches the
rest of agentproto.
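
The decoding gap is easy to demonstrate. In the sketch below, `untagged` and `tagged` are hypothetical reductions of `ModelChoice` to the single field mentioned above; the real struct has more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untagged relies on encoding/json's default, case-insensitive name match.
type untagged struct {
	CredentialKey string
}

// tagged declares the snake_case wire name explicitly.
type tagged struct {
	CredentialKey string `json:"credential_key"`
}

func decodeUntagged(body string) string {
	var v untagged
	_ = json.Unmarshal([]byte(body), &v)
	return v.CredentialKey
}

func decodeTagged(body string) string {
	var v tagged
	_ = json.Unmarshal([]byte(body), &v)
	return v.CredentialKey
}

func main() {
	body := `{"credential_key":"k1"}`
	fmt.Printf("%q\n", decodeUntagged(body)) // "": the underscore breaks the match
	fmt.Printf("%q\n", decodeTagged(body))   // "k1"
}
```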

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5001d600a

Comment thread internal/agentdriver/dispatcher.go
m0n0x41d added 6 commits May 13, 2026 20:26
…-write fix

Two gaps after the initial 7.1.0 cut:

  - a8d72b8 and d5001d6 landed AFTER the changelog was cut. The
    v8 hardening bullet now mentions journaled-canceled-permissions
    + failTurn publish errors and the snake_case JSON wire tagging
    on ModelChoice + agentstore.SessionMeta explicitly. Reorganized
    the bullet into themed groups (concurrency / HTTP surface / wire
    protocol / turn lifecycle / permission gate) so the 17-item list
    is readable instead of a wall of commas.

  - ea68b72 bundled two fixes; only the bindDecisionRef alias half
    was captured (as a follow-on note on the #77 bullet). The other
    half — writeMeta race on the fixed meta.json.tmp filename in
    internal/artifact/store — is a legitimate concurrent-safe Store
    contract violation that affects v7 paths too, not just v8.
    Surfaced as its own Fixed entry.
… basis (CC-B3.9)

Add ScoreEvidenceWithCausalBasis wrapping ScoreTypedEvidence with the C.28
causal-support-basis cap per CC-B3.9: when the basis is simulation-only OR
the linked claim's realizability verdict is "nonrealizable", the resulting
R is capped at 0.5 regardless of verdict, evidence type, or CL. "unknown"
realizability does NOT cap — bounded use under C.28 may still be admissible.
Expired evidence remains at 0.1; the cap is a ceiling of 0.5 and does not raise
weak scores. Empty basis/realizability falls through to ScoreTypedEvidence
unchanged so legacy callers preserve pre-7.1.0 semantics bit-for-bit.

Add CausalSupportBasis and forward-compat ClaimRealizability to reff.Evidence;
ComputeClaimAssurance now scores via ScoreEvidenceWithCausalBasis so the
assurance engine honors the cap when callers populate the new fields.
Per-claim realizability resolution from ClaimRefs is marked TODO(post-7.1)
and degrades gracefully — empty realizability behaves identically to the
legacy path.

5 new reff tests pin the contract: cap fires on simulation_only and
nonrealizable, no-op on unknown, parity with ScoreTypedEvidence for
legacy empty-basis inputs, expired evidence not raised by cap.
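
The cap semantics pinned by those tests can be summarized in a small sketch. `capScore` is a hypothetical helper: its `base` argument stands in for whatever `ScoreTypedEvidence` returns, and the string values follow the aliases used in this commit message. It is not the actual `ScoreEvidenceWithCausalBasis` implementation.

```go
package main

import "fmt"

// causalCap is the CC-B3.9 ceiling described in the commit message.
const causalCap = 0.5

// capScore applies the ceiling to an already-computed base score.
func capScore(base float64, basis, realizability string) float64 {
	capped := basis == "simulation_only" || realizability == "nonrealizable"
	if capped && base > causalCap {
		return causalCap
	}
	// "unknown" or empty inputs fall through unchanged; weak scores
	// (for example expired evidence at 0.1) are never raised.
	return base
}

func main() {
	fmt.Println(capScore(0.9, "simulation_only", "")) // 0.5
	fmt.Println(capScore(0.9, "", "nonrealizable"))   // 0.5
	fmt.Println(capScore(0.9, "", "unknown"))         // 0.9
	fmt.Println(capScore(0.9, "", ""))                // 0.9
	fmt.Println(capScore(0.1, "simulation_only", "")) // 0.1
}
```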
…Verdict types

Embed the new FPF ee40821 C.28 controlled vocabulary structurally on the
artifact graph, not only as corpus text.

CausalEvidenceSupportBasis is a typed string with 5 canonical values:
observationalAssociationSupportBasis, interventionalActionSupportBasis,
realizedCounterfactualSampleSupportBasis,
identifiedCounterfactualEstimateSupportBasis, and
simulationOnlyCounterfactualOutputBasis. ParseCausalSupportBasis
canonicalizes operator-friendly aliases (observational, simulation_only,
etc.) on read — long FPF names also accepted. Unknown values are rejected
at the artifact boundary, not silently dropped.
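
A minimal sketch of alias canonicalization with rejection of unknown values. The five canonical strings are taken from the paragraph above; the constant names, the `parseBasis` function, and the treatment of an empty string as "not declared" are assumptions for this example. Only the two aliases the commit message spells out are included, since the full alias list is not given.

```go
package main

import "fmt"

// CausalEvidenceSupportBasis is a typed string, as in the commit message.
type CausalEvidenceSupportBasis string

const (
	BasisObservational   CausalEvidenceSupportBasis = "observationalAssociationSupportBasis"
	BasisInterventional  CausalEvidenceSupportBasis = "interventionalActionSupportBasis"
	BasisRealizedSample  CausalEvidenceSupportBasis = "realizedCounterfactualSampleSupportBasis"
	BasisIdentifiedEst   CausalEvidenceSupportBasis = "identifiedCounterfactualEstimateSupportBasis"
	BasisSimulationOnly  CausalEvidenceSupportBasis = "simulationOnlyCounterfactualOutputBasis"
)

// aliases holds only the two short forms named in the commit message.
var aliases = map[string]CausalEvidenceSupportBasis{
	"observational":   BasisObservational,
	"simulation_only": BasisSimulationOnly,
}

// parseBasis canonicalizes aliases, accepts long names, rejects the rest.
func parseBasis(raw string) (CausalEvidenceSupportBasis, error) {
	if raw == "" {
		return "", nil // assumption: an empty value means "basis not declared"
	}
	if b, ok := aliases[raw]; ok {
		return b, nil
	}
	switch b := CausalEvidenceSupportBasis(raw); b {
	case BasisObservational, BasisInterventional, BasisRealizedSample,
		BasisIdentifiedEst, BasisSimulationOnly:
		return b, nil
	}
	return "", fmt.Errorf("unknown causal support basis %q", raw)
}

func main() {
	fmt.Println(parseBasis("simulation_only"))
	fmt.Println(parseBasis("interventionalActionSupportBasis"))
	fmt.Println(parseBasis("vibes"))
}
```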

RealizabilityVerdict carries the C.28 CounterfactualSamplingRealizability
profile verdict, which takes one of three values: realizable, nonrealizable, unknown.

EvidenceItem gains causal_support_basis (omitempty); AttachEvidence parses
and validates the value at ingest. DecisionClaim and DecisionPrediction
gain realizability (omitempty), plumbed through newDecisionClaims,
decisionClaimsFromPredictions, normalizeDecisionClaims, and
decisionPredictionsFromClaims mirroring the existing verify_after pattern
one-for-one, so the round-trip is invariant under any reshape sequence.
PredictionInput on the wire-input layer gains realizability too.

WLNK scoreEvidence in decision.go now routes through
reff.ScoreEvidenceWithCausalBasis so the C.28 cap (CC-B3.9) fires in the
artifact-summary path in lockstep with the assurance engine. Realizability
plumbing per-claim through WLNK is TODO(post-7.1); cap fires on
CausalSupportBasis alone here.

Legacy decisions/evidence without the new fields round-trip unchanged and
score identically to pre-7.1.0 (verified by reff parity test landed in
the prior commit).

6 new artifact tests pin the contract: alias canonicalization for both
basis and realizability, realizability preservation through every
normalize step, EvidenceItem JSON round-trip honors omitempty.
… /h-view projections

MCP and CLI transports both expose the new C.28 fields symmetrically:
haft_decision(action="evidence") accepts causal_support_basis (with the
short alias list and the CC-B3.9 cap callout in the description), and
predictions[] on the decide action accepts realizability with the three
canonical verdicts. internal/tools/haft.go, internal/cli/serve.go, and
internal/fpf/server.go all carry the schema entries so the parity test
stays in lockstep.

Soft warning lands on both transports: if EvidenceItem.Content reads like
a causal-use claim (causal, intervention, counterfactual, uplift,
treatment effect, ...) but CausalSupportBasis is empty, append a C.28 /
CC-B3.9 advisory to the result text. Warning, not reject — legacy ingest
continues unchanged; the surface signals to the LLM caller that an
undeclared basis cannot raise R for a causal-ladder climb.
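
The soft-warning rule could be approximated as below. The keyword list contains only the terms named above (the original list continues past them), and `causalAdvisory`, its case-insensitive substring matching, and the advisory wording are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// causalHints lists the terms named in the commit message; the real list is longer.
var causalHints = []string{
	"causal", "intervention", "counterfactual", "uplift", "treatment effect",
}

// causalAdvisory returns a warning string, or "" when no warning applies.
// It never rejects input: callers append the text to the result.
func causalAdvisory(content, basis string) string {
	if basis != "" {
		return "" // a declared basis silences the advisory
	}
	lower := strings.ToLower(content)
	for _, hint := range causalHints {
		if strings.Contains(lower, hint) {
			return "advisory (C.28 / CC-B3.9): content reads like a causal-use claim but causal_support_basis is empty"
		}
	}
	return ""
}

func main() {
	fmt.Println(causalAdvisory("Observed 3% uplift after rollout", "") != "")              // true
	fmt.Println(causalAdvisory("Observed 3% uplift after rollout", "observational") != "") // false
	fmt.Println(causalAdvisory("All unit tests pass", "") != "")                           // false
}
```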

A.15.4 Work-Relevant Source Restoration footer renders on every
/h-view projection (engineer, manager, audit, compare, delegated-agent,
change-rationale). Section "## Carrier — Not Source of Truth (A.15.4)"
lists every underlying artifact ref the projection was built from
(problems, portfolios, decisions, deterministic ordering), plus the
haft_query(action="get", ref="<id>") recovery path and the on-disk
.haft/{decisions|problems|solutions|evidence}/<id>.md source locations.
Empty graphs still render the section with an informational fallback
line so the carrier semantics stay visible even with no sources. Lives
in internal/present (flow layer) — Core boundary unaffected.

7 new tests: 1 in internal/fpf/server_test.go asserts the MCP-advertised
schema exposes causal_support_basis with the C.28 description and
predictions[].realizability; 6 in internal/present/projection_carrier_test.go
pin the footer across all four primary views plus the no-sources
fallback and the recovery-path citation.

CHANGELOG [7.1.0] gains two new Added entries documenting the C.28
typed binding (with the cap semantics and TODO note) and the A.15.4
carrier footer.
…leted before asserting

Two related races surfaced under -race on the v8 driver integration test
TestIntegration_PermissionRoundTrip:

1. fakeTools.calls was an unsynchronized slice appended from the
   dispatcher goroutine (Driver.Drive → ToolDispatcher.Run) while the
   test goroutine read len(tools.calls) in a busy-wait loop. Guard the
   slice with a mutex and expose a Calls() snapshot accessor; the test
   reads through Calls() so the read is properly synchronized.

2. Even with calls synchronized, the integration test could observe
   tools.Calls() != 0 and exit while the dispatcher was still mid-flight
   journaling turn.completed to the store. The deferred TempDir cleanup
   then races with the still-running goroutine writing into the store
   directory and surfaces as a cleanup failure.

   Replace the len(tools.calls) busy-wait with an SSE drain that consumes
   stream lines until "turn.completed" is observed. The dispatcher always
   journals turn.completed after the tool result, so seeing it on the
   wire is a synchronization point — by then the goroutine has finished
   writing to the store and the deferred cleanup is safe.
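
The first fix follows a common pattern for test doubles: guard the slice with a mutex and hand out copies. The sketch below uses a hypothetical `record` method and string entries; the real `fakeTools` likely stores richer call records.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeTools is a simplified test double that records tool invocations.
type fakeTools struct {
	mu    sync.Mutex
	calls []string
}

// record is called from the dispatcher goroutine.
func (f *fakeTools) record(name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls = append(f.calls, name)
}

// Calls returns a snapshot, so the test goroutine never shares the backing array.
func (f *fakeTools) Calls() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := make([]string, len(f.calls))
	copy(out, f.calls)
	return out
}

// recordConcurrently exercises record from n goroutines and counts the result.
func recordConcurrently(n int) int {
	f := &fakeTools{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f.record("tool")
		}()
	}
	wg.Wait()
	return len(f.Calls())
}

func main() {
	fmt.Println(recordConcurrently(100)) // 100
}
```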
….Close to fix -race CI

CI's race-test job kept failing on TestIntegration_TurnCancel and
TestDispatcher_ModelSet_RejectsDuringRunningTurn with the same root
cause: srv.Shutdown waits for HTTP handlers but NOT for goroutines
spawned inside them — handleTurnSubmit fires a detached DriveTurn
goroutine, and the test exits while that goroutine is still journaling
events. Result on Linux runners under -race:

  WARNING: DATA RACE
  Write at ... by Store.Close → Journal.Close (j.w = nil)
  Previous read at ... by DriveTurn → CombinedSink.Publish →
     Store.Append → Journal.Append (j.w.Write)

Cancel + drain pattern, test-only, no production v8 change:

- drainRunningTurn(t, dispatcher, sessionID) issues TurnCancel and polls
  the dispatcher until it returns agentserver.ErrTurnNotRunning. That
  error is the synchronization point: the dispatcher's deferred
  delete(d.running, ...) clears the map AFTER DriveTurn returns and the
  goroutine finishes, so seeing ErrTurnNotRunning guarantees the journal
  Append goroutine has fully unwound.

- bootIntegrationServer.cleanup now drains every session listed by
  store.List("") before calling store.Close. Covers any integration test
  routed through the helper (currently TurnRoundTrip, TurnCancel,
  PermissionRoundTrip) without per-test plumbing.

- TestDispatcher_ModelSet_RejectsDuringRunningTurn uses its own
  Dispatcher (not the boot helper), so it gets an explicit defer
  drainRunningTurn call before its store.Close defer.

10x -race runs on the agentdriver package now pass clean. Full repo
under -race is green.
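
The cancel-and-drain loop can be sketched generically. `drain` and `fakeCancel` are hypothetical names, the timeout is an addition for this example, and the `cancel` callback stands in for the dispatcher's TurnCancel call; the actual `drainRunningTurn` helper takes a testing handle, a dispatcher, and a session ID.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTurnNotRunning mirrors the sentinel used as the synchronization point.
var ErrTurnNotRunning = errors.New("turn not running")

// drain keeps issuing cancel until the dispatcher reports that no turn is
// running, or gives up once the timeout has elapsed.
func drain(cancel func() error, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if err := cancel(); errors.Is(err, ErrTurnNotRunning) {
			return nil // the turn goroutine has fully unwound
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("turn still running after %s", timeout)
		}
		time.Sleep(time.Millisecond)
	}
}

// fakeCancel simulates a turn that needs n cancel attempts before it is gone.
func fakeCancel(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls > n {
			return ErrTurnNotRunning
		}
		return nil
	}
}

func main() {
	fmt.Println(drain(fakeCancel(3), time.Second)) // <nil>
}
```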

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d45a6e7000

Comment thread internal/agentstore/store.go Outdated
Comment thread internal/agentdriver/driver.go Outdated
m0n0x41d added 2 commits May 13, 2026 22:13
Two Codex P2 findings on PR #80:

- agentstore.Store.Append: writeMeta running after a successful journal
  commit could surface a "publish failed" error to the driver for an
  event already durable on disk. In the turn.started path that made the
  driver skip emitting turn.failed for a turn the journal would replay
  as Running, leaving HasLiveTurn permanently blocked. The journal is
  the authoritative record; meta.json is a denormalised cache for List.
  Treat the meta refresh as best-effort post-commit (Create and Append).
- agentdriver/agentcore: a provider tool call with non-nil empty []byte
  for Args would marshal as invalid JSON via json.RawMessage, failing
  PartToolUseStartedEvent encoding and aborting the turn before the
  tool runs. Normalise empty/nil inputs to a nil Args in
  NewToolUsePart so RawMessage encodes as JSON null; have the driver
  emit the part's normalised Args on the wire.
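
The empty-args failure mode is reproducible with the standard library alone: a non-nil, zero-length `json.RawMessage` is emitted verbatim and fails validation, while a nil one encodes as `null`. The `toolUse` struct and the helper names below are hypothetical; only the normalisation idea comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolUse is a simplified stand-in for the tool-use event payload.
type toolUse struct {
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// normalizeArgs maps empty or nil input to a nil RawMessage.
func normalizeArgs(args []byte) json.RawMessage {
	if len(args) == 0 {
		return nil
	}
	return json.RawMessage(args)
}

// encodeRaw passes the bytes through untouched (the pre-fix behaviour).
func encodeRaw(name string, args []byte) (string, error) {
	b, err := json.Marshal(toolUse{Name: name, Args: json.RawMessage(args)})
	return string(b), err
}

// encodeNormalized applies normalizeArgs before marshalling.
func encodeNormalized(name string, args []byte) (string, error) {
	b, err := json.Marshal(toolUse{Name: name, Args: normalizeArgs(args)})
	return string(b), err
}

func main() {
	_, err := encodeRaw("ls", []byte{})
	fmt.Println(err != nil) // true: empty RawMessage is not valid JSON

	out, err := encodeNormalized("ls", []byte{})
	fmt.Println(out, err) // {"name":"ls","args":null} <nil>
}
```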

Tests:
- TestStore_AppendSucceedsWhenMetaWriteFails locks the session dir
  read-only so writeMeta cannot create a tempfile, while the open
  journal FD keeps accepting Appends.
- TestDriver_ToolCall_EmptyArgsEncodes drives an empty-args tool call
  and round-trips the resulting event through EncodeEvent.
… stability

Three commits landed after the 7.1.0 changelog cut (e5a5b25) and were not
yet documented:

- 5d4954e fix(review): journal authoritative + empty tool args
- d45a6e7 test(agentdriver): drain SSE before cleanup race
- e3cad19 test(agentdriver): sync fakeTools.calls

Extend the v8 hardening Added bullet with the two P2 fixes from 5d4954e
(journal-authoritative writeMeta, empty tool args -> JSON null), and add
a new Fixed bullet for the test-side -race stability work that was
blocking CI on PR #80.
@m0n0x41d m0n0x41d merged commit 9661e53 into main May 13, 2026
2 checks passed