review: drip-196 anomalyco/opencode#25051, #25050 + openai/codex#20321

Bojun-Vvibe · Bojun-Vvibe · commit b84ddc9fa04b · 2026-04-30T12:49:30.000+08:00
testEffect migration continuation (agent colors mechanical, retry policy with hidden Schema.parse fixture upgrade) plus codex hook trust metadata wire+config scaffolding (HookTrustStatus enum, persisted trustedHash, derived currentHash).
diff --git a/reviews/2026-W18/drip-196/openai-codex-pr-20321.md b/reviews/2026-W18/drip-196/openai-codex-pr-20321.md
@@ -0,0 +1,24 @@
+# openai/codex PR #20321 — [codex] Add hook trust metadata
+
+- PR: https://github.com/openai/codex/pull/20321
+- Head SHA: `cfff723772ba221d168079f79eea9f4afd61cb13`
+- Files touched: 18 files across `app-server-protocol/schema/{json,typescript}` (regenerated), `app-server-protocol/src/protocol/v2.rs`, `app-server/src/codex_message_processor.rs`, `app-server/tests/suite/v2/hooks_list.rs`, `config/src/hook_config.rs`, `config/src/hooks_tests.rs`, `core/config.schema.json`, `hooks/src/config_rules.rs`, `hooks/src/engine/{discovery.rs,mod.rs,mod_tests.rs}`, `protocol/src/protocol.rs` (+505/-61, 18 files)
+
+## Specific citations
+
+- New protocol enum `HookTrustStatus = "managed" | "untrusted" | "trusted" | "changed"` lands in three places: rust core at `protocol/src/protocol.rs`, the v2 wire mirror at `app-server-protocol/src/protocol/v2.rs:481-486` via the `v2_enum_from_core!` macro, and the generated TS schema at `app-server-protocol/schema/typescript/v2/HookTrustStatus.ts`. Three new fields land on `HookMetadata` at `v2.rs:4614-4616` (`current_hash: String`, `trusted_hash: Option<String>`, `trust_status: HookTrustStatus`) and the JSON schema additions at `codex_app_server_protocol.schemas.json:9644-9646, 9706-9714` correctly mark `currentHash` and `trustStatus` as required while `trustedHash` stays nullable.
+- The wire/required-set split is correct: `currentHash` is always derivable from the on-disk hook definition, so it's required; `trustedHash` only exists once a user has approved a hook, so it's nullable. `trustStatus` is required because it's the derived rollup the client renders.
+- `HookSource::Unknown` default at `v2.rs:486` (in `default_hook_source`) is preserved — good, this means clients on older protocol versions that don't send a source still deserialize.
+- `app-server/src/codex_message_processor.rs:8729-8731` plumbs the three new fields from `codex_hooks::HookListEntry` into the wire `HookMetadata` via straightforward `.clone()` / `.into()`. The `.into()` on `trust_status` exercises the `From<CoreHookTrustStatus> for HookTrustStatus` impl that the macro at `v2.rs:481` generates.
+- The regression suite at `app-server/tests/suite/v2/hooks_list.rs:28-71` adds a `command_hook_hash()` helper that builds a TOML table of the hook's identity-defining fields (`event_name`, `matcher`, `handler_type`, `command`, `timeout_sec`, `status_message`) and runs `codex_config::version_for_toml(...)` over it. This is the golden-value computation that locks the on-disk hash format. Three test sites at `:153-162, :232-241, :358-367` then assert `current_hash: command_hook_hash(...)` + `trusted_hash: None` + `trust_status: HookTrustStatus::Untrusted` for first-seen hooks across the user/plugin/project source paths.
+- The README update at `app-server/README.md:1456` correctly states "Unmanaged hooks expose a `currentHash`, persisted `trustedHash`, and derived `trustStatus`" — the wording carefully implies (correctly) that *managed* hooks may render differently in clients (likely `trustStatus: "managed"` regardless of hash), which the test surface doesn't yet exercise.
+
+## Verdict: merge-after-nits
+
+## Concerns / nits
+
+1. **No regression for the `trustStatus: "trusted"` and `"changed"` derivation paths**. The three tests at `:153, :232, :358` all assert the first-seen `Untrusted` shape. The whole point of the feature is to distinguish first-seen unmanaged hooks from hooks whose approved contents have *drifted* — that's the `"changed"` state — and there's no test in the diff slice that writes a `trusted_hash` for a hook, then mutates the on-disk command, then asserts `trust_status == Changed`. Same for `Trusted` (matching hashes). The PR body says "first backend slice for hook trust" so the second slice may add this, but a single state-transition regression in *this* PR would lock the derivation rule before the enforcement slice depends on it.
+2. **`current_hash: String` (not `Hash`-typed)** — the JSON schema at `:9644` accepts any string. Future readers at the call site `:8729` see `hook.current_hash.clone()` with no indication that the value is `sha256:...`-prefixed (per the README example) or that downstream comparison `current_hash == trusted_hash` is the canonical check. A small newtype `HookHash(String)` with `Display` + `FromStr` would prevent string-typed leakage and make the comparison in the future enforcement slice misuse-resistant.
+3. **The TOML-table golden hash helper at `:28-71`** uses `codex_config::version_for_toml(...)` directly. This works but couples the regression to the *specific implementation* of the hash function — any future change to the hash function (different field order, a new field added, a different serialization) silently breaks every assertion. Worth either (a) factoring out a public `compute_hook_hash(metadata: &HookMetadata) -> String` in `codex_hooks` and having both the production code and the test call it, or (b) asserting on a literal `"sha256:..."` golden string per test case so a hash-function change fails loudly with a stable error.
+4. **`unreachable!("TOML table construction should stay a table")`** at `:33-36` is dead-code defense against a pattern that the immediately-preceding line constructs as a table. Either drop the let-else and use `let mut table = Default::default();` directly, or write `let mut table = codex_config::TomlTable::new();` if the type is exposed. The current shape adds noise without buying any safety.
+5. **`v2_enum_from_core!` macro discipline**: the new `HookTrustStatus` enum is added to the macro's enum list at `:481-486` but there's no test that round-trips `Managed → wire → Managed` etc. across the v2 boundary. The macro likely auto-derives this, but a single `serde_json` round-trip assertion would catch a future variant rename / case-mismatch silently breaking the wire contract.
diff --git a/reviews/2026-W18/drip-196/sst-opencode-pr-25050.md b/reviews/2026-W18/drip-196/sst-opencode-pr-25050.md
@@ -0,0 +1,23 @@
+# sst/opencode PR #25050 — test: use testEffect for retry policy
+
+- PR: https://github.com/sst/opencode/pull/25050
+- Head SHA: `f90151516dbedfdf907943f957ca3764e063aefe`
+- Files touched: `packages/opencode/test/session/retry.test.ts` (+86/-79, 1 file)
+
+## Specific citations
+
+- The `it.live` migration is the same shape as the sibling testEffect series (drip-194 #25053, drip-195 #25052, drip-196 #25051) but this PR also lands a real correctness improvement: `apiError()` at `:19-26` now goes through `MessageV2.APIError.Schema.parse(...)` instead of the prior `as MessageV2.APIError` cast on the raw `.toObject()` output. The previous shape was lying to the type system about the runtime shape of the error object — every call site that read `.data.isRetryable` was relying on a structure that was *probably* but not *guaranteed* equal to the schema's projection. This PR fixes ~7 call sites: `:96` (the migrated retry-policy test), `:178`, `:190`, `:202`, `:214`, `:225` (the four `retries N status code` tests), `:268-270` (ECONNRESET fromError test), `:296`, `:319` (the openai-emulation tests). Net effect: every regression now exercises the parsed-schema shape that production code actually receives.
+- `wrap()` at `:30` adds `name: ""` to the synthesized error envelope. This is a minor type-shape correction — `NamedError["toObject"]` returns `{ name, data: ... }` and the previous `{ data: { message } } as ReturnType<NamedError["toObject"]>` was structurally incomplete. Cast-removal is the right direction.
+- The migrated `policy updates retry status and increments attempts` test at `:86-117` now resolves `SessionStatus.Service` once via `yield* SessionStatus.Service` at `:91` and reuses the resolved handle inside the `set` callback at `:97-103` — replacing two separate `AppRuntime.runPromise(SessionStatus.Service.use(...))` round-trips with a single in-Effect resolution. This is a real reduction in runtime ceremony and removes a potential layer-mismatch hazard between the test setup runtime and the inner per-call runtime.
+- The composed test layer at `:13` is `Layer.mergeAll(SessionStatus.defaultLayer, CrossSpawnSpawner.defaultLayer)`. The `CrossSpawnSpawner` dependency is transitive from `SessionStatus`'s default wiring (the same pattern as #25051's `AgentSvc.defaultLayer` requiring it).
+- New `if (!MessageV2.APIError.isInstance(result)) throw new Error("expected APIError")` guards at `:268, :298, :322` replace `(result as MessageV2.APIError)` casts. This is the refinement-via-narrowing pattern — slightly more verbose at the call site but catches schema-drift at runtime instead of segfaulting on access of a missing field.
+
+## Verdict: merge-after-nits
+
+## Concerns / nits
+
+1. **Hidden behavior change disguised as a test refactor**: the `apiError` helper at `:19-26` switching from `as` cast to `Schema.parse` is *not* a pure test-shape change — it forces the test fixtures to exercise the same parse path production code uses. This is the right thing, but it should be called out in the PR title or body so reviewers don't skim past it as a mechanical migration. As-is, the title says "use testEffect for retry policy" which understates the change.
+2. **Three sibling tests still bypass the composed test layer**: `retries 500 errors`, `retries 502`, `retries 503` (and friends) at `:178-225` all stay as plain `test()` and exercise `SessionRetry.retryable(error)` as a pure function. That's correct for the function-level coverage they aim at, but means the composed `it.live` layer is only used by one test in the file. If the broader migration intent is uniformity, future passes should either keep the pure tests as-is and document why, or migrate them too — the current mid-state reads slightly inconsistent.
+3. **`SessionRetry.policy.parse` callback now does `MessageV2.APIError.Schema.parse(err)` at `:94`** — this is a parse on every retry attempt, not just at construction time. For the test it's fine (one error, two retries), but the production `SessionRetry.policy` shape is `{parse: (err) => ...}` and the choice between `(err) => err as MessageV2.APIError` (zero-cost) and `(err) => Schema.parse(err)` (per-attempt parse cost) is a real perf decision worth noting. Worth confirming the production callers don't pay a parse-per-retry cost that this test now models.
+4. **Two `if (!APIError.isInstance(result)) throw` guards** at `:268-269, :298, :321-322` are identical — could lift to a small `expectAPIError(result)` helper local to the file. Cosmetic.
+5. **The `Layer.mergeAll(SessionStatus.defaultLayer, CrossSpawnSpawner.defaultLayer)` at `:13`** carries a non-obvious dependency on `CrossSpawnSpawner` being transitively required by `SessionStatus`. A short comment explaining *why* `CrossSpawnSpawner` is needed for a retry-status test would save a future reader a layer-graph trace.
diff --git a/reviews/2026-W18/drip-196/sst-opencode-pr-25051.md b/reviews/2026-W18/drip-196/sst-opencode-pr-25051.md
@@ -0,0 +1,18 @@
+# sst/opencode PR #25051 — test: use Effect test helper for agent colors
+
+- PR: https://github.com/sst/opencode/pull/25051
+- Head SHA: `d1953fc8999fef3464222337edf78121619fa0a2`
+- Files touched: `packages/opencode/test/config/agent-color.test.ts` (+43/-51, 1 file)
+
+## Specific citations
+
+- Net is -8 LOC and the migration shape exactly mirrors drip-194 #25053 and drip-195 #25052: replace `await using tmp = await tmpdir()` + `Instance.provide({...})` + `AppRuntime.runPromise(Config.Service.use(...))` with `it.live(...)` over `Effect.gen` that yields `tmpdirScoped()` and pipes through `provideInstance(dir)`. The composed test layer at `:14` is `Layer.mergeAll(AgentSvc.defaultLayer, CrossSpawnSpawner.defaultLayer)` — `CrossSpawnSpawner.defaultLayer` is added because the `AgentSvc` factory transitively spawns processes during the `agent.get` path that the second test exercises.
+- `writeConfig` helper at `:18-27` lifts the inline JSON-write into a small `Effect.promise`-wrapped `Bun.write`. The two callers at `:30-33` and `:46-49` differ only by their `agent` payload, so the helper is justified — and unlike a plain async function it returns an Effect that composes with the surrounding `Effect.gen`.
+- The first test still pierces the layer to call `AppRuntime.runPromise(Config.Service.use(...))` at `:36` rather than going through the test layer's services. This is fine for a config-parse smoke test (the goal is to assert `config.toml` parses), but it means this test isn't actually exercising the services injected by `it.live`'s composed layer — only the second test does.
+- The third `test("Color.hexToAnsiBold ...")` at `:55-58` (in original) is left untouched as a plain `bun:test` test — correct, since it tests a pure function and adding Effect ceremony would be cargo-culting.
+
+## Verdict: merge-as-is
+
+## Rationale
+
+Mechanical, narrow, and matches the established migration shape across the testEffect series (drip-194 #25053, drip-195 #25052, and now this). Net -8 LOC. No new behavior, no new code paths, no test parallelization concerns since the only stateful piece is a per-test `tmpdirScoped`. The pure `Color.hexToAnsiBold` test correctly stays as plain `test()`. One mild observation worth a follow-up but not a blocker: the first test's `AppRuntime.runPromise(Config.Service.use(...))` bypasses the `it.live`-composed layer; if the broader migration intent is to have all config tests resolve services through the test layer (so `CrossSpawnSpawner.defaultLayer` and friends are uniformly stub-able), a future pass could route this through `Config.Service.use` from inside the `Effect.gen`. As-is, it's still strictly better than the pre-PR shape.