Skip to content

Latest commit

 

History

History
688 lines (576 loc) · 66.7 KB

File metadata and controls

688 lines (576 loc) · 66.7 KB

Backend Parity Matrix — EngineBackend trait

Filename note (v0.12 / RFC-023 §9). The file is named POSTGRES_PARITY_MATRIX.md for historical reasons; at v0.12 it gained a SQLite column and now documents all three backends. The filename is retained to avoid URL churn in external consumer references. Rename to BACKEND_PARITY_MATRIX.md is an open owner call if the cross-reference cost ever stops exceeding the clarity gain (RFC-023 §7, §9 owner note).

RFC-018 Stage A note (2026-04-24, reshaped in v0.10): this matrix is now callable at runtime via EngineBackend::capabilities() — concrete ValkeyBackend / PostgresBackend impls populate a flat Capabilities { identity, supports } value, and consumers (cairn operator UI, operator tooling) dot-access the bools (backend.capabilities().supports.<field>) to read the typed answer at startup instead of parsing this file. v0.9 shipped a BTreeMap<Capability, CapabilityStatus> shape; v0.10 reshaped to the flat Supports struct per cairn's original #277 ask. This document remains the human-readable reference during the RFC-017 migration; drift between the two is a bug. Stage B (this file generated from the runtime value + CI drift check) lands as a follow-up PR per RFC-018 §8.

Source of truth for per-method status across Valkey and Postgres backends during the RFC-017 staged migration. Greppable by cairn-fabric runbooks + operator tooling. Updated at every stage merge.

Legend

  • impl — backend ships a real implementation (zero-behaviour-change wrapper or native path).
  • stub — trait default EngineError::Unavailable { op } in use. Not a bug; a deliberate Stage marker. Every row that ships stub at Stage D merge is a hard block on the Stage D PR (RFC-017 §9.0 L-1) — if the Postgres cell below is still stub by Stage D, the PR does not merge.
  • n/a — method does not apply (e.g. streaming-feature-gated method on a backend that disables streaming).

Fleet-wide cutover note (RFC-017 §9.0 round-2 K-R2-N1). During Stages B-D the ff-server binary hard-gates FF_BACKEND=postgres at boot. Mixed-fleet rolling upgrades (some nodes on Stage-D binary, some on Stage-E) with FF_BACKEND=postgres are unsupported. Operators must complete a valkey→valkey Stage D→E rolling upgrade first and flip FF_BACKEND=postgres as a second rollout.

Stage E sub-split (owner-adjudicated 2026-04-24). Stage E is split into four PRs so each lands in a single session with CI-green:

  • E1 (impl/017-stage-e1) — Postgres dial branch in Server::start_with_metrics + http_postgres_smoke.rs test passing end-to-end for the 5 migrated ingress routes. PostgresBackend::create_execution trait impl added (inherent was there since Wave 4). Server.engine / Server.scheduler become Option (None on Postgres). Dev-mode override (FF_BACKEND_ACCEPT_UNREADY=1 + FF_ENV=development) still required; §9.0 hard-gate not yet flipped.
  • E2 (shipped, impl/017-stage-e2) — cancel_flow header + member- ack migrated through the backend trait via new provided-default methods (cancel_flow_header, ack_cancel_member), Valkey impls porting the Server bodies verbatim (plus read_execution_info, read_execution_state, fetch_waitpoint_token_v07); Server::client: Client field + Postgres-branch ambient Valkey dial + fcall_with_reload + fcall_with_reload_on_client + parse_cancel_flow_raw + is_function_not_loaded + fcall_field_str
    • standalone ack_cancel_member helper removed. Postgres path touches no Valkey at all.
  • E3Server::claim_for_worker trait cutover (Postgres-native scheduler), Server::scheduler field removal.
  • E4 — v0.8.0 cleanup: deprecated flat Valkey ServerConfig fields removed, legacy waitpoint_token wire field removed, BACKEND_STAGE_READY flipped to &["valkey", "postgres"], version bump, cairn matrix published.

Stage D split (owner-adjudicated 2026-04-24). Stage D scope from RFC-017 §9 was split into two PRs to keep reviewer cognitive load bounded:

  • D1 (this row of flips) — §8 PendingWaitpointInfo schema rewrite, Valkey list_pending_waitpoints impl, 8 HTTP handler migrations, Server::start_with_backend entry point, FF_BACKEND env + §9.0 hard-gate wiring at stage "D", deprecation audit log (Deprecation: ff-017 header + ff_pending_waitpoint_legacy_token_served_total).
  • D2 (follow-up PR, required before v0.7 tag) — boot relocation into ValkeyBackend::connect_with_metrics (§4 row 12), Server::client: Client field removal (cascades to remaining handlers still using fcall_with_reload), http_postgres_smoke.rs integration test, Stage-B feature flag removal.

RFC-017 Stage A trait surface (51 methods total)

RFC-012 baseline (31 methods)

These landed with the initial trait in RFC-012 and are fully covered on both backends. Table omitted for brevity; consult crates/ff-core/src/engine_backend.rs pre-RFC-017-§5 for the full list. Both backends tested across the hot path; streaming-feature methods (read_stream, tail_stream, read_summary) are impl on both when the streaming feature is enabled, n/a otherwise.

RFC-017 Stage A additions (17 methods) — this PR

# Method Valkey Postgres Notes
1 create_execution impl impl Stage E1 flip. Valkey wraps ff_create_execution FCALL. Postgres trait impl lifts the existing inherent helper and always returns CreateExecutionResult::Created { public_state: Waiting } — the helper does not yet distinguish Created vs Duplicate; a follow-up may refine (http_postgres_smoke exercises it through HTTP).
2 create_flow impl impl Promoted from PG inherent to trait.
3 add_execution_to_flow impl impl Promoted from PG inherent to trait.
4 stage_dependency_edge impl impl Promoted from PG inherent to trait.
5 apply_dependency_to_child impl impl Promoted from PG inherent to trait.
5c project_flow_summary impl impl PR-7b Cluster 2b-B. Valkey lifts the pre-PR-7b Rust-composed SRANDMEMBER sample + per-member HGET + HSET summary pattern verbatim (no new Lua — the aggregation is inherently multi-round-trip across 256 exec partitions). Postgres SELECTs aggregate counts from ff_exec_core grouped by public_state + INSERT ... ON CONFLICT DO UPDATE into ff_flow_summary (migration 0019); partition-local. Derived public_flow_state is distinct from ff_flow_core.public_flow_state (RFC-007). SQLite returns Unavailable per RFC-023 Phase 3.5 (projection is a shared-deployment dashboard feature).
5d trim_retention impl impl PR-7b Cluster 2b-B. Valkey lifts the ZRANGEBYSCORE + per-execution cascade-delete loop verbatim (cluster-safe — all keys for one execution live on the same {p:N} slot). Postgres SELECTs a batch_size-bounded set of terminal+old executions, then cascades DELETEs across every execution-scoped sibling table in one transaction (no FK CASCADE in schema). Returns the count of ff_exec_core rows deleted. Per-execution retention overrides are deferred on PG — scanner uses the global default only; Valkey applies them via per-exec policy reads. SQLite returns Unavailable per RFC-023 Phase 3.5 (single-tenant local deployments manage their own DB lifecycle).
5b resolve_dependency impl stub PR-7b Step 0 overlap-resolver. Valkey wraps ff_resolve_dependency FCALL (RFC-016 Stage C KEYS[14]+ARGV[5]). Postgres returns Unavailable { op: "resolve_dependency" }; PG's post-completion cascade runs via ff_backend_postgres::dispatch::dispatch_completion(event_id) keyed on the ff_completion_event outbox, not per-edge — the per-edge Valkey shape does not map cleanly. PG reconciler already calls dispatch_completion directly; PR-7b/final's integration test expects Unsupported from Valkey-shaped scanners on PG. SQLite mirrors PG.
5e cascade_completion impl impl PR-7b Cluster 4 trait-routed completion listener (v0.13). Per-completion cascade behind EngineBackend::cascade_completion(&CompletionPayload). Documented timing divergence, not a parity gap. Valkey is synchronous (CascadeOutcome.synchronous = true) — FCALL-driven walk completes inline, child_skipped descendants recursed up to MAX_CASCADE_DEPTH before return. Postgres is async-via-outbox (synchronous = false) — resolves payload → ff_completion_event.event_id, invokes Wave-5a dispatch_completion; further hops ride their own outbox events emitted by per-hop tx. Both satisfy cascade; consumers requiring synchronous cascade target Valkey or verify PG dispatched_at_ms drain. SQLite inherits default Unavailable until a SQLite cascade lands.
6 cancel_execution impl impl Landed Stage C (Valkey) + Wave 9 v0.11 (Postgres, RFC-020). Postgres wraps the §4.2 SERIALIZABLE-fn template + ff_lease_event emit. Valkey body does HMGET pre-read (lane_id + current_attempt_index + current_waitpoint_id + current_worker_instance_id) then raw FCALL with variadic KEYS(21)/ARGV(5) matching lua/execution.lua::ff_cancel_execution.
7 change_priority impl impl Landed Stage C (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.2.4). Postgres UPDATE on ff_exec_core + ff_operator_event outbox emit (migration 0010); row-count=0 maps to EngineError::ExecutionNotEligible per Rev 7 Fork 3. Valkey reads authoritative lane via HGET when caller passes empty lane_id, then wraps ff_change_priority.
8 replay_execution impl impl Landed Stage C (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.2.5 Rev 7). Postgres in-place UPDATE on existing ff_attempt (no attempt_index bump, matches Valkey) + ff_edge_group counter reset for skipped-flow-member path (Rev 7 Fork 1 Option A) + ff_operator_event emit.
9 revoke_lease impl impl Landed Stage C (Valkey) + Wave 9 v0.11 (Postgres, RFC-020). Postgres SERIALIZABLE-fn path + ff_lease_event emit; surfaces "no active lease" as RevokeLeaseResult::AlreadySatisfied { reason: "no_active_lease" } matching Valkey.
10 create_budget impl impl Wave 9 v0.11 (RFC-020 §4.4, Rev 6). Postgres INSERT on ff_budget_policy (migration 0013 adds scheduling + breach + definitional columns). Valkey wraps ff_create_budget.
11 reset_budget impl impl Wave 9 v0.11. Postgres UPDATE clears breach counters + bumps next_reset_at_ms. Valkey wraps ff_reset_budget.
12 create_quota_policy impl impl Wave 9 v0.11 (RFC-020 §4.4, Rev 6). Postgres INSERT on ff_quota_policy family (migration 0012: ff_quota_policy + ff_quota_window + ff_quota_admitted, 256-way HASH-partitioned on partition_key). Valkey wraps ff_create_quota_policy.
13 get_budget_status impl impl Wave 9 v0.11. Postgres 2-table read on ff_budget_policy + ff_budget_usage with hard_limits / soft_limits parsed from policy_json. Valkey is 3× HGETALL.
14 report_usage_admin impl impl Wave 9 v0.11. Postgres READ-COMMITTED INSERT-or-UPDATE on ff_budget_usage + incremental breach-counter bookkeeping on ff_budget_policy matching Valkey's pattern.
15 get_execution_result impl impl Wave 9 v0.11 (RFC-020 §4.1). Postgres SELECT result FROM ff_exec_core (current-attempt semantics match Valkey's GET ctx.result()).
16 list_pending_waitpoints impl impl Landed Stage D1 (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.5). Postgres SELECT against ff_waitpoint_pending with migration 0011 additive columns (state, required_signal_names, activated_at_ms); producer-side suspend_ops writes these on insert + activation. waitpoint_key was already shipped via migration 0004 (Rev 5 ground-truth correction). Valkey path: pipelined SSCAN + 2× HMGET. Both backends share the §8 D1 schema (HMAC redaction + token_kid/token_fingerprint) + after/limit pagination.
17 ping impl impl Valkey: PING. Postgres: SELECT 1.
18 claim_for_worker impl impl Landed Stage C (Valkey) + Stage E3 (Postgres). Valkey: ff-backend-valkey added ff-scheduler dep; ValkeyBackend holds Option<Arc<ff_scheduler::Scheduler>> wired at ff-server boot. Postgres: PostgresScheduler + 6 reconcilers shipped at Stage E3 (v0.8.0). Both backends report Supports.claim_for_worker = true when a scheduler is wired.
19 cancel_flow_header impl impl Landed Stage E2 (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.3). Postgres UPDATE on ff_flow + insert into ff_cancel_backlog (migration 0014) driving the per-member cancel sweep; AlreadyTerminal returned idempotently with stored policy/reason on row-count=0. Valkey wraps ff_cancel_flow FCALL.
20 ack_cancel_member impl impl Landed Stage E2 (Valkey) + Wave 9 v0.11 (Postgres). Postgres DELETE on ff_cancel_backlog member entry; drives the cancel-backlog reconciler. Valkey wraps ff_ack_cancel_member (SREM + conditional ZREM).
21 read_execution_info impl impl Landed Stage E2 (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.1). Postgres multi-column projection on ff_exec_core + LEFT JOIN LATERAL on ff_attemptExecutionInfo / StateVector parse; partition-local. Returns Ok(None) when the row is absent. Valkey: HGETALL on exec_core.
22 read_execution_state impl impl Landed Stage E2 (Valkey) + Wave 9 v0.11 (Postgres, RFC-020 §4.1). Postgres SELECT public_state FROM ff_exec_core (partition-local single-column point read) + JSON deserialize. Valkey: HGET public_state on exec_core.
23 fetch_waitpoint_token_v07 impl stub Landed Stage E2 (relocated from Server). Valkey body is HGET waitpoint_token on the exec's waitpoint hash. Filters empty strings to None. Retires at v0.8.0 with the legacy wire field. Postgres returns Unavailable.

Cross-cutting (unconditional, landed pre-Stage-A):

Method Valkey Postgres
backend_label "valkey" "postgres"
shutdown_prepare impl (semaphore drain) impl (ping check; pool drain a follow-up)
prepare (issue #281) impl (FUNCTION LOAD via ff_script::loader::ensure_library; returns PrepareOutcome::Applied { description: "FUNCTION LOAD (flowfabric lib v<N>)" }) impl (returns PrepareOutcome::NoOp — schema migrations are applied out-of-band per v0.7 Wave 0 Q12; schema-version check runs at connect time)

Count reconciliation

Trait surface pre-RFC-017: 33 methods (includes backend_label + shutdown_prepare landed in Stage B pre-work).

  • Note the RFC drafted "31 existing" but a direct grep on the pre-RFC trait found 33 methods. Both backend_label + shutdown_prepare were added in the Stage B pre-work landed on main via PR #264 before this Stage A backfill.

Trait surface after RFC-017 Stage A backfill: 50 methods (33 + 17 new). The RFC's "51" target counts backend_label and shutdown_prepare as Stage A additions; in-tree both landed early (Stage B), so the net Stage A addition here is 17 methods — matching the RFC's §2.3 breakdown minus the two cross-cutting methods already on main.

Reporter must choose the canonical count: this PR uses in-tree count = 50, matching cargo expand + grep -cE 'async fn' on the trait. The RFC-cited "51" remains correct when counting the original trait at 31 + the RFC §2.3 new-20; in-tree drift is due to backend_label + shutdown_prepare landing earlier than the RFC sequence anticipated.


Stage boundaries

  • Stage A (this PR): trait surface complete; Valkey impls for cheap single-FCALL ops; Postgres impls for ingress (promoted from inherent).
  • Stage B (shipped, PR #264): read + admin + stream handler migration on Valkey.
  • Stage C (shipped, PR impl/017-stage-c): operator control + budget-status + claim handler migration. The 6 Valkey stub rows above (cancel_execution, change_priority, replay_execution, revoke_lease, get_budget_status, claim_for_worker) moved to impl. list_pending_waitpoints remains stub pending Stage D's §8 schema rewrite. Stage C also migrated the 10 HTTP handlers (2 operator control, 4 budget admin, 1 quota admin, 1 claim, 1 budget status, 1 admin report_usage) from inherent Server::X(...) + fcall_with_reload to server.backend().X(...) trait dispatch via the freshly minted From<EngineError> for ApiError bridge. The Engine IntoResponse arm gained Conflict / Contention / State kinds mapping to HTTP 409 per RFC-010 §10.7.
  • Stage D1 (this PR, impl/017-stage-d1): §8 schema rewrite + Valkey list_pending_waitpoints impl + 8 HTTP handlers migrated to trait dispatch (create_execution, create_flow, add_execution_to_flow, stage_dependency_edge, apply_dependency_to_child, get_execution_result, list_pending_waitpoints, + header-only migration of cancel_flow — the async member dispatcher remains on Server pending D2). Server::start_with_backend entry point added; FF_BACKEND env + §9.0 hard-gate wired at stage "D" with the dev-mode override and ff_backend_unready_boot_total{backend,stage} metric. Deprecation audit log: Deprecation: ff-017 response header + ff_pending_waitpoint_legacy_token_served_total counter
    • per-entry pending_waitpoint_legacy_token_served tracing event.
  • Stage D2 (follow-up, required before v0.7 tag): boot relocation into ValkeyBackend::connect_with_metrics (§4 row 12), Server::client: Client field removal + cascading handler cutover, http_postgres_smoke.rs integration test, Stage-B feature flag removal (if present). CI gate: test_postgres_parity_no_unavailable asserts no Postgres Unavailable on any HTTP-exposed method. Every stub in the Postgres column above must be impl before Stage D2 merges.
  • Stage E1 (this PR, impl/017-stage-e1): Server::start_with_metrics gains a Postgres branch — dials PostgresBackend::connect_with_metrics, runs apply_migrations, installs the backend as Arc<dyn EngineBackend> for all migrated HTTP ingress handlers. Server::engine and Server::scheduler become Option<T> (None on Postgres; Stage E3 wires the Postgres-native scheduler). PostgresBackend::create_execution lifted onto the trait. New crates/ff-test/tests/http_postgres_smoke.rs (gated on postgres-e2e feature) passes end-to-end against a live Postgres for 5 migrated HTTP routes: create_flow, create_execution × 3, add_execution_to_flow × 3, stage_dependency_edge × 2, apply_dependency_to_child × 2.
  • Stage E (v0.8.0): BACKEND_STAGE_READY updated to &["valkey", "postgres"]; FF_BACKEND=postgres boots successfully for the first time.

Remaining gaps for v0.8.0 (E2 honest-scope)

Stage E2 closes the Server::client removal and lifts cancel_flow

  • get_execution* + fetch_waitpoint_token_v07 onto the backend trait (Valkey impls port the Server bodies verbatim; Postgres returns Unavailable for rows that land in Wave 9). The Stage E1 smoke still deliberately avoids the following HTTP routes because Postgres does not yet implement the underlying trait method:
  • POST /v1/flows/{id}/cancel — the Server side now dispatches through backend.cancel_flow_header + backend.ack_cancel_member; both methods have Valkey impls but Postgres returns Unavailable until Wave 9 lands the cancel machinery. Smoke remains Valkey-only.
  • GET /v1/executions/{id} / …/state — dispatch through backend.read_execution_info / backend.read_execution_state; Valkey impls port the HGETALL / HGET + parse logic. Postgres default is Unavailable until the PG read-model lands.
  • GET /v1/executions/{id}/result — trait method get_execution_result was already available; Valkey impl unchanged. Postgres returns Unavailable until the result-store migration lands.
  • GET /v1/flows/{id} — this HTTP route does not exist today; callers infer flow state from member executions. Introducing it is an ingress-read task tracked post-E4.
  • POST /v1/workers/{id}/claimServer::claim_for_worker calls self.scheduler.claim_for_worker which wraps a ferriskey::Client. No Postgres equivalent yet; Stage E3 wires the Postgres-native scheduler path.
  • Scheduler + engine scanners on Postgres — the Postgres boot path skips Engine::start_with_completions (scanners hold a ferriskey::Client) and skips Scheduler::with_metrics. The Postgres reconciler helpers already live under crates/ff-backend-postgres/src/reconcilers/ and the engine crate's scan_tick_pg wrappers exist; a Postgres-side scanner supervisor spawning them on an interval is a Stage E2/E3 task.
  • Dev-mode overrideFF_BACKEND=postgres still requires FF_BACKEND_ACCEPT_UNREADY=1 + FF_ENV=development through Stages E1-E3. Stage E4 flips BACKEND_STAGE_READY to include "postgres" and this requirement goes away.
  • Server::client: Client on Postgres path — retired in Stage E2. The Postgres branch of Server::start_with_metrics no longer dials Valkey at all; the residual legacy-field paths all flow through the EngineBackend trait, and Postgres returns Unavailable for the methods whose impls land in Wave 9 (cancel_flow_header, ack_cancel_member, read_execution_info, read_execution_state, fetch_waitpoint_token_v07).

Release: v0.8.0 (Stage E4, 2026-04-24)

Gate status: BACKEND_STAGE_READY = &["valkey", "postgres"]. FF_BACKEND=postgres boots natively; no FF_BACKEND_ACCEPT_UNREADY dev-override. The §9.0 hard-gate still exists in ff-server as defence-in-depth for future backend additions (e.g. SQLite, DynamoDB) but no longer trips on Postgres.

Parity summary at v0.8.0

Family Valkey Postgres Notes
Ingress (create_flow, create_execution, add_execution_to_flow, stage_dependency_edge, apply_dependency_to_child) impl impl Full HTTP parity. http_postgres_smoke exercises all 5 end-to-end.
Flow family (describe_flow, list_flows, list_edges, describe_edge, cancel_flow, set_edge_group_policy) impl impl RFC-v0.7 Wave 4c. cancel_flow accepts all three CancelFlowWait modes (NoWait, WaitTimeout(Duration), WaitIndefinite) on both backends via shared wait_for_flow_cancellation poll (#298).
Flow cancel (cancel_flow_header, ack_cancel_member) impl stub Wave 9 follow-up; Postgres cancel_flow covers the bulk path, the header/ack split lands with cancel-backlog machinery.
Read model (read_execution_info, read_execution_state, get_execution_result) impl stub Wave 9 PG read-model migration.
Operator control (cancel_execution, change_priority, replay_execution, revoke_lease) impl stub Wave 9.
Budget / quota (create_budget, reset_budget, create_quota_policy, get_budget_status, report_usage_admin) impl stub Wave 9.
Scheduler (claim_for_worker) impl impl Stage E3 PostgresScheduler + claim_for_worker trait impl + 6 reconcilers.
Waitpoints (list_pending_waitpoints) impl stub Wave 9.
Admin rotation (rotate_waitpoint_hmac_secret_all) impl impl Shipped pre-v0.10; Postgres path is a single INSERT against ff_waitpoint_hmac(kid, secret, rotated_at) (ground truth: crates/ff-backend-postgres/tests/capabilities.rs:44).
Admin seed (seed_waitpoint_hmac_secret, issue #280) impl impl Idempotent boot-time seed so cairn can drop its raw HSET boot path. Valkey fans out per-partition HSET against the waitpoint_hmac_secrets:{p:N} layout. Postgres INSERTs one row into ff_waitpoint_hmac when no kid is active. AlreadySeeded { same_secret } lets callers distinguish replay from real conflict.
Cross-cutting (backend_label, shutdown_prepare, ping) impl impl

v0.8.0 shipped

  • Stage E1 — Postgres HTTP ingress branch + http_postgres_smoke.
  • Stage E2 — Server::client field retired on Postgres path; cancel_flow_header / ack_cancel_member / read_execution_info / read_execution_state / fetch_waitpoint_token_v07 moved onto the trait (Valkey bodies ported verbatim; Postgres returns Unavailable for the Wave-9 rows).
  • Stage E3 — PostgresScheduler + claim_for_worker trait impl + sibling-cancel, lease-timeout, completion-listener, cancel-backlog, flow-staging, edge-group-policy reconcilers.
  • Stage E4a (this release) — BACKEND_STAGE_READY flip, ServerConfig flat-field removal, legacy waitpoint_token wire field removal, v0.8.0 version bump, CHANGELOG, migration doc.

RFC-019 Stage A — Stream-cursor subscriptions (4 methods)

Method Valkey Postgres Notes
subscribe_lease_history impl impl Valkey: duplicate_connection() + XREAD BLOCK 5000 STREAMS ff:part:{fp:N}:lease_history <cursor>; cursor is 0x01 ++ ms(BE8) ++ seq(BE8). Postgres: ff_lease_event outbox + LISTEN ff_lease_event; cursor is 0x02 ++ event_id(BE8). Producer sites: attempt::{claim,claim_from_reclaim,renew,complete,fail,delay,wait_children}, suspend_ops::{suspend_impl,claim_resumed_execution_impl}, flow::cancel_flow_once, reconcilers::{attempt_timeout,lease_expiry}. #282 ScannerFilter surface: trait method takes filter: &ScannerFilter; Valkey gates via #122 FilterGate per-event HGET on ff:exec:{p}:<eid>:tags; Postgres filters inline against denormalised namespace/instance_tag columns on ff_lease_event (migration 0008). (Stage B, #308 / #282)
subscribe_completion impl impl Postgres wraps completion::subscribe (ff_completion_event outbox + LISTEN ff_completion), durable via event-id cursor. Valkey (Stage B, #309) wraps the RESP3 ff:dag:completions pubsub subscriber — partial: non-durable cursor (pubsub-backed, at-most-once over the live subscription window). Postgres impl is durable via outbox + cursor. Durable Valkey completion subscription is a separate follow-up if demanded. #282 ScannerFilter surface: trait method takes filter: &ScannerFilter; non-noop filter routes both backends through the existing subscribe_completions_filtered path so multi-consumer isolation matches the #122 design.
subscribe_signal_delivery impl impl Valkey: duplicate_connection() + XREAD BLOCK 5000 STREAMS ff:part:{fp:N}:signal_delivery <cursor>; cursor is 0x01 ++ ms(BE8) ++ seq(BE8). Producer XADD lives in ff_deliver_signal at KEYS[15]. Postgres: ff_signal_event outbox + LISTEN ff_signal_event; cursor is 0x02 ++ event_id(BE8). Producer INSERT lives in suspend_ops::deliver_signal_impl's SERIALIZABLE tx. #282 ScannerFilter surface: trait method takes filter: &ScannerFilter; Valkey gates via the shared #122 FilterGate; Postgres filters in-memory against denormalised namespace/instance_tag columns on ff_signal_event (migration 0009). (Stage B, #310 / #282)
subscribe_instance_tags n/a n/a Audited #311 (2026-04-24) + deferred: cairn's one-shot instance_tag_backfill pattern is served by list_executions + ScannerFilter::with_instance_tag(..) pagination; a realtime tag-churn stream is speculative demand we do not have today. Trait method remains and returns Unavailable on both backends; reserving the surface for future concrete demand. RFC-019 §instance_tags amended.

Post-v0.9 additive methods

Method Valkey Postgres Notes
suspend_by_triple impl impl Cairn #322 service-layer entry point for suspend-by-triple (pause-by-operator, enter-waiting-approval, cancel-with-timeout-record). Valkey: HGETALL exec_core pre-read for lane_id / current_attempt_index / current_worker_instance_id, then the existing ff_suspend_execution FCALL with fence fields sourced from the triple — identical Lua dedup / §3 replay contract as suspend. Postgres: single SELECT attempt_index FROM ff_exec_core pre-read (Postgres attempts are keyed by attempt_index, not the triple's attempt_id — the attempt_id field is advisory on this backend), then the shared suspend_core SERIALIZABLE body. Default trait impl returns EngineError::Unavailable { op: "suspend_by_triple" } so downstream impls remain non-breaking.

Deferred to v0.9 / post-0.8 (historical — closed at v0.11.0)

At v0.8.0 every Wave-9 Postgres cell was stub. No stub row was HTTP-exposed as a dispatch blocker because handlers returned a structured 503 Unavailable with an EngineError::Unavailable { op } body. All Wave-9 rows flipped to impl at v0.11.0; see next section.

Release: v0.11.0 (Wave 9, 2026-04-26)

PostgresBackend::capabilities() reports true for the 12 Wave-9 flags (cancel_execution, change_priority, replay_execution, revoke_lease, read_execution_state, read_execution_info, get_execution_result, budget_admin, quota_admin, list_pending_waitpoints, cancel_flow_header, ack_cancel_member). Every Stage-A table row above is now impl | impl except subscribe_instance_tags (n/a on both per #311) and the retired fetch_waitpoint_token_v07 (removed at v0.8.0).

Parity summary at v0.11.0

Family Valkey Postgres Notes
Ingress (5 methods) impl impl
Flow family (6 methods) impl impl
Flow cancel (cancel_flow_header, ack_cancel_member) impl impl Wave 9. Postgres cancel-backlog via migration 0014.
Read model (read_execution_info, read_execution_state, get_execution_result) impl impl Wave 9. Postgres join-on-read against ff_exec_core + ff_attempt.
Operator control (cancel_execution, change_priority, replay_execution, revoke_lease) impl impl Wave 9. Postgres SERIALIZABLE-fn template + ff_operator_event / ff_lease_event outbox emit.
Budget / quota (create_budget, reset_budget, create_quota_policy, get_budget_status, report_usage_admin) impl impl Wave 9. Postgres migrations 0012 + 0013 + Postgres-native BudgetResetReconciler.
Scheduler (claim_for_worker) impl impl
Waitpoints (list_pending_waitpoints) impl impl Wave 9. Postgres ff_waitpoint_pending with migration 0011 columns.
Admin rotation + seed impl impl
Cross-cutting impl impl
subscribe_instance_tags n/a n/a Deferred per #311 (speculative demand).

Wave 9 — shipped v0.11.0 (2026-04-26)

Full design record: rfcs/RFC-020-postgres-wave-9.md (ACCEPTED 2026-04-26, Revision 7). All 13 Wave-9 method rows flipped stub → impl on the Postgres column at v0.11.0; the subscribe_instance_tags row remains n/a on both backends per #311 (speculative demand, served by list_executions + ScannerFilter::with_instance_tag today).

Migrations shipped as part of Wave 9 (all additive, forward-only):

  • 0010 ff_operator_event outbox — new LISTEN/NOTIFY channel for operator-control events (priority_changed / replayed / flow_cancel_requested), preserving the RFC-019 ff_signal_event subscriber contract.
  • 0011 ff_waitpoint_pending additive columns (state, required_signal_names, activated_at_ms) — required to serve the real PendingWaitpointInfo contract.
  • 0012 ff_quota_policy family (ff_quota_policy + ff_quota_window + ff_quota_admitted, 256-way HASH-partitioned on partition_key).
  • 0013 ff_budget_policy additive columns — next_reset_at_ms
    • 4 breach-tracking + 3 definitional columns; enables a Postgres- native BudgetResetReconciler.
  • 0014 ff_cancel_backlog table — per-member cancel tracking driving cancel_flow_header + ack_cancel_member + the cancel-backlog reconciler.

Consumer migration notes at CONSUMER_MIGRATION_0.11.md.

Release: v0.12.0 (SQLite dev-only backend, RFC-023, 2026-04-26)

Third EngineBackend implementation lands at v0.12: ff-backend-sqlite. Scoped permanently to dev-only / testing per rfcs/RFC-023-sqlite-dev-only-backend.md §1.0. See docs/dev-harness.md for the consumer setup guide and docs/CONSUMER_MIGRATION_0.12.md for the upgrade checklist.

Positioning reminder. SQLite is a testing harness; Valkey is the engine; Postgres is the enterprise persistence layer. The parity rows below do NOT imply perf parity — SQLite is a single-writer, single-process, ~10³ write-QPS envelope. Production scale demands Valkey or Postgres.

Parity summary at v0.12.0 (all three backends)

Family Valkey Postgres SQLite Notes
Ingress (5 methods) impl impl impl create_flow, create_execution, add_execution_to_flow, stage_dependency_edge, apply_dependency_to_child.
Flow family (6 methods) impl impl impl describe_flow, list_flows, list_edges, describe_edge, cancel_flow, set_edge_group_policy.
Flow cancel (cancel_flow_header, ack_cancel_member) impl impl impl v0.12 (RFC-023 Phase 3.3). SQLite ports the PG cancel-backlog semantics (migration 0014 analogue); AlreadyTerminal { stored_* } idempotent replay matches PG. Operator-event outbox INSERT in-tx + post-commit broadcast wakeup.
Read model (read_execution_info, read_execution_state, get_execution_result) impl impl impl v0.12 (RFC-023 Phase 3.3). SQLite lowers PG's LEFT JOIN LATERAL to correlated subqueries; storage-tier literal normalisation reuses the PG helper shape.
Operator control (cancel_execution, change_priority, replay_execution, revoke_lease) impl impl impl v0.12 (RFC-023 Phase 3.2). SQLite runs the PG Rev-7 spine under BEGIN IMMEDIATE RESERVED-lock + WHERE-clause CAS fencing + retry_serializable for SQLITE_BUSY absorption. ff_lease_event + ff_operator_event outbox emits match PG exactly.
Budget / quota (create_budget, reset_budget, create_quota_policy, get_budget_status, report_usage_admin) impl impl impl v0.12 (RFC-023 Phase 3.4). SQLite hand-ports PG ff_budget_policy / ff_quota_policy family with positional ? placeholders; breach-counter columns (breach_count, soft_breach_count, last_breach_at_ms, last_breach_dim) maintained in-tx matching Valkey + PG Rev-6.
Waitpoints (list_pending_waitpoints) impl impl impl v0.12 (RFC-023 Phase 3.3). SQLite cursor-paginated scan of ff_waitpoint_pending with state IN ('pending','active') filter + NotFound on missing execution + (token_kid, token_fingerprint) parsed from the stored token.
Scheduler (claim_for_worker) impl impl stub SQLite non-goal per RFC-023 §5 — no scheduler is wired on SQLite. Supports::claim_for_worker = false on the SQLite backend. Dev harness consumers drive claim/complete/fail directly through the EngineBackend trait.
Subscribe (lease / completion / signal) impl impl impl v0.12. SQLite uses tokio::sync::broadcast channels for WAKEUP + outbox tables (migrations 0006 / 0007 / 0010 analogues) for cursor-resume, mirroring the RFC-019 contract. In-process only — cross-process subscribe fan-out is a PG-only property.
subscribe_instance_tags n/a n/a n/a Deferred per #311 (speculative demand; served by list_executions + ScannerFilter::with_instance_tag).
Admin rotation + seed impl impl impl
Tags (set_execution_tag, set_flow_tag, get_execution_tag, get_flow_tag) impl impl impl v0.12 follow-up (issue #433). Operator/control-plane point-writes + reads for caller-namespaced tags (e.g. cairn.session_id). Valkey routes to ff_set_{execution,flow}_tags Lua; PG/SQLite upsert raw_fields JSON(B) (tags.<k> for exec, top-level <k> for flow). Trait-side ff_core::engine_backend::validate_tag_key enforces the full-key regex ^[a-z][a-z0-9_]*\.[a-z0-9_][a-z0-9_.]*$ on every backend (stricter than Valkey's prefix-only Lua check — the Rust gate is the parity-of-record); read-side missing-row collapses to Ok(None).
Cross-cutting (backend_label, ping, shutdown_prepare, prepare) impl impl impl SQLite prepare returns PrepareOutcome::NoOp (migrations run via sqlx::migrate! at pool init); shutdown_prepare drains the N=1 scanner supervisor.

SQLite-specific notes

  • No partitioning. SQLite drops PARTITION BY HASH — one non-partitioned table per entity (RFC-023 §4.1). The scanner supervisor collapses to N=1 (one tick task per reconciler, no fan-out).
  • Single-writer + retry classifier. SERIALIZABLE-grade ops run under BEGIN IMMEDIATE + retry_serializable wrapping of is_retryable_sqlite_busy (SQLITE_BUSY / SQLITE_BUSY_TIMEOUT / SQLITE_LOCKED). MAX_ATTEMPTS = 3 matches PG's CANCEL_FLOW_MAX_ATTEMPTS.
  • Production guard. SqliteBackend::new refuses to construct without FF_DEV_MODE=1 and emits a WARN banner on every construction.
  • Migrations. SQLite migrations 0001 – 0014 are hand-ported SQLite-dialect files 1:1 numbered with Postgres for parity-drift detection (RFC-023 §4.1). CI lints the pairing via a .sqlite-skip sidecar allow-list.
  • Handle codec wire byte 0x03. BackendTag::Sqlite (wire byte 0x03) joins Valkey=0x01 / Postgres=0x02; handles minted by one backend are rejected by the other two with EngineError::Validation { kind: HandleFromOtherBackend }.

Phase 4c (capability-matrix snapshot test) flips the Supports flags above from Supports::none() at release time; the impl entries here are the post-flip state. See crates/ff-backend-sqlite/tests/capabilities.rs for the snapshot gate.

RFC-024 + agnostic-SDK trait surface at v0.12.0

The ten EngineBackend trait methods below are the v0.12 trait surface relevant to this release — nine are new in v0.12 (RFC-024 lease-reclaim + the agnostic-SDK PR-1..PR-5.5 trait-routed read primitives, scanner primitives, and grant-consumer dispatch); deliver_signal predates v0.12 on the trait and is included here because the SDK caller is no longer valkey-default-gated in v0.12. Row values reflect actual bodies on main at tag-prep time — per-backend grep 'fn <method>' crates/ff-backend-*/src/** against the trait defaults in crates/ff-core/src/engine_backend.rs.

Method Valkey Postgres SQLite Notes
issue_reclaim_grant impl impl impl RFC-024. PR-F (Valkey), PR-D (PG), PR-E (SQLite). Admission write for the lease-reclaim path; rejects NotReclaimable on non-active lifecycle or capability-mismatch.
reclaim_execution impl impl impl RFC-024. Grant-consumer for reclaim; mints a fresh attempt on lease_expired_reclaimable / lease_revoked. All three backends enforce max_reclaim_count (default 1000) + emit HandleKind::Reclaimed.
read_execution_context impl impl impl Agnostic-SDK PR-1 (#411). Point-read of ExecutionContext for the SDK worker's resume path. Missing row surfaces as Validation { kind: InvalidInput } on all three — SDK only invokes post-claim so a missing row is a loud invariant violation.
read_current_attempt_index impl impl impl Agnostic-SDK PR-3. Documented asymmetry (rustdoc on EngineBackend::read_current_attempt_index): Valkey returns AttemptIndex(0) when exec_core is present but current_attempt_index is absent/empty (pre-PR-3 inline parity); PG/SQLite use NOT NULL DEFAULT 0 so a pre-claim row reads 0 naturally, but a missing row surfaces as Validation { kind: InvalidInput }. The downstream claim_resumed_execution FCALL / SQL surfaces the proper NotAResumedExecution / ExecutionNotLeaseable reject.
read_total_attempt_count impl impl impl Agnostic-SDK PR-5.5 (#419). Valkey HGET on {exec}:core (monotonic total-claims counter, distinct from current_attempt_index); PG reads raw_fields JSONB extract via exec_core::read_total_attempt_count_impl; SQLite reads via json_extract in crates/ff-backend-sqlite/src/reads.rs.
claim_execution impl Unavailable (default) Unavailable (default) Agnostic-SDK PR-4 (#417). Trait-routed grant-consumer for the SDK claim_from_grant path. Valkey fires one ff_claim_execution FCALL; PG/SQLite inherit the Err(Unavailable { op: "claim_execution" }) default. PG's grant-consumer flow today routes through PostgresScheduler::claim_for_worker — a separate scheduler-side entry point distinct from this trait method; SQLite has no grant-consumer path. Full trait-level grant-consumer parity for PG + SQLite is v0.13 RFC-scope; see CONSUMER_MIGRATION_0.12.md §Known limitations.
scan_eligible_executions impl Unavailable (default) Unavailable (default) Agnostic-SDK PR-5 (#418). Scanner-bypass primitive — lane-eligible ZSET peek. Valkey-only by design; PG/SQLite consumers use the scheduler-routed claim_for_worker path which handles eligibility server-side. Exposed behind the direct-valkey-claim bench-only feature; not a general consumer surface.
issue_claim_grant impl Unavailable (default) Unavailable (default) Agnostic-SDK PR-5. Scheduler-bypass claim-grant write. Pairs with scan_eligible_executions; same Valkey-only scope and direct-valkey-claim feature gating. PG/SQLite consumers use claim_for_worker.
block_route impl Unavailable (default) Unavailable (default) Agnostic-SDK PR-5. Moves an execution from lane-eligible ZSET to blocked_route ZSET after a capability-mismatch reject. Valkey-only; PG/SQLite admission rejects are handled server-side inside the scheduler-routed claim_for_worker path.
deliver_signal impl impl impl Agnostic-SDK PR-3 (#416). Method was already trait-routed pre-v0.12 across all three backends; the v0.12 change is removing the valkey-default-feature module gate on the SDK caller so consumers compiling --no-default-features --features sqlite (or postgres) can reach it.

Consumer-facing limitations for this surface live in CONSUMER_MIGRATION_0.12.md §Known limitations.

PR-7b Cluster 1 — Foundation scanner operations (cairn #436)

Per-execution write hooks invoked by the engine scanner loop. Each method is per-row tx on Postgres, mirroring the Valkey FCALL semantic. SQLite hosts its own reconciler supervisor (RFC-023 Phase 3.5 §4.1) and inherits the trait default — these hooks are never invoked under SQLite's N=1 topology.

Method Valkey Postgres SQLite Notes
mark_lease_expired_if_due impl impl Unavailable (default, intentional) Valkey: FCALL ff_mark_lease_expired_if_due. Postgres: reconcilers::lease_expiry::release_for_execution (per-row tx on ff_attempt + ff_exec_core).
promote_delayed impl impl Unavailable (default, intentional) Valkey: FCALL ff_promote_delayed. Postgres: reconcilers::delayed_promoter::promote_for_execution — re-checks (lifecycle_phase='runnable', eligibility_state='not_eligible_until_time', deadline_at_ms <= now) and flips eligibility_state → 'eligible_now' + clears deadline_at_ms under FOR UPDATE. deadline_at_ms is overloaded with the execution_deadline scanner on Postgres; the (lifecycle_phase, eligibility_state) tuple disambiguates. A dedicated delay_until_ms column is a post-v0.12 additive cleanup.
close_waitpoint impl impl Unavailable (default, intentional) Valkey: FCALL ff_close_waitpoint (ARGV [waitpoint_id, "never_committed"]). Postgres: reconcilers::pending_wp_expiry::close_for_execution — DELETE on ff_waitpoint_pending + append {"status":"never_committed",...} marker into ff_suspension_current.member_map[waitpoint_key] so the composite-condition evaluator sees the failure at resume time.
expire_execution (AttemptTimeout) impl impl Unavailable (default, intentional) Valkey: FCALL ff_expire_execution (ARGV [eid, "attempt_timeout"]). Postgres: reconcilers::attempt_timeout::expire_for_execution — per-row tx terminates or retries per policy.retry_policy.max_retries + emits ff_completion_event on terminal + ff_lease_event outbox.
expire_execution (ExecutionDeadline) impl impl Unavailable (default, intentional) Valkey: FCALL ff_expire_execution (ARGV [eid, "execution_deadline"]). Postgres: reconcilers::execution_deadline::expire_for_execution — per-row tx flips ff_exec_core.lifecycle_phase='terminal' with last_failure_message='execution_deadline' + clears active attempt lease + emits ff_completion_event{outcome='expired'} + ff_lease_event outbox. Candidate selection scopes by lifecycle_phase='active' to disambiguate from promote_delayed's deadline_at_ms overload.
expire_suspension impl impl Unavailable (default, intentional) Valkey: FCALL ff_expire_suspension. Postgres: reconcilers::suspension_timeout::expire_for_execution — per-row tx honors TimeoutBehavior (Fail/Cancel/Expire/Escalate → terminal; AutoResumeWithTimeoutSignal → synthetic signal in member_map[__timeout__]).

All five foundation-scanner ops land real SQL on Postgres as of PR-7b Cluster 1 + Wave-9-minimal (this row). Consumer deployments switching from Valkey to Postgres no longer see Unavailable for scanner-op writes.

get_execution_namespace — dedicated single-field point-read used by scanner::should_skip_candidate to preserve the 1-HGET cost contract — lands as impl on all three backends (Valkey: HGET :core namespace; Postgres: SELECT raw_fields->>'namespace' FROM ff_exec_core WHERE ...; SQLite: SELECT json_extract(raw_fields, '$.namespace') FROM ff_exec_core WHERE ...).

cairn #389 — service-layer typed FCALL surface (v0.13)

Six EngineBackend methods that mirror existing Handle-taking peers (complete / fail / renew) but accept (execution_id, fence) tuples — lets control-plane callers (cairn's valkey_control_plane_impl.rs, future non-Handle consumers) drop the raw ferriskey::Value + check_fcall_success + parse_* pattern. Args/Result types already existed in ff_core::contracts; this landing adds the trait-method surface.

All three backends now ship bodies at v0.13.0. PG + SQLite were deferred to Unavailable at landing and filled in during the v0.13 delivery (#453 for these 6 + claim_execution, plus #33 for SQLite).

Method Valkey Postgres SQLite Notes
complete_execution impl impl impl Service-layer peer of complete(handle). Valkey pre-reads lane_id + current_worker_instance_id from exec_core then delegates to ff_complete_execution. PG + SQLite land in #453/2 + #33.
fail_execution impl impl impl Service-layer peer of fail(handle, …). Same exec_core pre-read pattern; delegates to ff_fail_execution. PG + SQLite land in #453/3 + #33.
renew_lease impl impl impl Service-layer peer of renew(handle). No pre-read needed — ExecKeyContext suffices. Delegates to ff_renew_lease. PG + SQLite land in #453/1 + #33.
resume_execution impl impl impl Lifecycle transition from suspended to runnable. Valkey pre-reads lane_id + current_waitpoint_id from exec_core then delegates to ff_resume_execution. PG + SQLite land in #453/4 + #33.
check_admission impl impl impl Atomic admission check against a quota policy. Takes quota_policy_id: &QuotaPolicyId and dimension: &str outside the CheckAdmissionArgs struct — quota keys live on the {q:<policy>} partition that cannot be derived from execution_id, and widening the existing pub-fields struct would semver-break. Empty dimension"default". Delegates to ff_check_admission_and_record. PG + SQLite land in #453/5 + #33.
evaluate_flow_eligibility impl impl impl Read-only eligibility status (eligible, blocked_by_dependencies, …). Delegates to ff_evaluate_flow_eligibility. PG + SQLite land in #453/6 + #33.
claim_execution impl impl impl Service-layer peer of claim_for_worker that takes (execution_id, worker) directly — bypasses scheduler lane-routing. Valkey: FCALL ff_claim_execution. PG + SQLite: single-tx attempt-insert + exec_core lifecycle flip, mirrors the PG ff_attempt row UPSERT shape. Lands in #453/7 + #33.

cairn #454 — control-plane additions (v0.13)

Four additional typed EngineBackend trait methods added during the v0.13 cycle to close the last raw-ferriskey dispatch sites in cairn's control plane. All four land bodies on every backend at v0.13.0 — no staged deferrals.

Method Valkey Postgres SQLite Notes
record_spend impl impl impl Per-execution budget attribution via ff_budget_usage_by_exec (migration 0020). Valkey: FCALL ff_record_spend (per-execution HASH keyed by dimension). PG + SQLite: upsert under the budget's READ-COMMITTED / BEGIN IMMEDIATE tx — same counter semantics as ff_budget_usage but scoped per (budget_id, execution_id, dimension) so reversal is idempotent-safe.
release_budget impl impl impl Idempotent reversal. Reads the per-execution stamp, decrements the aggregate counter by the exact recorded amount, DELETEs the per-execution row — all in one tx. Missing-row returns Ok(()).
deliver_approval_signal impl impl impl Approval-typed specialisation over the suspend/signal path. Valkey: FCALL ff_deliver_approval_signal. PG + SQLite: inlined approval validation inside the ff_suspension_current + ff_signal_event outbox tx.
issue_grant_and_claim impl impl impl Backend-atomic composition of issue_reclaim_grant + claim_from_grant. Valkey: single FCALL ff_issue_grant_and_claim. PG + SQLite: single tx wraps both helper queries — no mid-flight observable state where a grant exists without a matching claim.

PR-7b scanner + cairn #434 additions (v0.13)

Seven new trait methods landed during the PR-7b trait-routed scanner push (cairn #436) plus cairn #434 (waitpoint-token read). Row values reflect bodies on main at v0.13 tag-prep time.

Cluster 2 — reconciler scanners (#445):

Method Valkey Postgres SQLite Notes
unblock_execution impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_unblock_execution — scanner-driven move from blocked_route ZSET back to lane-eligible after a capability change. Postgres is architecturally Unavailable: the PG scheduler re-evaluates eligibility live via SQL on every claim_for_worker tick (ff_exec_core eligibility columns + ff_lane routing join), so there is no persisted blocked-index to reconcile. SQLite inherits PG's live-SQL approach; same rationale.

Cluster 3 — cancel-family scanners (#444):

Method Valkey Postgres SQLite Notes
drain_sibling_cancel_group impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_drain_sibling_cancel_group — scanner drains the per-group sibling-cancel backlog into per-member cancel writes. Postgres cancel_flow_header + ack_cancel_member already use ff_cancel_backlog (migration 0014) and the PG cancel-backlog reconciler drives per-member writes directly from the table; no separate drain step exists. SQLite mirrors the PG cancel-backlog design (RFC-023 Phase 3.3).
reconcile_sibling_cancel_group impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_reconcile_sibling_cancel_group — scanner recounts group_pending vs group_active and flips group-terminal transitions. Postgres + SQLite derive group state from SQL aggregates over ff_cancel_backlog + ff_exec_core at read time; no persisted group counter to reconcile.

Cluster 2b-A — tally-recompute scanners (#447):

Method Valkey Postgres SQLite Notes
reconcile_execution_index impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_reconcile_execution_index — recomputes per-lane eligible ZSET counters to repair drift from crash-in-middle-of-FCALL sequences. Postgres runs SERIALIZABLE-tx counters under ff_exec_core + live SQL eligibility evaluation on every claim; no persisted counter to drift, nothing to reconcile. SQLite inherits the PG design under BEGIN IMMEDIATE.
reconcile_budget_counters impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_reconcile_budget_counters — recomputes ff:budget:* hash counters from ff_budget_usage entries. Postgres maintains breach counters + next_reset_at_ms in-tx on ff_budget_policy + ff_budget_usage (migration 0013) under READ-COMMITTED INSERT-or-UPDATE; drift-free by construction. SQLite mirrors PG (RFC-023 Phase 3.4).
reconcile_quota_counters impl Unavailable (default, intentional) Unavailable (default) Valkey: FCALL ff_reconcile_quota_counters — recomputes per-window quota counters after partial-apply crashes. Postgres maintains ff_quota_window + ff_quota_admitted in-tx (migration 0012, 256-way HASH-partitioned); drift-free by construction. SQLite mirrors PG.

Cluster 2b-B — projection + retention scanners (#449, pending merge):

Rows for project_flow_summary and trim_retention depend on PR #449 merging. They will be added in a follow-up commit once #449 lands, OR as part of #449's own doc updates. Expected shape:

  • project_flow_summary — Valkey impl, Postgres impl (migration 0019_flow_summary.sql), SQLite default Unavailable.
  • trim_retention — Valkey impl, Postgres impl, SQLite default Unavailable. Retention overrides (policy.stream_policy.retention_ttl_ms) are Valkey-only; PG uses the global default. Not a regression — design intent documented in CONSUMER_MIGRATION_0.13.md §Known limitations.

#453 / PR-7b — typed-FCALL bodies (cairn blocker): SQLite bodies shipped in v0.13 Phase 1 (#33) mirroring PG with dialect swaps (BEGIN IMMEDIATE RESERVED lock; json_set(doc, '$.k1', v1, '$.k2', v2, …) multi-key patch; excluded.col UPSERT keyword lowercase). Tests in crates/ff-backend-sqlite/tests/typed_*.rs; 38 tests total, all run against :memory: in every CI matrix cell.

Method Valkey Postgres SQLite Notes
renew_lease impl impl impl PG: READ COMMITTED tx + FOR UPDATE on ff_attempt. Fence validation is epoch-only (PG doesn't persist lease_id/attempt_id to disk; lease_epoch monotonic bump is sufficient to catch the same violations Valkey's full-triple check covers). Error map: fence_requiredValidation{InvalidInput}, stale_leaseState(StaleLease), lease_expiredState(LeaseExpired), execution_not_activeContention(ExecutionNotActive{...}). Integration test: crates/ff-backend-postgres/tests/typed_renew_lease.rs.
complete_execution impl impl impl PG: READ COMMITTED tx with FOR UPDATE on ff_attempt + ff_exec_core. Fence-or-operator-override gate matches Valkey — source=OperatorOverride skips epoch fence but lifecycle gate still fires. Flips ff_exec_core to terminal/completed + stores result payload; emits ff_completion_event (→ pg_notify DAG cascade) + lease_event::EVENT_REVOKED. Integration test: crates/ff-backend-postgres/tests/typed_complete_execution.rs.
fail_execution impl impl impl PG: retry/terminal branch mirrors Valkey. Retry policy JSON parsed into fixed/exponential backoff; retry_count < max_retriesRetryScheduled { delay_until, next_attempt_index }. Otherwise → TerminalFailed (flips to terminal/failed, emits completion outbox with outcome='failed'). Integration test: crates/ff-backend-postgres/tests/typed_fail_execution.rs.

PR-7b Wave 0a — backend-agnostic primitives + start_* no-panic (cairn #436):

Method Valkey Postgres SQLite Notes
server_time_ms impl impl impl Backend-agnostic wall-clock primitive. Valkey: TIME command. Postgres: SELECT (EXTRACT(EPOCH FROM clock_timestamp()) * 1000)::bigintnot now(), which returns the transaction start timestamp and can be stale under long-running tx; scanners need a fresh wall-clock read. SQLite: SELECT CAST((julianday('now') - 2440587.5) * 86400000 AS INTEGER). Default falls back to SystemTime::now() so out-of-tree impls (cairn mocks, test doubles) remain source-compatible. Used by 15 scanners to compute "due" thresholds.
read_exec_core_fields impl impl impl Backend-agnostic exec_core field read returning HashMap<String, Option<String>>. Valkey: HMGET on the {p:N}:core hash. Postgres: dynamic SELECT from ff_exec_core with per-field CAST routing + raw_fields ->> for JSONB-resident fields (current_waitpoint_id, current_worker_instance_id, budget_ids, quota_policy_id). SQLite: mirrors PG with json_extract(raw_fields, '$.field'). Unknown fields project NULL (Valkey HMGET absent-field parity). Scanners continue to use direct HGETs on the Valkey-only fallback path; trait-ified rewrites are Wave 0b.

Additionally, Engine::start_with_completions / start_with_metrics / start no longer panic on non-Valkey EngineBackend implementations. Prior to PR-7b the constructor downcast to ValkeyBackend and panic!d on None; now non-Valkey backends skip the in-tree scanner spawn block and only wire the partition router + optional completion dispatch loop. Consumers embedding Engine with a custom backend are expected to own their reconciler supervisor (the shape shipped by PostgresBackend::with_scanners and SqliteBackend). Regression guard: engine_start_does_not_panic_on_non_valkey_backend in crates/ff-engine/tests/scanner_namespace_point_read.rs.

cairn #434 — waitpoint-token read (#438):

Method Valkey Postgres SQLite Notes
read_waitpoint_token impl impl impl Non-mutating point read of the stored waitpoint token. Replaces the Valkey-only fetch_waitpoint_token_v07 retired at v0.8.0 with a backend-agnostic surface. Valkey: HGET on the partition-local waitpoint hash. Postgres: SELECT token FROM ff_waitpoint_pending WHERE partition = $1 AND waitpoint_id = $2 (suspend_ops::read_waitpoint_token_impl). SQLite: same shape via crates/ff-backend-sqlite/src/reads.rs::read_waitpoint_token_impl. Missing row surfaces as Ok(None) on all three backends (signal-bridge poll-on-resume pattern relies on this).

Summary of Valkey-scanner asymmetry (PG/SQLite Unavailable). The 6 scanner methods above are all Valkey-impl / PG+SQLite Unavailable by design. This is not a parity gap — the reconcile-drift pattern these scanners implement only exists on the Valkey backend's CRDT-ish counter topology. Postgres uses SERIALIZABLE transactions + live SQL eligibility re-evaluation; SQLite uses BEGIN IMMEDIATE + the same PG reconciler shapes. Neither backend can drift, so neither needs a reconcile hook. Consumers embedding Engine get backend-agnostic scanner routing: PR-7b final scaffolding asserts Unsupported on PG/SQLite for scanner-shaped calls, and the engine's scanner supervisor skips the tick entirely on those backends.

Release: v0.14.0 (RFC-025 worker registry, 2026-05-02)

cairn #473 — worker-registry primitives (RFC-025):

Method Valkey Postgres SQLite Notes
register_worker impl impl impl Atomic idempotent register + refresh. Valkey: ff_register_worker Lua FCALL (library v34) with KEYS=3 (alive/caps/idx) + ARGV=(instance_id, worker_id, lanes_csv, caps_csv, ttl_ms, now); SET PX (no NX — idempotent overwrite per RFC §9.3) + HSET caps hash (worker_id, lanes_csv, caps_csv, ttl_ms, registered_at_ms, last_heartbeat_ms) + PEXPIRE caps + SADD index. Returns "registered" / "refreshed". Postgres: INSERT INTO ff_worker_registry ON CONFLICT DO UPDATE RETURNING (xmax = 0) inside a tx; preflight SELECT ... FOR UPDATE rejects worker_instance_id reassigned under a different worker_id with Validation(InvalidInput, "instance_id reassigned"). SQLite: preflight SELECT + INSERT ON CONFLICT DO UPDATE in a BEGIN IMMEDIATE tx (mirrors PG sans partitioning).
heartbeat_worker impl impl impl TTL refresh. Valkey: single atomic ff_heartbeat_worker FCALL (library v34) — HGET caps ttl_ms → PEXPIRE alive + PEXPIRE caps + HSET last_heartbeat_ms; absent caps hash surfaces as NotRegistered. Postgres/SQLite: UPDATE with last_heartbeat_ms + liveness_ttl_ms > $now predicate — refuses to revive TTL-expired rows; RETURNING liveness_ttl_ms so the caller receives next_expiry_ms = now + ttl. Append 'heartbeat' event row on PG/SQLite.
mark_worker_dead impl impl impl Operator-driven death (distinct from passive TTL expiry). All three backends validate reason ≤ 256 bytes + no control chars (MARK_WORKER_DEAD_REASON_MAX_BYTES). Valkey: sequential DEL alive + DEL caps + SREM index (atomicity ceiling sufficient per §Non-goals 1). PG/SQLite: DELETE RETURNING 1 → Marked; 0 rows → NotRegistered (idempotent). Event row 'marked_dead' with reason.
list_expired_leases impl impl impl Enumerate leases with expires_at_ms <= as_of. Cursor is ExpiredLeasesCursor { expires_at_ms, execution_id } tuple (§9 locked — stable under equal-expiry). Defaults: limit = 1000, max_partitions_per_call = 32, cap limit = 10_000. Valkey: ZRANGEBYSCORE ff:idx:{p}:lease_expiry fanned across max_partitions_per_call partitions from cursor's partition offset; merged + sorted. PG: JOIN ff_attempt + ff_exec_core on existing ff_attempt_lease_expiry_idx, namespace filter via c.raw_fields->>'namespace'. SQLite: same JOIN, namespace via json_extract, execution_id hex-normalised for the cursor strict-greater predicate. Synthetic lease_id derived byte-identically across PG/SQLite via synthetic_lease_uuid(exec_uuid, attempt_index, lease_epoch). Gate: all(core, suspension).
list_workers impl impl impl Live worker listing for operator tooling (RFC §9.4 scope addition). Namespace-scoped; cross-namespace (namespace = None) rejects with EngineError::Unavailable on all three backends — single-key WorkerInstanceId cursor can't disambiguate across tenants. Valkey: SSCAN ff:idx:{ns}:workers (COUNT=100, cluster-safe) → concurrency-capped parallel HGETALL on ff:worker:{ns}:{inst}:caps; TTL-race entries (caps missing) logged at tracing::debug! + skipped. PG/SQLite: straight SELECT with ORDER BY worker_instance_id LIMIT. capabilities_csv parsed via `split(',').filter(

Backend-specific notes:

  • Valkey key shape: all RFC-025 writes use namespace-prefixed keys (ff:worker:{ns}:{inst}:alive / :caps, ff:idx:{ns}:workers). The legacy SDK-preamble shape (ff:worker:{inst}:alive, ff:idx:workers) coexists during Phase 5 rollout; both are read-safe because no writer sees the other's keys. Preamble + unblock-scanner migration to the new shape lands alongside this matrix update.

  • PG partitioning: ff_worker_registry + ff_worker_registry_event are both 256-way HASH-partitioned on partition_key = (fnv1a_u64(worker_instance_id.as_bytes()) % 256)::smallint — identical derivation to ff_budget_usage_by_exec (migration 0020) so register / heartbeat / mark_dead / ttl_sweep all land on the same partition for a given worker_instance_id.

  • SQLite flat tables: per RFC-023 §4.1 A3 (single-writer, no HASH partitioning). lanes stored as sorted-joined CSV since SQLite lacks text[].

  • TTL sweep (PG + SQLite): new worker_registry_ttl_sweep scanner, 30s cadence matching budget_reconciler. CTE: DELETE FROM ff_worker_registry WHERE last_heartbeat_ms + liveness_ttl_ms < $now RETURNING ... → INSERT INTO ff_worker_registry_event ('ttl_swept'). Valkey uses native PEXPIRE, no scanner.

  • RFC-018 Supports flags: 5 new bools — register_worker, heartbeat_worker, mark_worker_dead, list_expired_leases, list_workers. All true on every in-tree backend at RFC-025 acceptance.

Release: v0.15.0 (FF #511 scheduler-agnostic, 2026-05-03)

cairn #511 — drop ferriskey::Client coupling from ff_scheduler::Scheduler:

Method Valkey Postgres SQLite Notes
release_admission impl impl impl Idempotent quota-admission slot release. Valkey wraps the pre-#511 ff_release_admission FCALL; PG/SQLite DELETE ff_quota_admitted + UPDATE active_concurrency = GREATEST(c-1, 0) in one tx.
read_quota_policy_limits impl impl impl Typed snapshot of (max_requests_per_window, requests_per_window_seconds, active_concurrency_cap, jitter_ms). Valkey: single HMGET. PG/SQLite: SELECT (no jitter_ms column; defaults to 0). Returns None when the policy row is absent.
block_execution_for_admission impl Unavailable Unavailable Generalised admission block covering budget/quota/capability reason codes via the BlockingReason enum. Valkey routes to ff:idx:{p}:lane:{lane}:blocked_<reason> via the existing Lua FCALL. PG/SQLite stay Unavailable — scheduler is a Valkey-only concept; PG/SQLite wire native claim paths.
read_budget_usage_and_limits impl Unavailable Unavailable Typed snapshot of a budget's HGETALL limits + per-dim HGET usage. Same Valkey-only posture — scheduler path.

Scheduler refactor (FF #511 Phase 2c + Phase 3):

  • Scheduler.backend: Weak<dyn EngineBackend> — admission primitives route through the trait (6 call-sites migrated: server_time_ms, read_quota_policy_limits, check_admission, block_execution_for_admission, release_admission, issue_claim_grant).
  • Scheduler.client: Option<ferriskey::Client>Some on Valkey deploys (scanner ZRANGEBYSCORE + handful of exec_core HGETs still need direct access); None on backend-only deploys. claim_for_worker short-circuits to Ok(None) when client is None — PG/SQLite consumers wire their own native claim paths (PostgresScheduler, SqliteBackend::claim_for_worker).
  • New Scheduler::new_with_backend(weak_backend, config) constructor — no ferriskey argument.

Why the cycle is safe: Scheduler holds Weak<dyn EngineBackend>, not Arc. Valkey and Postgres backends embed the Scheduler inside their struct; an Arc cycle would leak both. Upgrade-or-degrade pattern on every scheduler call handles the rare backend-dropped-mid-tick case by returning SchedulerError::Config.

RFC-018 Supports flags: release_admission, read_quota_policy_limits, block_execution_for_admission, read_budget_usage_and_limits — 4 new bools. release_admission + read_quota_policy_limits on all 3 backends; the other 2 on Valkey only.