
Commit efb1e1a

Branimir Rakic and cursoragent authored and committed
feat(agent,cli,publisher): OT-RFC-38 LU-6 — opaque SWM hosting on cores + member host-catchup fallback
Lands the last Phase A milestone for SPEC_CG_HOSTING_MEMBERSHIP: cores opaquely host a curated CG's encrypted SWM substrate without ever holding the chain key, and a member that was offline (but whose sender-key state survived on disk) can recover the missed history from cores when the curator AND every other member are also offline. The §1.1 user-visible promise — "any member can recover the full history as long as one peer that has the bytes is reachable" — now holds for the "every member is offline; only cores are reachable" case. The §5.2 invariant ("cores never possess plaintext or keys") is preserved: cores store the wire bytes verbatim, never attempt decryption, and `SwmHostModeStore` rejects zero-length envelopes.

Source surfaces

- `packages/agent/src/swm/host-mode-store.ts` (new) — file-backed append-only per-CG log of opaque ciphertext envelopes. Separate TTL + per-CG byte cap for unregistered (default 6h / 1 MiB — pre-registration staging per §1.2) vs registered (30d / 64 MiB). Eight-byte-BE timestamp + seqno + four-byte-BE len framing; one file per CG, named by `sha256(cgId)` base64url so user-supplied ids stay filesystem-safe.
- `packages/agent/src/swm/host-catchup-wire.ts` (new) — JSON wire format for the new libp2p request/response protocol `/dkg/10.0.1/swm-host-catchup`. `denied` vs empty `entries` distinguishes "I refuse to serve" from "you're up-to-date / I have nothing"; envelope bytes base64-encoded so the same JSON works as a debug-tool target.
- `packages/agent/src/dkg-agent.ts` — host-mode reconciler that subscribes cores to a curated CG's SWM topic in HOST MODE (drops bytes that aren't ciphertext, accepts and stores those that are), the `/dkg/10.0.1/swm-host-catchup` request handler, the member-side `catchupSwmFromHost` / `catchupSwmFromConnectedHosts` clients, and the `enableSwmHostModeFor` operator surface for explicit designation.
- `packages/publisher/src/workspace-handler.ts` — new `{ trustedReplay: true }` option on `SharedMemoryHandler.handle()` that skips the two pubsub-transport-layer peer assertions (`publisherPeerId === fromPeerId`, peer-allowlist gate). The cryptographic chain — gossip-envelope signature verification + sender-key AEAD decryption — is still enforced for every replayed envelope, so a host can't forge or tamper with what it stored opaquely, only relay it.
- `packages/cli/src/daemon/routes/memory.ts`:
  - `POST /api/shared-memory/host-mode/subscribe` — operator-driven designation (Phase A surface that the future sharding-table auto-subscribe plugs into).
  - `POST /api/shared-memory/host-catchup` — dedicated member-side catchup endpoint for debugging a specific peer's hosting.
  - `GET /api/shared-memory/host-mode/stats` — per-daemon diagnostics (cgCount / totalBytes / totalEntries / subscribedCgIds).
  - Auto-fallback in `POST /api/shared-memory/catchup`: when the standard sync path inserts 0 triples, transparently invokes `catchupSwmFromConnectedHosts` against the same peer set. Opt out with `{ hostCatchupFallback: false }`.
- `packages/core/src/constants.ts` — `PROTOCOL_SWM_HOST_CATCHUP` string.

Devnet validation (`scripts/devnet-test-rfc38-late-joiner.sh`)

- SCENARIO D (new, LU-6 happy path): curated CG with `[curator, member]`; cores explicitly designated via the new `host-mode/subscribe` endpoint (note: cores are NOT pre-created on CG_D — gossiped meta would otherwise expand the allowlist union and shortcut past the host-mode path). Curator handshake + 1 triple → member receives chain key. Member killed. Curator writes 5 more (ciphertext flows to cores). Curator killed. Member restarted (chain key persists on disk). Catchup endpoint returns `hostCatchup.ranFallback: true` and member ends with all 6 triples. The test asserts on `totalEntries > 0` across cores AND on `hostCatchup.ranFallback === true` so we'd catch a silent regression where the host-mode path stops running.
- SCENARIO C updated: now correctly documents that cores DO serve ciphertext, and the outsider's 0-inserted result is the confidentiality invariant (no chain key → AEAD decrypt fails on apply), not a missing-hosting gap.

Documentation

- `docs/specs/SPEC_CG_HOSTING_MEMBERSHIP.md` §7.1.1 — LU-6 moved from "deferred" to "landed" with a description of the operator-driven designation surface, the wire protocol, the auto-fallback in catchup, and the explicit Phase B carry-overs (sharding-table auto-subscribe, per-wallet rate limits, cross-core ciphertext re-gossip).

Run instructions

    ./scripts/devnet.sh start 6
    ./scripts/devnet-test-rfc38-late-joiner.sh   # ~2 min, 4 scenarios

Tested

- All 4 late-joiner scenarios PASS on a fresh 6-node devnet (4 cores + 2 edges). SCENARIO D output shows `totalEntries: 2` × 4 cores = 8 stored envelopes, `hostCatchup.ranFallback: true`, `applied: 1` (the SWM batch envelope; the sender-key setup envelope was already processed pre-kill so its replay is correctly skipped), and `N_D_POST: 6` via SPARQL.
- `packages/agent/test/swm/host-mode-store.test.ts` — 8 new unit tests covering monotonic seqno, persistence across restarts, TTL pruning, byte-cap enforcement, registered-vs-unregistered limit switching, zero-length-envelope rejection.
- `packages/publisher` test suite — 965 passed / 1 skipped (the `trustedReplay` option change is additive; no regressions).

Co-authored-by: Cursor <cursoragent@cursor.com>
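To make the store's on-disk framing concrete, here is a minimal sketch of how one log record and one per-CG file name could be produced. It is not the code in `host-mode-store.ts`: the commit message only says "eight-byte-BE timestamp + seqno + four-byte-BE len", so the 8-byte width of the seqno, the `.log` suffix, and every function and type name below are assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto';

// ASSUMED layout: [8-byte BE timestamp ms][8-byte BE seqno][4-byte BE length][envelope].
// The seqno width is a guess; the commit message does not state it.
const HEADER_BYTES = 8 + 8 + 4;

interface HostedEntry {
  timestampMs: number;
  seqno: number;
  envelope: Buffer; // opaque ciphertext; the host never decrypts it
}

// Write a non-negative safe integer as 8 big-endian bytes (two 32-bit halves).
function writeU64BE(buf: Buffer, value: number, offset: number): void {
  buf.writeUInt32BE(Math.floor(value / 0x100000000), offset);
  buf.writeUInt32BE(value % 0x100000000, offset + 4);
}

function readU64BE(buf: Buffer, offset: number): number {
  return buf.readUInt32BE(offset) * 0x100000000 + buf.readUInt32BE(offset + 4);
}

// Frame one envelope for appending to the per-CG log.
function encodeFrame(entry: HostedEntry): Buffer {
  if (entry.envelope.length === 0) {
    throw new Error('zero-length envelope rejected');
  }
  const header = Buffer.alloc(HEADER_BYTES);
  writeU64BE(header, entry.timestampMs, 0);
  writeU64BE(header, entry.seqno, 8);
  header.writeUInt32BE(entry.envelope.length, 16);
  return Buffer.concat([header, entry.envelope]);
}

// Read frames back; a truncated tail (e.g. a crash mid-append) is ignored.
function decodeFrames(log: Buffer): HostedEntry[] {
  const entries: HostedEntry[] = [];
  let offset = 0;
  while (offset + HEADER_BYTES <= log.length) {
    const len = log.readUInt32BE(offset + 16);
    const start = offset + HEADER_BYTES;
    if (start + len > log.length) break;
    entries.push({
      timestampMs: readU64BE(log, offset),
      seqno: readU64BE(log, offset + 8),
      envelope: log.subarray(start, start + len),
    });
    offset = start + len;
  }
  return entries;
}

// One file per CG; hashing keeps user-supplied ids filesystem-safe.
// The ".log" suffix is hypothetical.
function logFileName(cgId: string): string {
  return createHash('sha256').update(cgId, 'utf8').digest('base64url') + '.log';
}
```

Length-prefixed framing lets the reader skip over envelopes without inspecting them, which fits the requirement that cores treat the bytes as opaque.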
1 parent 94c96bd commit efb1e1a

10 files changed

Lines changed: 1724 additions & 25 deletions


docs/specs/SPEC_CG_HOSTING_MEMBERSHIP.md

Lines changed: 12 additions & 8 deletions
@@ -837,14 +837,18 @@ The Phase A milestones above are mostly landed. Devnet validation lives in `scri
 | LU-9: member-attestation token mint + outsider verification (with optional `membershipResolver` chain hook) | `packages/agent/src/swm/member-attestation.ts` + `POST /api/attestation/{mint,verify}` | `devnet-test-rfc38-lu9.sh` | ✅ landed |
 | LU-10: public-CG regression sweep (publish + anonymous catchup + verify-batch + attestation, all on a public CG) | reuses LU-5/7/8/9 surfaces with `accessPolicy: 0` | `devnet-test-rfc38-lu10.sh` | ✅ landed |
 | Cross-CG isolation, multi-member (3-way), scale (50 triples / 25 KAs), late-joiner (member-from-member with curator offline) | scenario coverage on top of the landed surfaces | `devnet-test-rfc38-{cross-cg,multi-member,scale,late-joiner}.sh` | ✅ landed |
-| LU-6: sharding-table-driven SWM substrate subscription on cores + pre-registration staging (TTL, byte caps, ciphertext fanout to cores) so cores can serve catchup when the curator AND all live members are offline | (deferred) | `devnet-test-rfc38-late-joiner.sh` SCENARIO C documents the gap with a passing fail-soft assertion (cores-only catchup returns 0 triples cleanly, no crash) | ⚠️ deferred (see below) |
-
-**What "deferred LU-6" means in practice on this branch:**
-
-- A new member joining when the curator OR any other current member is online → catches up the full SWM history via `POST /api/shared-memory/catchup` against that peer. ✅ works.
-- A new member joining when the curator AND all current members are offline → catchup against cores returns 0 triples. The endpoint shape is correct (`peersAttempted > 0`, `totalInsertedTriples == 0`, no crash); the data simply isn't there because today's cores don't subscribe to curated CG SWM gossip topics outside the member allowlist. ⚠️ gap.
-
-This gap is acceptable for the Phase A user-visible surface (the §1.1 bug was about *publishing*, not about a specific late-joiner pattern), but is the next thing to land for the full "scenarios 1–4 of §2.4" promise to be honest. The substrate-subscription work itself is non-trivial: it touches the `SharedMemoryHandler` apply path (currently signature-checks the publisher and applies plaintext quads; needs a parallel "store opaque ciphertext under sharding-table assignment" path) and the SWM gossip wire format (Phase B in §7.2 will move it to AEAD per §5.2; Phase A could ship a transitional "cores subscribe but only persist for members" mode if needed sooner).
+| LU-6: opaque SWM ciphertext hosting on cores + member-side host-catchup fallback (so curated CGs can be recovered by an offline-then-online member even when the curator and every other member is also offline) | `packages/agent/src/swm/host-mode-store.ts`, `packages/agent/src/swm/host-catchup-wire.ts`, `packages/agent/src/dkg-agent.ts` (host-mode reconciler, `enableSwmHostModeFor`, `catchupSwmFromConnectedHosts`, `handleSwmHostCatchup`); `packages/cli/src/daemon/routes/memory.ts` (`POST /api/shared-memory/host-mode/subscribe`, `POST /api/shared-memory/host-catchup`, `GET /api/shared-memory/host-mode/stats`, plus auto-fallback in `POST /api/shared-memory/catchup`); `packages/publisher/src/workspace-handler.ts` `trustedReplay` flag for the receiver path | `devnet-test-rfc38-late-joiner.sh` SCENARIO D (LU-6 happy path: cores host ciphertext, member with surviving chain key catches up while curator+everyone offline) + SCENARIO C (confidentiality invariant: outsider without chain key applies 0 triples even though cores serve ciphertext) | ✅ landed |
+
+**LU-6 shape (Phase A):**
+
+- **Operator-driven hosting designation.** Cores opt into a CG's encrypted SWM substrate via `POST /api/shared-memory/host-mode/subscribe { contextGraphId }`. The (future) sharding-table-driven auto-assignment from §5.1.1 plugs in as a periodic reconciler over `listContextGraphs()`; the explicit endpoint is the Phase A surface for operators (and the devnet harness) to designate hosting without waiting on the full sharding-table integration.
+- **Opaque storage.** Cores store the raw gossip envelope bytes under a per-CG append-only log (`packages/agent/src/swm/host-mode-store.ts`) with a separate TTL + per-CG byte cap for unregistered vs registered CGs (defaults: 6h / 1 MiB unregistered, 30d / 64 MiB registered — pre-registration staging per §1.2 and §5.1.1). Cores never possess the chain key and cannot decrypt; the cryptographic invariant in §5.2 holds.
+- **Member-side catchup.** `/dkg/10.0.1/swm-host-catchup` is a new libp2p request/response protocol (JSON over the universal Messenger substrate) that lets members fetch stored ciphertext envelopes by `(contextGraphId, sinceSeqno)`. Members re-feed each envelope through `SharedMemoryHandler.handle(..., { trustedReplay: true })`, which skips the two pubsub-transport-layer peer assertions (`publisherPeerId === fromPeerId`, peer-allowlist gate) while *keeping* the cryptographic chain (gossip-envelope signature + sender-key AEAD decryption) intact.
+- **Auto-fallback in catchup.** `POST /api/shared-memory/catchup` runs the standard sync path first; if that returns 0 triples (typical of the "every member is offline; only cores still hold the substrate" scenario), it transparently invokes `catchupSwmFromConnectedHosts` against the same peer set. Opt out with `{ hostCatchupFallback: false }`.
+- **What LU-6 does NOT add (deferred to later phases):**
+  - Sharding-table-driven auto-subscribe (Phase B): cores currently subscribe only via the explicit `host-mode/subscribe` endpoint or the periodic reconciler's local-CG scan. The eventual goal is "core looks up its assignment in the sharding table and subscribes accordingly" — the abstraction is in place (`enableSwmHostModeFor` is the call site that the auto-subscriber will reuse).
+  - Pre-registration staging quotas (§1.2.4): per-wallet rate limits and per-core aggregate budgets are NOT enforced yet. The TTL + per-CG byte cap is enforced; the additional rate-limit dimensions land in Phase B.
+  - On-disk replay of stored envelopes across cores: cores don't currently re-gossip the ciphertext to each other, so a fresh core joining the sharding-table assignment for an existing CG starts with an empty store. Phase B addition (would let a member catch up from ANY core in the assignment, not just one that was already subscribed when the writes happened).
 
 ### 7.2 Phase B — Explicit key lifecycle + monetization model β
 
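The member-side catchup and auto-fallback bullets in the hunk above can be sketched as two small pure functions. This is an illustrative reading of the described behaviour, not the code in `host-catchup-wire.ts` or `memory.ts`: only `denied`, `entries`, `contextGraphId`, `sinceSeqno`, and `hostCatchupFallback` come from the commit. The per-entry field names (`seqno`, `envelope`), the outcome type, and both function names are hypothetical.

```typescript
// Request/response shapes. Per-entry field names are ASSUMED for illustration.
interface HostCatchupRequest {
  contextGraphId: string;
  sinceSeqno: number;
}

interface HostCatchupEntry {
  seqno: number;
  envelope: string; // base64-encoded ciphertext envelope
}

interface HostCatchupResponse {
  denied?: boolean;
  entries: HostCatchupEntry[];
}

type HostCatchupOutcome =
  | { kind: 'denied' } // "I refuse to serve"
  | { kind: 'nothing-new' } // "you're up-to-date / I have nothing"
  | { kind: 'replay'; envelopes: Uint8Array[]; nextSinceSeqno: number };

// Interpret a host's JSON reply; `denied` is checked before `entries`
// so a refusal is never mistaken for an empty store.
function interpretHostCatchup(raw: string, sinceSeqno: number): HostCatchupOutcome {
  const res = JSON.parse(raw) as HostCatchupResponse;
  if (res.denied === true) return { kind: 'denied' };
  if (!Array.isArray(res.entries) || res.entries.length === 0) {
    return { kind: 'nothing-new' };
  }
  // Replay in seqno order so sender-key state advances monotonically.
  const ordered = [...res.entries].sort((a, b) => a.seqno - b.seqno);
  return {
    kind: 'replay',
    envelopes: ordered.map((e) => new Uint8Array(Buffer.from(e.envelope, 'base64'))),
    nextSinceSeqno: Math.max(sinceSeqno, ordered[ordered.length - 1].seqno),
  };
}

// Fallback rule as described: run host catchup only when the standard sync
// inserted 0 triples and the caller did not opt out.
function shouldRunHostFallback(
  insertedTriples: number,
  opts: { hostCatchupFallback?: boolean } = {},
): boolean {
  return insertedTriples === 0 && opts.hostCatchupFallback !== false;
}

// Example request a member might send (the id is a made-up placeholder).
const exampleRequest: HostCatchupRequest = { contextGraphId: 'example-cg', sinceSeqno: 0 };
```

Each decoded envelope would then be fed to `SharedMemoryHandler.handle(..., { trustedReplay: true })`, where signature verification and AEAD decryption still apply, so a host that tampers with stored bytes can only cause a rejected replay.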
packages/agent/src/dkg-agent-types.ts

Lines changed: 20 additions & 0 deletions
@@ -38,6 +38,7 @@ import type { QueryAccessConfig } from '@origintrail-official/dkg-query';
 import type { SkillHandler } from './messaging.js';
 import type { CclFactResolutionMode } from './ccl-fact-resolution.js';
 import type { JsonLdContent } from './dkg-agent-utils.js';
+import type { SwmHostModeStoreLimits } from './swm/host-mode-store.js';
 import type { SyncPhase } from './sync/auth/request-build.js';
 
 // ── File-local structural types ─────────────────────────────────────
@@ -600,6 +601,25 @@ export interface DKGAgentConfig {
   syncContextGraphs?: string[];
   /** TTL for shared memory data in milliseconds. Expired operations are periodically cleaned up. Default: 48 hours. Set to 0 to disable. */
   sharedMemoryTtlMs?: number;
+  /**
+   * OT-RFC-38 LU-6 — settings for the core-side host-mode SWM store.
+   * Only honoured when `nodeRole === 'core'`. Omit on edges (the
+   * store is never initialized there).
+   *
+   * Fields:
+   * - `enabled`: when `false`, cores skip host-mode entirely and behave like edges. Default `true` for cores.
+   * - `unregistered`: TTL/byte-cap for CGs the core knows about but that aren't on-chain registered yet.
+   * - `registered`: TTL/byte-cap for on-chain registered CGs (typically larger).
+   * - `pruneIntervalMs`: how often the TTL/cap sweep runs.
+   * - `reconcileIntervalMs`: how often the host-mode subscription reconciler ensures cores are subscribed to all known curated CGs.
+   */
+  swmHostMode?: {
+    enabled?: boolean;
+    unregistered?: SwmHostModeStoreLimits;
+    registered?: SwmHostModeStoreLimits;
+    pruneIntervalMs?: number;
+    reconcileIntervalMs?: number;
+  };
   /** Durable local store for subscribed context-graph runtime state. */
   contextGraphSubscriptionStore?: ContextGraphSubscriptionStore;
   /** Durable local cache for nodes/agents known to be members of a context graph. */
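As a rough illustration of how the `swmHostMode` block might be filled in and resolved, the sketch below mirrors the config shape from the hunk above. The diff does not show the fields of `SwmHostModeStoreLimits`, so `ttlMs` and `maxBytes` are hypothetical names. `limitsFor`, the default constants, and the interval values are likewise assumptions. Only the 6h / 1 MiB and 30d / 64 MiB defaults are taken from the commit message.

```typescript
// HYPOTHETICAL shape: the real SwmHostModeStoreLimits fields are not shown
// in this diff; `ttlMs` and `maxBytes` are assumed names.
interface SwmHostModeStoreLimits {
  ttlMs: number;
  maxBytes: number;
}

// Mirrors the `swmHostMode` block added to DKGAgentConfig.
interface SwmHostModeConfig {
  enabled?: boolean;
  unregistered?: SwmHostModeStoreLimits;
  registered?: SwmHostModeStoreLimits;
  pruneIntervalMs?: number;
  reconcileIntervalMs?: number;
}

const HOUR_MS = 60 * 60 * 1000;
const MIB = 1024 * 1024;

// Defaults quoted in the commit message: 6h / 1 MiB and 30d / 64 MiB.
const DEFAULT_UNREGISTERED: SwmHostModeStoreLimits = { ttlMs: 6 * HOUR_MS, maxBytes: 1 * MIB };
const DEFAULT_REGISTERED: SwmHostModeStoreLimits = { ttlMs: 30 * 24 * HOUR_MS, maxBytes: 64 * MIB };

// Pick the limit set for a CG, falling back to defaults when the operator
// leaves a tier unset. This resolution rule is an assumption.
function limitsFor(cfg: SwmHostModeConfig | undefined, isRegistered: boolean): SwmHostModeStoreLimits {
  return isRegistered
    ? cfg?.registered ?? DEFAULT_REGISTERED
    : cfg?.unregistered ?? DEFAULT_UNREGISTERED;
}

// Example: a core that tightens only the registered tier.
const exampleConfig: SwmHostModeConfig = {
  enabled: true,
  registered: { ttlMs: 7 * 24 * HOUR_MS, maxBytes: 16 * MIB },
  pruneIntervalMs: 5 * 60 * 1000, // assumed value, not a documented default
  reconcileIntervalMs: 60 * 1000, // assumed value, not a documented default
};
```

Keeping the two tiers as separate optional objects lets an operator override one without restating the other, which matches the way the config type exposes them.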
