AGENTS.md: 7 additions & 0 deletions
@@ -116,12 +116,19 @@ For any other error (peer not found, timeout, network), retry once before bother
- **Don't loop.** If you send a message and the response comes back asking another question, surface it to the operator before auto-replying. Phase 1 is operator-in-the-loop by design; Phase 3 (autonomous bridge) is a future RFC.
- **Don't conflate chat with the `chat` sub-graph.** The `chat` sub-graph (captured by `capture-chat.mjs`) is the operator's conversation with you; `dkg_send_message` is *your* conversation with another agent. They're separate channels for now.

+## Universal Messenger (v10.0.0-rc.9)
+
+DKG's short peer-to-peer protocols (chat, skill request, query-remote, swm-sender-key, private-access, join-request, storage-ack, verify-proposal) all route through a single reliability substrate called the **Universal Messenger**. Architecture: [`docs/messenger.md`](./docs/messenger.md). Operator-facing surfaces: [`docs/messenger-operator.md`](./docs/messenger-operator.md). Migration recipe for a hypothetical 9th protocol: [`docs/messenger-add-protocol.md`](./docs/messenger-add-protocol.md).
+
+**Convergence rule for agents working on this codebase**: route any new short-message protocol through `Messenger.sendReliable` and register handlers via `Messenger.register`, never through `ProtocolRouter.send` / `ProtocolRouter.register` directly. The substrate gives you sender-side durable retry (a SQLite outbox that survives daemon restart), receiver-side dedup keyed by `messageId`, a sender-side response cache, stale-snapshot-safe retries, opportunistic flush on `connection:open`, DHT-walk-on-stall, and observability via `/api/slo`. Bypassing it loses every one of those properties. The migration recipe lives at `docs/messenger-add-protocol.md` and includes the worked example from PR-3 (chat).
+
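To make the convergence rule concrete, the sketch below models the receiver-side half of the substrate in memory: a handler registered through a `register`-style wrapper runs once per `messageId`, and a duplicate delivery is answered from a response cache instead of re-running the handler. Everything in it is an assumption for illustration. `MessengerSketch`, its method signatures, the synchronous calls, the string payloads and the `/dkg/10.0.1/ping` protocol are hypothetical stand-ins, not the real `Messenger` API. The 256 KiB inline budget and the `RESPONSE_GONE` marker are the figures quoted in the CHANGELOG entry for this release.

```typescript
// Hypothetical in-memory stand-in for the substrate's receiver side.
// None of these signatures are taken from the real Messenger class.
type Handler = (peer: string, payload: string) => string;

const INLINE_RESPONSE_BUDGET = 256 * 1024; // budget quoted in the CHANGELOG
const RESPONSE_GONE = "RESPONSE_GONE";

class MessengerSketch {
  private handlers = new Map<string, Handler>();
  // Dedup key -> cached response, or null when the response was too large to keep.
  private seen = new Map<string, string | null>();

  register(protocol: string, handler: Handler): void {
    this.handlers.set(protocol, handler);
  }

  // Models one delivery arriving at the receiver. A retried send reuses its messageId.
  deliver(peer: string, protocol: string, messageId: string, payload: string): string {
    const handler = this.handlers.get(protocol);
    if (!handler) throw new Error(`no handler registered for ${protocol}`);

    const key = `${peer}|${protocol}|${messageId}|in`;
    if (this.seen.has(key)) {
      // Duplicate: never re-run the handler; replay the cached response if it was kept.
      const cached = this.seen.get(key);
      return cached ?? RESPONSE_GONE;
    }

    const response = handler(peer, payload);
    // Character count is used here as a rough stand-in for the encoded byte size.
    const fitsInline = response.length <= INLINE_RESPONSE_BUDGET;
    this.seen.set(key, fitsInline ? response : null); // mark-only when oversize
    return response;
  }
}

// Usage: the second delivery of "id-1" is served from the cache.
const sketch = new MessengerSketch();
let handlerCalls = 0;
sketch.register("/dkg/10.0.1/ping", (_peer, payload) => {
  handlerCalls += 1;
  return payload.toUpperCase();
});
const firstReply = sketch.deliver("peerA", "/dkg/10.0.1/ping", "id-1", "hi");
const retryReply = sketch.deliver("peerA", "/dkg/10.0.1/ping", "id-1", "hi");
```

A protocol wired straight onto `ProtocolRouter` would have to rebuild this bookkeeping itself, which is the practical reason for the rule. When a duplicate sees `RESPONSE_GONE`, a caller whose request is idempotent can re-issue it under a fresh `messageId`; the CHANGELOG describes that recipe for `/query-remote`.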
## Things to NOT do

- **Don't fabricate URIs.** Every URI in `mentions` must come from `dkg_search` or be freshly minted via the look-before-mint protocol.
- **Don't skip turns to "save tokens".** One annotation call per turn is cheap (a few hundred ms). Coverage wins.
- **Don't publish to VM via MCP.** That's `dkg_request_vm_publish` (marker for human review), not `/api/shared-memory/publish`. The agent is never the gating actor for on-chain commitment.
- **Don't normalise slugs in your `dkg_search` query.** Pass the unnormalised label so the daemon's fuzzy match has the most signal; only normalise when comparing for reuse-vs-mint.
+- **Don't call `ProtocolRouter.send` directly for new short-message protocols.** Use `Messenger.sendReliable` (see Universal Messenger section above).
CHANGELOG.md: 37 additions & 0 deletions
@@ -4,6 +4,43 @@ All notable changes to the DKG V9 node are documented here. The format is based
## [Unreleased]

+## [10.0.0-rc.9] - 2026-05-17
+
+**Universal Messenger substrate**: every short peer-to-peer DKG protocol (chat, skill request, query-remote, swm-sender-key, private-access, join-request, storage-ack, verify-proposal) now routes through a single reliability layer with at-least-once delivery, receiver-side dedup keyed by `messageId`, sender-side response cache (256 KiB inline; mark-only beyond), SQLite-persisted retry outbox surviving daemon restart, opportunistic flush on `connection:open`, DHT-walk-on-stall recovery, and per-protocol latency observability via `/api/slo`. Architecture in [`docs/messenger.md`](./docs/messenger.md); operator surfaces in [`docs/messenger-operator.md`](./docs/messenger-operator.md); migration recipe in [`docs/messenger-add-protocol.md`](./docs/messenger-add-protocol.md).
+
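As a rough illustration of the per-protocol latency observability mentioned above, the sketch below keeps a sliding window of delivery latencies and reads p50/p95/p99 from it. The window size and the `samples` / `p50Ms` / `p95Ms` / `p99Ms` field names come from the PR-12 entry in this changelog; the class name `SloWindowSketch`, the nearest-rank percentile method and returning 0 for an empty window are assumptions, and the `delivered` / `queued` counters are left out because their exact semantics are not spelled out here.

```typescript
const DEFAULT_SLO_WINDOW_SAMPLES = 1000; // window size named in the PR-12 entry

interface SloSnapshot {
  samples: number;
  p50Ms: number;
  p95Ms: number;
  p99Ms: number;
}

// Nearest-rank percentile over an ascending array (assumed method, not confirmed by the source).
function nearestRank(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

class SloWindowSketch {
  private latenciesMs: number[] = [];
  private maxSamples: number;

  constructor(maxSamples: number = DEFAULT_SLO_WINDOW_SAMPLES) {
    this.maxSamples = maxSamples;
  }

  // Record one invoke-to-delivered latency; the oldest sample falls out once the window is full.
  record(latencyMs: number): void {
    this.latenciesMs.push(latencyMs);
    if (this.latenciesMs.length > this.maxSamples) this.latenciesMs.shift();
  }

  snapshot(): SloSnapshot {
    const sorted = this.latenciesMs.slice().sort((a, b) => a - b);
    return {
      samples: sorted.length,
      p50Ms: nearestRank(sorted, 50),
      p95Ms: nearestRank(sorted, 95),
      p99Ms: nearestRank(sorted, 99),
    };
  }
}

// Usage: one window per protocol, keyed by protocol ID.
const chatWindow = new SloWindowSketch();
for (let ms = 1; ms <= 100; ms += 1) chatWindow.record(ms);
const chatSlo = chatWindow.snapshot();
```

Because the window is bounded, a long-running daemon reports recent behaviour rather than a lifetime average, which is what a soak-test comparison between cycles needs.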
+**Wire-format break**: all 8 substrate-routed protocols moved from `/dkg/10.0.0/*` to `/dkg/10.0.1/*`. Both daemons in a pair must be on rc.9 for any of these protocols to negotiate between them; mixed-pair deploys (one node rc.8, one rc.9) surface as `delivered: false, queued: true` outbox entries that drain once both sides upgrade. No backward-compatibility codepath ships; the hard cutover keeps the substrate's correctness proofs simple. `/dkg/10.0.0/verify-approval` is the sole `/dkg/10.0.0/*` survivor (not a substrate caller; left bare). See the upgrade order in `docs/messenger-operator.md` § "Upgrade from rc.8 to rc.9".
+
+**V12 + V13 SQLite migrations** run automatically on first rc.9 boot. V12 adds the `message_idempotency` + `protocol_outbox` tables (additive: chat continues to write to its V11 column until V13 cuts over). V13 drops the V11 `idx_chat_msgid` partial unique index in favour of the substrate-owned dedup. The `chat_messages.message_id` column is **preserved nullable** for hot-rollback safety: rc.8 finds a column it recognises if you have to downgrade. In-flight rc.8 chat-outbox entries should be drained before upgrade (let the daemon idle for one tick cycle, typically 30s); new sends post-upgrade route via the substrate outbox.
+
+### Added: Universal Messenger substrate
+
+- **PR-1 (#542): substrate primitives** (`packages/core/src/proto/reliable-envelope.ts`, `packages/core/src/messenger-types.ts`, `packages/core/src/protocol-outbox.ts`, `packages/node-ui/src/db.ts`, `docs/messenger.md`): introduces the `ReliableEnvelope` Protobuf wire wrapper (`{ messageId, version, tsMs, payload }`), the `MessageIdempotencyStore` + `ProtocolOutboxStore` ports, the generic `ProtocolOutbox` retry helper (5s → 15s → 30s → 60s → 5m → 30m → 2h ladder; per-key inflight guard; stale-snapshot guard lifted from rc.8 #538), and SQLite-backed `SqliteMessageIdempotencyStore` + `SqliteProtocolOutboxStore` against a V12 schema migration. 256 KiB inline response cache budget; oversize responses are stored mark-only and surface `RESPONSE_GONE` to duplicate receivers.
+- **PR-2 (#543): Messenger evolution + lifecycle wiring** (`packages/agent/src/p2p/messenger.ts`, `packages/agent/src/dkg-agent.ts`, `packages/cli/src/daemon/lifecycle.ts`): the `Messenger` class gains `sendReliable` + `register` substrate surfaces wrapping envelope encode + sender/receiver idempotency + outbox enqueue on recoverable failure; the legacy `sendToPeer` path is preserved bitwise-compatible for any `/dkg/10.0.0/*` caller. `DKGAgent` wires a `messengerOutboxTimer` periodic tick and piggy-backs `messenger.processOutboxOnConnect` onto its existing `connection:open` handler. `lifecycle.ts` instantiates the SQLite stores against the shared `DashboardDB` and routes `ackTransportFactory.sendP2P` through the Messenger (semantics-identical until `/storage-ack` migrates in PR-11).
+- **PR-3 (#544): pilot migration of chat + skill onto `/dkg/10.0.1/message`** (`packages/core/src/constants.ts`, `packages/agent/src/messaging.ts`, `packages/agent/src/dkg-agent.ts`, `packages/node-ui/src/db.ts`, `docs/messenger-add-protocol.md`): `PROTOCOL_MESSAGE` prefix bumped from `/dkg/10.0.0/message` to `/dkg/10.0.1/message`. `MessageHandler.sendChat` / `sendSkillRequest` route through `messenger.sendReliable`; the in-process `MessageOutbox` chat-specific queue + its periodic tick + opportunistic-flush + stale-snapshot guard are **deleted** in favour of the substrate's generic `ProtocolOutbox`. The V13 SQLite migration drops the V11 `idx_chat_msgid` partial unique index (receiver-side dedup is now owned by the substrate's `message_idempotency` table). The MCP `dkg_send_message` `queued` / `attempts` / `nextAttemptAtMs` operator surface is preserved end-to-end and is now sourced from the substrate outbox.
+- **PR-4 (#545): `ProtocolRouter.send` parallelPaths option** (`packages/core/src/protocol-router.ts`, `docs/messenger.md`): `send()` accepts a `SendOptions { parallelPaths?: number, timeoutMs?: number }` object (the legacy `timeoutMs: number` arg is still accepted for backward compat). When `parallelPaths > 1`, it opens N concurrent `newStream` attempts across enumerated live connections via `Promise.any`; the first success wins and the losers are aborted. Safe only on `/dkg/10.0.1/*`, where receiver dedup is mandatory. Defaults to 1 for app-level fan-out protocols (storage-ack / verify-proposal); chat opts in at 2.
+- **PR-5 (#546): DHT-walk-on-outbox-stall recovery** (`packages/agent/src/p2p/messenger.ts`, `packages/agent/src/dkg-agent.ts`, `docs/messenger.md`): outbox entries that reach `OUTBOX_STALL_THRESHOLD = 5` attempts of "no valid addresses for peer" trigger a time-bounded (`DHT_WALK_TIMEOUT_MS = 10s`), rate-limited (`DHT_WALK_RATE_LIMIT_MS = 5min/peer`) `libp2p.peerRouting.findPeer()` to refresh the receiver's addresses in the peerStore. Runs in the background so retries continue uninterrupted; the next outbox tick re-enumerates against the refreshed addresses.
+- **PR-7 (#548): `--relay-preferred` CLI flag + `preferredRelays` config + relay-setup playbook** (`packages/cli/src/cli.ts`, `packages/cli/src/config.ts`, `packages/cli/src/daemon/lifecycle.ts`, `packages/cli/README.md`, `docs/messenger-operator.md`): operators can prioritise relays they control via `dkg start --relay-preferred /ip4/.../p2p/...` (repeatable) or by writing `preferredRelays: string[]` into `~/.dkg/config.json`. The new `mergePreferredRelays` helper parses both sources, dedupes (first-seen order), and prepends to the network relay list; public testnet relays remain as fallback. The operator playbook for standing up your own relay infrastructure ships in `packages/cli/README.md` § "Operator relays". PR-6 (gossip peer-hints) was cancelled per Gate B; DHT walk + inbound-from-receiver are sufficient.
+- **PR-8 (#550): migrate `/swm-sender-key` + `/private-access` onto the substrate** (`packages/core/src/constants.ts`, `packages/agent/src/dkg-agent.ts`, `packages/publisher/src/access-client.ts`, `packages/publisher/test/_helpers/substrate.ts`, `docs/messenger.md`): both protocols bumped to `/dkg/10.0.1/*` and routed through `messenger.register` / `messenger.sendReliable`. `AccessClient` is refactored to accept a minimal `AccessSendSurface` interface (defined locally in `access-client.ts`) instead of importing `Messenger` directly, avoiding a `publisher → agent` circular dependency. A test-only `publisher/test/_helpers/substrate.ts` shim provides substrate semantics for publisher tests without the agent package dependency.
+- **PR-9 (#551): migrate `/query-remote` onto the substrate with `RESPONSE_GONE` retry recipe** (`packages/core/src/constants.ts`, `packages/agent/src/dkg-agent.ts`, `docs/messenger.md`): protocol bumped to `/dkg/10.0.1/query-remote`. New `sendQueryReliable()` helper wraps `messenger.sendReliable` and re-issues with a fresh `messageId` if the previous attempt returns `RESPONSE_GONE` (cap 2 attempts). SPARQL queries are app-layer idempotent, so the fresh-`messageId` re-issue is safe; the cap prevents infinite loops if every response is over the 256 KiB cache budget.
+- **PR-10 (#554): migrate `/join-request` onto the substrate; delete `JoinApprovalRetryQueue`** (`packages/core/src/constants.ts`, `packages/agent/src/dkg-agent.ts`, `packages/agent/src/join-approval-retry-queue.ts`, `docs/messenger.md`): all three `messenger.sendToPeer` call sites for `/dkg/10.0.0/join-request` (private notification, curator-targeted forward, broadcast) replaced with `messenger.sendReliable`; protocol bumped to `/dkg/10.0.1/join-request`. The in-memory `JoinApprovalRetryQueue` + its 30s timer + on-connect handler + processor methods are **deleted**; the substrate's SQLite outbox now owns retry persistence across restart. Net `-554 / +136` LOC. `listPendingJoinApprovalRetries` is stubbed to return `[]` (operator diagnostic re-built atop substrate outbox in PR-12).
+- **PR-11 (#555): migrate `/storage-ack` + `/verify-proposal` onto the substrate** (`packages/core/src/constants.ts`, `packages/agent/src/dkg-agent.ts`, `packages/cli/src/daemon/lifecycle.ts`, `docs/messenger.md`): both protocols bumped to `/dkg/10.0.1/*` and routed via `messenger.register` / `messenger.sendReliable`. `ACKCollector` + `VerifyCollector` quorum logic is unchanged; only the transport rewires. Three `sendP2P` wirings swap `sendToPeer → sendReliable` with `queued`-as-per-peer-throw so the collectors' existing retry loops keep their semantics. `parallelPaths` stays at 1 to avoid 9x fan-out amplification on top of the existing app-layer quorum. `/dkg/10.0.0/verify-approval` intentionally stays bare (not a substrate caller).
+- **PR-12 (#557): per-message latency histogram + `/api/slo` endpoint + soak script extension** (`packages/agent/src/p2p/messenger.ts`, `packages/agent/src/dkg-agent.ts`, `packages/cli/src/daemon/routes/agent-chat.ts`, `scripts/libp2p-soak-test.sh`, `docs/messenger.md`, `docs/messenger-operator.md`): in-memory per-protocol histogram of `sendReliable` invoke → `delivered: true` latency across queue + retries (1000-sample sliding window via `DEFAULT_SLO_WINDOW_SAMPLES`). Exposes `{ samples, p50Ms, p95Ms, p99Ms, delivered, queued }` per protocol via `GET /api/slo` (localhost-only by default; same `Authorization: Bearer` requirement as every other `/api/*` route). Soak script extended with per-cycle `/api/slo` snapshot (`slo.jsonl` + human-readable summary line in `main.log`) covering all 8 protocols. Source of truth for the ship-gate SLO measurement.
+- **PR-13: docs consolidation pass** (`docs/messenger.md`, `docs/messenger-operator.md`, `AGENTS.md`, `CHANGELOG.md`): sequence diagrams (topology + happy-path + recovery + multi-path) polished into `docs/messenger.md`; per-protocol coverage table marked all-rc.9-shipped; SLO reading guide folded into `docs/messenger-operator.md`; `AGENTS.md` gets a Universal Messenger paragraph + convergence rule (route new short-message protocols via `Messenger.sendReliable`, never `ProtocolRouter.send` directly); this CHANGELOG entry.
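The retry ladder from PR-1 and the stall gate from PR-5 are both small scheduling policies, so they can be sketched together. The delays and the `OUTBOX_STALL_THRESHOLD` / `DHT_WALK_RATE_LIMIT_MS` values are the ones quoted in the entries above; the function names, the 1-based attempt counting and the choice to stay on the final 2h step once the ladder is exhausted are assumptions, not confirmed behaviour of `ProtocolOutbox`.

```typescript
const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;

// Ladder quoted in the PR-1 entry: 5s, 15s, 30s, 60s, 5m, 30m, 2h.
const RETRY_LADDER_MS: number[] = [
  5 * SECOND_MS,
  15 * SECOND_MS,
  30 * SECOND_MS,
  60 * SECOND_MS,
  5 * MINUTE_MS,
  30 * MINUTE_MS,
  120 * MINUTE_MS,
];

// Constants named in the PR-5 entry.
const OUTBOX_STALL_THRESHOLD = 5;
const DHT_WALK_RATE_LIMIT_MS = 5 * MINUTE_MS;

// Delay before the next retry after `failedAttempts` failures (1-based).
// Assumption: attempts past the end of the ladder keep using the last step.
function nextRetryDelayMs(failedAttempts: number): number {
  const step = Math.min(Math.max(failedAttempts, 1), RETRY_LADDER_MS.length);
  return RETRY_LADDER_MS[step - 1];
}

// Whether a stalled entry should start a background DHT walk for its peer.
// `noAddressAttempts` counts "no valid addresses for peer" failures;
// `lastWalkAtMs` is undefined when the peer has not been walked yet.
function shouldStartDhtWalk(
  noAddressAttempts: number,
  lastWalkAtMs: number | undefined,
  nowMs: number
): boolean {
  if (noAddressAttempts < OUTBOX_STALL_THRESHOLD) return false;
  if (lastWalkAtMs === undefined) return true;
  return nowMs - lastWalkAtMs >= DHT_WALK_RATE_LIMIT_MS;
}
```

Keeping the walk decision separate from the retry delay mirrors the description in PR-5: the walk runs in the background and never postpones the next scheduled retry.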
+
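The relay merge described under PR-7 can be pictured as an order-preserving union. The sketch below is a hypothetical re-implementation, not the shipped `mergePreferredRelays`: its signature, the choice to place CLI values ahead of config values, and deduplicating preferred relays against the network list are all assumptions. The multiaddrs in the usage lines are placeholders built on a documentation IP range.

```typescript
// Hypothetical re-implementation; the real helper's signature may differ.
function mergePreferredRelaysSketch(
  cliRelays: string[],
  configRelays: string[],
  networkRelays: string[]
): string[] {
  const merged: string[] = [];
  const seen = new Set<string>();
  // Preferred sources come first (CLI before config is an assumption),
  // then the network relay list as fallback.
  const candidates = cliRelays.concat(configRelays, networkRelays);
  for (const addr of candidates) {
    const trimmed = addr.trim();
    if (trimmed === "" || seen.has(trimmed)) continue; // first-seen order wins
    seen.add(trimmed);
    merged.push(trimmed);
  }
  return merged;
}

// Usage with placeholder multiaddrs (not real relays).
const relayOrder = mergePreferredRelaysSketch(
  ["/ip4/203.0.113.7/tcp/4001/p2p/PeerA"],
  ["/ip4/203.0.113.7/tcp/4001/p2p/PeerA", "/ip4/203.0.113.8/tcp/4001/p2p/PeerB"],
  ["/ip4/203.0.113.9/tcp/4001/p2p/PublicRelay"]
);
```

Prepending rather than replacing means a misconfigured preferred relay degrades to the public fallback instead of isolating the node.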
+### Cancelled
+
+- **PR-6 (gossip peer-hints, conditional)**: skipped per Gate B decision, because DHT walk + inbound-from-receiver are sufficient for the rc.9 SLO target. If a post-ship soak surfaces a reliability tail that DHT walk doesn't close, PR-6 lands as a fast follow-up under the original gossip-hints design (signed topic, 5min publish cadence, 10min replay window).
+
+### Carry-over from rc.8 (preserved by the substrate)
+
+The substrate explicitly preserves five rc.8 invariants surfaced by PRs #533 / #534 / #536 / #537 / #538:
+
+1. **Receiver-side dedup contract** (rc.8 #534): the generic `Messenger.register` wrapper performs a `(peer, protocol, message_id, 'in')` lookup before invoking the handler, so at-least-once delivery still yields exactly-once application semantics.
+2. **Targeted `ON CONFLICT`** (rc.8 #534): `message_idempotency` uses `INSERT ... ON CONFLICT DO NOTHING` only on the dedup case, not as a swallow-all-constraint clause.
+3. **Stale-snapshot guard** (rc.8 #538): `ProtocolOutbox.processOutboxOnConnect` and `processOutboxTick` call `hasEntry()` between `tryBeginAttempt()` and the wire send. Contract test pinned in `messenger-substrate.test.ts`.
+4. **Peer ID normalisation** (rc.8 #533): every libp2p-boundary code path uses `peerId.toString()` consistently for outbox / idempotency / diagnostics lookups.
+5. **Connection reuse logic** (rc.8 #537): PR-4 multi-path enumeration walks `getConnections()` directly rather than peerId-keying.
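Invariant 3 is easiest to see in code. The sketch below processes a snapshot of outbox entries that may have gone stale, for example because a concurrent on-connect flush already delivered one of them. Only the `tryBeginAttempt()` / `hasEntry()` ordering is taken from the text above; `OutboxSketch`, `processSnapshot`, the synchronous `send` callback and the remove-on-success behaviour are simplifying assumptions rather than the real `ProtocolOutbox` internals.

```typescript
interface OutboxEntry {
  key: string;
  payload: string;
}

// Hypothetical in-memory outbox; the real store is SQLite-backed.
class OutboxSketch {
  private entries = new Map<string, OutboxEntry>();
  private inflight = new Set<string>();

  enqueue(entry: OutboxEntry): void {
    this.entries.set(entry.key, entry);
  }
  remove(key: string): void {
    this.entries.delete(key);
  }
  hasEntry(key: string): boolean {
    return this.entries.has(key);
  }
  snapshot(): OutboxEntry[] {
    return Array.from(this.entries.values());
  }
  // Per-key inflight guard: at most one attempt per entry at a time.
  tryBeginAttempt(key: string): boolean {
    if (this.inflight.has(key)) return false;
    this.inflight.add(key);
    return true;
  }
  endAttempt(key: string): void {
    this.inflight.delete(key);
  }
}

// Attempts every entry in `snapshot`; returns the keys that actually hit the wire.
// `send` returns true when the peer acknowledged the message.
function processSnapshot(
  outbox: OutboxSketch,
  snapshot: OutboxEntry[],
  send: (entry: OutboxEntry) => boolean
): string[] {
  const attempted: string[] = [];
  for (const entry of snapshot) {
    if (!outbox.tryBeginAttempt(entry.key)) continue;
    try {
      // Stale-snapshot guard: the entry may have been removed since the snapshot was taken.
      if (!outbox.hasEntry(entry.key)) continue;
      attempted.push(entry.key);
      if (send(entry)) outbox.remove(entry.key);
    } finally {
      outbox.endAttempt(entry.key);
    }
  }
  return attempted;
}
```

Without the `hasEntry()` re-check, the tick would resend a message that another code path had already delivered and removed; receiver-side dedup would absorb the duplicate, but the guard avoids the wasted send.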