Commit f6905cb

Branimir Rakic authored and cursoragent committed
refactor(network): split dkgGossipMsgId into raw + libp2p adapter (Codex PR #501 round 5)
Codex review feedback flagged that the round-4 signature accepted a libp2p `Message` directly, which made the "cross-backend dedup" framing aspirational rather than concrete. A future iroh-gossip backend would need to either duplicate the framing logic or shoehorn its messages through the libp2p shape.

Split:
- `dkgGossipMsgIdRaw({ topic, data, publisherIdBytes, sequenceNumber })` is the backend-agnostic primitive; framing + sha256 lives here once.
- `dkgGossipMsgId(msg: libp2p.Message)` is now a thin adapter that unwraps libp2p shape into raw inputs and enforces signed-only.

Tests:
- Adds a FIXED VECTOR test that pins the exact 32-byte sha256 output for a known input, addressing Codex feedback that the existing `expected()` helper mirrors production logic and would mask encoding drift if both were mutated together.
- Adds direct tests for `dkgGossipMsgIdRaw` (length-framing, seqno-in-hash, 32-byte digest).
- Adds an adapter↔raw equivalence test proving the libp2p adapter delegates to the primitive on every signed message.

Marks the public API `@experimental` in JSDoc: the function is intentionally unwired in v1 (see RFC 07 §5.4 + Phase 5 deferred cutover) and downstream consumers shouldn't rely on the in-process mesh routing through it yet.

Co-authored-by: Cursor <cursoragent@cursor.com>
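To make the raw + adapter split concrete, here is a minimal sketch of two adapters normalising different message shapes into the same canonical tuple. It is not the commit's code: `RawInput` mirrors the field names of `DkgGossipMsgIdInput`, `Libp2pLikeMessage` is a structural stand-in for the real libp2p type, and `OtherBackendMessage` with its field names is entirely hypothetical.

```typescript
// Canonical tuple; field names mirror the commit's DkgGossipMsgIdInput.
interface RawInput {
  topic: string;
  data: Uint8Array;
  publisherIdBytes: Uint8Array;
  sequenceNumber: bigint;
}

// Structural stand-in for a libp2p gossipsub message (assumption: only the
// fields the adapter reads are modelled, this is not the library's type).
type Libp2pLikeMessage =
  | {
      type: 'signed';
      topic: string;
      data: Uint8Array;
      from: { toMultihash(): { bytes: Uint8Array } };
      sequenceNumber: bigint;
    }
  | { type: 'unsigned'; topic: string; data: Uint8Array };

// Hypothetical message shape for some other backend; every name is invented.
interface OtherBackendMessage {
  channel: string;
  body: Uint8Array;
  senderKey: Uint8Array;
  counter: number; // assumed to be a non-negative integer
}

// Adapter 1: unwrap the libp2p-like shape and reject unsigned messages,
// which carry neither a publisher identity nor a sequence number.
function fromLibp2pLike(msg: Libp2pLikeMessage): RawInput {
  if (msg.type !== 'signed') {
    throw new Error('unsigned message: no publisher identity or sequence number');
  }
  return {
    topic: msg.topic,
    data: msg.data,
    publisherIdBytes: msg.from.toMultihash().bytes,
    sequenceNumber: msg.sequenceNumber,
  };
}

// Adapter 2: map the hypothetical backend's fields onto the same tuple.
function fromOtherBackend(msg: OtherBackendMessage): RawInput {
  return {
    topic: msg.channel,
    data: msg.body,
    publisherIdBytes: msg.senderKey,
    sequenceNumber: BigInt(msg.counter),
  };
}
```

Both adapters yield the same tuple only if the two backends agree on the canonical publisher bytes and the sequence value for a given publish. The sketch assumes that agreement and does not establish it.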
1 parent 412b584 commit f6905cb

4 files changed

Lines changed: 220 additions & 38 deletions

packages/core/src/index.ts

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ export {
   type ResolveOpts,
   PeerResolver,
   dkgGossipMsgId,
+  dkgGossipMsgIdRaw,
+  type DkgGossipMsgIdInput,
   DkgGossipUnsignedMessageError,
 } from './network/index.js';
 export {

packages/core/src/network/gossip-msg-id.ts

Lines changed: 98 additions & 37 deletions
@@ -6,12 +6,12 @@
  * libp2p-gossipsub without protocol-level cooperation. The exact
  * encoding (after Codex review feedback on PR #501):
  *
- * msgId(topic, payload, fromIdentityId, seqno) :=
+ * msgId(topic, payload, publisherId, sequenceNumber) :=
  * sha256(
  * u32_be(len(topic)) ‖ topic
  * ‖ u32_be(len(payload)) ‖ payload
- * ‖ u32_be(len(fromIdentityId)) ‖ fromIdentityId
- * ‖ u64_be(seqno)
+ * ‖ u32_be(len(publisherId)) ‖ publisherId
+ * ‖ u64_be(sequenceNumber)
  * )
  *
  * Why length framing
@@ -23,8 +23,8 @@
  * the encoding injective — distinct tuples always hash to distinct
  * inputs.
  *
- * Why include seqno
- * -----------------
+ * Why include sequenceNumber
+ * --------------------------
  * The whole point of gossipsub's msgId is dedup-with-retries: a peer
  * publishing the same payload twice (e.g. resending after a network
  * blip) must produce TWO distinct msgIds, otherwise the second
@@ -33,36 +33,53 @@
  * sequence number in the hash preserves that semantic without
  * forfeiting cross-backend determinism: every backend with a notion
  * of per-publisher monotonic ordering (gossipsub seqno, iroh-gossip
- * sequence, etc.) maps the same (topic, payload, from, seq) tuple
- * to the same hash.
+ * sequence, etc.) maps the same (topic, payload, publisher, seq)
+ * tuple to the same hash.
  *
  * Why throw on unsigned
  * ---------------------
- * Codex review feedback on PR #501 round 4: with `type: 'unsigned'`
- * the message has no publisher identity (no `from`) and no seqno.
- * The earlier draft fell back to `fromBytes = []` and `seqno = 0n`,
- * which means two different publishers sending the same payload on
- * the same topic produce the SAME msgId — one publish gets falsely
- * deduplicated. (The upstream default for unsigned
- * `sha256(data)` — has the same property, but a public function
- * shouldn't replicate that pitfall in a freshly-shipped contract.)
+ * Codex review feedback on PR #501 round 4: with no `from` and no
+ * seqno, two different publishers sending the same payload would
+ * produce the SAME msgId — false dedup. The upstream default for
+ * unsigned (`sha256(data)`) has the same property, but a freshly-
+ * shipped public function shouldn't replicate that pitfall. V10
+ * configures gossipsub StrictSign by default so unsigned messages
+ * don't appear in the wild today; throwing makes the unsupported
+ * case loud and catches accidental misuse via the public re-export.
  *
- * V10 configures gossipsub with the StrictSign default, so unsigned
- * messages don't appear in the wild today. Throwing here makes the
- * "unsigned not supported in this msgId scheme" stance explicit:
+ * Why split into raw + adapter
+ * ----------------------------
+ * Codex review feedback on PR #501 round 5: the round-4 signature
+ * accepted a libp2p `Message` directly, which made the "cross-
+ * backend dedup" framing aspirational rather than concrete. A
+ * future iroh-gossip backend would have its own message type with
+ * its own ways of representing publisher and sequence — at which
+ * point either we'd duplicate the framing logic (drift risk) or
+ * have to refactor every consumer to convert through the libp2p
+ * shape.
  *
- * - any code path that tries to publish an unsigned message
- * fails loudly (easy to debug),
- * - external consumers of the exported function can't accidentally
- * hit the false-dedup case,
- * - a future PR that wants to support unsigned has to deliberately
- * extend the scheme with a per-message identity (nonce / hash
- * prefix / etc.), pinned by tests.
+ * Round-5 split:
+ * - `dkgGossipMsgIdRaw({ topic, data, publisherIdBytes,
+ * sequenceNumber })` — backend-agnostic primitive over canonical
+ * value types. Every backend's adapter normalises into this and
+ * the framing/hash lives here once.
+ * - `dkgGossipMsgId(msg: libp2p.Message)` — thin libp2p adapter:
+ * unwraps `from.toMultihash().bytes` and `sequenceNumber`,
+ * enforces signed-only (because libp2p's unsigned variant has
+ * no publisher identity to feed in).
+ * A future `dkgGossipMsgIdIroh(msg: iroh.GossipMessage)` adapter
+ * goes alongside; the framing lives once in `dkgGossipMsgIdRaw`.
  *
- * v1 ships only `LibP2PGossipBackend`, so this function only changes
- * which sha256 inputs gossipsub feeds itself; nothing observable on
- * the wire. Locking the constant in NOW (rather than after a second
- * backend ships) avoids a future synchronised mid-flight upgrade.
+ * Wiring
+ * ------
+ * v1 ships only the function and tests. The actual `msgIdFn` wiring
+ * in `node.ts` is intentionally deferred — see RFC 07 §5.4 + Phase 5
+ * for the rolling-upgrade rationale and the coordinated-cutover plan.
+ *
+ * @experimental Public API but intentionally unwired. The encoding
+ * is pinned by `gossip-msg-id.test.ts`; downstream consumers may
+ * import for inspection / future-backend adapters but should not
+ * rely on the in-process libp2p mesh routing through it yet.
  */
 import { sha256 } from '@noble/hashes/sha2.js';
 import type { Message } from '@libp2p/gossipsub';
@@ -92,15 +109,38 @@ export class DkgGossipUnsignedMessageError extends Error {
   }
 }

-export function dkgGossipMsgId(msg: Message): Uint8Array {
-  if (msg.type !== 'signed') {
-    throw new DkgGossipUnsignedMessageError();
-  }
+/**
+ * Inputs for the backend-agnostic msgId primitive.
+ *
+ * - `topic` — gossip topic string (UTF-8 encoded inside the function).
+ * - `data` — raw payload bytes.
+ * - `publisherIdBytes` — canonical bytes identifying the publisher.
+ * For libp2p, this is `peerId.toMultihash().bytes`. For other
+ * backends, the equivalent canonical identity bytes.
+ * - `sequenceNumber` — per-publisher monotonic sequence (gossipsub
+ * seqno, iroh sequence, etc.).
+ *
+ * @experimental
+ */
+export interface DkgGossipMsgIdInput {
+  topic: string;
+  data: Uint8Array;
+  publisherIdBytes: Uint8Array;
+  sequenceNumber: bigint;
+}

-  const topicBytes = new TextEncoder().encode(msg.topic);
-  const data = msg.data;
-  const fromBytes = msg.from.toMultihash().bytes;
-  const seqno = msg.sequenceNumber;
+/**
+ * Backend-agnostic msgId primitive. Every gossip backend adapter
+ * normalises into `DkgGossipMsgIdInput` and the framing + hash
+ * lives here once.
+ *
+ * @experimental
+ */
+export function dkgGossipMsgIdRaw(input: DkgGossipMsgIdInput): Uint8Array {
+  const topicBytes = new TextEncoder().encode(input.topic);
+  const data = input.data;
+  const fromBytes = input.publisherIdBytes;
+  const seqno = input.sequenceNumber;

   const total =
     4 + topicBytes.length +
@@ -123,3 +163,24 @@ export function dkgGossipMsgId(msg: Message): Uint8Array {

   return sha256(buf);
 }
+
+/**
+ * libp2p-gossipsub adapter. Suitable as the `msgIdFn` parameter of
+ * `gossipsub({ ... })` when (eventually) wired in `node.ts`.
+ *
+ * Throws `DkgGossipUnsignedMessageError` if `msg.type !== 'signed'`.
+ *
+ * @experimental Public but intentionally unwired in v1; see file
+ * doc-comment + RFC 07 §5.4 for the rollout plan.
+ */
+export function dkgGossipMsgId(msg: Message): Uint8Array {
+  if (msg.type !== 'signed') {
+    throw new DkgGossipUnsignedMessageError();
+  }
+  return dkgGossipMsgIdRaw({
+    topic: msg.topic,
+    data: msg.data,
+    publisherIdBytes: msg.from.toMultihash().bytes,
+    sequenceNumber: msg.sequenceNumber,
+  });
+}
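The hunks above elide the body that builds the pre-hash buffer, so the following is only a sketch of the encoding as the file's doc-comment describes it (three `u32_be` length-prefixed fields followed by a `u64_be` sequence number). It is not the commit's implementation: the names `frameMsgIdInput`, `sketchMsgId` and `toHex` are made up for illustration, and Node's built-in `node:crypto` stands in for the `@noble/hashes` import the real file uses.

```typescript
import { createHash } from 'node:crypto';

// Mirrors the field names of the commit's DkgGossipMsgIdInput.
interface MsgIdInput {
  topic: string;
  data: Uint8Array;
  publisherIdBytes: Uint8Array;
  sequenceNumber: bigint;
}

// Build the pre-hash bytes:
//   u32_be(len(topic)) ‖ topic ‖ u32_be(len(data)) ‖ data
//   ‖ u32_be(len(publisherId)) ‖ publisherId ‖ u64_be(sequenceNumber)
function frameMsgIdInput(input: MsgIdInput): Uint8Array {
  const topicBytes = new TextEncoder().encode(input.topic);
  const parts = [topicBytes, input.data, input.publisherIdBytes];
  const total = parts.reduce((n, p) => n + 4 + p.length, 0) + 8;
  const buf = new Uint8Array(total);
  const view = new DataView(buf.buffer);
  let offset = 0;
  for (const part of parts) {
    view.setUint32(offset, part.length, false); // big-endian length prefix
    offset += 4;
    buf.set(part, offset);
    offset += part.length;
  }
  view.setBigUint64(offset, input.sequenceNumber, false); // big-endian u64
  return buf;
}

// Hash the framed bytes; the result is a 32-byte SHA-256 digest.
function sketchMsgId(input: MsgIdInput): Uint8Array {
  return new Uint8Array(createHash('sha256').update(frameMsgIdInput(input)).digest());
}

// Lowercase hex rendering, handy for comparing against pinned vectors.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}
```

For the fixed vector in the test file (topic `dkg/test/1.0.0`, data `01020304`, publisher `123456`, sequence 7), `toHex(frameMsgIdInput(...))` reproduces the pre-hash bytes listed in the test comment. Whether the sketch's digest equals the pinned SHA256 depends on the elided production body matching this framing, which the visible hunks suggest but do not show.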

packages/core/src/network/index.ts

Lines changed: 6 additions & 1 deletion
@@ -19,4 +19,9 @@ export type {
 } from './peer-resolver.js';
 export { PeerResolver } from './peer-resolver.js';

-export { dkgGossipMsgId, DkgGossipUnsignedMessageError } from './gossip-msg-id.js';
+export type { DkgGossipMsgIdInput } from './gossip-msg-id.js';
+export {
+  dkgGossipMsgId,
+  dkgGossipMsgIdRaw,
+  DkgGossipUnsignedMessageError,
+} from './gossip-msg-id.js';

packages/core/test/gossip-msg-id.test.ts

Lines changed: 114 additions & 0 deletions
@@ -2,6 +2,7 @@ import { describe, it, expect } from 'vitest';
 import { sha256 } from '@noble/hashes/sha2.js';
 import {
   dkgGossipMsgId,
+  dkgGossipMsgIdRaw,
   DkgGossipUnsignedMessageError,
 } from '../src/network/gossip-msg-id.js';

@@ -173,4 +174,117 @@ describe('dkgGossipMsgId (RFC 07 §5.4)', () => {
     });
     expect(a).toEqual(b);
   });
+
+  // Codex review feedback on PR #501 round 5: the encoding can drift if
+  // both production and test are mutated together via `expected()`. This
+  // pins the exact 32-byte SHA256 output for a known input so any
+  // change to the framing/hash bytes is caught even if the helper is
+  // wrong in the same way the prod code is.
+  //
+  // Vector:
+  // topic = 'dkg/test/1.0.0'
+  // data = [0x01, 0x02, 0x03, 0x04]
+  // from = [0x12, 0x34, 0x56]
+  // seq = 7n
+  // Pre-hash:
+  // 0000000e 646b672f746573742f312e302e30 00000004 01020304
+  // 00000003 123456 0000000000000007
+  // SHA256:
+  // 17dc679d5ac2b669fc946ead91f08728a2dc33f8799f3b3bf3df04384959caa8
+  it('FIXED VECTOR: pinned 32-byte SHA256 for a known input (libp2p adapter)', () => {
+    const knownFrom = {
+      toMultihash: () => ({ bytes: new Uint8Array([0x12, 0x34, 0x56]) }),
+    } as unknown as Parameters<typeof dkgGossipMsgId>[0] extends infer M
+      ? M extends { from: infer P } ? P : never : never;
+    const id = dkgGossipMsgId({
+      type: 'signed',
+      topic: 'dkg/test/1.0.0',
+      data: new Uint8Array([0x01, 0x02, 0x03, 0x04]),
+      from: knownFrom,
+      sequenceNumber: 7n,
+      signature: new Uint8Array(),
+      key: {} as never,
+    });
+    const hex = Array.from(id, (b) => b.toString(16).padStart(2, '0')).join('');
+    expect(hex).toBe(
+      '17dc679d5ac2b669fc946ead91f08728a2dc33f8799f3b3bf3df04384959caa8',
+    );
+  });
+});
+
+// Codex review feedback on PR #501 round 5: the libp2p-shaped function
+// alone makes the "cross-backend dedup" framing aspirational. Pinning
+// the backend-agnostic primitive separately, plus asserting the libp2p
+// adapter delegates to it, locks in the contract: any future backend
+// adapter (iroh-gossip, etc.) just needs to feed canonical bytes into
+// `dkgGossipMsgIdRaw` and gets the same dedup behaviour.
+describe('dkgGossipMsgIdRaw (RFC 07 §5.4 — backend-agnostic primitive)', () => {
+  it('FIXED VECTOR: pinned 32-byte SHA256 (matches libp2p adapter vector)', () => {
+    const id = dkgGossipMsgIdRaw({
+      topic: 'dkg/test/1.0.0',
+      data: new Uint8Array([0x01, 0x02, 0x03, 0x04]),
+      publisherIdBytes: new Uint8Array([0x12, 0x34, 0x56]),
+      sequenceNumber: 7n,
+    });
+    const hex = Array.from(id, (b) => b.toString(16).padStart(2, '0')).join('');
+    expect(hex).toBe(
+      '17dc679d5ac2b669fc946ead91f08728a2dc33f8799f3b3bf3df04384959caa8',
+    );
+  });
+
+  it('libp2p adapter agrees with raw primitive on every signed message', () => {
+    const fromBytes = new Uint8Array([0xAA, 0xBB, 0xCC]);
+    const peerId = {
+      toMultihash: () => ({ bytes: fromBytes }),
+    } as unknown as Parameters<typeof dkgGossipMsgId>[0] extends infer M
+      ? M extends { from: infer P } ? P : never : never;
+    const adapterId = dkgGossipMsgId({
+      type: 'signed',
+      topic: 'cg/topic-x/1.0.0',
+      data: new Uint8Array([9, 9, 9]),
+      from: peerId,
+      sequenceNumber: 1234n,
+      signature: new Uint8Array(),
+      key: {} as never,
+    });
+    const rawId = dkgGossipMsgIdRaw({
+      topic: 'cg/topic-x/1.0.0',
+      data: new Uint8Array([9, 9, 9]),
+      publisherIdBytes: fromBytes,
+      sequenceNumber: 1234n,
+    });
+    expect(adapterId).toEqual(rawId);
+  });
+
+  it('length-framing collision check applies at the raw level too', () => {
+    const a = dkgGossipMsgIdRaw({
+      topic: 'ab', data: new TextEncoder().encode('c'),
+      publisherIdBytes: new Uint8Array([1]), sequenceNumber: 0n,
+    });
+    const b = dkgGossipMsgIdRaw({
+      topic: 'a', data: new TextEncoder().encode('bc'),
+      publisherIdBytes: new Uint8Array([1]), sequenceNumber: 0n,
+    });
+    expect(a).not.toEqual(b);
+  });
+
+  it('seqno enters the hash at the raw level too', () => {
+    const a = dkgGossipMsgIdRaw({
+      topic: 't', data: new Uint8Array([1]),
+      publisherIdBytes: new Uint8Array([2]), sequenceNumber: 1n,
+    });
+    const b = dkgGossipMsgIdRaw({
+      topic: 't', data: new Uint8Array([1]),
+      publisherIdBytes: new Uint8Array([2]), sequenceNumber: 2n,
+    });
+    expect(a).not.toEqual(b);
+  });
+
+  it('returns a 32-byte SHA256 digest', () => {
+    const id = dkgGossipMsgIdRaw({
+      topic: 't', data: new Uint8Array(),
+      publisherIdBytes: new Uint8Array(), sequenceNumber: 0n,
+    });
+    expect(id.length).toBe(32);
+  });
 });
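The length-framing collision test above relies on a property that is easy to check without any hashing. The sketch below compares plain concatenation with `u32_be` length-prefixed framing on the same `('ab', 'c')` versus `('a', 'bc')` pair. The helpers `concat`, `lengthFramed` and `sameBytes` are illustrative names and are not part of the commit.

```typescript
const enc = new TextEncoder();

// Plain concatenation: field boundaries are lost.
function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Prefix every field with its length as a big-endian u32, then concatenate.
function lengthFramed(parts: Uint8Array[]): Uint8Array {
  const framed: Uint8Array[] = [];
  for (const p of parts) {
    const len = new Uint8Array(4);
    new DataView(len.buffer).setUint32(0, p.length, false);
    framed.push(len, p);
  }
  return concat(framed);
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

const left = [enc.encode('ab'), enc.encode('c')];
const right = [enc.encode('a'), enc.encode('bc')];

// Both sides flatten to the bytes of 'abc', so the unframed inputs collide.
const naiveCollides = sameBytes(concat(left), concat(right)); // true

// With length prefixes the two inputs differ in the first prefix (2 vs 1).
const framedCollides = sameBytes(lengthFramed(left), lengthFramed(right)); // false
```

Because the framed byte strings differ, any hash applied afterwards sees distinct inputs, which is the property the `not.toEqual` assertion in the test depends on.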
