Commit ec33107

TOSQUASH consensus: refine sketch implementation plan for Ouroboros Genesis
1 parent 59ac021 commit ec33107

1 file changed: +136 -0 lines changed

ouroboros-consensus/docs/GenesisDecomposition.md

Lines changed: 136 additions & 0 deletions
@@ -187,3 +187,139 @@ I've also updated this file after a discussion with Duncan on 2021 Apr 13.

* TODO testing etc - we'd _very much really_ like to use the ThreadNet
  rewrite for this

-----

Updated on 2021 August 9, after much additional thought and broader
reconsiderations, kicked off by Javier Sagredo's observation of a stalling
attack vector in the original sketch above.

This new sketch updates much-but-not-all of the original sketch above.

- Execution begins in the _Syncing_ state.

- While we are Syncing:

    - If our valency falls below some threshold, then BlockFetch stops sending
      new fetch requests until sufficient valency is recovered.

    - BlockFetch can only download blocks from the headers that the density
      rule approves.

        - The density rule is: compare header chains based on the number of
          headers in the relevant Genesis window (the 3k/f slots after the
          intersection), though if the headers do not span the Genesis window
          and the peer claims to have more headers we must wait for them
          (because they might also be in the window).

        - The Ouroboros Genesis paper proves -- excepting only disastrous
          intervals -- that the density rule will always strictly prefer the
          honest chain over any possible alternative.

    - Therefore, we require that each peer's highwater blockno is increasing
      "fast enough on average" until we're at their tip, with the only
      exceptional circumstance being when their latest header is beyond our
      forecast range (since we don't even request a next header while that is
      true).

        - TODO Do we actually need that exception? Under what circumstances
          would it be relevant, during Syncing?

        - TODO I'm anticipating a token bucket for enforcing "fast enough on
          average", but there remain plenty of details and thresholds to
          consider.

        - A possible refinement: if they can promise to send a specific k+1st
          block (which the honest nodes would always do, up to their immutable
          tip), then they're allowed to be somewhat slower, since we'll
          disconnect from them if either they don't deliver that block or if
          the eventual densest chain does not include that block.

        - A possible refinement: each peer can offer _jump points_ that are
          usefully ahead of their latest header. If some other peer has already
          sent the jump point's header, then we can advance the slower peer's
          ChainSync state accordingly. This can help a relatively slow
          redundant peer remain connected.
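
The two Syncing-time checks above, the density comparison and the "fast enough on average" requirement, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the node's implementation: the names `Candidate`, `Verdict`, `compare_density` and `ProgressBucket` are hypothetical, the Genesis window is passed in as a slot count (the sketch above gives it as 3k/f), waiting whenever either candidate is unsettled is a deliberately conservative simplification, and the bucket's capacity and rate are placeholders for thresholds the sketch leaves open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    PREFER_A = "prefer A"
    PREFER_B = "prefer B"
    TIE = "tie"
    WAIT = "wait for more headers"


@dataclass(frozen=True)
class Candidate:
    # Slots of the headers received after the intersection, ascending.
    header_slots: Sequence[int]
    # The peer claims to have headers beyond those it has sent so far.
    claims_more: bool


def density(c: Candidate, intersection_slot: int, window: int) -> int:
    """Count received headers in the `window` slots after the intersection."""
    last_slot = intersection_slot + window
    return sum(1 for s in c.header_slots if intersection_slot < s <= last_slot)


def is_settled(c: Candidate, intersection_slot: int, window: int) -> bool:
    """True once no later header from this peer could still land in the window."""
    spans = bool(c.header_slots) and c.header_slots[-1] >= intersection_slot + window
    return spans or not c.claims_more


def compare_density(a: Candidate, b: Candidate,
                    intersection_slot: int, window: int) -> Verdict:
    # Assumption: wait whenever either side might still add headers in the window.
    if not (is_settled(a, intersection_slot, window)
            and is_settled(b, intersection_slot, window)):
        return Verdict.WAIT
    da = density(a, intersection_slot, window)
    db = density(b, intersection_slot, window)
    if da > db:
        return Verdict.PREFER_A
    if db > da:
        return Verdict.PREFER_B
    return Verdict.TIE


@dataclass
class ProgressBucket:
    """Token-bucket-style check that a peer's highwater blockno keeps rising."""
    capacity: float       # slack, in blocks, that a peer may fall behind
    required_rate: float  # blocks per second demanded on average
    tokens: float
    last_time: float
    highwater: int

    def observe(self, now: float, blockno: int) -> bool:
        """Record an observation; return False if the peer is too slow."""
        # Elapsed time drains the bucket at the required rate.
        self.tokens -= (now - self.last_time) * self.required_rate
        self.last_time = now
        # Each new block beyond the previous highwater refills one token.
        if blockno > self.highwater:
            gained = blockno - self.highwater
            self.tokens = min(self.capacity, self.tokens + gained)
            self.highwater = blockno
        return self.tokens >= 0
```

A caller would disconnect from a peer once `observe` returns `False`, and would only let BlockFetch act on a candidate after `compare_density` returns something other than `Verdict.WAIT`.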

- Transition from Syncing to _CaughtUp_ whenever all of:

    - No peer has sent a header binary-preferable to my selection.

    - No peer has sent >k headers from an intersection with my selection.

    - We have received every peer's headers up to its tip.

        - TODO To what extent can the adversary abuse this to prevent our
          transition? Even supposing validated, uninterruptible ChainSync
          switches?

        - TODO Perhaps we don't need it, since we assume we'll have at least
          one honest peer. Their stream of headers should race ahead of the
          corresponding stream of blocks until we're CaughtUp, and so that'll
          hold back at least one of the other conjuncts. On the other hand, it
          seems fine if we do need this, because of the timeout discussed
          above.
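
The conjunction above can be written down as a small predicate. This is a minimal sketch, assuming a hypothetical per-peer summary `PeerStatus` whose fields mirror the three conditions; refusing to transition with zero peers is an extra assumption made here (it avoids a vacuously true result) and is not something the sketch above states.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PeerStatus:
    # The peer has sent a header binary-preferable to our selection.
    sent_preferable_header: bool
    # Number of headers the peer has sent past its intersection with our selection.
    headers_past_intersection: int
    # We have received the peer's headers all the way up to its tip.
    synced_to_tip: bool


def can_become_caught_up(peers: Iterable[PeerStatus], k: int) -> bool:
    """True only if every condition holds for every peer."""
    peers = list(peers)
    if not peers:
        # Assumption: with no peers at all, stay in Syncing.
        return False
    return all(
        not p.sent_preferable_header
        and p.headers_past_intersection <= k
        and p.synced_to_tip
        for p in peers
    )
```

A single peer that is still streaming headers (`synced_to_tip=False`) is enough to hold the node in Syncing, which is the property the second TODO above relies on.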

- While we are CaughtUp:

    - BlockFetch is free to download the blocks from any of our peers'
      headers. It has two primary requirements, which are in tension.

        - The ultimate goal of BlockFetch is to get the best blocks ASAP.
          However, an imperfect best effort is tolerable, up to a point; we
          consider the only consequences of the best effort's inefficiency to
          be additional chain propagation delay.

            - The Ouroboros protocol only considers chain length. Tiebreakers
              are out of scope, so "best block" in the requirement above only
              means greatest blockno. (BlockFetch is free to also consider
              tiebreakers; the protocol does not care.)

            - Note that the adversary claiming to have additional headers but
              refusing to send them has no effect on BlockFetch while we are
              CaughtUp. Only received headers matter. The worst the adversary
              could do by withholding headers is intentionally time out in
              order to decrement our valency (which we might choose to require
              stays above some value, see below) -- but presumably they can't
              ensure we reconnect to them, so they've revealed their nature,
              losing access to us, in order to possibly create a short delay.

        - BlockFetch should avoid unnecessary downloads (the same block more
          than once or a block we'll never select).

            - When CaughtUp, we have a high-priority design goal that
              worst-case resource utilization is approximately the same as
              average-case. If not, even well-meaning node operators will
              eventually prune their node's allocated resources, thereby
              creating a DoS attack vector.

            - This is why we can't simply download "all blocks ASAP" or even
              the same block from all peers currently offering it. Recall that
              the adversary can forge arbitrarily many blocks whenever it is
              elected, just not on the same chain.
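
One way to picture the tension between the two BlockFetch requirements above is a planner that considers only received headers, targets the greatest blockno, and never requests a block it already has or has already asked for. This is a rough sketch under assumptions, not the BlockFetch decision logic itself: `Header`, `plan_fetches` and the string peer names are hypothetical, tiebreakers are ignored (the first peer with the greatest tip blockno wins), and all missing blocks are requested from that one peer.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Header:
    block_hash: str
    blockno: int


def plan_fetches(candidates: Dict[str, List[Header]],
                 already_have: Set[str],
                 in_flight: Set[str],
                 selected_blockno: int) -> List[Tuple[str, str]]:
    """Return (peer, block_hash) requests for the best received candidate.

    `candidates` maps each peer to the headers it has actually sent, oldest
    first. Headers a peer merely claims to have play no role here.
    """
    best_peer = None
    best_blockno = selected_blockno
    for peer, headers in candidates.items():
        # "Best" means greatest blockno only; it must beat our selection.
        if headers and headers[-1].blockno > best_blockno:
            best_peer, best_blockno = peer, headers[-1].blockno
    if best_peer is None:
        return []  # nothing on offer is longer than what we already selected
    # Avoid unnecessary downloads: skip blocks we hold or already requested.
    skip = already_have | in_flight
    return [(best_peer, h.block_hash)
            for h in candidates[best_peer]
            if h.block_hash not in skip]
```

Because each block is requested at most once and only along a chain that would actually be selected, the worst-case download volume in this sketch stays close to the average case, at the cost of some extra delay if the chosen peer turns out to be slow.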

- Transition from CaughtUp to Syncing whenever any of:

    - The wallclock is "too far ahead" of the latest "meaningful" peer
      interaction.

        - TODO Sketch: we transition as soon as N (?) of our peers' tips have
          a time point that is more than LIM (?) behind our wallclock.

        - TODO Our ChainSync timeouts will disconnect naturally, right? And so
          maybe this is really just another valency limit, like that of Syncing
          above.

        - TODO It's safe to assume the computer has access to "inertial
          reckoning" via real-time clock hardware, right? If so, we can
          immediately detect this even upon eg the machine waking from a
          hibernation state. IE instead of relying totally on an NTP
          connection, which could also be compromised.

    - Some peer sends >k headers from an intersection with my selection.

        - This rule is a failsafe: We assume this shouldn't happen under
          nominal circumstances (by the Common Prefix theorem in the Ouroboros
          Praos paper; TODO Confirm with researchers), so we downgrade to the
          more conservative state if we do observe it, since we must have
          somehow fallen "too far" behind again without otherwise noticing.
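
The disjunction above might look like the following predicate. It is a sketch under assumptions: the function name and argument shapes are hypothetical, `n` and `lim` stand for the N and LIM placeholders that the TODO above leaves undecided (so no defaults are given), and times are plain numbers in seconds.

```python
from typing import Sequence


def must_return_to_syncing(now: float,
                           peer_tip_times: Sequence[float],
                           headers_past_intersection: Sequence[int],
                           k: int,
                           n: int,
                           lim: float) -> bool:
    """True if either CaughtUp-to-Syncing trigger fires.

    peer_tip_times: wallclock time point of each peer's tip.
    headers_past_intersection: per peer, headers sent past the intersection
        with our selection.
    """
    # Trigger 1: at least n peers' tips are more than lim behind our wallclock.
    stale_peers = sum(1 for t in peer_tip_times if now - t > lim)
    wallclock_too_far_ahead = stale_peers >= n
    # Trigger 2 (failsafe): some peer has sent more than k headers.
    too_many_headers = any(h > k for h in headers_past_intersection)
    return wallclock_too_far_ahead or too_many_headers
```

Note that `n` should be at least 1 in this sketch; with `n = 0` the first trigger would always fire.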
