@@ -187,3 +187,139 @@ I've also updated this file after a discussion with Duncan on 2021 Apr 13.
187187
188188 * TODO testing etc - we'd _very much really_ like to use the ThreadNet
189189 rewrite for this
190+
191+ -----
192+
193+ Updated on 2021 August 9, after much additional thought and broader
194+ reconsiderations, kicked off by Javier Sagredo's observation of a stalling
195+ attack vector in the original sketch above.
196+
197+ This new sketch updates much-but-not-all of the original sketch above.
198+
199+ - Execution begins in the _Syncing_ state.
200+
201+ - While we are Syncing:
202+
203+ - If our valency falls below some threshold, then BlockFetch stops sending
204+ new fetch requests until sufficient valency is recovered.
205+
206+ - BlockFetch can only download blocks from the headers that the density
207+ rule approves.
208+
209+ - The density rule is: compare header chains based on the number of
210+ headers in the relevant Genesis window (the 3k/f slots after the
211+ intersection), though if the headers do not span the Genesis window
212+ and the peer claims to have more headers we must wait for them
213+ (because they might also be in the window).
214+
215+ - The Ouroboros Genesis paper proves -- excepting only disastrous
216+ intervals -- that the density rule will always strictly prefer the honest
217+ chain over any possible alternative.
218+
219+ - Therefore, we require that each peer's highwater blockno is increasing
220+ "fast enough on average" until we're at their tip, with the only
221+ exceptional circumstance being when their latest header is beyond our
222+ forecast range (since we don't even request a next header while that is
223+ true).
224+
225+ - TODO Do we actually need that exception? Under what circumstances
226+ would it be relevant, during Syncing?
227+
228+ - TODO I'm anticipating a token bucket for enforcing "fast enough on
229+ average", but there remain plenty of details and thresholds to
230+ consider.
231+
232+ - A possible refinement: if they can promise to send a specific k+1st
233+ block (which the honest nodes would always do, up to their immutable
234+ tip), then they're allowed to be somewhat slower, since we'll
235+ disconnect from them if either they don't deliver that block or if
236+ the eventual densest chain does not include that block.
237+
238+ - A possible refinement: each peer can offer _jump points_ that are
239+ usefully ahead of their latest header. If some other peer has already
240+ sent the jump point's header, then we can advance the slower peer's
241+ ChainSync state accordingly. This can help a relatively slow
242+ redundant peer remain connected.
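
The density rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the node's implementation: the `Candidate` record, its fields, and the `Verdict` names are hypothetical, slots are plain integers, and the window length (the 3k/f slots) is passed in rather than computed. The sketch is deliberately conservative: it waits whenever either count might still change, even if that could not alter the outcome.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Verdict(Enum):
    PREFER_A = "prefer_a"
    PREFER_B = "prefer_b"
    TIE = "tie"
    WAIT = "wait"


@dataclass
class Candidate:
    # Slots of the headers received after the intersection, ascending (hypothetical field).
    header_slots: List[int]
    # True if the peer claims to have headers beyond those it has sent (hypothetical field).
    claims_more: bool


def density(c: Candidate, intersection_slot: int, window: int) -> int:
    """Count received headers in the slots (intersection, intersection + window]."""
    last = intersection_slot + window
    return sum(1 for s in c.header_slots if intersection_slot < s <= last)


def must_wait(c: Candidate, intersection_slot: int, window: int) -> bool:
    """The count is not final if the headers stop short of the window's end
    while the peer claims to have more headers."""
    last = intersection_slot + window
    spans_window = bool(c.header_slots) and c.header_slots[-1] >= last
    return (not spans_window) and c.claims_more


def compare(a: Candidate, b: Candidate, intersection_slot: int, window: int) -> Verdict:
    """Compare two candidates by header count in the Genesis window."""
    if must_wait(a, intersection_slot, window) or must_wait(b, intersection_slot, window):
        return Verdict.WAIT
    da = density(a, intersection_slot, window)
    db = density(b, intersection_slot, window)
    if da > db:
        return Verdict.PREFER_A
    if db > da:
        return Verdict.PREFER_B
    return Verdict.TIE
```

For example, with the intersection at slot 10 and a window of 20 slots, a candidate with headers at slots 11, 12, 15, and 31 has density 3 and spans the window, so it can be compared immediately against any candidate that does not claim further headers.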
243+
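
One way the anticipated token bucket for "fast enough on average" might look is sketched below. Every name and parameter here (`ProgressBucket`, `capacity`, `refill_per_block`, `drain_per_second`) is hypothetical, and the thresholds are exactly the open details the TODO above mentions. The bucket drains with elapsed time and refills as the peer's highwater blockno advances; draining is skipped while the peer's latest header is beyond our forecast range.

```python
from dataclasses import dataclass


@dataclass
class ProgressBucket:
    capacity: float          # hypothetical: upper bound on accumulated credit
    refill_per_block: float  # hypothetical: credit earned per unit of blockno progress
    drain_per_second: float  # hypothetical: credit spent per second of wall time
    tokens: float            # current credit

    def tick(self, elapsed_seconds: float, blockno_advance: int,
             beyond_forecast_range: bool = False) -> bool:
        """Update the bucket; return False once the peer is too slow on average."""
        if not beyond_forecast_range:
            # We only charge for time during which we were actually requesting headers.
            self.tokens -= elapsed_seconds * self.drain_per_second
        self.tokens += blockno_advance * self.refill_per_block
        # Cap the credit so a past burst cannot excuse unbounded later slowness.
        self.tokens = min(self.tokens, self.capacity)
        return self.tokens > 0
```

A caller would disconnect from (or otherwise penalize) a peer once `tick` returns False; what exactly happens at that point is not settled by this sketch.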
244+ - Transition from Syncing to _CaughtUp_ whenever all of:
245+
246+ - No peer has sent a header binary-preferable to my selection.
247+
248+ - No peer has sent >k headers from an intersection with my selection.
249+
250+ - We have seen every peer's headers up to its tip.
251+
252+ - TODO To what extent can the adversary abuse this to prevent our
253+ transition? Even supposing validated, uninterruptible ChainSync
254+ switches?
255+
256+ - TODO Perhaps we don't need it, since we assume we'll have at least
257+ one honest peer. Their stream of headers should race ahead of the
258+ corresponding stream of blocks until we're CaughtUp, and so that'll
259+ hold back at least one of the other conjuncts. On the other hand, it
260+ seems fine if we do need this, because of the timeout discussed
261+ above.
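
The three conjuncts for leaving Syncing can be written as a single predicate. This is a sketch only: `PeerView` and its fields are hypothetical summaries of per-peer ChainSync state, and whether a header is "binary-preferable" is taken as a given input rather than computed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerView:
    # Peer has sent a header binary-preferable to our selection (hypothetical field).
    sent_preferable_header: bool
    # Number of headers received after the intersection with our selection.
    headers_past_intersection: int
    # We have seen this peer's headers all the way to its tip.
    at_tip: bool


def may_become_caught_up(peers: List[PeerView], k: int) -> bool:
    """True only when every conjunct holds for every peer."""
    return all(
        not p.sent_preferable_header
        and p.headers_past_intersection <= k
        and p.at_tip
        for p in peers
    )
```

Note that this returns True for an empty peer list; a real node would presumably combine it with a valency requirement, which the sketch leaves out.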
262+
263+ - While we are CaughtUp:
264+
265+ - BlockFetch is free to download the blocks from any of our peers' headers.
266+ It has two primary requirements, which are in tension.
267+
268+ - The ultimate goal of BlockFetch is to get the best blocks ASAP.
269+ However, an imperfect best effort is tolerable, up to a point; we
270+ consider the only consequences of the best effort's inefficiency to
271+ be additional chain propagation delay.
272+
273+ - The Ouroboros protocol only considers chain length. Tiebreakers
274+ are out of scope, so "best block" in the requirement above only
275+ means greatest blockno. (BlockFetch is free to also consider
276+ tiebreakers; the protocol does not care.)
277+
278+ - Note that the adversary claiming to have additional headers but
279+ refusing to send them has no effect on BlockFetch while we are
280+ CaughtUp. Only received headers matter. The worst the adversary
281+ could do by withholding headers is intentionally time out in order
282+ to decrement our valency (which we might choose to require stays
283+ above some value, see below) -- but presumably they can't ensure
284+ we reconnect to them, so they've revealed their nature, losing
285+ access to us, in order to possibly create a short delay.
286+
287+ - BlockFetch should avoid unnecessary downloads (the same block more
288+ than once or a block we'll never select).
289+
290+ - When CaughtUp, we have a high priority design goal that
291+ worst-case resource utilization is approximately the same as
292+ average-case. If not, even well-meaning node operators will
293+ eventually prune their node's allocated resources, thereby
294+ creating a DoS attack vector.
295+
296+ - This is why we can't simply download "all blocks ASAP" or even the
297+ same block from all peers currently offering it. Recall that the
298+ adversary can forge arbitrarily many blocks whenever it is
299+ elected, just not on the same chain.
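
The tension between "best blocks ASAP" and "no unnecessary downloads" can be illustrated with a toy selection function. It is a simplification under stated assumptions: headers are hypothetical `(hash, blockno)` pairs, "best" means greatest blockno only (no tiebreakers), a block is treated as never selectable if its blockno does not exceed our selection's, and the sketch ignores that a chain's blocks must be fetched in order.

```python
from typing import Dict, List, Optional, Set, Tuple

Header = Tuple[str, int]  # (hypothetical block hash, blockno)


def next_fetch(
    offers: Dict[str, List[Header]],
    selection_blockno: int,
    have_or_in_flight: Set[str],
) -> Optional[Tuple[str, Header]]:
    """Pick one (peer, header) to request, or None if nothing is worth fetching."""
    best: Optional[Tuple[str, Header]] = None
    for peer, headers in offers.items():
        for header in headers:
            block_hash, blockno = header
            if blockno <= selection_blockno:
                continue  # could not improve on our selection by length
            if block_hash in have_or_in_flight:
                continue  # never download the same block twice
            if best is None or blockno > best[1][1]:
                best = (peer, header)  # on equal blockno, the first offer seen wins
    return best
```

Only received headers are consulted, so a peer that merely claims to have more headers has no influence on the result.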
300+
301+ - Transition from CaughtUp to Syncing whenever any of:
302+
303+ - The wallclock is "too far ahead" of the latest "meaningful" peer
304+ interaction.
305+
306+ - TODO Sketch: we transition as soon as N (?) of our peers' tips have a
307+ time point that is more than LIM (?) behind our wallclock.
308+
309+ - TODO Our ChainSync timeouts will disconnect naturally, right? And so
310+ maybe this is really just another valency limit, like that of Syncing
311+ above.
312+
313+ - TODO It's safe to assume the computer has access to "inertial
314+ reckoning" via real-time clock hardware, right? If so, we can
315+ immediately detect this even upon eg the machine waking from a
316+ hibernation state. IE instead of totally relying on an NTP connection,
317+ which could also be compromised.
318+
319+ - Some peer sends >k headers from an intersection with my selection.
320+
321+ - This rule is a failsafe: We assume this shouldn't happen under
322+ nominal circumstances (by the Common Prefix theorem in the Ouroboros
323+ Praos paper; TODO Confirm with researchers), so we downgrade to the
324+ more conservative state if we do observe it, since we must have
325+ somehow fallen "too far" behind again without otherwise noticing.
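
The two triggers for falling back to Syncing can be combined into one disjunction. This is a sketch: `n` and `lim` stand for the still-undetermined N and LIM from the TODO above, times are plain numbers in arbitrary but consistent units, and the function name and argument layout are hypothetical.

```python
from typing import List


def must_resync(
    now: float,
    peer_tip_times: List[float],
    headers_past_intersection: List[int],
    k: int,
    n: int,
    lim: float,
) -> bool:
    """True if any trigger for leaving CaughtUp fires."""
    # Trigger 1: at least n peers' tips are more than lim behind our wallclock.
    stale_peers = sum(1 for t in peer_tip_times if now - t > lim)
    # Trigger 2 (failsafe): some peer has sent more than k headers past
    # its intersection with our selection.
    too_deep = any(count > k for count in headers_past_intersection)
    return stale_peers >= n or too_deep
```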