consensus: update sketch implementation plan for Ouroboros Genesis #487

Draft: wants to merge 3 commits into base: main

321 changes: 321 additions & 0 deletions docs/website/contents/for-developers/GenesisDecomposition.md
@@ -2,6 +2,8 @@ This file sketches the major points in the decomposition of the Genesis
implementation. Edsko de Vries and Nicolas Frisby developed this plan of attack
in 2021 March using chapter 21 of the Consensus Report.

I've also updated this file after a discussion with Duncan on 2021 Apr 13.

* add _prefix selection_ in between ChainSync -> BlockFetch (Sections
21.1-21.4, 21.6.2 in the Chapter)

@@ -109,6 +111,13 @@ in 2021 March using chapter 21 of the Consensus Report.
* We anticipate adding an `:: STM IsAlert` field to the
`BlockFetchConsensusInterface` record.

* In particular, this flag can indicate that the Network Layer can "relax"
with regard to its set of peers. When we're catching up, the Network
Layer needs to take all sorts of precautions (eg higher valency) to
ensure we have at least one peer offering the honest chain. This
distinction is similar to, but more severe than, the existing
`FetchModeBulkSync` versus `FetchModeDeadline` distinction.

* clock skew in relation to DoS attacks (Section 21.5.3)

* The `Left PastHorizonException{} -> retry` forecast case in CS client
@@ -127,9 +136,321 @@ in 2021 March using chapter 21 of the Consensus Report.

* This realization saved a lot of development time.

* Risk 1: availability attack

* The design written above would prevent BlockFetch from pessimistically
prefetching blocks from chains that are *almost* the best. EG Today
BlockFetch can fetch the longest _available_ chain, as a fallback for
when the only peers that sent headers indicating the best chain are slow
to actually provide the promised blocks.

* The idea to intersect all candidates with the result of `selectPrefix`
would prohibit that. And rightly so, in the sense of actually
implementing Genesis. (TODO NSF: why can't we make some appeal to `k`
here, to reduce the severity of this potential spec divergence?)

* Duncan raised the concern that an attack could -- when the attackers'
and victims' leader schedules align in a particular way -- cause the
victim to forge a block that extends a recent prefix of the honest
chain. In other words, the attacker may be able to induce some more
short forks than would naturally arise, thereby increasing the number of
"wasted blocks".

* Any mitigation of this would have to be guarded by `IsAlert`: the plan
written above (ie faithful Genesis) is in principle the only valid
option for use during catch up.

* Relevant reminder: even when it's caught up to the wallclock, Genesis
prefix selection only ever chooses a single chain. Any fluctuation in
that choice is based on the peers' chains being at their reported tip
within the density window.

* Risk 2: performance regression

* ChainSync validating headers is currently the bottleneck during bulk
fetch. EG Bulk fetching from 2 peers is quite a bit faster than from 10 peers.

* The suggested delta is the exact opposite of the plan above: when
catching up, we need many many peers in order to increase the chance
that at least one has the honest chain. (We haven't considered actual
counts rigorously.)

* Mitigation plan: the ChainSync clients need to share some work, so that
we don't revalidate the same header once-per-peer.

* This violates avg-case = worst-case, but that's OK when bulk syncing.
Perhaps we should guard this work sharing by `NotIsAlert`.

* SEPARATE WORK:

* TODO potential optimizations (Section 21.6.3)

* TODO testing etc - we'd _very much really_ like to use the ThreadNet
rewrite for this
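
The Risk 2 mitigation above (validate each header once rather than
once-per-peer, guarded by `NotIsAlert`) could look roughly like the following.
This is only a sketch under assumptions: `HeaderHash`, `Verdict`, `Cache`,
`validateShared`, and the two constructors of `IsAlert` are hypothetical names
invented for illustration, and real header validation is stateful rather than a
pure function of the hash.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: the real header and validation types are richer.
newtype HeaderHash = HeaderHash String deriving (Eq, Ord, Show)
data Verdict = Valid | Invalid deriving (Eq, Show)

-- Assumed constructors for the flag discussed above.
data IsAlert = IsAlert | NotIsAlert deriving (Eq, Show)

-- Verdicts already computed on behalf of some peer.
type Cache = Map.Map HeaderHash Verdict

-- | Validate a header, reusing an earlier verdict only while NotIsAlert.
-- Returns the verdict, the updated cache, and whether the expensive
-- validator actually ran.
validateShared
  :: IsAlert
  -> (HeaderHash -> Verdict)  -- expensive validation, assumed pure here
  -> HeaderHash
  -> Cache
  -> (Verdict, Cache, Bool)
validateShared alert validate h cache =
  case (alert, Map.lookup h cache) of
    (NotIsAlert, Just v) -> (v, cache, False)
    _                    -> let v = validate h
                            in (v, Map.insert h v cache, True)
```

When the flag is `IsAlert` the validator always runs, so the per-peer cost
stays at its worst case, which is the avg-case = worst-case property the notes
above want to keep outside of bulk syncing.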

-----

Updated on 2021 August 9, after much additional thought and broader
reconsiderations, kicked off by Javier Sagredo's observation of a stalling
attack vector in the original sketch above.

This new sketch updates much-but-not-all of the original sketch above.

- Execution begins in the _Syncing_ state.

- While we are Syncing:

- If our valency falls below some threshold, then BlockFetch stops sending
new fetch requests until sufficient valency is recovered.

- BlockFetch can only download blocks from the headers that the density
rule approves.

- The density rule is: compare header chains based on the number of
headers in the relevant Genesis window (the 3k/f slots after the
intersection), though if the headers do not span the Genesis window
and the peer claims to have more headers we must wait for them
(because they might also be in the window).

- The Ouroboros Genesis paper proves -- excepting only disastrous
intervals -- that the density rule will always strictly prefer the honest
chain over any possible alternative.

- Therefore, we require that each peer's highwater blockno is increasing
"fast enough on average" until we're at their tip, with the only
exceptional circumstance being when their latest header is beyond our
forecast range (since we don't even request a next header while that is
true).

- TODO Do we actually need that exception? Under what circumstances
would it be relevant, during Syncing?

- TODO I'm anticipating a token bucket for enforcing "fast enough on
average", but there remain plenty of details and thresholds to
consider.

- A possible refinement: if they can promise to send a specific k+1st
block (which the honest nodes would always do, up to their immutable
tip), then they're allowed to be somewhat slower, since we'll
disconnect from them if either they don't deliver that block or if
the eventual densest chain does not include that block.

- A possible refinement: each peer can offer _jump points_ that are
usefully ahead of their latest header. If some other peer has already
sent the jump point's header, then we can advance the slower peer's
ChainSync state accordingly. This can help a relatively slow
redundant peer remain connected.

- Transition from Syncing to _CaughtUp_ whenever all of:

- No peer has sent a header binary-preferable to my selection.

- No peer has sent >k headers from an intersection with my selection.

- We have seen every peer's headers up to its tip.

- TODO To what extent can the adversary abuse this to prevent our
transition? Even supposing validated, uninterruptible ChainSync
switches?

- TODO Perhaps we don't need it, since we assume we'll have at least
one honest peer. Their stream of headers should race ahead of the
corresponding stream of blocks until we're CaughtUp, and so that'll
hold back at least one of the other conjuncts. On the other hand, it
seems fine if we do need this, because of the timeout discussed
above.

- While we are CaughtUp:

- BlockFetch is free to download the blocks from any of our peers' headers.
It has two primary requirements, which are in tension.

- The ultimate goal of BlockFetch is to get the best blocks ASAP.
However, an imperfect best effort is tolerable, up to a point; we
consider the only consequences of the best effort's inefficiency to
be additional chain propagation delay.

- The Ouroboros protocol only considers chain length. Tiebreakers
are out of scope, so "best block" in the requirement above only
means greatest blockno. (BlockFetch is free to also consider
tiebreakers; the protocol does not care.)

- Note that the adversary claiming to have additional headers but
refusing to send them has no effect on BlockFetch while we are
CaughtUp. Only received headers matter. The worst the adversary
could do by withholding headers is intentionally time out in order
to decrement our valency (which we might choose to require stays
above some value, see below) -- but presumably they can't ensure
we reconnect to them, so they've revealed their nature, losing
access to us, in order to possibly create a short delay.

- BlockFetch should avoid unnecessary downloads (the same block more
than once or a block we'll never select).

- When CaughtUp, we have a high priority design goal that
worst-case resource utilization is approximately the same as
average-case. If not, even well-meaning node operators will
eventually prune their node's allocated resources, thereby
creating a DoS attack vector.

- This is why we can't simply download "all blocks ASAP" or even the
same block from all peers currently offering it. Recall that the
adversary can forge arbitrarily many blocks whenever it is
elected, just not on the same chain.

- Transition from CaughtUp to Syncing whenever any of:

- The wallclock is "too far ahead" of the latest "meaningful" peer
interaction.

- TODO Sketch: we transition as soon as N (?) of our peers' tips have a
time point that is more than LIM (?) behind our wallclock.

- TODO Our ChainSync timeouts will disconnect naturally, right? And so
maybe this is really just another valency limit, like that of Syncing
above.

- TODO It's safe to assume the computer has access to "inertial
reckoning" via real-time clock hardware, right? If so, we can
immediately detect this even upon eg the machine waking from a
hibernation state. IE instead of totally relying on an NTP connection,
which could also be compromised.

- Some peer sends >k headers from an intersection with my selection.

- This rule is a failsafe: We assume this shouldn't happen under
nominal circumstances (by the Common Prefix theorem in the Ouroboros
Praos paper; TODO Confirm with researchers), so we downgrade to the
more conservative state if we do observe it, since we must have
somehow fallen "too far" behind again without otherwise noticing.
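
The density rule from the Syncing part of this sketch can be written down as a
small pure function. This is a minimal sketch under assumptions, not the
planned implementation: `Candidate`, `Density`, `genesisWindow`, `density`,
and `compareDensity` are hypothetical names, and a candidate is reduced to the
slot numbers of the headers received after the intersection.

```haskell
import Data.Word (Word64)

-- Hypothetical view of one peer's candidate, relative to the intersection.
data Candidate = Candidate
  { headerSlots :: [Word64]  -- slots of received headers, ascending
  , claimsMore  :: Bool      -- does the peer claim to have more headers?
  }

-- Either a final in-window header count, or "wait for claimed headers".
data Density = Known Int | MustWait deriving (Eq, Show)

-- | Length of the Genesis window in slots: 3k/f.
genesisWindow :: Word64 -> Rational -> Word64
genesisWindow k f = floor (3 * fromIntegral k / f)

-- | Count headers in the window (isectSlot, isectSlot + window].
-- If the received headers do not yet reach the end of the window and the
-- peer claims to have more, the count is not final.
density :: Word64 -> Word64 -> Candidate -> Density
density window isectSlot c
  | claimsMore c && not spansWindow = MustWait
  | otherwise                       = Known (length inWindow)
  where
    end         = isectSlot + window
    inWindow    = takeWhile (<= end) (headerSlots c)
    spansWindow = any (>= end) (headerSlots c)

-- | Compare two candidates; Nothing means we must wait for more headers.
compareDensity :: Word64 -> Word64 -> Candidate -> Candidate -> Maybe Ordering
compareDensity window isectSlot a b =
  case (density window isectSlot a, density window isectSlot b) of
    (Known x, Known y) -> Just (compare x y)
    _                  -> Nothing
```

For example, `genesisWindow 2160 (1/20)` is 129600 slots. Returning `Nothing`
whenever either side is `MustWait` is the conservative reading of "we must wait
for them"; how long we are willing to wait is the "fast enough on average"
question above.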

-----

Updated on 2021 August 12, after a couple more conversations. In particular,
even with my recent idea of letting nodes annotate immutable headers, Edsko
observed that it wouldn't let us detect eg that we should transition CaughtUp ->
Syncing soon enough, eg before the adversary can send >k blocks. So in some
sense that's redundant wrt the wall-clock trigger. So what follows is the
current plan, and our work is to 1) brainstorm ways in which the wall-clock
trigger is insufficient and 2) fill in the details of eg the timeout for sending
claimed headers during Syncing.

The only differences between Syncing and CaughtUp are as follows.

- If Syncing, I won't alter my ChainDB selection (even when I mint) unless my
number of server peers exceeds a threshold.

- The Network Layer chooses this threshold in such a way that justifies us
assuming that we have at least one healthy connection to at least one peer
serving a contemporary honest chain.

- If Syncing, I'll wait for you to send me more headers if you claim to have
them, as long as you can send them at a sufficient rate (enforcement TBD).
Specifically, my ChainDB will not select >k blocks of a chain unless it has
the best density among all of my current peers' claimed headers.

- Note in particular that it's only possible to select >k blocks at time T1
from a chain that is later determined to not be the densest if the peers
offering the more dense chain had claimed at time T1 to have no more
headers to send. Honest peers would only do so if the difference-making
blocks did not exist at time T1. In which case the Praos Common Prefix
Theorem reveals a contradiction: no adversarial chain should ever be >k
blocks ahead of the best available honest chain. So this scenario only
arises if eg I wasn't connected to any honest peers at time T1.

- If CaughtUp, I don't wait for claimed headers, since I don't want the
adversary to be able to slow me down by falsely claiming to have more. The
Praos Common Prefix Theorem says I don't need to wait and even shouldn't wait;
∆ is what it is. Relatedly, BlockFetch's best effort to continually maximize
my selection's blockno is sufficient; any imperfect BlockFetch decisions
merely inflate ∆.

- BlockFetch cannot simply request all blocks from all peers, because that
opens up a variety of DoS attacks. In particular, BlockFetch should only
download necessary blocks and only download each block once. That again
limits DoS attacks to adding some delay whose per adversarial connection
severity is bounded by a timeout.

- Maybe the minimum valency for Syncing is higher than the minimum valency for
CaughtUp (see below).

- During Syncing, we allow ChainSync clients to share work.

- We'll validate a header only once even if we receive it from multiple
peers.

- We'll enrich the ChainSync protocol so that servers can offer jump points
they're committed to: we can tell a peer to jump ahead if some faster peer
already sent the header for one of the offered jump points.

- Notably, if all our peers are on the same chain, we'll be processing
headers primarily only from the fastest peer.

- We don't allow ChainSync clients to share work during CaughtUp to avoid
tempting node operators to reduce their node's resource allotment.

- If they did that, they'd fall prey to a DoS attack where the adversary
waits a while and then suddenly increases the load from the average-case
to worst-case, thereby crashing the node that now has fewer resources.
This crash harms the security of the whole network.
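
The jump-point idea above (tell a peer to jump ahead if some faster peer
already sent the header for one of its offered jump points) might be modelled
as follows. This is only a sketch: `HeaderHash`, `PeerState`, `jumpTarget`, and
`advance` are hypothetical names, and the actual ChainSync protocol enrichment
is not designed yet.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for a header hash.
newtype HeaderHash = HeaderHash String deriving (Eq, Ord, Show)

-- Hypothetical per-peer ChainSync client state.
data PeerState = PeerState
  { position   :: HeaderHash    -- latest header this peer has sent us
  , jumpPoints :: [HeaderHash]  -- offered jump points, nearest first
  } deriving (Eq, Show)

-- | The farthest offered jump point whose header some other peer already sent.
jumpTarget :: Set.Set HeaderHash -> PeerState -> Maybe HeaderHash
jumpTarget known p =
  case filter (`Set.member` known) (jumpPoints p) of
    [] -> Nothing
    hs -> Just (last hs)

-- | Advance a slow peer to its jump target, if there is one, and drop the
-- jump points that are no longer ahead of it.
advance :: Set.Set HeaderHash -> PeerState -> PeerState
advance known p =
  case jumpTarget known p of
    Nothing -> p
    Just h  -> p { position   = h
                 , jumpPoints = drop 1 (dropWhile (/= h) (jumpPoints p))
                 }
```

Here `known` stands for the headers already received (and validated once) from
faster peers, so a redundant slow peer skips the headers in between instead of
resending them.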

We transition from Syncing to CaughtUp once both of the following hold.

- No peer's latest header is binary-preferable to our current selection.

- No peer claims to have more headers to send.

- Because we wait for claimed headers during Syncing, the adversary can
delay the transition to CaughtUp by claiming to have more headers but not
sending them. A design that allows this minor DoS attack is acceptable as
long as the adversary can't do it indefinitely, since we should be
spending most of our time in CaughtUp. EG A timeout that requires a steady
increase in the blockno of the best header they've sent would suffice as
long as that timeout on average expires faster than a 50% adversary can
mint another block (ie requiring approximately <40 seconds per header on
average). We'll likely assume the honest peers can do much better than
that, so we'll likely have an even more aggressive (ie still sufficient)
timeout.
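
The "<40 seconds per header" figure above can be reproduced with a short
calculation, and the timeout itself can be sketched as an average-rate check.
Assumptions, none of which are stated in this file: 1-second slots and an
active slot coefficient f = 1/20, the standard Praos leader probability
1 - (1 - f)^alpha for relative stake alpha, and the hypothetical names
`expectedSecondsPerBlock`, `Progress`, and `fastEnough`. This is a plain
average-rate check, not the token bucket mentioned in the earlier sketch.

```haskell
import Data.Word (Word64)

-- | Expected seconds between blocks minted by a party with relative stake
-- alpha, given the slot length in seconds and the active slot coefficient f.
expectedSecondsPerBlock :: Double -> Double -> Double -> Double
expectedSecondsPerBlock slotSeconds f alpha =
  slotSeconds / (1 - (1 - f) ** alpha)

-- Hypothetical record of when we started measuring a peer's progress.
data Progress = Progress
  { sinceSeconds :: Double  -- time at which measurement started
  , blockNoThen  :: Word64  -- blockno of the peer's best header at that time
  }

-- | Has the peer's best blockno increased fast enough on average?
-- The peer is allowed `limit` seconds per header, plus one header of slack.
fastEnough :: Double -> Progress -> Double -> Word64 -> Bool
fastEnough limit start now blockNoNow
  | blockNoNow < blockNoThen start = False
  | otherwise =
      now - sinceSeconds start <= limit * fromIntegral (gained + 1)
  where
    gained = blockNoNow - blockNoThen start
```

Under those assumptions `expectedSecondsPerBlock 1 0.05 0.5` comes out a little
under 40, which matches the bound quoted above; any `limit` below that value
means the adversary cannot stall us indefinitely just by minting.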

We transition from CaughtUp to Syncing once any of the following hold.

- We observe something we consider impossible while CaughtUp.

- One example is: receiving headers that show a chain that is >k ahead of
its intersection with our current selection.

- These are failsafe measures. We regard them as contributions to the
Network Layer's heuristics for the connection quality.

- Our number of server peers falls too low.

- We are assuming the Network Layer will disconnect from a peer if the
connection quality degrades too far (eg the KeepAlive falters). In
particular, we assume the Network Layer recognizes that in a sufficiently
prompt way. EG If my node loses all network connectivity, I expect my
valency to fall below the threshold well before I'm likely to mint too
many blocks.

- Ideally the Network Layer would detect an unhealthy connection (and
decrement our valency) before an adversary could trick me into selecting
>k blocks. How long that would actually take unfortunately depends on many
factors. If I was recently connected to a CaughtUp honest node, then it
should take quite a while for the adversary to even mint such a long
chain. But if, for example, I'm running on a computer that just woke up
from a several day sleep, I need to revert to Syncing ASAP.

- The threshold must be greater than one, since a common setup for a
block producer involves running two local nodes, one proxying for the
other. At least the proxy node of that pair needs to transition back to
Syncing if it loses connection to all other nodes, even if the pair
remain connected to each other.

- That same block-producing pair example suggests we need to allow an
override for the valency thresholds: the internal node in that pair will
only ever have exactly one connection, and that connection is to be
trusted. In fact, that internal peer never needs to be in the Syncing
state.

Note that the only way for an adversarial peer to cause me to transition from
CaughtUp to Syncing is to sufficiently lower my valency by degrading the quality
of my network connections.
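
The two transition rules of this latest sketch fit in one pure step function.
The following is a minimal sketch under assumptions: `SyncState`,
`Observations`, its fields, and `step` are hypothetical names. It also adds one
rule that is not in the text above, namely that a Syncing node with too few
server peers stays Syncing, purely to stop the sketch from flapping between the
two states.

```haskell
data SyncState = Syncing | CaughtUp deriving (Eq, Show)

-- Hypothetical summary of what the node currently observes.
data Observations = Observations
  { peerHasPreferableHeader :: Bool  -- some peer's latest header is
                                     -- binary-preferable to our selection
  , peerClaimsMoreHeaders   :: Bool  -- some peer claims more headers to send
  , sawImpossible           :: Bool  -- eg >k headers past the intersection
  , serverPeers             :: Int   -- current number of server peers
  , minServerPeers          :: Int   -- valency threshold
  , trustedOverride         :: Bool  -- eg the internal block-producing node
  }

-- | One step of the Syncing/CaughtUp state machine.
step :: Observations -> SyncState -> SyncState
step o s
  | trustedOverride o = CaughtUp  -- never needs to be in Syncing
  | otherwise =
      case s of
        Syncing
          | tooFewPeers -> Syncing  -- assumption added here, see above
          | not (peerHasPreferableHeader o)
              && not (peerClaimsMoreHeaders o) -> CaughtUp
          | otherwise -> Syncing
        CaughtUp
          | sawImpossible o || tooFewPeers -> Syncing
          | otherwise -> CaughtUp
  where
    tooFewPeers = serverPeers o < minServerPeers o
```

Note that `peerClaimsMoreHeaders` plays no role in the CaughtUp case, which
reflects the rule that a CaughtUp node does not wait for claimed headers.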