Fix amnesia faults vulnerability #308#310
Conversation
| VoteExtensionSignature []byte `json:"vote_ext_signature,omitempty"` | ||
|
|
||
| // Consensus lock tracking to prevent amnesia faults | ||
| ConsensusLock ConsensusLock `json:"consensus_lock,omitempty"` |
There was a problem hiding this comment.
omitzero? omitempty does not apply to custom types: https://pkg.go.dev/encoding/json
There was a problem hiding this comment.
It will show up as an empty struct {} when not locked now.
There was a problem hiding this comment.
exactly, with omitzero it won't show up at all.
|
|
||
| // IsLocked returns true if there is an active consensus lock | ||
| func (lock *ConsensusLock) IsLocked() bool { | ||
| return lock.Height > 0 && lock.Round >= 0 |
There was a problem hiding this comment.
it should be possible to lock at height 0 and round 0, no?
| Height int64 `json:"height"` | ||
| Round int64 `json:"round"` | ||
| Value []byte `json:"value,omitempty"` // The locked block hash/value | ||
| ValueType string `json:"value_type"` // "block" or "nil" |
There was a problem hiding this comment.
how about having just Value? nil for nil locked, non-nil for block locked.
|
|
||
| // ConsensusLockViolationError represents an error when trying to sign a block that violates a consensus lock | ||
| type ConsensusLockViolationError struct { | ||
| msg string |
There was a problem hiding this comment.
nit: you can quickly define a custom error types as type ErrName { error } and then construct it using ErrName{fmt.Errorf(...)}. And the most proper way of defining structured errors is type ErrName { <custom fields> } and postponing the fmt.Sprintf to func (e *ErrName) Error() { return fmt.Sprintf() }, so that message rendering can be skipped altogether if not needed.
| func IsConsensusLockStepViolationError(err error) bool { | ||
| _, ok := err.(*ConsensusLockStepViolationError) | ||
| return ok | ||
| } |
There was a problem hiding this comment.
nit: the canonical way to casting errors is by using errors.As().
| // For now, we'll use a placeholder that returns the first 32 bytes as a hash | ||
| if len(signBytes) < 32 { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
that needs to be fixed before merging, no?
There was a problem hiding this comment.
and we need an e2e test that it actually works
|
Thanks @pompon0 I addressed your comments |
| } | ||
|
|
||
| // Handle consensus lock updates according to Tendermint rules | ||
| signStateConsensus.ConsensusLock = updateConsensusLock(ccs.lastSignState.ConsensusLock, hrst.HRSKey(), req.SignBytes) |
There was a problem hiding this comment.
it is misleading that you are setting ConsensusLock to the old value and then immediately overwriting it. inline the updateConsensusLock call instead.
| } | ||
|
|
||
| // Handle consensus lock updates according to Tendermint rules | ||
| signStateConsensus.ConsensusLock = updateConsensusLock(ccs.lastSignState.ConsensusLock, hrst.HRSKey(), req.SignBytes) |
There was a problem hiding this comment.
the "update" prefix is implying that updateConsensusLock is mutating its argument(s). Consider something like "nextConsensusLock" or some other name which would imply that it is a pure function.
|
|
||
| // If we're signing for a different height, the lock is no longer relevant | ||
| if hrs.Height != signState.ConsensusLock.Height { | ||
| return nil |
There was a problem hiding this comment.
nit: hrs.Height > ... for defense-in-depth
| // For PROPOSAL and PREVOTE messages in rounds R' >= R, only allow signing for the locked value V | ||
| if (hrs.Step == stepPropose || hrs.Step == stepPrevote) && hrs.Round >= signState.ConsensusLock.Round { | ||
| // Extract the block hash from the sign bytes to compare with the locked value | ||
| blockHash := extractBlockHashFromSignBytes(signBytes, hrs.Step) |
There was a problem hiding this comment.
make it return an error if parsing fails
| blockHash := extractBlockHashFromSignBytes(signBytes, hrs.Step) | ||
| if blockHash == nil { | ||
| // If we can't extract the block hash, allow signing (fallback to existing behavior) | ||
| return nil |
There was a problem hiding this comment.
imo, it is more desirable to fail when we don't know what we are signing.
| } | ||
|
|
||
| // For PROPOSAL and PREVOTE messages in rounds R' >= R, only allow signing for the locked value V | ||
| if (hrs.Step == stepPropose || hrs.Step == stepPrevote) && hrs.Round >= signState.ConsensusLock.Round { |
There was a problem hiding this comment.
I think we have a problem here - we should compare pol_round against ConsensusLock.Round here, not current round. For proposal signing we can extract it, but prevotes do not include pol round.
There was a problem hiding this comment.
This is Tendermint type of stuff. What matters for Horcrux signing is consistency, which is only relative to the round of the latest signed precommit. That is, one thing is the highest round where Tendermint state sees the highest POL. But for singing on Horcrux what matters is that whatever is proposed for signing next is consistent with what has been proposed for signing before, not the POL. And the only time that we need to update and carry over locks when it comes to signing is after a Precommit message.
This implementation of Horcrux should not conflict with a correct Tendermint signer since nodes will lock on a value only when sending a Precommit message, and only will sign Prevotes or send Proposals for the locked value until the next Precommit message (if the Tendermint implementation follows the Tendermint spec from this paper as the official documentation says).
There was a problem hiding this comment.
discussed offline. We need to amend API to handle branch on line 28 of https://arxiv.org/pdf/1807.04938
|
Thanks for the comments, just addressed them, except for pushback on one of them (see the conversation in the related comment). |
cf1df5f to
800e8e7
Compare
|
|
||
| // Prevote quorums tracking for Tendermint Algorithm Line 28 | ||
| // Map: height -> (value -> oldest round that received 2f+1 prevotes) | ||
| PrevoteQuorums map[string]map[string]int64 `json:"prevote_quorums"` // height -> (value -> round) |
There was a problem hiding this comment.
why do you use string to index by height?
|
Reverted to the version we discussed in sync that uses the POL round. |
|
Been watching this PR closely and it seems like this has stalled out. Is the original concern still a problem and is this expected to be merged? |
Problem Description
This implementation addresses a critical safety vulnerability (#308 ) in Horcrux that can lead to chain forks in Tendermint-based networks. The vulnerability, known as the "amnesia fault," occurs when Horcrux nodes in a distributed validator setup lose track of their consensus locks due to network partitions, allowing them to sign conflicting blocks.
The Amnesia Fault Scenario
Root Cause
The root cause is that Horcrux is stateless with respect to Tendermint consensus locks. The remote signer only tracks what it has signed (high watermark) but doesn't maintain the consensus-critical state of what the validator is locked on. This allows the validator to "forget" its lock and sign conflicting blocks.
Solution Overview
This implementation adds consensus lock tracking to Horcrux, making the remote signer consensus-aware. The solution ensures that:
Tendermint Locking Rules Implementation
The implementation follows the correct Tendermint consensus locking rules:
If a PRECOMMIT for value V is signed in round R for tendermint's consensus instance Id then:
Key Points:
Implementation Details
Data Structures
ConsensusLock
Updated SignState
The
SignStatestructure now includes aConsensusLockfield to track the current consensus lock state.Key Functions
ValidateConsensusLock
Lock Update Logic
The consensus lock is updated in three scenarios:
Integration Points
ThresholdValidator.Sign()
LocalCosigner.sign()
SignState.blockDoubleSign()