Bug: Invalid partial signatures cause deadlock #296

@sh-cha

Issue Description

Cosigners record partial signatures for double-sign checks without verifying that those partial signatures were ever combined into a final signature.

Reproduction Scenario

We use two Tendermint nodes (node0, node1) with a 2-of-3 threshold Horcrux setup (cosigner0, cosigner1, cosigner2):

  1. node0 dies
  2. node1 generates a block, all cosigners synchronize with the proposal
  3. node1 requests signatures for Prevote
  4. cosigner1 is the Leader and receives the signature request
    • cosigner0 is dead
    • SetNoncesAndSign is only received by cosigner1 and cosigner2
  5. cosigner1 (Leader) dies before combining and sending the signature
    • ShareSigned event is never emitted
  6. node1 dies

At this point:

  • cosigner0: at step 1 (it only saw the proposal)
  • cosigner1, cosigner2: at step 2 (each produced a partial Prevote signature for the non-nil BlockID)

Recovery Attempt and Failure

  1. node0 and node1 come back online
    • node0 receives blocks through node1 but attempts a nil vote because of a timeout
  2. cosigner2 is now the Leader
    • It receives the request and attempts SetNoncesAndSign on each cosigner
  3. cosigner1 and cosigner2 return the error:
    already signed vote with non-nil BlockID. refusing to sign vote on nil BlockID
    
  4. cosigner0 signs with nil BlockID
  5. All cosigners are now at step 2, but cosigner0 holds different sign bytes (nil BlockID) than cosigner1 and cosigner2 (non-nil BlockID)
  6. node1 replays from its WAL and requests the Prevote again with the original BlockID
  7. The process halts with the error:
    error saving last sign state initiated: same block error, but we are not still waiting for signature: already signed vote with nil BlockID. refusing to sign vote on non-nil BlockID
    
sequenceDiagram
    participant node0
    participant node1
    participant cosigner0
    participant cosigner1 as cosigner1 (Leader)
    participant cosigner2
    
    Note over node0: Dies
    node1->>cosigner1: Request Prevote signature
    cosigner1->>cosigner1: Becomes leader
    cosigner1->>cosigner1: SetNoncesAndSign
    cosigner1->>cosigner2: SetNoncesAndSign
    Note over cosigner0: Dead, no participation
    Note over cosigner1: Dies before combining signatures
    Note over node1: Dies
    
    Note over node0, cosigner2: Recovery Phase
    Note over node0, node1: Both restart
    node0->>cosigner2: Request nil vote (timeout)
    Note over cosigner2: Now Leader
    cosigner2->>cosigner0: SetNoncesAndSign
    cosigner2->>cosigner1: SetNoncesAndSign
    cosigner2->>cosigner2: SetNoncesAndSign
    
    cosigner0->>cosigner2: Signs with nil BlockID
    cosigner1--xcosigner2: Error: already signed non-nil BlockID
    cosigner2--xcosigner2: Error: already signed non-nil BlockID
    
    node1->>cosigner2: Replay Prevote with existing BlockID
    cosigner2--xnode1: Error: already signed vote with nil BlockID
Required Mechanisms

  1. Each cosigner needs a way to learn whether the final signature succeeded or failed, and whether its own partial signature was valid or invalid.
  2. Invalid partial signatures (those never combined into a final signature) should not be used for double-sign checks.
