
Batch attestation slashability checking #1914

Open
@michaelsproul

Description

With large numbers of validators and heavy disk congestion, we've observed the slashing protection database timing out with errors like this:

: Nov 16 00:50:31.056 INFO Successfully published attestation      type: unaggregated, slot: 39850, committee_index: 2, head_block: 0x3bd75ebc65de718b36911eaab7dad3a9ef7ca44c7ba03d32c7092195c43bcf43, service: attestation
: Nov 16 00:50:32.838 CRIT Not signing slashable block             error: SQLPoolError("Error(None)")
: Nov 16 00:50:32.838 CRIT Error whilst producing block            message: Unable to sign block, service: block

The default timeout for the database connection pool is 5 seconds.

Presently, we sign attestations one at a time and broadcast them, which means that each attester requires a new SQLite database transaction.

https://github.com/sigp/lighthouse/blob/master/validator_client/src/attestation_service.rs#L332-L404

To alleviate the congestion slightly, we could switch to an algorithm like:

  1. Begin an SQLite transaction txn
  2. Check and sign all attestations as part of txn
  3. Commit txn
  4. Broadcast attestations

That way we preserve the property that an attestation is only broadcast if its signature has been persisted to disk (which is crash-safe). Broadcasting after each check but before the transaction commits would violate this property.
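
The four steps above can be sketched roughly as follows. This is a minimal illustration in Python using the standard sqlite3 module, not Lighthouse's Rust implementation. The table layout, the function names (open_db, is_slashable, sign_batch, attest) and the sign/broadcast callbacks are all hypothetical, and the slashability check is reduced to double votes and surround votes.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction boundaries below are exactly the BEGIN/COMMIT we issue.
    conn = sqlite3.connect(path, isolation_level=None)
    # Hypothetical, simplified schema (not Lighthouse's real one).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signed_attestations ("
        " validator TEXT NOT NULL,"
        " source_epoch INTEGER NOT NULL,"
        " target_epoch INTEGER NOT NULL,"
        " UNIQUE (validator, target_epoch))"
    )
    return conn


def is_slashable(conn, validator, source, target):
    # Simplified check: a double vote (same target epoch) or a surround
    # vote in either direction against a previously recorded attestation.
    row = conn.execute(
        "SELECT 1 FROM signed_attestations WHERE validator = ? AND ("
        " target_epoch = ?"
        " OR (source_epoch < ? AND target_epoch > ?)"
        " OR (source_epoch > ? AND target_epoch < ?)) LIMIT 1",
        (validator, target, source, target, source, target),
    ).fetchone()
    return row is not None


def sign_batch(conn, attestations, sign):
    """Check and sign every attestation inside ONE transaction."""
    signed = []
    conn.execute("BEGIN IMMEDIATE")  # step 1: begin txn
    try:
        for validator, source, target in attestations:  # step 2
            if is_slashable(conn, validator, source, target):
                continue  # skip slashable attestations, keep the rest
            conn.execute(
                "INSERT INTO signed_attestations VALUES (?, ?, ?)",
                (validator, source, target),
            )
            signature = sign(validator, source, target)
            signed.append((validator, source, target, signature))
        conn.execute("COMMIT")  # step 3: commit txn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return signed


def attest(conn, attestations, sign, broadcast):
    # Step 4: broadcast only after the commit has succeeded, so nothing
    # leaves the process unless its record has been persisted.
    signed = sign_batch(conn, attestations, sign)
    for item in signed:
        broadcast(item)
    return signed
```

Calling attest(conn, batch, sign, broadcast) with a list of (validator, source_epoch, target_epoch) tuples uses a single transaction for the whole batch. If anything fails before the commit, the transaction is rolled back and nothing is broadcast.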

Version

Lighthouse v0.3.4

Metadata

Labels

database
optimization: Something to make Lighthouse run more efficiently.
v7.1.0: Post-Electra release
val-client: Relates to the validator client binary
