Redesign the indexing process to sync bitcoin faster #180

@AlexITC

We need to optimize the blockchain syncing process.

Expected behavior

Indexing Bitcoin should take at most 1 week.

Actual behavior

Indexing Bitcoin takes months!

Steps to reproduce the behavior

Set up a full Bitcoin node and follow the steps to sync the explorer with it.

Notes

This is a very complex task that spans both infrastructure and backend work.

On the infra side, we need a load balancer in front of the bitcoind RPC API. Based on previous experience, the minimum requirements are:

  • Each node has 8 CPUs, 8 GB of RAM, and an SSD.
  • 3 bitcoind instances.
  • The bitcoind config should be tweaked to accept lots of concurrent calls.
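
To make the idea of spreading RPC calls over several bitcoind instances concrete, here is a minimal sketch. It uses client-side round-robin rotation rather than a dedicated load balancer, which is a simplification and not necessarily the setup intended here. The class name and the endpoint addresses are hypothetical.

```python
import itertools
import threading


class RoundRobinEndpoints:
    """Hands out RPC endpoints in rotation; safe to share across threads."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()

    def next_endpoint(self):
        # The lock keeps concurrent callers from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


# Hypothetical addresses for the 3 bitcoind instances.
ENDPOINTS = [
    "http://10.0.0.1:8332",
    "http://10.0.0.2:8332",
    "http://10.0.0.3:8332",
]

balancer = RoundRobinEndpoints(ENDPOINTS)
picked = [balancer.next_endpoint() for _ in range(4)]
```

The fourth call wraps around to the first instance, so each node receives roughly a third of the calls. A real deployment would also need health checks and retries, which this sketch leaves out.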

On the explorer side:

  • A huge server with lots of CPUs (potentially at least 32 or 64).
  • The postgres instance should be tuned accordingly. The ideal server capacity is still unknown, but it should handle 2 TB of data properly.

As for the approach, the syncing process should be done in several stages (this looks like a good candidate for akka-streams):

  • Block headers (mandatory before any other stage).
  • Transaction headers.
  • Transaction outputs (depends on transaction headers).
  • Transaction inputs (depends on the outputs).
  • Block filter (depends on the outputs).
  • TPoS contracts (depends on the outputs).
  • Block rewards (depends on the outputs, potentially could be synced after the block headers).
  • Address balances (depends on the inputs).
  • Address transaction details (depends on the inputs).
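
The stage dependencies listed above form a small graph, and a sketch of it shows which stages could run side by side. This is written in Python for brevity rather than with akka-streams. The snake_case stage identifiers are hypothetical, and transaction headers are assumed to depend on block headers because block headers are mandatory before any other stage.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on, as listed above.
STAGE_DEPENDENCIES = {
    "block_headers": set(),
    "transaction_headers": {"block_headers"},
    "transaction_outputs": {"transaction_headers"},
    "transaction_inputs": {"transaction_outputs"},
    "block_filter": {"transaction_outputs"},
    "tpos_contracts": {"transaction_outputs"},
    "block_rewards": {"transaction_outputs"},
    "address_balances": {"transaction_inputs"},
    "address_transaction_details": {"transaction_inputs"},
}


def stage_waves(dependencies):
    """Groups stages into waves; stages within a wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # unlocks the stages that were waiting on this wave
    return waves


waves = stage_waves(STAGE_DEPENDENCIES)
```

Under these assumptions, the four stages that only need the outputs land in the same wave, so they are the natural candidates for parallel processing.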

We don't require all of the data to be indexed, so ideally we should be able to disable some stages to speed up the process and save space. These are good candidates (SQL tables):

  • balances
  • tpos_contracts
  • block_rewards
  • address_transaction_details
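
A sketch of how disabling stages could be validated: a stage may only be switched off if no enabled stage depends on it. The stage identifiers and the `enabled_stages` helper are hypothetical; `address_balances` is assumed to be the stage that fills the `balances` table.

```python
# Each stage maps to the stages it depends on (same assumptions as the stage list).
DEPENDS_ON = {
    "block_headers": [],
    "transaction_headers": ["block_headers"],
    "transaction_outputs": ["transaction_headers"],
    "transaction_inputs": ["transaction_outputs"],
    "block_filter": ["transaction_outputs"],
    "tpos_contracts": ["transaction_outputs"],
    "block_rewards": ["transaction_outputs"],
    "address_balances": ["transaction_inputs"],  # fills the balances table
    "address_transaction_details": ["transaction_inputs"],
}


def enabled_stages(disabled):
    """Returns the stages left to run, or raises if a needed stage is disabled."""
    disabled = set(disabled)
    unknown = disabled - DEPENDS_ON.keys()
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    enabled = [stage for stage in DEPENDS_ON if stage not in disabled]
    for stage in enabled:
        missing = [dep for dep in DEPENDS_ON[stage] if dep in disabled]
        if missing:
            raise ValueError(f"{stage} requires disabled stages: {missing}")
    return enabled


# The four candidates above are leaves of the graph, so all can be disabled.
remaining = enabled_stages(
    {"address_balances", "tpos_contracts", "block_rewards", "address_transaction_details"}
)
```

Disabling a stage that others build on, such as `transaction_outputs`, raises a `ValueError` instead of silently producing a broken sync.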

All of this would affect the exposed API, because we shouldn't return blocks that aren't fully synced. It's also important to consider potential rollbacks while syncing the data.
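
One possible way to reason about this, sketched under assumptions: track the highest synced block height per stage, expose only blocks up to the minimum across all enabled stages, and on a rollback cap every stage at the last valid height. The `SyncProgress` class and its method names are hypothetical, not part of the existing code.

```python
class SyncProgress:
    """Tracks the highest synced block height per stage."""

    def __init__(self, stages):
        stages = list(stages)
        if not stages:
            raise ValueError("at least one stage is required")
        # -1 means the stage has not synced any block yet.
        self._height = {stage: -1 for stage in stages}

    def mark_synced(self, stage, height):
        self._height[stage] = max(self._height[stage], height)

    def rollback_to(self, height):
        # After a rollback, no stage may claim blocks above the given height.
        for stage in self._height:
            self._height[stage] = min(self._height[stage], height)

    def visible_height(self):
        # A block is fully synced only once every stage has processed it.
        return min(self._height.values())


progress = SyncProgress(["block_headers", "transaction_headers", "transaction_outputs"])
progress.mark_synced("block_headers", 100)
progress.mark_synced("transaction_headers", 80)
progress.mark_synced("transaction_outputs", 75)
visible_before = progress.visible_height()

progress.rollback_to(70)
visible_after = progress.visible_height()
```

With this rule the API would serve blocks up to height 75 before the rollback and up to 70 after it, even though the block headers stage had already reached 100.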

Labels: help wanted (extra attention is needed), roadmap (a feature that will be developed), server (changes required on the server project)