Conversation

@gilescope
Contributor

GRANDPA has to finalise the start of each session. Therefore, if you take the most recent finalised head from a node, then even if a later request lands on another node (due to DNS load balancing or whatever) that hasn't yet caught up to the just-announced finality, any block more than a session's height below that head can safely be requested by number: whichever node you hit is guaranteed to have the correct answer for it.

This PR makes use of this guarantee to reduce the indexer's start time from many minutes to seconds: indexing can begin from the first block almost immediately.

TODO:

  • Simplify
  • Productionise error handling
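The rule described above can be sketched as follows. This is a minimal illustration, not the PR's implementation; `SESSION_LENGTH` is a hypothetical placeholder (real chains expose the session length via runtime constants), and the function names are invented for this example.

```python
# Hypothetical session length in blocks, standing in for the chain's
# actual runtime constant.
SESSION_LENGTH = 600

def safe_by_number_height(finalized_head: int,
                          session_length: int = SESSION_LENGTH) -> int:
    """Highest block number every node must agree on.

    GRANDPA finalises the start of each session, so any node can lag at
    most one session behind the finality we observed. Blocks at or below
    finalized_head - session_length are identical on every node.
    """
    return max(0, finalized_head - session_length)

def fetch_plan(block_number: int, finalized_head: int) -> str:
    # Below the safe height we may ask any load-balanced node by number;
    # above it we must walk back hash-by-hash to avoid stale answers.
    if block_number <= safe_by_number_height(finalized_head):
        return "by-number"
    return "by-hash"
```

For example, with a finalised head at 10,000 and a 600-block session, block 100 can be fetched by number from any node, while block 9,900 still has to be resolved via its hash.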

@hseeberger
Collaborator

This makes some assumptions, e.g. that GRANDPA is always used, that the session height is known, etc. Is it guaranteed that these assumptions will always hold?

Also, the vast majority of time spent catching up with a Node that is ahead of the Indexer is not spent traversing back, but actually indexing the missing blocks (moving forward from the last indexed block to the chain head). So this optimization does not buy us much – maybe it reduces the overall time to 90% of what it is now – but it makes the implementation harder to understand and maintain, and maybe brittle (see the question above).

@hseeberger
Collaborator

@gilescope, during today's reset on preview, traversing back ran at about 600 blocks/sec. Typical indexing rates for non-trivial (non-empty) blocks are 10 blocks/sec or less. So this "optimization" would save us less than 2% of the catch-up time. Therefore I'd like to keep the current, easier to understand and maintain approach.
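The "less than 2%" figure follows directly from the two rates quoted above; a quick back-of-the-envelope check (with a hypothetical 10,000-block catch-up, a number only chosen for illustration):

```python
TRAVERSE_RATE = 600.0  # blocks/sec when walking back through headers
INDEX_RATE = 10.0      # blocks/sec when actually indexing blocks

def catch_up_seconds(blocks: int, skip_traversal: bool) -> float:
    """Total catch-up time, with or without the backward traversal."""
    traverse = 0.0 if skip_traversal else blocks / TRAVERSE_RATE
    return traverse + blocks / INDEX_RATE

blocks = 10_000
with_traversal = catch_up_seconds(blocks, skip_traversal=False)
without_traversal = catch_up_seconds(blocks, skip_traversal=True)
saving = 1 - without_traversal / with_traversal
print(f"saving: {saving:.1%}")  # roughly 1.6%
```

Because the saving is the ratio of the two rates, it is independent of the number of blocks: traversal contributes 10/600 of the indexing time regardless of catch-up depth.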

@cosmir17
Contributor

Thanks for this, Giles. The insight about GRANDPA session finalisation is interesting; I didn't know blocks older than a session could be safely fetched by number.
I leave the PR discussion to you and Heiko 🙏

@gilescope
Contributor Author

It's reducing the load on the node by tens of thousands of network calls.

@hseeberger
Collaborator

> It's reducing the load on the node by tens of thousands of network calls.

@gilescope, this is only the case for full resets. We are getting close to mainnet and the most likely and most frequent scenarios where catching up happens are redeployments of the Indexer where it will have to catch up some 10 blocks (60 seconds). In these cases the load on the node can be neglected.

Even in the case of a disaster recovery based on a daily database backup, we are talking about some 10,000 blocks. Well, these are not 10,000 requests, but just 10,000 block headers (!) conveyed via WebSocket. As block headers are small and WebSocket connections are permanent, this should not put too much load on the Node. And hopefully this only happens occasionally.
