
[Tracking] 1M TPS benchmarks #13130

Open

Description

@Trisfald

Tracking issue for all benchmark-related tasks in the 1 million TPS initiative.

Current Max TPS

State size          | With RPC nodes | Number of shards | Action          | TPS
4 accounts / shard  | no             | 70               | native transfer | 108.5k

Reference benchmark

A 20-shard benchmark that we can use as a baseline to measure performance improvements in neard.

Unlimited config

Baseline max TPS: 56k

Benchmark run

  • Accounts per shard: 2
  • Block time: 2.5s
  • Max block time: 6s
  • produce_chunk_add_transactions_time_limit: 800ms
  • Validator mandates: 1
  • Unlimited setup (no state witness limits, no bandwidth scheduler, no gas limit, no congestion control)
  • All VMs are deployed in the same zone
  • Transactions are injected into chunk producers

Realistic config

Baseline max TPS: 14k

Benchmark run

  • Accounts per shard: 50k
  • Block time: 1.3s
  • Max block time: 4s
  • produce_chunk_add_transactions_time_limit: 400ms
  • Validator mandates: 14 (2/3 of stake)
  • Unlimited setup (no state witness limits, no bandwidth scheduler, no gas limit, no congestion control)
  • All VMs are deployed in the same zone
  • Transactions are injected into chunk producers
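
For intuition, the two baselines above translate into per-chunk load as follows. This is a back-of-the-envelope sketch using only the numbers listed (chain TPS, block time, shard count); it ignores receipts and cross-shard overhead.

```python
# Back-of-the-envelope: convert chain-level TPS into the average number of
# transactions each chunk has to apply per block. Inputs are the baseline
# numbers above; receipts and cross-shard traffic are ignored.

def per_chunk_load(chain_tps: float, block_time_s: float, num_shards: int) -> float:
    txs_per_block = chain_tps * block_time_s
    return txs_per_block / num_shards

# Unlimited config: 56k TPS, 2.5s blocks, 20 shards.
print(per_chunk_load(56_000, 2.5, 20))  # -> 7000.0 transactions per chunk

# Realistic config: 14k TPS, 1.3s blocks, 20 shards.
print(per_chunk_load(14_000, 1.3, 20))  # -> 910.0 transactions per chunk
```

So the unlimited baseline asks each chunk producer to apply roughly 7-8x more transactions per chunk than the realistic one, despite the slower block time.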

Tasks

High priority

  • Use different hardware
    We currently use N2D-class machines. We can try something with a faster CPU.
    Using machines with better single-threaded performance (C2D) yielded a TPS improvement of ~15% (dashboard)
  • Investigate chunk misses bottleneck
    The chain breaks when there's too much load and chunks can't be endorsed in time.
  • Run a more realistic network @Trisfald
    • 20 shards
    • large state
    • mainnet-like config
  • Investigate how chunk validators impact performance

Low priority

  • Attempt to create more shards
    I've stopped at 70 shards for now. With 80 shards I had trouble even creating accounts. It might be possible to push the number of shards to 75 or, with some tweaks, higher.
  • Try even higher block time
    Especially on larger networks, increasing block time has been an effective way to improve TPS. We can try more aggressive configurations.
    Not very successful: increasing block time to very large values didn't help TPS

Benchmarks

State generation

Create genesis, adjust node configuration, and build a suitable initial database state. A sketch of what uniform account generation might look like follows the list below.

  • Tool to generate uniform state locally, for 50 shards
    • Minimal state (a few accounts): synth-bm from CRT
    • Large state
  • Tool to generate uniform state in forknet, for 50 shards
    • Minimal state (a few accounts)
    • Large state
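
As a rough illustration of the kind of uniform state these tools need to produce, the sketch below spreads account IDs evenly across shards. It assumes range-based shard boundaries over lexicographically ordered account IDs; the naming scheme and helper are hypothetical, not synth-bm's actual logic.

```python
# Illustrative only: generate a uniform set of benchmark accounts whose IDs
# stay lexicographically grouped per shard, so range-based shard boundaries
# can split them evenly. The naming scheme is hypothetical, not synth-bm's.

def gen_accounts(num_shards: int, accounts_per_shard: int) -> list[str]:
    return [
        f"shard{s:03d}_user{i:06d}.near"
        for s in range(num_shards)
        for i in range(accounts_per_shard)
    ]

# e.g. the "realistic" reference benchmark: 20 shards x 50k accounts each.
accounts = gen_accounts(num_shards=20, accounts_per_shard=50_000)
print(len(accounts))  # 1000000
```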

Traffic generation

Generate transactions to stress the network. A sketch of the sender/receiver selection that distinguishes intra-shard from cross-shard traffic follows the list below.

  • Native token transfers
    • Evaluate existing tools made by CRT
    • Tool to generate intra-shard traffic locally
    • Tool to generate cross-shard traffic locally
    • Tool to generate intra-shard traffic in forknet
    • Tool to generate cross-shard traffic in forknet
  • Fungible token transfers:
    TODO
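
Here is a minimal sketch of the piece that differs between the intra-shard and cross-shard tools: selecting sender/receiver pairs. The accounts_by_shard map and the helper name are hypothetical, and building, signing, and submitting the actual transfer transaction is out of scope.

```python
import random

# Minimal sketch: choose sender/receiver pairs for intra- vs cross-shard
# traffic. `accounts_by_shard` maps a shard index to the accounts it holds.
# For intra-shard pairs, sender == receiver is allowed for simplicity.

def pick_pair(accounts_by_shard: dict[int, list[str]],
              cross_shard: bool) -> tuple[str, str]:
    shards = list(accounts_by_shard)
    src = random.choice(shards)
    dst = random.choice([s for s in shards if s != src]) if cross_shard else src
    sender = random.choice(accounts_by_shard[src])
    receiver = random.choice(accounts_by_shard[dst])
    return sender, receiver

# Intra-shard pairs touch a single shard's state; cross-shard pairs also
# exercise receipt routing between shards.
```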

Benchmark setup

  • Automation to run local benchmarks
    • Support for native token transfers
    • Support for fungible token transfers
  • Automation to run multi-node benchmarks (forknet)
    • Support for native token transfers
    • Support for fungible token transfers

Benchmark runs

  • Native token transfers
    • Intra-shard traffic
      • Single node benchmark
      • Multi node local benchmark
      • Forknet benchmark
    • Cross-shard traffic
      • Single node benchmark
      • Multi node local benchmark
      • Forknet benchmark

Issues found

  • High priority

    • [Low number of shards] Client actor bottleneck: TPS is suboptimal due to the single-threaded design of the client actor. See also ClientActor ingress saturates at ~5K TPS #12963.
    • [High number of shards] Chunk production / endorsement bottleneck: chain TPS is limited by the appearance of chunk misses, which create a snowball effect and keep increasing in number. See Decouple chunk endorsement processing from client actor thread #13190 (a toy model of this snowball dynamic follows the list below)
    • RPC nodes are unable to keep up with the network, because they track all shards. In a real network they also need to sustain the extra load of accepting user transactions. Even without sending any transactions to the RPC node, and with memtries enabled, I have observed TX application speeds of 8k-15k TPS while the chain can go above 40k TPS
  • Medium priority

    • The number of peer connections must be equal to or higher than the number of chunk producers in the network. If not, chain TPS degrades significantly.
    • The bigger the state, the lower the TPS. This affects single-shard performance, which is expected, but perhaps not to this extent:
      1k accounts -> 4k TPS
      100k accounts -> 2.7k TPS
    • [x] The chain breaks at relatively low TPS due to exponentially growing block times, caused by chunk misses from a lack of endorsements. This is a side effect of the client actor bottleneck: endorsements are not processed in time. It does not happen if proper gas limits are in place.
  • Low priority

    • Sending transactions directly to chunk validators doesn't work if they don't track all shards, due to timeouts. Workaround: inject transactions with the transaction generator tool
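
To make the snowball dynamic above concrete, here is a toy model. All constants are hypothetical and the doubling backoff is an illustrative stand-in for the real block-timer behavior; the point is only that once the endorsement work queued per block outgrows the block-time budget, every backoff queues even more work, so block times grow geometrically instead of recovering.

```python
# Toy model of the chunk-miss snowball (all constants hypothetical; the
# doubling backoff is an illustrative stand-in for the real timer policy).
# Endorsement processing time scales with the transactions queued per block,
# which itself grows with the block time, so past a threshold the chain
# cannot recover.

def simulate(injected_tps: float, endorse_cost_s_per_tx: float,
             base_block_time: float = 1.3, steps: int = 6) -> None:
    block_time = base_block_time
    for step in range(steps):
        # More wall-clock time per block -> more queued transactions ->
        # longer endorsement processing on the single client-actor thread.
        endorsement_latency = injected_tps * block_time * endorse_cost_s_per_tx
        if endorsement_latency > block_time:
            block_time *= 2  # chunk missed: back off
        else:
            block_time = max(base_block_time, block_time / 2)  # recover
        print(f"step {step}: block_time = {block_time:.1f}s")

simulate(injected_tps=8_000, endorse_cost_s_per_tx=1e-4)   # stable: 0.8 < 1
simulate(injected_tps=14_000, endorse_cost_s_per_tx=1e-4)  # snowball: 1.4 > 1
```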
