Tracking issue for all tasks related to benchmarks for the 1 million TPS initiative.
Current Max TPS
| State size | With RPC nodes | Number of shards | Action | TPS |
|---|---|---|---|---|
| 4 accounts / shard | no | 70 | native transfer | 108.5k |
Reference benchmark
A 20-shard benchmark that we can use as a baseline to measure performance improvements in neard.
Unlimited config
Baseline max TPS: 56k
- Accounts per shard: 2
- Block time: 2.5s
- Max block time: 6s
- produce_chunk_add_transactions_time_limit: 800ms
- Validator mandates: 1
- Unlimited setup (no state witness limits, no bandwidth scheduler, no gas limit, no congestion control)
- All VMs are deployed in the same zone
- Transactions are injected into chunk producers
Realistic config
Baseline max TPS: 14k
- Accounts per shard: 50k
- Block time: 1.3s
- Max block time: 4s
- produce_chunk_add_transactions_time_limit: 400ms
- Validator mandates: 14 (2/3 of stake)
- Unlimited setup (no state witness limits, no bandwidth scheduler, no gas limit, no congestion control)
- All VMs are deployed in the same zone
- Transactions are injected into chunk producers
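For local experiments, most of the knobs above live in each node's config.json (and the account counts in genesis.json). Below is a minimal sketch of applying the "Unlimited config" overrides; the key paths and the helper are assumptions to check against the neard version in use, not confirmed API.

```python
"""Sketch: patch a node's config.json with the benchmark overrides.

The key paths below are assumptions to verify against your neard
version; only `produce_chunk_add_transactions_time_limit` is taken
verbatim from the configs listed above. Durations use the Rust-style
{"secs": .., "nanos": ..} encoding.
"""
import json
from pathlib import Path

# "Unlimited config" values from the list above.
OVERRIDES = {
    "consensus.min_block_production_delay": {"secs": 2, "nanos": 500_000_000},
    "consensus.max_block_production_delay": {"secs": 6, "nanos": 0},
    "produce_chunk_add_transactions_time_limit": {"secs": 0, "nanos": 800_000_000},
}

def apply_overrides(config_path: Path) -> None:
    config = json.loads(config_path.read_text())
    for dotted_key, value in OVERRIDES.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    config_path.write_text(json.dumps(config, indent=2))

if __name__ == "__main__":
    apply_overrides(Path.home() / ".near" / "config.json")
```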
Tasks
High priority
- Use different hardware
We currently use N2D class machines. We can try something with faster CPU.
Using machines with better single threaded performance (C2D) yielded an improvement in TPS of ~%15 dashboard - Investigate chunk misses bottleneck
The chain breaks when there's too much load and chunks can't be endorsed in time.- One possible fix: Decouple chunk endorsement processing from client actor thread #13190
- Needs more investigation, maybe creating additional metrics
- Run a more realistic network @Trisfald
  - 20 shards
  - large state
  - mainnet-like config
- Investigate how chunk validators impact performance
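As an illustration of the direction #13190 proposes: move the expensive endorsement processing off the client actor's single thread onto a dedicated worker, so endorsement handling can't starve block and chunk production. A minimal sketch of the pattern; nearcore's actual actor framework, types, and verification logic differ, and `verify_signature` is a stand-in.

```python
"""Sketch of the decoupling idea from #13190: process chunk endorsements
on a dedicated worker thread instead of the client actor's single event
loop. Illustrative pattern only; nearcore's actor framework differs.
"""
import queue
import threading

def verify_signature(endorsement) -> bool:
    # Stand-in for the real cryptographic check on an endorsement.
    return True

class EndorsementWorker:
    def __init__(self, on_validated):
        self._inbox: queue.Queue = queue.Queue()
        self._on_validated = on_validated  # delivers results back to the client actor
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, endorsement) -> None:
        # Called from the network side; never blocks the client actor loop.
        self._inbox.put(endorsement)

    def _run(self) -> None:
        while True:
            endorsement = self._inbox.get()
            # Verification is the expensive part; doing it here keeps the
            # client actor free to produce blocks and chunks under load.
            if verify_signature(endorsement):
                self._on_validated(endorsement)
```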
Low priority
- Attempt to create more shards
  I've stopped at 70 shards for now. With 80 shards I had trouble even creating accounts. It might be possible to push the number of shards to 75 or, with some tweaks, higher.
- Try even higher block times
  Especially on larger networks, increasing the block time has been an efficient way to improve TPS. We can try more aggressive configurations.
  Not very successful: increasing the block time to very large values didn't help TPS.
Benchmarks
State generation
Create genesis, adjust node configurations, and build a suitable initial database state (a rough sketch follows the list below).
- Tool to generate uniform state locally, for 50 shards
  - Minimal state (a few accounts): synth-bm from CRT
  - Large state
- Tool to generate uniform state in forknet, for 50 shards
  - Minimal state (a few accounts)
  - Large state
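To make "uniform state" concrete: the goal is the same number of accounts in every shard. A rough sketch that emits per-shard account records, assuming a range-based shard layout where lexicographically sorted account IDs split evenly across shards; the ID scheme and record shape are made up for illustration and are not what synth-bm actually produces.

```python
"""Sketch: generate uniformly distributed accounts for an N-shard genesis.

Assumes a range-based shard layout where lexicographically sorted account
IDs split evenly across shards; the ID scheme and record shape are
illustrative only.
"""
import json

def generate_accounts(num_shards: int, accounts_per_shard: int):
    for shard in range(num_shards):
        for i in range(accounts_per_shard):
            # Zero-padded prefixes keep each shard's accounts contiguous in
            # lexicographic order, so a range-based layout stays uniform.
            yield {
                "account_id": f"s{shard:03d}_user{i:06d}.test",
                "balance": str(10**24),  # 1 NEAR in yoctoNEAR
            }

if __name__ == "__main__":
    accounts = list(generate_accounts(num_shards=50, accounts_per_shard=2))
    with open("accounts.json", "w") as f:
        json.dump(accounts, f, indent=2)
```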
Traffic generation
Generate transactions to stress the network (a pacing sketch follows the list below).
- Native token transfers
  - Evaluate existing tools made by CRT
  - Tool to generate intra-shard traffic locally
  - Tool to generate cross-shard traffic locally
  - Tool to generate intra-shard traffic in forknet
  - Tool to generate cross-shard traffic in forknet
- Fungible token transfers: TODO
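Whatever the tool, the core of traffic generation is pacing: holding a target TPS steady without drift. A minimal open-loop pacing sketch, where `send_transfer` is a stand-in for signing and submitting one native transfer (picking sender/receiver pairs in the same shard or in different shards is what makes the traffic intra- or cross-shard).

```python
"""Sketch: open-loop transaction injection at a fixed target TPS.

`send_transfer` is a stand-in for signing and submitting one native
transfer; real runs use synth-bm or the transaction generator tool.
"""
import time

def inject(send_transfer, target_tps: float, duration_s: float) -> None:
    interval = 1.0 / target_tps
    start = time.monotonic()
    next_send = start
    while time.monotonic() - start < duration_s:
        send_transfer()
        # Schedule against absolute deadlines so per-call jitter does not
        # accumulate and erode the effective TPS.
        next_send += interval
        delay = next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)

if __name__ == "__main__":
    sent = []
    inject(lambda: sent.append(1), target_tps=100, duration_s=2)
    print(f"sent {len(sent)} transactions")  # expect ~200
```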
Benchmark setup
- Automation to run local benchmarks
  - Support for native token transfers
  - Support for fungible token transfers
- Automation to run multi-node benchmarks (forknet)
  - Support for native token transfers
  - Support for fungible token transfers
Benchmark runs
- Native token transfers
  - Intra-shard traffic
    - Single node benchmark
    - Multi node local benchmark
    - Forknet benchmark
  - Cross-shard traffic
    - Single node benchmark
    - Multi node local benchmark
    - Forknet benchmark
Issues found
High priority
- [Low number of shards] Client actor bottleneck: TPS is suboptimal due to the single-threaded design of the client actor. See also ClientActor ingress saturates at ~5K TPS #12963.
- [High number of shards] Chunk production / endorsement bottleneck: the chain's TPS is limited by the appearance of chunk misses, which create a snowball effect and keep increasing in number. See Decouple chunk endorsement processing from client actor thread #13190.
- RPC nodes are unable to keep up with the network, because they track all shards. In a real network they also need to sustain the extra load of accepting user transactions. Even without sending any transactions to the RPC node, and with memtries enabled, I have observed transaction application speeds of 8k-15k TPS while the chain can go above 40k TPS.
Medium priority
- The number of peer connections must be equal to or higher than the number of chunk producers in the network. If not, the chain's TPS degrades significantly (a sanity-check sketch follows this list).
- The bigger the state, the lower the TPS. This affects single-shard performance, which is expected, but perhaps not to this extent.
  - 1k accounts -> 4k TPS
  - 100k accounts -> 2.7k TPS
- [x] The chain breaks at relatively low TPS due to exponentially growing block times, caused by chunk misses due to a lack of endorsements. This is a side effect of the client actor bottleneck: endorsements are not processed in time. Does not happen if proper gas limits are in place.
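Regarding the peer connection requirement above, here is a quick sanity check one could run over every node's config before a benchmark. It assumes the cap lives at network.max_num_peers, which should be verified for the neard version in use.

```python
"""Sketch: check that a node's peer cap covers all chunk producers.

Assumes the cap lives at config["network"]["max_num_peers"]; verify the
exact key for your neard version.
"""
import json
from pathlib import Path

def check_peer_cap(config_path: Path, num_chunk_producers: int) -> None:
    config = json.loads(config_path.read_text())
    max_peers = config["network"]["max_num_peers"]
    if max_peers < num_chunk_producers:
        raise SystemExit(
            f"max_num_peers={max_peers} is below {num_chunk_producers} "
            "chunk producers; expect degraded TPS"
        )

if __name__ == "__main__":
    check_peer_cap(Path.home() / ".near" / "config.json", num_chunk_producers=70)
```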
Low priority
- Sending transactions directly to chunk validators doesn't work if they don't track all shards, due to timeouts. Workaround: inject transactions with the transaction generator tool.