Description
Describe the bug
ARM64 (AWS Graviton) shows significantly slower trie computation during sync and recovery operations compared to x86_64, despite having hardware SHA extensions and showing faster steady-state EVM execution.
During chaindata rebuild from snapshots, ARM64 computes trie entries at ~175/sec vs ~450/sec on comparable x86_64 hardware. This results in sync times of 2+ hours vs ~35 minutes for the same ~19k blocks.
Steady-state block processing shows ARM64 is actually about 11-15% faster (14.6-14.9 avg mgas/s vs 12.9-13.2 mgas/s), so the slowdown is specific to the trie/commitment computation paths.
System information
Erigon version: v3.3.2
OS & Version: Fedora Linux 43 (kernel 6.x)
Chain/Network: Ethereum Mainnet (archive mode)
Hardware comparison:
- ARM64: AWS x2gd.xlarge (Graviton2, 4 vCPU, 64GB RAM, NVMe)
- x86_64: AWS i7i.xlarge (Sapphire Rapids, 4 vCPU, 32GB RAM, NVMe)
CPU features:
- ARM64: `sha1 sha2 sha3 aes pmull crc32 atomics sve sve2`
- x86_64: `sha_ni aes vaes avx512f avx512bw avx512vl`
Expected behaviour
Trie computation performance should be comparable across architectures with similar hardware crypto support. An initial sync from the same snapshot step should take a similar amount of time on both.
Actual behaviour
During chaindata rebuild (both starting from snapshot step ~2032):
| Metric | x86_64 | ARM64 |
|---|---|---|
| Trie entries/sec | ~450 | ~175 |
| ufdur (unflushed duration) | 6-8 min | 12-15 min |
| Sync time (~19k blocks) | ~35 min | 2+ hours |
Steady-state comparison (same blocks, same time window):
| Metric | x86_64 | ARM64 |
|---|---|---|
| average mgas/s | 12.9-13.2 | 14.6-14.9 |
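To make the gap concrete, here is a minimal sketch that turns the figures from the two tables above into ratios. The helper names (`ratio`, `midpoint`) are illustrative only, "2+ hours" is treated as a 120-minute lower bound, and the mgas/s ranges are reduced to their midpoints, so the results are approximations rather than measurements.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


def midpoint(low: float, high: float) -> float:
    """Collapse a reported range to its midpoint (an assumption, not a measurement)."""
    return (low + high) / 2


# Figures copied from the tables above; all are approximate values from the report.
trie_rate_x86, trie_rate_arm = 450, 175   # trie entries/sec during rebuild
sync_min_x86, sync_min_arm = 35, 120      # minutes; "2+ hours" taken as >= 120
mgas_x86 = midpoint(12.9, 13.2)           # steady-state average mgas/s
mgas_arm = midpoint(14.6, 14.9)

trie_slowdown = ratio(trie_rate_x86, trie_rate_arm)  # x86_64 rate relative to ARM64
sync_slowdown = ratio(sync_min_arm, sync_min_x86)    # ARM64 time relative to x86_64
steady_speedup = ratio(mgas_arm, mgas_x86)           # ARM64 throughput relative to x86_64

print(f"trie rate: x86_64 is {trie_slowdown}x faster")     # 2.57x
print(f"sync time: ARM64 takes at least {sync_slowdown}x longer")  # 3.43x
print(f"steady state: ARM64 is {steady_speedup}x faster")  # 1.13x
```

Under these assumptions, ARM64 is about 2.6x slower on trie entries and at least 3.4x slower on total sync time, while remaining about 1.13x faster in steady state.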
Steps to reproduce the behaviour
- Run Erigon v3.3.2 in archive mode on both ARM64 and x86_64 instances
- Stop Erigon, remove chaindata: `rm -rf /var/lib/erigon/chaindata/*`
- Restart Erigon and observe trie computation rates in logs (`[agg] computing trie` entries)
- Compare `ufdur` values and overall sync time
Additional context
A likely cause is architecture-specific optimization in the trie/commitment code path, possibly Keccak-256 hashing assembly or MDBX B-tree operations that favor x86_64.
We're happy to cooperate and can provide detailed logs, run profiling, or test patches if that would help investigate this issue.
Related issues
See also #6508 (ARM64 sync performance concerns, closed without resolution)