ARM64: Trie computation performance during sync/recovery #18398

@lemenkov

Describe the bug

ARM64 (AWS Graviton) shows significantly slower trie computation during sync and recovery operations compared to x86_64, despite having hardware SHA extensions and showing faster steady-state EVM execution.

During a chaindata rebuild from snapshots, ARM64 computes trie entries at ~175/sec versus ~450/sec on comparable x86_64 hardware (about 2.6x slower). As a result, syncing the same ~19k blocks takes 2+ hours on ARM64 versus ~35 minutes on x86_64.

Steady-state block processing shows ARM64 is actually faster, by roughly 11-16% (14.6-14.9 vs 12.9-13.2 average mgas/s), so the slowdown is specific to the trie/commitment computation paths.
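The ratios implied by the reported figures can be reproduced with a few lines of arithmetic. This is a minimal Python sketch that uses only the numbers quoted in this issue; treating "2+ hours" as exactly 120 minutes makes the sync-time ratio a lower bound.

```python
# Ratios derived only from the figures reported in this issue.

def ratio(a: float, b: float) -> float:
    """Return a / b, e.g. x86_64 rate over ARM64 rate."""
    return a / b

# Trie entries/sec during the chaindata rebuild: x86_64 over ARM64.
trie_slowdown = ratio(450, 175)  # ~2.57x

# Sync time for ~19k blocks; "2+ hours" taken as 120 min, so a lower bound.
sync_slowdown_min = ratio(120, 35)  # >= ~3.43x

# Steady-state mgas/s: ARM64 over x86_64 at the range ends and midpoints.
evm_speedup_low = ratio(14.6, 13.2)   # ARM64 low end vs x86_64 high end
evm_speedup_high = ratio(14.9, 12.9)  # ARM64 high end vs x86_64 low end
evm_speedup_mid = ratio((14.6 + 14.9) / 2, (12.9 + 13.2) / 2)

print(f"trie computation: x86_64 {trie_slowdown:.2f}x faster")
print(f"sync time: x86_64 at least {sync_slowdown_min:.2f}x faster")
print(f"EVM execution: ARM64 {evm_speedup_low - 1:.0%} to "
      f"{evm_speedup_high - 1:.0%} faster (midpoint {evm_speedup_mid - 1:.0%})")
```

On these numbers the trie path is roughly 2.6x slower on ARM64 while EVM execution is roughly 11-16% faster, which is why the regression looks specific to trie/commitment computation rather than to overall CPU speed.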

System information

Erigon version: v3.3.2

OS & Version: Fedora Linux 43 (kernel 6.x)

Chain/Network: Ethereum Mainnet (archive mode)

Hardware comparison:

  • ARM64: AWS x2gd.xlarge (Graviton2, 4 vCPU, 64GB RAM, NVMe)
  • x86_64: AWS i7i.xlarge (Sapphire Rapids, 4 vCPU, 32GB RAM, NVMe)

CPU features:

  • ARM64: sha1 sha2 sha3 aes pmull crc32 atomics sve sve2
  • x86_64: sha_ni aes vaes avx512f avx512bw avx512vl
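Feature lists like these can be read from /proc/cpuinfo on Linux, where arm64 kernels report them on "Features" lines and x86_64 kernels on "flags" lines. Note that Ethereum's Keccak-256 is not SHA-256, so sha2 and sha_ni do not accelerate it directly; the arm64 sha3 flag advertises the Armv8.2 SHA-3 instructions (EOR3, RAX1, XAR, BCAX), which only help if the hashing code actually uses them. A minimal Python sketch for collecting the flags the same way on both hosts follows; the CRYPTO_FLAGS set is an illustrative choice, not a list taken from Erigon.

```python
from pathlib import Path

def cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text.

    arm64 kernels list them on "Features" lines; x86_64 kernels use "flags".
    """
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Features", "flags"):
            features.update(value.split())
    return features

# Illustrative subset of hash/crypto-related flags from both architectures.
CRYPTO_FLAGS = {"sha1", "sha2", "sha3", "aes", "pmull", "sha_ni", "vaes"}

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        found = cpu_features(path.read_text())
        print("crypto-related flags:", sorted(found & CRYPTO_FLAGS))
```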

Expected behaviour

Trie computation performance should be comparable across architectures with similar hardware crypto support, and an initial sync from the same snapshot step should take a similar amount of time.

Actual behaviour

During the chaindata rebuild (both instances starting from snapshot step ~2032):

Metric                       x86_64    ARM64
Trie entries/sec             ~450      ~175
ufdur (unflushed duration)   6-8 min   12-15 min
Sync time (~19k blocks)      ~35 min   2+ hours

Steady-state comparison (same blocks, same time window):

Metric           x86_64      ARM64
Average mgas/s   12.9-13.2   14.6-14.9

Steps to reproduce the behaviour

  1. Run Erigon v3.3.2 in archive mode on both ARM64 and x86_64 instances
  2. Stop Erigon and remove chaindata: rm -rf /var/lib/erigon/chaindata/*
  3. Restart Erigon and observe the trie computation rates in the logs ([agg] computing trie entries)
  4. Compare ufdur values and overall sync time

Additional context

The likely cause is architecture-specific optimization in the trie/commitment code path, possibly Keccak-256 hashing assembly or MDBX B-tree operations that favor x86_64.
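One way to start separating these two suspects is to measure raw Keccak throughput on each host. This sketch assumes Python 3.9+ and uses hashlib.sha3_256 as a stand-in: SHA3-256 and Ethereum's Keccak-256 share the Keccak-f[1600] permutation and rate and differ only in padding, so the per-hash cost is essentially the same. It exercises Python's own SHA-3 backend, not Erigon's Go code, so it only gives a rough per-CPU baseline; the message sizes and iteration count are arbitrary choices.

```python
import hashlib
import os
import time

def sha3_hashes_per_sec(msg_len: int = 64, iterations: int = 100_000) -> float:
    """Time `iterations` SHA3-256 digests of a random `msg_len`-byte message.

    SHA3-256 and Keccak-256 differ only in padding, so this approximates
    the per-hash Keccak cost on this CPU.
    """
    msg = os.urandom(msg_len)
    start = time.perf_counter()
    for _ in range(iterations):
        hashlib.sha3_256(msg).digest()
    elapsed = time.perf_counter() - start
    return iterations / elapsed

if __name__ == "__main__":
    # Known-answer check: SHA3-256 of the empty string (FIPS 202).
    assert hashlib.sha3_256(b"").hexdigest() == (
        "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
    )
    for n in (32, 64, 128, 512):
        rate = sha3_hashes_per_sec(n)
        print(f"{n:4d}-byte input: {rate:,.0f} hashes/sec")
```

If the per-hash rates are close on both machines, the gap is more likely in Erigon's Go hashing path, MDBX access, or I/O than in the CPU's raw Keccak capability; a profile of Erigon itself would still be needed to confirm.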

We're happy to cooperate and can provide detailed logs, run profiling, or test patches if that would help investigate this issue.

Related issues

See also #6508 (ARM64 sync performance concerns, closed without resolution)
