Description
During performance testing of the DA bridge node, we discovered that the node is significantly underutilizing available system resources during synchronization, particularly when syncing from scratch.
Network: 32MB-100k
- ODS block size: ~32 MB
- Q4 block size: 128 MB
Current Behavior
- DA bridge node performs at a flat ~800 write ops/s
- Average inbound network throughput: ~62–63 Mb/s
- TCP congestion control: BBR
Existing Capabilities
DA Bridge Node
- CPU: 32 cores
- Memory: 124.0 GiB
- Network: 10 Gbps
- Storage: 16 TB / 16k IOPS, 1000 MiB/s throughput
Validator
- CPU: 32 cores
- Memory: 126.0 GiB
- Network: 3.2 Gbps
- Storage: 15k IOPS
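For context, a rough utilization estimate based on the numbers above: ~62 Mb/s inbound is well under 1% of the 10 Gbps link, ~800 write ops/s is about 5% of the 16k IOPS budget, and at ~62 Mb/s (≈7.8 MB/s) a single 128 MB block takes roughly 16 s to transfer.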
DA Configuration used
config.toml:
```toml
[Node]
StartupTimeout = "2m0s"
ShutdownTimeout = "2m0s"
[Core]
IP = ""
Port = "9090"
[State]
DefaultKeyName = "my_celes_key.info"
DefaultBackendName = "test"
[P2P]
ListenAddresses = ["/ip4/0.0.0.0/udp/2121/quic-v1/webtransport", "/ip6/::/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/tcp/2121"]
AnnounceAddresses = []
NoAnnounceAddresses = ["/ip4/127.0.0.1/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/udp/2121/quic-v1/webtransport", "/ip6/::/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/udp/2121/quic-v1", "/ip4/127.0.0.1/udp/2121/quic-v1", "/ip6/::/udp/2121/quic-v1", "/ip4/0.0.0.0/tcp/2121", "/ip4/127.0.0.1/tcp/2121", "/ip6/::/tcp/2121"]
MutualPeers = []
PeerExchange = true
RoutingTableRefreshPeriod = "1m0s"
[P2P.ConnManager]
Low = 800
High = 1000
GracePeriod = "1m0s"
[RPC]
Address = "0.0.0.0"
Port = "26658"
[Gateway]
Address = "0.0.0.0"
Port = "26659"
Enabled = false
[Share]
UseShareExchange = true
[Share.EDSStoreParams]
GCInterval = "0s"
RecentBlocksCacheSize = 10
BlockstoreCacheSize = 128
[Share.ShrExEDSParams]
ServerReadTimeout = "5s"
ServerWriteTimeout = "1m0s"
HandleRequestTimeout = "1m0s"
ConcurrencyLimit = 10
BufferSize = 32768
[Share.ShrExNDParams]
ServerReadTimeout = "5s"
ServerWriteTimeout = "1m0s"
HandleRequestTimeout = "1m0s"
ConcurrencyLimit = 10
[Share.PeerManagerParams]
PoolValidationTimeout = "2m0s"
PeerCooldown = "3s"
GcInterval = "30s"
EnableBlackListing = false
[Share.Discovery]
PeersLimit = 5
AdvertiseInterval = "1h0m0s"
[Header]
TrustedHash = ""
TrustedPeers = []
[Header.Store]
StoreCacheSize = 4096
IndexCacheSize = 16384
WriteBatchSize = 2048
[Header.Syncer]
TrustingPeriod = "336h0m0s"
[Header.Server]
WriteDeadline = "8s"
ReadDeadline = "1m0s"
RangeRequestTimeout = "10s"
[Header.Client]
MaxHeadersPerRangeRequest = 64
RangeRequestTimeout = "8s"
```
Investigation Points
- Increase the DASer parallel worker count
- Tune ConcurrencyLimit for network bandwidth utilization
- Adjust BlockstoreCacheSize for memory usage
- Review WriteBatchSize vs IOPS capacity
- Evaluate BufferSize for throughput optimization
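To make the investigation points above concrete, here is a minimal sketch of what such tuning could look like, using only keys that already appear in the config above. All values are illustrative assumptions, not benchmarked recommendations:

```toml
# Illustrative starting points only (untested assumptions, not recommended defaults)
[Share.EDSStoreParams]
BlockstoreCacheSize = 512   # was 128; 124 GiB of RAM leaves ample cache headroom

[Share.ShrExEDSParams]
ConcurrencyLimit = 64       # was 10; more concurrent EDS transfers to fill the 10 Gbps link
BufferSize = 65536          # was 32768; larger buffers per transfer

[Share.ShrExNDParams]
ConcurrencyLimit = 64       # was 10

[Header.Store]
WriteBatchSize = 8192       # was 2048; larger batches make better use of the 16k IOPS budget
```

The intent is to trade the idle RAM, IOPS, and bandwidth headroom documented above for higher concurrency and larger batches, raising each knob incrementally while watching write ops/s and inbound throughput.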
Impact
This significantly affects node operators who need to:
- Relocate nodes
- Perform full sync from scratch
- Recover from data loss scenarios
It would be great to be able to increase or fine-tune the DA node configuration parameters to match the hardware capacity, enabling faster synchronization.