
[Performance] DA Bridge Node Not Utilising Full Storage/Network Capacity During Sync #4108

Open
@aWN4Y25pa2EK

Description

During performance testing of the DA bridge node, we found that the node uses only a small fraction of the available storage and network capacity during synchronization, particularly when syncing from scratch.

Network - 32MB-100k

  • ODS Block Size -> ~32 MB
  • Q4 Block Size -> 128 MB

Current Behavior

  • DA bridge node (BN) writes at a flat ~800 write ops/s
  • Average inbound network traffic: ~62–63 Mb/s
  • TCP congestion control: BBR

Existing Capabilities

DA Bridge Node

  • CPU: 32 cores
  • Memory: 124.0 GiB
  • Network: 10 Gbps
  • Storage: 16 TB / 16k IOPS, 1000 MiB/s throughput

Validator

  • CPU: 32 cores
  • Memory: 126.0 GiB
  • Network: 3.2 Gbps
  • Storage: 15k IOPS
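
To put the gap between observed and provisioned capacity into numbers, here is a rough back-of-the-envelope sketch. It only uses figures quoted in this issue. Two things are assumptions on my part: that the ~800 write ops figure is per second and comparable to the provisioned IOPS, and that one ~32 MB ODS block is the unit pulled over the network.

```python
# Figures copied from this issue; only the unit conversions are added.
observed_net_mbps = 62.5      # midpoint of the reported ~62-63 Mb/s inbound
observed_write_ops = 800      # reported flat write rate (assumed per second)

bridge_net_mbps = 10_000      # bridge node link: 10 Gbps
bridge_iops = 16_000          # bridge node storage: 16k IOPS
validator_net_mbps = 3_200    # validator link: 3.2 Gbps

ods_block_mb = 32             # ~32 MB ODS block, taken as 32 * 10**6 bytes


def utilisation(observed, capacity):
    """Return observed load as a percentage of provisioned capacity."""
    return 100.0 * observed / capacity


net_util = utilisation(observed_net_mbps, bridge_net_mbps)
iops_util = utilisation(observed_write_ops, bridge_iops)
validator_util = utilisation(observed_net_mbps, validator_net_mbps)

# Megabytes -> megabits, then divide by the observed rate in Mb/s.
# Assumption: the ODS block is what travels over the wire.
seconds_per_ods_block = ods_block_mb * 8 / observed_net_mbps

print(f"bridge network utilisation:    {net_util:.3f} %")
print(f"bridge IOPS utilisation:       {iops_util:.1f} %")
print(f"validator network utilisation: {validator_util:.2f} %")
print(f"time per ODS block at observed rate: {seconds_per_ods_block:.3f} s")
```

With the posted figures this works out to roughly 0.6% of the bridge node's link, about 2% of the validator's link, and 5% of the provisioned IOPS, or about 4.1 s to pull one ODS block at the observed rate.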

DA Configuration used

config.toml
[Node]
  StartupTimeout = "2m0s"
  ShutdownTimeout = "2m0s"
[Core]
  IP = ""
  Port = "9090"
[State]
  DefaultKeyName = "my_celes_key.info"
  DefaultBackendName = "test"
[P2P]
  ListenAddresses = ["/ip4/0.0.0.0/udp/2121/quic-v1/webtransport", "/ip6/::/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/tcp/2121"]
  AnnounceAddresses = []
  NoAnnounceAddresses = ["/ip4/127.0.0.1/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/udp/2121/quic-v1/webtransport", "/ip6/::/udp/2121/quic-v1/webtransport", "/ip4/0.0.0.0/udp/2121/quic-v1", "/ip4/127.0.0.1/udp/2121/quic-v1", "/ip6/::/udp/2121/quic-v1", "/ip4/0.0.0.0/tcp/2121", "/ip4/127.0.0.1/tcp/2121", "/ip6/::/tcp/2121"]
  MutualPeers = []
  PeerExchange = true
  RoutingTableRefreshPeriod = "1m0s"
  [P2P.ConnManager]
    Low = 800
    High = 1000
    GracePeriod = "1m0s"
[RPC]
  Address = "0.0.0.0"
  Port = "26658"
[Gateway]
  Address = "0.0.0.0"
  Port = "26659"
  Enabled = false
[Share]
  UseShareExchange = true
  [Share.EDSStoreParams]
    GCInterval = "0s"
    RecentBlocksCacheSize = 10
    BlockstoreCacheSize = 128
  [Share.ShrExEDSParams]
    ServerReadTimeout = "5s"
    ServerWriteTimeout = "1m0s"
    HandleRequestTimeout = "1m0s"
    ConcurrencyLimit = 10
    BufferSize = 32768
  [Share.ShrExNDParams]
    ServerReadTimeout = "5s"
    ServerWriteTimeout = "1m0s"
    HandleRequestTimeout = "1m0s"
    ConcurrencyLimit = 10
  [Share.PeerManagerParams]
    PoolValidationTimeout = "2m0s"
    PeerCooldown = "3s"
    GcInterval = "30s"
    EnableBlackListing = false
  [Share.Discovery]
    PeersLimit = 5
    AdvertiseInterval = "1h0m0s"
[Header]
  TrustedHash = ""
  TrustedPeers = []
  [Header.Store]
    StoreCacheSize = 4096
    IndexCacheSize = 16384
    WriteBatchSize = 2048
  [Header.Syncer]
    TrustingPeriod = "336h0m0s"
  [Header.Server]
    WriteDeadline = "8s"
    ReadDeadline = "1m0s"
    RangeRequestTimeout = "10s"
  [Header.Client]
    MaxHeadersPerRangeRequest = 64
    RangeRequestTimeout = "8s"

Investigation Points

  • Increase daser parallel workers count
  • Tune ConcurrencyLimit for network bandwidth utilization
  • Adjust BlockstoreCacheSize for memory usage
  • Review WriteBatchSize vs IOPS capacity
  • Evaluate BufferSize for throughput optimization

Impact

This significantly affects node operators who need to:

  • Relocate nodes
  • Perform full sync from scratch
  • Recover from data loss scenarios

It would be great to be able to tune the DA node configuration parameters to match the hardware capacity, so that synchronisation completes faster.
