
Solana RPC node's memory usage continuously increases when using lz4 snapshot-archive-format #5689

Open
@amunt0

Description


Problem

Running Solana with --snapshot-archive-format lz4 eventually causes a crash.

I am using the latest version: solana-cli 2.1.18 (src:f91c2fca; feat:3271415109, client:Agave)

In short, whenever I use lz4 instead of the default format, memory and disk usage keep growing until they are overloaded, no matter what configuration I use.

I have tried:

  • NVMe drives in RAID 0
  • NVMe drives in RAID 10

on machines with:

  • a 22-core CPU and 178 GB of memory
  • a 44-core CPU and 346 GB of memory (the machine shown in the screenshots)

My configs:

cat /etc/systemd/system/solana-validator.service
[Unit]
Description=Solana Validator
After=network.target
After=luks-unlock.service

[Service]
Type=simple
User=root
LimitNOFILE=1000000
LimitMEMLOCK=infinity
Environment="SOLANA_METRICS_CONFIG="
Environment="RUST_BACKTRACE=1"
Environment="RUST_LOG=info"
ExecStart=/root/.local/share/solana/install/active_release/bin/agave-validator \
  --ledger /mnt/solana-ledger/validator-ledger \
  --accounts /mnt/solana-ledger/validator-ledger/accounts \
  --snapshots /mnt/solana-ledger/validator-snapshots \
  --identity /mnt/solana-ledger/validator-identity.json \
  --entrypoint entrypoint.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint2.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint3.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint4.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint5.mainnet-beta.solana.com:8001 \
  --expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d \
  --known-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 \
  --known-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ \
  --known-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ \
  --known-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S \
  --rpc-port 8899 \
  --log /var/log/solana-ledger/mainnet/validator.log \
  --limit-ledger-size \
  --wal-recovery-mode skip_any_corrupted_record \
  --rpc-bind-address 0.0.0.0 \
  --private-rpc \
  --full-rpc-api \
  --snapshot-archive-format lz4 \
  --maximum-local-snapshot-age 500 \
  --snapshot-interval-slots 500 \
  --account-index program-id \
  --account-index spl-token-owner \
  --accounts-db-cache-limit-mb 10240 \
  --dynamic-port-range 8000-8020

 
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
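To check whether the disk growth comes from snapshot archives piling up in the --snapshots directory, a minimal sketch like the following can tally archives by extension and total size. It assumes archive files are named snapshot-<slot>-<hash>.tar.<ext> or incremental-snapshot-<base>-<slot>-<hash>.tar.<ext>, with <ext> reflecting --snapshot-archive-format (e.g. lz4 or zst). The function name summarize_archives is hypothetical, and archives stored in other directories (for example a separate incremental snapshot path) are not counted.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Assumption: archives are named "snapshot-<slot>-<hash>.tar.<ext>" or
# "incremental-snapshot-<base>-<slot>-<hash>.tar.<ext>", where <ext> depends
# on --snapshot-archive-format (e.g. "lz4" or "zst").
ARCHIVE_RE = re.compile(r"^(incremental-)?snapshot-.*\.tar\.(?P<ext>[a-z0-9]+)$")


def summarize_archives(snapshot_dir: str) -> Dict[str, Tuple[int, int]]:
    """Return {extension: (archive_count, total_bytes)} for archives directly in snapshot_dir."""
    summary: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for entry in Path(snapshot_dir).iterdir():
        match = ARCHIVE_RE.match(entry.name)
        if entry.is_file() and match:
            stats = summary[match.group("ext")]
            stats[0] += 1
            stats[1] += entry.stat().st_size
    return {ext: (count, size) for ext, (count, size) in summary.items()}


if __name__ == "__main__":
    # Path taken from the --snapshots flag in the unit file above.
    snapshot_dir = Path("/mnt/solana-ledger/validator-snapshots")
    if snapshot_dir.is_dir():
        for ext, (count, size) in sorted(summarize_archives(str(snapshot_dir)).items()):
            print(f".tar.{ext}: {count} archives, {size / 2**30:.1f} GiB")
```

Running it periodically while the node uses lz4 would show whether the archive count or total size keeps rising between snapshot intervals.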




cat /etc/sysctl.d/99-solana-network.conf
net.core.rmem_max=134217728
net.core.rmem_default=134217728
net.core.wmem_max=134217728
net.core.wmem_default=134217728
net.ipv4.tcp_rmem=4096 131072 134217728
net.ipv4.tcp_wmem=4096 16384 134217728
net.core.netdev_max_backlog=300000
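Because a sysctl.d file only takes effect once it is loaded (at boot or via sysctl --system), a small sketch like the one below can confirm that the running kernel actually uses these values. It assumes each key maps to a file under /proc/sys with dots replaced by slashes; parse_sysctl_conf and unapplied_settings are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse "key=value" lines of a sysctl.d file, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        # Normalize whitespace so "4096 131072 134217728" compares equal to
        # the tab-separated form the kernel reports under /proc/sys.
        settings[key.strip()] = " ".join(value.split())
    return settings


def unapplied_settings(expected: Dict[str, str], proc_root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Return {key: live_value} for each key whose live value differs from the file (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for key, want in expected.items():
        path = Path(proc_root) / key.replace(".", "/")
        live = " ".join(path.read_text().split()) if path.exists() else None
        if live != want:
            mismatches[key] = live
    return mismatches


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-solana-network.conf")
    if conf.is_file():
        expected = parse_sysctl_conf(conf.read_text())
        for key, live in unapplied_settings(expected).items():
            print(f"{key}: file says {expected[key]!r}, kernel reports {live!r}")
```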

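With lz4:
(screenshot of memory and disk usage metrics, not preserved)

With the default archive format:
(screenshot of memory and disk usage metrics, not preserved)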

Proposed Solution

Unfortunately, I have been forced to switch back to the default snapshot archive format.

I don't have a specific error message. Based on the metrics above, I believe data keeps accumulating until it crashes the instance, on every machine I have tried.
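One way to back the "keeps accumulating" observation with numbers is to sample the validator's resident memory over time and fit a trend line. The sketch below assumes a Linux host (it reads VmRSS from /proc/<pid>/status) and uses an ordinary least-squares slope; read_rss_kib, sample_rss and growth_rate are hypothetical names, not part of any Solana tooling.

```python
import time
from pathlib import Path
from typing import List, Tuple


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS, in KiB) of a process, read from /proc."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError(f"VmRSS not found for pid {pid}")


def sample_rss(pid: int, count: int, interval_s: float) -> List[Tuple[float, float]]:
    """Take `count` (monotonic_time, VmRSS_KiB) samples of `pid`, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), float(read_rss_kib(pid))))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


def growth_rate(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (time_seconds, value) samples, in value units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    return cov / var
```

For example, with the PID from systemctl show --property MainPID --value solana-validator, growth_rate(sample_rss(pid, 60, 60.0)) gives the average RSS growth in KiB per second over roughly an hour. Running it once with lz4 and once with the default format would make the two cases directly comparable. A least-squares fit is used instead of comparing only the first and last sample so that short spikes (e.g. during snapshot creation) weigh less.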

I have the logs saved locally in case someone wants to follow up on this.
